Commit 670e9f34ee3c7e052514c85014d2fdd99b672cdc
Committed by Adrian Bunk
1 parent 53cb47268e
Exists in master and in 4 other branches
Documentation: remove duplicated words
Remove many duplicated words under Documentation/ and do other small
cleanups.

Examples:
    "and and" --> "and"
    "in in" --> "in"
    "the the" --> "the"
    "the the" --> "to the"
    ...

Signed-off-by: Paolo Ornati <ornati@fastwebnet.it>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
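The commit message does not say how the duplicated words were found. As a rough illustration only, a search of this kind could be sketched with a regular expression. The function name `find_duplicated_words` and the sample text are assumptions made for this example, not part of the commit. The sketch works line by line, so it would miss a word repeated across a line break.

```python
import re

# Matches a word that is immediately repeated after whitespace, e.g. "the the".
# The \b anchors keep pairs such as "that hat" or "a an" from matching.
DUPLICATE_WORD = re.compile(r"\b([A-Za-z]+)\s+\1\b", re.IGNORECASE)


def find_duplicated_words(text):
    """Return a (line_number, word) pair for each immediately repeated word."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DUPLICATE_WORD.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Sample text modelled on the "a a" fix in Documentation/DMA-mapping.txt.
sample = (
    "The query for consistent allocations is performed via a a call to\n"
    "pci_set_consistent_dma_mask():\n"
)
print(find_duplicated_words(sample))  # prints [(1, 'a')]
```

Hits from a scan like this still need manual review: some repeats are intentional, and the right fix is not always to delete one copy, as the "the the" --> "to the" example in the message shows.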
Showing 52 changed files with 61 additions and 62 deletions
- Documentation/DMA-mapping.txt
- Documentation/DocBook/libata.tmpl
- Documentation/DocBook/usb.tmpl
- Documentation/RCU/whatisRCU.txt
- Documentation/block/biodoc.txt
- Documentation/driver-model/overview.txt
- Documentation/exception.txt
- Documentation/fb/fbcon.txt
- Documentation/filesystems/directory-locking
- Documentation/filesystems/files.txt
- Documentation/filesystems/spufs.txt
- Documentation/filesystems/tmpfs.txt
- Documentation/filesystems/vfat.txt
- Documentation/filesystems/vfs.txt
- Documentation/fujitsu/frv/mmu-layout.txt
- Documentation/ia64/efirtc.txt
- Documentation/ia64/mca.txt
- Documentation/input/input.txt
- Documentation/isdn/INTERFACE.fax
- Documentation/isdn/README.hysdn
- Documentation/kdump/kdump.txt
- Documentation/keys.txt
- Documentation/m68k/kernel-options.txt
- Documentation/memory-barriers.txt
- Documentation/networking/bonding.txt
- Documentation/networking/cs89x0.txt
- Documentation/networking/decnet.txt
- Documentation/networking/e1000.txt
- Documentation/networking/s2io.txt
- Documentation/networking/sk98lin.txt
- Documentation/pci-error-recovery.txt
- Documentation/power/swsusp.txt
- Documentation/prio_tree.txt
- Documentation/rpc-cache.txt
- Documentation/s390/Debugging390.txt
- Documentation/s390/s390dbf.txt
- Documentation/scsi/ChangeLog.1992-1997
- Documentation/scsi/st.txt
- Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
- Documentation/sound/oss/AWE32
- Documentation/sound/oss/solo1
- Documentation/sound/oss/ultrasound
- Documentation/sound/oss/vwsnd
- Documentation/spi/pxa2xx
- Documentation/spi/spi-summary
- Documentation/unshare.txt
- Documentation/usb/error-codes.txt
- Documentation/usb/hiddev.txt
- Documentation/usb/usb-serial.txt
- Documentation/video4linux/README.pvrusb2
- Documentation/video4linux/Zoran
- Documentation/vm/numa
Documentation/DMA-mapping.txt
1 | Dynamic DMA mapping | 1 | Dynamic DMA mapping |
2 | =================== | 2 | =================== |
3 | 3 | ||
4 | David S. Miller <davem@redhat.com> | 4 | David S. Miller <davem@redhat.com> |
5 | Richard Henderson <rth@cygnus.com> | 5 | Richard Henderson <rth@cygnus.com> |
6 | Jakub Jelinek <jakub@redhat.com> | 6 | Jakub Jelinek <jakub@redhat.com> |
7 | 7 | ||
8 | This document describes the DMA mapping system in terms of the pci_ | 8 | This document describes the DMA mapping system in terms of the pci_ |
9 | API. For a similar API that works for generic devices, see | 9 | API. For a similar API that works for generic devices, see |
10 | DMA-API.txt. | 10 | DMA-API.txt. |
11 | 11 | ||
12 | Most of the 64bit platforms have special hardware that translates bus | 12 | Most of the 64bit platforms have special hardware that translates bus |
13 | addresses (DMA addresses) into physical addresses. This is similar to | 13 | addresses (DMA addresses) into physical addresses. This is similar to |
14 | how page tables and/or a TLB translates virtual addresses to physical | 14 | how page tables and/or a TLB translates virtual addresses to physical |
15 | addresses on a CPU. This is needed so that e.g. PCI devices can | 15 | addresses on a CPU. This is needed so that e.g. PCI devices can |
16 | access with a Single Address Cycle (32bit DMA address) any page in the | 16 | access with a Single Address Cycle (32bit DMA address) any page in the |
17 | 64bit physical address space. Previously in Linux those 64bit | 17 | 64bit physical address space. Previously in Linux those 64bit |
18 | platforms had to set artificial limits on the maximum RAM size in the | 18 | platforms had to set artificial limits on the maximum RAM size in the |
19 | system, so that the virt_to_bus() static scheme works (the DMA address | 19 | system, so that the virt_to_bus() static scheme works (the DMA address |
20 | translation tables were simply filled on bootup to map each bus | 20 | translation tables were simply filled on bootup to map each bus |
21 | address to the physical page __pa(bus_to_virt())). | 21 | address to the physical page __pa(bus_to_virt())). |
22 | 22 | ||
23 | So that Linux can use the dynamic DMA mapping, it needs some help from the | 23 | So that Linux can use the dynamic DMA mapping, it needs some help from the |
24 | drivers, namely it has to take into account that DMA addresses should be | 24 | drivers, namely it has to take into account that DMA addresses should be |
25 | mapped only for the time they are actually used and unmapped after the DMA | 25 | mapped only for the time they are actually used and unmapped after the DMA |
26 | transfer. | 26 | transfer. |
27 | 27 | ||
28 | The following API will work of course even on platforms where no such | 28 | The following API will work of course even on platforms where no such |
29 | hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on | 29 | hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on |
30 | top of the virt_to_bus interface. | 30 | top of the virt_to_bus interface. |
31 | 31 | ||
32 | First of all, you should make sure | 32 | First of all, you should make sure |
33 | 33 | ||
34 | #include <linux/pci.h> | 34 | #include <linux/pci.h> |
35 | 35 | ||
36 | is in your driver. This file will obtain for you the definition of the | 36 | is in your driver. This file will obtain for you the definition of the |
37 | dma_addr_t (which can hold any valid DMA address for the platform) | 37 | dma_addr_t (which can hold any valid DMA address for the platform) |
38 | type which should be used everywhere you hold a DMA (bus) address | 38 | type which should be used everywhere you hold a DMA (bus) address |
39 | returned from the DMA mapping functions. | 39 | returned from the DMA mapping functions. |
40 | 40 | ||
41 | What memory is DMA'able? | 41 | What memory is DMA'able? |
42 | 42 | ||
43 | The first piece of information you must know is what kernel memory can | 43 | The first piece of information you must know is what kernel memory can |
44 | be used with the DMA mapping facilities. There has been an unwritten | 44 | be used with the DMA mapping facilities. There has been an unwritten |
45 | set of rules regarding this, and this text is an attempt to finally | 45 | set of rules regarding this, and this text is an attempt to finally |
46 | write them down. | 46 | write them down. |
47 | 47 | ||
48 | If you acquired your memory via the page allocator | 48 | If you acquired your memory via the page allocator |
49 | (i.e. __get_free_page*()) or the generic memory allocators | 49 | (i.e. __get_free_page*()) or the generic memory allocators |
50 | (i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from | 50 | (i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from |
51 | that memory using the addresses returned from those routines. | 51 | that memory using the addresses returned from those routines. |
52 | 52 | ||
53 | This means specifically that you may _not_ use the memory/addresses | 53 | This means specifically that you may _not_ use the memory/addresses |
54 | returned from vmalloc() for DMA. It is possible to DMA to the | 54 | returned from vmalloc() for DMA. It is possible to DMA to the |
55 | _underlying_ memory mapped into a vmalloc() area, but this requires | 55 | _underlying_ memory mapped into a vmalloc() area, but this requires |
56 | walking page tables to get the physical addresses, and then | 56 | walking page tables to get the physical addresses, and then |
57 | translating each of those pages back to a kernel address using | 57 | translating each of those pages back to a kernel address using |
58 | something like __va(). [ EDIT: Update this when we integrate | 58 | something like __va(). [ EDIT: Update this when we integrate |
59 | Gerd Knorr's generic code which does this. ] | 59 | Gerd Knorr's generic code which does this. ] |
60 | 60 | ||
61 | This rule also means that you may use neither kernel image addresses | 61 | This rule also means that you may use neither kernel image addresses |
62 | (items in data/text/bss segments), nor module image addresses, nor | 62 | (items in data/text/bss segments), nor module image addresses, nor |
63 | stack addresses for DMA. These could all be mapped somewhere entirely | 63 | stack addresses for DMA. These could all be mapped somewhere entirely |
64 | different than the rest of physical memory. Even if those classes of | 64 | different than the rest of physical memory. Even if those classes of |
65 | memory could physically work with DMA, you'd need to ensure the I/O | 65 | memory could physically work with DMA, you'd need to ensure the I/O |
66 | buffers were cacheline-aligned. Without that, you'd see cacheline | 66 | buffers were cacheline-aligned. Without that, you'd see cacheline |
67 | sharing problems (data corruption) on CPUs with DMA-incoherent caches. | 67 | sharing problems (data corruption) on CPUs with DMA-incoherent caches. |
68 | (The CPU could write to one word, DMA would write to a different one | 68 | (The CPU could write to one word, DMA would write to a different one |
69 | in the same cache line, and one of them could be overwritten.) | 69 | in the same cache line, and one of them could be overwritten.) |
70 | 70 | ||
71 | Also, this means that you cannot take the return of a kmap() | 71 | Also, this means that you cannot take the return of a kmap() |
72 | call and DMA to/from that. This is similar to vmalloc(). | 72 | call and DMA to/from that. This is similar to vmalloc(). |
73 | 73 | ||
74 | What about block I/O and networking buffers? The block I/O and | 74 | What about block I/O and networking buffers? The block I/O and |
75 | networking subsystems make sure that the buffers they use are valid | 75 | networking subsystems make sure that the buffers they use are valid |
76 | for you to DMA from/to. | 76 | for you to DMA from/to. |
77 | 77 | ||
78 | DMA addressing limitations | 78 | DMA addressing limitations |
79 | 79 | ||
80 | Does your device have any DMA addressing limitations? For example, is | 80 | Does your device have any DMA addressing limitations? For example, is |
81 | your device only capable of driving the low order 24-bits of address | 81 | your device only capable of driving the low order 24-bits of address |
82 | on the PCI bus for SAC DMA transfers? If so, you need to inform the | 82 | on the PCI bus for SAC DMA transfers? If so, you need to inform the |
83 | PCI layer of this fact. | 83 | PCI layer of this fact. |
84 | 84 | ||
85 | By default, the kernel assumes that your device can address the full | 85 | By default, the kernel assumes that your device can address the full |
86 | 32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs | 86 | 32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs |
87 | to be increased. And for a device with limitations, as discussed in | 87 | to be increased. And for a device with limitations, as discussed in |
88 | the previous paragraph, it needs to be decreased. | 88 | the previous paragraph, it needs to be decreased. |
89 | 89 | ||
90 | pci_alloc_consistent() by default will return 32-bit DMA addresses. | 90 | pci_alloc_consistent() by default will return 32-bit DMA addresses. |
91 | PCI-X specification requires PCI-X devices to support 64-bit | 91 | PCI-X specification requires PCI-X devices to support 64-bit |
92 | addressing (DAC) for all transactions. And at least one platform (SGI | 92 | addressing (DAC) for all transactions. And at least one platform (SGI |
93 | SN2) requires 64-bit consistent allocations to operate correctly when | 93 | SN2) requires 64-bit consistent allocations to operate correctly when |
94 | the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), | 94 | the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), |
95 | it's good practice to call pci_set_consistent_dma_mask() to set the | 95 | it's good practice to call pci_set_consistent_dma_mask() to set the |
96 | appropriate mask even if your device only supports 32-bit DMA | 96 | appropriate mask even if your device only supports 32-bit DMA |
97 | (default) and especially if it's a PCI-X device. | 97 | (default) and especially if it's a PCI-X device. |
98 | 98 | ||
99 | For correct operation, you must interrogate the PCI layer in your | 99 | For correct operation, you must interrogate the PCI layer in your |
100 | device probe routine to see if the PCI controller on the machine can | 100 | device probe routine to see if the PCI controller on the machine can |
101 | properly support the DMA addressing limitation your device has. It is | 101 | properly support the DMA addressing limitation your device has. It is |
102 | good style to do this even if your device holds the default setting, | 102 | good style to do this even if your device holds the default setting, |
103 | because this shows that you did think about these issues wrt. your | 103 | because this shows that you did think about these issues wrt. your |
104 | device. | 104 | device. |
105 | 105 | ||
106 | The query is performed via a call to pci_set_dma_mask(): | 106 | The query is performed via a call to pci_set_dma_mask(): |
107 | 107 | ||
108 | int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); | 108 | int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); |
109 | 109 | ||
110 | The query for consistent allocations is performed via a a call to | 110 | The query for consistent allocations is performed via a call to |
111 | pci_set_consistent_dma_mask(): | 111 | pci_set_consistent_dma_mask(): |
112 | 112 | ||
113 | int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); | 113 | int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); |
114 | 114 | ||
115 | Here, pdev is a pointer to the PCI device struct of your device, and | 115 | Here, pdev is a pointer to the PCI device struct of your device, and |
116 | device_mask is a bit mask describing which bits of a PCI address your | 116 | device_mask is a bit mask describing which bits of a PCI address your |
117 | device supports. It returns zero if your card can perform DMA | 117 | device supports. It returns zero if your card can perform DMA |
118 | properly on the machine given the address mask you provided. | 118 | properly on the machine given the address mask you provided. |
119 | 119 | ||
120 | If it returns non-zero, your device cannot perform DMA properly on | 120 | If it returns non-zero, your device cannot perform DMA properly on |
121 | this platform, and attempting to do so will result in undefined | 121 | this platform, and attempting to do so will result in undefined |
122 | behavior. You must either use a different mask, or not use DMA. | 122 | behavior. You must either use a different mask, or not use DMA. |
123 | 123 | ||
124 | This means that in the failure case, you have three options: | 124 | This means that in the failure case, you have three options: |
125 | 125 | ||
126 | 1) Use another DMA mask, if possible (see below). | 126 | 1) Use another DMA mask, if possible (see below). |
127 | 2) Use some non-DMA mode for data transfer, if possible. | 127 | 2) Use some non-DMA mode for data transfer, if possible. |
128 | 3) Ignore this device and do not initialize it. | 128 | 3) Ignore this device and do not initialize it. |
129 | 129 | ||
130 | It is recommended that your driver print a kernel KERN_WARNING message | 130 | It is recommended that your driver print a kernel KERN_WARNING message |
131 | when you end up performing either #2 or #3. In this manner, if a user | 131 | when you end up performing either #2 or #3. In this manner, if a user |
132 | of your driver reports that performance is bad or that the device is not | 132 | of your driver reports that performance is bad or that the device is not |
133 | even detected, you can ask them for the kernel messages to find out | 133 | even detected, you can ask them for the kernel messages to find out |
134 | exactly why. | 134 | exactly why. |
135 | 135 | ||
136 | The standard 32-bit addressing PCI device would do something like | 136 | The standard 32-bit addressing PCI device would do something like |
137 | this: | 137 | this: |
138 | 138 | ||
139 | if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { | 139 | if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { |
140 | printk(KERN_WARNING | 140 | printk(KERN_WARNING |
141 | "mydev: No suitable DMA available.\n"); | 141 | "mydev: No suitable DMA available.\n"); |
142 | goto ignore_this_device; | 142 | goto ignore_this_device; |
143 | } | 143 | } |
144 | 144 | ||
145 | Another common scenario is a 64-bit capable device. The approach | 145 | Another common scenario is a 64-bit capable device. The approach |
146 | here is to try for 64-bit DAC addressing, but back down to a | 146 | here is to try for 64-bit DAC addressing, but back down to a |
147 | 32-bit mask should that fail. The PCI platform code may fail the | 147 | 32-bit mask should that fail. The PCI platform code may fail the |
148 | 64-bit mask not because the platform is not capable of 64-bit | 148 | 64-bit mask not because the platform is not capable of 64-bit |
149 | addressing. Rather, it may fail in this case simply because | 149 | addressing. Rather, it may fail in this case simply because |
150 | 32-bit SAC addressing is done more efficiently than DAC addressing. | 150 | 32-bit SAC addressing is done more efficiently than DAC addressing. |
151 | Sparc64 is one platform which behaves in this way. | 151 | Sparc64 is one platform which behaves in this way. |
152 | 152 | ||
153 | Here is how you would handle a 64-bit capable device which can drive | 153 | Here is how you would handle a 64-bit capable device which can drive |
154 | all 64-bits when accessing streaming DMA: | 154 | all 64-bits when accessing streaming DMA: |
155 | 155 | ||
156 | int using_dac; | 156 | int using_dac; |
157 | 157 | ||
158 | if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { | 158 | if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { |
159 | using_dac = 1; | 159 | using_dac = 1; |
160 | } else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { | 160 | } else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { |
161 | using_dac = 0; | 161 | using_dac = 0; |
162 | } else { | 162 | } else { |
163 | printk(KERN_WARNING | 163 | printk(KERN_WARNING |
164 | "mydev: No suitable DMA available.\n"); | 164 | "mydev: No suitable DMA available.\n"); |
165 | goto ignore_this_device; | 165 | goto ignore_this_device; |
166 | } | 166 | } |
167 | 167 | ||
168 | If a card is capable of using 64-bit consistent allocations as well, | 168 | If a card is capable of using 64-bit consistent allocations as well, |
169 | the case would look like this: | 169 | the case would look like this: |
170 | 170 | ||
171 | int using_dac, consistent_using_dac; | 171 | int using_dac, consistent_using_dac; |
172 | 172 | ||
173 | if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { | 173 | if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { |
174 | using_dac = 1; | 174 | using_dac = 1; |
175 | consistent_using_dac = 1; | 175 | consistent_using_dac = 1; |
176 | pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK); | 176 | pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK); |
177 | } else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { | 177 | } else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { |
178 | using_dac = 0; | 178 | using_dac = 0; |
179 | consistent_using_dac = 0; | 179 | consistent_using_dac = 0; |
180 | pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK); | 180 | pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK); |
181 | } else { | 181 | } else { |
182 | printk(KERN_WARNING | 182 | printk(KERN_WARNING |
183 | "mydev: No suitable DMA available.\n"); | 183 | "mydev: No suitable DMA available.\n"); |
184 | goto ignore_this_device; | 184 | goto ignore_this_device; |
185 | } | 185 | } |
186 | 186 | ||
187 | pci_set_consistent_dma_mask() will always be able to set the same or a | 187 | pci_set_consistent_dma_mask() will always be able to set the same or a |
188 | smaller mask as pci_set_dma_mask(). However for the rare case that a | 188 | smaller mask as pci_set_dma_mask(). However for the rare case that a |
189 | device driver only uses consistent allocations, one would have to | 189 | device driver only uses consistent allocations, one would have to |
190 | check the return value from pci_set_consistent_dma_mask(). | 190 | check the return value from pci_set_consistent_dma_mask(). |
191 | 191 | ||
192 | If your 64-bit device is going to be an enormous consumer of DMA | 192 | If your 64-bit device is going to be an enormous consumer of DMA |
193 | mappings, this can be problematic since the DMA mappings are a | 193 | mappings, this can be problematic since the DMA mappings are a |
194 | finite resource on many platforms. Please see the "DAC Addressing | 194 | finite resource on many platforms. Please see the "DAC Addressing |
195 | for Address Space Hungry Devices" section near the end of this | 195 | for Address Space Hungry Devices" section near the end of this |
196 | document for how to handle this case. | 196 | document for how to handle this case. |
197 | 197 | ||
198 | Finally, if your device can only drive the low 24-bits of | 198 | Finally, if your device can only drive the low 24-bits of |
199 | address during PCI bus mastering you might do something like: | 199 | address during PCI bus mastering you might do something like: |
200 | 200 | ||
201 | if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) { | 201 | if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) { |
202 | printk(KERN_WARNING | 202 | printk(KERN_WARNING |
203 | "mydev: 24-bit DMA addressing not available.\n"); | 203 | "mydev: 24-bit DMA addressing not available.\n"); |
204 | goto ignore_this_device; | 204 | goto ignore_this_device; |
205 | } | 205 | } |
206 | [Better use DMA_24BIT_MASK instead of 0x00ffffff. | 206 | [Better use DMA_24BIT_MASK instead of 0x00ffffff. |
207 | See linux/include/dma-mapping.h for reference.] | 207 | See linux/include/dma-mapping.h for reference.] |
208 | 208 | ||
209 | When pci_set_dma_mask() is successful, and returns zero, the PCI layer | 209 | When pci_set_dma_mask() is successful, and returns zero, the PCI layer |
210 | saves away this mask you have provided. The PCI layer will use this | 210 | saves away this mask you have provided. The PCI layer will use this |
211 | information later when you make DMA mappings. | 211 | information later when you make DMA mappings. |
212 | 212 | ||
213 | There is a case which we are aware of at this time, which is worth | 213 | There is a case which we are aware of at this time, which is worth |
214 | mentioning in this documentation. If your device supports multiple | 214 | mentioning in this documentation. If your device supports multiple |
215 | functions (for example a sound card provides playback and record | 215 | functions (for example a sound card provides playback and record |
216 | functions) and the various different functions have _different_ | 216 | functions) and the various different functions have _different_ |
217 | DMA addressing limitations, you may wish to probe each mask and | 217 | DMA addressing limitations, you may wish to probe each mask and |
218 | only provide the functionality which the machine can handle. It | 218 | only provide the functionality which the machine can handle. It |
219 | is important that the last call to pci_set_dma_mask() be for the | 219 | is important that the last call to pci_set_dma_mask() be for the |
220 | most specific mask. | 220 | most specific mask. |
221 | 221 | ||
222 | Here is pseudo-code showing how this might be done: | 222 | Here is pseudo-code showing how this might be done: |
223 | 223 | ||
224 | #define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK | 224 | #define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK |
225 | #define RECORD_ADDRESS_BITS 0x00ffffff | 225 | #define RECORD_ADDRESS_BITS 0x00ffffff |
226 | 226 | ||
227 | struct my_sound_card *card; | 227 | struct my_sound_card *card; |
228 | struct pci_dev *pdev; | 228 | struct pci_dev *pdev; |
229 | 229 | ||
230 | ... | 230 | ... |
231 | if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { | 231 | if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { |
232 | card->playback_enabled = 1; | 232 | card->playback_enabled = 1; |
233 | } else { | 233 | } else { |
234 | card->playback_enabled = 0; | 234 | card->playback_enabled = 0; |
235 | printk(KERN_WARN "%s: Playback disabled due to DMA limitations.\n", | 235 | printk(KERN_WARN "%s: Playback disabled due to DMA limitations.\n", |
236 | card->name); | 236 | card->name); |
237 | } | 237 | } |
238 | if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { | 238 | if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { |
239 | card->record_enabled = 1; | 239 | card->record_enabled = 1; |
240 | } else { | 240 | } else { |
241 | card->record_enabled = 0; | 241 | card->record_enabled = 0; |
242 | printk(KERN_WARN "%s: Record disabled due to DMA limitations.\n", | 242 | printk(KERN_WARN "%s: Record disabled due to DMA limitations.\n", |
243 | card->name); | 243 | card->name); |
244 | } | 244 | } |
245 | 245 | ||
246 | A sound card was used as an example here because this genre of PCI | 246 | A sound card was used as an example here because this genre of PCI |
247 | devices seems to be littered with ISA chips given a PCI front end, | 247 | devices seems to be littered with ISA chips given a PCI front end, |
248 | and thus retaining the 16MB DMA addressing limitations of ISA. | 248 | and thus retaining the 16MB DMA addressing limitations of ISA. |
249 | 249 | ||
250 | Types of DMA mappings | 250 | Types of DMA mappings |
251 | 251 | ||
252 | There are two types of DMA mappings: | 252 | There are two types of DMA mappings: |
253 | 253 | ||
254 | - Consistent DMA mappings which are usually mapped at driver | 254 | - Consistent DMA mappings which are usually mapped at driver |
255 | initialization, unmapped at the end and for which the hardware should | 255 | initialization, unmapped at the end and for which the hardware should |
256 | guarantee that the device and the CPU can access the data | 256 | guarantee that the device and the CPU can access the data |
257 | in parallel and will see updates made by each other without any | 257 | in parallel and will see updates made by each other without any |
258 | explicit software flushing. | 258 | explicit software flushing. |
259 | 259 | ||
260 | Think of "consistent" as "synchronous" or "coherent". | 260 | Think of "consistent" as "synchronous" or "coherent". |
261 | 261 | ||
262 | The current default is to return consistent memory in the low 32 | 262 | The current default is to return consistent memory in the low 32 |
263 | bits of the PCI bus space. However, for future compatibility you | 263 | bits of the PCI bus space. However, for future compatibility you |
264 | should set the consistent mask even if this default is fine for your | 264 | should set the consistent mask even if this default is fine for your |
265 | driver. | 265 | driver. |
266 | 266 | ||
267 | Good examples of what to use consistent mappings for are: | 267 | Good examples of what to use consistent mappings for are: |
268 | 268 | ||
269 | - Network card DMA ring descriptors. | 269 | - Network card DMA ring descriptors. |
270 | - SCSI adapter mailbox command data structures. | 270 | - SCSI adapter mailbox command data structures. |
271 | - Device firmware microcode executed out of | 271 | - Device firmware microcode executed out of |
272 | main memory. | 272 | main memory. |
273 | 273 | ||
274 | The invariant these examples all require is that any CPU store | 274 | The invariant these examples all require is that any CPU store |
275 | to memory is immediately visible to the device, and vice | 275 | to memory is immediately visible to the device, and vice |
276 | versa. Consistent mappings guarantee this. | 276 | versa. Consistent mappings guarantee this. |
277 | 277 | ||
278 | IMPORTANT: Consistent DMA memory does not preclude the usage of | 278 | IMPORTANT: Consistent DMA memory does not preclude the usage of |
279 | proper memory barriers. The CPU may reorder stores to | 279 | proper memory barriers. The CPU may reorder stores to |
280 | consistent memory just as it may normal memory. Example: | 280 | consistent memory just as it may normal memory. Example: |
281 | if it is important for the device to see the first word | 281 | if it is important for the device to see the first word |
282 | of a descriptor updated before the second, you must do | 282 | of a descriptor updated before the second, you must do |
283 | something like: | 283 | something like: |
284 | 284 | ||
285 | desc->word0 = address; | 285 | desc->word0 = address; |
286 | wmb(); | 286 | wmb(); |
287 | desc->word1 = DESC_VALID; | 287 | desc->word1 = DESC_VALID; |
288 | 288 | ||
289 | in order to get correct behavior on all platforms. | 289 | in order to get correct behavior on all platforms. |
290 | 290 | ||
291 | Also, on some platforms your driver may need to flush CPU write | 291 | Also, on some platforms your driver may need to flush CPU write |
292 | buffers in much the same way as it needs to flush write buffers | 292 | buffers in much the same way as it needs to flush write buffers |
293 | found in PCI bridges (such as by reading a register's value | 293 | found in PCI bridges (such as by reading a register's value |
294 | after writing it). | 294 | after writing it). |
295 | 295 | ||
296 | - Streaming DMA mappings which are usually mapped for one DMA transfer, | 296 | - Streaming DMA mappings which are usually mapped for one DMA transfer, |
297 | unmapped right after it (unless you use pci_dma_sync_* below) and for which | 297 | unmapped right after it (unless you use pci_dma_sync_* below) and for which |
298 | hardware can optimize for sequential accesses. | 298 | hardware can optimize for sequential accesses. |
299 | 299 | ||
300 | This of "streaming" as "asynchronous" or "outside the coherency | 300 | This of "streaming" as "asynchronous" or "outside the coherency |
301 | domain". | 301 | domain". |
302 | 302 | ||
303 | Good examples of what to use streaming mappings for are: | 303 | Good examples of what to use streaming mappings for are: |
304 | 304 | ||
305 | - Networking buffers transmitted/received by a device. | 305 | - Networking buffers transmitted/received by a device. |
306 | - Filesystem buffers written/read by a SCSI device. | 306 | - Filesystem buffers written/read by a SCSI device. |
307 | 307 | ||
308 | The interfaces for using this type of mapping were designed in | 308 | The interfaces for using this type of mapping were designed in |
309 | such a way that an implementation can make whatever performance | 309 | such a way that an implementation can make whatever performance |
310 | optimizations the hardware allows. To this end, when using | 310 | optimizations the hardware allows. To this end, when using |
311 | such mappings you must be explicit about what you want to happen. | 311 | such mappings you must be explicit about what you want to happen. |
312 | 312 | ||
313 | Neither type of DMA mapping has alignment restrictions that come | 313 | Neither type of DMA mapping has alignment restrictions that come |
314 | from PCI, although some devices may have such restrictions. | 314 | from PCI, although some devices may have such restrictions. |
315 | Also, systems with caches that aren't DMA-coherent will work better | 315 | Also, systems with caches that aren't DMA-coherent will work better |
316 | when the underlying buffers don't share cache lines with other data. | 316 | when the underlying buffers don't share cache lines with other data. |
317 | 317 | ||
318 | 318 | ||
319 | Using Consistent DMA mappings. | 319 | Using Consistent DMA mappings. |
320 | 320 | ||
321 | To allocate and map large (PAGE_SIZE or so) consistent DMA regions, | 321 | To allocate and map large (PAGE_SIZE or so) consistent DMA regions, |
322 | you should do: | 322 | you should do: |
323 | 323 | ||
324 | dma_addr_t dma_handle; | 324 | dma_addr_t dma_handle; |
325 | 325 | ||
326 | cpu_addr = pci_alloc_consistent(dev, size, &dma_handle); | 326 | cpu_addr = pci_alloc_consistent(dev, size, &dma_handle); |
327 | 327 | ||
328 | where dev is a struct pci_dev *. You should pass NULL for PCI-like buses | 328 | where dev is a struct pci_dev *. You should pass NULL for PCI-like buses |
329 | where devices don't have struct pci_dev (like ISA, EISA). This may be | 329 | where devices don't have struct pci_dev (like ISA, EISA). This may be |
330 | called in interrupt context. | 330 | called in interrupt context. |
331 | 331 | ||
332 | This argument is needed because the DMA translations may be bus | 332 | This argument is needed because the DMA translations may be bus |
333 | specific (and often are private to the bus which the device is attached | 333 | specific (and often are private to the bus which the device is attached |
334 | to). | 334 | to). |
335 | 335 | ||
336 | Size is the length of the region you want to allocate, in bytes. | 336 | Size is the length of the region you want to allocate, in bytes. |
337 | 337 | ||
338 | This routine will allocate RAM for that region, so it acts similarly to | 338 | This routine will allocate RAM for that region, so it acts similarly to |
339 | __get_free_pages (but takes size instead of a page order). If your | 339 | __get_free_pages (but takes size instead of a page order). If your |
340 | driver needs regions sized smaller than a page, you may prefer using | 340 | driver needs regions sized smaller than a page, you may prefer using |
341 | the pci_pool interface, described below. | 341 | the pci_pool interface, described below. |
342 | 342 | ||
343 | The consistent DMA mapping interfaces, for non-NULL dev, will by | 343 | The consistent DMA mapping interfaces, for non-NULL dev, will by |
344 | default return a DMA address which is SAC (Single Address Cycle) | 344 | default return a DMA address which is SAC (Single Address Cycle) |
345 | addressable. Even if the device indicates (via PCI dma mask) that it | 345 | addressable. Even if the device indicates (via PCI dma mask) that it |
346 | may address the upper 32-bits and thus perform DAC cycles, consistent | 346 | may address the upper 32-bits and thus perform DAC cycles, consistent |
347 | allocation will only return > 32-bit PCI addresses for DMA if the | 347 | allocation will only return > 32-bit PCI addresses for DMA if the |
348 | consistent dma mask has been explicitly changed via | 348 | consistent dma mask has been explicitly changed via |
349 | pci_set_consistent_dma_mask(). This is true of the pci_pool interface | 349 | pci_set_consistent_dma_mask(). This is true of the pci_pool interface |
350 | as well. | 350 | as well. |
351 | 351 | ||
352 | pci_alloc_consistent returns two values: the virtual address which you | 352 | pci_alloc_consistent returns two values: the virtual address which you |
353 | can use to access it from the CPU and dma_handle which you pass to the | 353 | can use to access it from the CPU and dma_handle which you pass to the |
354 | card. | 354 | card. |
355 | 355 | ||
356 | The cpu return address and the DMA bus master address are both | 356 | The cpu return address and the DMA bus master address are both |
357 | guaranteed to be aligned to the smallest PAGE_SIZE order which | 357 | guaranteed to be aligned to the smallest PAGE_SIZE order which |
358 | is greater than or equal to the requested size. This invariant | 358 | is greater than or equal to the requested size. This invariant |
359 | exists (for example) to guarantee that if you allocate a chunk | 359 | exists (for example) to guarantee that if you allocate a chunk |
360 | which is smaller than or equal to 64 kilobytes, the extent of the | 360 | which is smaller than or equal to 64 kilobytes, the extent of the |
361 | buffer you receive will not cross a 64K boundary. | 361 | buffer you receive will not cross a 64K boundary. |
362 | 362 | ||
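The alignment guarantee above can be checked with a little arithmetic. The following is a user-space sketch, not kernel code: it assumes a 4096-byte page size, and the helpers alloc_order(), crosses_boundary() and is_order_aligned() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Non-zero if [addr, addr + size) spans a multiple of 'boundary'. */
int crosses_boundary(uint64_t addr, size_t size, uint64_t boundary)
{
	return (addr / boundary) != ((addr + size - 1) / boundary);
}

/* Non-zero if 'addr' is aligned to the page order needed for 'size'. */
int is_order_aligned(uint64_t addr, size_t size)
{
	uint64_t chunk = (uint64_t)SKETCH_PAGE_SIZE << alloc_order(size);

	return (addr % chunk) == 0;
}
```

With these assumptions a 64 kilobyte request needs order 4, so an order-aligned buffer starts on a 64K multiple and cannot straddle a 64K boundary. An 8 kilobyte buffer placed at 0xF000 is not order-aligned and does cross one.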
363 | To unmap and free such a DMA region, you call: | 363 | To unmap and free such a DMA region, you call: |
364 | 364 | ||
365 | pci_free_consistent(dev, size, cpu_addr, dma_handle); | 365 | pci_free_consistent(dev, size, cpu_addr, dma_handle); |
366 | 366 | ||
367 | where dev, size are the same as in the above call and cpu_addr and | 367 | where dev, size are the same as in the above call and cpu_addr and |
368 | dma_handle are the values pci_alloc_consistent returned to you. | 368 | dma_handle are the values pci_alloc_consistent returned to you. |
369 | This function may not be called in interrupt context. | 369 | This function may not be called in interrupt context. |
370 | 370 | ||
371 | If your driver needs lots of smaller memory regions, you can write | 371 | If your driver needs lots of smaller memory regions, you can write |
372 | custom code to subdivide pages returned by pci_alloc_consistent, | 372 | custom code to subdivide pages returned by pci_alloc_consistent, |
373 | or you can use the pci_pool API to do that. A pci_pool is like | 373 | or you can use the pci_pool API to do that. A pci_pool is like |
374 | a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. | 374 | a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. |
375 | Also, it understands common hardware constraints for alignment, | 375 | Also, it understands common hardware constraints for alignment, |
376 | like queue heads needing to be aligned on N byte boundaries. | 376 | like queue heads needing to be aligned on N byte boundaries. |
377 | 377 | ||
378 | Create a pci_pool like this: | 378 | Create a pci_pool like this: |
379 | 379 | ||
380 | struct pci_pool *pool; | 380 | struct pci_pool *pool; |
381 | 381 | ||
382 | pool = pci_pool_create(name, dev, size, align, alloc); | 382 | pool = pci_pool_create(name, dev, size, align, alloc); |
383 | 383 | ||
384 | The "name" is for diagnostics (like a kmem_cache name); dev and size | 384 | The "name" is for diagnostics (like a kmem_cache name); dev and size |
385 | are as above. The device's hardware alignment requirement for this | 385 | are as above. The device's hardware alignment requirement for this |
386 | type of data is "align" (which is expressed in bytes, and must be a | 386 | type of data is "align" (which is expressed in bytes, and must be a |
387 | power of two). If your device has no boundary crossing restrictions, | 387 | power of two). If your device has no boundary crossing restrictions, |
388 | pass 0 for alloc; passing 4096 says memory allocated from this pool | 388 | pass 0 for alloc; passing 4096 says memory allocated from this pool |
389 | must not cross 4KByte boundaries (but at that point it may be better to | 389 | must not cross 4KByte boundaries (but at that point it may be better to |
390 | go for pci_alloc_consistent directly instead). | 390 | go for pci_alloc_consistent directly instead). |
391 | 391 | ||
392 | Allocate memory from a pci pool like this: | 392 | Allocate memory from a pci pool like this: |
393 | 393 | ||
394 | cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); | 394 | cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); |
395 | 395 | ||
396 | flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor | 396 | flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor |
397 | holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, | 397 | holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, |
398 | this returns two values, cpu_addr and dma_handle. | 398 | this returns two values, cpu_addr and dma_handle. |
399 | 399 | ||
400 | Free memory that was allocated from a pci_pool like this: | 400 | Free memory that was allocated from a pci_pool like this: |
401 | 401 | ||
402 | pci_pool_free(pool, cpu_addr, dma_handle); | 402 | pci_pool_free(pool, cpu_addr, dma_handle); |
403 | 403 | ||
404 | where pool is what you passed to pci_pool_alloc, and cpu_addr and | 404 | where pool is what you passed to pci_pool_alloc, and cpu_addr and |
405 | dma_handle are the values pci_pool_alloc returned. This function | 405 | dma_handle are the values pci_pool_alloc returned. This function |
406 | may be called in interrupt context. | 406 | may be called in interrupt context. |
407 | 407 | ||
408 | Destroy a pci_pool by calling: | 408 | Destroy a pci_pool by calling: |
409 | 409 | ||
410 | pci_pool_destroy(pool); | 410 | pci_pool_destroy(pool); |
411 | 411 | ||
412 | Make sure you've called pci_pool_free for all memory allocated | 412 | Make sure you've called pci_pool_free for all memory allocated |
413 | from a pool before you destroy the pool. This function may not | 413 | from a pool before you destroy the pool. This function may not |
414 | be called in interrupt context. | 414 | be called in interrupt context. |
415 | 415 | ||
416 | DMA Direction | 416 | DMA Direction |
417 | 417 | ||
418 | The interfaces described in subsequent portions of this document | 418 | The interfaces described in subsequent portions of this document |
419 | take a DMA direction argument, which is an integer and takes on | 419 | take a DMA direction argument, which is an integer and takes on |
420 | one of the following values: | 420 | one of the following values: |
421 | 421 | ||
422 | PCI_DMA_BIDIRECTIONAL | 422 | PCI_DMA_BIDIRECTIONAL |
423 | PCI_DMA_TODEVICE | 423 | PCI_DMA_TODEVICE |
424 | PCI_DMA_FROMDEVICE | 424 | PCI_DMA_FROMDEVICE |
425 | PCI_DMA_NONE | 425 | PCI_DMA_NONE |
426 | 426 | ||
427 | You should provide the exact DMA direction if you know it. | 427 | You should provide the exact DMA direction if you know it. |
428 | 428 | ||
429 | PCI_DMA_TODEVICE means "from main memory to the PCI device" | 429 | PCI_DMA_TODEVICE means "from main memory to the PCI device" |
430 | PCI_DMA_FROMDEVICE means "from the PCI device to main memory" | 430 | PCI_DMA_FROMDEVICE means "from the PCI device to main memory" |
431 | It is the direction in which the data moves during the DMA | 431 | It is the direction in which the data moves during the DMA |
432 | transfer. | 432 | transfer. |
433 | 433 | ||
434 | You are _strongly_ encouraged to specify this as precisely | 434 | You are _strongly_ encouraged to specify this as precisely |
435 | as you possibly can. | 435 | as you possibly can. |
436 | 436 | ||
437 | If you absolutely cannot know the direction of the DMA transfer, | 437 | If you absolutely cannot know the direction of the DMA transfer, |
438 | specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in | 438 | specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in |
439 | either direction. The platform guarantees that you may legally | 439 | either direction. The platform guarantees that you may legally |
440 | specify this, and that it will work, but this may be at the | 440 | specify this, and that it will work, but this may be at the |
441 | cost of performance, for example. | 441 | cost of performance, for example. |
442 | 442 | ||
443 | The value PCI_DMA_NONE is to be used for debugging. You can | 443 | The value PCI_DMA_NONE is to be used for debugging. You can |
444 | hold this in a data structure before you come to know the | 444 | hold this in a data structure before you come to know the |
445 | precise direction, and this will help catch cases where your | 445 | precise direction, and this will help catch cases where your |
446 | direction tracking logic has failed to set things up properly. | 446 | direction tracking logic has failed to set things up properly. |
447 | 447 | ||
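The use of PCI_DMA_NONE as a "not decided yet" marker can be sketched as follows. This is a user-space illustration only: struct my_request and its helper functions are hypothetical, and the numeric values given to the PCI_DMA_* constants here are an assumption of the sketch, since real drivers get them from the kernel headers.

```c
#include <assert.h>

/* Stand-ins for the kernel constants; values are assumed for this sketch. */
#define PCI_DMA_BIDIRECTIONAL	0
#define PCI_DMA_TODEVICE	1
#define PCI_DMA_FROMDEVICE	2
#define PCI_DMA_NONE		3

/* Hypothetical per-request bookkeeping kept by a driver. */
struct my_request {
	int is_write;	/* non-zero: data moves from main memory to the device */
	int dma_dir;	/* one of the PCI_DMA_* values */
};

void my_request_init(struct my_request *req)
{
	req->is_write = 0;
	req->dma_dir = PCI_DMA_NONE;	/* direction not known yet */
}

void my_request_set_direction(struct my_request *req, int is_write)
{
	req->is_write = is_write;
	req->dma_dir = is_write ? PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE;
}

/*
 * Returns the direction to pass to the mapping call. The assertion
 * fires if the tracking logic never set a real direction.
 */
int my_request_direction(const struct my_request *req)
{
	assert(req->dma_dir != PCI_DMA_NONE);
	return req->dma_dir;
}
```

A request that reaches the mapping path while still holding PCI_DMA_NONE trips the assertion, which is the kind of bug this value is meant to expose.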
448 | Another advantage of specifying this value precisely (outside of | 448 | Another advantage of specifying this value precisely (outside of |
449 | potential platform-specific optimizations) is for debugging. | 449 | potential platform-specific optimizations) is for debugging. |
450 | Some platforms actually have a write permission boolean which DMA | 450 | Some platforms actually have a write permission boolean which DMA |
451 | mappings can be marked with, much like page protections in the user | 451 | mappings can be marked with, much like page protections in the user |
452 | program address space. Such platforms can and do report errors in the | 452 | program address space. Such platforms can and do report errors in the |
453 | kernel logs when the PCI controller hardware detects violation of the | 453 | kernel logs when the PCI controller hardware detects violation of the |
454 | permission setting. | 454 | permission setting. |
455 | 455 | ||
456 | Only streaming mappings specify a direction; consistent mappings | 456 | Only streaming mappings specify a direction; consistent mappings |
457 | implicitly have a direction attribute setting of | 457 | implicitly have a direction attribute setting of |
458 | PCI_DMA_BIDIRECTIONAL. | 458 | PCI_DMA_BIDIRECTIONAL. |
459 | 459 | ||
460 | The SCSI subsystem tells you the direction to use in the | 460 | The SCSI subsystem tells you the direction to use in the |
461 | 'sc_data_direction' member of the SCSI command your driver is | 461 | 'sc_data_direction' member of the SCSI command your driver is |
462 | working on. | 462 | working on. |
463 | 463 | ||
464 | For Networking drivers, it's a rather simple affair. For transmit | 464 | For Networking drivers, it's a rather simple affair. For transmit |
465 | packets, map/unmap them with the PCI_DMA_TODEVICE direction | 465 | packets, map/unmap them with the PCI_DMA_TODEVICE direction |
466 | specifier. For receive packets, just the opposite, map/unmap them | 466 | specifier. For receive packets, just the opposite, map/unmap them |
467 | with the PCI_DMA_FROMDEVICE direction specifier. | 467 | with the PCI_DMA_FROMDEVICE direction specifier. |
468 | 468 | ||
469 | Using Streaming DMA mappings | 469 | Using Streaming DMA mappings |
470 | 470 | ||
471 | The streaming DMA mapping routines can be called from interrupt | 471 | The streaming DMA mapping routines can be called from interrupt |
472 | context. There are two versions of each map/unmap, one which will | 472 | context. There are two versions of each map/unmap, one which will |
473 | map/unmap a single memory region, and one which will map/unmap a | 473 | map/unmap a single memory region, and one which will map/unmap a |
474 | scatterlist. | 474 | scatterlist. |
475 | 475 | ||
476 | To map a single region, you do: | 476 | To map a single region, you do: |
477 | 477 | ||
478 | struct pci_dev *dev = mydev->pdev; | 478 | struct pci_dev *dev = mydev->pdev; |
479 | dma_addr_t dma_handle; | 479 | dma_addr_t dma_handle; |
480 | void *addr = buffer->ptr; | 480 | void *addr = buffer->ptr; |
481 | size_t size = buffer->len; | 481 | size_t size = buffer->len; |
482 | 482 | ||
483 | dma_handle = pci_map_single(dev, addr, size, direction); | 483 | dma_handle = pci_map_single(dev, addr, size, direction); |
484 | 484 | ||
485 | and to unmap it: | 485 | and to unmap it: |
486 | 486 | ||
487 | pci_unmap_single(dev, dma_handle, size, direction); | 487 | pci_unmap_single(dev, dma_handle, size, direction); |
488 | 488 | ||
489 | You should call pci_unmap_single when the DMA activity is finished, e.g. | 489 | You should call pci_unmap_single when the DMA activity is finished, e.g. |
490 | from the interrupt which told you that the DMA transfer is done. | 490 | from the interrupt which told you that the DMA transfer is done. |
491 | 491 | ||
492 | Using cpu pointers like this for single mappings has a disadvantage: | 492 | Using cpu pointers like this for single mappings has a disadvantage: |
493 | you cannot reference HIGHMEM memory in this way. Thus, there is a | 493 | you cannot reference HIGHMEM memory in this way. Thus, there is a |
494 | map/unmap interface pair akin to pci_{map,unmap}_single. These | 494 | map/unmap interface pair akin to pci_{map,unmap}_single. These |
495 | interfaces deal with page/offset pairs instead of cpu pointers. | 495 | interfaces deal with page/offset pairs instead of cpu pointers. |
496 | Specifically: | 496 | Specifically: |
497 | 497 | ||
498 | struct pci_dev *dev = mydev->pdev; | 498 | struct pci_dev *dev = mydev->pdev; |
499 | dma_addr_t dma_handle; | 499 | dma_addr_t dma_handle; |
500 | struct page *page = buffer->page; | 500 | struct page *page = buffer->page; |
501 | unsigned long offset = buffer->offset; | 501 | unsigned long offset = buffer->offset; |
502 | size_t size = buffer->len; | 502 | size_t size = buffer->len; |
503 | 503 | ||
504 | dma_handle = pci_map_page(dev, page, offset, size, direction); | 504 | dma_handle = pci_map_page(dev, page, offset, size, direction); |
505 | 505 | ||
506 | ... | 506 | ... |
507 | 507 | ||
508 | pci_unmap_page(dev, dma_handle, size, direction); | 508 | pci_unmap_page(dev, dma_handle, size, direction); |
509 | 509 | ||
510 | Here, "offset" means byte offset within the given page. | 510 | Here, "offset" means byte offset within the given page. |
511 | 511 | ||
512 | With scatterlists, you map a region gathered from several regions by: | 512 | With scatterlists, you map a region gathered from several regions by: |
513 | 513 | ||
514 | int i, count = pci_map_sg(dev, sglist, nents, direction); | 514 | int i, count = pci_map_sg(dev, sglist, nents, direction); |
515 | struct scatterlist *sg; | 515 | struct scatterlist *sg; |
516 | 516 | ||
517 | for (i = 0, sg = sglist; i < count; i++, sg++) { | 517 | for (i = 0, sg = sglist; i < count; i++, sg++) { |
518 | hw_address[i] = sg_dma_address(sg); | 518 | hw_address[i] = sg_dma_address(sg); |
519 | hw_len[i] = sg_dma_len(sg); | 519 | hw_len[i] = sg_dma_len(sg); |
520 | } | 520 | } |
521 | 521 | ||
522 | where nents is the number of entries in the sglist. | 522 | where nents is the number of entries in the sglist. |
523 | 523 | ||
524 | The implementation is free to merge several consecutive sglist entries | 524 | The implementation is free to merge several consecutive sglist entries |
525 | into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any | 525 | into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any |
526 | consecutive sglist entries can be merged into one provided the first one | 526 | consecutive sglist entries can be merged into one provided the first one |
527 | ends and the second one starts on a page boundary - in fact this is a huge | 527 | ends and the second one starts on a page boundary - in fact this is a huge |
528 | advantage for cards which either cannot do scatter-gather or have very | 528 | advantage for cards which either cannot do scatter-gather or have very |
529 | limited number of scatter-gather entries) and returns the actual number | 529 | limited number of scatter-gather entries) and returns the actual number |
530 | of sg entries it mapped them to. On failure 0 is returned. | 530 | of sg entries it mapped them to. On failure 0 is returned. |
531 | 531 | ||
532 | Then you should loop count times (note: this can be less than nents times) | 532 | Then you should loop count times (note: this can be less than nents times) |
533 | and use sg_dma_address() and sg_dma_len() macros where you previously | 533 | and use sg_dma_address() and sg_dma_len() macros where you previously |
534 | accessed sg->address and sg->length as shown above. | 534 | accessed sg->address and sg->length as shown above. |
535 | 535 | ||
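Why count can be smaller than nents is easiest to see with a toy model of the merging step. The sketch below is user-space code and is not how any platform implements pci_map_sg: struct seg and merge_segments() are hypothetical, and the model simply merges neighbouring entries whose addresses happen to be contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mapped scatterlist entry. */
struct seg {
	unsigned long addr;	/* start address of the segment */
	unsigned int len;	/* length in bytes */
};

/*
 * Merge neighbouring entries whose addresses are contiguous, in place.
 * Returns the number of entries left, which plays the role of 'count'.
 */
int merge_segments(struct seg *sg, int nents)
{
	int i, count = 0;

	for (i = 0; i < nents; i++) {
		if (count > 0 &&
		    sg[count - 1].addr + sg[count - 1].len == sg[i].addr)
			sg[count - 1].len += sg[i].len;	/* extend previous entry */
		else
			sg[count++] = sg[i];		/* start a new entry */
	}
	return count;
}
```

Three entries of which the first two are contiguous come back as two. The driver would program two address/length pairs into the hardware, yet it still hands the original nents (three) to the unmap call.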
536 | To unmap a scatterlist, just call: | 536 | To unmap a scatterlist, just call: |
537 | 537 | ||
538 | pci_unmap_sg(dev, sglist, nents, direction); | 538 | pci_unmap_sg(dev, sglist, nents, direction); |
539 | 539 | ||
540 | Again, make sure DMA activity has already finished. | 540 | Again, make sure DMA activity has already finished. |
541 | 541 | ||
542 | PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be | 542 | PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be |
543 | the _same_ one you passed into the pci_map_sg call, | 543 | the _same_ one you passed into the pci_map_sg call, |
544 | it should _NOT_ be the 'count' value _returned_ from the | 544 | it should _NOT_ be the 'count' value _returned_ from the |
545 | pci_map_sg call. | 545 | pci_map_sg call. |
546 | 546 | ||
547 | Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} | 547 | Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} |
548 | counterpart, because the bus address space is a shared resource (although | 548 | counterpart, because the bus address space is a shared resource (although |
549 | in some ports the mapping is per bus, so fewer devices contend for the | 549 | in some ports the mapping is per bus, so fewer devices contend for the |
550 | same bus address space) and you could render the machine unusable by eating | 550 | same bus address space) and you could render the machine unusable by eating |
551 | all bus addresses. | 551 | all bus addresses. |
552 | 552 | ||
553 | If you need to use the same streaming DMA region multiple times and touch | 553 | If you need to use the same streaming DMA region multiple times and touch |
554 | the data in between the DMA transfers, the buffer needs to be synced | 554 | the data in between the DMA transfers, the buffer needs to be synced |
555 | properly in order for the cpu and device to see the most up-to-date and | 555 | properly in order for the cpu and device to see the most up-to-date and |
556 | correct copy of the DMA buffer. | 556 | correct copy of the DMA buffer. |
557 | 557 | ||
558 | So, firstly, just map it with pci_map_{single,sg}, and after each DMA | 558 | So, firstly, just map it with pci_map_{single,sg}, and after each DMA |
559 | transfer call either: | 559 | transfer call either: |
560 | 560 | ||
561 | pci_dma_sync_single_for_cpu(dev, dma_handle, size, direction); | 561 | pci_dma_sync_single_for_cpu(dev, dma_handle, size, direction); |
562 | 562 | ||
563 | or: | 563 | or: |
564 | 564 | ||
565 | pci_dma_sync_sg_for_cpu(dev, sglist, nents, direction); | 565 | pci_dma_sync_sg_for_cpu(dev, sglist, nents, direction); |
566 | 566 | ||
567 | as appropriate. | 567 | as appropriate. |
568 | 568 | ||
569 | Then, if you wish to let the device get at the DMA area again, | 569 | Then, if you wish to let the device get at the DMA area again, |
570 | finish accessing the data with the cpu, and then before actually | 570 | finish accessing the data with the cpu, and then before actually |
571 | giving the buffer to the hardware call either: | 571 | giving the buffer to the hardware call either: |
572 | 572 | ||
573 | pci_dma_sync_single_for_device(dev, dma_handle, size, direction); | 573 | pci_dma_sync_single_for_device(dev, dma_handle, size, direction); |
574 | 574 | ||
575 | or: | 575 | or: |
576 | 576 | ||
577 | pci_dma_sync_sg_for_device(dev, sglist, nents, direction); | 577 | pci_dma_sync_sg_for_device(dev, sglist, nents, direction); |
578 | 578 | ||
579 | as appropriate. | 579 | as appropriate. |
580 | 580 | ||
581 | After the last DMA transfer call one of the DMA unmap routines | 581 | After the last DMA transfer call one of the DMA unmap routines |
582 | pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* | 582 | pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* |
583 | call till pci_unmap_*, then you don't have to call the pci_dma_sync_* | 583 | call till pci_unmap_*, then you don't have to call the pci_dma_sync_* |
584 | routines at all. | 584 | routines at all. |
585 | 585 | ||
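One way to keep the sync rules straight is to think of the buffer as being handed back and forth between the device and the cpu. The following user-space sketch models that idea only; it performs no DMA, and enum buf_owner, struct buf_state and the model_*() helpers are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/* Who may touch the buffer right now (hypothetical bookkeeping). */
enum buf_owner { OWNER_UNMAPPED, OWNER_DEVICE, OWNER_CPU };

struct buf_state {
	enum buf_owner owner;
};

/* Models pci_map_{single,sg}: the device may now DMA to/from the buffer. */
void model_map(struct buf_state *b)
{
	assert(b->owner == OWNER_UNMAPPED);
	b->owner = OWNER_DEVICE;
}

/* Models pci_dma_sync_*_for_cpu: the cpu may now examine the data. */
void model_sync_for_cpu(struct buf_state *b)
{
	assert(b->owner == OWNER_DEVICE);
	b->owner = OWNER_CPU;
}

/* Models pci_dma_sync_*_for_device: the device may DMA again. */
void model_sync_for_device(struct buf_state *b)
{
	assert(b->owner == OWNER_CPU);
	b->owner = OWNER_DEVICE;
}

/* Models pci_unmap_{single,sg}: the mapping is gone. */
void model_unmap(struct buf_state *b)
{
	assert(b->owner != OWNER_UNMAPPED);
	b->owner = OWNER_UNMAPPED;
}

/* Non-zero if the cpu may safely read or write the buffer contents. */
int cpu_may_touch(const struct buf_state *b)
{
	return b->owner != OWNER_DEVICE;
}
```

In this model the cpu may look at the data only between a sync-for-cpu and the following sync-for-device, or after the final unmap, which matches the order of calls described above.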
586 | Here is pseudo code which shows a situation in which you would need | 586 | Here is pseudo code which shows a situation in which you would need |
587 | to use the pci_dma_sync_*() interfaces. | 587 | to use the pci_dma_sync_*() interfaces. |
588 | 588 | ||
589 | my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) | 589 | my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) |
590 | { | 590 | { |
591 | dma_addr_t mapping; | 591 | dma_addr_t mapping; |
592 | 592 | ||
593 | mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); | 593 | mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); |
594 | 594 | ||
595 | cp->rx_buf = buffer; | 595 | cp->rx_buf = buffer; |
596 | cp->rx_len = len; | 596 | cp->rx_len = len; |
597 | cp->rx_dma = mapping; | 597 | cp->rx_dma = mapping; |
598 | 598 | ||
599 | give_rx_buf_to_card(cp); | 599 | give_rx_buf_to_card(cp); |
600 | } | 600 | } |
601 | 601 | ||
602 | ... | 602 | ... |
603 | 603 | ||
604 | my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs) | 604 | my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs) |
605 | { | 605 | { |
606 | struct my_card *cp = devid; | 606 | struct my_card *cp = devid; |
607 | 607 | ||
608 | ... | 608 | ... |
609 | if (read_card_status(cp) == RX_BUF_TRANSFERRED) { | 609 | if (read_card_status(cp) == RX_BUF_TRANSFERRED) { |
610 | struct my_card_header *hp; | 610 | struct my_card_header *hp; |
611 | 611 | ||
612 | /* Examine the header to see if we wish | 612 | /* Examine the header to see if we wish |
613 | * to accept the data. But synchronize | 613 | * to accept the data. But synchronize |
614 | * the DMA transfer with the CPU first | 614 | * the DMA transfer with the CPU first |
615 | * so that we see updated contents. | 615 | * so that we see updated contents. |
616 | */ | 616 | */ |
617 | pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, | 617 | pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, |
618 | cp->rx_len, | 618 | cp->rx_len, |
619 | PCI_DMA_FROMDEVICE); | 619 | PCI_DMA_FROMDEVICE); |
620 | 620 | ||
621 | /* Now it is safe to examine the buffer. */ | 621 | /* Now it is safe to examine the buffer. */ |
622 | hp = (struct my_card_header *) cp->rx_buf; | 622 | hp = (struct my_card_header *) cp->rx_buf; |
623 | if (header_is_ok(hp)) { | 623 | if (header_is_ok(hp)) { |
624 | pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, | 624 | pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, |
625 | PCI_DMA_FROMDEVICE); | 625 | PCI_DMA_FROMDEVICE); |
626 | pass_to_upper_layers(cp->rx_buf); | 626 | pass_to_upper_layers(cp->rx_buf); |
627 | make_and_setup_new_rx_buf(cp); | 627 | make_and_setup_new_rx_buf(cp); |
628 | } else { | 628 | } else { |
629 | /* Just sync the buffer and give it back | 629 | /* Just sync the buffer and give it back |
630 | * to the card. | 630 | * to the card. |
631 | */ | 631 | */ |
632 | pci_dma_sync_single_for_device(cp->pdev, | 632 | pci_dma_sync_single_for_device(cp->pdev, |
633 | cp->rx_dma, | 633 | cp->rx_dma, |
634 | cp->rx_len, | 634 | cp->rx_len, |
635 | PCI_DMA_FROMDEVICE); | 635 | PCI_DMA_FROMDEVICE); |
636 | give_rx_buf_to_card(cp); | 636 | give_rx_buf_to_card(cp); |
637 | } | 637 | } |
638 | } | 638 | } |
639 | } | 639 | } |
640 | 640 | ||
641 | Drivers converted fully to this interface should not use virt_to_bus any | 641 | Drivers converted fully to this interface should not use virt_to_bus any |
642 | longer, nor should they use bus_to_virt. Some drivers have to be changed a | 642 | longer, nor should they use bus_to_virt. Some drivers have to be changed a |
643 | little bit, because there is no longer an equivalent to bus_to_virt in the | 643 | little bit, because there is no longer an equivalent to bus_to_virt in the |
644 | dynamic DMA mapping scheme - you have to always store the DMA addresses | 644 | dynamic DMA mapping scheme - you have to always store the DMA addresses |
645 | returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single | 645 | returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single |
646 | calls (pci_map_sg stores them in the scatterlist itself if the platform | 646 | calls (pci_map_sg stores them in the scatterlist itself if the platform |
647 | supports dynamic DMA mapping in hardware) in your driver structures and/or | 647 | supports dynamic DMA mapping in hardware) in your driver structures and/or |
648 | in the card registers. | 648 | in the card registers. |
649 | 649 | ||
650 | All PCI drivers should be using these interfaces with no exceptions. | 650 | All PCI drivers should be using these interfaces with no exceptions. |
651 | It is planned to completely remove virt_to_bus() and bus_to_virt() as | 651 | It is planned to completely remove virt_to_bus() and bus_to_virt() as |
652 | they are entirely deprecated. Some ports already do not provide these | 652 | they are entirely deprecated. Some ports already do not provide these |
653 | as it is impossible to correctly support them. | 653 | as it is impossible to correctly support them. |
654 | 654 | ||
655 | 64-bit DMA and DAC cycle support | 655 | 64-bit DMA and DAC cycle support |
656 | 656 | ||
657 | Do you understand all of the text above? Great, then you already | 657 | Do you understand all of the text above? Great, then you already |
658 | know how to use 64-bit DMA addressing under Linux. Simply make | 658 | know how to use 64-bit DMA addressing under Linux. Simply make |
659 | the appropriate pci_set_dma_mask() calls based upon your card's | 659 | the appropriate pci_set_dma_mask() calls based upon your card's |
660 | capabilities, then use the mapping APIs above. | 660 | capabilities, then use the mapping APIs above. |
661 | 661 | ||
662 | It is that simple. | 662 | It is that simple. |
663 | 663 | ||
664 | Well, not for some odd devices. See the next section for information | 664 | Well, not for some odd devices. See the next section for information |
665 | about that. | 665 | about that. |
666 | 666 | ||
667 | DAC Addressing for Address Space Hungry Devices | 667 | DAC Addressing for Address Space Hungry Devices |
668 | 668 | ||
669 | There exists a class of devices which do not mesh well with the PCI | 669 | There exists a class of devices which do not mesh well with the PCI |
670 | DMA mapping API. By definition these "mappings" are a finite | 670 | DMA mapping API. By definition these "mappings" are a finite |
671 | resource. The number of total available mappings per bus is platform | 671 | resource. The number of total available mappings per bus is platform |
672 | specific, but there will always be a reasonable amount. | 672 | specific, but there will always be a reasonable amount. |
673 | 673 | ||
674 | What is "reasonable"? Reasonable means that networking and block I/O | 674 | What is "reasonable"? Reasonable means that networking and block I/O |
675 | devices need not worry about using too many mappings. | 675 | devices need not worry about using too many mappings. |
676 | 676 | ||
677 | As an example of a problematic device, consider compute cluster cards. | 677 | As an example of a problematic device, consider compute cluster cards. |
678 | They can potentially need to access gigabytes of memory at once via | 678 | They can potentially need to access gigabytes of memory at once via |
679 | DMA. Dynamic mappings are unsuitable for this kind of access pattern. | 679 | DMA. Dynamic mappings are unsuitable for this kind of access pattern. |
680 | 680 | ||
681 | To this end we've provided a small API by which a device driver | 681 | To this end we've provided a small API by which a device driver |
682 | may use DAC cycles to directly address all of physical memory. | 682 | may use DAC cycles to directly address all of physical memory. |
683 | Not all platforms support this, but most do. It is easy to determine | 683 | Not all platforms support this, but most do. It is easy to determine |
684 | whether the platform will work properly at probe time. | 684 | whether the platform will work properly at probe time. |
685 | 685 | ||
686 | First, understand that there may be a SEVERE performance penalty for | 686 | First, understand that there may be a SEVERE performance penalty for |
687 | using these interfaces on some platforms. Therefore, you MUST only | 687 | using these interfaces on some platforms. Therefore, you MUST only |
688 | use these interfaces if it is absolutely required. 99% of devices can | 688 | use these interfaces if it is absolutely required. 99% of devices can |
689 | use the normal APIs without any problems. | 689 | use the normal APIs without any problems. |
690 | 690 | ||
691 | Note that for streaming type mappings you must either use these | 691 | Note that for streaming type mappings you must either use these |
692 | interfaces, or the dynamic mapping interfaces above. You may not mix | 692 | interfaces, or the dynamic mapping interfaces above. You may not mix |
693 | usage of both for the same device. Such an act is illegal and is | 693 | usage of both for the same device. Such an act is illegal and is |
694 | guaranteed to put a banana in your tailpipe. | 694 | guaranteed to put a banana in your tailpipe. |
695 | 695 | ||
696 | However, consistent mappings may in fact be used in conjunction with | 696 | However, consistent mappings may in fact be used in conjunction with |
697 | these interfaces. Remember that, as defined, consistent mappings are | 697 | these interfaces. Remember that, as defined, consistent mappings are |
698 | always going to be SAC addressable. | 698 | always going to be SAC addressable. |
699 | 699 | ||
700 | The first thing your driver needs to do is query the PCI platform | 700 | The first thing your driver needs to do is query the PCI platform |
701 | layer if it is capable of handling your device's DAC addressing | 701 | layer if it is capable of handling your device's DAC addressing |
702 | capabilities: | 702 | capabilities: |
703 | 703 | ||
704 | int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); | 704 | int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); |
705 | 705 | ||
706 | You may not use the following interfaces if this routine fails. | 706 | You may not use the following interfaces if this routine fails. |
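
The probe-time decision described above can be modelled in plain C. This is
only a sketch under stated assumptions: sketch_dac_supported() is a
hypothetical stand-in for pci_dac_dma_supported(), and the rule it applies
(every address bit the platform may set must be drivable by the device) is
one plausible platform policy, not the kernel's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which family of streaming interfaces the driver will use for this device. */
enum sketch_dma_mode { SKETCH_DMA_DYNAMIC, SKETCH_DMA_DAC };

/*
 * Hypothetical platform check (assumption, not kernel code):
 *   required_bits - address bits the platform may set in a DAC address
 *   device_mask   - address bits the device is able to drive
 * DAC is usable only if no required bit falls outside the device mask.
 */
bool sketch_dac_supported(uint64_t required_bits, uint64_t device_mask)
{
    return (required_bits & ~device_mask) == 0;
}

/* Decide once at probe time; never mix the two families afterwards. */
enum sketch_dma_mode sketch_pick_mode(uint64_t required_bits,
                                      uint64_t device_mask)
{
    return sketch_dac_supported(required_bits, device_mask)
        ? SKETCH_DMA_DAC
        : SKETCH_DMA_DYNAMIC;
}
```

A driver following this pattern would record the chosen mode in its private
state and consult it wherever a streaming mapping is created.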
707 | 707 | ||
708 | Next, DMA addresses using this API are kept track of using the | 708 | Next, DMA addresses using this API are kept track of using the |
709 | dma64_addr_t type. It is guaranteed to be big enough to hold any | 709 | dma64_addr_t type. It is guaranteed to be big enough to hold any |
710 | DAC address the platform layer will give to you from the following | 710 | DAC address the platform layer will give to you from the following |
711 | routines. If you have consistent mappings as well, you still | 711 | routines. If you have consistent mappings as well, you still |
712 | use plain dma_addr_t to keep track of those. | 712 | use plain dma_addr_t to keep track of those. |
713 | 713 | ||
714 | All mappings obtained here will be direct. The mappings are not | 714 | All mappings obtained here will be direct. The mappings are not |
715 | translated, and this is the purpose of this dialect of the DMA API. | 715 | translated, and this is the purpose of this dialect of the DMA API. |
716 | 716 | ||
717 | All routines work with page/offset pairs. This is the _ONLY_ way to | 717 | All routines work with page/offset pairs. This is the _ONLY_ way to |
718 | portably refer to any piece of memory. If you have a cpu pointer | 718 | portably refer to any piece of memory. If you have a cpu pointer |
719 | (which may be validly DMA'd too) you may easily obtain the page | 719 | (which may be validly DMA'd too) you may easily obtain the page |
720 | and offset using something like this: | 720 | and offset using something like this: |
721 | 721 | ||
722 | struct page *page = virt_to_page(ptr); | 722 | struct page *page = virt_to_page(ptr); |
723 | unsigned long offset = offset_in_page(ptr); | 723 | unsigned long offset = offset_in_page(ptr); |
724 | 724 | ||
725 | Here are the interfaces: | 725 | Here are the interfaces: |
726 | 726 | ||
727 | dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, | 727 | dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, |
728 | struct page *page, | 728 | struct page *page, |
729 | unsigned long offset, | 729 | unsigned long offset, |
730 | int direction); | 730 | int direction); |
731 | 731 | ||
732 | The DAC address for the tuple PAGE/OFFSET is returned. The direction | 732 | The DAC address for the tuple PAGE/OFFSET is returned. The direction |
733 | argument is the same as for pci_{map,unmap}_single(). The same rules | 733 | argument is the same as for pci_{map,unmap}_single(). The same rules |
734 | for cpu/device access apply here as for the streaming mapping | 734 | for cpu/device access apply here as for the streaming mapping |
735 | interfaces. To reiterate: | 735 | interfaces. To reiterate: |
736 | 736 | ||
737 | The cpu may touch the buffer before pci_dac_page_to_dma. | 737 | The cpu may touch the buffer before pci_dac_page_to_dma. |
738 | The device may touch the buffer after pci_dac_page_to_dma | 738 | The device may touch the buffer after pci_dac_page_to_dma |
739 | is made, but the cpu may NOT. | 739 | is made, but the cpu may NOT. |
740 | 740 | ||
741 | When the DMA transfer is complete, invoke: | 741 | When the DMA transfer is complete, invoke: |
742 | 742 | ||
743 | void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, | 743 | void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, |
744 | dma64_addr_t dma_addr, | 744 | dma64_addr_t dma_addr, |
745 | size_t len, int direction); | 745 | size_t len, int direction); |
746 | 746 | ||
747 | This must be done before the CPU looks at the buffer again. | 747 | This must be done before the CPU looks at the buffer again. |
748 | This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). | 748 | This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). |
749 | 749 | ||
750 | And likewise, if you wish to let the device get back at the buffer after | 750 | And likewise, if you wish to let the device get back at the buffer after |
751 | the cpu has read/written it, invoke: | 751 | the cpu has read/written it, invoke: |
752 | 752 | ||
753 | void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, | 753 | void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, |
754 | dma64_addr_t dma_addr, | 754 | dma64_addr_t dma_addr, |
755 | size_t len, int direction); | 755 | size_t len, int direction); |
756 | 756 | ||
757 | before letting the device access the DMA area again. | 757 | before letting the device access the DMA area again. |
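
The ownership rules above amount to a two-state handoff between the cpu and
the device. The following is a minimal userspace model of that handoff, not
driver code: every sketch_* name is hypothetical, and the functions only
track who may touch the buffer instead of performing any real DMA.

```c
#include <stdbool.h>

/* Who is currently allowed to touch the buffer. */
enum sketch_owner { SKETCH_OWNER_CPU, SKETCH_OWNER_DEVICE };

struct sketch_buf {
    enum sketch_owner owner;
};

/* Models pci_dac_page_to_dma(): the buffer is handed to the device. */
void sketch_map(struct sketch_buf *b)
{
    b->owner = SKETCH_OWNER_DEVICE;
}

/* Models pci_dac_dma_sync_single_for_cpu(): the cpu may look again. */
void sketch_sync_for_cpu(struct sketch_buf *b)
{
    b->owner = SKETCH_OWNER_CPU;
}

/* Models pci_dac_dma_sync_single_for_device(): back to the device. */
void sketch_sync_for_device(struct sketch_buf *b)
{
    b->owner = SKETCH_OWNER_DEVICE;
}

bool sketch_cpu_may_touch(const struct sketch_buf *b)
{
    return b->owner == SKETCH_OWNER_CPU;
}
```

A debug build of a driver could keep such a flag next to each buffer and
check it before every cpu access.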
758 | 758 | ||
759 | If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t | 759 | If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t |
760 | the following interfaces are provided: | 760 | the following interfaces are provided: |
761 | 761 | ||
762 | struct page *pci_dac_dma_to_page(struct pci_dev *pdev, | 762 | struct page *pci_dac_dma_to_page(struct pci_dev *pdev, |
763 | dma64_addr_t dma_addr); | 763 | dma64_addr_t dma_addr); |
764 | unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, | 764 | unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, |
765 | dma64_addr_t dma_addr); | 765 | dma64_addr_t dma_addr); |
766 | 766 | ||
767 | This is possible with the DAC interfaces purely because they are | 767 | This is possible with the DAC interfaces purely because they are |
768 | not translated in any way. | 768 | not translated in any way. |
769 | 769 | ||
770 | Optimizing Unmap State Space Consumption | 770 | Optimizing Unmap State Space Consumption |
771 | 771 | ||
772 | On many platforms, pci_unmap_{single,page}() is simply a nop. | 772 | On many platforms, pci_unmap_{single,page}() is simply a nop. |
773 | Therefore, keeping track of the mapping address and length is a waste | 773 | Therefore, keeping track of the mapping address and length is a waste |
774 | of space. Instead of filling your drivers up with ifdefs and the like | 774 | of space. Instead of filling your drivers up with ifdefs and the like |
775 | to "work around" this (which would defeat the whole purpose of a | 775 | to "work around" this (which would defeat the whole purpose of a |
776 | portable API) the following facilities are provided. | 776 | portable API) the following facilities are provided. |
777 | 777 | ||
778 | Actually, instead of describing the macros one by one, we'll | 778 | Actually, instead of describing the macros one by one, we'll |
779 | transform some example code. | 779 | transform some example code. |
780 | 780 | ||
781 | 1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. | 781 | 1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. |
782 | Example, before: | 782 | Example, before: |
783 | 783 | ||
784 | struct ring_state { | 784 | struct ring_state { |
785 | struct sk_buff *skb; | 785 | struct sk_buff *skb; |
786 | dma_addr_t mapping; | 786 | dma_addr_t mapping; |
787 | __u32 len; | 787 | __u32 len; |
788 | }; | 788 | }; |
789 | 789 | ||
790 | after: | 790 | after: |
791 | 791 | ||
792 | struct ring_state { | 792 | struct ring_state { |
793 | struct sk_buff *skb; | 793 | struct sk_buff *skb; |
794 | DECLARE_PCI_UNMAP_ADDR(mapping) | 794 | DECLARE_PCI_UNMAP_ADDR(mapping) |
795 | DECLARE_PCI_UNMAP_LEN(len) | 795 | DECLARE_PCI_UNMAP_LEN(len) |
796 | }; | 796 | }; |
797 | 797 | ||
798 | NOTE: DO NOT put a semicolon at the end of the DECLARE_*() | 798 | NOTE: DO NOT put a semicolon at the end of the DECLARE_*() |
799 | macro. | 799 | macro. |
800 | 800 | ||
801 | 2) Use pci_unmap_{addr,len}_set to set these values. | 801 | 2) Use pci_unmap_{addr,len}_set to set these values. |
802 | Example, before: | 802 | Example, before: |
803 | 803 | ||
804 | ringp->mapping = FOO; | 804 | ringp->mapping = FOO; |
805 | ringp->len = BAR; | 805 | ringp->len = BAR; |
806 | 806 | ||
807 | after: | 807 | after: |
808 | 808 | ||
809 | pci_unmap_addr_set(ringp, mapping, FOO); | 809 | pci_unmap_addr_set(ringp, mapping, FOO); |
810 | pci_unmap_len_set(ringp, len, BAR); | 810 | pci_unmap_len_set(ringp, len, BAR); |
811 | 811 | ||
812 | 3) Use pci_unmap_{addr,len} to access these values. | 812 | 3) Use pci_unmap_{addr,len} to access these values. |
813 | Example, before: | 813 | Example, before: |
814 | 814 | ||
815 | pci_unmap_single(pdev, ringp->mapping, ringp->len, | 815 | pci_unmap_single(pdev, ringp->mapping, ringp->len, |
816 | PCI_DMA_FROMDEVICE); | 816 | PCI_DMA_FROMDEVICE); |
817 | 817 | ||
818 | after: | 818 | after: |
819 | 819 | ||
820 | pci_unmap_single(pdev, | 820 | pci_unmap_single(pdev, |
821 | pci_unmap_addr(ringp, mapping), | 821 | pci_unmap_addr(ringp, mapping), |
822 | pci_unmap_len(ringp, len), | 822 | pci_unmap_len(ringp, len), |
823 | PCI_DMA_FROMDEVICE); | 823 | PCI_DMA_FROMDEVICE); |
824 | 824 | ||
825 | It really should be self-explanatory. We treat the ADDR and LEN | 825 | It really should be self-explanatory. We treat the ADDR and LEN |
826 | separately, because it is possible for an implementation to only | 826 | separately, because it is possible for an implementation to only |
827 | need the address in order to perform the unmap operation. | 827 | need the address in order to perform the unmap operation. |
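
To see how these macros can save space, here is a userspace sketch of the
idea. The SKETCH_* and sketch_* names are hypothetical stand-ins for the
DECLARE_PCI_UNMAP_* and pci_unmap_* macros, and the definitions are an
assumption about how a platform might implement them. It also shows why the
DECLARE macros take no semicolon: when they expand to nothing, a trailing
semicolon would leave a stray empty declaration in the structure.

```c
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;     /* stand-in for dma_addr_t */

/* Set to 0 to model a platform where unmap is a nop. */
#define SKETCH_NEED_UNMAP_STATE 1

#if SKETCH_NEED_UNMAP_STATE
/* The platform needs the state: the macros expand to real members. */
#define SKETCH_DECLARE_UNMAP_ADDR(name)        sketch_dma_addr_t name;
#define SKETCH_DECLARE_UNMAP_LEN(name)         uint32_t name;
#define sketch_unmap_addr(ptr, name)           ((ptr)->name)
#define sketch_unmap_addr_set(ptr, name, val)  (((ptr)->name) = (val))
#define sketch_unmap_len(ptr, name)            ((ptr)->name)
#define sketch_unmap_len_set(ptr, name, val)   (((ptr)->name) = (val))
#else
/* Unmap is a nop: the members vanish and the accessors do nothing. */
#define SKETCH_DECLARE_UNMAP_ADDR(name)
#define SKETCH_DECLARE_UNMAP_LEN(name)
#define sketch_unmap_addr(ptr, name)           (0)
#define sketch_unmap_addr_set(ptr, name, val)  do { } while (0)
#define sketch_unmap_len(ptr, name)            (0)
#define sketch_unmap_len_set(ptr, name, val)   do { } while (0)
#endif

struct sketch_ring_state {
    void *skb;                          /* placeholder for struct sk_buff * */
    SKETCH_DECLARE_UNMAP_ADDR(mapping)
    SKETCH_DECLARE_UNMAP_LEN(len)
};
```

Driver code written against the accessors compiles unchanged in both
configurations; only the size of the structure differs.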
828 | 828 | ||
829 | Platform Issues | 829 | Platform Issues |
830 | 830 | ||
831 | If you are just writing drivers for Linux and do not maintain | 831 | If you are just writing drivers for Linux and do not maintain |
832 | an architecture port for the kernel, you can safely skip down | 832 | an architecture port for the kernel, you can safely skip down |
833 | to "Closing". | 833 | to "Closing". |
834 | 834 | ||
835 | 1) Struct scatterlist requirements. | 835 | 1) Struct scatterlist requirements. |
836 | 836 | ||
837 | Struct scatterlist must contain, at a minimum, the following | 837 | Struct scatterlist must contain, at a minimum, the following |
838 | members: | 838 | members: |
839 | 839 | ||
840 | struct page *page; | 840 | struct page *page; |
841 | unsigned int offset; | 841 | unsigned int offset; |
842 | unsigned int length; | 842 | unsigned int length; |
843 | 843 | ||
844 | The base address is specified by a "page+offset" pair. | 844 | The base address is specified by a "page+offset" pair. |
845 | 845 | ||
846 | Previous versions of struct scatterlist contained a "void *address" | 846 | Previous versions of struct scatterlist contained a "void *address" |
847 | field that was sometimes used instead of page+offset. As of Linux | 847 | field that was sometimes used instead of page+offset. As of Linux |
848 | 2.5, page+offset is always used, and the "address" field has been | 848 | 2.5, page+offset is always used, and the "address" field has been |
849 | deleted. | 849 | deleted. |
850 | 850 | ||
851 | 2) More to come... | 851 | 2) More to come... |
852 | 852 | ||
853 | Handling Errors | 853 | Handling Errors |
854 | 854 | ||
855 | DMA address space is limited on some architectures and an allocation | 855 | DMA address space is limited on some architectures and an allocation |
856 | failure can be determined by: | 856 | failure can be determined by: |
857 | 857 | ||
858 | - checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 | 858 | - checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 |
859 | 859 | ||
860 | - checking the returned dma_addr_t of pci_map_single and pci_map_page | 860 | - checking the returned dma_addr_t of pci_map_single and pci_map_page |
861 | by using pci_dma_mapping_error(): | 861 | by using pci_dma_mapping_error(): |
862 | 862 | ||
863 | dma_addr_t dma_handle; | 863 | dma_addr_t dma_handle; |
864 | 864 | ||
865 | dma_handle = pci_map_single(dev, addr, size, direction); | 865 | dma_handle = pci_map_single(dev, addr, size, direction); |
866 | if (pci_dma_mapping_error(dma_handle)) { | 866 | if (pci_dma_mapping_error(dma_handle)) { |
867 | /* | 867 | /* |
868 | * reduce current DMA mapping usage, | 868 | * reduce current DMA mapping usage, |
869 | * delay and try again later or | 869 | * delay and try again later or |
870 | * reset driver. | 870 | * reset driver. |
871 | */ | 871 | */ |
872 | } | 872 | } |
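
The "reduce current DMA mapping usage ... and try again" option from the
comment above can be sketched in userspace. Everything here is a model under
stated assumptions: the sketch_* functions are hypothetical stand-ins for
pci_map_single(), pci_unmap_single() and pci_dma_mapping_error(), the error
sentinel is arbitrary, and the pool of two mappings only exists to make the
failure easy to provoke.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;               /* stand-in for dma_addr_t */

#define SKETCH_MAPPING_ERROR ((sketch_dma_addr_t)~0ULL)
#define SKETCH_POOL_SLOTS    2                    /* tiny finite resource */

static int sketch_slots_in_use;

/* Fails with the sentinel once the finite pool of mappings is exhausted. */
sketch_dma_addr_t sketch_map_single(const void *addr, size_t size)
{
    (void)size;
    if (sketch_slots_in_use >= SKETCH_POOL_SLOTS)
        return SKETCH_MAPPING_ERROR;
    sketch_slots_in_use++;
    return (sketch_dma_addr_t)(uintptr_t)addr;
}

void sketch_unmap_single(sketch_dma_addr_t handle)
{
    (void)handle;
    if (sketch_slots_in_use > 0)
        sketch_slots_in_use--;
}

bool sketch_mapping_error(sketch_dma_addr_t handle)
{
    return handle == SKETCH_MAPPING_ERROR;
}

/* On failure, release one older mapping and retry exactly once. */
bool sketch_map_with_retry(const void *addr, size_t size,
                           sketch_dma_addr_t oldest, sketch_dma_addr_t *out)
{
    *out = sketch_map_single(addr, size);
    if (!sketch_mapping_error(*out))
        return true;
    sketch_unmap_single(oldest);
    *out = sketch_map_single(addr, size);
    return !sketch_mapping_error(*out);
}
```

A real driver would pick the mapping to release from its own completion
queue, and would delay or reset if the retry also fails.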
873 | 873 | ||
874 | Closing | 874 | Closing |
875 | 875 | ||
876 | This document, and the API itself, would not be in its current | 876 | This document, and the API itself, would not be in its current |
877 | form without the feedback and suggestions from numerous individuals. | 877 | form without the feedback and suggestions from numerous individuals. |
878 | We would like to specifically mention, in no particular order, the | 878 | We would like to specifically mention, in no particular order, the |
879 | following people: | 879 | following people: |
880 | 880 | ||
881 | Russell King <rmk@arm.linux.org.uk> | 881 | Russell King <rmk@arm.linux.org.uk> |
882 | Leo Dagum <dagum@barrel.engr.sgi.com> | 882 | Leo Dagum <dagum@barrel.engr.sgi.com> |
883 | Ralf Baechle <ralf@oss.sgi.com> | 883 | Ralf Baechle <ralf@oss.sgi.com> |
884 | Grant Grundler <grundler@cup.hp.com> | 884 | Grant Grundler <grundler@cup.hp.com> |
885 | Jay Estabrook <Jay.Estabrook@compaq.com> | 885 | Jay Estabrook <Jay.Estabrook@compaq.com> |
886 | Thomas Sailer <sailer@ife.ee.ethz.ch> | 886 | Thomas Sailer <sailer@ife.ee.ethz.ch> |
887 | Andrea Arcangeli <andrea@suse.de> | 887 | Andrea Arcangeli <andrea@suse.de> |
888 | Jens Axboe <axboe@suse.de> | 888 | Jens Axboe <axboe@suse.de> |
889 | David Mosberger-Tang <davidm@hpl.hp.com> | 889 | David Mosberger-Tang <davidm@hpl.hp.com> |
890 | 890 |
Documentation/DocBook/libata.tmpl
1 | <?xml version="1.0" encoding="UTF-8"?> | 1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | 2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" |
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | 3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> |
4 | 4 | ||
5 | <book id="libataDevGuide"> | 5 | <book id="libataDevGuide"> |
6 | <bookinfo> | 6 | <bookinfo> |
7 | <title>libATA Developer's Guide</title> | 7 | <title>libATA Developer's Guide</title> |
8 | 8 | ||
9 | <authorgroup> | 9 | <authorgroup> |
10 | <author> | 10 | <author> |
11 | <firstname>Jeff</firstname> | 11 | <firstname>Jeff</firstname> |
12 | <surname>Garzik</surname> | 12 | <surname>Garzik</surname> |
13 | </author> | 13 | </author> |
14 | </authorgroup> | 14 | </authorgroup> |
15 | 15 | ||
16 | <copyright> | 16 | <copyright> |
17 | <year>2003-2005</year> | 17 | <year>2003-2005</year> |
18 | <holder>Jeff Garzik</holder> | 18 | <holder>Jeff Garzik</holder> |
19 | </copyright> | 19 | </copyright> |
20 | 20 | ||
21 | <legalnotice> | 21 | <legalnotice> |
22 | <para> | 22 | <para> |
23 | The contents of this file are subject to the Open | 23 | The contents of this file are subject to the Open |
24 | Software License version 1.1 that can be found at | 24 | Software License version 1.1 that can be found at |
25 | <ulink url="http://www.opensource.org/licenses/osl-1.1.txt">http://www.opensource.org/licenses/osl-1.1.txt</ulink> and is included herein | 25 | <ulink url="http://www.opensource.org/licenses/osl-1.1.txt">http://www.opensource.org/licenses/osl-1.1.txt</ulink> and is included herein |
26 | by reference. | 26 | by reference. |
27 | </para> | 27 | </para> |
28 | 28 | ||
29 | <para> | 29 | <para> |
30 | Alternatively, the contents of this file may be used under the terms | 30 | Alternatively, the contents of this file may be used under the terms |
31 | of the GNU General Public License version 2 (the "GPL") as distributed | 31 | of the GNU General Public License version 2 (the "GPL") as distributed |
32 | in the kernel source COPYING file, in which case the provisions of | 32 | in the kernel source COPYING file, in which case the provisions of |
33 | the GPL are applicable instead of the above. If you wish to allow | 33 | the GPL are applicable instead of the above. If you wish to allow |
34 | the use of your version of this file only under the terms of the | 34 | the use of your version of this file only under the terms of the |
35 | GPL and not to allow others to use your version of this file under | 35 | GPL and not to allow others to use your version of this file under |
36 | the OSL, indicate your decision by deleting the provisions above and | 36 | the OSL, indicate your decision by deleting the provisions above and |
37 | replace them with the notice and other provisions required by the GPL. | 37 | replace them with the notice and other provisions required by the GPL. |
38 | If you do not delete the provisions above, a recipient may use your | 38 | If you do not delete the provisions above, a recipient may use your |
39 | version of this file under either the OSL or the GPL. | 39 | version of this file under either the OSL or the GPL. |
40 | </para> | 40 | </para> |
41 | 41 | ||
42 | </legalnotice> | 42 | </legalnotice> |
43 | </bookinfo> | 43 | </bookinfo> |
44 | 44 | ||
45 | <toc></toc> | 45 | <toc></toc> |
46 | 46 | ||
47 | <chapter id="libataIntroduction"> | 47 | <chapter id="libataIntroduction"> |
48 | <title>Introduction</title> | 48 | <title>Introduction</title> |
49 | <para> | 49 | <para> |
50 | libATA is a library used inside the Linux kernel to support ATA host | 50 | libATA is a library used inside the Linux kernel to support ATA host |
51 | controllers and devices. libATA provides an ATA driver API, class | 51 | controllers and devices. libATA provides an ATA driver API, class |
52 | transports for ATA and ATAPI devices, and SCSI<->ATA translation | 52 | transports for ATA and ATAPI devices, and SCSI<->ATA translation |
53 | for ATA devices according to the T10 SAT specification. | 53 | for ATA devices according to the T10 SAT specification. |
54 | </para> | 54 | </para> |
55 | <para> | 55 | <para> |
56 | This Guide documents the libATA driver API, library functions, library | 56 | This Guide documents the libATA driver API, library functions, library |
57 | internals, and a couple of sample ATA low-level drivers. | 57 | internals, and a couple of sample ATA low-level drivers. |
58 | </para> | 58 | </para> |
59 | </chapter> | 59 | </chapter> |
60 | 60 | ||
61 | <chapter id="libataDriverApi"> | 61 | <chapter id="libataDriverApi"> |
62 | <title>libata Driver API</title> | 62 | <title>libata Driver API</title> |
63 | <para> | 63 | <para> |
64 | struct ata_port_operations is defined for every low-level libata | 64 | struct ata_port_operations is defined for every low-level libata |
65 | hardware driver, and it controls how the low-level driver | 65 | hardware driver, and it controls how the low-level driver |
66 | interfaces with the ATA and SCSI layers. | 66 | interfaces with the ATA and SCSI layers. |
67 | </para> | 67 | </para> |
68 | <para> | 68 | <para> |
69 | FIS-based drivers will hook into the system with ->qc_prep() and | 69 | FIS-based drivers will hook into the system with ->qc_prep() and |
70 | ->qc_issue() high-level hooks. Hardware which behaves in a manner | 70 | ->qc_issue() high-level hooks. Hardware which behaves in a manner |
71 | similar to PCI IDE hardware may utilize several generic helpers, | 71 | similar to PCI IDE hardware may utilize several generic helpers, |
72 | defining at a bare minimum the bus I/O addresses of the ATA shadow | 72 | defining at a bare minimum the bus I/O addresses of the ATA shadow |
73 | register blocks. | 73 | register blocks. |
74 | </para> | 74 | </para> |
75 | <sect1> | 75 | <sect1> |
76 | <title>struct ata_port_operations</title> | 76 | <title>struct ata_port_operations</title> |
77 | 77 | ||
78 | <sect2><title>Disable ATA port</title> | 78 | <sect2><title>Disable ATA port</title> |
79 | <programlisting> | 79 | <programlisting> |
80 | void (*port_disable) (struct ata_port *); | 80 | void (*port_disable) (struct ata_port *); |
81 | </programlisting> | 81 | </programlisting> |
82 | 82 | ||
83 | <para> | 83 | <para> |
84 | Called from ata_bus_probe() and ata_bus_reset() error paths, | 84 | Called from ata_bus_probe() and ata_bus_reset() error paths, |
85 | as well as when unregistering from the SCSI module (rmmod, hot | 85 | as well as when unregistering from the SCSI module (rmmod, hot |
86 | unplug). | 86 | unplug). |
87 | This function should do whatever needs to be done to take the | 87 | This function should do whatever needs to be done to take the |
88 | port out of use. In most cases, ata_port_disable() can be used | 88 | port out of use. In most cases, ata_port_disable() can be used |
89 | as this hook. | 89 | as this hook. |
90 | </para> | 90 | </para> |
91 | <para> | 91 | <para> |
92 | Called from ata_bus_probe() on a failed probe. | 92 | Called from ata_bus_probe() on a failed probe. |
93 | Called from ata_bus_reset() on a failed bus reset. | 93 | Called from ata_bus_reset() on a failed bus reset. |
94 | Called from ata_scsi_release(). | 94 | Called from ata_scsi_release(). |
95 | </para> | 95 | </para> |
96 | 96 | ||
97 | </sect2> | 97 | </sect2> |
98 | 98 | ||
99 | <sect2><title>Post-IDENTIFY device configuration</title> | 99 | <sect2><title>Post-IDENTIFY device configuration</title> |
100 | <programlisting> | 100 | <programlisting> |
101 | void (*dev_config) (struct ata_port *, struct ata_device *); | 101 | void (*dev_config) (struct ata_port *, struct ata_device *); |
102 | </programlisting> | 102 | </programlisting> |
103 | 103 | ||
104 | <para> | 104 | <para> |
105 | Called after IDENTIFY [PACKET] DEVICE is issued to each device | 105 | Called after IDENTIFY [PACKET] DEVICE is issued to each device |
106 | found. Typically used to apply device-specific fixups prior to | 106 | found. Typically used to apply device-specific fixups prior to |
107 | issue of SET FEATURES - XFER MODE, and prior to operation. | 107 | issue of SET FEATURES - XFER MODE, and prior to operation. |
108 | </para> | 108 | </para> |
109 | <para> | 109 | <para> |
110 | Called by ata_device_add() after ata_dev_identify() determines | 110 | Called by ata_device_add() after ata_dev_identify() determines |
111 | a device is present. | 111 | a device is present. |
112 | </para> | 112 | </para> |
113 | <para> | 113 | <para> |
114 | This entry may be specified as NULL in ata_port_operations. | 114 | This entry may be specified as NULL in ata_port_operations. |
115 | </para> | 115 | </para> |
116 | 116 | ||
117 | </sect2> | 117 | </sect2> |
118 | 118 | ||
119 | <sect2><title>Set PIO/DMA mode</title> | 119 | <sect2><title>Set PIO/DMA mode</title> |
120 | <programlisting> | 120 | <programlisting> |
121 | void (*set_piomode) (struct ata_port *, struct ata_device *); | 121 | void (*set_piomode) (struct ata_port *, struct ata_device *); |
122 | void (*set_dmamode) (struct ata_port *, struct ata_device *); | 122 | void (*set_dmamode) (struct ata_port *, struct ata_device *); |
123 | void (*post_set_mode) (struct ata_port *); | 123 | void (*post_set_mode) (struct ata_port *); |
124 | unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int); | 124 | unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int); |
125 | </programlisting> | 125 | </programlisting> |
126 | 126 | ||
127 | <para> | 127 | <para> |
128 | Hooks called prior to the issue of SET FEATURES - XFER MODE | 128 | Hooks called prior to the issue of SET FEATURES - XFER MODE |
129 | command. The optional ->mode_filter() hook is called when libata | 129 | command. The optional ->mode_filter() hook is called when libata |
130 | has built a mask of the possible modes. This is passed to the | 130 | has built a mask of the possible modes. This is passed to the |
131 | ->mode_filter() function which should return a mask of valid modes | 131 | ->mode_filter() function which should return a mask of valid modes |
132 | after filtering those unsuitable due to hardware limits. It is not | 132 | after filtering those unsuitable due to hardware limits. It is not |
133 | valid to use this interface to add modes. | 133 | valid to use this interface to add modes. |
134 | </para> | 134 | </para> |
135 | <para> | 135 | <para> |
136 | dev->pio_mode and dev->dma_mode are guaranteed to be valid when | 136 | dev->pio_mode and dev->dma_mode are guaranteed to be valid when |
137 | ->set_piomode() and when ->set_dmamode() is called. The timings for | 137 | ->set_piomode() and when ->set_dmamode() is called. The timings for |
138 | any other drive sharing the cable will also be valid at this point. | 138 | any other drive sharing the cable will also be valid at this point. |
139 | That is, the library records the decisions for the modes of each | 139 | That is, the library records the decisions for the modes of each |
140 | drive on a channel before it attempts to set any of them. | 140 | drive on a channel before it attempts to set any of them. |
141 | </para> | 141 | </para> |
142 | <para> | 142 | <para> |
143 | ->post_set_mode() is | 143 | ->post_set_mode() is |
144 | called unconditionally, after the SET FEATURES - XFER MODE | 144 | called unconditionally, after the SET FEATURES - XFER MODE |
145 | command completes successfully. | 145 | command completes successfully. |
146 | </para> | 146 | </para> |
147 | 147 | ||
148 | <para> | 148 | <para> |
149 | ->set_piomode() is always called (if present), but | 149 | ->set_piomode() is always called (if present), but |
150 | ->set_dmamode() is only called if DMA is possible. | 150 | ->set_dmamode() is only called if DMA is possible. |
151 | </para> | 151 | </para> |
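<para>
The rule that ->mode_filter() may only remove modes can be illustrated with
a small userspace sketch. The function below is hypothetical: it drops the
port and device arguments of the real hook and takes the hardware limit as a
plain bitmask, and the SKETCH_* bit assignments are arbitrary. Returning the
bitwise AND guarantees the result is a subset of what libata offered.
</para>

```c
/* Arbitrary example bit assignments; not libata's real mask layout. */
#define SKETCH_MODE_PIO4   (1u << 4)
#define SKETCH_MODE_UDMA2  (1u << 10)
#define SKETCH_MODE_UDMA5  (1u << 13)

/*
 * Hypothetical filter: 'offered' is the mask of possible modes built by
 * the library, 'hw_limit' is what this controller can actually drive.
 * The AND can clear bits but never set one that was not offered.
 */
unsigned int sketch_mode_filter(unsigned int offered, unsigned int hw_limit)
{
    return offered & hw_limit;
}
```

<para>
A filter that computed its result any other way would have to check
explicitly that no bit outside the offered mask is set.
</para>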
152 | 152 | ||
153 | </sect2> | 153 | </sect2> |
154 | 154 | ||
155 | <sect2><title>Taskfile read/write</title> | 155 | <sect2><title>Taskfile read/write</title> |
156 | <programlisting> | 156 | <programlisting> |
157 | void (*tf_load) (struct ata_port *ap, struct ata_taskfile *tf); | 157 | void (*tf_load) (struct ata_port *ap, struct ata_taskfile *tf); |
158 | void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); | 158 | void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); |
159 | </programlisting> | 159 | </programlisting> |
160 | 160 | ||
161 | <para> | 161 | <para> |
162 | ->tf_load() is called to load the given taskfile into hardware | 162 | ->tf_load() is called to load the given taskfile into hardware |
163 | registers / DMA buffers. ->tf_read() is called to read the | 163 | registers / DMA buffers. ->tf_read() is called to read the |
164 | hardware registers / DMA buffers, to obtain the current set of | 164 | hardware registers / DMA buffers, to obtain the current set of |
165 | taskfile register values. | 165 | taskfile register values. |
166 | Most drivers for taskfile-based hardware (PIO or MMIO) use | 166 | Most drivers for taskfile-based hardware (PIO or MMIO) use |
167 | ata_tf_load() and ata_tf_read() for these hooks. | 167 | ata_tf_load() and ata_tf_read() for these hooks. |
168 | </para> | 168 | </para> |
169 | 169 | ||
170 | </sect2> | 170 | </sect2> |
171 | 171 | ||
172 | <sect2><title>PIO data read/write</title> | 172 | <sect2><title>PIO data read/write</title> |
173 | <programlisting> | 173 | <programlisting> |
174 | void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); | 174 | void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); |
175 | </programlisting> | 175 | </programlisting> |
176 | 176 | ||
177 | <para> | 177 | <para> |
178 | All bmdma-style drivers must implement this hook. This is the low-level | 178 | All bmdma-style drivers must implement this hook. This is the low-level |
179 | operation that actually copies the data bytes during a PIO data | 179 | operation that actually copies the data bytes during a PIO data |
180 | transfer. | 180 | transfer. |
181 | Typically the driver | 181 | Typically the driver |
182 | will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or | 182 | will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or |
183 | ata_mmio_data_xfer(). | 183 | ata_mmio_data_xfer(). |
184 | </para> | 184 | </para> |
185 | 185 | ||
186 | </sect2> | 186 | </sect2> |
187 | 187 | ||
188 | <sect2><title>ATA command execute</title> | 188 | <sect2><title>ATA command execute</title> |
189 | <programlisting> | 189 | <programlisting> |
190 | void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); | 190 | void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); |
191 | </programlisting> | 191 | </programlisting> |
192 | 192 | ||
193 | <para> | 193 | <para> |
194 | Causes an ATA command, previously loaded with | 194 | Causes an ATA command, previously loaded with |
195 | ->tf_load(), to be initiated in hardware. | 195 | ->tf_load(), to be initiated in hardware. |
196 | Most drivers for taskfile-based hardware use ata_exec_command() | 196 | Most drivers for taskfile-based hardware use ata_exec_command() |
197 | for this hook. | 197 | for this hook. |
198 | </para> | 198 | </para> |
199 | 199 | ||
200 | </sect2> | 200 | </sect2> |
201 | 201 | ||
202 | <sect2><title>Per-cmd ATAPI DMA capabilities filter</title> | 202 | <sect2><title>Per-cmd ATAPI DMA capabilities filter</title> |
203 | <programlisting> | 203 | <programlisting> |
204 | int (*check_atapi_dma) (struct ata_queued_cmd *qc); | 204 | int (*check_atapi_dma) (struct ata_queued_cmd *qc); |
205 | </programlisting> | 205 | </programlisting> |
206 | 206 | ||
207 | <para> | 207 | <para> |
208 | Allow low-level driver to filter ATA PACKET commands, returning a status | 208 | Allow low-level driver to filter ATA PACKET commands, returning a status |
209 | indicating whether or not it is OK to use DMA for the supplied PACKET | 209 | indicating whether or not it is OK to use DMA for the supplied PACKET |
210 | command. | 210 | command. |
211 | </para> | 211 | </para> |
212 | <para> | 212 | <para> |
213 | This hook may be specified as NULL, in which case libata will | 213 | This hook may be specified as NULL, in which case libata will |
214 | assume that ATAPI DMA can be supported. | 214 | assume that ATAPI DMA can be supported. |
215 | </para> | 215 | </para> |
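<para>
The optional-hook behaviour can be sketched as follows. This is a userspace
model, not libata code: the sketch_* types are hypothetical, the opcode
0x12 is an arbitrary example, and the return convention (0 means DMA is
acceptable, nonzero means fall back to PIO) is an assumption that the text
above does not spell out.
</para>

```c
#include <stddef.h>

/* Minimal stand-in for a queued PACKET command. */
struct sketch_cmd {
    unsigned char opcode;
};

/* Assumed convention: return 0 if DMA is OK, nonzero to refuse DMA. */
typedef int (*sketch_check_dma_fn)(const struct sketch_cmd *cmd);

/* Core-side helper: a NULL hook means DMA is assumed to be supported. */
int sketch_use_dma(sketch_check_dma_fn hook, const struct sketch_cmd *cmd)
{
    if (hook == NULL)
        return 1;
    return hook(cmd) == 0;
}

/* Example driver hook: refuse DMA for one arbitrary example opcode. */
int sketch_refuse_dma_for_0x12(const struct sketch_cmd *cmd)
{
    return cmd->opcode == 0x12;
}
```

<para>
Keeping the NULL check in the core means drivers without any restriction
need not provide a hook at all.
</para>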
216 | 216 | ||
217 | </sect2> | 217 | </sect2> |
218 | 218 | ||
219 | <sect2><title>Read specific ATA shadow registers</title> | 219 | <sect2><title>Read specific ATA shadow registers</title> |
220 | <programlisting> | 220 | <programlisting> |
221 | u8 (*check_status)(struct ata_port *ap); | 221 | u8 (*check_status)(struct ata_port *ap); |
222 | u8 (*check_altstatus)(struct ata_port *ap); | 222 | u8 (*check_altstatus)(struct ata_port *ap); |
223 | </programlisting> | 223 | </programlisting> |
224 | 224 | ||
225 | <para> | 225 | <para> |
226 | Reads the Status/AltStatus ATA shadow register from | 226 | Reads the Status/AltStatus ATA shadow register from |
227 | hardware. On some hardware, reading the Status register has | 227 | hardware. On some hardware, reading the Status register has |
228 | the side effect of clearing the interrupt condition. | 228 | the side effect of clearing the interrupt condition. |
229 | Most drivers for taskfile-based hardware use | 229 | Most drivers for taskfile-based hardware use |
230 | ata_check_status() for this hook. | 230 | ata_check_status() for this hook. |
231 | </para> | 231 | </para> |
232 | <para> | 232 | <para> |
233 | Note that because this is called from ata_device_add(), at | 233 | Note that because this is called from ata_device_add(), at |
234 | least a dummy function that clears device interrupts must be | 234 | least a dummy function that clears device interrupts must be |
235 | provided for all drivers, even if the controller doesn't | 235 | provided for all drivers, even if the controller doesn't |
236 | actually have a taskfile status register. | 236 | actually have a taskfile status register. |
237 | </para> | 237 | </para> |
238 | 238 | ||
239 | </sect2> | 239 | </sect2> |
240 | 240 | ||
241 | <sect2><title>Select ATA device on bus</title> | 241 | <sect2><title>Select ATA device on bus</title> |
242 | <programlisting> | 242 | <programlisting> |
243 | void (*dev_select)(struct ata_port *ap, unsigned int device); | 243 | void (*dev_select)(struct ata_port *ap, unsigned int device); |
244 | </programlisting> | 244 | </programlisting> |
245 | 245 | ||
246 | <para> | 246 | <para> |
247 | Issues the low-level hardware command(s) that causes one of N | 247 | Issues the low-level hardware command(s) that causes one of N |
248 | hardware devices to be considered 'selected' (active and | 248 | hardware devices to be considered 'selected' (active and |
249 | available for use) on the ATA bus. This generally has no | 249 | available for use) on the ATA bus. This generally has no |
250 | meaning on FIS-based devices. | 250 | meaning on FIS-based devices. |
251 | </para> | 251 | </para> |
252 | <para> | 252 | <para> |
253 | Most drivers for taskfile-based hardware use | 253 | Most drivers for taskfile-based hardware use |
254 | ata_std_dev_select() for this hook. Controllers which do not | 254 | ata_std_dev_select() for this hook. Controllers which do not |
255 | support second drives on a port (such as SATA controllers) will | 255 | support second drives on a port (such as SATA controllers) will |
256 | use ata_noop_dev_select(). | 256 | use ata_noop_dev_select(). |
257 | </para> | 257 | </para> |
258 | 258 | ||
259 | </sect2> | 259 | </sect2> |
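A minimal sketch of the device-select idea described above, written as plain userspace C rather than driver code. It assumes the conventional ATA Device register layout (a device-select bit plus two legacy bits that are written as 1). The names sketch_dev_select_value() and sketch_noop_dev_select_value() and the SKETCH_ constants are hypothetical and are not libata symbols.

```c
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define SKETCH_DEVICE_OBS 0xA0u  /* legacy bits conventionally written as 1 */
#define SKETCH_DEV1       0x10u  /* selects the second device on the bus */

/* Compute the value a taskfile-style driver would write to the
 * Device register to select device 0 or device 1. */
static uint8_t sketch_dev_select_value(unsigned int device)
{
    if (device == 0)
        return (uint8_t)SKETCH_DEVICE_OBS;
    return (uint8_t)(SKETCH_DEVICE_OBS | SKETCH_DEV1);
}

/* A controller with one device per port has nothing to select,
 * so the register value is left unchanged. */
static uint8_t sketch_noop_dev_select_value(uint8_t current)
{
    return current;
}
```

The no-op variant mirrors the case where a port can only ever hold one drive, so selection carries no meaning.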
260 | 260 | ||
261 | <sect2><title>Private tuning method</title> | 261 | <sect2><title>Private tuning method</title> |
262 | <programlisting> | 262 | <programlisting> |
263 | void (*set_mode) (struct ata_port *ap); | 263 | void (*set_mode) (struct ata_port *ap); |
264 | </programlisting> | 264 | </programlisting> |
265 | 265 | ||
266 | <para> | 266 | <para> |
267 | By default libata performs drive and controller tuning in | 267 | By default libata performs drive and controller tuning in |
268 | accordance with the ATA timing rules and also applies blacklists | 268 | accordance with the ATA timing rules and also applies blacklists |
269 | and cable limits. Some controllers need special handling and have | 269 | and cable limits. Some controllers need special handling and have |
270 | custom tuning rules, typically RAID controllers that use ATA | 270 | custom tuning rules, typically RAID controllers that use ATA |
271 | commands but do not actually do drive timing. | 271 | commands but do not actually do drive timing. |
272 | </para> | 272 | </para> |
273 | 273 | ||
274 | <warning> | 274 | <warning> |
275 | <para> | 275 | <para> |
276 | This hook should not be used to replace the standard controller | 276 | This hook should not be used to replace the standard controller |
277 | tuning logic when a controller has quirks. Replacing the default | 277 | tuning logic when a controller has quirks. Replacing the default |
278 | tuning logic in that case would bypass handling for drive and | 278 | tuning logic in that case would bypass handling for drive and |
279 | bridge quirks that may be important to data reliability. If a | 279 | bridge quirks that may be important to data reliability. If a |
280 | controller needs to filter the mode selection it should use the | 280 | controller needs to filter the mode selection it should use the |
281 | mode_filter hook instead. | 281 | mode_filter hook instead. |
282 | </para> | 282 | </para> |
283 | </warning> | 283 | </warning> |
284 | 284 | ||
285 | </sect2> | 285 | </sect2> |
286 | 286 | ||
287 | <sect2><title>Control PCI IDE BMDMA engine</title> | 287 | <sect2><title>Control PCI IDE BMDMA engine</title> |
288 | <programlisting> | 288 | <programlisting> |
289 | void (*bmdma_setup) (struct ata_queued_cmd *qc); | 289 | void (*bmdma_setup) (struct ata_queued_cmd *qc); |
290 | void (*bmdma_start) (struct ata_queued_cmd *qc); | 290 | void (*bmdma_start) (struct ata_queued_cmd *qc); |
291 | void (*bmdma_stop) (struct ata_port *ap); | 291 | void (*bmdma_stop) (struct ata_port *ap); |
292 | u8 (*bmdma_status) (struct ata_port *ap); | 292 | u8 (*bmdma_status) (struct ata_port *ap); |
293 | </programlisting> | 293 | </programlisting> |
294 | 294 | ||
295 | <para> | 295 | <para> |
296 | When setting up an IDE BMDMA transaction, these hooks arm | 296 | When setting up an IDE BMDMA transaction, these hooks arm |
297 | (->bmdma_setup), fire (->bmdma_start), and halt (->bmdma_stop) | 297 | (->bmdma_setup), fire (->bmdma_start), and halt (->bmdma_stop) |
298 | the hardware's DMA engine. ->bmdma_status is used to read the standard | 298 | the hardware's DMA engine. ->bmdma_status is used to read the standard |
299 | PCI IDE DMA Status register. | 299 | PCI IDE DMA Status register. |
300 | </para> | 300 | </para> |
301 | 301 | ||
302 | <para> | 302 | <para> |
303 | These hooks are typically either no-ops, or simply not implemented, in | 303 | These hooks are typically either no-ops, or simply not implemented, in |
304 | FIS-based drivers. | 304 | FIS-based drivers. |
305 | </para> | 305 | </para> |
306 | <para> | 306 | <para> |
307 | Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup() | 307 | Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup() |
308 | hook. ata_bmdma_setup() will write the pointer to the PRD table to | 308 | hook. ata_bmdma_setup() will write the pointer to the PRD table to |
309 | the IDE PRD Table Address register, enable DMA in the DMA Command | 309 | the IDE PRD Table Address register, enable DMA in the DMA Command |
310 | register, and call exec_command() to begin the transfer. | 310 | register, and call exec_command() to begin the transfer. |
311 | </para> | 311 | </para> |
312 | <para> | 312 | <para> |
313 | Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start() | 313 | Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start() |
314 | hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA | 314 | hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA |
315 | Command register. | 315 | Command register. |
316 | </para> | 316 | </para> |
317 | <para> | 317 | <para> |
318 | Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop() | 318 | Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop() |
319 | hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA | 319 | hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA |
320 | command register. | 320 | command register. |
321 | </para> | 321 | </para> |
322 | <para> | 322 | <para> |
323 | Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook. | 323 | Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook. |
324 | </para> | 324 | </para> |
325 | 325 | ||
326 | </sect2> | 326 | </sect2> |
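The start and stop behaviour described above amounts to setting and clearing one flag in the DMA Command register. The sketch below simulates that register in memory. The struct, the function names, and the bit position of SKETCH_DMA_START are assumptions made for illustration; a real driver would reach the register through port I/O or MMIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DMA_START 0x01u  /* assumed bit position of the start flag */

/* Simulated bus-master DMA register block (hypothetical). */
struct sketch_bmdma {
    uint8_t cmd;  /* stands in for the DMA Command register */
};

/* Fire the engine: set the start flag, preserving the other bits. */
static void sketch_bmdma_start(struct sketch_bmdma *bm)
{
    assert(bm != NULL);
    bm->cmd |= SKETCH_DMA_START;
}

/* Halt the engine: clear the start flag, preserving the other bits. */
static void sketch_bmdma_stop(struct sketch_bmdma *bm)
{
    assert(bm != NULL);
    bm->cmd &= (uint8_t)~SKETCH_DMA_START;
}
```

Both helpers use read-modify-write so that direction or other control bits set during setup survive the start and stop steps.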
327 | 327 | ||
328 | <sect2><title>High-level taskfile hooks</title> | 328 | <sect2><title>High-level taskfile hooks</title> |
329 | <programlisting> | 329 | <programlisting> |
330 | void (*qc_prep) (struct ata_queued_cmd *qc); | 330 | void (*qc_prep) (struct ata_queued_cmd *qc); |
331 | int (*qc_issue) (struct ata_queued_cmd *qc); | 331 | int (*qc_issue) (struct ata_queued_cmd *qc); |
332 | </programlisting> | 332 | </programlisting> |
333 | 333 | ||
334 | <para> | 334 | <para> |
335 | Higher-level hooks, these two hooks can potentially supersede | 335 | Higher-level hooks, these two hooks can potentially supersede |
336 | several of the above taskfile/DMA engine hooks. ->qc_prep is | 336 | several of the above taskfile/DMA engine hooks. ->qc_prep is |
337 | called after the buffers have been DMA-mapped, and is typically | 337 | called after the buffers have been DMA-mapped, and is typically |
338 | used to populate the hardware's DMA scatter-gather table. | 338 | used to populate the hardware's DMA scatter-gather table. |
339 | Most drivers use the standard ata_qc_prep() helper function, but | 339 | Most drivers use the standard ata_qc_prep() helper function, but |
340 | more advanced drivers roll their own. | 340 | more advanced drivers roll their own. |
341 | </para> | 341 | </para> |
342 | <para> | 342 | <para> |
343 | ->qc_issue is used to make a command active, once the hardware | 343 | ->qc_issue is used to make a command active, once the hardware |
344 | and S/G tables have been prepared. IDE BMDMA drivers use the | 344 | and S/G tables have been prepared. IDE BMDMA drivers use the |
345 | helper function ata_qc_issue_prot() for taskfile protocol-based | 345 | helper function ata_qc_issue_prot() for taskfile protocol-based |
346 | dispatch. More advanced drivers implement their own ->qc_issue. | 346 | dispatch. More advanced drivers implement their own ->qc_issue. |
347 | </para> | 347 | </para> |
348 | <para> | 348 | <para> |
349 | ata_qc_issue_prot() calls ->tf_load(), ->bmdma_setup(), and | 349 | ata_qc_issue_prot() calls ->tf_load(), ->bmdma_setup(), and |
350 | ->bmdma_start() as necessary to initiate a transfer. | 350 | ->bmdma_start() as necessary to initiate a transfer. |
351 | </para> | 351 | </para> |
352 | 352 | ||
353 | </sect2> | 353 | </sect2> |
354 | 354 | ||
355 | <sect2><title>Exception and probe handling (EH)</title> | 355 | <sect2><title>Exception and probe handling (EH)</title> |
356 | <programlisting> | 356 | <programlisting> |
357 | void (*eng_timeout) (struct ata_port *ap); | 357 | void (*eng_timeout) (struct ata_port *ap); |
358 | void (*phy_reset) (struct ata_port *ap); | 358 | void (*phy_reset) (struct ata_port *ap); |
359 | </programlisting> | 359 | </programlisting> |
360 | 360 | ||
361 | <para> | 361 | <para> |
362 | Deprecated. Use ->error_handler() instead. | 362 | Deprecated. Use ->error_handler() instead. |
363 | </para> | 363 | </para> |
364 | 364 | ||
365 | <programlisting> | 365 | <programlisting> |
366 | void (*freeze) (struct ata_port *ap); | 366 | void (*freeze) (struct ata_port *ap); |
367 | void (*thaw) (struct ata_port *ap); | 367 | void (*thaw) (struct ata_port *ap); |
368 | </programlisting> | 368 | </programlisting> |
369 | 369 | ||
370 | <para> | 370 | <para> |
371 | ata_port_freeze() is called when HSM violations or some other | 371 | ata_port_freeze() is called when HSM violations or some other |
372 | condition disrupts normal operation of the port. A frozen port | 372 | condition disrupts normal operation of the port. A frozen port |
373 | is not allowed to perform any operation until the port is | 373 | is not allowed to perform any operation until the port is |
374 | thawed, which usually follows a successful reset. | 374 | thawed, which usually follows a successful reset. |
375 | </para> | 375 | </para> |
376 | 376 | ||
377 | <para> | 377 | <para> |
378 | The optional ->freeze() callback can be used for freezing the port | 378 | The optional ->freeze() callback can be used for freezing the port |
379 | hardware-wise (e.g. mask interrupt and stop DMA engine). If a | 379 | hardware-wise (e.g. mask interrupt and stop DMA engine). If a |
380 | port cannot be frozen hardware-wise, the interrupt handler | 380 | port cannot be frozen hardware-wise, the interrupt handler |
381 | must ack and clear interrupts unconditionally while the port | 381 | must ack and clear interrupts unconditionally while the port |
382 | is frozen. | 382 | is frozen. |
383 | </para> | 383 | </para> |
384 | <para> | 384 | <para> |
385 | The optional ->thaw() callback is called to perform the opposite of ->freeze(): | 385 | The optional ->thaw() callback is called to perform the opposite of ->freeze(): |
386 | prepare the port for normal operation once again. Unmask interrupts, | 386 | prepare the port for normal operation once again. Unmask interrupts, |
387 | start DMA engine, etc. | 387 | start DMA engine, etc. |
388 | </para> | 388 | </para> |
389 | 389 | ||
390 | <programlisting> | 390 | <programlisting> |
391 | void (*error_handler) (struct ata_port *ap); | 391 | void (*error_handler) (struct ata_port *ap); |
392 | </programlisting> | 392 | </programlisting> |
393 | 393 | ||
394 | <para> | 394 | <para> |
395 | ->error_handler() is a driver's hook into probe, hotplug, and recovery | 395 | ->error_handler() is a driver's hook into probe, hotplug, and recovery |
396 | and other exceptional conditions. The primary responsibility of an | 396 | and other exceptional conditions. The primary responsibility of an |
397 | implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set | 397 | implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set |
398 | of EH hooks as arguments: | 398 | of EH hooks as arguments: |
399 | </para> | 399 | </para> |
400 | 400 | ||
401 | <para> | 401 | <para> |
402 | 'prereset' hook (may be NULL) is called during an EH reset, before any other actions | 402 | 'prereset' hook (may be NULL) is called during an EH reset, before any other actions |
403 | are taken. | 403 | are taken. |
404 | </para> | 404 | </para> |
405 | 405 | ||
406 | <para> | 406 | <para> |
407 | 'postreset' hook (may be NULL) is called after the EH reset is | 407 | 'postreset' hook (may be NULL) is called after the EH reset is |
408 | performed. | 408 | performed. |
409 | </para> | 409 | </para> |
410 | 410 | ||
411 | <para> | 411 | <para> |
412 | Based on existing conditions, severity of the problem, and hardware capabilities, either | 412 | Based on existing conditions, severity of the problem, and hardware capabilities, either |
413 | 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be called to perform the low-level EH reset. | 413 | 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be called to perform the low-level EH reset. |
414 | </para> | 414 | </para> |
415 | 415 | ||
416 | <programlisting> | 416 | <programlisting> |
417 | void (*post_internal_cmd) (struct ata_queued_cmd *qc); | 417 | void (*post_internal_cmd) (struct ata_queued_cmd *qc); |
418 | </programlisting> | 418 | </programlisting> |
419 | 419 | ||
420 | <para> | 420 | <para> |
421 | Perform any hardware-specific actions necessary to finish processing | 421 | Perform any hardware-specific actions necessary to finish processing |
422 | after executing a probe-time or EH-time command via ata_exec_internal(). | 422 | after executing a probe-time or EH-time command via ata_exec_internal(). |
423 | </para> | 423 | </para> |
424 | 424 | ||
425 | </sect2> | 425 | </sect2> |
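The freeze/thaw rule above can be illustrated with a small state machine: while a port is frozen, the interrupt handler still acknowledges and clears the interrupt, but does no command processing. This is a userspace sketch only. The sketch_eh_port structure and all function names are hypothetical and do not reflect libata's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical per-port state for illustration. */
struct sketch_eh_port {
    bool frozen;       /* set between freeze and thaw */
    bool irq_pending;  /* simulated interrupt condition */
    int completed;     /* commands completed by the handler */
};

static void sketch_freeze(struct sketch_eh_port *p) { p->frozen = true; }
static void sketch_thaw(struct sketch_eh_port *p)   { p->frozen = false; }

/* Returns true if an interrupt was pending and has been handled. */
static bool sketch_irq(struct sketch_eh_port *p)
{
    if (!p->irq_pending)
        return false;

    /* Ack and clear unconditionally, even on a frozen port. */
    p->irq_pending = false;

    /* A frozen port must not perform any further operation. */
    if (p->frozen)
        return true;

    p->completed++;
    return true;
}
```

Clearing the interrupt before checking the frozen flag is the point of the sketch: it keeps a port that cannot be masked in hardware from leaving its interrupt asserted indefinitely.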
426 | 426 | ||
427 | <sect2><title>Hardware interrupt handling</title> | 427 | <sect2><title>Hardware interrupt handling</title> |
428 | <programlisting> | 428 | <programlisting> |
429 | irqreturn_t (*irq_handler)(int, void *, struct pt_regs *); | 429 | irqreturn_t (*irq_handler)(int, void *, struct pt_regs *); |
430 | void (*irq_clear) (struct ata_port *); | 430 | void (*irq_clear) (struct ata_port *); |
431 | </programlisting> | 431 | </programlisting> |
432 | 432 | ||
433 | <para> | 433 | <para> |
434 | ->irq_handler is the interrupt handling routine registered with | 434 | ->irq_handler is the interrupt handling routine registered with |
435 | the system, by libata. ->irq_clear is called during probe just | 435 | the system, by libata. ->irq_clear is called during probe just |
436 | before the interrupt handler is registered, to be sure hardware | 436 | before the interrupt handler is registered, to be sure hardware |
437 | is quiet. | 437 | is quiet. |
438 | </para> | 438 | </para> |
439 | <para> | 439 | <para> |
440 | The second argument, dev_instance, should be cast to a pointer | 440 | The second argument, dev_instance, should be cast to a pointer |
441 | to struct ata_host_set. | 441 | to struct ata_host_set. |
442 | </para> | 442 | </para> |
443 | <para> | 443 | <para> |
444 | Most legacy IDE drivers use ata_interrupt() for the | 444 | Most legacy IDE drivers use ata_interrupt() for the |
445 | irq_handler hook, which scans all ports in the host_set, | 445 | irq_handler hook, which scans all ports in the host_set, |
446 | determines which queued command was active (if any), and calls | 446 | determines which queued command was active (if any), and calls |
447 | ata_host_intr(ap,qc). | 447 | ata_host_intr(ap,qc). |
448 | </para> | 448 | </para> |
449 | <para> | 449 | <para> |
450 | Most legacy IDE drivers use ata_bmdma_irq_clear() for the | 450 | Most legacy IDE drivers use ata_bmdma_irq_clear() for the |
451 | irq_clear() hook, which simply clears the interrupt and error | 451 | irq_clear() hook, which simply clears the interrupt and error |
452 | flags in the DMA status register. | 452 | flags in the DMA status register. |
453 | </para> | 453 | </para> |
454 | 454 | ||
455 | </sect2> | 455 | </sect2> |
456 | 456 | ||
457 | <sect2><title>SATA phy read/write</title> | 457 | <sect2><title>SATA phy read/write</title> |
458 | <programlisting> | 458 | <programlisting> |
459 | u32 (*scr_read) (struct ata_port *ap, unsigned int sc_reg); | 459 | u32 (*scr_read) (struct ata_port *ap, unsigned int sc_reg); |
460 | void (*scr_write) (struct ata_port *ap, unsigned int sc_reg, | 460 | void (*scr_write) (struct ata_port *ap, unsigned int sc_reg, |
461 | u32 val); | 461 | u32 val); |
462 | </programlisting> | 462 | </programlisting> |
463 | 463 | ||
464 | <para> | 464 | <para> |
465 | Read and write standard SATA phy registers. Currently only used | 465 | Read and write standard SATA phy registers. Currently only used |
466 | if ->phy_reset hook called the sata_phy_reset() helper function. | 466 | if ->phy_reset hook called the sata_phy_reset() helper function. |
467 | sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. | 467 | sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. |
468 | </para> | 468 | </para> |
469 | 469 | ||
470 | </sect2> | 470 | </sect2> |
471 | 471 | ||
472 | <sect2><title>Init and shutdown</title> | 472 | <sect2><title>Init and shutdown</title> |
473 | <programlisting> | 473 | <programlisting> |
474 | int (*port_start) (struct ata_port *ap); | 474 | int (*port_start) (struct ata_port *ap); |
475 | void (*port_stop) (struct ata_port *ap); | 475 | void (*port_stop) (struct ata_port *ap); |
476 | void (*host_stop) (struct ata_host_set *host_set); | 476 | void (*host_stop) (struct ata_host_set *host_set); |
477 | </programlisting> | 477 | </programlisting> |
478 | 478 | ||
479 | <para> | 479 | <para> |
480 | ->port_start() is called just after the data structures for each | 480 | ->port_start() is called just after the data structures for each |
481 | port are initialized. Typically this is used to alloc per-port | 481 | port are initialized. Typically this is used to alloc per-port |
482 | DMA buffers / tables / rings, enable DMA engines, and similar | 482 | DMA buffers / tables / rings, enable DMA engines, and similar |
483 | tasks. Some drivers also use this entry point as a chance to | 483 | tasks. Some drivers also use this entry point as a chance to |
484 | allocate driver-private memory for ap->private_data. | 484 | allocate driver-private memory for ap->private_data. |
485 | </para> | 485 | </para> |
486 | <para> | 486 | <para> |
487 | Many drivers use ata_port_start() as this hook or call | 487 | Many drivers use ata_port_start() as this hook or call |
488 | it from their own port_start() hooks. ata_port_start() | 488 | it from their own port_start() hooks. ata_port_start() |
489 | allocates space for a legacy IDE PRD table and returns. | 489 | allocates space for a legacy IDE PRD table and returns. |
490 | </para> | 490 | </para> |
491 | <para> | 491 | <para> |
492 | ->port_stop() is called after ->host_stop(). Its sole function | 492 | ->port_stop() is called after ->host_stop(). Its sole function |
493 | is to release DMA/memory resources, now that they are no longer | 493 | is to release DMA/memory resources, now that they are no longer |
494 | actively being used. Many drivers also free driver-private | 494 | actively being used. Many drivers also free driver-private |
495 | data from port at this time. | 495 | data from port at this time. |
496 | </para> | 496 | </para> |
497 | <para> | 497 | <para> |
498 | Many drivers use ata_port_stop() as this hook, which frees the | 498 | Many drivers use ata_port_stop() as this hook, which frees the |
499 | PRD table. | 499 | PRD table. |
500 | </para> | 500 | </para> |
501 | <para> | 501 | <para> |
502 | ->host_stop() is called after all ->port_stop() calls | 502 | ->host_stop() is called after all ->port_stop() calls |
503 | have completed. The hook must finalize hardware shutdown, release DMA | 503 | have completed. The hook must finalize hardware shutdown, release DMA |
504 | and other resources, etc. | 504 | and other resources, etc. |
505 | This hook may be specified as NULL, in which case it is not called. | 505 | This hook may be specified as NULL, in which case it is not called. |
506 | </para> | 506 | </para> |
507 | 507 | ||
508 | </sect2> | 508 | </sect2> |
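As a rough illustration of the start/stop pairing described above, the sketch below allocates a per-port table in a start routine and releases it in the matching stop routine. It uses calloc() and free() in place of the kernel's DMA allocation interfaces. The structures, the entry count, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PRD_ENTRIES 128  /* hypothetical table size */

/* Hypothetical scatter-gather table entry. */
struct sketch_prd {
    unsigned int addr;
    unsigned int flags_len;
};

/* Hypothetical per-port state. */
struct sketch_init_port {
    struct sketch_prd *prd;
};

/* Allocate per-port resources; returns 0 or a negative errno value. */
static int sketch_port_start(struct sketch_init_port *ap)
{
    ap->prd = calloc(SKETCH_PRD_ENTRIES, sizeof(*ap->prd));
    if (!ap->prd)
        return -ENOMEM;
    return 0;
}

/* Release what sketch_port_start() allocated. */
static void sketch_port_stop(struct sketch_init_port *ap)
{
    free(ap->prd);
    ap->prd = NULL;
}
```

Resetting the pointer to NULL after freeing makes a second stop call harmless, which is a convenience of this sketch rather than a documented requirement of the hooks.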
509 | 509 | ||
510 | </sect1> | 510 | </sect1> |
511 | </chapter> | 511 | </chapter> |
512 | 512 | ||
513 | <chapter id="libataEH"> | 513 | <chapter id="libataEH"> |
514 | <title>Error handling</title> | 514 | <title>Error handling</title> |
515 | 515 | ||
516 | <para> | 516 | <para> |
517 | This chapter describes how errors are handled under libata. | 517 | This chapter describes how errors are handled under libata. |
518 | Readers are advised to read SCSI EH | 518 | Readers are advised to read SCSI EH |
519 | (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. | 519 | (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. |
520 | </para> | 520 | </para> |
521 | 521 | ||
522 | <sect1><title>Origins of commands</title> | 522 | <sect1><title>Origins of commands</title> |
523 | <para> | 523 | <para> |
524 | In libata, a command is represented with struct ata_queued_cmd | 524 | In libata, a command is represented with struct ata_queued_cmd |
525 | or qc. qc's are preallocated during port initialization and | 525 | or qc. qc's are preallocated during port initialization and |
526 | repetitively used for command executions. Currently only one | 526 | repetitively used for command executions. Currently only one |
527 | qc is allocated per port but yet-to-be-merged NCQ branch | 527 | qc is allocated per port but yet-to-be-merged NCQ branch |
528 | allocates one for each tag and maps each qc to NCQ tag 1-to-1. | 528 | allocates one for each tag and maps each qc to NCQ tag 1-to-1. |
529 | </para> | 529 | </para> |
530 | <para> | 530 | <para> |
531 | libata commands can originate from two sources - libata itself | 531 | libata commands can originate from two sources - libata itself |
532 | and SCSI midlayer. libata internal commands are used for | 532 | and SCSI midlayer. libata internal commands are used for |
533 | initialization and error handling. All normal blk requests | 533 | initialization and error handling. All normal blk requests |
534 | and commands for SCSI emulation are passed as SCSI commands | 534 | and commands for SCSI emulation are passed as SCSI commands |
535 | through queuecommand callback of SCSI host template. | 535 | through queuecommand callback of SCSI host template. |
536 | </para> | 536 | </para> |
537 | </sect1> | 537 | </sect1> |
538 | 538 | ||
539 | <sect1><title>How commands are issued</title> | 539 | <sect1><title>How commands are issued</title> |
540 | 540 | ||
541 | <variablelist> | 541 | <variablelist> |
542 | 542 | ||
543 | <varlistentry><term>Internal commands</term> | 543 | <varlistentry><term>Internal commands</term> |
544 | <listitem> | 544 | <listitem> |
545 | <para> | 545 | <para> |
546 | First, qc is allocated and initialized using | 546 | First, qc is allocated and initialized using |
547 | ata_qc_new_init(). Although ata_qc_new_init() doesn't | 547 | ata_qc_new_init(). Although ata_qc_new_init() doesn't |
548 | implement any wait or retry mechanism when qc is not | 548 | implement any wait or retry mechanism when qc is not |
549 | available, internal commands are currently issued only during | 549 | available, internal commands are currently issued only during |
550 | initialization and error recovery, so no other command is | 550 | initialization and error recovery, so no other command is |
551 | active and allocation is guaranteed to succeed. | 551 | active and allocation is guaranteed to succeed. |
552 | </para> | 552 | </para> |
553 | <para> | 553 | <para> |
554 | Once allocated, the qc's taskfile is initialized for the command to | 554 | Once allocated, the qc's taskfile is initialized for the command to |
555 | be executed. qc currently has two mechanisms to notify | 555 | be executed. qc currently has two mechanisms to notify |
556 | completion. One is via qc->complete_fn() callback and the | 556 | completion. One is via qc->complete_fn() callback and the |
557 | other is completion qc->waiting. qc->complete_fn() callback | 557 | other is completion qc->waiting. qc->complete_fn() callback |
558 | is the asynchronous path used by normal SCSI translated | 558 | is the asynchronous path used by normal SCSI translated |
559 | commands and qc->waiting is the synchronous (issuer sleeps in | 559 | commands and qc->waiting is the synchronous (issuer sleeps in |
560 | process context) path used by internal commands. | 560 | process context) path used by internal commands. |
561 | </para> | 561 | </para> |
562 | <para> | 562 | <para> |
563 | Once initialization is complete, host_set lock is acquired | 563 | Once initialization is complete, host_set lock is acquired |
564 | and the qc is issued. | 564 | and the qc is issued. |
565 | </para> | 565 | </para> |
566 | </listitem> | 566 | </listitem> |
567 | </varlistentry> | 567 | </varlistentry> |
568 | 568 | ||
569 | <varlistentry><term>SCSI commands</term> | 569 | <varlistentry><term>SCSI commands</term> |
570 | <listitem> | 570 | <listitem> |
571 | <para> | 571 | <para> |
572 | All libata drivers use ata_scsi_queuecmd() as | 572 | All libata drivers use ata_scsi_queuecmd() as |
573 | hostt->queuecommand callback. scmds can either be simulated | 573 | hostt->queuecommand callback. scmds can either be simulated |
574 | or translated. No qc is involved in processing a simulated | 574 | or translated. No qc is involved in processing a simulated |
575 | scmd. The result is computed right away and the scmd is | 575 | scmd. The result is computed right away and the scmd is |
576 | completed. | 576 | completed. |
577 | </para> | 577 | </para> |
578 | <para> | 578 | <para> |
579 | For a translated scmd, ata_qc_new_init() is invoked to | 579 | For a translated scmd, ata_qc_new_init() is invoked to |
580 | allocate a qc and the scmd is translated into the qc. SCSI | 580 | allocate a qc and the scmd is translated into the qc. SCSI |
581 | midlayer's completion notification function pointer is stored | 581 | midlayer's completion notification function pointer is stored |
582 | into qc->scsidone. | 582 | into qc->scsidone. |
583 | </para> | 583 | </para> |
584 | <para> | 584 | <para> |
585 | qc->complete_fn() callback is used for completion | 585 | qc->complete_fn() callback is used for completion |
586 | notification. ATA commands use ata_scsi_qc_complete() while | 586 | notification. ATA commands use ata_scsi_qc_complete() while |
587 | ATAPI commands use atapi_qc_complete(). Both functions end up | 587 | ATAPI commands use atapi_qc_complete(). Both functions end up |
588 | calling qc->scsidone to notify upper layer when the qc is | 588 | calling qc->scsidone to notify upper layer when the qc is |
589 | finished. After translation is completed, the qc is issued | 589 | finished. After translation is completed, the qc is issued |
590 | with ata_qc_issue(). | 590 | with ata_qc_issue(). |
591 | </para> | 591 | </para> |
592 | <para> | 592 | <para> |
593 | Note that SCSI midlayer invokes hostt->queuecommand while | 593 | Note that SCSI midlayer invokes hostt->queuecommand while |
594 | holding host_set lock, so all above occur while holding | 594 | holding host_set lock, so all above occur while holding |
595 | host_set lock. | 595 | host_set lock. |
596 | </para> | 596 | </para> |
597 | </listitem> | 597 | </listitem> |
598 | </varlistentry> | 598 | </varlistentry> |
599 | 599 | ||
600 | </variablelist> | 600 | </variablelist> |
601 | </sect1> | 601 | </sect1> |
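The two completion notification styles described above, a callback for the asynchronous path and a wait object for the synchronous path, can be sketched as follows. This is a single-threaded userspace illustration: a bool flag stands in for a kernel completion, and the sketch_qc structure and function names are hypothetical rather than libata's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued-command structure. */
struct sketch_qc {
    void (*complete_fn)(struct sketch_qc *qc);  /* asynchronous path */
    bool *waiting;  /* synchronous path: flag the issuer waits on */
    int result;
};

/* Notify whoever issued the command that it has finished. */
static void sketch_qc_complete(struct sketch_qc *qc)
{
    if (qc->waiting)
        *qc->waiting = true;   /* wake the sleeping issuer */
    else if (qc->complete_fn)
        qc->complete_fn(qc);   /* hand the result to the upper layer */
}

/* Example callback, standing in for an upper-layer "done" function. */
static void sketch_scsi_done(struct sketch_qc *qc)
{
    qc->result = 1;
}
```

An internal command would set only the waiting flag, while a translated SCSI command would set only the callback.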
602 | 602 | ||
603 | <sect1><title>How commands are processed</title> | 603 | <sect1><title>How commands are processed</title> |
604 | <para> | 604 | <para> |
605 | Depending on which protocol and which controller are used, | 605 | Depending on which protocol and which controller are used, |
606 | commands are processed differently. For the purpose of | 606 | commands are processed differently. For the purpose of |
607 | discussion, a controller which uses taskfile interface and all | 607 | discussion, a controller which uses taskfile interface and all |
608 | standard callbacks is assumed. | 608 | standard callbacks is assumed. |
609 | </para> | 609 | </para> |
610 | <para> | 610 | <para> |
611 | Currently 6 ATA command protocols are used. They can be | 611 | Currently 6 ATA command protocols are used. They can be |
612 | sorted into the following four categories according to how | 612 | sorted into the following four categories according to how |
613 | they are processed. | 613 | they are processed. |
614 | </para> | 614 | </para> |
615 | 615 | ||
616 | <variablelist> | 616 | <variablelist> |
617 | <varlistentry><term>ATA NO DATA or DMA</term> | 617 | <varlistentry><term>ATA NO DATA or DMA</term> |
618 | <listitem> | 618 | <listitem> |
619 | <para> | 619 | <para> |
620 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. | 620 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. |
621 | These types of commands don't require any software | 621 | These types of commands don't require any software |
622 | intervention once issued. Device will raise interrupt on | 622 | intervention once issued. Device will raise interrupt on |
623 | completion. | 623 | completion. |
624 | </para> | 624 | </para> |
625 | </listitem> | 625 | </listitem> |
626 | </varlistentry> | 626 | </varlistentry> |
627 | 627 | ||
628 | <varlistentry><term>ATA PIO</term> | 628 | <varlistentry><term>ATA PIO</term> |
629 | <listitem> | 629 | <listitem> |
630 | <para> | 630 | <para> |
631 | ATA_PROT_PIO is in this category. libata currently | 631 | ATA_PROT_PIO is in this category. libata currently |
632 | implements PIO with polling. ATA_NIEN bit is set to turn | 632 | implements PIO with polling. ATA_NIEN bit is set to turn |
633 | off interrupt and pio_task on ata_wq performs polling and | 633 | off interrupt and pio_task on ata_wq performs polling and |
634 | IO. | 634 | IO. |
635 | </para> | 635 | </para> |
636 | </listitem> | 636 | </listitem> |
637 | </varlistentry> | 637 | </varlistentry> |
638 | 638 | ||
639 | <varlistentry><term>ATAPI NODATA or DMA</term> | 639 | <varlistentry><term>ATAPI NODATA or DMA</term> |
640 | <listitem> | 640 | <listitem> |
641 | <para> | 641 | <para> |
642 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this | 642 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this |
643 | category. packet_task is used to poll BSY bit after | 643 | category. packet_task is used to poll BSY bit after |
644 | issuing PACKET command. Once BSY is turned off by the | 644 | issuing PACKET command. Once BSY is turned off by the |
645 | device, packet_task transfers CDB and hands off processing | 645 | device, packet_task transfers CDB and hands off processing |
646 | to interrupt handler. | 646 | to interrupt handler. |
647 | </para> | 647 | </para> |
648 | </listitem> | 648 | </listitem> |
649 | </varlistentry> | 649 | </varlistentry> |
650 | 650 | ||
651 | <varlistentry><term>ATAPI PIO</term> | 651 | <varlistentry><term>ATAPI PIO</term> |
652 | <listitem> | 652 | <listitem> |
653 | <para> | 653 | <para> |
654 | ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set | 654 | ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set |
655 | and, as in ATAPI NODATA or DMA, packet_task submits cdb. | 655 | and, as in ATAPI NODATA or DMA, packet_task submits cdb. |
656 | However, after submitting cdb, further processing (data | 656 | However, after submitting cdb, further processing (data |
657 | transfer) is handed off to pio_task. | 657 | transfer) is handed off to pio_task. |
658 | </para> | 658 | </para> |
659 | </listitem> | 659 | </listitem> |
660 | </varlistentry> | 660 | </varlistentry> |
661 | </variablelist> | 661 | </variablelist> |
662 | </sect1> | 662 | </sect1> |
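The grouping described in this section can be sketched in C as follows. This is only an illustrative model: the `sk_` names are hypothetical stand-ins for the six ATA_PROT_* protocols named above and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the six protocols named in the text. */
enum sk_prot {
    SK_PROT_NODATA,
    SK_PROT_DMA,
    SK_PROT_PIO,
    SK_PROT_ATAPI_NODATA,
    SK_PROT_ATAPI_DMA,
    SK_PROT_ATAPI,          /* ATAPI PIO */
    SK_PROT_COUNT
};

/* The four processing categories from the text. */
enum sk_category {
    SK_CAT_ATA_NODATA_OR_DMA,
    SK_CAT_ATA_PIO,
    SK_CAT_ATAPI_NODATA_OR_DMA,
    SK_CAT_ATAPI_PIO
};

/* Map a protocol to the category that decides how it is processed. */
static enum sk_category sk_categorize(enum sk_prot prot)
{
    assert(prot < SK_PROT_COUNT);
    switch (prot) {
    case SK_PROT_NODATA:
    case SK_PROT_DMA:
        return SK_CAT_ATA_NODATA_OR_DMA;
    case SK_PROT_PIO:
        return SK_CAT_ATA_PIO;
    case SK_PROT_ATAPI_NODATA:
    case SK_PROT_ATAPI_DMA:
        return SK_CAT_ATAPI_NODATA_OR_DMA;
    default:
        return SK_CAT_ATAPI_PIO;
    }
}

/*
 * True when the final stage is left to the interrupt handler; false when
 * ATA_NIEN is set and pio_task polls instead.
 */
static bool sk_finishes_by_interrupt(enum sk_category cat)
{
    return cat == SK_CAT_ATA_NODATA_OR_DMA ||
           cat == SK_CAT_ATAPI_NODATA_OR_DMA;
}
```

Keeping the category separate from the protocol mirrors the text: the protocol says what is sent to the device, while the category says which of the interrupt handler, pio_task and packet_task is involved.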
663 | 663 | ||
664 | <sect1><title>How commands are completed</title> | 664 | <sect1><title>How commands are completed</title> |
665 | <para> | 665 | <para> |
666 | Once issued, all qc's are either completed with | 666 | Once issued, all qc's are either completed with |
667 | ata_qc_complete() or time out. For commands which are handled | 667 | ata_qc_complete() or time out. For commands which are handled |
668 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, | 668 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, |
669 | for PIO tasks, pio_task invokes ata_qc_complete(). In error | 669 | for PIO tasks, pio_task invokes ata_qc_complete(). In error |
670 | cases, packet_task may also complete commands. | 670 | cases, packet_task may also complete commands. |
671 | </para> | 671 | </para> |
672 | <para> | 672 | <para> |
673 | ata_qc_complete() does the following. | 673 | ata_qc_complete() does the following. |
674 | </para> | 674 | </para> |
675 | 675 | ||
676 | <orderedlist> | 676 | <orderedlist> |
677 | 677 | ||
678 | <listitem> | 678 | <listitem> |
679 | <para> | 679 | <para> |
680 | DMA memory is unmapped. | 680 | DMA memory is unmapped. |
681 | </para> | 681 | </para> |
682 | </listitem> | 682 | </listitem> |
683 | 683 | ||
684 | <listitem> | 684 | <listitem> |
685 | <para> | 685 | <para> |
686 | ATA_QCFLAG_ACTIVE is cleared from qc->flags. | 686 | ATA_QCFLAG_ACTIVE is cleared from qc->flags. |
687 | </para> | 687 | </para> |
688 | </listitem> | 688 | </listitem> |
689 | 689 | ||
690 | <listitem> | 690 | <listitem> |
691 | <para> | 691 | <para> |
692 | qc->complete_fn() callback is invoked. If the return value of | 692 | qc->complete_fn() callback is invoked. If the return value of |
693 | the callback is not zero, completion is short circuited and | 693 | the callback is not zero, completion is short circuited and |
694 | ata_qc_complete() returns. | 694 | ata_qc_complete() returns. |
695 | </para> | 695 | </para> |
696 | </listitem> | 696 | </listitem> |
697 | 697 | ||
698 | <listitem> | 698 | <listitem> |
699 | <para> | 699 | <para> |
700 | __ata_qc_complete() is called, which does | 700 | __ata_qc_complete() is called, which does |
701 | <orderedlist> | 701 | <orderedlist> |
702 | 702 | ||
703 | <listitem> | 703 | <listitem> |
704 | <para> | 704 | <para> |
705 | qc->flags is cleared to zero. | 705 | qc->flags is cleared to zero. |
706 | </para> | 706 | </para> |
707 | </listitem> | 707 | </listitem> |
708 | 708 | ||
709 | <listitem> | 709 | <listitem> |
710 | <para> | 710 | <para> |
711 | ap->active_tag and qc->tag are poisoned. | 711 | ap->active_tag and qc->tag are poisoned. |
712 | </para> | 712 | </para> |
713 | </listitem> | 713 | </listitem> |
714 | 714 | ||
715 | <listitem> | 715 | <listitem> |
716 | <para> | 716 | <para> |
717 | qc->waiting is cleared & completed (in that order). | 717 | qc->waiting is cleared & completed (in that order). |
718 | </para> | 718 | </para> |
719 | </listitem> | 719 | </listitem> |
720 | 720 | ||
721 | <listitem> | 721 | <listitem> |
722 | <para> | 722 | <para> |
723 | qc is deallocated by clearing appropriate bit in ap->qactive. | 723 | qc is deallocated by clearing appropriate bit in ap->qactive. |
724 | </para> | 724 | </para> |
725 | </listitem> | 725 | </listitem> |
726 | 726 | ||
727 | </orderedlist> | 727 | </orderedlist> |
728 | </para> | 728 | </para> |
729 | </listitem> | 729 | </listitem> |
730 | 730 | ||
731 | </orderedlist> | 731 | </orderedlist> |
732 | 732 | ||
733 | <para> | 733 | <para> |
734 | So, it basically notifies upper layer and deallocates qc. One | 734 | So, it basically notifies upper layer and deallocates qc. One |
735 | exception is short-circuit path in #3 which is used by | 735 | exception is short-circuit path in #3 which is used by |
736 | atapi_qc_complete(). | 736 | atapi_qc_complete(). |
737 | </para> | 737 | </para> |
738 | <para> | 738 | <para> |
739 | For all non-ATAPI commands, whether it fails or not, almost | 739 | For all non-ATAPI commands, whether it fails or not, almost |
740 | the same code path is taken and very little error handling | 740 | the same code path is taken and very little error handling |
741 | takes place. A qc is completed with success status if it | 741 | takes place. A qc is completed with success status if it |
742 | succeeded, with failed status otherwise. | 742 | succeeded, with failed status otherwise. |
743 | </para> | 743 | </para> |
744 | <para> | 744 | <para> |
745 | However, failed ATAPI commands require more handling as | 745 | However, failed ATAPI commands require more handling as |
746 | REQUEST SENSE is needed to acquire sense data. If an ATAPI | 746 | REQUEST SENSE is needed to acquire sense data. If an ATAPI |
747 | command fails, ata_qc_complete() is invoked with error status, | 747 | command fails, ata_qc_complete() is invoked with error status, |
748 | which in turn invokes atapi_qc_complete() via | 748 | which in turn invokes atapi_qc_complete() via |
749 | qc->complete_fn() callback. | 749 | qc->complete_fn() callback. |
750 | </para> | 750 | </para> |
751 | <para> | 751 | <para> |
752 | This makes atapi_qc_complete() set scmd->result to | 752 | This makes atapi_qc_complete() set scmd->result to |
753 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As | 753 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As |
754 | the sense data is empty but scmd->result is CHECK CONDITION, | 754 | the sense data is empty but scmd->result is CHECK CONDITION, |
755 | SCSI midlayer will invoke EH for the scmd, and returning 1 | 755 | SCSI midlayer will invoke EH for the scmd, and returning 1 |
756 | makes ata_qc_complete() return without deallocating the qc. | 756 | makes ata_qc_complete() return without deallocating the qc. |
757 | This leads us to ata_scsi_error() with partially completed qc. | 757 | This leads us to ata_scsi_error() with partially completed qc. |
758 | </para> | 758 | </para> |
759 | 759 | ||
760 | </sect1> | 760 | </sect1> |
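The completion steps and the short-circuit path described in this section can be modelled with a toy sketch. All `sk_` names, the struct layouts and the poison value are assumptions made for illustration; they are not the real libata structures, and the qc->waiting step is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define SK_QCFLAG_ACTIVE (1u << 0)    /* hypothetical stand-in for ATA_QCFLAG_ACTIVE */
#define SK_TAG_POISON    0xfafbfcfdu  /* hypothetical poison value */

struct sk_port {
    unsigned long qactive;      /* one bit per allocated qc */
    unsigned int active_tag;
};

struct sk_qc {
    struct sk_port *ap;
    unsigned int tag;
    unsigned int flags;
    int dma_mapped;
    int (*complete_fn)(struct sk_qc *qc);   /* non-zero return short-circuits */
};

/* Models the inner step: clear flags, poison tags, release the qc. */
static void sk_qc_free(struct sk_qc *qc)
{
    unsigned int tag = qc->tag;

    assert(tag < 32);
    qc->flags = 0;
    qc->ap->active_tag = SK_TAG_POISON;
    qc->tag = SK_TAG_POISON;
    qc->ap->qactive &= ~(1ul << tag);
}

/* Models the outer step: unmap, clear ACTIVE, notify, then maybe free. */
static void sk_qc_complete(struct sk_qc *qc)
{
    assert(qc != NULL && qc->complete_fn != NULL);
    qc->dma_mapped = 0;                 /* 1. DMA memory is unmapped */
    qc->flags &= ~SK_QCFLAG_ACTIVE;     /* 2. ACTIVE flag is cleared */
    if (qc->complete_fn(qc) != 0)       /* 3. callback may short-circuit */
        return;
    sk_qc_free(qc);                     /* 4. deallocate */
}

/* Callback for an ordinary completion. */
static int sk_done_ok(struct sk_qc *qc) { (void)qc; return 0; }

/* Callback for a failed ATAPI command: keep the qc alive for EH. */
static int sk_done_needs_eh(struct sk_qc *qc) { (void)qc; return 1; }
```

With `sk_done_needs_eh` as the callback, the qc stays allocated after `sk_qc_complete()` returns, which corresponds to the "partially completed qc" that error handling later releases by calling the inner step directly.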
761 | 761 | ||
762 | <sect1><title>ata_scsi_error()</title> | 762 | <sect1><title>ata_scsi_error()</title> |
763 | <para> | 763 | <para> |
764 | ata_scsi_error() is the current transportt->eh_strategy_handler() | 764 | ata_scsi_error() is the current transportt->eh_strategy_handler() |
765 | for libata. As discussed above, this will be entered in two | 765 | for libata. As discussed above, this will be entered in two |
766 | cases - timeout and ATAPI error completion. This function | 766 | cases - timeout and ATAPI error completion. This function |
767 | calls low level libata driver's eng_timeout() callback, the | 767 | calls low level libata driver's eng_timeout() callback, the |
768 | standard callback for which is ata_eng_timeout(). It checks | 768 | standard callback for which is ata_eng_timeout(). It checks |
769 | if a qc is active and calls ata_qc_timeout() on the qc if so. | 769 | if a qc is active and calls ata_qc_timeout() on the qc if so. |
770 | Actual error handling occurs in ata_qc_timeout(). | 770 | Actual error handling occurs in ata_qc_timeout(). |
771 | </para> | 771 | </para> |
772 | <para> | 772 | <para> |
773 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and | 773 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and |
774 | completes the qc. Note that as we're currently in EH, we | 774 | completes the qc. Note that as we're currently in EH, we |
775 | cannot call scsi_done. As described in SCSI EH doc, a | 775 | cannot call scsi_done. As described in SCSI EH doc, a |
776 | recovered scmd should be either retried with | 776 | recovered scmd should be either retried with |
777 | scsi_queue_insert() or finished with scsi_finish_command(). | 777 | scsi_queue_insert() or finished with scsi_finish_command(). |
778 | Here, we override qc->scsidone with scsi_finish_command() and | 778 | Here, we override qc->scsidone with scsi_finish_command() and |
779 | call ata_qc_complete(). | 779 | call ata_qc_complete(). |
780 | </para> | 780 | </para> |
781 | <para> | 781 | <para> |
782 | If EH is invoked due to a failed ATAPI qc, the qc here is | 782 | If EH is invoked due to a failed ATAPI qc, the qc here is |
783 | completed but not deallocated. The purpose of this | 783 | completed but not deallocated. The purpose of this |
784 | half-completion is to use the qc as place holder to make EH | 784 | half-completion is to use the qc as place holder to make EH |
785 | code reach this place. This is a bit hackish, but it works. | 785 | code reach this place. This is a bit hackish, but it works. |
786 | </para> | 786 | </para> |
787 | <para> | 787 | <para> |
788 | Once control reaches here, the qc is deallocated by invoking | 788 | Once control reaches here, the qc is deallocated by invoking |
789 | __ata_qc_complete() explicitly. Then, internal qc for REQUEST | 789 | __ata_qc_complete() explicitly. Then, internal qc for REQUEST |
790 | SENSE is issued. Once sense data is acquired, scmd is | 790 | SENSE is issued. Once sense data is acquired, scmd is |
791 | finished by directly invoking scsi_finish_command() on the | 791 | finished by directly invoking scsi_finish_command() on the |
792 | scmd. Note that as we already have completed and deallocated | 792 | scmd. Note that as we already have completed and deallocated |
793 | the qc which was associated with the scmd, we don't need | 793 | the qc which was associated with the scmd, we don't need |
794 | to/cannot call ata_qc_complete() again. | 794 | to/cannot call ata_qc_complete() again. |
795 | </para> | 795 | </para> |
796 | 796 | ||
797 | </sect1> | 797 | </sect1> |
798 | 798 | ||
799 | <sect1><title>Problems with the current EH</title> | 799 | <sect1><title>Problems with the current EH</title> |
800 | 800 | ||
801 | <itemizedlist> | 801 | <itemizedlist> |
802 | 802 | ||
803 | <listitem> | 803 | <listitem> |
804 | <para> | 804 | <para> |
805 | Error representation is too crude. Currently any and all | 805 | Error representation is too crude. Currently any and all |
806 | error conditions are represented with ATA STATUS and ERROR | 806 | error conditions are represented with ATA STATUS and ERROR |
807 | registers. Errors which aren't ATA device errors are treated | 807 | registers. Errors which aren't ATA device errors are treated |
808 | as ATA device errors by setting ATA_ERR bit. Better error | 808 | as ATA device errors by setting ATA_ERR bit. Better error |
809 | descriptor which can properly represent ATA and other | 809 | descriptor which can properly represent ATA and other |
810 | errors/exceptions is needed. | 810 | errors/exceptions is needed. |
811 | </para> | 811 | </para> |
812 | </listitem> | 812 | </listitem> |
813 | 813 | ||
814 | <listitem> | 814 | <listitem> |
815 | <para> | 815 | <para> |
816 | When handling timeouts, no action is taken to make device | 816 | When handling timeouts, no action is taken to make device |
817 | forget about the timed out command and ready for new commands. | 817 | forget about the timed out command and ready for new commands. |
818 | </para> | 818 | </para> |
819 | </listitem> | 819 | </listitem> |
820 | 820 | ||
821 | <listitem> | 821 | <listitem> |
822 | <para> | 822 | <para> |
823 | EH handling via ata_scsi_error() is not properly protected | 823 | EH handling via ata_scsi_error() is not properly protected |
824 | from usual command processing. On EH entrance, the device is | 824 | from usual command processing. On EH entrance, the device is |
825 | not in quiescent state. Timed out commands may succeed or | 825 | not in quiescent state. Timed out commands may succeed or |
826 | fail any time. pio_task and packet_task may still be running. | 826 | fail any time. pio_task and packet_task may still be running. |
827 | </para> | 827 | </para> |
828 | </listitem> | 828 | </listitem> |
829 | 829 | ||
830 | <listitem> | 830 | <listitem> |
831 | <para> | 831 | <para> |
832 | Too weak error recovery. Devices / controllers causing HSM | 832 | Too weak error recovery. Devices / controllers causing HSM |
833 | mismatch errors and other errors quite often require reset to | 833 | mismatch errors and other errors quite often require reset to |
834 | return to known state. Also, advanced error handling is | 834 | return to known state. Also, advanced error handling is |
835 | necessary to support features like NCQ and hotplug. | 835 | necessary to support features like NCQ and hotplug. |
836 | </para> | 836 | </para> |
837 | </listitem> | 837 | </listitem> |
838 | 838 | ||
839 | <listitem> | 839 | <listitem> |
840 | <para> | 840 | <para> |
841 | ATA errors are directly handled in the interrupt handler and | 841 | ATA errors are directly handled in the interrupt handler and |
842 | PIO errors in pio_task. This is problematic for advanced | 842 | PIO errors in pio_task. This is problematic for advanced |
843 | error handling for the following reasons. | 843 | error handling for the following reasons. |
844 | </para> | 844 | </para> |
845 | <para> | 845 | <para> |
846 | First, advanced error handling often requires context and | 846 | First, advanced error handling often requires context and |
847 | internal qc execution. | 847 | internal qc execution. |
848 | </para> | 848 | </para> |
849 | <para> | 849 | <para> |
850 | Second, even a simple failure (say, CRC error) needs | 850 | Second, even a simple failure (say, CRC error) needs |
851 | information gathering and could trigger complex error handling | 851 | information gathering and could trigger complex error handling |
852 | (say, resetting & reconfiguring). Having multiple code | 852 | (say, resetting & reconfiguring). Having multiple code |
853 | paths to gather information, enter EH and trigger actions | 853 | paths to gather information, enter EH and trigger actions |
854 | makes life painful. | 854 | makes life painful. |
855 | </para> | 855 | </para> |
856 | <para> | 856 | <para> |
857 | Third, scattered EH code makes implementing low level drivers | 857 | Third, scattered EH code makes implementing low level drivers |
858 | difficult. Low level drivers override libata callbacks. If | 858 | difficult. Low level drivers override libata callbacks. If |
859 | EH is scattered over several places, each affected callback | 859 | EH is scattered over several places, each affected callback |
860 | should perform its part of error handling. This can be error | 860 | should perform its part of error handling. This can be error |
861 | prone and painful. | 861 | prone and painful. |
862 | </para> | 862 | </para> |
863 | </listitem> | 863 | </listitem> |
864 | 864 | ||
865 | </itemizedlist> | 865 | </itemizedlist> |
866 | </sect1> | 866 | </sect1> |
867 | </chapter> | 867 | </chapter> |
868 | 868 | ||
869 | <chapter id="libataExt"> | 869 | <chapter id="libataExt"> |
870 | <title>libata Library</title> | 870 | <title>libata Library</title> |
871 | !Edrivers/ata/libata-core.c | 871 | !Edrivers/ata/libata-core.c |
872 | </chapter> | 872 | </chapter> |
873 | 873 | ||
874 | <chapter id="libataInt"> | 874 | <chapter id="libataInt"> |
875 | <title>libata Core Internals</title> | 875 | <title>libata Core Internals</title> |
876 | !Idrivers/ata/libata-core.c | 876 | !Idrivers/ata/libata-core.c |
877 | </chapter> | 877 | </chapter> |
878 | 878 | ||
879 | <chapter id="libataScsiInt"> | 879 | <chapter id="libataScsiInt"> |
880 | <title>libata SCSI translation/emulation</title> | 880 | <title>libata SCSI translation/emulation</title> |
881 | !Edrivers/ata/libata-scsi.c | 881 | !Edrivers/ata/libata-scsi.c |
882 | !Idrivers/ata/libata-scsi.c | 882 | !Idrivers/ata/libata-scsi.c |
883 | </chapter> | 883 | </chapter> |
884 | 884 | ||
885 | <chapter id="ataExceptions"> | 885 | <chapter id="ataExceptions"> |
886 | <title>ATA errors & exceptions</title> | 886 | <title>ATA errors & exceptions</title> |
887 | 887 | ||
888 | <para> | 888 | <para> |
889 | This chapter tries to identify what error/exception conditions exist | 889 | This chapter tries to identify what error/exception conditions exist |
890 | for ATA/ATAPI devices and describe how they should be handled in | 890 | for ATA/ATAPI devices and describe how they should be handled in |
891 | an implementation-neutral way. | 891 | an implementation-neutral way. |
892 | </para> | 892 | </para> |
893 | 893 | ||
894 | <para> | 894 | <para> |
895 | The term 'error' is used to describe conditions where either an | 895 | The term 'error' is used to describe conditions where either an |
896 | explicit error condition is reported from device or a command has | 896 | explicit error condition is reported from device or a command has |
897 | timed out. | 897 | timed out. |
898 | </para> | 898 | </para> |
899 | 899 | ||
900 | <para> | 900 | <para> |
901 | The term 'exception' is either used to describe exceptional | 901 | The term 'exception' is either used to describe exceptional |
902 | conditions which are not errors (say, power or hotplug events), or | 902 | conditions which are not errors (say, power or hotplug events), or |
903 | to describe both errors and non-error exceptional conditions. Where | 903 | to describe both errors and non-error exceptional conditions. Where |
904 | explicit distinction between error and exception is necessary, the | 904 | explicit distinction between error and exception is necessary, the |
905 | term 'non-error exception' is used. | 905 | term 'non-error exception' is used. |
906 | </para> | 906 | </para> |
907 | 907 | ||
908 | <sect1 id="excat"> | 908 | <sect1 id="excat"> |
909 | <title>Exception categories</title> | 909 | <title>Exception categories</title> |
910 | <para> | 910 | <para> |
911 | Exceptions are described primarily with respect to legacy | 911 | Exceptions are described primarily with respect to legacy |
912 | taskfile + bus master IDE interface. If a controller provides | 912 | taskfile + bus master IDE interface. If a controller provides |
913 | other better mechanism for error reporting, mapping those into | 913 | other better mechanism for error reporting, mapping those into |
914 | categories described below shouldn't be difficult. | 914 | categories described below shouldn't be difficult. |
915 | </para> | 915 | </para> |
916 | 916 | ||
917 | <para> | 917 | <para> |
918 | In the following sections, two recovery actions - reset and | 918 | In the following sections, two recovery actions - reset and |
919 | reconfiguring transport - are mentioned. These are described | 919 | reconfiguring transport - are mentioned. These are described |
920 | further in <xref linkend="exrec"/>. | 920 | further in <xref linkend="exrec"/>. |
921 | </para> | 921 | </para> |
922 | 922 | ||
923 | <sect2 id="excatHSMviolation"> | 923 | <sect2 id="excatHSMviolation"> |
924 | <title>HSM violation</title> | 924 | <title>HSM violation</title> |
925 | <para> | 925 | <para> |
926 | This error is indicated when STATUS value doesn't match HSM | 926 | This error is indicated when STATUS value doesn't match HSM |
927 | requirement while issuing or executing any ATA/ATAPI command. | 927 | requirement while issuing or executing any ATA/ATAPI command. |
928 | </para> | 928 | </para> |
929 | 929 | ||
930 | <itemizedlist> | 930 | <itemizedlist> |
931 | <title>Examples</title> | 931 | <title>Examples</title> |
932 | 932 | ||
933 | <listitem> | 933 | <listitem> |
934 | <para> | 934 | <para> |
935 | ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying | 935 | ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying |
936 | to issue a command. | 936 | to issue a command. |
937 | </para> | 937 | </para> |
938 | </listitem> | 938 | </listitem> |
939 | 939 | ||
940 | <listitem> | 940 | <listitem> |
941 | <para> | 941 | <para> |
942 | !BSY && !DRQ during PIO data transfer. | 942 | !BSY && !DRQ during PIO data transfer. |
943 | </para> | 943 | </para> |
944 | </listitem> | 944 | </listitem> |
945 | 945 | ||
946 | <listitem> | 946 | <listitem> |
947 | <para> | 947 | <para> |
948 | DRQ on command completion. | 948 | DRQ on command completion. |
949 | </para> | 949 | </para> |
950 | </listitem> | 950 | </listitem> |
951 | 951 | ||
952 | <listitem> | 952 | <listitem> |
953 | <para> | 953 | <para> |
954 | !BSY && ERR after CDB transfer starts but before the | 954 | !BSY && ERR after CDB transfer starts but before the |
955 | last byte of CDB is transferred. ATA/ATAPI standard states | 955 | last byte of CDB is transferred. ATA/ATAPI standard states |
956 | that "The device shall not terminate the PACKET command | 956 | that "The device shall not terminate the PACKET command |
957 | with an error before the last byte of the command packet has | 957 | with an error before the last byte of the command packet has |
958 | been written" in the error outputs description of PACKET | 958 | been written" in the error outputs description of PACKET |
959 | command and the state diagram doesn't include such | 959 | command and the state diagram doesn't include such |
960 | transitions. | 960 | transitions. |
961 | </para> | 961 | </para> |
962 | </listitem> | 962 | </listitem> |
963 | 963 | ||
964 | </itemizedlist> | 964 | </itemizedlist> |
965 | 965 | ||
966 | <para> | 966 | <para> |
967 | In these cases, HSM is violated and not much information | 967 | In these cases, HSM is violated and not much information |
968 | regarding the error can be acquired from STATUS or ERROR | 968 | regarding the error can be acquired from STATUS or ERROR |
969 | register. IOW, this error can be anything - driver bug, | 969 | register. IOW, this error can be anything - driver bug, |
970 | faulty device, controller and/or cable. | 970 | faulty device, controller and/or cable. |
971 | </para> | 971 | </para> |
972 | 972 | ||
973 | <para> | 973 | <para> |
974 | As HSM is violated, reset is necessary to restore known state. | 974 | As HSM is violated, reset is necessary to restore known state. |
975 | Reconfiguring transport for lower speed might be helpful too | 975 | Reconfiguring transport for lower speed might be helpful too |
976 | as transmission errors sometimes cause this kind of error. | 976 | as transmission errors sometimes cause this kind of error. |
977 | </para> | 977 | </para> |
978 | </sect2> | 978 | </sect2> |
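The first three examples above reduce to simple tests on the STATUS register. A minimal sketch follows, assuming the standard ATA STATUS bit positions (BSY 0x80, DRDY 0x40, DRQ 0x08); the `sk_` names and the three-phase split are hypothetical simplifications, and the CDB-transfer example is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* STATUS register bits as defined by the ATA standard. */
#define SK_BSY  0x80u
#define SK_DRDY 0x40u
#define SK_DRQ  0x08u

/* Hypothetical phases at which STATUS is sampled. */
enum sk_phase { SK_PHASE_ISSUE, SK_PHASE_PIO_DATA, SK_PHASE_COMPLETE };

/* Return true when STATUS contradicts what the HSM expects in this phase. */
static bool sk_hsm_violation(enum sk_phase phase, uint8_t status)
{
    switch (phase) {
    case SK_PHASE_ISSUE:
        /* Issuing requires !BSY && DRDY && !DRQ. */
        return (status & (SK_BSY | SK_DRDY | SK_DRQ)) != SK_DRDY;
    case SK_PHASE_PIO_DATA:
        /* !BSY && !DRQ in the middle of a PIO transfer is a violation. */
        return (status & (SK_BSY | SK_DRQ)) == 0;
    case SK_PHASE_COMPLETE:
        /* DRQ must not be set once the command has completed. */
        return (status & SK_DRQ) != 0;
    }
    assert(!"unknown phase");
    return true;
}
```

For instance, a STATUS of 0x50 (DRDY plus a bit the check ignores) passes the issue test but counts as a violation during PIO data transfer, because DRQ is clear while the device is not busy.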
979 | 979 | ||
980 | <sect2 id="excatDevErr"> | 980 | <sect2 id="excatDevErr"> |
981 | <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title> | 981 | <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title> |
982 | 982 | ||
983 | <para> | 983 | <para> |
984 | These are errors detected and reported by ATA/ATAPI devices | 984 | These are errors detected and reported by ATA/ATAPI devices |
985 | indicating device problems. For this type of error, STATUS | 985 | indicating device problems. For this type of error, STATUS |
986 | and ERROR register values are valid and describe error | 986 | and ERROR register values are valid and describe error |
987 | condition. Note that some of ATA bus errors are detected by | 987 | condition. Note that some of ATA bus errors are detected by |
988 | ATA/ATAPI devices and reported using the same mechanism as | 988 | ATA/ATAPI devices and reported using the same mechanism as |
989 | device errors. Those cases are described later in this | 989 | device errors. Those cases are described later in this |
990 | section. | 990 | section. |
991 | </para> | 991 | </para> |
992 | 992 | ||
993 | <para> | 993 | <para> |
994 | For ATA commands, this type of error is indicated by !BSY | 994 | For ATA commands, this type of error is indicated by !BSY |
995 | && ERR during command execution and on completion. | 995 | && ERR during command execution and on completion. |
996 | </para> | 996 | </para> |
997 | 997 | ||
998 | <para>For ATAPI commands,</para> | 998 | <para>For ATAPI commands,</para> |
999 | 999 | ||
1000 | <itemizedlist> | 1000 | <itemizedlist> |
1001 | 1001 | ||
1002 | <listitem> | 1002 | <listitem> |
1003 | <para> | 1003 | <para> |
1004 | !BSY && ERR && ABRT right after issuing PACKET | 1004 | !BSY && ERR && ABRT right after issuing PACKET |
1005 | indicates that PACKET command is not supported and falls in | 1005 | indicates that PACKET command is not supported and falls in |
1006 | this category. | 1006 | this category. |
1007 | </para> | 1007 | </para> |
1008 | </listitem> | 1008 | </listitem> |
1009 | 1009 | ||
1010 | <listitem> | 1010 | <listitem> |
1011 | <para> | 1011 | <para> |
1012 | !BSY && ERR(==CHK) && !ABRT after the last | 1012 | !BSY && ERR(==CHK) && !ABRT after the last |
1013 | byte of CDB is transferred indicates CHECK CONDITION and | 1013 | byte of CDB is transferred indicates CHECK CONDITION and |
1014 | doesn't fall in this category. | 1014 | doesn't fall in this category. |
1015 | </para> | 1015 | </para> |
1016 | </listitem> | 1016 | </listitem> |
1017 | 1017 | ||
1018 | <listitem> | 1018 | <listitem> |
1019 | <para> | 1019 | <para> |
1020 | !BSY && ERR(==CHK) && ABRT after the last byte | 1020 | !BSY && ERR(==CHK) && ABRT after the last byte |
1021 | of CDB is transferred *probably* indicates CHECK CONDITION and | 1021 | of CDB is transferred *probably* indicates CHECK CONDITION and |
1022 | doesn't fall in this category. | 1022 | doesn't fall in this category. |
1023 | </para> | 1023 | </para> |
1024 | </listitem> | 1024 | </listitem> |
1025 | 1025 | ||
1026 | </itemizedlist> | 1026 | </itemizedlist> |
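The three ATAPI cases listed above can be written as a small classifier. This is a sketch under stated assumptions: the bit values follow the ATA standard (STATUS BSY 0x80 and ERR/CHK 0x01, ERROR ABRT 0x04), while the `sk_` names, the `cdb_sent` flag and the "unspecified" result for the case the text does not cover are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_ST_BSY  0x80u   /* STATUS: device busy */
#define SK_ST_ERR  0x01u   /* STATUS: ERR (CHK for ATAPI) */
#define SK_ER_ABRT 0x04u   /* ERROR: command aborted */

enum sk_atapi_result {
    SK_ATAPI_NO_ERROR,                  /* still busy, or ERR not set */
    SK_ATAPI_DEVICE_ERROR,              /* PACKET not supported */
    SK_ATAPI_CHECK_CONDITION,
    SK_ATAPI_PROBABLY_CHECK_CONDITION,
    SK_ATAPI_UNSPECIFIED                /* case not described in the text */
};

/*
 * cdb_sent is false right after PACKET is issued and true once the last
 * byte of the CDB has been transferred. An error in between would be an
 * HSM violation and is not modelled here.
 */
static enum sk_atapi_result
sk_classify_atapi(bool cdb_sent, uint8_t status, uint8_t error)
{
    if ((status & SK_ST_BSY) || !(status & SK_ST_ERR))
        return SK_ATAPI_NO_ERROR;
    if (!cdb_sent)
        return (error & SK_ER_ABRT) ? SK_ATAPI_DEVICE_ERROR
                                    : SK_ATAPI_UNSPECIFIED;
    return (error & SK_ER_ABRT) ? SK_ATAPI_PROBABLY_CHECK_CONDITION
                                : SK_ATAPI_CHECK_CONDITION;
}
```

Only the first result falls into the device error category of this section; the two CHECK CONDITION results are handled separately, as the list notes.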
1027 | 1027 | ||
1028 | <para> | 1028 | <para> |
1029 | Of errors detected as above, the following are not ATA/ATAPI | 1029 | Of errors detected as above, the following are not ATA/ATAPI |
1030 | device errors but ATA bus errors and should be handled | 1030 | device errors but ATA bus errors and should be handled |
1031 | according to <xref linkend="excatATAbusErr"/>. | 1031 | according to <xref linkend="excatATAbusErr"/>. |
1032 | </para> | 1032 | </para> |
1033 | 1033 | ||
1034 | <variablelist> | 1034 | <variablelist> |
1035 | 1035 | ||
1036 | <varlistentry> | 1036 | <varlistentry> |
1037 | <term>CRC error during data transfer</term> | 1037 | <term>CRC error during data transfer</term> |
1038 | <listitem> | 1038 | <listitem> |
1039 | <para> | 1039 | <para> |
1040 | This is indicated by ICRC bit in the ERROR register and | 1040 | This is indicated by ICRC bit in the ERROR register and |
1041 | means that corruption occurred during data transfer. Up to | 1041 | means that corruption occurred during data transfer. Up to |
1042 | ATA/ATAPI-7, the standard specifies that this bit is only | 1042 | ATA/ATAPI-7, the standard specifies that this bit is only |
1043 | applicable to UDMA transfers but ATA/ATAPI-8 draft revision | 1043 | applicable to UDMA transfers but ATA/ATAPI-8 draft revision |
1044 | 1f says that the bit may be applicable to multiword DMA and | 1044 | 1f says that the bit may be applicable to multiword DMA and |
1045 | PIO. | 1045 | PIO. |
1046 | </para> | 1046 | </para> |
1047 | </listitem> | 1047 | </listitem> |
1048 | </varlistentry> | 1048 | </varlistentry> |
1049 | 1049 | ||
1050 | <varlistentry> | 1050 | <varlistentry> |
1051 | <term>ABRT error during data transfer or on completion</term> | 1051 | <term>ABRT error during data transfer or on completion</term> |
1052 | <listitem> | 1052 | <listitem> |
1053 | <para> | 1053 | <para> |
1054 | Up to ATA/ATAPI-7, the standard specifies that ABRT could be | 1054 | Up to ATA/ATAPI-7, the standard specifies that ABRT could be |
1055 | set on ICRC errors and on cases where a device is not able | 1055 | set on ICRC errors and on cases where a device is not able |
1056 | to complete a command. Combined with the fact that MWDMA | 1056 | to complete a command. Combined with the fact that MWDMA |
1057 | and PIO transfer errors aren't allowed to use ICRC bit up to | 1057 | and PIO transfer errors aren't allowed to use ICRC bit up to |
1058 | ATA/ATAPI-7, it seems to imply that ABRT bit alone could | 1058 | ATA/ATAPI-7, it seems to imply that ABRT bit alone could |
1059 | indicate transfer errors. | 1059 | indicate transfer errors. |
1060 | </para> | 1060 | </para> |
1061 | <para> | 1061 | <para> |
1062 | However, ATA/ATAPI-8 draft revision 1f removes the part | 1062 | However, ATA/ATAPI-8 draft revision 1f removes the part |
1063 | that ICRC errors can turn on ABRT. So, this is kind of a | 1063 | that ICRC errors can turn on ABRT. So, this is kind of a |
1064 | gray area. Some heuristics are needed here. | 1064 | gray area. Some heuristics are needed here. |
1065 | </para> | 1065 | </para> |
1066 | </listitem> | 1066 | </listitem> |
1067 | </varlistentry> | 1067 | </varlistentry> |
1068 | 1068 | ||
1069 | </variablelist> | 1069 | </variablelist> |
1070 | 1070 | ||
1071 | <para> | 1071 | <para> |
1072 | ATA/ATAPI device errors can be further categorized as follows. | 1072 | ATA/ATAPI device errors can be further categorized as follows. |
1073 | </para> | 1073 | </para> |
1074 | 1074 | ||
1075 | <variablelist> | 1075 | <variablelist> |
1076 | 1076 | ||
1077 | <varlistentry> | 1077 | <varlistentry> |
1078 | <term>Media errors</term> | 1078 | <term>Media errors</term> |
1079 | <listitem> | 1079 | <listitem> |
1080 | <para> | 1080 | <para> |
1081 | This is indicated by UNC bit in the ERROR register. ATA | 1081 | This is indicated by UNC bit in the ERROR register. ATA |
1082 | devices report UNC error only after certain number of | 1082 | devices report UNC error only after certain number of |
1083 | retries cannot recover the data, so there's nothing much | 1083 | retries cannot recover the data, so there's nothing much |
1084 | else to do other than notifying upper layer. | 1084 | else to do other than notifying upper layer. |
1085 | </para> | 1085 | </para> |
1086 | <para> | 1086 | <para> |
1087 | READ and WRITE commands report CHS or LBA of the first | 1087 | READ and WRITE commands report CHS or LBA of the first |
1088 | failed sector but ATA/ATAPI standard specifies that the | 1088 | failed sector but ATA/ATAPI standard specifies that the |
1089 | amount of transferred data on error completion is | 1089 | amount of transferred data on error completion is |
1090 | indeterminate, so we cannot assume that sectors preceding | 1090 | indeterminate, so we cannot assume that sectors preceding |
1091 | the failed sector have been transferred and thus cannot | 1091 | the failed sector have been transferred and thus cannot |
1092 | complete those sectors successfully as SCSI does. | 1092 | complete those sectors successfully as SCSI does. |
1093 | </para> | 1093 | </para> |
1094 | </listitem> | 1094 | </listitem> |
1095 | </varlistentry> | 1095 | </varlistentry> |
1096 | 1096 | ||
1097 | <varlistentry> | 1097 | <varlistentry> |
1098 | <term>Media changed / media change requested error</term> | 1098 | <term>Media changed / media change requested error</term> |
1099 | <listitem> | 1099 | <listitem> |
1100 | <para> | 1100 | <para> |
1101 | &lt;&lt;TODO: fill here&gt;&gt; | 1101 | &lt;&lt;TODO: fill here&gt;&gt; |
1102 | </para> | 1102 | </para> |
1103 | </listitem> | 1103 | </listitem> |
1104 | </varlistentry> | 1104 | </varlistentry> |
1105 | 1105 | ||
1106 | <varlistentry><term>Address error</term> | 1106 | <varlistentry><term>Address error</term> |
1107 | <listitem> | 1107 | <listitem> |
1108 | <para> | 1108 | <para> |
1109 | This is indicated by IDNF bit in the ERROR register. | 1109 | This is indicated by IDNF bit in the ERROR register. |
1110 | Report to upper layer. | 1110 | Report to upper layer. |
1111 | </para> | 1111 | </para> |
1112 | </listitem> | 1112 | </listitem> |
1113 | </varlistentry> | 1113 | </varlistentry> |
1114 | 1114 | ||
1115 | <varlistentry><term>Other errors</term> | 1115 | <varlistentry><term>Other errors</term> |
1116 | <listitem> | 1116 | <listitem> |
1117 | <para> | 1117 | <para> |
1118 | This can be an invalid command or parameter, indicated by the ABRT | 1118 | This can be an invalid command or parameter, indicated by the ABRT |
1119 | bit in the ERROR register, or some other error condition. Note that | 1119 | bit in the ERROR register, or some other error condition. Note that |
1120 | the ABRT bit can indicate a lot of things, including ICRC and Address | 1120 | the ABRT bit can indicate a lot of things, including ICRC and Address |
1121 | errors. Heuristics are needed. | 1121 | errors. Heuristics are needed. |
1122 | </para> | 1122 | </para> |
1123 | </listitem> | 1123 | </listitem> |
1124 | </varlistentry> | 1124 | </varlistentry> |
1125 | 1125 | ||
1126 | </variablelist> | 1126 | </variablelist> |
1127 | 1127 | ||
1128 | <para> | 1128 | <para> |
1129 | Depending on commands, not all STATUS/ERROR bits are | 1129 | Depending on commands, not all STATUS/ERROR bits are |
1130 | applicable. These non-applicable bits are marked with | 1130 | applicable. These non-applicable bits are marked with |
1131 | "na" in the output descriptions but up to ATA/ATAPI-7 | 1131 | "na" in the output descriptions but up to ATA/ATAPI-7 |
1132 | no definition of "na" can be found. However, | 1132 | no definition of "na" can be found. However, |
1133 | ATA/ATAPI-8 draft revision 1f describes "N/A" as | 1133 | ATA/ATAPI-8 draft revision 1f describes "N/A" as |
1134 | follows. | 1134 | follows. |
1135 | </para> | 1135 | </para> |
1136 | 1136 | ||
1137 | <blockquote> | 1137 | <blockquote> |
1138 | <variablelist> | 1138 | <variablelist> |
1139 | <varlistentry><term>3.2.3.3a N/A</term> | 1139 | <varlistentry><term>3.2.3.3a N/A</term> |
1140 | <listitem> | 1140 | <listitem> |
1141 | <para> | 1141 | <para> |
1142 | A keyword that indicates a field has no defined value in | 1142 | A keyword that indicates a field has no defined value in |
1143 | this standard and should not be checked by the host or | 1143 | this standard and should not be checked by the host or |
1144 | device. N/A fields should be cleared to zero. | 1144 | device. N/A fields should be cleared to zero. |
1145 | </para> | 1145 | </para> |
1146 | </listitem> | 1146 | </listitem> |
1147 | </varlistentry> | 1147 | </varlistentry> |
1148 | </variablelist> | 1148 | </variablelist> |
1149 | </blockquote> | 1149 | </blockquote> |
1150 | 1150 | ||
1151 | <para> | 1151 | <para> |
1152 | So, it seems reasonable to assume that "na" bits are | 1152 | So, it seems reasonable to assume that "na" bits are |
1153 | cleared to zero by devices and thus need no explicit masking. | 1153 | cleared to zero by devices and thus need no explicit masking. |
1154 | </para> | 1154 | </para> |
1155 | 1155 | ||
1156 | </sect2> | 1156 | </sect2> |
1157 | 1157 | ||
1158 | <sect2 id="excatATAPIcc"> | 1158 | <sect2 id="excatATAPIcc"> |
1159 | <title>ATAPI device CHECK CONDITION</title> | 1159 | <title>ATAPI device CHECK CONDITION</title> |
1160 | 1160 | ||
1161 | <para> | 1161 | <para> |
1162 | ATAPI device CHECK CONDITION error is indicated by a set CHK bit | 1162 | ATAPI device CHECK CONDITION error is indicated by a set CHK bit |
1163 | (ERR bit) in the STATUS register after the last byte of CDB is | 1163 | (ERR bit) in the STATUS register after the last byte of CDB is |
1164 | transferred for a PACKET command. For this kind of error, | 1164 | transferred for a PACKET command. For this kind of error, |
1165 | sense data should be acquired to gather information regarding | 1165 | sense data should be acquired to gather information regarding |
1166 | the error. The REQUEST SENSE packet command should be used to | 1166 | the error. The REQUEST SENSE packet command should be used to |
1167 | acquire sense data. | 1167 | acquire sense data. |
1168 | </para> | 1168 | </para> |
1169 | 1169 | ||
1170 | <para> | 1170 | <para> |
1171 | Once sense data is acquired, this type of error can be | 1171 | Once sense data is acquired, this type of error can be |
1172 | handled similarly to other SCSI errors. Note that sense data | 1172 | handled similarly to other SCSI errors. Note that sense data |
1173 | may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR | 1173 | may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR |
1174 | &amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such | 1174 | &amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such |
1175 | cases, the error should be considered as an ATA bus error and | 1175 | cases, the error should be considered as an ATA bus error and |
1176 | handled according to <xref linkend="excatATAbusErr"/>. | 1176 | handled according to <xref linkend="excatATAbusErr"/>. |
1177 | </para> | 1177 | </para> |
1178 | 1178 | ||
1179 | </sect2> | 1179 | </sect2> |
1180 | 1180 | ||
1181 | <sect2 id="excatNCQerr"> | 1181 | <sect2 id="excatNCQerr"> |
1182 | <title>ATA device error (NCQ)</title> | 1182 | <title>ATA device error (NCQ)</title> |
1183 | 1183 | ||
1184 | <para> | 1184 | <para> |
1185 | NCQ command error is indicated by cleared BSY and set ERR bit | 1185 | NCQ command error is indicated by cleared BSY and set ERR bit |
1186 | during NCQ command phase (one or more NCQ commands | 1186 | during NCQ command phase (one or more NCQ commands |
1187 | outstanding). Although STATUS and ERROR registers will | 1187 | outstanding). Although STATUS and ERROR registers will |
1188 | contain valid values describing the error, READ LOG EXT is | 1188 | contain valid values describing the error, READ LOG EXT is |
1189 | required to clear the error condition, determine which command | 1189 | required to clear the error condition, determine which command |
1190 | has failed and acquire more information. | 1190 | has failed and acquire more information. |
1191 | </para> | 1191 | </para> |
1192 | 1192 | ||
1193 | <para> | 1193 | <para> |
1194 | READ LOG EXT Log Page 10h reports which tag has failed and | 1194 | READ LOG EXT Log Page 10h reports which tag has failed and |
1195 | taskfile register values describing the error. With this | 1195 | taskfile register values describing the error. With this |
1196 | information the failed command can be handled as a normal ATA | 1196 | information the failed command can be handled as a normal ATA |
1197 | command error as in <xref linkend="excatDevErr"/> and all | 1197 | command error as in <xref linkend="excatDevErr"/> and all |
1198 | other in-flight commands must be retried. Note that this | 1198 | other in-flight commands must be retried. Note that this |
1199 | retry should not be counted - it's likely that commands | 1199 | retry should not be counted - it's likely that commands |
1200 | retried this way would have completed normally if it were not | 1200 | retried this way would have completed normally if it were not |
1201 | for the failed command. | 1201 | for the failed command. |
1202 | </para> | 1202 | </para> |
1203 | 1203 | ||
1204 | <para> | 1204 | <para> |
1205 | Note that ATA bus errors can be reported as ATA device NCQ | 1205 | Note that ATA bus errors can be reported as ATA device NCQ |
1206 | errors. This should be handled as described in <xref | 1206 | errors. This should be handled as described in <xref |
1207 | linkend="excatATAbusErr"/>. | 1207 | linkend="excatATAbusErr"/>. |
1208 | </para> | 1208 | </para> |
1209 | 1209 | ||
1210 | <para> | 1210 | <para> |
1211 | If READ LOG EXT Log Page 10h fails or reports NQ, we're | 1211 | If READ LOG EXT Log Page 10h fails or reports NQ, we're |
1212 | thoroughly screwed. This condition should be treated | 1212 | thoroughly screwed. This condition should be treated |
1213 | according to <xref linkend="excatHSMviolation"/>. | 1213 | according to <xref linkend="excatHSMviolation"/>. |
1214 | </para> | 1214 | </para> |
1215 | 1215 | ||
1216 | </sect2> | 1216 | </sect2> |
1217 | 1217 | ||
1218 | <sect2 id="excatATAbusErr"> | 1218 | <sect2 id="excatATAbusErr"> |
1219 | <title>ATA bus error</title> | 1219 | <title>ATA bus error</title> |
1220 | 1220 | ||
1221 | <para> | 1221 | <para> |
1222 | ATA bus error means that data corruption occurred during | 1222 | ATA bus error means that data corruption occurred during |
1223 | transmission over the ATA bus (SATA or PATA). This type of error | 1223 | transmission over the ATA bus (SATA or PATA). This type of error |
1224 | can be indicated by | 1224 | can be indicated by |
1225 | </para> | 1225 | </para> |
1226 | 1226 | ||
1227 | <itemizedlist> | 1227 | <itemizedlist> |
1228 | 1228 | ||
1229 | <listitem> | 1229 | <listitem> |
1230 | <para> | 1230 | <para> |
1231 | ICRC or ABRT error as described in <xref linkend="excatDevErr"/>. | 1231 | ICRC or ABRT error as described in <xref linkend="excatDevErr"/>. |
1232 | </para> | 1232 | </para> |
1233 | </listitem> | 1233 | </listitem> |
1234 | 1234 | ||
1235 | <listitem> | 1235 | <listitem> |
1236 | <para> | 1236 | <para> |
1237 | Controller-specific error completion with error information | 1237 | Controller-specific error completion with error information |
1238 | indicating transmission error. | 1238 | indicating transmission error. |
1239 | </para> | 1239 | </para> |
1240 | </listitem> | 1240 | </listitem> |
1241 | 1241 | ||
1242 | <listitem> | 1242 | <listitem> |
1243 | <para> | 1243 | <para> |
1244 | On some controllers, command timeout. In this case, there may | 1244 | On some controllers, command timeout. In this case, there may |
1245 | be a mechanism to determine that the timeout is due to | 1245 | be a mechanism to determine that the timeout is due to |
1246 | transmission error. | 1246 | transmission error. |
1247 | </para> | 1247 | </para> |
1248 | </listitem> | 1248 | </listitem> |
1249 | 1249 | ||
1250 | <listitem> | 1250 | <listitem> |
1251 | <para> | 1251 | <para> |
1252 | Unknown/random errors, timeouts and all sorts of weirdities. | 1252 | Unknown/random errors, timeouts and all sorts of weirdities. |
1253 | </para> | 1253 | </para> |
1254 | </listitem> | 1254 | </listitem> |
1255 | 1255 | ||
1256 | </itemizedlist> | 1256 | </itemizedlist> |
1257 | 1257 | ||
1258 | <para> | 1258 | <para> |
1259 | As described above, transmission errors can cause a wide variety | 1259 | As described above, transmission errors can cause a wide variety |
1260 | of symptoms ranging from device ICRC error to random device | 1260 | of symptoms ranging from device ICRC error to random device |
1261 | lockup, and, in many cases, there is no way to tell if an | 1261 | lockup, and, in many cases, there is no way to tell if an |
1262 | error condition is due to transmission error or not; | 1262 | error condition is due to transmission error or not; |
1263 | therefore, it's necessary to employ some kind of heuristic | 1263 | therefore, it's necessary to employ some kind of heuristic |
1264 | when dealing with errors and timeouts. For example, | 1264 | when dealing with errors and timeouts. For example, |
1265 | encountering repetitive ABRT errors for a known supported | 1265 | encountering repetitive ABRT errors for a known supported |
1266 | command is likely to indicate an ATA bus error. | 1266 | command is likely to indicate an ATA bus error. |
1267 | </para> | 1267 | </para> |
1268 | 1268 | ||
1269 | <para> | 1269 | <para> |
1270 | Once it's determined that ATA bus errors have possibly | 1270 | Once it's determined that ATA bus errors have possibly |
1271 | occurred, lowering ATA bus transmission speed is one of the | 1271 | occurred, lowering ATA bus transmission speed is one of the |
1272 | actions which may alleviate the problem. See <xref | 1272 | actions which may alleviate the problem. See <xref |
1273 | linkend="exrecReconf"/> for more information. | 1273 | linkend="exrecReconf"/> for more information. |
1274 | </para> | 1274 | </para> |
1275 | 1275 | ||
1276 | </sect2> | 1276 | </sect2> |
1277 | 1277 | ||
1278 | <sect2 id="excatPCIbusErr"> | 1278 | <sect2 id="excatPCIbusErr"> |
1279 | <title>PCI bus error</title> | 1279 | <title>PCI bus error</title> |
1280 | 1280 | ||
1281 | <para> | 1281 | <para> |
1282 | Data corruption or other failures during transmission over PCI | 1282 | Data corruption or other failures during transmission over PCI |
1283 | (or other system bus). For standard BMDMA, this is indicated | 1283 | (or other system bus). For standard BMDMA, this is indicated |
1284 | by the Error bit in the BMDMA Status register. This type of | 1284 | by the Error bit in the BMDMA Status register. This type of |
1285 | error must be logged as it indicates something is very wrong | 1285 | error must be logged as it indicates something is very wrong |
1286 | with the system. Resetting the host controller is recommended. | 1286 | with the system. Resetting the host controller is recommended. |
1287 | </para> | 1287 | </para> |
1288 | 1288 | ||
1289 | </sect2> | 1289 | </sect2> |
1290 | 1290 | ||
1291 | <sect2 id="excatLateCompletion"> | 1291 | <sect2 id="excatLateCompletion"> |
1292 | <title>Late completion</title> | 1292 | <title>Late completion</title> |
1293 | 1293 | ||
1294 | <para> | 1294 | <para> |
1295 | This occurs when a timeout occurs and the timeout handler finds | 1295 | This occurs when a timeout occurs and the timeout handler finds |
1296 | out that the timed out command has completed successfully or | 1296 | out that the timed out command has completed successfully or |
1297 | with error. This is usually caused by lost interrupts. This | 1297 | with error. This is usually caused by lost interrupts. This |
1298 | type of error must be logged. Resetting the host controller is | 1298 | type of error must be logged. Resetting the host controller is |
1299 | recommended. | 1299 | recommended. |
1300 | </para> | 1300 | </para> |
1301 | 1301 | ||
1302 | </sect2> | 1302 | </sect2> |
1303 | 1303 | ||
1304 | <sect2 id="excatUnknown"> | 1304 | <sect2 id="excatUnknown"> |
1305 | <title>Unknown error (timeout)</title> | 1305 | <title>Unknown error (timeout)</title> |
1306 | 1306 | ||
1307 | <para> | 1307 | <para> |
1308 | This is when a timeout occurs and the command is still | 1308 | This is when a timeout occurs and the command is still |
1309 | processing or the host and device are in an unknown state. When | 1309 | processing or the host and device are in an unknown state. When |
1310 | this occurs, HSM could be in any valid or invalid state. To | 1310 | this occurs, HSM could be in any valid or invalid state. To |
1311 | bring the device to a known state and make it forget about the | 1311 | bring the device to a known state and make it forget about the |
1312 | timed out command, resetting is necessary. The timed out | 1312 | timed out command, resetting is necessary. The timed out |
1313 | command may be retried. | 1313 | command may be retried. |
1314 | </para> | 1314 | </para> |
1315 | 1315 | ||
1316 | <para> | 1316 | <para> |
1317 | Timeouts can also be caused by transmission errors. Refer to | 1317 | Timeouts can also be caused by transmission errors. Refer to |
1318 | <xref linkend="excatATAbusErr"/> for more details. | 1318 | <xref linkend="excatATAbusErr"/> for more details. |
1319 | </para> | 1319 | </para> |
1320 | 1320 | ||
1321 | </sect2> | 1321 | </sect2> |
1322 | 1322 | ||
1323 | <sect2 id="excatHoplugPM"> | 1323 | <sect2 id="excatHoplugPM"> |
1324 | <title>Hotplug and power management exceptions</title> | 1324 | <title>Hotplug and power management exceptions</title> |
1325 | 1325 | ||
1326 | <para> | 1326 | <para> |
1327 | &lt;&lt;TODO: fill here&gt;&gt; | 1327 | &lt;&lt;TODO: fill here&gt;&gt; |
1328 | </para> | 1328 | </para> |
1329 | 1329 | ||
1330 | </sect2> | 1330 | </sect2> |
1331 | 1331 | ||
1332 | </sect1> | 1332 | </sect1> |
1333 | 1333 | ||
1334 | <sect1 id="exrec"> | 1334 | <sect1 id="exrec"> |
1335 | <title>EH recovery actions</title> | 1335 | <title>EH recovery actions</title> |
1336 | 1336 | ||
1337 | <para> | 1337 | <para> |
1338 | This section discusses several important recovery actions. | 1338 | This section discusses several important recovery actions. |
1339 | </para> | 1339 | </para> |
1340 | 1340 | ||
1341 | <sect2 id="exrecClr"> | 1341 | <sect2 id="exrecClr"> |
1342 | <title>Clearing error condition</title> | 1342 | <title>Clearing error condition</title> |
1343 | 1343 | ||
1344 | <para> | 1344 | <para> |
1345 | Many controllers require their error registers to be cleared by | 1345 | Many controllers require their error registers to be cleared by |
1346 | the error handler. Different controllers may have different | 1346 | the error handler. Different controllers may have different |
1347 | requirements. | 1347 | requirements. |
1348 | </para> | 1348 | </para> |
1349 | 1349 | ||
1350 | <para> | 1350 | <para> |
1351 | For SATA, it's strongly recommended to clear at least SError | 1351 | For SATA, it's strongly recommended to clear at least SError |
1352 | register during error handling. | 1352 | register during error handling. |
1353 | </para> | 1353 | </para> |
1354 | </sect2> | 1354 | </sect2> |
1355 | 1355 | ||
1356 | <sect2 id="exrecRst"> | 1356 | <sect2 id="exrecRst"> |
1357 | <title>Reset</title> | 1357 | <title>Reset</title> |
1358 | 1358 | ||
1359 | <para> | 1359 | <para> |
1360 | During EH, resetting is necessary in the following cases. | 1360 | During EH, resetting is necessary in the following cases. |
1361 | </para> | 1361 | </para> |
1362 | 1362 | ||
1363 | <itemizedlist> | 1363 | <itemizedlist> |
1364 | 1364 | ||
1365 | <listitem> | 1365 | <listitem> |
1366 | <para> | 1366 | <para> |
1367 | HSM is in unknown or invalid state | 1367 | HSM is in unknown or invalid state |
1368 | </para> | 1368 | </para> |
1369 | </listitem> | 1369 | </listitem> |
1370 | 1370 | ||
1371 | <listitem> | 1371 | <listitem> |
1372 | <para> | 1372 | <para> |
1373 | HBA is in unknown or invalid state | 1373 | HBA is in unknown or invalid state |
1374 | </para> | 1374 | </para> |
1375 | </listitem> | 1375 | </listitem> |
1376 | 1376 | ||
1377 | <listitem> | 1377 | <listitem> |
1378 | <para> | 1378 | <para> |
1379 | EH needs to make HBA/device forget about in-flight commands | 1379 | EH needs to make HBA/device forget about in-flight commands |
1380 | </para> | 1380 | </para> |
1381 | </listitem> | 1381 | </listitem> |
1382 | 1382 | ||
1383 | <listitem> | 1383 | <listitem> |
1384 | <para> | 1384 | <para> |
1385 | HBA/device behaves weirdly | 1385 | HBA/device behaves weirdly |
1386 | </para> | 1386 | </para> |
1387 | </listitem> | 1387 | </listitem> |
1388 | 1388 | ||
1389 | </itemizedlist> | 1389 | </itemizedlist> |
1390 | 1390 | ||
1391 | <para> | 1391 | <para> |
1392 | Resetting during EH might be a good idea regardless of error | 1392 | Resetting during EH might be a good idea regardless of error |
1393 | condition to improve EH robustness. Whether to reset both or | 1393 | condition to improve EH robustness. Whether to reset both or |
1394 | either one of HBA and device depends on situation but the | 1394 | either one of HBA and device depends on situation but the |
1395 | following scheme is recommended. | 1395 | following scheme is recommended. |
1396 | </para> | 1396 | </para> |
1397 | 1397 | ||
1398 | <itemizedlist> | 1398 | <itemizedlist> |
1399 | 1399 | ||
1400 | <listitem> | 1400 | <listitem> |
1401 | <para> | 1401 | <para> |
1402 | When it's known that HBA is in ready state but ATA/ATAPI | 1402 | When it's known that HBA is in ready state but ATA/ATAPI |
1403 | device in in unknown state, reset only device. | 1403 | device is in unknown state, reset only device. |
1404 | </para> | 1404 | </para> |
1405 | </listitem> | 1405 | </listitem> |
1406 | 1406 | ||
1407 | <listitem> | 1407 | <listitem> |
1408 | <para> | 1408 | <para> |
1409 | If HBA is in unknown state, reset both HBA and device. | 1409 | If HBA is in unknown state, reset both HBA and device. |
1410 | </para> | 1410 | </para> |
1411 | </listitem> | 1411 | </listitem> |
1412 | 1412 | ||
1413 | </itemizedlist> | 1413 | </itemizedlist> |
1414 | 1414 | ||
1415 | <para> | 1415 | <para> |
1416 | HBA resetting is implementation specific. For a controller | 1416 | HBA resetting is implementation specific. For a controller |
1417 | complying with taskfile/BMDMA PCI IDE, stopping the active DMA | 1417 | complying with taskfile/BMDMA PCI IDE, stopping the active DMA |
1418 | transaction may be sufficient iff BMDMA state is the only HBA | 1418 | transaction may be sufficient iff BMDMA state is the only HBA |
1419 | context. But even mostly taskfile/BMDMA PCI IDE complying | 1419 | context. But even mostly taskfile/BMDMA PCI IDE complying |
1420 | controllers may have implementation specific requirements and | 1420 | controllers may have implementation specific requirements and |
1421 | mechanisms to reset themselves. This must be addressed by | 1421 | mechanisms to reset themselves. This must be addressed by |
1422 | specific drivers. | 1422 | specific drivers. |
1423 | </para> | 1423 | </para> |
1424 | 1424 | ||
1425 | <para> | 1425 | <para> |
1426 | On the other hand, the ATA/ATAPI standard describes in detail ways to reset | 1426 | On the other hand, the ATA/ATAPI standard describes in detail ways to reset |
1427 | ATA/ATAPI devices. | 1427 | ATA/ATAPI devices. |
1428 | </para> | 1428 | </para> |
1429 | 1429 | ||
1430 | <variablelist> | 1430 | <variablelist> |
1431 | 1431 | ||
1432 | <varlistentry><term>PATA hardware reset</term> | 1432 | <varlistentry><term>PATA hardware reset</term> |
1433 | <listitem> | 1433 | <listitem> |
1434 | <para> | 1434 | <para> |
1435 | This is a hardware-initiated device reset signalled by an | 1435 | This is a hardware-initiated device reset signalled by an |
1436 | asserted PATA RESET- signal. There is no standard way to | 1436 | asserted PATA RESET- signal. There is no standard way to |
1437 | initiate hardware reset from software although some | 1437 | initiate hardware reset from software although some |
1438 | hardware provides registers that allow driver to directly | 1438 | hardware provides registers that allow driver to directly |
1439 | tweak the RESET- signal. | 1439 | tweak the RESET- signal. |
1440 | </para> | 1440 | </para> |
1441 | </listitem> | 1441 | </listitem> |
1442 | </varlistentry> | 1442 | </varlistentry> |
1443 | 1443 | ||
1444 | <varlistentry><term>Software reset</term> | 1444 | <varlistentry><term>Software reset</term> |
1445 | <listitem> | 1445 | <listitem> |
1446 | <para> | 1446 | <para> |
1447 | This is achieved by turning CONTROL SRST bit on for at | 1447 | This is achieved by turning CONTROL SRST bit on for at |
1448 | least 5us. Both PATA and SATA support it but, in case of | 1448 | least 5us. Both PATA and SATA support it but, in case of |
1449 | SATA, this may require controller-specific support as the | 1449 | SATA, this may require controller-specific support as the |
1450 | second Register FIS to clear SRST should be transmitted | 1450 | second Register FIS to clear SRST should be transmitted |
1451 | while BSY bit is still set. Note that on PATA, this resets | 1451 | while BSY bit is still set. Note that on PATA, this resets |
1452 | both master and slave devices on a channel. | 1452 | both master and slave devices on a channel. |
1453 | </para> | 1453 | </para> |
1454 | </listitem> | 1454 | </listitem> |
1455 | </varlistentry> | 1455 | </varlistentry> |
1456 | 1456 | ||
1457 | <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term> | 1457 | <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term> |
1458 | <listitem> | 1458 | <listitem> |
1459 | <para> | 1459 | <para> |
1460 | Although the ATA/ATAPI standard doesn't describe it exactly, EDD | 1460 | Although the ATA/ATAPI standard doesn't describe it exactly, EDD |
1461 | implies some level of resetting, possibly similar to that | 1461 | implies some level of resetting, possibly similar to that |
1462 | of software reset. Host-side EDD protocol can be handled | 1462 | of software reset. Host-side EDD protocol can be handled |
1463 | with normal command processing and most SATA controllers | 1463 | with normal command processing and most SATA controllers |
1464 | should be able to handle EDDs just like other commands. | 1464 | should be able to handle EDDs just like other commands. |
1465 | As in software reset, EDD affects both devices on a PATA | 1465 | As in software reset, EDD affects both devices on a PATA |
1466 | bus. | 1466 | bus. |
1467 | </para> | 1467 | </para> |
1468 | <para> | 1468 | <para> |
1469 | Although EDD does reset devices, this doesn't suit error | 1469 | Although EDD does reset devices, this doesn't suit error |
1470 | handling as EDD cannot be issued while BSY is set and it's | 1470 | handling as EDD cannot be issued while BSY is set and it's |
1471 | unclear how it will act when device is in unknown/weird | 1471 | unclear how it will act when device is in unknown/weird |
1472 | state. | 1472 | state. |
1473 | </para> | 1473 | </para> |
1474 | </listitem> | 1474 | </listitem> |
1475 | </varlistentry> | 1475 | </varlistentry> |
1476 | 1476 | ||
1477 | <varlistentry><term>ATAPI DEVICE RESET command</term> | 1477 | <varlistentry><term>ATAPI DEVICE RESET command</term> |
1478 | <listitem> | 1478 | <listitem> |
1479 | <para> | 1479 | <para> |
1480 | This is very similar to software reset except that reset | 1480 | This is very similar to software reset except that reset |
1481 | can be restricted to the selected device without affecting | 1481 | can be restricted to the selected device without affecting |
1482 | the other device sharing the cable. | 1482 | the other device sharing the cable. |
1483 | </para> | 1483 | </para> |
1484 | </listitem> | 1484 | </listitem> |
1485 | </varlistentry> | 1485 | </varlistentry> |
1486 | 1486 | ||
1487 | <varlistentry><term>SATA phy reset</term> | 1487 | <varlistentry><term>SATA phy reset</term> |
1488 | <listitem> | 1488 | <listitem> |
1489 | <para> | 1489 | <para> |
1490 | This is the preferred way of resetting a SATA device. In | 1490 | This is the preferred way of resetting a SATA device. In |
1491 | effect, it's identical to PATA hardware reset. Note that | 1491 | effect, it's identical to PATA hardware reset. Note that |
1492 | this can be done with the standard SCR Control register. | 1492 | this can be done with the standard SCR Control register. |
1493 | As such, it's usually easier to implement than software | 1493 | As such, it's usually easier to implement than software |
1494 | reset. | 1494 | reset. |
1495 | </para> | 1495 | </para> |
1496 | </listitem> | 1496 | </listitem> |
1497 | </varlistentry> | 1497 | </varlistentry> |
1498 | 1498 | ||
1499 | </variablelist> | 1499 | </variablelist> |
1500 | 1500 | ||
1501 | <para> | 1501 | <para> |
1502 | One more thing to consider when resetting devices is that | 1502 | One more thing to consider when resetting devices is that |
1503 | resetting clears certain configuration parameters and they | 1503 | resetting clears certain configuration parameters and they |
1504 | need to be set to their previous or newly adjusted values | 1504 | need to be set to their previous or newly adjusted values |
1505 | after reset. | 1505 | after reset. |
1506 | </para> | 1506 | </para> |
1507 | 1507 | ||
1508 | <para> | 1508 | <para> |
1509 | The affected parameters are as follows. | 1509 | The affected parameters are as follows. |
1510 | </para> | 1510 | </para> |
1511 | 1511 | ||
1512 | <itemizedlist> | 1512 | <itemizedlist> |
1513 | 1513 | ||
1514 | <listitem> | 1514 | <listitem> |
1515 | <para> | 1515 | <para> |
1516 | CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used) | 1516 | CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used) |
1517 | </para> | 1517 | </para> |
1518 | </listitem> | 1518 | </listitem> |
1519 | 1519 | ||
1520 | <listitem> | 1520 | <listitem> |
1521 | <para> | 1521 | <para> |
1522 | Parameters set with SET FEATURES including transfer mode setting | 1522 | Parameters set with SET FEATURES including transfer mode setting |
1523 | </para> | 1523 | </para> |
1524 | </listitem> | 1524 | </listitem> |
1525 | 1525 | ||
1526 | <listitem> | 1526 | <listitem> |
1527 | <para> | 1527 | <para> |
1528 | Block count set with SET MULTIPLE MODE | 1528 | Block count set with SET MULTIPLE MODE |
1529 | </para> | 1529 | </para> |
1530 | </listitem> | 1530 | </listitem> |
1531 | 1531 | ||
1532 | <listitem> | 1532 | <listitem> |
1533 | <para> | 1533 | <para> |
1534 | Other parameters (SET MAX, MEDIA LOCK...) | 1534 | Other parameters (SET MAX, MEDIA LOCK...) |
1535 | </para> | 1535 | </para> |
1536 | </listitem> | 1536 | </listitem> |
1537 | 1537 | ||
1538 | </itemizedlist> | 1538 | </itemizedlist> |
1539 | 1539 | ||
1540 | <para> | 1540 | <para> |
1541 | ATA/ATAPI standard specifies that some parameters must be | 1541 | ATA/ATAPI standard specifies that some parameters must be |
1542 | maintained across hardware or software reset, but doesn't | 1542 | maintained across hardware or software reset, but doesn't |
1543 | strictly specify all of them. Always reconfiguring needed | 1543 | strictly specify all of them. Always reconfiguring needed |
1544 | parameters after reset is required for robustness. Note that | 1544 | parameters after reset is required for robustness. Note that |
1545 | this also applies when resuming from deep sleep (power-off). | 1545 | this also applies when resuming from deep sleep (power-off). |
1546 | </para> | 1546 | </para> |
1547 | 1547 | ||
1548 | <para> | 1548 | <para> |
1549 | Also, ATA/ATAPI standard requires that IDENTIFY DEVICE / | 1549 | Also, ATA/ATAPI standard requires that IDENTIFY DEVICE / |
1550 | IDENTIFY PACKET DEVICE is issued after any configuration | 1550 | IDENTIFY PACKET DEVICE is issued after any configuration |
1551 | parameter is updated or a hardware reset and the result used | 1551 | parameter is updated or a hardware reset and the result used |
1552 | for further operation. The OS driver is required to implement | 1552 | for further operation. The OS driver is required to implement |
1553 | a revalidation mechanism to support this. | 1553 | a revalidation mechanism to support this. |
1554 | </para> | 1554 | </para> |
1555 | 1555 | ||
1556 | </sect2> | 1556 | </sect2> |
1557 | 1557 | ||
1558 | <sect2 id="exrecReconf"> | 1558 | <sect2 id="exrecReconf"> |
1559 | <title>Reconfigure transport</title> | 1559 | <title>Reconfigure transport</title> |
1560 | 1560 | ||
1561 | <para> | 1561 | <para> |
1562 | For both PATA and SATA, a lot of corners are cut for cheap | 1562 | For both PATA and SATA, a lot of corners are cut for cheap |
1563 | connectors, cables or controllers and it's quite common to see | 1563 | connectors, cables or controllers and it's quite common to see |
1564 | a high transmission error rate. This can be mitigated by | 1564 | a high transmission error rate. This can be mitigated by |
1565 | lowering transmission speed. | 1565 | lowering transmission speed. |
1566 | </para> | 1566 | </para> |
1567 | 1567 | ||
1568 | <para> | 1568 | <para> |
1569 | The following is a possible scheme Jeff Garzik suggested. | 1569 | The following is a possible scheme Jeff Garzik suggested. |
1570 | </para> | 1570 | </para> |
1571 | 1571 | ||
1572 | <blockquote> | 1572 | <blockquote> |
1573 | <para> | 1573 | <para> |
1574 | If more than $N (3?) transmission errors happen in 15 minutes, | 1574 | If more than $N (3?) transmission errors happen in 15 minutes, |
1575 | </para> | 1575 | </para> |
1576 | <itemizedlist> | 1576 | <itemizedlist> |
1577 | <listitem> | 1577 | <listitem> |
1578 | <para> | 1578 | <para> |
1579 | if SATA, decrease SATA PHY speed. if speed cannot be decreased, | 1579 | if SATA, decrease SATA PHY speed. if speed cannot be decreased, |
1580 | </para> | 1580 | </para> |
1581 | </listitem> | 1581 | </listitem> |
1582 | <listitem> | 1582 | <listitem> |
1583 | <para> | 1583 | <para> |
1584 | decrease UDMA xfer speed. if at UDMA0, switch to PIO4, | 1584 | decrease UDMA xfer speed. if at UDMA0, switch to PIO4, |
1585 | </para> | 1585 | </para> |
1586 | </listitem> | 1586 | </listitem> |
1587 | <listitem> | 1587 | <listitem> |
1588 | <para> | 1588 | <para> |
1589 | decrease PIO xfer speed. if at PIO3, complain, but continue | 1589 | decrease PIO xfer speed. if at PIO3, complain, but continue |
1590 | </para> | 1590 | </para> |
1591 | </listitem> | 1591 | </listitem> |
1592 | </itemizedlist> | 1592 | </itemizedlist> |
1593 | </blockquote> | 1593 | </blockquote> |
1594 | 1594 | ||
1595 | </sect2> | 1595 | </sect2> |
1596 | 1596 | ||
1597 | </sect1> | 1597 | </sect1> |
1598 | 1598 | ||
1599 | </chapter> | 1599 | </chapter> |
1600 | 1600 | ||
1601 | <chapter id="PiixInt"> | 1601 | <chapter id="PiixInt"> |
1602 | <title>ata_piix Internals</title> | 1602 | <title>ata_piix Internals</title> |
1603 | !Idrivers/ata/ata_piix.c | 1603 | !Idrivers/ata/ata_piix.c |
1604 | </chapter> | 1604 | </chapter> |
1605 | 1605 | ||
1606 | <chapter id="SILInt"> | 1606 | <chapter id="SILInt"> |
1607 | <title>sata_sil Internals</title> | 1607 | <title>sata_sil Internals</title> |
1608 | !Idrivers/ata/sata_sil.c | 1608 | !Idrivers/ata/sata_sil.c |
1609 | </chapter> | 1609 | </chapter> |
1610 | 1610 | ||
1611 | <chapter id="libataThanks"> | 1611 | <chapter id="libataThanks"> |
1612 | <title>Thanks</title> | 1612 | <title>Thanks</title> |
1613 | <para> | 1613 | <para> |
1614 | The bulk of the ATA knowledge comes thanks to long conversations with | 1614 | The bulk of the ATA knowledge comes thanks to long conversations with |
1615 | Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA | 1615 | Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA |
1616 | and SCSI specifications. | 1616 | and SCSI specifications. |
1617 | </para> | 1617 | </para> |
1618 | <para> | 1618 | <para> |
1619 | Thanks to Alan Cox for pointing out similarities | 1619 | Thanks to Alan Cox for pointing out similarities |
1620 | between SATA and SCSI, and in general for motivation to hack on | 1620 | between SATA and SCSI, and in general for motivation to hack on |
1621 | libata. | 1621 | libata. |
1622 | </para> | 1622 | </para> |
1623 | <para> | 1623 | <para> |
1624 | libata's device detection | 1624 | libata's device detection |
1625 | method, ata_pio_devchk, and in general all the early probing was | 1625 | method, ata_pio_devchk, and in general all the early probing was |
1626 | based on extensive study of Hale Landis's probe/reset code in his | 1626 | based on extensive study of Hale Landis's probe/reset code in his |
1627 | ATADRVR driver (www.ata-atapi.com). | 1627 | ATADRVR driver (www.ata-atapi.com). |
1628 | </para> | 1628 | </para> |
1629 | </chapter> | 1629 | </chapter> |
1630 | 1630 | ||
1631 | </book> | 1631 | </book> |
1632 | 1632 |
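The speed-down scheme quoted in the "Reconfigure transport" section of libata.tmpl above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not libata code: `struct link_state`, `speed_down()`, `record_xfer_error()`, the `ACT_*` values, and the threshold of 3 are all hypothetical names and values chosen for the example (the original text leaves `$N` open).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one link; none of these names come from libata. */
#define ERR_LIMIT      3          /* "$N (3?)" in the quoted scheme; an assumption */
#define ERR_WINDOW_SEC (15 * 60)  /* the 15-minute window, in seconds */

enum xfer_class { XFER_UDMA, XFER_PIO };

enum action {
	ACT_NONE,         /* error recorded, threshold not exceeded */
	ACT_SATA_SLOWER,  /* SATA PHY speed decreased */
	ACT_UDMA_SLOWER,  /* UDMA mode decreased */
	ACT_TO_PIO4,      /* was at UDMA0, switched to PIO4 */
	ACT_PIO_SLOWER,   /* PIO mode decreased */
	ACT_COMPLAIN      /* already at PIO3: complain, but continue */
};

struct link_state {
	bool is_sata;
	int sata_gen;                 /* PHY generation; 1 is the slowest */
	enum xfer_class xfer;         /* current transfer class */
	int mode;                     /* UDMA or PIO mode number */
	long err_time[ERR_LIMIT + 1]; /* timestamps of recent errors, seconds */
	size_t err_count;
};

/* Apply one step of the quoted scheme and report which step was taken. */
enum action speed_down(struct link_state *st)
{
	assert(st != NULL);

	if (st->is_sata && st->sata_gen > 1) {
		st->sata_gen--;
		return ACT_SATA_SLOWER;
	}
	if (st->xfer == XFER_UDMA) {
		if (st->mode > 0) {
			st->mode--;
			return ACT_UDMA_SLOWER;
		}
		st->xfer = XFER_PIO;
		st->mode = 4;
		return ACT_TO_PIO4;
	}
	if (st->mode > 3) {
		st->mode--;
		return ACT_PIO_SLOWER;
	}
	return ACT_COMPLAIN;
}

/*
 * Record a transmission error at time `now` (seconds). If more than
 * ERR_LIMIT errors fall inside the window, slow the link down once and
 * start counting afresh.
 */
enum action record_xfer_error(struct link_state *st, long now)
{
	size_t i, kept = 0;

	assert(st != NULL);

	/* Forget errors that have aged out of the window. */
	for (i = 0; i < st->err_count; i++)
		if (now - st->err_time[i] < ERR_WINDOW_SEC)
			st->err_time[kept++] = st->err_time[i];

	st->err_time[kept++] = now;
	st->err_count = kept;

	if (st->err_count <= ERR_LIMIT)
		return ACT_NONE;

	st->err_count = 0;
	return speed_down(st);
}
```

A caller would invoke `record_xfer_error()` from its error path and act on the returned value, for example by reprogramming the transfer mode and then revalidating the device. Resetting the counter after each step is a design choice of this sketch, so that a slowed link gets a fresh window before being slowed again.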
Documentation/DocBook/usb.tmpl
1 | <?xml version="1.0" encoding="UTF-8"?> | 1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | 2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" |
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | 3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> |
4 | 4 | ||
5 | <book id="Linux-USB-API"> | 5 | <book id="Linux-USB-API"> |
6 | <bookinfo> | 6 | <bookinfo> |
7 | <title>The Linux-USB Host Side API</title> | 7 | <title>The Linux-USB Host Side API</title> |
8 | 8 | ||
9 | <legalnotice> | 9 | <legalnotice> |
10 | <para> | 10 | <para> |
11 | This documentation is free software; you can redistribute | 11 | This documentation is free software; you can redistribute |
12 | it and/or modify it under the terms of the GNU General Public | 12 | it and/or modify it under the terms of the GNU General Public |
13 | License as published by the Free Software Foundation; either | 13 | License as published by the Free Software Foundation; either |
14 | version 2 of the License, or (at your option) any later | 14 | version 2 of the License, or (at your option) any later |
15 | version. | 15 | version. |
16 | </para> | 16 | </para> |
17 | 17 | ||
18 | <para> | 18 | <para> |
19 | This program is distributed in the hope that it will be | 19 | This program is distributed in the hope that it will be |
20 | useful, but WITHOUT ANY WARRANTY; without even the implied | 20 | useful, but WITHOUT ANY WARRANTY; without even the implied |
21 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | 21 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. |
22 | See the GNU General Public License for more details. | 22 | See the GNU General Public License for more details. |
23 | </para> | 23 | </para> |
24 | 24 | ||
25 | <para> | 25 | <para> |
26 | You should have received a copy of the GNU General Public | 26 | You should have received a copy of the GNU General Public |
27 | License along with this program; if not, write to the Free | 27 | License along with this program; if not, write to the Free |
28 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | 28 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, |
29 | MA 02111-1307 USA | 29 | MA 02111-1307 USA |
30 | </para> | 30 | </para> |
31 | 31 | ||
32 | <para> | 32 | <para> |
33 | For more details see the file COPYING in the source | 33 | For more details see the file COPYING in the source |
34 | distribution of Linux. | 34 | distribution of Linux. |
35 | </para> | 35 | </para> |
36 | </legalnotice> | 36 | </legalnotice> |
37 | </bookinfo> | 37 | </bookinfo> |
38 | 38 | ||
39 | <toc></toc> | 39 | <toc></toc> |
40 | 40 | ||
41 | <chapter id="intro"> | 41 | <chapter id="intro"> |
42 | <title>Introduction to USB on Linux</title> | 42 | <title>Introduction to USB on Linux</title> |
43 | 43 | ||
44 | <para>A Universal Serial Bus (USB) is used to connect a host, | 44 | <para>A Universal Serial Bus (USB) is used to connect a host, |
45 | such as a PC or workstation, to a number of peripheral | 45 | such as a PC or workstation, to a number of peripheral |
46 | devices. USB uses a tree structure, with the host as the | 46 | devices. USB uses a tree structure, with the host as the |
47 | root (the system's master), hubs as interior nodes, and | 47 | root (the system's master), hubs as interior nodes, and |
48 | peripherals as leaves (and slaves). | 48 | peripherals as leaves (and slaves). |
49 | Modern PCs support several such trees of USB devices, usually | 49 | Modern PCs support several such trees of USB devices, usually |
50 | one USB 2.0 tree (480 Mbit/sec each) with | 50 | one USB 2.0 tree (480 Mbit/sec each) with |
51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you | 51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you |
52 | connect a USB 1.1 device directly to the machine's "root hub". | 52 | connect a USB 1.1 device directly to the machine's "root hub". |
53 | </para> | 53 | </para> |
54 | 54 | ||
55 | <para>That master/slave asymmetry was designed-in for a number of | 55 | <para>That master/slave asymmetry was designed-in for a number of |
56 | reasons, one being ease of use. It is not physically possible to | 56 | reasons, one being ease of use. It is not physically possible to |
57 | assemble (legal) USB cables incorrectly: all upstream "to the host" | 57 | assemble (legal) USB cables incorrectly: all upstream "to the host" |
58 | connectors are the rectangular type (matching the sockets on | 58 | connectors are the rectangular type (matching the sockets on |
59 | root hubs), and all downstream connectors are the squarish type | 59 | root hubs), and all downstream connectors are the squarish type |
60 | (or they are built into the peripheral). | 60 | (or they are built into the peripheral). |
61 | Also, the host software doesn't need to deal with distributed | 61 | Also, the host software doesn't need to deal with distributed |
62 | auto-configuration since the pre-designated master node manages all that. | 62 | auto-configuration since the pre-designated master node manages all that. |
63 | And finally, at the electrical level, bus protocol overhead is reduced by | 63 | And finally, at the electrical level, bus protocol overhead is reduced by |
64 | eliminating arbitration and moving scheduling into the host software. | 64 | eliminating arbitration and moving scheduling into the host software. |
65 | </para> | 65 | </para> |
66 | 66 | ||
67 | <para>USB 1.0 was announced in January 1996 and was revised | 67 | <para>USB 1.0 was announced in January 1996 and was revised |
68 | as USB 1.1 (with improvements in hub specification and | 68 | as USB 1.1 (with improvements in hub specification and |
69 | support for interrupt-out transfers) in September 1998. | 69 | support for interrupt-out transfers) in September 1998. |
70 | USB 2.0 was released in April 2000, adding high-speed | 70 | USB 2.0 was released in April 2000, adding high-speed |
71 | transfers and transaction-translating hubs (used for USB 1.1 | 71 | transfers and transaction-translating hubs (used for USB 1.1 |
72 | and 1.0 backward compatibility). | 72 | and 1.0 backward compatibility). |
73 | </para> | 73 | </para> |
74 | 74 | ||
75 | <para>Kernel developers added USB support to Linux early in the 2.2 kernel | 75 | <para>Kernel developers added USB support to Linux early in the 2.2 kernel |
76 | series, shortly before 2.3 development forked. Updates from 2.3 were | 76 | series, shortly before 2.3 development forked. Updates from 2.3 were |
77 | regularly folded back into 2.2 releases, which improved reliability and | 77 | regularly folded back into 2.2 releases, which improved reliability and |
78 | brought <filename>/sbin/hotplug</filename> support as well as more drivers. | 78 | brought <filename>/sbin/hotplug</filename> support as well as more drivers. |
79 | Such improvements were continued in the 2.5 kernel series, where they added | 79 | Such improvements were continued in the 2.5 kernel series, where they added |
80 | USB 2.0 support, improved performance, and made the host controller drivers | 80 | USB 2.0 support, improved performance, and made the host controller drivers |
81 | (HCDs) more consistent. They also simplified the API (to make bugs less | 81 | (HCDs) more consistent. They also simplified the API (to make bugs less |
82 | likely) and added internal "kerneldoc" documentation. | 82 | likely) and added internal "kerneldoc" documentation. |
83 | </para> | 83 | </para> |
84 | 84 | ||
85 | <para>Linux can run inside USB devices as well as on | 85 | <para>Linux can run inside USB devices as well as on |
86 | the hosts that control the devices. | 86 | the hosts that control the devices. |
87 | But USB device drivers running inside those peripherals | 87 | But USB device drivers running inside those peripherals |
88 | don't do the same things as the ones running inside hosts, | 88 | don't do the same things as the ones running inside hosts, |
89 | so they've been given a different name: | 89 | so they've been given a different name: |
90 | <emphasis>gadget drivers</emphasis>. | 90 | <emphasis>gadget drivers</emphasis>. |
91 | This document does not cover gadget drivers. | 91 | This document does not cover gadget drivers. |
92 | </para> | 92 | </para> |
93 | 93 | ||
94 | </chapter> | 94 | </chapter> |
95 | 95 | ||
96 | <chapter id="host"> | 96 | <chapter id="host"> |
97 | <title>USB Host-Side API Model</title> | 97 | <title>USB Host-Side API Model</title> |
98 | 98 | ||
99 | <para>Host-side drivers for USB devices talk to the "usbcore" APIs. | 99 | <para>Host-side drivers for USB devices talk to the "usbcore" APIs. |
100 | There are two. One is intended for | 100 | There are two. One is intended for |
101 | <emphasis>general-purpose</emphasis> drivers (exposed through | 101 | <emphasis>general-purpose</emphasis> drivers (exposed through |
102 | driver frameworks), and the other is for drivers that are | 102 | driver frameworks), and the other is for drivers that are |
103 | <emphasis>part of the core</emphasis>. | 103 | <emphasis>part of the core</emphasis>. |
104 | Such core drivers include the <emphasis>hub</emphasis> driver | 104 | Such core drivers include the <emphasis>hub</emphasis> driver |
105 | (which manages trees of USB devices) and several different kinds | 105 | (which manages trees of USB devices) and several different kinds |
106 | of <emphasis>host controller drivers</emphasis>, | 106 | of <emphasis>host controller drivers</emphasis>, |
107 | which control individual busses. | 107 | which control individual busses. |
108 | </para> | 108 | </para> |
109 | 109 | ||
110 | <para>The device model seen by USB drivers is relatively complex. | 110 | <para>The device model seen by USB drivers is relatively complex. |
111 | </para> | 111 | </para> |
112 | 112 | ||
113 | <itemizedlist> | 113 | <itemizedlist> |
114 | 114 | ||
115 | <listitem><para>USB supports four kinds of data transfers | 115 | <listitem><para>USB supports four kinds of data transfers |
116 | (control, bulk, interrupt, and isochronous). Two of them (control | 116 | (control, bulk, interrupt, and isochronous). Two of them (control |
117 | and bulk) use bandwidth as it's available, | 117 | and bulk) use bandwidth as it's available, |
118 | while the other two (interrupt and isochronous) | 118 | while the other two (interrupt and isochronous) |
119 | are scheduled to provide guaranteed bandwidth. | 119 | are scheduled to provide guaranteed bandwidth. |
120 | </para></listitem> | 120 | </para></listitem> |
121 | 121 | ||
122 | <listitem><para>The device description model includes one or more | 122 | <listitem><para>The device description model includes one or more |
123 | "configurations" per device, only one of which is active at a time. | 123 | "configurations" per device, only one of which is active at a time. |
124 | Devices that are capable of high-speed operation must also support | 124 | Devices that are capable of high-speed operation must also support |
125 | full-speed configurations, along with a way to ask about the | 125 | full-speed configurations, along with a way to ask about the |
126 | "other speed" configurations which might be used. | 126 | "other speed" configurations which might be used. |
127 | </para></listitem> | 127 | </para></listitem> |
128 | 128 | ||
129 | <listitem><para>Configurations have one or more "interfaces", each | 129 | <listitem><para>Configurations have one or more "interfaces", each |
130 | of which may have "alternate settings". Interfaces may be | 130 | of which may have "alternate settings". Interfaces may be |
131 | standardized by USB "Class" specifications, or may be specific to | 131 | standardized by USB "Class" specifications, or may be specific to |
132 | a vendor or device.</para> | 132 | a vendor or device.</para> |
133 | 133 | ||
134 | <para>USB device drivers actually bind to interfaces, not devices. | 134 | <para>USB device drivers actually bind to interfaces, not devices. |
135 | Think of them as "interface drivers", though you | 135 | Think of them as "interface drivers", though you |
136 | may not see many devices where the distinction is important. | 136 | may not see many devices where the distinction is important. |
137 | <emphasis>Most USB devices are simple, with only one configuration, | 137 | <emphasis>Most USB devices are simple, with only one configuration, |
138 | one interface, and one alternate setting.</emphasis> | 138 | one interface, and one alternate setting.</emphasis> |
139 | </para></listitem> | 139 | </para></listitem> |
140 | 140 | ||
141 | <listitem><para>Interfaces have one or more "endpoints", each of | 141 | <listitem><para>Interfaces have one or more "endpoints", each of |
142 | which supports one type and direction of data transfer such as | 142 | which supports one type and direction of data transfer such as |
143 | "bulk out" or "interrupt in". The entire configuration may have | 143 | "bulk out" or "interrupt in". The entire configuration may have |
144 | up to sixteen endpoints in each direction, allocated as needed | 144 | up to sixteen endpoints in each direction, allocated as needed |
145 | among all the interfaces. | 145 | among all the interfaces. |
146 | </para></listitem> | 146 | </para></listitem> |
147 | 147 | ||
148 | <listitem><para>Data transfer on USB is packetized; each endpoint | 148 | <listitem><para>Data transfer on USB is packetized; each endpoint |
149 | has a maximum packet size. | 149 | has a maximum packet size. |
150 | Drivers must often be aware of conventions such as flagging the end | 150 | Drivers must often be aware of conventions such as flagging the end |
151 | of bulk transfers using "short" (including zero length) packets. | 151 | of bulk transfers using "short" (including zero length) packets. |
152 | </para></listitem> | 152 | </para></listitem> |
153 | 153 | ||
154 | <listitem><para>The Linux USB API supports synchronous calls for | 154 | <listitem><para>The Linux USB API supports synchronous calls for |
155 | control and bulk messages. | 155 | control and bulk messages. |
156 | It also supports asynchronous calls for all kinds of data transfer, | 156 | It also supports asynchronous calls for all kinds of data transfer, |
157 | using request structures called "URBs" (USB Request Blocks). | 157 | using request structures called "URBs" (USB Request Blocks). |
158 | </para></listitem> | 158 | </para></listitem> |
159 | 159 | ||
160 | </itemizedlist> | 160 | </itemizedlist> |
161 | 161 | ||
162 | <para>Accordingly, the USB Core API exposed to device drivers | 162 | <para>Accordingly, the USB Core API exposed to device drivers |
163 | covers quite a lot of territory. You'll probably need to consult | 163 | covers quite a lot of territory. You'll probably need to consult |
164 | the USB 2.0 specification, available online from www.usb.org at | 164 | the USB 2.0 specification, available online from www.usb.org at |
165 | no cost, as well as class or device specifications. | 165 | no cost, as well as class or device specifications. |
166 | </para> | 166 | </para> |
167 | 167 | ||
168 | <para>The only host-side drivers that actually touch hardware | 168 | <para>The only host-side drivers that actually touch hardware |
169 | (reading/writing registers, handling IRQs, and so on) are the HCDs. | 169 | (reading/writing registers, handling IRQs, and so on) are the HCDs. |
170 | In theory, all HCDs provide the same functionality through the same | 170 | In theory, all HCDs provide the same functionality through the same |
171 | API. In practice, that's becoming more true on the 2.5 kernels, | 171 | API. In practice, that's becoming more true on the 2.5 kernels, |
172 | but there are still differences that crop up especially with | 172 | but there are still differences that crop up especially with |
173 | fault handling. Different controllers don't necessarily report | 173 | fault handling. Different controllers don't necessarily report |
174 | the same aspects of failures, and recovery from faults (including | 174 | the same aspects of failures, and recovery from faults (including |
175 | software-induced ones like unlinking an URB) isn't yet fully | 175 | software-induced ones like unlinking an URB) isn't yet fully |
176 | consistent. | 176 | consistent. |
177 | Device driver authors should make a point of doing disconnect | 177 | Device driver authors should make a point of doing disconnect |
178 | testing (while the device is active) with each different host | 178 | testing (while the device is active) with each different host |
179 | controller driver, to make sure drivers don't have bugs of | 179 | controller driver, to make sure drivers don't have bugs of |
180 | their own as well as to make sure they aren't relying on some | 180 | their own as well as to make sure they aren't relying on some |
181 | HCD-specific behavior. | 181 | HCD-specific behavior. |
182 | (You will need external USB 1.1 and/or | 182 | (You will need external USB 1.1 and/or |
183 | USB 2.0 hubs to perform all those tests.) | 183 | USB 2.0 hubs to perform all those tests.) |
184 | </para> | 184 | </para> |
185 | 185 | ||
186 | </chapter> | 186 | </chapter> |
187 | 187 | ||
188 | <chapter><title>USB-Standard Types</title> | 188 | <chapter><title>USB-Standard Types</title> |
189 | 189 | ||
190 | <para>In <filename><linux/usb_ch9.h></filename> you will find | 190 | <para>In <filename><linux/usb_ch9.h></filename> you will find |
191 | the USB data types defined in chapter 9 of the USB specification. | 191 | the USB data types defined in chapter 9 of the USB specification. |
192 | These data types are used throughout USB, and in APIs including | 192 | These data types are used throughout USB, and in APIs including |
193 | this host side API, gadget APIs, and usbfs. | 193 | this host side API, gadget APIs, and usbfs. |
194 | </para> | 194 | </para> |
195 | 195 | ||
196 | !Iinclude/linux/usb_ch9.h | 196 | !Iinclude/linux/usb_ch9.h |
197 | 197 | ||
198 | </chapter> | 198 | </chapter> |
199 | 199 | ||
200 | <chapter><title>Host-Side Data Types and Macros</title> | 200 | <chapter><title>Host-Side Data Types and Macros</title> |
201 | 201 | ||
202 | <para>The host side API exposes several layers to drivers, some of | 202 | <para>The host side API exposes several layers to drivers, some of |
203 | which are more necessary than others. | 203 | which are more necessary than others. |
204 | These support lifecycle models for host side drivers | 204 | These support lifecycle models for host side drivers |
205 | and devices, and support passing buffers through usbcore to | 205 | and devices, and support passing buffers through usbcore to |
206 | some HCD that performs the I/O for the device driver. | 206 | some HCD that performs the I/O for the device driver. |
207 | </para> | 207 | </para> |
208 | 208 | ||
209 | 209 | ||
210 | !Iinclude/linux/usb.h | 210 | !Iinclude/linux/usb.h |
211 | 211 | ||
212 | </chapter> | 212 | </chapter> |
213 | 213 | ||
214 | <chapter><title>USB Core APIs</title> | 214 | <chapter><title>USB Core APIs</title> |
215 | 215 | ||
216 | <para>There are two basic I/O models in the USB API. | 216 | <para>There are two basic I/O models in the USB API. |
217 | The most elemental one is asynchronous: drivers submit requests | 217 | The most elemental one is asynchronous: drivers submit requests |
218 | in the form of an URB, and the URB's completion callback | 218 | in the form of an URB, and the URB's completion callback |
219 | handles the next step. | 219 | handles the next step. |
220 | All USB transfer types support that model, although there | 220 | All USB transfer types support that model, although there |
221 | are special cases for control URBs (which always have setup | 221 | are special cases for control URBs (which always have setup |
222 | and status stages, but may not have a data stage) and | 222 | and status stages, but may not have a data stage) and |
223 | isochronous URBs (which allow large packets and include | 223 | isochronous URBs (which allow large packets and include |
224 | per-packet fault reports). | 224 | per-packet fault reports). |
225 | Built on top of that is synchronous API support, where a | 225 | Built on top of that is synchronous API support, where a |
226 | driver calls a routine that allocates one or more URBs, | 226 | driver calls a routine that allocates one or more URBs, |
227 | submits them, and waits until they complete. | 227 | submits them, and waits until they complete. |
228 | There are synchronous wrappers for single-buffer control | 228 | There are synchronous wrappers for single-buffer control |
229 | and bulk transfers (which are awkward to use in some | 229 | and bulk transfers (which are awkward to use in some |
230 | driver disconnect scenarios), and for scatterlist based | 230 | driver disconnect scenarios), and for scatterlist based |
231 | streaming i/o (bulk or interrupt). | 231 | streaming i/o (bulk or interrupt). |
232 | </para> | 232 | </para> |
233 | 233 | ||
234 | <para>USB drivers need to provide buffers that can be | 234 | <para>USB drivers need to provide buffers that can be |
235 | used for DMA, although they don't necessarily need to | 235 | used for DMA, although they don't necessarily need to |
236 | provide the DMA mapping themselves. | 236 | provide the DMA mapping themselves. |
237 | There are APIs to use when allocating DMA buffers, | 237 | There are APIs to use when allocating DMA buffers, |
238 | which can prevent use of bounce buffers on some systems. | 238 | which can prevent use of bounce buffers on some systems. |
239 | In some cases, drivers may be able to rely on 64bit DMA | 239 | In some cases, drivers may be able to rely on 64bit DMA |
240 | to eliminate another kind of bounce buffer. | 240 | to eliminate another kind of bounce buffer. |
241 | </para> | 241 | </para> |
242 | 242 | ||
243 | !Edrivers/usb/core/urb.c | 243 | !Edrivers/usb/core/urb.c |
244 | !Edrivers/usb/core/message.c | 244 | !Edrivers/usb/core/message.c |
245 | !Edrivers/usb/core/file.c | 245 | !Edrivers/usb/core/file.c |
246 | !Edrivers/usb/core/driver.c | 246 | !Edrivers/usb/core/driver.c |
247 | !Edrivers/usb/core/usb.c | 247 | !Edrivers/usb/core/usb.c |
248 | !Edrivers/usb/core/hub.c | 248 | !Edrivers/usb/core/hub.c |
249 | </chapter> | 249 | </chapter> |
250 | 250 | ||
251 | <chapter><title>Host Controller APIs</title> | 251 | <chapter><title>Host Controller APIs</title> |
252 | 252 | ||
253 | <para>These APIs are only for use by host controller drivers, | 253 | <para>These APIs are only for use by host controller drivers, |
254 | most of which implement standard register interfaces such as | 254 | most of which implement standard register interfaces such as |
255 | EHCI, OHCI, or UHCI. | 255 | EHCI, OHCI, or UHCI. |
256 | UHCI was one of the first interfaces, designed by Intel and | 256 | UHCI was one of the first interfaces, designed by Intel and |
257 | also used by VIA; it doesn't do much in hardware. | 257 | also used by VIA; it doesn't do much in hardware. |
258 | OHCI was designed later, to have the hardware do more work | 258 | OHCI was designed later, to have the hardware do more work |
259 | (bigger transfers, tracking protocol state, and so on). | 259 | (bigger transfers, tracking protocol state, and so on). |
260 | EHCI was designed with USB 2.0; its design has features that | 260 | EHCI was designed with USB 2.0; its design has features that |
261 | resemble OHCI (hardware does much more work) as well as | 261 | resemble OHCI (hardware does much more work) as well as |
262 | UHCI (some parts of ISO support, TD list processing). | 262 | UHCI (some parts of ISO support, TD list processing). |
263 | </para> | 263 | </para> |
264 | 264 | ||
265 | <para>There are host controllers other than the "big three", | 265 | <para>There are host controllers other than the "big three", |
266 | although most PCI based controllers (and a few non-PCI based | 266 | although most PCI based controllers (and a few non-PCI based |
267 | ones) use one of those interfaces. | 267 | ones) use one of those interfaces. |
268 | Not all host controllers use DMA; some use PIO, and there | 268 | Not all host controllers use DMA; some use PIO, and there |
269 | is also a simulator. | 269 | is also a simulator. |
270 | </para> | 270 | </para> |
271 | 271 | ||
272 | <para>The same basic APIs are available to drivers for all | 272 | <para>The same basic APIs are available to drivers for all |
273 | those controllers. | 273 | those controllers. |
274 | For historical reasons they are in two layers: | 274 | For historical reasons they are in two layers: |
275 | <structname>struct usb_bus</structname> is a rather thin | 275 | <structname>struct usb_bus</structname> is a rather thin |
276 | layer that became available in the 2.2 kernels, while | 276 | layer that became available in the 2.2 kernels, while |
277 | <structname>struct usb_hcd</structname> is a more featureful | 277 | <structname>struct usb_hcd</structname> is a more featureful |
278 | layer (available in later 2.4 kernels and in 2.5) that | 278 | layer (available in later 2.4 kernels and in 2.5) that |
279 | lets HCDs share common code, to shrink driver size | 279 | lets HCDs share common code, to shrink driver size |
280 | and significantly reduce hcd-specific behaviors. | 280 | and significantly reduce hcd-specific behaviors. |
281 | </para> | 281 | </para> |
282 | 282 | ||
283 | !Edrivers/usb/core/hcd.c | 283 | !Edrivers/usb/core/hcd.c |
284 | !Edrivers/usb/core/hcd-pci.c | 284 | !Edrivers/usb/core/hcd-pci.c |
285 | !Idrivers/usb/core/buffer.c | 285 | !Idrivers/usb/core/buffer.c |
286 | </chapter> | 286 | </chapter> |
287 | 287 | ||
288 | <chapter> | 288 | <chapter> |
289 | <title>The USB Filesystem (usbfs)</title> | 289 | <title>The USB Filesystem (usbfs)</title> |
290 | 290 | ||
291 | <para>This chapter presents the Linux <emphasis>usbfs</emphasis>. | 291 | <para>This chapter presents the Linux <emphasis>usbfs</emphasis>. |
292 | You may prefer to avoid writing new kernel code for your | 292 | You may prefer to avoid writing new kernel code for your |
293 | USB driver; that's the problem that usbfs set out to solve. | 293 | USB driver; that's the problem that usbfs set out to solve. |
294 | User mode device drivers are usually packaged as applications | 294 | User mode device drivers are usually packaged as applications |
295 | or libraries, and may use usbfs through some programming library | 295 | or libraries, and may use usbfs through some programming library |
296 | that wraps it. Such libraries include | 296 | that wraps it. Such libraries include |
297 | <ulink url="http://libusb.sourceforge.net">libusb</ulink> | 297 | <ulink url="http://libusb.sourceforge.net">libusb</ulink> |
298 | for C/C++, and | 298 | for C/C++, and |
299 | <ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java. | 299 | <ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java. |
300 | </para> | 300 | </para> |
301 | 301 | ||
302 | <note><title>Unfinished</title> | 302 | <note><title>Unfinished</title> |
303 | <para>This particular documentation is incomplete, | 303 | <para>This particular documentation is incomplete, |
304 | especially with respect to the asynchronous mode. | 304 | especially with respect to the asynchronous mode. |
305 | As of kernel 2.5.66 the code and this (new) documentation | 305 | As of kernel 2.5.66 the code and this (new) documentation |
306 | need to be cross-reviewed. | 306 | need to be cross-reviewed. |
307 | </para> | 307 | </para> |
308 | </note> | 308 | </note> |
309 | 309 | ||
310 | <para>Configure usbfs into Linux kernels by enabling the | 310 | <para>Configure usbfs into Linux kernels by enabling the |
311 | <emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS), | 311 | <emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS), |
312 | and you get basic support for user mode USB device drivers. | 312 | and you get basic support for user mode USB device drivers. |
313 | Until relatively recently it was often (confusingly) called | 313 | Until relatively recently it was often (confusingly) called |
314 | <emphasis>usbdevfs</emphasis> although it wasn't solving what | 314 | <emphasis>usbdevfs</emphasis> although it wasn't solving what |
315 | <emphasis>devfs</emphasis> was. | 315 | <emphasis>devfs</emphasis> was. |
316 | Every USB device will appear in usbfs, regardless of whether or | 316 | Every USB device will appear in usbfs, regardless of whether or |
317 | not it has a kernel driver. | 317 | not it has a kernel driver. |
318 | </para> | 318 | </para> |
319 | 319 | ||
320 | <sect1> | 320 | <sect1> |
321 | <title>What files are in "usbfs"?</title> | 321 | <title>What files are in "usbfs"?</title> |
322 | 322 | ||
323 | <para>Conventionally mounted at | 323 | <para>Conventionally mounted at |
324 | <filename>/proc/bus/usb</filename>, usbfs | 324 | <filename>/proc/bus/usb</filename>, usbfs |
325 | features include: | 325 | features include: |
326 | <itemizedlist> | 326 | <itemizedlist> |
327 | <listitem><para><filename>/proc/bus/usb/devices</filename> | 327 | <listitem><para><filename>/proc/bus/usb/devices</filename> |
328 | ... a text file | 328 | ... a text file |
329 | showing each of the USB devices known to the kernel, | 329 | showing each of the USB devices known to the kernel, |
330 | and their configuration descriptors. | 330 | and their configuration descriptors. |
331 | You can also poll() this to learn about new devices. | 331 | You can also poll() this to learn about new devices. |
332 | </para></listitem> | 332 | </para></listitem> |
333 | <listitem><para><filename>/proc/bus/usb/BBB/DDD</filename> | 333 | <listitem><para><filename>/proc/bus/usb/BBB/DDD</filename> |
334 | ... magic files | 334 | ... magic files |
335 | exposing each device's configuration descriptors, and | 335 | exposing each device's configuration descriptors, and |
336 | supporting a series of ioctls for making device requests, | 336 | supporting a series of ioctls for making device requests, |
337 | including I/O to devices. (Purely for access by programs.) | 337 | including I/O to devices. (Purely for access by programs.) |
338 | </para></listitem> | 338 | </para></listitem> |
339 | </itemizedlist> | 339 | </itemizedlist> |
340 | </para> | 340 | </para> |
341 | 341 | ||
342 | <para> Each bus is given a number (BBB) based on when it was | 342 | <para> Each bus is given a number (BBB) based on when it was |
343 | enumerated; within each bus, each device is given a similar | 343 | enumerated; within each bus, each device is given a similar |
344 | number (DDD). | 344 | number (DDD). |
345 | Those BBB/DDD paths are not "stable" identifiers; | 345 | Those BBB/DDD paths are not "stable" identifiers; |
346 | expect them to change even if you always leave the devices | 346 | expect them to change even if you always leave the devices |
347 | plugged in to the same hub port. | 347 | plugged in to the same hub port. |
348 | <emphasis>Don't even think of saving these in application | 348 | <emphasis>Don't even think of saving these in application |
349 | configuration files.</emphasis> | 349 | configuration files.</emphasis> |
350 | Stable identifiers are available, for user mode applications | 350 | Stable identifiers are available, for user mode applications |
351 | that want to use them. HID and networking devices expose | 351 | that want to use them. HID and networking devices expose |
352 | these stable IDs, so that for example you can be sure that | 352 | these stable IDs, so that for example you can be sure that |
353 | you told the right UPS to power down its second server. | 353 | you told the right UPS to power down its second server. |
354 | "usbfs" doesn't (yet) expose those IDs. | 354 | "usbfs" doesn't (yet) expose those IDs. |
355 | </para> | 355 | </para> |
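Because the BBB/DDD names are just enumeration numbers, a program can parse them at scan time but should not persist them. A minimal sketch follows; the helper name usbfs_parse_path is hypothetical, and the conventional /proc/bus/usb mount point is assumed.

```c
#include <stdio.h>

/* Hypothetical helper: split a usbfs device path such as
 * "/proc/bus/usb/001/004" into its bus (BBB) and device (DDD) numbers.
 * Assumes the conventional /proc/bus/usb mount point.
 * Returns 0 on success, -1 if the path does not have that shape. */
int usbfs_parse_path(const char *path, unsigned *bus, unsigned *dev)
{
    unsigned b, d;
    int consumed = 0;

    /* %n records how many characters were consumed so far. */
    if (sscanf(path, "/proc/bus/usb/%3u/%3u%n", &b, &d, &consumed) != 2)
        return -1;
    if (path[consumed] != '\0')
        return -1;              /* trailing characters: not a BBB/DDD name */

    *bus = b;
    *dev = d;
    return 0;
}
```

The numbers obtained this way are only meaningful until the next re-enumeration, so they belong in run-time state rather than in configuration files.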
356 | 356 | ||
357 | </sect1> | 357 | </sect1> |
358 | 358 | ||
359 | <sect1> | 359 | <sect1> |
360 | <title>Mounting and Access Control</title> | 360 | <title>Mounting and Access Control</title> |
361 | 361 | ||
362 | <para>There are a number of mount options for usbfs, which will | 362 | <para>There are a number of mount options for usbfs, which will |
363 | be of most interest to you if you need to override the default | 363 | be of most interest to you if you need to override the default |
364 | access control policy. | 364 | access control policy. |
365 | That policy is that only root may read or write device files | 365 | That policy is that only root may read or write device files |
366 | (<filename>/proc/bus/usb/BBB/DDD</filename>) although anyone may read | 366 | (<filename>/proc/bus/usb/BBB/DDD</filename>) although anyone may read |
367 | the <filename>devices</filename> | 367 | the <filename>devices</filename> |
368 | or <filename>drivers</filename> files. | 368 | or <filename>drivers</filename> files. |
369 | I/O requests to the device also need the CAP_SYS_RAWIO capability. | 369 | I/O requests to the device also need the CAP_SYS_RAWIO capability. |
370 | </para> | 370 | </para> |
371 | 371 | ||
372 | <para>The significance of that is that by default, all user mode | 372 | <para>The significance of that is that by default, all user mode |
373 | device drivers need super-user privileges. | 373 | device drivers need super-user privileges. |
374 | You can change modes or ownership in a driver setup | 374 | You can change modes or ownership in a driver setup |
375 | when the device hotplugs, or maybe just start the | 375 | when the device hotplugs, or maybe just start the |
376 | driver right then, as a privileged server (or some activity | 376 | driver right then, as a privileged server (or some activity |
377 | within one). | 377 | within one). |
378 | That's the most secure approach for multi-user systems, | 378 | That's the most secure approach for multi-user systems, |
379 | but for single user systems ("trusted" by that user) | 379 | but for single user systems ("trusted" by that user) |
380 | it's more convenient just to grant everyone all access | 380 | it's more convenient just to grant everyone all access |
381 | (using the <emphasis>devmode=0666</emphasis> option) | 381 | (using the <emphasis>devmode=0666</emphasis> option) |
382 | so the driver can start whenever it's needed. | 382 | so the driver can start whenever it's needed. |
383 | </para> | 383 | </para> |
384 | 384 | ||
385 | <para>The mount options for usbfs, usable in /etc/fstab or | 385 | <para>The mount options for usbfs, usable in /etc/fstab or |
386 | in command line invocations of <emphasis>mount</emphasis>, are: | 386 | in command line invocations of <emphasis>mount</emphasis>, are: |
387 | 387 | ||
388 | <variablelist> | 388 | <variablelist> |
389 | <varlistentry> | 389 | <varlistentry> |
390 | <term><emphasis>busgid</emphasis>=NNNNN</term> | 390 | <term><emphasis>busgid</emphasis>=NNNNN</term> |
391 | <listitem><para>Controls the GID used for the | 391 | <listitem><para>Controls the GID used for the |
392 | /proc/bus/usb/BBB | 392 | /proc/bus/usb/BBB |
393 | directories. (Default: 0)</para></listitem></varlistentry> | 393 | directories. (Default: 0)</para></listitem></varlistentry> |
394 | <varlistentry><term><emphasis>busmode</emphasis>=MMM</term> | 394 | <varlistentry><term><emphasis>busmode</emphasis>=MMM</term> |
395 | <listitem><para>Controls the file mode used for the | 395 | <listitem><para>Controls the file mode used for the |
396 | /proc/bus/usb/BBB | 396 | /proc/bus/usb/BBB |
397 | directories. (Default: 0555) | 397 | directories. (Default: 0555) |
398 | </para></listitem></varlistentry> | 398 | </para></listitem></varlistentry> |
399 | <varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term> | 399 | <varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term> |
400 | <listitem><para>Controls the UID used for the | 400 | <listitem><para>Controls the UID used for the |
401 | /proc/bus/usb/BBB | 401 | /proc/bus/usb/BBB |
402 | directories. (Default: 0)</para></listitem></varlistentry> | 402 | directories. (Default: 0)</para></listitem></varlistentry> |
403 | 403 | ||
404 | <varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term> | 404 | <varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term> |
405 | <listitem><para>Controls the GID used for the | 405 | <listitem><para>Controls the GID used for the |
406 | /proc/bus/usb/BBB/DDD | 406 | /proc/bus/usb/BBB/DDD |
407 | files. (Default: 0)</para></listitem></varlistentry> | 407 | files. (Default: 0)</para></listitem></varlistentry> |
408 | <varlistentry><term><emphasis>devmode</emphasis>=MMM</term> | 408 | <varlistentry><term><emphasis>devmode</emphasis>=MMM</term> |
409 | <listitem><para>Controls the file mode used for the | 409 | <listitem><para>Controls the file mode used for the |
410 | /proc/bus/usb/BBB/DDD | 410 | /proc/bus/usb/BBB/DDD |
411 | files. (Default: 0644)</para></listitem></varlistentry> | 411 | files. (Default: 0644)</para></listitem></varlistentry> |
412 | <varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term> | 412 | <varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term> |
413 | <listitem><para>Controls the UID used for the | 413 | <listitem><para>Controls the UID used for the |
414 | /proc/bus/usb/BBB/DDD | 414 | /proc/bus/usb/BBB/DDD |
415 | files. (Default: 0)</para></listitem></varlistentry> | 415 | files. (Default: 0)</para></listitem></varlistentry> |
416 | 416 | ||
417 | <varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term> | 417 | <varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term> |
418 | <listitem><para>Controls the GID used for the | 418 | <listitem><para>Controls the GID used for the |
419 | /proc/bus/usb/devices and drivers files. | 419 | /proc/bus/usb/devices and drivers files. |
420 | (Default: 0)</para></listitem></varlistentry> | 420 | (Default: 0)</para></listitem></varlistentry> |
421 | <varlistentry><term><emphasis>listmode</emphasis>=MMM</term> | 421 | <varlistentry><term><emphasis>listmode</emphasis>=MMM</term> |
422 | <listitem><para>Controls the file mode used for the | 422 | <listitem><para>Controls the file mode used for the |
423 | /proc/bus/usb/devices and drivers files. | 423 | /proc/bus/usb/devices and drivers files. |
424 | (Default: 0444)</para></listitem></varlistentry> | 424 | (Default: 0444)</para></listitem></varlistentry> |
425 | <varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term> | 425 | <varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term> |
426 | <listitem><para>Controls the UID used for the | 426 | <listitem><para>Controls the UID used for the |
427 | /proc/bus/usb/devices and drivers files. | 427 | /proc/bus/usb/devices and drivers files. |
428 | (Default: 0)</para></listitem></varlistentry> | 428 | (Default: 0)</para></listitem></varlistentry> |
429 | </variablelist> | 429 | </variablelist> |
430 | 430 | ||
431 | </para> | 431 | </para> |
432 | 432 | ||
433 | <para>Note that many Linux distributions hard-wire the mount options | 433 | <para>Note that many Linux distributions hard-wire the mount options |
434 | for usbfs in their init scripts, such as | 434 | for usbfs in their init scripts, such as |
435 | <filename>/etc/rc.d/rc.sysinit</filename>, | 435 | <filename>/etc/rc.d/rc.sysinit</filename>, |
436 | rather than making it easy to set this per-system | 436 | rather than making it easy to set this per-system |
437 | policy in <filename>/etc/fstab</filename>. | 437 | policy in <filename>/etc/fstab</filename>. |
438 | </para> | 438 | </para> |
439 | 439 | ||
440 | </sect1> | 440 | </sect1> |
441 | 441 | ||
442 | <sect1> | 442 | <sect1> |
443 | <title>/proc/bus/usb/devices</title> | 443 | <title>/proc/bus/usb/devices</title> |
444 | 444 | ||
445 | <para>This file is handy for status viewing tools in user | 445 | <para>This file is handy for status viewing tools in user |
446 | mode, which can scan the text format and ignore most of it. | 446 | mode, which can scan the text format and ignore most of it. |
447 | More detailed device status (including class and vendor | 447 | More detailed device status (including class and vendor |
448 | status) is available from device-specific files. | 448 | status) is available from device-specific files. |
449 | For information about the current format of this file, | 449 | For information about the current format of this file, |
450 | see the | 450 | see the |
451 | <filename>Documentation/usb/proc_usb_info.txt</filename> | 451 | <filename>Documentation/usb/proc_usb_info.txt</filename> |
452 | file in your Linux kernel sources. | 452 | file in your Linux kernel sources. |
453 | </para> | 453 | </para> |
454 | 454 | ||
455 | <para>This file, in combination with the poll() system call, can | 455 | <para>This file, in combination with the poll() system call, can |
456 | also be used to detect when devices are added or removed: | 456 | also be used to detect when devices are added or removed: |
457 | <programlisting>int fd; | 457 | <programlisting>int fd; |
458 | struct pollfd pfd; | 458 | struct pollfd pfd; |
459 | 459 | ||
460 | fd = open("/proc/bus/usb/devices", O_RDONLY); | 460 | fd = open("/proc/bus/usb/devices", O_RDONLY); |
461 | pfd = (struct pollfd) { .fd = fd, .events = POLLIN }; | 461 | pfd = (struct pollfd) { .fd = fd, .events = POLLIN }; |
462 | for (;;) { | 462 | for (;;) { |
463 | /* The first time through, this call will return immediately. */ | 463 | /* The first time through, this call will return immediately. */ |
464 | poll(&pfd, 1, -1); | 464 | poll(&pfd, 1, -1); |
465 | 465 | ||
466 | /* To see what's changed, compare the file's previous and current | 466 | /* To see what's changed, compare the file's previous and current |
467 | contents or scan the filesystem. (Scanning is more precise.) */ | 467 | contents or scan the filesystem. (Scanning is more precise.) */ |
468 | }</programlisting> | 468 | }</programlisting> |
469 | Note that this behavior is intended to be used for informational | 469 | Note that this behavior is intended to be used for informational |
470 | and debug purposes. It would be more appropriate to use programs | 470 | and debug purposes. It would be more appropriate to use programs |
471 | such as udev or HAL to initialize a device or start a user-mode | 471 | such as udev or HAL to initialize a device or start a user-mode |
472 | helper program, for instance. | 472 | helper program, for instance. |
473 | </para> | 473 | </para> |
474 | </sect1> | 474 | </sect1> |
475 | 475 | ||
476 | <sect1> | 476 | <sect1> |
477 | <title>/proc/bus/usb/BBB/DDD</title> | 477 | <title>/proc/bus/usb/BBB/DDD</title> |
478 | 478 | ||
479 | <para>Use these files in one of these basic ways: | 479 | <para>Use these files in one of these basic ways: |
480 | </para> | 480 | </para> |
481 | 481 | ||
482 | <para><emphasis>They can be read,</emphasis> | 482 | <para><emphasis>They can be read,</emphasis> |
483 | producing first the device descriptor | 483 | producing first the device descriptor |
484 | (18 bytes) and then the descriptors for the current configuration. | 484 | (18 bytes) and then the descriptors for the current configuration. |
485 | See the USB 2.0 spec for details about those binary data formats. | 485 | See the USB 2.0 spec for details about those binary data formats. |
486 | You'll need to convert most multibyte values from little endian | 486 | You'll need to convert most multibyte values from little endian |
487 | format to your native host byte order, although a few of the | 487 | format to your native host byte order, although a few of the |
488 | fields in the device descriptor (both of the BCD-encoded fields, | 488 | fields in the device descriptor (both of the BCD-encoded fields, |
489 | and the vendor and product IDs) will be byteswapped for you. | 489 | and the vendor and product IDs) will be byteswapped for you. |
490 | Note that configuration descriptors include descriptors for | 490 | Note that configuration descriptors include descriptors for |
491 | interfaces, altsettings, endpoints, and maybe additional | 491 | interfaces, altsettings, endpoints, and maybe additional |
492 | class descriptors. | 492 | class descriptors. |
493 | </para> | 493 | </para> |
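The data read from a device file is a sequence of descriptors, and in the USB descriptor format each one begins with a bLength byte followed by a bDescriptorType byte. The sketch below walks a buffer that is assumed to hold the 18-byte device descriptor followed by the current configuration's descriptors. The function names are hypothetical; the caller supplies the descriptor type (for example 4 for interface, 5 for endpoint).

```c
#include <stddef.h>
#include <stdint.h>

#define DEVICE_DESC_LEN 18      /* size of the device descriptor */

/* Decode a 16-bit little-endian field regardless of host byte order. */
uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: count the descriptors of one bDescriptorType in
 * the configuration data that follows the device descriptor.
 * 'buf' and 'len' describe the bytes read from the device file.
 * Returns the count, or -1 if the data is truncated or malformed. */
int count_descriptors(const uint8_t *buf, size_t len, uint8_t type)
{
    size_t pos = DEVICE_DESC_LEN;
    int count = 0;

    if (len < DEVICE_DESC_LEN)
        return -1;

    while (pos < len) {
        uint8_t blen;

        if (len - pos < 2)
            return -1;          /* no room for bLength + bDescriptorType */
        blen = buf[pos];
        if (blen < 2 || blen > len - pos)
            return -1;          /* descriptor overruns the buffer */
        if (buf[pos + 1] == type)
            count++;
        pos += blen;
    }
    return count;
}
```

Fields such as wTotalLength in the configuration descriptor can be decoded with get_le16(), which works the same way on big and little endian hosts.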
494 | 494 | ||
495 | <para><emphasis>Perform USB operations</emphasis> using | 495 | <para><emphasis>Perform USB operations</emphasis> using |
496 | <emphasis>ioctl()</emphasis> requests to make endpoint I/O | 496 | <emphasis>ioctl()</emphasis> requests to make endpoint I/O |
497 | requests (synchronously or asynchronously) or manage | 497 | requests (synchronously or asynchronously) or manage |
498 | the device. | 498 | the device. |
499 | These requests need the CAP_SYS_RAWIO capability, | 499 | These requests need the CAP_SYS_RAWIO capability, |
500 | as well as filesystem access permissions. | 500 | as well as filesystem access permissions. |
501 | Only one ioctl request can be made on one of these | 501 | Only one ioctl request can be made on one of these |
502 | device files at a time. | 502 | device files at a time. |
503 | This means that if you are synchronously reading an endpoint | 503 | This means that if you are synchronously reading an endpoint |
504 | from one thread, you won't be able to write to a different | 504 | from one thread, you won't be able to write to a different |
505 | endpoint from another thread until the read completes. | 505 | endpoint from another thread until the read completes. |
506 | This works for <emphasis>half duplex</emphasis> protocols, | 506 | This works for <emphasis>half duplex</emphasis> protocols, |
507 | but otherwise you'd use asynchronous I/O requests. | 507 | but otherwise you'd use asynchronous I/O requests. |
508 | </para> | 508 | </para> |
509 | 509 | ||
510 | </sect1> | 510 | </sect1> |
511 | 511 | ||
512 | 512 | ||
513 | <sect1> | 513 | <sect1> |
514 | <title>Life Cycle of User Mode Drivers</title> | 514 | <title>Life Cycle of User Mode Drivers</title> |
515 | 515 | ||
516 | <para>Such a driver first needs to find a device file | 516 | <para>Such a driver first needs to find a device file |
517 | for a device it knows how to handle. | 517 | for a device it knows how to handle. |
518 | Maybe it was told about it because a | 518 | Maybe it was told about it because a |
519 | <filename>/sbin/hotplug</filename> event handling agent | 519 | <filename>/sbin/hotplug</filename> event handling agent |
520 | chose that driver to handle the new device. | 520 | chose that driver to handle the new device. |
521 | Or maybe it's an application that scans all the | 521 | Or maybe it's an application that scans all the |
522 | /proc/bus/usb device files, and ignores most devices. | 522 | /proc/bus/usb device files, and ignores most devices. |
523 | In either case, it should <function>read()</function> all | 523 | In either case, it should <function>read()</function> all |
524 | the descriptors from the device file, | 524 | the descriptors from the device file, |
525 | and check them against what it knows how to handle. | 525 | and check them against what it knows how to handle. |
526 | It might just reject everything except a particular | 526 | It might just reject everything except a particular |
527 | vendor and product ID, or need a more complex policy. | 527 | vendor and product ID, or need a more complex policy. |
528 | </para> | 528 | </para> |
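The simplest matching policy, a single vendor and product ID, could look roughly like this. It is a sketch under stated assumptions: the buffer holds the device descriptor as read from the device file, idVendor and idProduct sit at byte offsets 8 and 10 as in the standard device descriptor layout, and both are already in host byte order as described earlier for usbfs reads. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_VENDOR_OFFSET    8   /* idVendor in the device descriptor */
#define ID_PRODUCT_OFFSET  10   /* idProduct in the device descriptor */

/* Hypothetical helper: return 1 if the device descriptor in 'desc'
 * carries the given vendor and product IDs, 0 otherwise.
 * Assumes both fields are already in host byte order. */
int device_matches(const uint8_t *desc, size_t len,
                   uint16_t vendor, uint16_t product)
{
    uint16_t v, p;

    if (len < 18)
        return 0;               /* shorter than a device descriptor */

    /* memcpy avoids unaligned access and aliasing problems. */
    memcpy(&v, desc + ID_VENDOR_OFFSET, sizeof v);
    memcpy(&p, desc + ID_PRODUCT_OFFSET, sizeof p);

    return v == vendor && p == product;
}
```

A driver with a more complex policy would typically extend the same check to class codes or interface descriptors rather than replace it.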
529 | 529 | ||
530 | <para>Never assume there will only be one such device | 530 | <para>Never assume there will only be one such device |
531 | on the system at a time! | 531 | on the system at a time! |
532 | If your code can't handle more than one device at | 532 | If your code can't handle more than one device at |
533 | a time, at least detect when there's more than one, and | 533 | a time, at least detect when there's more than one, and |
534 | have your users choose which device to use. | 534 | have your users choose which device to use. |
535 | </para> | 535 | </para> |
536 | 536 | ||
537 | <para>Once your user mode driver knows what device to use, | 537 | <para>Once your user mode driver knows what device to use, |
538 | it interacts with it in either of two styles. | 538 | it interacts with it in either of two styles. |
539 | The simple style is to make only control requests; some | 539 | The simple style is to make only control requests; some |
540 | devices don't need more complex interactions than those. | 540 | devices don't need more complex interactions than those. |
541 | (An example might be software using vendor-specific control | 541 | (An example might be software using vendor-specific control |
542 | requests for some initialization or configuration tasks, | 542 | requests for some initialization or configuration tasks, |
543 | with a kernel driver for the rest.) | 543 | with a kernel driver for the rest.) |
544 | </para> | 544 | </para> |
545 | 545 | ||
546 | <para>More likely, you need a more complex style driver: | 546 | <para>More likely, you need a more complex style driver: |
547 | one using non-control endpoints, reading or writing data | 547 | one using non-control endpoints, reading or writing data |
548 | and claiming exclusive use of an interface. | 548 | and claiming exclusive use of an interface. |
549 | <emphasis>Bulk</emphasis> transfers are easiest to use, | 549 | <emphasis>Bulk</emphasis> transfers are easiest to use, |
550 | but only their sibling <emphasis>interrupt</emphasis> transfers | 550 | but only their sibling <emphasis>interrupt</emphasis> transfers |
551 | work with low speed devices. | 551 | work with low speed devices. |
552 | Both interrupt and <emphasis>isochronous</emphasis> transfers | 552 | Both interrupt and <emphasis>isochronous</emphasis> transfers |
553 | offer service guarantees because their bandwidth is reserved. | 553 | offer service guarantees because their bandwidth is reserved. |
554 | Such "periodic" transfers are awkward to use through usbfs, | 554 | Such "periodic" transfers are awkward to use through usbfs, |
555 | unless you're using the asynchronous calls. However, interrupt | 555 | unless you're using the asynchronous calls. However, interrupt |
556 | transfers can also be used in a synchronous "one shot" style. | 556 | transfers can also be used in a synchronous "one shot" style. |
557 | </para> | 557 | </para> |
558 | 558 | ||
559 | <para>Your user-mode driver should never need to worry | 559 | <para>Your user-mode driver should never need to worry |
560 | about cleaning up request state when the device is | 560 | about cleaning up request state when the device is |
561 | disconnected, although it should close its open file | 561 | disconnected, although it should close its open file |
562 | descriptors as soon as it starts seeing the ENODEV | 562 | descriptors as soon as it starts seeing the ENODEV |
563 | errors. | 563 | errors. |
564 | </para> | 564 | </para> |
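One way to act on that advice is a small helper that a driver calls with the errno value saved from a failed request. This is only an illustrative sketch; the name usbfs_check_gone is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call with the errno value saved from a failed
 * request. If the device has gone away (ENODEV), close the descriptor
 * and mark it invalid so that no further requests are attempted.
 * Returns 1 if the device is gone, 0 for any other error. */
int usbfs_check_gone(int *fd, int err)
{
    if (err != ENODEV)
        return 0;

    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
    return 1;
}
```

Storing -1 in the descriptor makes a later accidental use fail with EBADF instead of touching an unrelated file that happens to reuse the number.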
565 | 565 | ||
566 | </sect1> | 566 | </sect1> |
567 | 567 | ||
568 | <sect1><title>The ioctl() Requests</title> | 568 | <sect1><title>The ioctl() Requests</title> |
569 | 569 | ||
570 | <para>To use these ioctls, you need to include the following | 570 | <para>To use these ioctls, you need to include the following |
571 | headers in your userspace program: | 571 | headers in your userspace program: |
572 | <programlisting>#include <linux/usb.h> | 572 | <programlisting>#include <linux/usb.h> |
573 | #include <linux/usbdevice_fs.h> | 573 | #include <linux/usbdevice_fs.h> |
574 | #include <asm/byteorder.h></programlisting> | 574 | #include <asm/byteorder.h></programlisting> |
575 | The standard USB device model requests, from "Chapter 9" of | 575 | The standard USB device model requests, from "Chapter 9" of |
576 | the USB 2.0 specification, are automatically included from | 576 | the USB 2.0 specification, are automatically included from |
577 | the <filename><linux/usb_ch9.h></filename> header. | 577 | the <filename><linux/usb_ch9.h></filename> header. |
578 | </para> | 578 | </para> |
579 | 579 | ||
580 | <para>Unless noted otherwise, the ioctl requests | 580 | <para>Unless noted otherwise, the ioctl requests |
581 | described here will | 581 | described here will |
582 | update the modification time on the usbfs file to which | 582 | update the modification time on the usbfs file to which |
583 | they are applied (unless they fail). | 583 | they are applied (unless they fail). |
584 | A return of zero indicates success; otherwise, a | 584 | A return of zero indicates success; otherwise, a |
585 | standard USB error code is returned. (These are | 585 | standard USB error code is returned. (These are |
586 | documented in | 586 | documented in |
587 | <filename>Documentation/usb/error-codes.txt</filename> | 587 | <filename>Documentation/usb/error-codes.txt</filename> |
588 | in your kernel sources.) | 588 | in your kernel sources.) |
589 | </para> | 589 | </para> |
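From user space, ioctl(2) reports failure by returning -1 and setting errno. A thin wrapper can fold that into a single return value, as in the sketch below. The name usbfs_request is hypothetical, and returning the error as a negative number is simply a convention chosen for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: issue one ioctl request on a usbfs device file.
 * Returns the (non-negative) ioctl result on success, or the negated
 * errno value on failure, so callers test a single integer. */
int usbfs_request(int fd, unsigned long request, void *arg)
{
    int rc = ioctl(fd, request, arg);

    if (rc < 0)
        return -errno;
    return rc;
}
```

A caller would pass one of the USBDEVFS_* request codes from <linux/usbdevice_fs.h> and compare a negative result against the codes listed in the error-codes document.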
590 | 590 | ||
591 | <para>Each of these files multiplexes access to several | 591 | <para>Each of these files multiplexes access to several |
592 | I/O streams, one per endpoint. | 592 | I/O streams, one per endpoint. |
593 | Each device has one control endpoint (endpoint zero) | 593 | Each device has one control endpoint (endpoint zero) |
594 | which supports a limited RPC style access. | 594 | which supports a limited RPC style access. |
595 | Devices are configured | 595 | Devices are configured |
596 | by khubd (in the kernel) setting a device-wide | 596 | by khubd (in the kernel) setting a device-wide |
597 | <emphasis>configuration</emphasis> that affects things | 597 | <emphasis>configuration</emphasis> that affects things |
598 | like power consumption and basic functionality. | 598 | like power consumption and basic functionality. |
599 | The endpoints are part of USB <emphasis>interfaces</emphasis>, | 599 | The endpoints are part of USB <emphasis>interfaces</emphasis>, |
600 | which may have <emphasis>altsettings</emphasis> | 600 | which may have <emphasis>altsettings</emphasis> |
601 | affecting things like which endpoints are available. | 601 | affecting things like which endpoints are available. |
602 | Many devices only have a single configuration and interface, | 602 | Many devices only have a single configuration and interface, |
603 | so drivers for them will ignore configurations and altsettings. | 603 | so drivers for them will ignore configurations and altsettings. |
604 | </para> | 604 | </para> |
605 | 605 | ||
606 | 606 | ||
607 | <sect2> | 607 | <sect2> |
608 | <title>Management/Status Requests</title> | 608 | <title>Management/Status Requests</title> |
609 | 609 | ||
610 | <para>A number of usbfs requests don't deal very directly | 610 | <para>A number of usbfs requests don't deal very directly |
611 | with device I/O. | 611 | with device I/O. |
612 | They mostly relate to device management and status. | 612 | They mostly relate to device management and status. |
613 | These are all synchronous requests. | 613 | These are all synchronous requests. |
614 | </para> | 614 | </para> |
615 | 615 | ||
616 | <variablelist> | 616 | <variablelist> |
617 | 617 | ||
618 | <varlistentry><term>USBDEVFS_CLAIMINTERFACE</term> | 618 | <varlistentry><term>USBDEVFS_CLAIMINTERFACE</term> |
619 | <listitem><para>This is used to force usbfs to | 619 | <listitem><para>This is used to force usbfs to |
620 | claim a specific interface, | 620 | claim a specific interface, |
621 | which has not previously been claimed by usbfs or any other | 621 | which has not previously been claimed by usbfs or any other |
622 | kernel driver. | 622 | kernel driver. |
623 | The ioctl parameter is an integer holding the number of | 623 | The ioctl parameter is an integer holding the number of |
624 | the interface (bInterfaceNumber from descriptor). | 624 | the interface (bInterfaceNumber from descriptor). |
625 | </para><para> | 625 | </para><para> |
626 | Note that if your driver doesn't claim an interface | 626 | Note that if your driver doesn't claim an interface |
627 | before trying to use one of its endpoints, and no | 627 | before trying to use one of its endpoints, and no |
628 | other driver has bound to it, then the interface is | 628 | other driver has bound to it, then the interface is |
629 | automatically claimed by usbfs. | 629 | automatically claimed by usbfs. |
630 | </para><para> | 630 | </para><para> |
631 | This claim will be released by a RELEASEINTERFACE ioctl, | 631 | This claim will be released by a RELEASEINTERFACE ioctl, |
632 | or by closing the file descriptor. | 632 | or by closing the file descriptor. |
633 | File modification time is not updated by this request. | 633 | File modification time is not updated by this request. |
634 | </para></listitem></varlistentry> | 634 | </para></listitem></varlistentry> |
635 | 635 | ||
636 | <varlistentry><term>USBDEVFS_CONNECTINFO</term> | 636 | <varlistentry><term>USBDEVFS_CONNECTINFO</term> |
637 | <listitem><para>Says whether the device is lowspeed. | 637 | <listitem><para>Says whether the device is lowspeed. |
638 | The ioctl parameter points to a structure like this: | 638 | The ioctl parameter points to a structure like this: |
639 | <programlisting>struct usbdevfs_connectinfo { | 639 | <programlisting>struct usbdevfs_connectinfo { |
640 | unsigned int devnum; | 640 | unsigned int devnum; |
641 | unsigned char slow; | 641 | unsigned char slow; |
642 | }; </programlisting> | 642 | }; </programlisting> |
643 | File modification time is not updated by this request. | 643 | File modification time is not updated by this request. |
644 | </para><para> | 644 | </para><para> |
645 | <emphasis>You can't tell whether a "not slow" | 645 | <emphasis>You can't tell whether a "not slow" |
646 | device is connected at high speed (480 MBit/sec) | 646 | device is connected at high speed (480 MBit/sec) |
647 | or just full speed (12 MBit/sec).</emphasis> | 647 | or just full speed (12 MBit/sec).</emphasis> |
648 | You should know the devnum value already; | 648 | You should know the devnum value already; |
649 | it's the DDD value of the device file name. | 649 | it's the DDD value of the device file name. |
650 | </para></listitem></varlistentry> | 650 | </para></listitem></varlistentry> |
651 | 651 | ||
652 | <varlistentry><term>USBDEVFS_GETDRIVER</term> | 652 | <varlistentry><term>USBDEVFS_GETDRIVER</term> |
653 | <listitem><para>Returns the name of the kernel driver | 653 | <listitem><para>Returns the name of the kernel driver |
654 | bound to a given interface (a string). Parameter | 654 | bound to a given interface (a string). Parameter |
655 | is a pointer to this structure, which is modified: | 655 | is a pointer to this structure, which is modified: |
656 | <programlisting>struct usbdevfs_getdriver { | 656 | <programlisting>struct usbdevfs_getdriver { |
657 | unsigned int interface; | 657 | unsigned int interface; |
658 | char driver[USBDEVFS_MAXDRIVERNAME + 1]; | 658 | char driver[USBDEVFS_MAXDRIVERNAME + 1]; |
659 | };</programlisting> | 659 | };</programlisting> |
660 | File modification time is not updated by this request. | 660 | File modification time is not updated by this request. |
661 | </para></listitem></varlistentry> | 661 | </para></listitem></varlistentry> |
662 | 662 | ||
663 | <varlistentry><term>USBDEVFS_IOCTL</term> | 663 | <varlistentry><term>USBDEVFS_IOCTL</term> |
664 | <listitem><para>Passes a request from userspace through | 664 | <listitem><para>Passes a request from userspace through |
665 | to a kernel driver that has an ioctl entry in the | 665 | to a kernel driver that has an ioctl entry in the |
666 | <emphasis>struct usb_driver</emphasis> it registered. | 666 | <emphasis>struct usb_driver</emphasis> it registered. |
667 | <programlisting>struct usbdevfs_ioctl { | 667 | <programlisting>struct usbdevfs_ioctl { |
668 | int ifno; | 668 | int ifno; |
669 | int ioctl_code; | 669 | int ioctl_code; |
670 | void *data; | 670 | void *data; |
671 | }; | 671 | }; |
672 | 672 | ||
673 | /* user mode call looks like this. | 673 | /* user mode call looks like this. |
674 | * 'request' becomes the driver-&gt;ioctl() 'code' parameter. | 674 | * 'request' becomes the driver-&gt;ioctl() 'code' parameter. |
675 | * the size of 'param' is encoded in 'request', and that data | 675 | * the size of 'param' is encoded in 'request', and that data |
676 | * is copied to or from the driver-&gt;ioctl() 'buf' parameter. | 676 | * is copied to or from the driver-&gt;ioctl() 'buf' parameter. |
677 | */ | 677 | */ |
678 | static int | 678 | static int |
679 | usbdev_ioctl (int fd, int ifno, unsigned request, void *param) | 679 | usbdev_ioctl (int fd, int ifno, unsigned request, void *param) |
680 | { | 680 | { |
681 | struct usbdevfs_ioctl wrapper; | 681 | struct usbdevfs_ioctl wrapper; |
682 | 682 | ||
683 | wrapper.ifno = ifno; | 683 | wrapper.ifno = ifno; |
684 | wrapper.ioctl_code = request; | 684 | wrapper.ioctl_code = request; |
685 | wrapper.data = param; | 685 | wrapper.data = param; |
686 | 686 | ||
687 | return ioctl (fd, USBDEVFS_IOCTL, &amp;wrapper); | 687 | return ioctl (fd, USBDEVFS_IOCTL, &amp;wrapper); |
688 | } </programlisting> | 688 | } </programlisting> |
689 | File modification time is not updated by this request. | 689 | File modification time is not updated by this request. |
690 | </para><para> | 690 | </para><para> |
691 | This request lets kernel drivers talk to user mode code | 691 | This request lets kernel drivers talk to user mode code |
692 | through filesystem operations even when they don't create | 692 | through filesystem operations even when they don't create |
693 | a charactor or block special device. | 693 | a charactor or block special device. |
694 | It's also been used to do things like ask devices what | 694 | It's also been used to do things like ask devices what |
695 | device special file should be used. | 695 | device special file should be used. |
696 | Two pre-defined ioctls are used | 696 | Two pre-defined ioctls are used |
697 | to disconnect and reconnect kernel drivers, so | 697 | to disconnect and reconnect kernel drivers, so |
698 | that user mode code can completely manage binding | 698 | that user mode code can completely manage binding |
699 | and configuration of devices. | 699 | and configuration of devices. |
700 | </para></listitem></varlistentry> | 700 | </para></listitem></varlistentry> |
701 | 701 | ||
702 | <varlistentry><term>USBDEVFS_RELEASEINTERFACE</term> | 702 | <varlistentry><term>USBDEVFS_RELEASEINTERFACE</term> |
703 | <listitem><para>This is used to release the claim usbfs | 703 | <listitem><para>This is used to release the claim usbfs |
704 | made on interface, either implicitly or because of a | 704 | made on interface, either implicitly or because of a |
705 | USBDEVFS_CLAIMINTERFACE call, before the file | 705 | USBDEVFS_CLAIMINTERFACE call, before the file |
706 | descriptor is closed. | 706 | descriptor is closed. |
707 | The ioctl parameter is an integer holding the number of | 707 | The ioctl parameter is an integer holding the number of |
708 | the interface (bInterfaceNumber from descriptor); | 708 | the interface (bInterfaceNumber from descriptor); |
709 | File modification time is not updated by this request. | 709 | File modification time is not updated by this request. |
710 | </para><warning><para> | 710 | </para><warning><para> |
711 | <emphasis>No security check is made to ensure | 711 | <emphasis>No security check is made to ensure |
712 | that the task which made the claim is the one | 712 | that the task which made the claim is the one |
713 | which is releasing it. | 713 | which is releasing it. |
714 | This means that user mode driver may interfere | 714 | This means that user mode driver may interfere |
715 | other ones. </emphasis> | 715 | other ones. </emphasis> |
716 | </para></warning></listitem></varlistentry> | 716 | </para></warning></listitem></varlistentry> |
717 | 717 | ||
718 | <varlistentry><term>USBDEVFS_RESETEP</term> | 718 | <varlistentry><term>USBDEVFS_RESETEP</term> |
719 | <listitem><para>Resets the data toggle value for an endpoint | 719 | <listitem><para>Resets the data toggle value for an endpoint |
720 | (bulk or interrupt) to DATA0. | 720 | (bulk or interrupt) to DATA0. |
721 | The ioctl parameter is an integer endpoint number | 721 | The ioctl parameter is an integer endpoint number |
722 | (1 to 15, as identified in the endpoint descriptor), | 722 | (1 to 15, as identified in the endpoint descriptor), |
723 | with USB_DIR_IN added if the device's endpoint sends | 723 | with USB_DIR_IN added if the device's endpoint sends |
724 | data to the host. | 724 | data to the host. |
725 | </para><warning><para> | 725 | </para><warning><para> |
726 | <emphasis>Avoid using this request. | 726 | <emphasis>Avoid using this request. |
727 | It should probably be removed.</emphasis> | 727 | It should probably be removed.</emphasis> |
728 | Using it typically means the device and driver will lose | 728 | Using it typically means the device and driver will lose |
729 | toggle synchronization. If you really lost synchronization, | 729 | toggle synchronization. If you really lost synchronization, |
730 | you likely need to completely handshake with the device, | 730 | you likely need to completely handshake with the device, |
731 | using a request like CLEAR_HALT | 731 | using a request like CLEAR_HALT |
732 | or SET_INTERFACE. | 732 | or SET_INTERFACE. |
733 | </para></warning></listitem></varlistentry> | 733 | </para></warning></listitem></varlistentry> |
734 | 734 | ||
735 | </variablelist> | 735 | </variablelist> |
736 | 736 | ||
737 | </sect2> | 737 | </sect2> |
738 | 738 | ||
739 | <sect2> | 739 | <sect2> |
740 | <title>Synchronous I/O Support</title> | 740 | <title>Synchronous I/O Support</title> |
741 | 741 | ||
742 | <para>Synchronous requests involve the kernel blocking | 742 | <para>Synchronous requests involve the kernel blocking |
743 | until until the user mode request completes, either by | 743 | until the user mode request completes, either by |
744 | finishing successfully or by reporting an error. | 744 | finishing successfully or by reporting an error. |
745 | In most cases this is the simplest way to use usbfs, | 745 | In most cases this is the simplest way to use usbfs, |
746 | although as noted above it does prevent performing I/O | 746 | although as noted above it does prevent performing I/O |
747 | to more than one endpoint at a time. | 747 | to more than one endpoint at a time. |
748 | </para> | 748 | </para> |
749 | 749 | ||
750 | <variablelist> | 750 | <variablelist> |
751 | 751 | ||
752 | <varlistentry><term>USBDEVFS_BULK</term> | 752 | <varlistentry><term>USBDEVFS_BULK</term> |
753 | <listitem><para>Issues a bulk read or write request to the | 753 | <listitem><para>Issues a bulk read or write request to the |
754 | device. | 754 | device. |
755 | The ioctl parameter is a pointer to this structure: | 755 | The ioctl parameter is a pointer to this structure: |
756 | <programlisting>struct usbdevfs_bulktransfer { | 756 | <programlisting>struct usbdevfs_bulktransfer { |
757 | unsigned int ep; | 757 | unsigned int ep; |
758 | unsigned int len; | 758 | unsigned int len; |
759 | unsigned int timeout; /* in milliseconds */ | 759 | unsigned int timeout; /* in milliseconds */ |
760 | void *data; | 760 | void *data; |
761 | };</programlisting> | 761 | };</programlisting> |
762 | </para><para>The "ep" value identifies a | 762 | </para><para>The "ep" value identifies a |
763 | bulk endpoint number (1 to 15, as identified in an endpoint | 763 | bulk endpoint number (1 to 15, as identified in an endpoint |
764 | descriptor), | 764 | descriptor), |
765 | masked with USB_DIR_IN when referring to an endpoint which | 765 | masked with USB_DIR_IN when referring to an endpoint which |
766 | sends data to the host from the device. | 766 | sends data to the host from the device. |
767 | The length of the data buffer is identified by "len"; | 767 | The length of the data buffer is identified by "len"; |
768 | Recent kernels support requests up to about 128KBytes. | 768 | Recent kernels support requests up to about 128KBytes. |
769 | <emphasis>FIXME say how read length is returned, | 769 | <emphasis>FIXME say how read length is returned, |
770 | and how short reads are handled.</emphasis>. | 770 | and how short reads are handled.</emphasis>. |
771 | </para></listitem></varlistentry> | 771 | </para></listitem></varlistentry> |
772 | 772 | ||
773 | <varlistentry><term>USBDEVFS_CLEAR_HALT</term> | 773 | <varlistentry><term>USBDEVFS_CLEAR_HALT</term> |
774 | <listitem><para>Clears endpoint halt (stall) and | 774 | <listitem><para>Clears endpoint halt (stall) and |
775 | resets the endpoint toggle. This is only | 775 | resets the endpoint toggle. This is only |
776 | meaningful for bulk or interrupt endpoints. | 776 | meaningful for bulk or interrupt endpoints. |
777 | The ioctl parameter is an integer endpoint number | 777 | The ioctl parameter is an integer endpoint number |
778 | (1 to 15, as identified in an endpoint descriptor), | 778 | (1 to 15, as identified in an endpoint descriptor), |
779 | masked with USB_DIR_IN when referring to an endpoint which | 779 | masked with USB_DIR_IN when referring to an endpoint which |
780 | sends data to the host from the device. | 780 | sends data to the host from the device. |
781 | </para><para> | 781 | </para><para> |
782 | Use this on bulk or interrupt endpoints which have | 782 | Use this on bulk or interrupt endpoints which have |
783 | stalled, returning <emphasis>-EPIPE</emphasis> status | 783 | stalled, returning <emphasis>-EPIPE</emphasis> status |
784 | to a data transfer request. | 784 | to a data transfer request. |
785 | Do not issue the control request directly, since | 785 | Do not issue the control request directly, since |
786 | that could invalidate the host's record of the | 786 | that could invalidate the host's record of the |
787 | data toggle. | 787 | data toggle. |
788 | </para></listitem></varlistentry> | 788 | </para></listitem></varlistentry> |
789 | 789 | ||
790 | <varlistentry><term>USBDEVFS_CONTROL</term> | 790 | <varlistentry><term>USBDEVFS_CONTROL</term> |
791 | <listitem><para>Issues a control request to the device. | 791 | <listitem><para>Issues a control request to the device. |
792 | The ioctl parameter points to a structure like this: | 792 | The ioctl parameter points to a structure like this: |
793 | <programlisting>struct usbdevfs_ctrltransfer { | 793 | <programlisting>struct usbdevfs_ctrltransfer { |
794 | __u8 bRequestType; | 794 | __u8 bRequestType; |
795 | __u8 bRequest; | 795 | __u8 bRequest; |
796 | __u16 wValue; | 796 | __u16 wValue; |
797 | __u16 wIndex; | 797 | __u16 wIndex; |
798 | __u16 wLength; | 798 | __u16 wLength; |
799 | __u32 timeout; /* in milliseconds */ | 799 | __u32 timeout; /* in milliseconds */ |
800 | void *data; | 800 | void *data; |
801 | };</programlisting> | 801 | };</programlisting> |
802 | </para><para> | 802 | </para><para> |
803 | The first eight bytes of this structure are the contents | 803 | The first eight bytes of this structure are the contents |
804 | of the SETUP packet to be sent to the device; see the | 804 | of the SETUP packet to be sent to the device; see the |
805 | USB 2.0 specification for details. | 805 | USB 2.0 specification for details. |
806 | The bRequestType value is composed by combining a | 806 | The bRequestType value is composed by combining a |
807 | USB_TYPE_* value, a USB_DIR_* value, and a | 807 | USB_TYPE_* value, a USB_DIR_* value, and a |
808 | USB_RECIP_* value (from | 808 | USB_RECIP_* value (from |
809 | <emphasis>&lt;linux/usb.h&gt;</emphasis>). | 809 | <emphasis>&lt;linux/usb.h&gt;</emphasis>). |
810 | If wLength is nonzero, it describes the length of the data | 810 | If wLength is nonzero, it describes the length of the data |
811 | buffer, which is either written to the device | 811 | buffer, which is either written to the device |
812 | (USB_DIR_OUT) or read from the device (USB_DIR_IN). | 812 | (USB_DIR_OUT) or read from the device (USB_DIR_IN). |
813 | </para><para> | 813 | </para><para> |
814 | At this writing, you can't transfer more than 4 KBytes | 814 | At this writing, you can't transfer more than 4 KBytes |
815 | of data to or from a device; usbfs has a limit, and | 815 | of data to or from a device; usbfs has a limit, and |
816 | some host controller drivers have a limit. | 816 | some host controller drivers have a limit. |
817 | (That's not usually a problem.) | 817 | (That's not usually a problem.) |
818 | <emphasis>Also</emphasis> there's no way to say it's | 818 | <emphasis>Also</emphasis> there's no way to say it's |
819 | not OK to get a short read back from the device. | 819 | not OK to get a short read back from the device. |
820 | </para></listitem></varlistentry> | 820 | </para></listitem></varlistentry> |
821 | 821 | ||
822 | <varlistentry><term>USBDEVFS_RESET</term> | 822 | <varlistentry><term>USBDEVFS_RESET</term> |
823 | <listitem><para>Does a USB level device reset. | 823 | <listitem><para>Does a USB level device reset. |
824 | The ioctl parameter is ignored. | 824 | The ioctl parameter is ignored. |
825 | After the reset, this rebinds all device interfaces. | 825 | After the reset, this rebinds all device interfaces. |
826 | File modification time is not updated by this request. | 826 | File modification time is not updated by this request. |
827 | </para><warning><para> | 827 | </para><warning><para> |
828 | <emphasis>Avoid using this call</emphasis> | 828 | <emphasis>Avoid using this call</emphasis> |
829 | until some usbcore bugs get fixed, | 829 | until some usbcore bugs get fixed, |
830 | since it does not fully synchronize device, interface, | 830 | since it does not fully synchronize device, interface, |
831 | and driver (not just usbfs) state. | 831 | and driver (not just usbfs) state. |
832 | </para></warning></listitem></varlistentry> | 832 | </para></warning></listitem></varlistentry> |
833 | 833 | ||
834 | <varlistentry><term>USBDEVFS_SETINTERFACE</term> | 834 | <varlistentry><term>USBDEVFS_SETINTERFACE</term> |
835 | <listitem><para>Sets the alternate setting for an | 835 | <listitem><para>Sets the alternate setting for an |
836 | interface. The ioctl parameter is a pointer to a | 836 | interface. The ioctl parameter is a pointer to a |
837 | structure like this: | 837 | structure like this: |
838 | <programlisting>struct usbdevfs_setinterface { | 838 | <programlisting>struct usbdevfs_setinterface { |
839 | unsigned int interface; | 839 | unsigned int interface; |
840 | unsigned int altsetting; | 840 | unsigned int altsetting; |
841 | }; </programlisting> | 841 | }; </programlisting> |
842 | File modification time is not updated by this request. | 842 | File modification time is not updated by this request. |
843 | </para><para> | 843 | </para><para> |
844 | Those struct members are from some interface descriptor | 844 | Those struct members are from some interface descriptor |
845 | applying to the current configuration. | 845 | applying to the current configuration. |
846 | The interface number is the bInterfaceNumber value, and | 846 | The interface number is the bInterfaceNumber value, and |
847 | the altsetting number is the bAlternateSetting value. | 847 | the altsetting number is the bAlternateSetting value. |
848 | (This resets each endpoint in the interface.) | 848 | (This resets each endpoint in the interface.) |
849 | </para></listitem></varlistentry> | 849 | </para></listitem></varlistentry> |
850 | 850 | ||
851 | <varlistentry><term>USBDEVFS_SETCONFIGURATION</term> | 851 | <varlistentry><term>USBDEVFS_SETCONFIGURATION</term> |
852 | <listitem><para>Issues the | 852 | <listitem><para>Issues the |
853 | <function>usb_set_configuration</function> call | 853 | <function>usb_set_configuration</function> call |
854 | for the device. | 854 | for the device. |
855 | The parameter is an integer holding the number of | 855 | The parameter is an integer holding the number of |
856 | a configuration (bConfigurationValue from descriptor). | 856 | a configuration (bConfigurationValue from descriptor). |
857 | File modification time is not updated by this request. | 857 | File modification time is not updated by this request. |
858 | </para><warning><para> | 858 | </para><warning><para> |
859 | <emphasis>Avoid using this call</emphasis> | 859 | <emphasis>Avoid using this call</emphasis> |
860 | until some usbcore bugs get fixed, | 860 | until some usbcore bugs get fixed, |
861 | since it does not fully synchronize device, interface, | 861 | since it does not fully synchronize device, interface, |
862 | and driver (not just usbfs) state. | 862 | and driver (not just usbfs) state. |
863 | </para></warning></listitem></varlistentry> | 863 | </para></warning></listitem></varlistentry> |
864 | 864 | ||
865 | </variablelist> | 865 | </variablelist> |
866 | </sect2> | 866 | </sect2> |
867 | 867 | ||
868 | <sect2> | 868 | <sect2> |
869 | <title>Asynchronous I/O Support</title> | 869 | <title>Asynchronous I/O Support</title> |
870 | 870 | ||
871 | <para>As mentioned above, there are situations where it may be | 871 | <para>As mentioned above, there are situations where it may be |
872 | important to initiate concurrent operations from user mode code. | 872 | important to initiate concurrent operations from user mode code. |
873 | This is particularly important for periodic transfers | 873 | This is particularly important for periodic transfers |
874 | (interrupt and isochronous), but it can be used for other | 874 | (interrupt and isochronous), but it can be used for other |
875 | kinds of USB requests too. | 875 | kinds of USB requests too. |
876 | In such cases, the asynchronous requests described here | 876 | In such cases, the asynchronous requests described here |
877 | are essential. Rather than submitting one request and having | 877 | are essential. Rather than submitting one request and having |
878 | the kernel block until it completes, the blocking is separate. | 878 | the kernel block until it completes, the blocking is separate. |
879 | </para> | 879 | </para> |
880 | 880 | ||
881 | <para>These requests are packaged into a structure that | 881 | <para>These requests are packaged into a structure that |
882 | resembles the URB used by kernel device drivers. | 882 | resembles the URB used by kernel device drivers. |
883 | (No POSIX Async I/O support here, sorry.) | 883 | (No POSIX Async I/O support here, sorry.) |
884 | It identifies the endpoint type (USBDEVFS_URB_TYPE_*), | 884 | It identifies the endpoint type (USBDEVFS_URB_TYPE_*), |
885 | endpoint (number, masked with USB_DIR_IN as appropriate), | 885 | endpoint (number, masked with USB_DIR_IN as appropriate), |
886 | buffer and length, and a user "context" value serving to | 886 | buffer and length, and a user "context" value serving to |
887 | uniquely identify each request. | 887 | uniquely identify each request. |
888 | (It's usually a pointer to per-request data.) | 888 | (It's usually a pointer to per-request data.) |
889 | Flags can modify requests (not as many as supported for | 889 | Flags can modify requests (not as many as supported for |
890 | kernel drivers). | 890 | kernel drivers). |
891 | </para> | 891 | </para> |
892 | 892 | ||
893 | <para>Each request can specify a realtime signal number | 893 | <para>Each request can specify a realtime signal number |
894 | (between SIGRTMIN and SIGRTMAX, inclusive) to request a | 894 | (between SIGRTMIN and SIGRTMAX, inclusive) to request a |
895 | signal be sent when the request completes. | 895 | signal be sent when the request completes. |
896 | </para> | 896 | </para> |
897 | 897 | ||
898 | <para>When usbfs returns these urbs, the status value | 898 | <para>When usbfs returns these urbs, the status value |
899 | is updated, and the buffer may have been modified. | 899 | is updated, and the buffer may have been modified. |
900 | Except for isochronous transfers, the actual_length is | 900 | Except for isochronous transfers, the actual_length is |
901 | updated to say how many bytes were transferred; if the | 901 | updated to say how many bytes were transferred; if the |
902 | USBDEVFS_URB_DISABLE_SPD flag is set | 902 | USBDEVFS_URB_DISABLE_SPD flag is set |
903 | ("short packets are not OK"), if fewer bytes were read | 903 | ("short packets are not OK"), if fewer bytes were read |
904 | than were requested then you get an error report. | 904 | than were requested then you get an error report. |
905 | </para> | 905 | </para> |
906 | 906 | ||
907 | <programlisting>struct usbdevfs_iso_packet_desc { | 907 | <programlisting>struct usbdevfs_iso_packet_desc { |
908 | unsigned int length; | 908 | unsigned int length; |
909 | unsigned int actual_length; | 909 | unsigned int actual_length; |
910 | unsigned int status; | 910 | unsigned int status; |
911 | }; | 911 | }; |
912 | 912 | ||
913 | struct usbdevfs_urb { | 913 | struct usbdevfs_urb { |
914 | unsigned char type; | 914 | unsigned char type; |
915 | unsigned char endpoint; | 915 | unsigned char endpoint; |
916 | int status; | 916 | int status; |
917 | unsigned int flags; | 917 | unsigned int flags; |
918 | void *buffer; | 918 | void *buffer; |
919 | int buffer_length; | 919 | int buffer_length; |
920 | int actual_length; | 920 | int actual_length; |
921 | int start_frame; | 921 | int start_frame; |
922 | int number_of_packets; | 922 | int number_of_packets; |
923 | int error_count; | 923 | int error_count; |
924 | unsigned int signr; | 924 | unsigned int signr; |
925 | void *usercontext; | 925 | void *usercontext; |
926 | struct usbdevfs_iso_packet_desc iso_frame_desc[]; | 926 | struct usbdevfs_iso_packet_desc iso_frame_desc[]; |
927 | };</programlisting> | 927 | };</programlisting> |
928 | 928 | ||
929 | <para> For these asynchronous requests, the file modification | 929 | <para> For these asynchronous requests, the file modification |
930 | time reflects when the request was initiated. | 930 | time reflects when the request was initiated. |
931 | This contrasts with their use with the synchronous requests, | 931 | This contrasts with their use with the synchronous requests, |
932 | where it reflects when requests complete. | 932 | where it reflects when requests complete. |
933 | </para> | 933 | </para> |
934 | 934 | ||
935 | <variablelist> | 935 | <variablelist> |
936 | 936 | ||
937 | <varlistentry><term>USBDEVFS_DISCARDURB</term> | 937 | <varlistentry><term>USBDEVFS_DISCARDURB</term> |
938 | <listitem><para> | 938 | <listitem><para> |
939 | <emphasis>TBS</emphasis> | 939 | <emphasis>TBS</emphasis> |
940 | File modification time is not updated by this request. | 940 | File modification time is not updated by this request. |
941 | </para><para> | 941 | </para><para> |
942 | </para></listitem></varlistentry> | 942 | </para></listitem></varlistentry> |
943 | 943 | ||
944 | <varlistentry><term>USBDEVFS_DISCSIGNAL</term> | 944 | <varlistentry><term>USBDEVFS_DISCSIGNAL</term> |
945 | <listitem><para> | 945 | <listitem><para> |
946 | <emphasis>TBS</emphasis> | 946 | <emphasis>TBS</emphasis> |
947 | File modification time is not updated by this request. | 947 | File modification time is not updated by this request. |
948 | </para><para> | 948 | </para><para> |
949 | </para></listitem></varlistentry> | 949 | </para></listitem></varlistentry> |
950 | 950 | ||
951 | <varlistentry><term>USBDEVFS_REAPURB</term> | 951 | <varlistentry><term>USBDEVFS_REAPURB</term> |
952 | <listitem><para> | 952 | <listitem><para> |
953 | <emphasis>TBS</emphasis> | 953 | <emphasis>TBS</emphasis> |
954 | File modification time is not updated by this request. | 954 | File modification time is not updated by this request. |
955 | </para><para> | 955 | </para><para> |
956 | </para></listitem></varlistentry> | 956 | </para></listitem></varlistentry> |
957 | 957 | ||
958 | <varlistentry><term>USBDEVFS_REAPURBNDELAY</term> | 958 | <varlistentry><term>USBDEVFS_REAPURBNDELAY</term> |
959 | <listitem><para> | 959 | <listitem><para> |
960 | <emphasis>TBS</emphasis> | 960 | <emphasis>TBS</emphasis> |
961 | File modification time is not updated by this request. | 961 | File modification time is not updated by this request. |
962 | </para><para> | 962 | </para><para> |
963 | </para></listitem></varlistentry> | 963 | </para></listitem></varlistentry> |
964 | 964 | ||
965 | <varlistentry><term>USBDEVFS_SUBMITURB</term> | 965 | <varlistentry><term>USBDEVFS_SUBMITURB</term> |
966 | <listitem><para> | 966 | <listitem><para> |
967 | <emphasis>TBS</emphasis> | 967 | <emphasis>TBS</emphasis> |
968 | </para><para> | 968 | </para><para> |
969 | </para></listitem></varlistentry> | 969 | </para></listitem></varlistentry> |
970 | 970 | ||
971 | </variablelist> | 971 | </variablelist> |
972 | </sect2> | 972 | </sect2> |
973 | 973 | ||
974 | </sect1> | 974 | </sect1> |
975 | 975 | ||
976 | </chapter> | 976 | </chapter> |
977 | 977 | ||
978 | </book> | 978 | </book> |
979 | <!-- vim:syntax=sgml:sw=4 | 979 | <!-- vim:syntax=sgml:sw=4 |
980 | --> | 980 | --> |
981 | 981 |
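
Reader's note on the usb.tmpl text above: the USBDEVFS_BULK, USBDEVFS_CLEAR_HALT and USBDEVFS_RESETEP entries describe an endpoint number (1 to 15) "masked with USB_DIR_IN", and the USBDEVFS_CONTROL entry describes bRequestType as a combination of USB_DIR_*, USB_TYPE_* and USB_RECIP_* values. The sketch below shows one way to build both values, reading "masked" as a bitwise OR. It is an illustration, not code from the commit: the helper names ep_address() and request_type() are hypothetical, and the constants are copied in locally (with the values the kernel's USB headers are assumed to define) so that the block compiles without those headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: these mirror the values in the kernel's USB headers.
 * Real code should include the kernel headers instead of redefining them. */
#define USB_DIR_OUT          0x00        /* host to device */
#define USB_DIR_IN           0x80        /* device to host */
#define USB_TYPE_STANDARD    (0x00 << 5)
#define USB_TYPE_CLASS       (0x01 << 5)
#define USB_TYPE_VENDOR      (0x02 << 5)
#define USB_RECIP_DEVICE     0x00
#define USB_RECIP_INTERFACE  0x01
#define USB_RECIP_ENDPOINT   0x02

/* Hypothetical helper: build the "ep" value used by USBDEVFS_BULK and
 * friends from an endpoint number (1 to 15) and a direction flag. */
static unsigned int ep_address(unsigned int number, int device_to_host)
{
    assert(number >= 1 && number <= 15);
    return device_to_host ? (number | USB_DIR_IN) : number;
}

/* Hypothetical helper: combine direction, type and recipient into the
 * bRequestType byte of a control request. */
static uint8_t request_type(uint8_t dir, uint8_t type, uint8_t recip)
{
    return (uint8_t)(dir | type | recip);
}
```

For example, ep_address(2, 1) describes IN endpoint 2, and request_type(USB_DIR_IN, USB_TYPE_STANDARD, USB_RECIP_DEVICE) is the bRequestType of a standard device-level read such as GET_DESCRIPTOR. The results would be stored in the "ep" member of struct usbdevfs_bulktransfer and the bRequestType member of struct usbdevfs_ctrltransfer before the corresponding ioctl() call.
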
Documentation/RCU/whatisRCU.txt
1 | What is RCU? | 1 | What is RCU? |
2 | 2 | ||
3 | RCU is a synchronization mechanism that was added to the Linux kernel | 3 | RCU is a synchronization mechanism that was added to the Linux kernel |
4 | during the 2.5 development effort that is optimized for read-mostly | 4 | during the 2.5 development effort that is optimized for read-mostly |
5 | situations. Although RCU is actually quite simple once you understand it, | 5 | situations. Although RCU is actually quite simple once you understand it, |
6 | getting there can sometimes be a challenge. Part of the problem is that | 6 | getting there can sometimes be a challenge. Part of the problem is that |
7 | most of the past descriptions of RCU have been written with the mistaken | 7 | most of the past descriptions of RCU have been written with the mistaken |
8 | assumption that there is "one true way" to describe RCU. Instead, | 8 | assumption that there is "one true way" to describe RCU. Instead, |
9 | the experience has been that different people must take different paths | 9 | the experience has been that different people must take different paths |
10 | to arrive at an understanding of RCU. This document provides several | 10 | to arrive at an understanding of RCU. This document provides several |
11 | different paths, as follows: | 11 | different paths, as follows: |
12 | 12 | ||
13 | 1. RCU OVERVIEW | 13 | 1. RCU OVERVIEW |
14 | 2. WHAT IS RCU'S CORE API? | 14 | 2. WHAT IS RCU'S CORE API? |
15 | 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? | 15 | 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? |
16 | 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? | 16 | 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? |
17 | 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? | 17 | 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? |
18 | 6. ANALOGY WITH READER-WRITER LOCKING | 18 | 6. ANALOGY WITH READER-WRITER LOCKING |
19 | 7. FULL LIST OF RCU APIs | 19 | 7. FULL LIST OF RCU APIs |
20 | 8. ANSWERS TO QUICK QUIZZES | 20 | 8. ANSWERS TO QUICK QUIZZES |
21 | 21 | ||
22 | People who prefer starting with a conceptual overview should focus on | 22 | People who prefer starting with a conceptual overview should focus on |
23 | Section 1, though most readers will profit by reading this section at | 23 | Section 1, though most readers will profit by reading this section at |
24 | some point. People who prefer to start with an API that they can then | 24 | some point. People who prefer to start with an API that they can then |
25 | experiment with should focus on Section 2. People who prefer to start | 25 | experiment with should focus on Section 2. People who prefer to start |
26 | with example uses should focus on Sections 3 and 4. People who need to | 26 | with example uses should focus on Sections 3 and 4. People who need to |
27 | understand the RCU implementation should focus on Section 5, then dive | 27 | understand the RCU implementation should focus on Section 5, then dive |
28 | into the kernel source code. People who reason best by analogy should | 28 | into the kernel source code. People who reason best by analogy should |
29 | focus on Section 6. Section 7 serves as an index to the docbook API | 29 | focus on Section 6. Section 7 serves as an index to the docbook API |
30 | documentation, and Section 8 is the traditional answer key. | 30 | documentation, and Section 8 is the traditional answer key. |
31 | 31 | ||
32 | So, start with the section that makes the most sense to you and your | 32 | So, start with the section that makes the most sense to you and your |
33 | preferred method of learning. If you need to know everything about | 33 | preferred method of learning. If you need to know everything about |
34 | everything, feel free to read the whole thing -- but if you are really | 34 | everything, feel free to read the whole thing -- but if you are really |
35 | that type of person, you have perused the source code and will therefore | 35 | that type of person, you have perused the source code and will therefore |
36 | never need this document anyway. ;-) | 36 | never need this document anyway. ;-) |
37 | 37 | ||
38 | 38 | ||
39 | 1. RCU OVERVIEW | 39 | 1. RCU OVERVIEW |
40 | 40 | ||
41 | The basic idea behind RCU is to split updates into "removal" and | 41 | The basic idea behind RCU is to split updates into "removal" and |
42 | "reclamation" phases. The removal phase removes references to data items | 42 | "reclamation" phases. The removal phase removes references to data items |
43 | within a data structure (possibly by replacing them with references to | 43 | within a data structure (possibly by replacing them with references to |
44 | new versions of these data items), and can run concurrently with readers. | 44 | new versions of these data items), and can run concurrently with readers. |
45 | The reason that it is safe to run the removal phase concurrently with | 45 | The reason that it is safe to run the removal phase concurrently with |
46 | readers is the semantics of modern CPUs guarantee that readers will see | 46 | readers is the semantics of modern CPUs guarantee that readers will see |
47 | either the old or the new version of the data structure rather than a | 47 | either the old or the new version of the data structure rather than a |
48 | partially updated reference. The reclamation phase does the work of reclaiming | 48 | partially updated reference. The reclamation phase does the work of reclaiming |
49 | (e.g., freeing) the data items removed from the data structure during the | 49 | (e.g., freeing) the data items removed from the data structure during the |
50 | removal phase. Because reclaiming data items can disrupt any readers | 50 | removal phase. Because reclaiming data items can disrupt any readers |
51 | concurrently referencing those data items, the reclamation phase must | 51 | concurrently referencing those data items, the reclamation phase must |
52 | not start until readers no longer hold references to those data items. | 52 | not start until readers no longer hold references to those data items. |
53 | 53 | ||
54 | Splitting the update into removal and reclamation phases permits the | 54 | Splitting the update into removal and reclamation phases permits the |
55 | updater to perform the removal phase immediately, and to defer the | 55 | updater to perform the removal phase immediately, and to defer the |
56 | reclamation phase until all readers active during the removal phase have | 56 | reclamation phase until all readers active during the removal phase have |
57 | completed, either by blocking until they finish or by registering a | 57 | completed, either by blocking until they finish or by registering a |
58 | callback that is invoked after they finish. Only readers that are active | 58 | callback that is invoked after they finish. Only readers that are active |
59 | during the removal phase need be considered, because any reader starting | 59 | during the removal phase need be considered, because any reader starting |
60 | after the removal phase will be unable to gain a reference to the removed | 60 | after the removal phase will be unable to gain a reference to the removed |
61 | data items, and therefore cannot be disrupted by the reclamation phase. | 61 | data items, and therefore cannot be disrupted by the reclamation phase. |
62 | 62 | ||
63 | So the typical RCU update sequence goes something like the following: | 63 | So the typical RCU update sequence goes something like the following: |
64 | 64 | ||
65 | a. Remove pointers to a data structure, so that subsequent | 65 | a. Remove pointers to a data structure, so that subsequent |
66 | readers cannot gain a reference to it. | 66 | readers cannot gain a reference to it. |
67 | 67 | ||
68 | b. Wait for all previous readers to complete their RCU read-side | 68 | b. Wait for all previous readers to complete their RCU read-side |
69 | critical sections. | 69 | critical sections. |
70 | 70 | ||
71 | c. At this point, there cannot be any readers who hold references | 71 | c. At this point, there cannot be any readers who hold references |
72 | to the data structure, so it now may safely be reclaimed | 72 | to the data structure, so it now may safely be reclaimed |
73 | (e.g., kfree()d). | 73 | (e.g., kfree()d). |
74 | 74 | ||
75 | Step (b) above is the key idea underlying RCU's deferred destruction. | 75 | Step (b) above is the key idea underlying RCU's deferred destruction. |
76 | The ability to wait until all readers are done allows RCU readers to | 76 | The ability to wait until all readers are done allows RCU readers to |
77 | use much lighter-weight synchronization; in some cases, absolutely no | 77 | use much lighter-weight synchronization; in some cases, absolutely no |
78 | synchronization at all. In contrast, in more conventional lock-based | 78 | synchronization at all. In contrast, in more conventional lock-based |
79 | schemes, readers must use heavy-weight synchronization in order to | 79 | schemes, readers must use heavy-weight synchronization in order to |
80 | prevent an updater from deleting the data structure out from under them. | 80 | prevent an updater from deleting the data structure out from under them. |
81 | This is because lock-based updaters typically update data items in place, | 81 | This is because lock-based updaters typically update data items in place, |
82 | and must therefore exclude readers. In contrast, RCU-based updaters | 82 | and must therefore exclude readers. In contrast, RCU-based updaters |
83 | typically take advantage of the fact that writes to single aligned | 83 | typically take advantage of the fact that writes to single aligned |
84 | pointers are atomic on modern CPUs, allowing atomic insertion, removal, | 84 | pointers are atomic on modern CPUs, allowing atomic insertion, removal, |
85 | and replacement of data items in a linked structure without disrupting | 85 | and replacement of data items in a linked structure without disrupting |
86 | readers. Concurrent RCU readers can then continue accessing the old | 86 | readers. Concurrent RCU readers can then continue accessing the old |
87 | versions, and can dispense with the atomic operations, memory barriers, | 87 | versions, and can dispense with the atomic operations, memory barriers, |
88 | and communications cache misses that are so expensive on present-day | 88 | and communications cache misses that are so expensive on present-day |
89 | SMP computer systems, even in the absence of lock contention. | 89 | SMP computer systems, even in the absence of lock contention. |
90 | 90 | ||
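The three-step sequence (a)-(c) listed earlier can be made concrete with a
small user-space sketch. This is a toy model under stated assumptions, not
the kernel's implementation: the names toy_read_begin(), toy_read_end(),
toy_remove(), and toy_reclaim() are hypothetical, and the shared reader
counter is exactly the kind of heavyweight reader-side synchronization that
real RCU avoids. It is used here only to make the ordering of removal,
waiting, and reclamation visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, NOT the kernel's RCU implementation. */
struct item {
        int value;
};

static _Atomic(struct item *) shared;   /* pointer that readers follow */
static atomic_int active_readers;       /* toy stand-in for reader tracking */

/* Reader entry: announce ourselves, then fetch the current pointer. */
static struct item *toy_read_begin(void)
{
        atomic_fetch_add(&active_readers, 1);
        return atomic_load(&shared);
}

/* Reader exit: the reader promises not to touch the item afterwards. */
static void toy_read_end(void)
{
        atomic_fetch_sub(&active_readers, 1);
}

/* Step (a): remove the pointer so that later readers cannot find the item. */
static struct item *toy_remove(void)
{
        return atomic_exchange(&shared, (struct item *)NULL);
}

/*
 * Steps (b) and (c): reclaim only once no reader is active.  Returns 1 if
 * the item was freed, 0 if a reader might still hold a reference.  A real
 * updater would block or register a callback instead of polling.
 */
static int toy_reclaim(struct item *old)
{
        if (atomic_load(&active_readers) != 0)
                return 0;
        free(old);
        return 1;
}
```

Note that this toy waits for every active reader, including ones that
started after the removal; as the text explains, only readers that were
active during the removal phase actually need to be waited for.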
91 | In the three-step procedure shown above, the updater is performing both | 91 | In the three-step procedure shown above, the updater is performing both |
92 | the removal and the reclamation step, but it is often helpful for an | 92 | the removal and the reclamation step, but it is often helpful for an |
93 | entirely different thread to do the reclamation, as is in fact the case | 93 | entirely different thread to do the reclamation, as is in fact the case |
94 | in the Linux kernel's directory-entry cache (dcache). Even if the same | 94 | in the Linux kernel's directory-entry cache (dcache). Even if the same |
95 | thread performs both the update step (step (a) above) and the reclamation | 95 | thread performs both the update step (step (a) above) and the reclamation |
96 | step (step (c) above), it is often helpful to think of them separately. | 96 | step (step (c) above), it is often helpful to think of them separately. |
97 | For example, RCU readers and updaters need not communicate at all, | 97 | For example, RCU readers and updaters need not communicate at all, |
98 | but RCU provides implicit low-overhead communication between readers | 98 | but RCU provides implicit low-overhead communication between readers |
99 | and reclaimers, namely, in step (b) above. | 99 | and reclaimers, namely, in step (b) above. |
100 | 100 | ||
101 | So how the heck can a reclaimer tell when a reader is done, given | 101 | So how the heck can a reclaimer tell when a reader is done, given |
102 | that readers are not doing any sort of synchronization operations??? | 102 | that readers are not doing any sort of synchronization operations??? |
103 | Read on to learn about how RCU's API makes this easy. | 103 | Read on to learn about how RCU's API makes this easy. |
104 | 104 | ||
105 | 105 | ||
106 | 2. WHAT IS RCU'S CORE API? | 106 | 2. WHAT IS RCU'S CORE API? |
107 | 107 | ||
108 | The core RCU API is quite small: | 108 | The core RCU API is quite small: |
109 | 109 | ||
110 | a. rcu_read_lock() | 110 | a. rcu_read_lock() |
111 | b. rcu_read_unlock() | 111 | b. rcu_read_unlock() |
112 | c. synchronize_rcu() / call_rcu() | 112 | c. synchronize_rcu() / call_rcu() |
113 | d. rcu_assign_pointer() | 113 | d. rcu_assign_pointer() |
114 | e. rcu_dereference() | 114 | e. rcu_dereference() |
115 | 115 | ||
116 | There are many other members of the RCU API, but the rest can be | 116 | There are many other members of the RCU API, but the rest can be |
117 | expressed in terms of these five, though most implementations instead | 117 | expressed in terms of these five, though most implementations instead |
118 | express synchronize_rcu() in terms of the call_rcu() callback API. | 118 | express synchronize_rcu() in terms of the call_rcu() callback API. |
119 | 119 | ||
120 | The five core RCU APIs are described below; the other 18 will be enumerated | 120 | The five core RCU APIs are described below; the other 18 will be enumerated |
121 | later. See the kernel docbook documentation for more info, or look directly | 121 | later. See the kernel docbook documentation for more info, or look directly |
122 | at the function header comments. | 122 | at the function header comments. |
123 | 123 | ||
124 | rcu_read_lock() | 124 | rcu_read_lock() |
125 | 125 | ||
126 | void rcu_read_lock(void); | 126 | void rcu_read_lock(void); |
127 | 127 | ||
128 | Used by a reader to inform the reclaimer that the reader is | 128 | Used by a reader to inform the reclaimer that the reader is |
129 | entering an RCU read-side critical section. It is illegal | 129 | entering an RCU read-side critical section. It is illegal |
130 | to block while in an RCU read-side critical section, though | 130 | to block while in an RCU read-side critical section, though |
131 | kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side | 131 | kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side |
132 | critical sections. Any RCU-protected data structure accessed | 132 | critical sections. Any RCU-protected data structure accessed |
133 | during an RCU read-side critical section is guaranteed to remain | 133 | during an RCU read-side critical section is guaranteed to remain |
134 | unreclaimed for the full duration of that critical section. | 134 | unreclaimed for the full duration of that critical section. |
135 | Reference counts may be used in conjunction with RCU to maintain | 135 | Reference counts may be used in conjunction with RCU to maintain |
136 | longer-term references to data structures. | 136 | longer-term references to data structures. |
137 | 137 | ||
138 | rcu_read_unlock() | 138 | rcu_read_unlock() |
139 | 139 | ||
140 | void rcu_read_unlock(void); | 140 | void rcu_read_unlock(void); |
141 | 141 | ||
142 | Used by a reader to inform the reclaimer that the reader is | 142 | Used by a reader to inform the reclaimer that the reader is |
143 | exiting an RCU read-side critical section. Note that RCU | 143 | exiting an RCU read-side critical section. Note that RCU |
144 | read-side critical sections may be nested and/or overlapping. | 144 | read-side critical sections may be nested and/or overlapping. |
145 | 145 | ||
146 | synchronize_rcu() | 146 | synchronize_rcu() |
147 | 147 | ||
148 | void synchronize_rcu(void); | 148 | void synchronize_rcu(void); |
149 | 149 | ||
150 | Marks the end of updater code and the beginning of reclaimer | 150 | Marks the end of updater code and the beginning of reclaimer |
151 | code. It does this by blocking until all pre-existing RCU | 151 | code. It does this by blocking until all pre-existing RCU |
152 | read-side critical sections on all CPUs have completed. | 152 | read-side critical sections on all CPUs have completed. |
153 | Note that synchronize_rcu() will -not- necessarily wait for | 153 | Note that synchronize_rcu() will -not- necessarily wait for |
154 | any subsequent RCU read-side critical sections to complete. | 154 | any subsequent RCU read-side critical sections to complete. |
155 | For example, consider the following sequence of events: | 155 | For example, consider the following sequence of events: |
156 | 156 | ||
157 |     CPU 0               CPU 1                       CPU 2 | 157 |     CPU 0               CPU 1                       CPU 2 |
158 |     -----------------   -------------------------   --------------- | 158 |     -----------------   -------------------------   --------------- |
159 | 1.  rcu_read_lock() | 159 | 1.  rcu_read_lock() |
160 | 2.                      enters synchronize_rcu() | 160 | 2.                      enters synchronize_rcu() |
161 | 3.                                                  rcu_read_lock() | 161 | 3.                                                  rcu_read_lock() |
162 | 4.  rcu_read_unlock() | 162 | 4.  rcu_read_unlock() |
163 | 5.                      exits synchronize_rcu() | 163 | 5.                      exits synchronize_rcu() |
164 | 6.                                                  rcu_read_unlock() | 164 | 6.                                                  rcu_read_unlock() |
165 | 165 | ||
166 | To reiterate, synchronize_rcu() waits only for ongoing RCU | 166 | To reiterate, synchronize_rcu() waits only for ongoing RCU |
167 | read-side critical sections to complete, not necessarily for | 167 | read-side critical sections to complete, not necessarily for |
168 | any that begin after synchronize_rcu() is invoked. | 168 | any that begin after synchronize_rcu() is invoked. |
169 | 169 | ||
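The "pre-existing readers only" rule can be illustrated with a tiny
single-threaded simulation of the three-CPU timeline above. This is a
sketch under stated assumptions: toy_rcu_read_lock(), toy_rcu_read_unlock(),
toy_gp_start(), and toy_gp_done() are hypothetical names, nesting is not
modelled, and real implementations detect grace periods quite differently.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPUS 3

/* Toy model only: hypothetical names, NOT the kernel's RCU implementation. */
static bool in_reader[TOY_NCPUS];   /* CPU is inside a read-side section */
static bool waited_for[TOY_NCPUS];  /* readers the grace period must wait for */

static void toy_rcu_read_lock(int cpu)
{
        in_reader[cpu] = true;
}

static void toy_rcu_read_unlock(int cpu)
{
        in_reader[cpu] = false;
        waited_for[cpu] = false;    /* this reader no longer blocks anyone */
}

/* "Enters synchronize_rcu()": snapshot the readers that already exist. */
static void toy_gp_start(void)
{
        for (int i = 0; i < TOY_NCPUS; i++)
                waited_for[i] = in_reader[i];
}

/* True once every reader captured in the snapshot has finished. */
static bool toy_gp_done(void)
{
        for (int i = 0; i < TOY_NCPUS; i++)
                if (waited_for[i])
                        return false;
        return true;
}
```

Replaying steps 1 through 6 against this model, the grace period is
complete after step 4 even though CPU 2 is still inside the read-side
critical section it entered at step 3.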
170 | Of course, synchronize_rcu() does not necessarily return | 170 | Of course, synchronize_rcu() does not necessarily return |
171 | -immediately- after the last pre-existing RCU read-side critical | 171 | -immediately- after the last pre-existing RCU read-side critical |
172 | section completes. For one thing, there might well be scheduling | 172 | section completes. For one thing, there might well be scheduling |
173 | delays. For another thing, many RCU implementations process | 173 | delays. For another thing, many RCU implementations process |
174 | requests in batches in order to improve efficiencies, which can | 174 | requests in batches in order to improve efficiencies, which can |
175 | further delay synchronize_rcu(). | 175 | further delay synchronize_rcu(). |
176 | 176 | ||
177 | Since synchronize_rcu() is the API that must figure out when | 177 | Since synchronize_rcu() is the API that must figure out when |
178 | readers are done, its implementation is key to RCU. For RCU | 178 | readers are done, its implementation is key to RCU. For RCU |
179 | to be useful in all but the most read-intensive situations, | 179 | to be useful in all but the most read-intensive situations, |
180 | synchronize_rcu()'s overhead must also be quite small. | 180 | synchronize_rcu()'s overhead must also be quite small. |
181 | 181 | ||
182 | The call_rcu() API is a callback form of synchronize_rcu(), | 182 | The call_rcu() API is a callback form of synchronize_rcu(), |
183 | and is described in more detail in a later section. Instead of | 183 | and is described in more detail in a later section. Instead of |
184 | blocking, it registers a function and argument which are invoked | 184 | blocking, it registers a function and argument which are invoked |
185 | after all ongoing RCU read-side critical sections have completed. | 185 | after all ongoing RCU read-side critical sections have completed. |
186 | This callback variant is particularly useful in situations where | 186 | This callback variant is particularly useful in situations where |
187 | it is illegal to block or where update-side performance is | 187 | it is illegal to block or where update-side performance is |
188 | critically important. | 188 | critically important. |
189 | 189 | ||
190 | However, the call_rcu() API should not be used lightly, as use | 190 | However, the call_rcu() API should not be used lightly, as use |
191 | of the synchronize_rcu() API generally results in simpler code. | 191 | of the synchronize_rcu() API generally results in simpler code. |
192 | In addition, the synchronize_rcu() API has the nice property | 192 | In addition, the synchronize_rcu() API has the nice property |
193 | of automatically limiting update rate should grace periods | 193 | of automatically limiting update rate should grace periods |
194 | be delayed. This property results in system resilience in the face | 194 | be delayed. This property results in system resilience in the face |
195 | of denial-of-service attacks. Code using call_rcu() should limit | 195 | of denial-of-service attacks. Code using call_rcu() should limit |
196 | update rate in order to gain this same sort of resilience. See | 196 | update rate in order to gain this same sort of resilience. See |
197 | checklist.txt for some approaches to limiting the update rate. | 197 | checklist.txt for some approaches to limiting the update rate. |
198 | 198 | ||
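The callback form can be pictured as a queue of (function, argument) pairs
that the infrastructure drains once a grace period has elapsed. The sketch
below is a user-space toy under stated assumptions: struct toy_head,
toy_call_rcu(), toy_run_callbacks(), struct toy_foo, and toy_foo_reclaim()
are hypothetical names chosen to resemble the call_rcu() prototype, and the
caller decides when the "grace period" is over.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT the kernel's RCU implementation. */
struct toy_head {
        struct toy_head *next;
        void (*func)(struct toy_head *head);
};

static struct toy_head *toy_pending;    /* callbacks awaiting a grace period */

/* Register func(head) for later invocation; never blocks. */
static void toy_call_rcu(struct toy_head *head,
                         void (*func)(struct toy_head *head))
{
        head->func = func;
        head->next = toy_pending;
        toy_pending = head;
}

/* Called once a grace period has elapsed; returns the number invoked. */
static int toy_run_callbacks(void)
{
        int n = 0;

        while (toy_pending) {
                struct toy_head *h = toy_pending;

                toy_pending = h->next;
                h->func(h);
                n++;
        }
        return n;
}

/* An object that embeds its callback head, as RCU-protected data often does. */
struct toy_foo {
        int a;
        struct toy_head rcu;
        int reclaimed;
};

/* Recover the enclosing object from the head and "reclaim" it. */
static void toy_foo_reclaim(struct toy_head *head)
{
        struct toy_foo *fp = (struct toy_foo *)
                ((char *)head - offsetof(struct toy_foo, rcu));

        fp->reclaimed = 1;
}
```

Because toy_call_rcu() returns immediately, nothing in this sketch limits
how many callbacks can pile up, which is the update-rate concern raised in
the preceding paragraph.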
199 | rcu_assign_pointer() | 199 | rcu_assign_pointer() |
200 | 200 | ||
201 | typeof(p) rcu_assign_pointer(p, typeof(p) v); | 201 | typeof(p) rcu_assign_pointer(p, typeof(p) v); |
202 | 202 | ||
203 | Yes, rcu_assign_pointer() -is- implemented as a macro, though it | 203 | Yes, rcu_assign_pointer() -is- implemented as a macro, though it |
204 | would be cool to be able to declare a function in this manner. | 204 | would be cool to be able to declare a function in this manner. |
205 | (Compiler experts will no doubt disagree.) | 205 | (Compiler experts will no doubt disagree.) |
206 | 206 | ||
207 | The updater uses this function to assign a new value to an | 207 | The updater uses this function to assign a new value to an |
208 | RCU-protected pointer, in order to safely communicate the change | 208 | RCU-protected pointer, in order to safely communicate the change |
209 | in value from the updater to the reader. This function returns | 209 | in value from the updater to the reader. This function returns |
210 | the new value, and also executes any memory-barrier instructions | 210 | the new value, and also executes any memory-barrier instructions |
211 | required for a given CPU architecture. | 211 | required for a given CPU architecture. |
212 | 212 | ||
213 | Perhaps just as important, it serves to document (1) which | 213 | Perhaps just as important, it serves to document (1) which |
214 | pointers are protected by RCU and (2) the point at which a | 214 | pointers are protected by RCU and (2) the point at which a |
215 | given structure becomes accessible to other CPUs. That said, | 215 | given structure becomes accessible to other CPUs. That said, |
216 | rcu_assign_pointer() is most frequently used indirectly, via | 216 | rcu_assign_pointer() is most frequently used indirectly, via |
217 | the _rcu list-manipulation primitives such as list_add_rcu(). | 217 | the _rcu list-manipulation primitives such as list_add_rcu(). |
218 | 218 | ||
219 | rcu_dereference() | 219 | rcu_dereference() |
220 | 220 | ||
221 | typeof(p) rcu_dereference(p); | 221 | typeof(p) rcu_dereference(p); |
222 | 222 | ||
223 | Like rcu_assign_pointer(), rcu_dereference() must be implemented | 223 | Like rcu_assign_pointer(), rcu_dereference() must be implemented |
224 | as a macro. | 224 | as a macro. |
225 | 225 | ||
226 | The reader uses rcu_dereference() to fetch an RCU-protected | 226 | The reader uses rcu_dereference() to fetch an RCU-protected |
227 | pointer, which returns a value that may then be safely | 227 | pointer, which returns a value that may then be safely |
228 | dereferenced. Note that rcu_dereference() does not actually | 228 | dereferenced. Note that rcu_dereference() does not actually |
229 | dereference the pointer; instead, it protects the pointer for | 229 | dereference the pointer; instead, it protects the pointer for |
230 | later dereferencing. It also executes any needed memory-barrier | 230 | later dereferencing. It also executes any needed memory-barrier |
231 | instructions for a given CPU architecture. Currently, only Alpha | 231 | instructions for a given CPU architecture. Currently, only Alpha |
232 | needs memory barriers within rcu_dereference() -- on other CPUs, | 232 | needs memory barriers within rcu_dereference() -- on other CPUs, |
233 | it compiles to nothing, not even a compiler directive. | 233 | it compiles to nothing, not even a compiler directive. |
234 | 234 | ||
235 | Common coding practice uses rcu_dereference() to copy an | 235 | Common coding practice uses rcu_dereference() to copy an |
236 | RCU-protected pointer to a local variable, then dereferences | 236 | RCU-protected pointer to a local variable, then dereferences |
237 | this local variable, for example as follows: | 237 | this local variable, for example as follows: |
238 | 238 | ||
239 | p = rcu_dereference(head.next); | 239 | p = rcu_dereference(head.next); |
240 | return p->data; | 240 | return p->data; |
241 | 241 | ||
242 | However, in this case, one could just as easily combine these | 242 | However, in this case, one could just as easily combine these |
243 | into one statement: | 243 | into one statement: |
244 | 244 | ||
245 | return rcu_dereference(head.next)->data; | 245 | return rcu_dereference(head.next)->data; |
246 | 246 | ||
247 | If you are going to be fetching multiple fields from the | 247 | If you are going to be fetching multiple fields from the |
248 | RCU-protected structure, using the local variable is of | 248 | RCU-protected structure, using the local variable is of |
249 | course preferred. Repeated rcu_dereference() calls look | 249 | course preferred. Repeated rcu_dereference() calls look |
250 | ugly and incur unnecessary overhead on Alpha CPUs. | 250 | ugly and incur unnecessary overhead on Alpha CPUs. |
251 | 251 | ||
252 | Note that the value returned by rcu_dereference() is valid | 252 | Note that the value returned by rcu_dereference() is valid |
253 | only within the enclosing RCU read-side critical section. | 253 | only within the enclosing RCU read-side critical section. |
254 | For example, the following is -not- legal: | 254 | For example, the following is -not- legal: |
255 | 255 | ||
256 | rcu_read_lock(); | 256 | rcu_read_lock(); |
257 | p = rcu_dereference(head.next); | 257 | p = rcu_dereference(head.next); |
258 | rcu_read_unlock(); | 258 | rcu_read_unlock(); |
259 | x = p->address; | 259 | x = p->address; |
260 | rcu_read_lock(); | 260 | rcu_read_lock(); |
261 | y = p->data; | 261 | y = p->data; |
262 | rcu_read_unlock(); | 262 | rcu_read_unlock(); |
263 | 263 | ||
264 | Holding a reference from one RCU read-side critical section | 264 | Holding a reference from one RCU read-side critical section |
265 | to another is just as illegal as holding a reference from | 265 | to another is just as illegal as holding a reference from |
266 | one lock-based critical section to another! Similarly, | 266 | one lock-based critical section to another! Similarly, |
267 | using a reference outside of the critical section in which | 267 | using a reference outside of the critical section in which |
268 | it was acquired is just as illegal as doing so with normal | 268 | it was acquired is just as illegal as doing so with normal |
269 | locking. | 269 | locking. |
270 | 270 | ||
271 | As with rcu_assign_pointer(), an important function of | 271 | As with rcu_assign_pointer(), an important function of |
272 | rcu_dereference() is to document which pointers are protected by | 272 | rcu_dereference() is to document which pointers are protected by |
273 | RCU, in particular, flagging a pointer that is subject to changing | 273 | RCU, in particular, flagging a pointer that is subject to changing |
274 | at any time, including immediately after the rcu_dereference(). | 274 | at any time, including immediately after the rcu_dereference(). |
275 | And, again like rcu_assign_pointer(), rcu_dereference() is | 275 | And, again like rcu_assign_pointer(), rcu_dereference() is |
276 | typically used indirectly, via the _rcu list-manipulation | 276 | typically used indirectly, via the _rcu list-manipulation |
277 | primitives, such as list_for_each_entry_rcu(). | 277 | primitives, such as list_for_each_entry_rcu(). |
278 | 278 | ||
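For readers more familiar with user-space code, the publish/subscribe
pairing of rcu_assign_pointer() and rcu_dereference() can be approximated
with C11 atomics. This is only a rough analogue under stated assumptions:
toy_publish(), toy_subscribe(), and toy_read_data() are hypothetical names,
the kernel uses its own barrier primitives rather than <stdatomic.h>, and
an acquire load is stronger than the ordering rcu_dereference() needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Rough user-space analogue only; NOT how the kernel macros are defined. */
struct node {
        int data;
};

static _Atomic(struct node *) head_next;    /* the "RCU-protected" pointer */

/*
 * Analogue of rcu_assign_pointer(): the release store orders the
 * initialization of *n before the pointer becomes visible to readers.
 */
static void toy_publish(struct node *n)
{
        atomic_store_explicit(&head_next, n, memory_order_release);
}

/*
 * Analogue of rcu_dereference(): fetch the pointer once so that later
 * dereferences see the initialized contents.
 */
static struct node *toy_subscribe(void)
{
        return atomic_load_explicit(&head_next, memory_order_acquire);
}

/* Copy the pointer to a local variable, then dereference the local. */
static int toy_read_data(int fallback)
{
        struct node *p = toy_subscribe();

        return p ? p->data : fallback;
}
```

As in the text, the pointer is fetched exactly once into a local variable;
this sketch does not model the read-side critical section that would have
to enclose the fetch and the dereference.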
279 | The following diagram shows how each API communicates among the | 279 | The following diagram shows how each API communicates among the |
280 | reader, updater, and reclaimer. | 280 | reader, updater, and reclaimer. |
281 | 281 | ||
282 | 282 | ||
283 |     rcu_assign_pointer() | 283 |     rcu_assign_pointer() |
284 |                             +--------+ | 284 |                             +--------+ |
285 |     +---------------------->| reader |---------+ | 285 |     +---------------------->| reader |---------+ |
286 |     |                       +--------+         | | 286 |     |                       +--------+         | |
287 |     |                           |              | | 287 |     |                           |              | |
288 |     |                           |              | Protect: | 288 |     |                           |              | Protect: |
289 |     |                           |              | rcu_read_lock() | 289 |     |                           |              | rcu_read_lock() |
290 |     |                           |              | rcu_read_unlock() | 290 |     |                           |              | rcu_read_unlock() |
291 |     |        rcu_dereference()  |              | | 291 |     |        rcu_dereference()  |              | |
292 | +---------+                     |              | | 292 | +---------+                     |              | |
293 | | updater |<--------------------+              | | 293 | | updater |<--------------------+              | |
294 | +---------+                                    V | 294 | +---------+                                    V |
295 |     |                                    +-----------+ | 295 |     |                                    +-----------+ |
296 |     +----------------------------------->| reclaimer | | 296 |     +----------------------------------->| reclaimer | |
297 |                                          +-----------+ | 297 |                                          +-----------+ |
298 |       Defer: | 298 |       Defer: |
299 |       synchronize_rcu() & call_rcu() | 299 |       synchronize_rcu() & call_rcu() |
300 | 300 | ||
301 | 301 | ||
302 | The RCU infrastructure observes the time sequence of rcu_read_lock(), | 302 | The RCU infrastructure observes the time sequence of rcu_read_lock(), |
303 | rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in | 303 | rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in |
304 | order to determine when (1) synchronize_rcu() invocations may return | 304 | order to determine when (1) synchronize_rcu() invocations may return |
305 | to their callers and (2) call_rcu() callbacks may be invoked. Efficient | 305 | to their callers and (2) call_rcu() callbacks may be invoked. Efficient |
306 | implementations of the RCU infrastructure make heavy use of batching in | 306 | implementations of the RCU infrastructure make heavy use of batching in |
307 | order to amortize their overhead over many uses of the corresponding APIs. | 307 | order to amortize their overhead over many uses of the corresponding APIs. |
308 | 308 | ||
309 | There are no fewer than three RCU mechanisms in the Linux kernel; the | 309 | There are no fewer than three RCU mechanisms in the Linux kernel; the |
310 | diagram above shows the first one, which is by far the most commonly used. | 310 | diagram above shows the first one, which is by far the most commonly used. |
311 | The rcu_dereference() and rcu_assign_pointer() primitives are used for | 311 | The rcu_dereference() and rcu_assign_pointer() primitives are used for |
312 | all three mechanisms, but different defer and protect primitives are | 312 | all three mechanisms, but different defer and protect primitives are |
313 | used as follows: | 313 | used as follows: |
314 | 314 | ||
315 |     Defer                   Protect | 315 |     Defer                   Protect |
316 | 316 | ||
317 | a.  synchronize_rcu()       rcu_read_lock() / rcu_read_unlock() | 317 | a.  synchronize_rcu()       rcu_read_lock() / rcu_read_unlock() |
318 |     call_rcu() | 318 |     call_rcu() |
319 | 319 | ||
320 | b.  call_rcu_bh()           rcu_read_lock_bh() / rcu_read_unlock_bh() | 320 | b.  call_rcu_bh()           rcu_read_lock_bh() / rcu_read_unlock_bh() |
321 | 321 | ||
322 | c.  synchronize_sched()     preempt_disable() / preempt_enable() | 322 | c.  synchronize_sched()     preempt_disable() / preempt_enable() |
323 |                             local_irq_save() / local_irq_restore() | 323 |                             local_irq_save() / local_irq_restore() |
324 |                             hardirq enter / hardirq exit | 324 |                             hardirq enter / hardirq exit |
325 |                             NMI enter / NMI exit | 325 |                             NMI enter / NMI exit |
326 | 326 | ||
327 | These three mechanisms are used as follows: | 327 | These three mechanisms are used as follows: |
328 | 328 | ||
329 | a. RCU applied to normal data structures. | 329 | a. RCU applied to normal data structures. |
330 | 330 | ||
331 | b. RCU applied to networking data structures that may be subjected | 331 | b. RCU applied to networking data structures that may be subjected |
332 | to remote denial-of-service attacks. | 332 | to remote denial-of-service attacks. |
333 | 333 | ||
334 | c. RCU applied to scheduler and interrupt/NMI-handler tasks. | 334 | c. RCU applied to scheduler and interrupt/NMI-handler tasks. |
335 | 335 | ||
336 | Again, most uses will be of (a). The (b) and (c) cases are important | 336 | Again, most uses will be of (a). The (b) and (c) cases are important |
337 | for specialized uses, but are relatively uncommon. | 337 | for specialized uses, but are relatively uncommon. |
338 | 338 | ||
339 | 339 | ||
340 | 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? | 340 | 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? |
341 | 341 | ||
342 | This section shows a simple use of the core RCU API to protect a | 342 | This section shows a simple use of the core RCU API to protect a |
343 | global pointer to a dynamically allocated structure. More-typical | 343 | global pointer to a dynamically allocated structure. More-typical |
344 | uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. | 344 | uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. |
345 | 345 | ||
346 | struct foo { | 346 | struct foo { |
347 |         int a; | 347 |         int a; |
348 |         char b; | 348 |         char b; |
349 |         long c; | 349 |         long c; |
350 | }; | 350 | }; |
351 | DEFINE_SPINLOCK(foo_mutex); | 351 | DEFINE_SPINLOCK(foo_mutex); |
352 | 352 | ||
353 | struct foo *gbl_foo; | 353 | struct foo *gbl_foo; |
354 | 354 | ||
355 | /* | 355 | /* |
356 | * Create a new struct foo that is the same as the one currently | 356 | * Create a new struct foo that is the same as the one currently |
357 | * pointed to by gbl_foo, except that field "a" is replaced | 357 | * pointed to by gbl_foo, except that field "a" is replaced |
358 | * with "new_a". Points gbl_foo to the new structure, and | 358 | * with "new_a". Points gbl_foo to the new structure, and |
359 | * frees up the old structure after a grace period. | 359 | * frees up the old structure after a grace period. |
360 | * | 360 | * |
361 | * Uses rcu_assign_pointer() to ensure that concurrent readers | 361 | * Uses rcu_assign_pointer() to ensure that concurrent readers |
362 | * see the initialized version of the new structure. | 362 | * see the initialized version of the new structure. |
363 | * | 363 | * |
364 | * Uses synchronize_rcu() to ensure that any readers that might | 364 | * Uses synchronize_rcu() to ensure that any readers that might |
365 | * have references to the old structure complete before freeing | 365 | * have references to the old structure complete before freeing |
366 | * the old structure. | 366 | * the old structure. |
367 | */ | 367 | */ |
368 | void foo_update_a(int new_a) | 368 | void foo_update_a(int new_a) |
369 | { | 369 | { |
370 |         struct foo *new_fp; | 370 |         struct foo *new_fp; |
371 |         struct foo *old_fp; | 371 |         struct foo *old_fp; |
372 | 372 | ||
373 |         new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); | 373 |         new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); |
374 |         spin_lock(&foo_mutex); | 374 |         spin_lock(&foo_mutex); |
375 |         old_fp = gbl_foo; | 375 |         old_fp = gbl_foo; |
376 |         *new_fp = *old_fp; | 376 |         *new_fp = *old_fp; |
377 |         new_fp->a = new_a; | 377 |         new_fp->a = new_a; |
378 |         rcu_assign_pointer(gbl_foo, new_fp); | 378 |         rcu_assign_pointer(gbl_foo, new_fp); |
379 |         spin_unlock(&foo_mutex); | 379 |         spin_unlock(&foo_mutex); |
380 |         synchronize_rcu(); | 380 |         synchronize_rcu(); |
381 |         kfree(old_fp); | 381 |         kfree(old_fp); |
382 | } | 382 | } |
383 | 383 | ||
384 | /* | 384 | /* |
385 | * Return the value of field "a" of the current gbl_foo | 385 | * Return the value of field "a" of the current gbl_foo |
386 | * structure. Use rcu_read_lock() and rcu_read_unlock() | 386 | * structure. Use rcu_read_lock() and rcu_read_unlock() |
387 | * to ensure that the structure does not get deleted out | 387 | * to ensure that the structure does not get deleted out |
388 | * from under us, and use rcu_dereference() to ensure that | 388 | * from under us, and use rcu_dereference() to ensure that |
389 | * we see the initialized version of the structure (important | 389 | * we see the initialized version of the structure (important |
390 | * for DEC Alpha and for people reading the code). | 390 | * for DEC Alpha and for people reading the code). |
391 | */ | 391 | */ |
392 | int foo_get_a(void) | 392 | int foo_get_a(void) |
393 | { | 393 | { |
394 |         int retval; | 394 |         int retval; |
395 | 395 | ||
396 |         rcu_read_lock(); | 396 |         rcu_read_lock(); |
397 |         retval = rcu_dereference(gbl_foo)->a; | 397 |         retval = rcu_dereference(gbl_foo)->a; |
398 |         rcu_read_unlock(); | 398 |         rcu_read_unlock(); |
399 |         return retval; | 399 |         return retval; |
400 | } | 400 | } |
401 | 401 | ||
402 | So, to sum up: | 402 | So, to sum up: |
403 | 403 | ||
404 | o Use rcu_read_lock() and rcu_read_unlock() to guard RCU | 404 | o Use rcu_read_lock() and rcu_read_unlock() to guard RCU |
405 | read-side critical sections. | 405 | read-side critical sections. |
406 | 406 | ||
407 | o Within an RCU read-side critical section, use rcu_dereference() | 407 | o Within an RCU read-side critical section, use rcu_dereference() |
408 | to dereference RCU-protected pointers. | 408 | to dereference RCU-protected pointers. |
409 | 409 | ||
410 | o Use some solid scheme (such as locks or semaphores) to | 410 | o Use some solid scheme (such as locks or semaphores) to |
411 | keep concurrent updates from interfering with each other. | 411 | keep concurrent updates from interfering with each other. |
412 | 412 | ||
413 | o Use rcu_assign_pointer() to update an RCU-protected pointer. | 413 | o Use rcu_assign_pointer() to update an RCU-protected pointer. |
414 | This primitive protects concurrent readers from the updater, | 414 | This primitive protects concurrent readers from the updater, |
415 | -not- concurrent updates from each other! You therefore still | 415 | -not- concurrent updates from each other! You therefore still |
416 | need to use locking (or something similar) to keep concurrent | 416 | need to use locking (or something similar) to keep concurrent |
417 | rcu_assign_pointer() primitives from interfering with each other. | 417 | rcu_assign_pointer() primitives from interfering with each other. |
418 | 418 | ||
419 | o Use synchronize_rcu() -after- removing a data element from an | 419 | o Use synchronize_rcu() -after- removing a data element from an |
420 | RCU-protected data structure, but -before- reclaiming/freeing | 420 | RCU-protected data structure, but -before- reclaiming/freeing |
421 | the data element, in order to wait for the completion of all | 421 | the data element, in order to wait for the completion of all |
422 | RCU read-side critical sections that might be referencing that | 422 | RCU read-side critical sections that might be referencing that |
423 | data item. | 423 | data item. |
424 | 424 | ||
425 | See checklist.txt for additional rules to follow when using RCU. | 425 | See checklist.txt for additional rules to follow when using RCU. |
426 | And again, more-typical uses of RCU may be found in listRCU.txt, | 426 | And again, more-typical uses of RCU may be found in listRCU.txt, |
427 | arrayRCU.txt, and NMI-RCU.txt. | 427 | arrayRCU.txt, and NMI-RCU.txt. |
428 | 428 | ||
429 | 429 | ||
430 | 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? | 430 | 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? |
431 | 431 | ||
432 | In the example above, foo_update_a() blocks until a grace period elapses. | 432 | In the example above, foo_update_a() blocks until a grace period elapses. |
433 | This is quite simple, but in some cases one cannot afford to wait so | 433 | This is quite simple, but in some cases one cannot afford to wait so |
434 | long -- there might be other high-priority work to be done. | 434 | long -- there might be other high-priority work to be done. |
435 | 435 | ||
436 | In such cases, one uses call_rcu() rather than synchronize_rcu(). | 436 | In such cases, one uses call_rcu() rather than synchronize_rcu(). |
437 | The call_rcu() API is as follows: | 437 | The call_rcu() API is as follows: |
438 | 438 | ||
439 | void call_rcu(struct rcu_head * head, | 439 | void call_rcu(struct rcu_head * head, |
440 | void (*func)(struct rcu_head *head)); | 440 | void (*func)(struct rcu_head *head)); |
441 | 441 | ||
442 | This function invokes func(head) after a grace period has elapsed. | 442 | This function invokes func(head) after a grace period has elapsed. |
443 | This invocation might happen from either softirq or process context, | 443 | This invocation might happen from either softirq or process context, |
444 | so the function is not permitted to block. The foo struct needs to | 444 | so the function is not permitted to block. The foo struct needs to |
445 | have an rcu_head structure added, perhaps as follows: | 445 | have an rcu_head structure added, perhaps as follows: |
446 | 446 | ||
447 | struct foo { | 447 | struct foo { |
448 | int a; | 448 | int a; |
449 | char b; | 449 | char b; |
450 | long c; | 450 | long c; |
451 | struct rcu_head rcu; | 451 | struct rcu_head rcu; |
452 | }; | 452 | }; |
453 | 453 | ||
454 | The foo_update_a() function might then be written as follows: | 454 | The foo_update_a() function might then be written as follows: |
455 | 455 | ||
456 | /* | 456 | /* |
457 | * Create a new struct foo that is the same as the one currently | 457 | * Create a new struct foo that is the same as the one currently |
458 | * pointed to by gbl_foo, except that field "a" is replaced | 458 | * pointed to by gbl_foo, except that field "a" is replaced |
459 | * with "new_a". Points gbl_foo to the new structure, and | 459 | * with "new_a". Points gbl_foo to the new structure, and |
460 | * frees up the old structure after a grace period. | 460 | * frees up the old structure after a grace period. |
461 | * | 461 | * |
462 | * Uses rcu_assign_pointer() to ensure that concurrent readers | 462 | * Uses rcu_assign_pointer() to ensure that concurrent readers |
463 | * see the initialized version of the new structure. | 463 | * see the initialized version of the new structure. |
464 | * | 464 | * |
465 | * Uses call_rcu() to ensure that any readers that might have | 465 | * Uses call_rcu() to ensure that any readers that might have |
466 | * references to the old structure complete before freeing the | 466 | * references to the old structure complete before freeing the |
467 | * old structure. | 467 | * old structure. |
468 | */ | 468 | */ |
469 | void foo_update_a(int new_a) | 469 | void foo_update_a(int new_a) |
470 | { | 470 | { |
471 | struct foo *new_fp; | 471 | struct foo *new_fp; |
472 | struct foo *old_fp; | 472 | struct foo *old_fp; |
473 | 473 | ||
474 | new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); | 474 | new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); |
475 | spin_lock(&foo_mutex); | 475 | spin_lock(&foo_mutex); |
476 | old_fp = gbl_foo; | 476 | old_fp = gbl_foo; |
477 | *new_fp = *old_fp; | 477 | *new_fp = *old_fp; |
478 | new_fp->a = new_a; | 478 | new_fp->a = new_a; |
479 | rcu_assign_pointer(gbl_foo, new_fp); | 479 | rcu_assign_pointer(gbl_foo, new_fp); |
480 | spin_unlock(&foo_mutex); | 480 | spin_unlock(&foo_mutex); |
481 | call_rcu(&old_fp->rcu, foo_reclaim); | 481 | call_rcu(&old_fp->rcu, foo_reclaim); |
482 | } | 482 | } |
483 | 483 | ||
484 | The foo_reclaim() function might appear as follows: | 484 | The foo_reclaim() function might appear as follows: |
485 | 485 | ||
486 | void foo_reclaim(struct rcu_head *rp) | 486 | void foo_reclaim(struct rcu_head *rp) |
487 | { | 487 | { |
488 | struct foo *fp = container_of(rp, struct foo, rcu); | 488 | struct foo *fp = container_of(rp, struct foo, rcu); |
489 | 489 | ||
490 | kfree(fp); | 490 | kfree(fp); |
491 | } | 491 | } |
492 | 492 | ||
493 | The container_of() primitive is a macro that, given a pointer into a | 493 | The container_of() primitive is a macro that, given a pointer into a |
494 | struct, the type of the struct, and the pointed-to field within the | 494 | struct, the type of the struct, and the pointed-to field within the |
495 | struct, returns a pointer to the beginning of the struct. | 495 | struct, returns a pointer to the beginning of the struct. |
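The container_of() description above can be tried out in user space. What follows is only a sketch: the struct rcu_head shown here is a stand-in with made-up fields, the macro is a simplified version built on offsetof() rather than the kernel's definition, and foo_from_rcu_head() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct rcu_head; these fields are illustrative. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct foo {
	int a;
	char b;
	long c;
	struct rcu_head rcu;
};

/*
 * Simplified container_of(): subtract the offset of "member" within
 * "type" from the member's address to get back to the enclosing struct.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical helper: map an embedded rcu_head back to its struct foo. */
struct foo *foo_from_rcu_head(struct rcu_head *rp)
{
	assert(rp != NULL);
	return container_of(rp, struct foo, rcu);
}
```

Given a struct foo named f, foo_from_rcu_head(&f.rcu) yields &f, which is exactly the step foo_reclaim() relies on before calling kfree().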
496 | 496 | ||
497 | The use of call_rcu() permits the caller of foo_update_a() to | 497 | The use of call_rcu() permits the caller of foo_update_a() to |
498 | immediately regain control, without needing to worry further about the | 498 | immediately regain control, without needing to worry further about the |
499 | old version of the newly updated element. It also clearly shows the | 499 | old version of the newly updated element. It also clearly shows the |
500 | RCU distinction between updater, namely foo_update_a(), and reclaimer, | 500 | RCU distinction between updater, namely foo_update_a(), and reclaimer, |
501 | namely foo_reclaim(). | 501 | namely foo_reclaim(). |
502 | 502 | ||
503 | The summary of advice is the same as for the previous section, except | 503 | The summary of advice is the same as for the previous section, except |
504 | that we are now using call_rcu() rather than synchronize_rcu(): | 504 | that we are now using call_rcu() rather than synchronize_rcu(): |
505 | 505 | ||
506 | o Use call_rcu() -after- removing a data element from an | 506 | o Use call_rcu() -after- removing a data element from an |
507 | RCU-protected data structure in order to register a callback | 507 | RCU-protected data structure in order to register a callback |
508 | function that will be invoked after the completion of all RCU | 508 | function that will be invoked after the completion of all RCU |
509 | read-side critical sections that might be referencing that | 509 | read-side critical sections that might be referencing that |
510 | data item. | 510 | data item. |
511 | 511 | ||
512 | Again, see checklist.txt for additional rules governing the use of RCU. | 512 | Again, see checklist.txt for additional rules governing the use of RCU. |
513 | 513 | ||
514 | 514 | ||
515 | 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? | 515 | 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? |
516 | 516 | ||
517 | One of the nice things about RCU is that it has extremely simple "toy" | 517 | One of the nice things about RCU is that it has extremely simple "toy" |
518 | implementations that are a good first step towards understanding the | 518 | implementations that are a good first step towards understanding the |
519 | production-quality implementations in the Linux kernel. This section | 519 | production-quality implementations in the Linux kernel. This section |
520 | presents two such "toy" implementations of RCU, one that is implemented | 520 | presents two such "toy" implementations of RCU, one that is implemented |
521 | in terms of familiar locking primitives, and another that more closely | 521 | in terms of familiar locking primitives, and another that more closely |
522 | resembles "classic" RCU. Both are way too simple for real-world use, | 522 | resembles "classic" RCU. Both are way too simple for real-world use, |
523 | lacking both functionality and performance. However, they are useful | 523 | lacking both functionality and performance. However, they are useful |
524 | in getting a feel for how RCU works. See kernel/rcupdate.c for a | 524 | in getting a feel for how RCU works. See kernel/rcupdate.c for a |
525 | production-quality implementation, and see: | 525 | production-quality implementation, and see: |
526 | 526 | ||
527 | http://www.rdrop.com/users/paulmck/RCU | 527 | http://www.rdrop.com/users/paulmck/RCU |
528 | 528 | ||
529 | for papers describing the Linux kernel RCU implementation. The OLS'01 | 529 | for papers describing the Linux kernel RCU implementation. The OLS'01 |
530 | and OLS'02 papers are a good introduction, and the dissertation provides | 530 | and OLS'02 papers are a good introduction, and the dissertation provides |
531 | more details on the current implementation as of early 2004. | 531 | more details on the current implementation as of early 2004. |
532 | 532 | ||
533 | 533 | ||
534 | 5A. "TOY" IMPLEMENTATION #1: LOCKING | 534 | 5A. "TOY" IMPLEMENTATION #1: LOCKING |
535 | 535 | ||
536 | This section presents a "toy" RCU implementation that is based on | 536 | This section presents a "toy" RCU implementation that is based on |
537 | familiar locking primitives. Its overhead makes it a non-starter for | 537 | familiar locking primitives. Its overhead makes it a non-starter for |
538 | real-life use, as does its lack of scalability. It is also unsuitable | 538 | real-life use, as does its lack of scalability. It is also unsuitable |
539 | for realtime use, since it allows scheduling latency to "bleed" from | 539 | for realtime use, since it allows scheduling latency to "bleed" from |
540 | one read-side critical section to another. | 540 | one read-side critical section to another. |
541 | 541 | ||
542 | However, it is probably the easiest implementation to relate to, so is | 542 | However, it is probably the easiest implementation to relate to, so is |
543 | a good starting point. | 543 | a good starting point. |
544 | 544 | ||
545 | It is extremely simple: | 545 | It is extremely simple: |
546 | 546 | ||
547 | static DEFINE_RWLOCK(rcu_gp_mutex); | 547 | static DEFINE_RWLOCK(rcu_gp_mutex); |
548 | 548 | ||
549 | void rcu_read_lock(void) | 549 | void rcu_read_lock(void) |
550 | { | 550 | { |
551 | read_lock(&rcu_gp_mutex); | 551 | read_lock(&rcu_gp_mutex); |
552 | } | 552 | } |
553 | 553 | ||
554 | void rcu_read_unlock(void) | 554 | void rcu_read_unlock(void) |
555 | { | 555 | { |
556 | read_unlock(&rcu_gp_mutex); | 556 | read_unlock(&rcu_gp_mutex); |
557 | } | 557 | } |
558 | 558 | ||
559 | void synchronize_rcu(void) | 559 | void synchronize_rcu(void) |
560 | { | 560 | { |
561 | write_lock(&rcu_gp_mutex); | 561 | write_lock(&rcu_gp_mutex); |
562 | write_unlock(&rcu_gp_mutex); | 562 | write_unlock(&rcu_gp_mutex); |
563 | } | 563 | } |
564 | 564 | ||
565 | [You can ignore rcu_assign_pointer() and rcu_dereference() without | 565 | [You can ignore rcu_assign_pointer() and rcu_dereference() without |
566 | missing much. But here they are anyway. And whatever you do, don't | 566 | missing much. But here they are anyway. And whatever you do, don't |
567 | forget about them when submitting patches making use of RCU!] | 567 | forget about them when submitting patches making use of RCU!] |
568 | 568 | ||
569 | #define rcu_assign_pointer(p, v) ({ \ | 569 | #define rcu_assign_pointer(p, v) ({ \ |
570 | smp_wmb(); \ | 570 | smp_wmb(); \ |
571 | (p) = (v); \ | 571 | (p) = (v); \ |
572 | }) | 572 | }) |
573 | 573 | ||
574 | #define rcu_dereference(p) ({ \ | 574 | #define rcu_dereference(p) ({ \ |
575 | typeof(p) _________p1 = p; \ | 575 | typeof(p) _________p1 = p; \ |
576 | smp_read_barrier_depends(); \ | 576 | smp_read_barrier_depends(); \ |
577 | (_________p1); \ | 577 | (_________p1); \ |
578 | }) | 578 | }) |
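For readers more at home in user space, the two macros above have a rough analogue in C11 atomics. This is a sketch under stated assumptions, not the kernel's definition: the release store plays the part of smp_wmb() followed by the assignment, and an acquire load is used as a conservative stand-in for the dependency ordering that smp_read_barrier_depends() provides. The names toy_assign_pointer(), toy_dereference(), foo_publish(), and toy_get_a(), and the -1 "nothing published yet" return value, are invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
	char b;
	long c;
};

static _Atomic(struct foo *) gbl_foo;

/* Publish: the release store orders prior initialization before the pointer. */
void toy_assign_pointer(struct foo *new_fp)
{
	atomic_store_explicit(&gbl_foo, new_fp, memory_order_release);
}

/* Subscribe: the acquire load orders the pointer fetch before later accesses. */
struct foo *toy_dereference(void)
{
	return atomic_load_explicit(&gbl_foo, memory_order_acquire);
}

/* Initialize the new structure first, and only then make it visible. */
void foo_publish(struct foo *new_fp, int a)
{
	assert(new_fp != NULL);
	new_fp->a = a;
	toy_assign_pointer(new_fp);
}

/* Returns field "a", or -1 if nothing has been published yet. */
int toy_get_a(void)
{
	struct foo *fp = toy_dereference();

	return fp ? fp->a : -1;
}
```

A reader that obtains a non-NULL pointer from toy_dereference() is guaranteed to see the value of "a" stored by foo_publish(). Protecting the structure from being freed underneath the reader is a separate job, handled by the read-side markers and the grace-period primitives.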
579 | 579 | ||
580 | 580 | ||
581 | The rcu_read_lock() and rcu_read_unlock() primitives read-acquire | 581 | The rcu_read_lock() and rcu_read_unlock() primitives read-acquire |
582 | and release a global reader-writer lock. The synchronize_rcu() | 582 | and release a global reader-writer lock. The synchronize_rcu() |
583 | primitive write-acquires this same lock, then immediately releases | 583 | primitive write-acquires this same lock, then immediately releases |
584 | it. This means that once synchronize_rcu() exits, all RCU read-side | 584 | it. This means that once synchronize_rcu() exits, all RCU read-side |
585 | critical sections that were in progress before synchronize_rcu() was | 585 | critical sections that were in progress before synchronize_rcu() was |
586 | called are guaranteed to have completed -- there is no way that | 586 | called are guaranteed to have completed -- there is no way that |
587 | synchronize_rcu() would have been able to write-acquire the lock | 587 | synchronize_rcu() would have been able to write-acquire the lock |
588 | otherwise. | 588 | otherwise. |
589 | 589 | ||
590 | It is possible to nest rcu_read_lock(), since reader-writer locks may | 590 | It is possible to nest rcu_read_lock(), since reader-writer locks may |
591 | be recursively acquired. Note also that rcu_read_lock() is immune | 591 | be recursively acquired. Note also that rcu_read_lock() is immune |
592 | from deadlock (an important property of RCU). The reason for this is | 592 | from deadlock (an important property of RCU). The reason for this is |
593 | that the only thing that can block rcu_read_lock() is a synchronize_rcu(). | 593 | that the only thing that can block rcu_read_lock() is a synchronize_rcu(). |
594 | But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, | 594 | But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, |
595 | so there can be no deadlock cycle. | 595 | so there can be no deadlock cycle. |
596 | 596 | ||
597 | Quick Quiz #1: Why is this argument naive? How could a deadlock | 597 | Quick Quiz #1: Why is this argument naive? How could a deadlock |
598 | occur when using this algorithm in a real-world Linux | 598 | occur when using this algorithm in a real-world Linux |
599 | kernel? How could this deadlock be avoided? | 599 | kernel? How could this deadlock be avoided? |
600 | 600 | ||
601 | 601 | ||
602 | 5B. "TOY" EXAMPLE #2: CLASSIC RCU | 602 | 5B. "TOY" EXAMPLE #2: CLASSIC RCU |
603 | 603 | ||
604 | This section presents a "toy" RCU implementation that is based on | 604 | This section presents a "toy" RCU implementation that is based on |
605 | "classic RCU". It is also short on performance (but only for updates) and | 605 | "classic RCU". It is also short on performance (but only for updates) and |
606 | on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT | 606 | on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT |
607 | kernels. The definitions of rcu_dereference() and rcu_assign_pointer() | 607 | kernels. The definitions of rcu_dereference() and rcu_assign_pointer() |
608 | are the same as those shown in the preceding section, so they are omitted. | 608 | are the same as those shown in the preceding section, so they are omitted. |
609 | 609 | ||
610 | void rcu_read_lock(void) { } | 610 | void rcu_read_lock(void) { } |
611 | 611 | ||
612 | void rcu_read_unlock(void) { } | 612 | void rcu_read_unlock(void) { } |
613 | 613 | ||
614 | void synchronize_rcu(void) | 614 | void synchronize_rcu(void) |
615 | { | 615 | { |
616 | int cpu; | 616 | int cpu; |
617 | 617 | ||
618 | for_each_possible_cpu(cpu) | 618 | for_each_possible_cpu(cpu) |
619 | run_on(cpu); | 619 | run_on(cpu); |
620 | } | 620 | } |
621 | 621 | ||
622 | Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing. | 622 | Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing. |
623 | This is the great strength of classic RCU in a non-preemptive kernel: | 623 | This is the great strength of classic RCU in a non-preemptive kernel: |
624 | read-side overhead is precisely zero, at least on non-Alpha CPUs. | 624 | read-side overhead is precisely zero, at least on non-Alpha CPUs. |
625 | And there is absolutely no way that rcu_read_lock() can possibly | 625 | And there is absolutely no way that rcu_read_lock() can possibly |
626 | participate in a deadlock cycle! | 626 | participate in a deadlock cycle! |
627 | 627 | ||
628 | The implementation of synchronize_rcu() simply schedules itself on each | 628 | The implementation of synchronize_rcu() simply schedules itself on each |
629 | CPU in turn. The run_on() primitive can be implemented straightforwardly | 629 | CPU in turn. The run_on() primitive can be implemented straightforwardly |
630 | in terms of the sched_setaffinity() primitive. Of course, a somewhat less | 630 | in terms of the sched_setaffinity() primitive. Of course, a somewhat less |
631 | "toy" implementation would restore the affinity upon completion rather | 631 | "toy" implementation would restore the affinity upon completion rather |
632 | than just leaving all tasks running on the last CPU, but when I said | 632 | than just leaving all tasks running on the last CPU, but when I said |
633 | "toy", I meant -toy-! | 633 | "toy", I meant -toy-! |
634 | 634 | ||
635 | So how the heck is this supposed to work??? | 635 | So how the heck is this supposed to work??? |
636 | 636 | ||
637 | Remember that it is illegal to block while in an RCU read-side critical | 637 | Remember that it is illegal to block while in an RCU read-side critical |
638 | section. Therefore, if a given CPU executes a context switch, we know | 638 | section. Therefore, if a given CPU executes a context switch, we know |
639 | that it must have completed all preceding RCU read-side critical sections. | 639 | that it must have completed all preceding RCU read-side critical sections. |
640 | Once -all- CPUs have executed a context switch, then -all- preceding | 640 | Once -all- CPUs have executed a context switch, then -all- preceding |
641 | RCU read-side critical sections will have completed. | 641 | RCU read-side critical sections will have completed. |
642 | 642 | ||
643 | So, suppose that we remove a data item from its structure and then invoke | 643 | So, suppose that we remove a data item from its structure and then invoke |
644 | synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed | 644 | synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed |
645 | that there are no RCU read-side critical sections holding a reference | 645 | that there are no RCU read-side critical sections holding a reference |
646 | to that data item, so we can safely reclaim it. | 646 | to that data item, so we can safely reclaim it. |
647 | 647 | ||
648 | Quick Quiz #2: Give an example where Classic RCU's read-side | 648 | Quick Quiz #2: Give an example where Classic RCU's read-side |
649 | overhead is -negative-. | 649 | overhead is -negative-. |
650 | 650 | ||
651 | Quick Quiz #3: If it is illegal to block in an RCU read-side | 651 | Quick Quiz #3: If it is illegal to block in an RCU read-side |
652 | critical section, what the heck do you do in | 652 | critical section, what the heck do you do in |
653 | PREEMPT_RT, where normal spinlocks can block??? | 653 | PREEMPT_RT, where normal spinlocks can block??? |
654 | 654 | ||
655 | 655 | ||
656 | 6. ANALOGY WITH READER-WRITER LOCKING | 656 | 6. ANALOGY WITH READER-WRITER LOCKING |
657 | 657 | ||
658 | Although RCU can be used in many different ways, a very common use of | 658 | Although RCU can be used in many different ways, a very common use of |
659 | RCU is analogous to reader-writer locking. The following unified | 659 | RCU is analogous to reader-writer locking. The following unified |
660 | diff shows how closely related RCU and reader-writer locking can be. | 660 | diff shows how closely related RCU and reader-writer locking can be. |
661 | 661 | ||
662 | @@ -13,15 +14,15 @@ | 662 | @@ -13,15 +14,15 @@ |
663 | struct list_head *lp; | 663 | struct list_head *lp; |
664 | struct el *p; | 664 | struct el *p; |
665 | 665 | ||
666 | - read_lock(); | 666 | - read_lock(); |
667 | - list_for_each_entry(p, head, lp) { | 667 | - list_for_each_entry(p, head, lp) { |
668 | + rcu_read_lock(); | 668 | + rcu_read_lock(); |
669 | + list_for_each_entry_rcu(p, head, lp) { | 669 | + list_for_each_entry_rcu(p, head, lp) { |
670 | if (p->key == key) { | 670 | if (p->key == key) { |
671 | *result = p->data; | 671 | *result = p->data; |
672 | - read_unlock(); | 672 | - read_unlock(); |
673 | + rcu_read_unlock(); | 673 | + rcu_read_unlock(); |
674 | return 1; | 674 | return 1; |
675 | } | 675 | } |
676 | } | 676 | } |
677 | - read_unlock(); | 677 | - read_unlock(); |
678 | + rcu_read_unlock(); | 678 | + rcu_read_unlock(); |
679 | return 0; | 679 | return 0; |
680 | } | 680 | } |
681 | 681 | ||
682 | @@ -29,15 +30,16 @@ | 682 | @@ -29,15 +30,16 @@ |
683 | { | 683 | { |
684 | struct el *p; | 684 | struct el *p; |
685 | 685 | ||
686 | - write_lock(&listmutex); | 686 | - write_lock(&listmutex); |
687 | + spin_lock(&listmutex); | 687 | + spin_lock(&listmutex); |
688 | list_for_each_entry(p, head, lp) { | 688 | list_for_each_entry(p, head, lp) { |
689 | if (p->key == key) { | 689 | if (p->key == key) { |
690 | - list_del(&p->list); | 690 | - list_del(&p->list); |
691 | - write_unlock(&listmutex); | 691 | - write_unlock(&listmutex); |
692 | + list_del_rcu(&p->list); | 692 | + list_del_rcu(&p->list); |
693 | + spin_unlock(&listmutex); | 693 | + spin_unlock(&listmutex); |
694 | + synchronize_rcu(); | 694 | + synchronize_rcu(); |
695 | kfree(p); | 695 | kfree(p); |
696 | return 1; | 696 | return 1; |
697 | } | 697 | } |
698 | } | 698 | } |
699 | - write_unlock(&listmutex); | 699 | - write_unlock(&listmutex); |
700 | + spin_unlock(&listmutex); | 700 | + spin_unlock(&listmutex); |
701 | return 0; | 701 | return 0; |
702 | } | 702 | } |
703 | 703 | ||
704 | Or, for those who prefer a side-by-side listing: | 704 | Or, for those who prefer a side-by-side listing: |
705 | 705 | ||
706 | 1 struct el { 1 struct el { | 706 | 1 struct el { 1 struct el { |
707 | 2 struct list_head list; 2 struct list_head list; | 707 | 2 struct list_head list; 2 struct list_head list; |
708 | 3 long key; 3 long key; | 708 | 3 long key; 3 long key; |
709 | 4 spinlock_t mutex; 4 spinlock_t mutex; | 709 | 4 spinlock_t mutex; 4 spinlock_t mutex; |
710 | 5 int data; 5 int data; | 710 | 5 int data; 5 int data; |
711 | 6 /* Other data fields */ 6 /* Other data fields */ | 711 | 6 /* Other data fields */ 6 /* Other data fields */ |
712 | 7 }; 7 }; | 712 | 7 }; 7 }; |
713 | 8 spinlock_t listmutex; 8 spinlock_t listmutex; | 713 | 8 spinlock_t listmutex; 8 spinlock_t listmutex; |
714 | 9 struct el head; 9 struct el head; | 714 | 9 struct el head; 9 struct el head; |
715 | 715 | ||
716 | 1 int search(long key, int *result) 1 int search(long key, int *result) | 716 | 1 int search(long key, int *result) 1 int search(long key, int *result) |
717 | 2 { 2 { | 717 | 2 { 2 { |
718 | 3 struct list_head *lp; 3 struct list_head *lp; | 718 | 3 struct list_head *lp; 3 struct list_head *lp; |
719 | 4 struct el *p; 4 struct el *p; | 719 | 4 struct el *p; 4 struct el *p; |
720 | 5 5 | 720 | 5 5 |
721 | 6 read_lock(); 6 rcu_read_lock(); | 721 | 6 read_lock(); 6 rcu_read_lock(); |
722 | 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { | 722 | 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { |
723 | 8 if (p->key == key) { 8 if (p->key == key) { | 723 | 8 if (p->key == key) { 8 if (p->key == key) { |
724 | 9 *result = p->data; 9 *result = p->data; | 724 | 9 *result = p->data; 9 *result = p->data; |
725 | 10 read_unlock(); 10 rcu_read_unlock(); | 725 | 10 read_unlock(); 10 rcu_read_unlock(); |
726 | 11 return 1; 11 return 1; | 726 | 11 return 1; 11 return 1; |
727 | 12 } 12 } | 727 | 12 } 12 } |
728 | 13 } 13 } | 728 | 13 } 13 } |
729 | 14 read_unlock(); 14 rcu_read_unlock(); | 729 | 14 read_unlock(); 14 rcu_read_unlock(); |
730 | 15 return 0; 15 return 0; | 730 | 15 return 0; 15 return 0; |
731 | 16 } 16 } | 731 | 16 } 16 } |
732 | 732 | ||
733 | 1 int delete(long key) 1 int delete(long key) | 733 | 1 int delete(long key) 1 int delete(long key) |
734 | 2 { 2 { | 734 | 2 { 2 { |
735 | 3 struct el *p; 3 struct el *p; | 735 | 3 struct el *p; 3 struct el *p; |
736 | 4 4 | 736 | 4 4 |
737 | 5 write_lock(&listmutex); 5 spin_lock(&listmutex); | 737 | 5 write_lock(&listmutex); 5 spin_lock(&listmutex); |
738 | 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { | 738 | 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { |
739 | 7 if (p->key == key) { 7 if (p->key == key) { | 739 | 7 if (p->key == key) { 7 if (p->key == key) { |
740 | 8 list_del(&p->list); 8 list_del_rcu(&p->list); | 740 | 8 list_del(&p->list); 8 list_del_rcu(&p->list); |
741 | 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); | 741 | 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); |
742 | 10 synchronize_rcu(); | 742 | 10 synchronize_rcu(); |
743 | 10 kfree(p); 11 kfree(p); | 743 | 10 kfree(p); 11 kfree(p); |
744 | 11 return 1; 12 return 1; | 744 | 11 return 1; 12 return 1; |
745 | 12 } 13 } | 745 | 12 } 13 } |
746 | 13 } 14 } | 746 | 13 } 14 } |
747 | 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); | 747 | 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); |
748 | 15 return 0; 16 return 0; | 748 | 15 return 0; 16 return 0; |
749 | 16 } 17 } | 749 | 16 } 17 } |
750 | 750 | ||
751 | Either way, the differences are quite small. Read-side locking moves | 751 | Either way, the differences are quite small. Read-side locking moves |
752 | to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from | 752 | to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from |
753 | from a reader-writer lock to a simple spinlock, and a synchronize_rcu() | 753 | a reader-writer lock to a simple spinlock, and a synchronize_rcu() |
754 | precedes the kfree(). | 754 | precedes the kfree(). |
755 | 755 | ||
756 | However, there is one potential catch: the read-side and update-side | 756 | However, there is one potential catch: the read-side and update-side |
757 | critical sections can now run concurrently. In many cases, this will | 757 | critical sections can now run concurrently. In many cases, this will |
758 | not be a problem, but it is necessary to check carefully regardless. | 758 | not be a problem, but it is necessary to check carefully regardless. |
759 | For example, if multiple independent list updates must be seen as | 759 | For example, if multiple independent list updates must be seen as |
760 | a single atomic update, converting to RCU will require special care. | 760 | a single atomic update, converting to RCU will require special care. |
761 | 761 | ||
762 | Also, the presence of synchronize_rcu() means that the RCU version of | 762 | Also, the presence of synchronize_rcu() means that the RCU version of |
763 | delete() can now block. If this is a problem, there is a callback-based | 763 | delete() can now block. If this is a problem, there is a callback-based |
764 | mechanism that never blocks, namely call_rcu(), that can be used in | 764 | mechanism that never blocks, namely call_rcu(), that can be used in |
765 | place of synchronize_rcu(). | 765 | place of synchronize_rcu(). |
766 | 766 | ||
767 | 767 | ||
768 | 7. FULL LIST OF RCU APIs | 768 | 7. FULL LIST OF RCU APIs |
769 | 769 | ||
770 | The RCU APIs are documented in docbook-format header comments in the | 770 | The RCU APIs are documented in docbook-format header comments in the |
771 | Linux-kernel source code, but it helps to have a full list of the | 771 | Linux-kernel source code, but it helps to have a full list of the |
772 | APIs, since there does not appear to be a way to categorize them | 772 | APIs, since there does not appear to be a way to categorize them |
773 | in docbook. Here is the list, by category. | 773 | in docbook. Here is the list, by category. |
774 | 774 | ||
775 | Markers for RCU read-side critical sections: | 775 | Markers for RCU read-side critical sections: |
776 | 776 | ||
777 | rcu_read_lock | 777 | rcu_read_lock |
778 | rcu_read_unlock | 778 | rcu_read_unlock |
779 | rcu_read_lock_bh | 779 | rcu_read_lock_bh |
780 | rcu_read_unlock_bh | 780 | rcu_read_unlock_bh |
781 | 781 | ||
782 | RCU pointer/list traversal: | 782 | RCU pointer/list traversal: |
783 | 783 | ||
784 | rcu_dereference | 784 | rcu_dereference |
785 | list_for_each_rcu (to be deprecated in favor of | 785 | list_for_each_rcu (to be deprecated in favor of |
786 | list_for_each_entry_rcu) | 786 | list_for_each_entry_rcu) |
787 | list_for_each_entry_rcu | 787 | list_for_each_entry_rcu |
788 | list_for_each_continue_rcu (to be deprecated in favor of new | 788 | list_for_each_continue_rcu (to be deprecated in favor of new |
789 | list_for_each_entry_continue_rcu) | 789 | list_for_each_entry_continue_rcu) |
790 | hlist_for_each_entry_rcu | 790 | hlist_for_each_entry_rcu |
791 | 791 | ||
792 | RCU pointer update: | 792 | RCU pointer update: |
793 | 793 | ||
794 | rcu_assign_pointer | 794 | rcu_assign_pointer |
795 | list_add_rcu | 795 | list_add_rcu |
796 | list_add_tail_rcu | 796 | list_add_tail_rcu |
797 | list_del_rcu | 797 | list_del_rcu |
798 | list_replace_rcu | 798 | list_replace_rcu |
799 | hlist_del_rcu | 799 | hlist_del_rcu |
800 | hlist_add_head_rcu | 800 | hlist_add_head_rcu |
801 | 801 | ||
802 | RCU grace period: | 802 | RCU grace period: |
803 | 803 | ||
804 | synchronize_net | 804 | synchronize_net |
805 | synchronize_sched | 805 | synchronize_sched |
806 | synchronize_rcu | 806 | synchronize_rcu |
807 | call_rcu | 807 | call_rcu |
808 | call_rcu_bh | 808 | call_rcu_bh |
809 | 809 | ||
810 | See the comment headers in the source code (or the docbook generated | 810 | See the comment headers in the source code (or the docbook generated |
811 | from them) for more information. | 811 | from them) for more information. |
812 | 812 | ||
813 | 813 | ||
814 | 8. ANSWERS TO QUICK QUIZZES | 814 | 8. ANSWERS TO QUICK QUIZZES |
815 | 815 | ||
816 | Quick Quiz #1: Why is this argument naive? How could a deadlock | 816 | Quick Quiz #1: Why is this argument naive? How could a deadlock |
817 | occur when using this algorithm in a real-world Linux | 817 | occur when using this algorithm in a real-world Linux |
818 | kernel? [Referring to the lock-based "toy" RCU | 818 | kernel? [Referring to the lock-based "toy" RCU |
819 | algorithm.] | 819 | algorithm.] |
820 | 820 | ||
821 | Answer: Consider the following sequence of events: | 821 | Answer: Consider the following sequence of events: |
822 | 822 | ||
823 | 1. CPU 0 acquires some unrelated lock, call it | 823 | 1. CPU 0 acquires some unrelated lock, call it |
824 | "problematic_lock", disabling irq via | 824 | "problematic_lock", disabling irq via |
825 | spin_lock_irqsave(). | 825 | spin_lock_irqsave(). |
826 | 826 | ||
827 | 2. CPU 1 enters synchronize_rcu(), write-acquiring | 827 | 2. CPU 1 enters synchronize_rcu(), write-acquiring |
828 | rcu_gp_mutex. | 828 | rcu_gp_mutex. |
829 | 829 | ||
830 | 3. CPU 0 enters rcu_read_lock(), but must wait | 830 | 3. CPU 0 enters rcu_read_lock(), but must wait |
831 | because CPU 1 holds rcu_gp_mutex. | 831 | because CPU 1 holds rcu_gp_mutex. |
832 | 832 | ||
833 | 4. CPU 1 is interrupted, and the irq handler | 833 | 4. CPU 1 is interrupted, and the irq handler |
834 | attempts to acquire problematic_lock. | 834 | attempts to acquire problematic_lock. |
835 | 835 | ||
836 | The system is now deadlocked. | 836 | The system is now deadlocked. |
837 | 837 | ||
838 | One way to avoid this deadlock is to use an approach like | 838 | One way to avoid this deadlock is to use an approach like |
839 | that of CONFIG_PREEMPT_RT, where all normal spinlocks | 839 | that of CONFIG_PREEMPT_RT, where all normal spinlocks |
840 | become blocking locks, and all irq handlers execute in | 840 | become blocking locks, and all irq handlers execute in |
841 | the context of special tasks. In this case, in step 4 | 841 | the context of special tasks. In this case, in step 4 |
842 | above, the irq handler would block, allowing CPU 1 to | 842 | above, the irq handler would block, allowing CPU 1 to |
843 | release rcu_gp_mutex, avoiding the deadlock. | 843 | release rcu_gp_mutex, avoiding the deadlock. |
844 | 844 | ||
845 | Even in the absence of deadlock, this RCU implementation | 845 | Even in the absence of deadlock, this RCU implementation |
846 | allows latency to "bleed" from readers to other | 846 | allows latency to "bleed" from readers to other |
847 | readers through synchronize_rcu(). To see this, | 847 | readers through synchronize_rcu(). To see this, |
848 | consider task A in an RCU read-side critical section | 848 | consider task A in an RCU read-side critical section |
849 | (thus read-holding rcu_gp_mutex), task B blocked | 849 | (thus read-holding rcu_gp_mutex), task B blocked |
850 | attempting to write-acquire rcu_gp_mutex, and | 850 | attempting to write-acquire rcu_gp_mutex, and |
851 | task C blocked in rcu_read_lock() attempting to | 851 | task C blocked in rcu_read_lock() attempting to |
852 | read-acquire rcu_gp_mutex. Task A's RCU read-side | 852 | read-acquire rcu_gp_mutex. Task A's RCU read-side |
853 | latency is holding up task C, albeit indirectly via | 853 | latency is holding up task C, albeit indirectly via |
854 | task B. | 854 | task B. |
855 | 855 | ||
856 | Realtime RCU implementations therefore use a counter-based | 856 | Realtime RCU implementations therefore use a counter-based |
857 | approach where tasks in RCU read-side critical sections | 857 | approach where tasks in RCU read-side critical sections |
858 | cannot be blocked by tasks executing synchronize_rcu(). | 858 | cannot be blocked by tasks executing synchronize_rcu(). |
859 | 859 | ||
860 | Quick Quiz #2: Give an example where Classic RCU's read-side | 860 | Quick Quiz #2: Give an example where Classic RCU's read-side |
861 | overhead is -negative-. | 861 | overhead is -negative-. |
862 | 862 | ||
863 | Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT | 863 | Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT |
864 | kernel where a routing table is used by process-context | 864 | kernel where a routing table is used by process-context |
865 | code, but can be updated by irq-context code (for example, | 865 | code, but can be updated by irq-context code (for example, |
866 | by an "ICMP REDIRECT" packet). The usual way of handling | 866 | by an "ICMP REDIRECT" packet). The usual way of handling |
867 | this would be to have the process-context code disable | 867 | this would be to have the process-context code disable |
868 | interrupts while searching the routing table. Use of | 868 | interrupts while searching the routing table. Use of |
869 | RCU allows such interrupt-disabling to be dispensed with. | 869 | RCU allows such interrupt-disabling to be dispensed with. |
870 | Thus, without RCU, you pay the cost of disabling interrupts, | 870 | Thus, without RCU, you pay the cost of disabling interrupts, |
871 | and with RCU you don't. | 871 | and with RCU you don't. |
872 | 872 | ||
873 | One can argue that the overhead of RCU in this | 873 | One can argue that the overhead of RCU in this |
874 | case is negative with respect to the single-CPU | 874 | case is negative with respect to the single-CPU |
875 | interrupt-disabling approach. Others might argue that | 875 | interrupt-disabling approach. Others might argue that |
876 | the overhead of RCU is merely zero, and that replacing | 876 | the overhead of RCU is merely zero, and that replacing |
877 | the positive overhead of the interrupt-disabling scheme | 877 | the positive overhead of the interrupt-disabling scheme |
878 | with the zero-overhead RCU scheme does not constitute | 878 | with the zero-overhead RCU scheme does not constitute |
879 | negative overhead. | 879 | negative overhead. |
880 | 880 | ||
881 | In real life, of course, things are more complex. But | 881 | In real life, of course, things are more complex. But |
882 | even the theoretical possibility of negative overhead for | 882 | even the theoretical possibility of negative overhead for |
883 | a synchronization primitive is a bit unexpected. ;-) | 883 | a synchronization primitive is a bit unexpected. ;-) |
884 | 884 | ||
885 | Quick Quiz #3: If it is illegal to block in an RCU read-side | 885 | Quick Quiz #3: If it is illegal to block in an RCU read-side |
886 | critical section, what the heck do you do in | 886 | critical section, what the heck do you do in |
887 | PREEMPT_RT, where normal spinlocks can block??? | 887 | PREEMPT_RT, where normal spinlocks can block??? |
888 | 888 | ||
889 | Answer: Just as PREEMPT_RT permits preemption of spinlock | 889 | Answer: Just as PREEMPT_RT permits preemption of spinlock |
890 | critical sections, it permits preemption of RCU | 890 | critical sections, it permits preemption of RCU |
891 | read-side critical sections. It also permits | 891 | read-side critical sections. It also permits |
892 | spinlocks blocking while in RCU read-side critical | 892 | spinlocks blocking while in RCU read-side critical |
893 | sections. | 893 | sections. |
894 | 894 | ||
895 | Why the apparent inconsistency? Because it is | 895 | Why the apparent inconsistency? Because it is |
896 | possible to use priority boosting to keep the RCU | 896 | possible to use priority boosting to keep the RCU |
897 | grace periods short if need be (for example, if running | 897 | grace periods short if need be (for example, if running |
898 | short of memory). In contrast, if blocking waiting | 898 | short of memory). In contrast, if blocking waiting |
899 | for (say) network reception, there is no way to know | 899 | for (say) network reception, there is no way to know |
900 | what should be boosted. Especially given that the | 900 | what should be boosted. Especially given that the |
901 | process we need to boost might well be a human being | 901 | process we need to boost might well be a human being |
902 | who just went out for a pizza or something. And although | 902 | who just went out for a pizza or something. And although |
903 | a computer-operated cattle prod might arouse serious | 903 | a computer-operated cattle prod might arouse serious |
904 | interest, it might also provoke serious objections. | 904 | interest, it might also provoke serious objections. |
905 | Besides, how does the computer know what pizza parlor | 905 | Besides, how does the computer know what pizza parlor |
906 | the human being went to??? | 906 | the human being went to??? |
907 | 907 | ||
908 | 908 | ||
909 | ACKNOWLEDGEMENTS | 909 | ACKNOWLEDGEMENTS |
910 | 910 | ||
911 | My thanks to the people who helped make this human-readable, including | 911 | My thanks to the people who helped make this human-readable, including |
912 | Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern. | 912 | Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern. |
913 | 913 | ||
914 | 914 | ||
915 | For more information, see http://www.rdrop.com/users/paulmck/RCU. | 915 | For more information, see http://www.rdrop.com/users/paulmck/RCU. |
916 | 916 |
Documentation/block/biodoc.txt
1 | Notes on the Generic Block Layer Rewrite in Linux 2.5 | 1 | Notes on the Generic Block Layer Rewrite in Linux 2.5 |
2 | ===================================================== | 2 | ===================================================== |
3 | 3 | ||
4 | Notes Written on Jan 15, 2002: | 4 | Notes Written on Jan 15, 2002: |
5 | Jens Axboe <axboe@suse.de> | 5 | Jens Axboe <axboe@suse.de> |
6 | Suparna Bhattacharya <suparna@in.ibm.com> | 6 | Suparna Bhattacharya <suparna@in.ibm.com> |
7 | 7 | ||
8 | Last Updated May 2, 2002 | 8 | Last Updated May 2, 2002 |
9 | September 2003: Updated I/O Scheduler portions | 9 | September 2003: Updated I/O Scheduler portions |
10 | Nick Piggin <piggin@cyberone.com.au> | 10 | Nick Piggin <piggin@cyberone.com.au> |
11 | 11 | ||
12 | Introduction: | 12 | Introduction: |
13 | 13 | ||
14 | These are some notes describing some aspects of the 2.5 block layer in the | 14 | These are some notes describing some aspects of the 2.5 block layer in the |
15 | context of the bio rewrite. The idea is to bring out some of the key | 15 | context of the bio rewrite. The idea is to bring out some of the key |
16 | changes and a glimpse of the rationale behind those changes. | 16 | changes and a glimpse of the rationale behind those changes. |
17 | 17 | ||
18 | Please mail corrections & suggestions to suparna@in.ibm.com. | 18 | Please mail corrections & suggestions to suparna@in.ibm.com. |
19 | 19 | ||
20 | Credits: | 20 | Credits: |
21 | --------- | 21 | --------- |
22 | 22 | ||
23 | 2.5 bio rewrite: | 23 | 2.5 bio rewrite: |
24 | Jens Axboe <axboe@suse.de> | 24 | Jens Axboe <axboe@suse.de> |
25 | 25 | ||
26 | Many aspects of the generic block layer redesign were driven by and evolved | 26 | Many aspects of the generic block layer redesign were driven by and evolved |
27 | over discussions, prior patches and the collective experience of several | 27 | over discussions, prior patches and the collective experience of several |
28 | people. See sections 8 and 9 for a list of some related references. | 28 | people. See sections 8 and 9 for a list of some related references. |
29 | 29 | ||
30 | The following people helped with review comments and inputs for this | 30 | The following people helped with review comments and inputs for this |
31 | document: | 31 | document: |
32 | Christoph Hellwig <hch@infradead.org> | 32 | Christoph Hellwig <hch@infradead.org> |
33 | Arjan van de Ven <arjanv@redhat.com> | 33 | Arjan van de Ven <arjanv@redhat.com> |
34 | Randy Dunlap <rdunlap@xenotime.net> | 34 | Randy Dunlap <rdunlap@xenotime.net> |
35 | Andre Hedrick <andre@linux-ide.org> | 35 | Andre Hedrick <andre@linux-ide.org> |
36 | 36 | ||
37 | The following people helped with fixes/contributions to the bio patches | 37 | The following people helped with fixes/contributions to the bio patches |
38 | while it was still work-in-progress: | 38 | while it was still work-in-progress: |
39 | David S. Miller <davem@redhat.com> | 39 | David S. Miller <davem@redhat.com> |
40 | 40 | ||
41 | 41 | ||
42 | Description of Contents: | 42 | Description of Contents: |
43 | ------------------------ | 43 | ------------------------ |
44 | 44 | ||
45 | 1. Scope for tuning of logic to various needs | 45 | 1. Scope for tuning of logic to various needs |
46 | 1.1 Tuning based on device or low level driver capabilities | 46 | 1.1 Tuning based on device or low level driver capabilities |
47 | - Per-queue parameters | 47 | - Per-queue parameters |
48 | - Highmem I/O support | 48 | - Highmem I/O support |
49 | - I/O scheduler modularization | 49 | - I/O scheduler modularization |
50 | 1.2 Tuning based on high level requirements/capabilities | 50 | 1.2 Tuning based on high level requirements/capabilities |
51 | 1.2.1 I/O Barriers | 51 | 1.2.1 I/O Barriers |
52 | 1.2.2 Request Priority/Latency | 52 | 1.2.2 Request Priority/Latency |
53 | 1.3 Direct access/bypass to lower layers for diagnostics and special | 53 | 1.3 Direct access/bypass to lower layers for diagnostics and special |
54 | device operations | 54 | device operations |
55 | 1.3.1 Pre-built commands | 55 | 1.3.1 Pre-built commands |
56 | 2. New flexible and generic but minimalist i/o structure or descriptor | 56 | 2. New flexible and generic but minimalist i/o structure or descriptor |
57 | (instead of using buffer heads at the i/o layer) | 57 | (instead of using buffer heads at the i/o layer) |
58 | 2.1 Requirements/Goals addressed | 58 | 2.1 Requirements/Goals addressed |
59 | 2.2 The bio struct in detail (multi-page io unit) | 59 | 2.2 The bio struct in detail (multi-page io unit) |
60 | 2.3 Changes in the request structure | 60 | 2.3 Changes in the request structure |
61 | 3. Using bios | 61 | 3. Using bios |
62 | 3.1 Setup/teardown (allocation, splitting) | 62 | 3.1 Setup/teardown (allocation, splitting) |
63 | 3.2 Generic bio helper routines | 63 | 3.2 Generic bio helper routines |
64 | 3.2.1 Traversing segments and completion units in a request | 64 | 3.2.1 Traversing segments and completion units in a request |
65 | 3.2.2 Setting up DMA scatterlists | 65 | 3.2.2 Setting up DMA scatterlists |
66 | 3.2.3 I/O completion | 66 | 3.2.3 I/O completion |
67 | 3.2.4 Implications for drivers that do not interpret bios (don't handle | 67 | 3.2.4 Implications for drivers that do not interpret bios (don't handle |
68 | multiple segments) | 68 | multiple segments) |
69 | 3.2.5 Request command tagging | 69 | 3.2.5 Request command tagging |
70 | 3.3 I/O submission | 70 | 3.3 I/O submission |
71 | 4. The I/O scheduler | 71 | 4. The I/O scheduler |
72 | 5. Scalability related changes | 72 | 5. Scalability related changes |
73 | 5.1 Granular locking: Removal of io_request_lock | 73 | 5.1 Granular locking: Removal of io_request_lock |
74 | 5.2 Prepare for transition to 64 bit sector_t | 74 | 5.2 Prepare for transition to 64 bit sector_t |
75 | 6. Other Changes/Implications | 75 | 6. Other Changes/Implications |
76 | 6.1 Partition re-mapping handled by the generic block layer | 76 | 6.1 Partition re-mapping handled by the generic block layer |
77 | 7. A few tips on migration of older drivers | 77 | 7. A few tips on migration of older drivers |
78 | 8. A list of prior/related/impacted patches/ideas | 78 | 8. A list of prior/related/impacted patches/ideas |
79 | 9. Other References/Discussion Threads | 79 | 9. Other References/Discussion Threads |
80 | 80 | ||
81 | --------------------------------------------------------------------------- | 81 | --------------------------------------------------------------------------- |
82 | 82 | ||
83 | Bio Notes | 83 | Bio Notes |
84 | -------- | 84 | -------- |
85 | 85 | ||
86 | Let us discuss the changes in the context of how some overall goals for the | 86 | Let us discuss the changes in the context of how some overall goals for the |
87 | block layer are addressed. | 87 | block layer are addressed. |
88 | 88 | ||
89 | 1. Scope for tuning the generic logic to satisfy various requirements | 89 | 1. Scope for tuning the generic logic to satisfy various requirements |
90 | 90 | ||
91 | The block layer design supports adaptable abstractions to handle common | 91 | The block layer design supports adaptable abstractions to handle common |
92 | processing with the ability to tune the logic to an appropriate extent | 92 | processing with the ability to tune the logic to an appropriate extent |
93 | depending on the nature of the device and the requirements of the caller. | 93 | depending on the nature of the device and the requirements of the caller. |
94 | One of the objectives of the rewrite was to increase the degree of tunability | 94 | One of the objectives of the rewrite was to increase the degree of tunability |
95 | and to enable higher level code to utilize underlying device/driver | 95 | and to enable higher level code to utilize underlying device/driver |
96 | capabilities to the maximum extent for better i/o performance. This is | 96 | capabilities to the maximum extent for better i/o performance. This is |
97 | important especially in the light of ever improving hardware capabilities | 97 | important especially in the light of ever improving hardware capabilities |
98 | and application/middleware software designed to take advantage of these | 98 | and application/middleware software designed to take advantage of these |
99 | capabilities. | 99 | capabilities. |
100 | 100 | ||
101 | 1.1 Tuning based on low level device / driver capabilities | 101 | 1.1 Tuning based on low level device / driver capabilities |
102 | 102 | ||
103 | Sophisticated devices with large built-in caches, intelligent i/o scheduling | 103 | Sophisticated devices with large built-in caches, intelligent i/o scheduling |
104 | optimizations, high memory DMA support, etc may find some of the | 104 | optimizations, high memory DMA support, etc may find some of the |
105 | generic processing an overhead, while for less capable devices the | 105 | generic processing an overhead, while for less capable devices the |
106 | generic functionality is essential for performance or correctness reasons. | 106 | generic functionality is essential for performance or correctness reasons. |
107 | Knowledge of some of the capabilities or parameters of the device should be | 107 | Knowledge of some of the capabilities or parameters of the device should be |
108 | used at the generic block layer to take the right decisions on | 108 | used at the generic block layer to take the right decisions on |
109 | behalf of the driver. | 109 | behalf of the driver. |
110 | 110 | ||
111 | How is this achieved ? | 111 | How is this achieved ? |
112 | 112 | ||
113 | Tuning at a per-queue level: | 113 | Tuning at a per-queue level: |
114 | 114 | ||
115 | i. Per-queue limits/values exported to the generic layer by the driver | 115 | i. Per-queue limits/values exported to the generic layer by the driver |
116 | 116 | ||
117 | Various parameters that the generic i/o scheduler logic uses are set at | 117 | Various parameters that the generic i/o scheduler logic uses are set at |
118 | a per-queue level (e.g maximum request size, maximum number of segments in | 118 | a per-queue level (e.g maximum request size, maximum number of segments in |
119 | a scatter-gather list, hardsect size) | 119 | a scatter-gather list, hardsect size) |
120 | 120 | ||
121 | Some parameters that were earlier available as global arrays indexed by | 121 | Some parameters that were earlier available as global arrays indexed by |
122 | major/minor are now directly associated with the queue. Some of these may | 122 | major/minor are now directly associated with the queue. Some of these may |
123 | move into the block device structure in the future. Some characteristics | 123 | move into the block device structure in the future. Some characteristics |
124 | have been incorporated into a queue flags field rather than separate fields | 124 | have been incorporated into a queue flags field rather than separate fields |
125 | in themselves. There are blk_queue_xxx functions to set the parameters, | 125 | in themselves. There are blk_queue_xxx functions to set the parameters, |
126 | rather than update the fields directly | 126 | rather than update the fields directly |
127 | 127 | ||
128 | Some new queue property settings: | 128 | Some new queue property settings: |
129 | 129 | ||
130 | blk_queue_bounce_limit(q, u64 dma_address) | 130 | blk_queue_bounce_limit(q, u64 dma_address) |
131 | Enable I/O to highmem pages, dma_address being the | 131 | Enable I/O to highmem pages, dma_address being the |
132 | limit. No highmem default. | 132 | limit. No highmem default. |
133 | 133 | ||
134 | blk_queue_max_sectors(q, max_sectors) | 134 | blk_queue_max_sectors(q, max_sectors) |
135 | Sets two variables that limit the size of the request. | 135 | Sets two variables that limit the size of the request. |
136 | 136 | ||
137 | - The request queue's max_sectors, which is a soft size in | 137 | - The request queue's max_sectors, which is a soft size in |
138 | in units of 512 byte sectors, and could be dynamically varied | 138 | units of 512 byte sectors, and could be dynamically varied |
139 | by the core kernel. | 139 | by the core kernel. |
140 | 140 | ||
141 | - The request queue's max_hw_sectors, which is a hard limit | 141 | - The request queue's max_hw_sectors, which is a hard limit |
142 | and reflects the maximum size request a driver can handle | 142 | and reflects the maximum size request a driver can handle |
143 | in units of 512 byte sectors. | 143 | in units of 512 byte sectors. |
144 | 144 | ||
145 | The default for both max_sectors and max_hw_sectors is | 145 | The default for both max_sectors and max_hw_sectors is |
146 | 255. The upper limit of max_sectors is 1024. | 146 | 255. The upper limit of max_sectors is 1024. |
147 | 147 | ||
148 | blk_queue_max_phys_segments(q, max_segments) | 148 | blk_queue_max_phys_segments(q, max_segments) |
149 | Maximum physical segments you can handle in a request. 128 | 149 | Maximum physical segments you can handle in a request. 128 |
150 | default (driver limit). (See 3.2.2) | 150 | default (driver limit). (See 3.2.2) |
151 | 151 | ||
152 | blk_queue_max_hw_segments(q, max_segments) | 152 | blk_queue_max_hw_segments(q, max_segments) |
153 | Maximum dma segments the hardware can handle in a request. 128 | 153 | Maximum dma segments the hardware can handle in a request. 128 |
154 | default (host adapter limit, after dma remapping). | 154 | default (host adapter limit, after dma remapping). |
155 | (See 3.2.2) | 155 | (See 3.2.2) |
156 | 156 | ||
157 | blk_queue_max_segment_size(q, max_seg_size) | 157 | blk_queue_max_segment_size(q, max_seg_size) |
158 | Maximum size of a clustered segment, 64kB default. | 158 | Maximum size of a clustered segment, 64kB default. |
159 | 159 | ||
160 | blk_queue_hardsect_size(q, hardsect_size) | 160 | blk_queue_hardsect_size(q, hardsect_size) |
161 | Lowest possible sector size that the hardware can operate | 161 | Lowest possible sector size that the hardware can operate |
162 | on, 512 bytes default. | 162 | on, 512 bytes default. |
163 | 163 | ||
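The relationship between the soft and hard request-size limits described for blk_queue_max_sectors() can be sketched as below. This is an illustrative userspace model based only on the numbers given above (default 255, soft limit capped at 1024, 512-byte sectors); struct toy_queue and the toy_* functions and constants are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the text; the macro names are assumptions. */
#define TOY_DEFAULT_SECTORS	255u	/* default for both limits */
#define TOY_SOFT_LIMIT_CAP	1024u	/* upper limit of max_sectors */

struct toy_queue {
	unsigned int max_sectors;	/* soft limit, may be tuned later */
	unsigned int max_hw_sectors;	/* hard limit stated by the driver */
};

void toy_queue_init(struct toy_queue *q)
{
	assert(q != NULL);
	q->max_sectors = TOY_DEFAULT_SECTORS;
	q->max_hw_sectors = TOY_DEFAULT_SECTORS;
}

/*
 * Record what the driver can handle as the hard limit, and derive
 * the soft limit from it without exceeding the cap.
 */
void toy_queue_max_sectors(struct toy_queue *q, unsigned int max_sectors)
{
	assert(q != NULL);
	q->max_hw_sectors = max_sectors;
	q->max_sectors = max_sectors < TOY_SOFT_LIMIT_CAP ?
			 max_sectors : TOY_SOFT_LIMIT_CAP;
}

/* Largest request, in bytes, that the soft limit currently allows. */
unsigned long long toy_max_request_bytes(const struct toy_queue *q)
{
	assert(q != NULL);
	return (unsigned long long)q->max_sectors * 512;
}
```

Under this model a driver that reports 2048 sectors ends up with a hard limit of 2048 and a soft limit of 1024, while a driver that reports 128 gets 128 for both.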
164 | New queue flags: | 164 | New queue flags: |
165 | 165 | ||
166 | QUEUE_FLAG_CLUSTER (see 3.2.2) | 166 | QUEUE_FLAG_CLUSTER (see 3.2.2) |
167 | QUEUE_FLAG_QUEUED (see 3.2.4) | 167 | QUEUE_FLAG_QUEUED (see 3.2.4) |
168 | 168 | ||
169 | 169 | ||
170 | ii. High-mem i/o capabilities are now considered the default | 170 | ii. High-mem i/o capabilities are now considered the default |
171 | 171 | ||
172 | The generic bounce buffer logic, present in 2.4, where the block layer would | 172 | The generic bounce buffer logic, present in 2.4, where the block layer would |
173 | by default copyin/out i/o requests on high-memory buffers to low-memory buffers | 173 | by default copyin/out i/o requests on high-memory buffers to low-memory buffers |
174 | assuming that the driver wouldn't be able to handle it directly, has been | 174 | assuming that the driver wouldn't be able to handle it directly, has been |
175 | changed in 2.5. The bounce logic is now applied only for memory ranges | 175 | changed in 2.5. The bounce logic is now applied only for memory ranges |
176 | for which the device cannot handle i/o. A driver can specify this by | 176 | for which the device cannot handle i/o. A driver can specify this by |
177 | setting the queue bounce limit for the request queue for the device | 177 | setting the queue bounce limit for the request queue for the device |
178 | (blk_queue_bounce_limit()). This avoids the inefficiencies of the copyin/out | 178 | (blk_queue_bounce_limit()). This avoids the inefficiencies of the copyin/out |
179 | where a device is capable of handling high memory i/o. | 179 | where a device is capable of handling high memory i/o. |
180 | 180 | ||
181 | In order to enable high-memory i/o where the device is capable of supporting | 181 | In order to enable high-memory i/o where the device is capable of supporting |
182 | it, the pci dma mapping routines and associated data structures have now been | 182 | it, the pci dma mapping routines and associated data structures have now been |
183 | modified to accomplish a direct page -> bus translation, without requiring | 183 | modified to accomplish a direct page -> bus translation, without requiring |
184 | a virtual address mapping (unlike the earlier scheme of virtual address | 184 | a virtual address mapping (unlike the earlier scheme of virtual address |
185 | -> bus translation). So this works uniformly for high-memory pages (which | 185 | -> bus translation). So this works uniformly for high-memory pages (which |
186 | do not have a corresponding kernel virtual address space mapping) and | 186 | do not have a corresponding kernel virtual address space mapping) and |
187 | low-memory pages. | 187 | low-memory pages. |
188 | 188 | ||
189 | Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA | 189 | Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA |
190 | aspects and mapping of scatter gather lists, and support for 64 bit PCI. | 190 | aspects and mapping of scatter gather lists, and support for 64 bit PCI. |
191 | 191 | ||
192 | Special handling is required only for cases where i/o needs to happen on | 192 | Special handling is required only for cases where i/o needs to happen on |
193 | pages at physical memory addresses beyond what the device can support. In these | 193 | pages at physical memory addresses beyond what the device can support. In these |
194 | cases, a bounce bio representing a buffer from the supported memory range | 194 | cases, a bounce bio representing a buffer from the supported memory range |
195 | is used for performing the i/o with copyin/copyout as needed depending on | 195 | is used for performing the i/o with copyin/copyout as needed depending on |
196 | the type of the operation. For example, in case of a read operation, the | 196 | the type of the operation. For example, in case of a read operation, the |
197 | data read has to be copied to the original buffer on i/o completion, so a | 197 | data read has to be copied to the original buffer on i/o completion, so a |
198 | callback routine is set up to do this, while for write, the data is copied | 198 | callback routine is set up to do this, while for write, the data is copied |
199 | from the original buffer to the bounce buffer prior to issuing the | 199 | from the original buffer to the bounce buffer prior to issuing the |
200 | operation. Since an original buffer may be in a high memory area that's not | 200 | operation. Since an original buffer may be in a high memory area that's not |
201 | mapped in kernel virtual addr, a kmap operation may be required for | 201 | mapped in kernel virtual addr, a kmap operation may be required for |
202 | performing the copy, and special care may be needed in the completion path | 202 | performing the copy, and special care may be needed in the completion path |
203 | as it may not be in irq context. Special care is also required (by way of | 203 | as it may not be in irq context. Special care is also required (by way of |
204 | GFP flags) when allocating bounce buffers, to avoid certain highmem | 204 | GFP flags) when allocating bounce buffers, to avoid certain highmem |
205 | deadlock possibilities. | 205 | deadlock possibilities. |
206 | 206 | ||
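The copyin/copyout behaviour of a bounce buffer described above can be modelled in userspace as follows. This is a simplified sketch, not the block layer's implementation: struct toy_io and the toy_* functions are hypothetical, malloc() stands in for allocating a page from the supported memory range, and the kmap and GFP-flag concerns mentioned in the text are left out. It also assumes that addresses strictly above the limit are the ones that need bouncing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum toy_dir { TOY_READ, TOY_WRITE };

struct toy_io {
	void *orig;		/* caller's buffer */
	void *bounce;		/* NULL when the device can use orig directly */
	size_t len;
	enum toy_dir dir;
};

/* True if any byte of the buffer lies above what the device can reach. */
int toy_needs_bounce(uint64_t phys, size_t len, uint64_t limit)
{
	return len != 0 && phys + (len - 1) > limit;
}

/*
 * Returns the buffer to hand to the device, or NULL on allocation
 * failure.  For a write, the data is copied into the bounce buffer
 * before the operation is issued.
 */
void *toy_io_prepare(struct toy_io *io, uint64_t phys, uint64_t limit)
{
	assert(io != NULL && io->orig != NULL);
	io->bounce = NULL;
	if (!toy_needs_bounce(phys, io->len, limit))
		return io->orig;

	io->bounce = malloc(io->len);
	if (!io->bounce)
		return NULL;
	if (io->dir == TOY_WRITE)
		memcpy(io->bounce, io->orig, io->len);
	return io->bounce;
}

/*
 * Completion callback: for a read, the data that landed in the
 * bounce buffer is copied back to the original buffer.
 */
void toy_io_complete(struct toy_io *io)
{
	assert(io != NULL);
	if (!io->bounce)
		return;
	if (io->dir == TOY_READ)
		memcpy(io->orig, io->bounce, io->len);
	free(io->bounce);
	io->bounce = NULL;
}
```

The point of the per-queue bounce limit is visible in toy_io_prepare(): when the buffer is within the device's reach, the original buffer is returned and no copy takes place.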
207 | It is also possible that a bounce buffer may be allocated from high-memory | 207 | It is also possible that a bounce buffer may be allocated from high-memory |
208 | area that's not mapped in kernel virtual addr, but within the range that the | 208 | area that's not mapped in kernel virtual addr, but within the range that the |
209 | device can use directly; so the bounce page may need to be kmapped during | 209 | device can use directly; so the bounce page may need to be kmapped during |
210 | copy operations. [Note: This does not hold in the current implementation, | 210 | copy operations. [Note: This does not hold in the current implementation, |
211 | though] | 211 | though] |
212 | 212 | ||
213 | There are some situations when pages from high memory may need to | 213 | There are some situations when pages from high memory may need to |
214 | be kmapped, even if bounce buffers are not necessary. For example a device | 214 | be kmapped, even if bounce buffers are not necessary. For example a device |
215 | may need to abort DMA operations and revert to PIO for the transfer, in | 215 | may need to abort DMA operations and revert to PIO for the transfer, in |
216 | which case a virtual mapping of the page is required. For SCSI it is also | 216 | which case a virtual mapping of the page is required. For SCSI it is also |
217 | done in some scenarios where the low level driver cannot be trusted to | 217 | done in some scenarios where the low level driver cannot be trusted to |
218 | handle a single sg entry correctly. The driver is expected to perform the | 218 | handle a single sg entry correctly. The driver is expected to perform the |
219 | kmaps as needed on such occasions using the __bio_kmap_atomic and bio_kmap_irq | 219 | kmaps as needed on such occasions using the __bio_kmap_atomic and bio_kmap_irq |
220 | routines as appropriate. A driver could also use the blk_queue_bounce() | 220 | routines as appropriate. A driver could also use the blk_queue_bounce() |
221 | routine on its own to bounce highmem i/o to low memory for specific requests | 221 | routine on its own to bounce highmem i/o to low memory for specific requests |
222 | if so desired. | 222 | if so desired. |
223 | 223 | ||
224 | iii. The i/o scheduler algorithm itself can be replaced/set as appropriate | 224 | iii. The i/o scheduler algorithm itself can be replaced/set as appropriate |
225 | 225 | ||
226 | As in 2.4, it is possible to plug in a brand new i/o scheduler for a particular | 226 | As in 2.4, it is possible to plug in a brand new i/o scheduler for a particular |
227 | queue or pick from (copy) existing generic schedulers and replace/override | 227 | queue or pick from (copy) existing generic schedulers and replace/override |
228 | certain portions of it. The 2.5 rewrite provides improved modularization | 228 | certain portions of it. The 2.5 rewrite provides improved modularization |
229 | of the i/o scheduler. There are more pluggable callbacks, e.g for init, | 229 | of the i/o scheduler. There are more pluggable callbacks, e.g for init, |
230 | add request, extract request, which makes it possible to abstract specific | 230 | add request, extract request, which makes it possible to abstract specific |
231 | i/o scheduling algorithm aspects and details outside of the generic loop. | 231 | i/o scheduling algorithm aspects and details outside of the generic loop. |
232 | It also makes it possible to completely hide the implementation details of | 232 | It also makes it possible to completely hide the implementation details of |
233 | the i/o scheduler from block drivers. | 233 | the i/o scheduler from block drivers. |
234 | 234 | ||
235 | I/O scheduler wrappers are to be used instead of accessing the queue directly. | 235 | I/O scheduler wrappers are to be used instead of accessing the queue directly. |
236 | See section 4. The I/O scheduler for details. | 236 | See section 4. The I/O scheduler for details. |
237 | 237 | ||
238 | 1.2 Tuning Based on High level code capabilities | 238 | 1.2 Tuning Based on High level code capabilities |
239 | 239 | ||
240 | i. Application capabilities for raw i/o | 240 | i. Application capabilities for raw i/o |
241 | 241 | ||
242 | This comes from some of the high-performance database/middleware | 242 | This comes from some of the high-performance database/middleware |
243 | requirements where an application prefers to make its own i/o scheduling | 243 | requirements where an application prefers to make its own i/o scheduling |
244 | decisions based on an understanding of the access patterns and i/o | 244 | decisions based on an understanding of the access patterns and i/o |
245 | characteristics. | 245 | characteristics. |
246 | 246 | ||
247 | ii. High performance filesystems or other higher level kernel code's | 247 | ii. High performance filesystems or other higher level kernel code's |
248 | capabilities | 248 | capabilities |
249 | 249 | ||
250 | Kernel components like filesystems could also take their own i/o scheduling | 250 | Kernel components like filesystems could also take their own i/o scheduling |
251 | decisions for optimizing performance. Journalling filesystems may need | 251 | decisions for optimizing performance. Journalling filesystems may need |
252 | some control over i/o ordering. | 252 | some control over i/o ordering. |
253 | 253 | ||
254 | What kind of support exists at the generic block layer for this? | 254 | What kind of support exists at the generic block layer for this? |
255 | 255 | ||
256 | The flags and rw fields in the bio structure can be used for some tuning | 256 | The flags and rw fields in the bio structure can be used for some tuning |
257 | from above e.g indicating that an i/o is just a readahead request, or for | 257 | from above e.g indicating that an i/o is just a readahead request, or for |
258 | marking barrier requests (discussed next), or priority settings (currently | 258 | marking barrier requests (discussed next), or priority settings (currently |
259 | unused). As far as user applications are concerned they would need an | 259 | unused). As far as user applications are concerned they would need an |
260 | additional mechanism either via open flags or ioctls, or some other upper | 260 | additional mechanism either via open flags or ioctls, or some other upper |
261 | level mechanism to communicate such settings to block. | 261 | level mechanism to communicate such settings to block. |
262 | 262 | ||
263 | 1.2.1 I/O Barriers | 263 | 1.2.1 I/O Barriers |
264 | 264 | ||
265 | There is a way to enforce strict ordering for i/os through barriers. | 265 | There is a way to enforce strict ordering for i/os through barriers. |
266 | All requests before a barrier point must be serviced before the barrier | 266 | All requests before a barrier point must be serviced before the barrier |
267 | request and any other requests arriving after the barrier will not be | 267 | request and any other requests arriving after the barrier will not be |
268 | serviced until after the barrier has completed. This is useful for higher | 268 | serviced until after the barrier has completed. This is useful for higher |
269 | level control on write ordering, e.g flushing a log of committed updates | 269 | level control on write ordering, e.g flushing a log of committed updates |
270 | to disk before the corresponding updates themselves. | 270 | to disk before the corresponding updates themselves. |
271 | 271 | ||
272 | A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. | 272 | A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. |
273 | The generic i/o scheduler would make sure that it places the barrier request and | 273 | The generic i/o scheduler would make sure that it places the barrier request and |
274 | all other requests coming after it after all the previous requests in the | 274 | all other requests coming after it after all the previous requests in the |
275 | queue. Barriers may be implemented in different ways depending on the | 275 | queue. Barriers may be implemented in different ways depending on the |
276 | driver. For more details regarding I/O barriers, please read barrier.txt | 276 | driver. For more details regarding I/O barriers, please read barrier.txt |
277 | in this directory. | 277 | in this directory. |
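The ordering rule described in 1.2.1 can be sketched in a small userspace model. Everything below (struct toy_request, toy_schedule) is hypothetical and is not the kernel's elevator code; it only illustrates that sorting may happen between barriers but never across one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a queued request; not the kernel struct. */
struct toy_request {
	unsigned long sector;	/* start sector, used as the sort key */
	int is_barrier;		/* nonzero if this request is a barrier */
};

static int cmp_sector(const void *a, const void *b)
{
	const struct toy_request *ra = a, *rb = b;

	return (ra->sector > rb->sector) - (ra->sector < rb->sector);
}

/*
 * Sort requests by sector, but only within the runs delimited by
 * barrier requests. A barrier keeps its position, so nothing queued
 * before it can move behind it and nothing queued after it can move
 * ahead of it.
 */
static void toy_schedule(struct toy_request *rq, size_t n)
{
	size_t start = 0, i;

	assert(rq != NULL || n == 0);

	for (i = 0; i <= n; i++) {
		if (i == n || rq[i].is_barrier) {
			/* Sort the run [start, i); the barrier at i stays put. */
			qsort(rq + start, i - start, sizeof(*rq), cmp_sector);
			start = i + 1;	/* next run begins after the barrier */
		}
	}
}
```

In this model a queue of sectors 30, 10, barrier(5), 40, 20 is serviced as 10, 30, barrier(5), 20, 40. A real driver may implement the barrier differently, as noted above.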
278 | 278 | ||
279 | 1.2.2 Request Priority/Latency | 279 | 1.2.2 Request Priority/Latency |
280 | 280 | ||
281 | Todo/Under discussion: | 281 | Todo/Under discussion: |
282 | Arjan's proposed request priority scheme allows higher levels some broad | 282 | Arjan's proposed request priority scheme allows higher levels some broad |
283 | control (high/med/low) over the priority of an i/o request vs other pending | 283 | control (high/med/low) over the priority of an i/o request vs other pending |
284 | requests in the queue. For example it allows reads for bringing in an | 284 | requests in the queue. For example it allows reads for bringing in an |
285 | executable page on demand to be given a higher priority over pending write | 285 | executable page on demand to be given a higher priority over pending write |
286 | requests which haven't aged too much on the queue. Potentially this priority | 286 | requests which haven't aged too much on the queue. Potentially this priority |
287 | could even be exposed to applications in some manner, providing higher level | 287 | could even be exposed to applications in some manner, providing higher level |
288 | tunability. Time based aging avoids starvation of lower priority | 288 | tunability. Time based aging avoids starvation of lower priority |
289 | requests. Some bits in the bi_rw flags field in the bio structure are | 289 | requests. Some bits in the bi_rw flags field in the bio structure are |
290 | intended to be used for this priority information. | 290 | intended to be used for this priority information. |
291 | 291 | ||
292 | 292 | ||
293 | 1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode) | 293 | 1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode) |
294 | (e.g Diagnostics, Systems Management) | 294 | (e.g Diagnostics, Systems Management) |
295 | 295 | ||
296 | There are situations where high-level code needs to have direct access to | 296 | There are situations where high-level code needs to have direct access to |
297 | the low level device capabilities or requires the ability to issue commands | 297 | the low level device capabilities or requires the ability to issue commands |
298 | to the device bypassing some of the intermediate i/o layers. | 298 | to the device bypassing some of the intermediate i/o layers. |
299 | These could, for example, be special control commands issued through ioctl | 299 | These could, for example, be special control commands issued through ioctl |
300 | interfaces, or could be raw read/write commands that stress the drive's | 300 | interfaces, or could be raw read/write commands that stress the drive's |
301 | capabilities for certain kinds of fitness tests. Having direct interfaces at | 301 | capabilities for certain kinds of fitness tests. Having direct interfaces at |
302 | multiple levels without having to pass through upper layers makes | 302 | multiple levels without having to pass through upper layers makes |
303 | it possible to perform bottom up validation of the i/o path, layer by | 303 | it possible to perform bottom up validation of the i/o path, layer by |
304 | layer, starting from the media. | 304 | layer, starting from the media. |
305 | 305 | ||
306 | The normal i/o submission interfaces, e.g submit_bio, could be bypassed | 306 | The normal i/o submission interfaces, e.g submit_bio, could be bypassed |
307 | for specially crafted requests which such ioctl or diagnostics | 307 | for specially crafted requests which such ioctl or diagnostics |
308 | interfaces would typically use, and the elevator add_request routine | 308 | interfaces would typically use, and the elevator add_request routine |
309 | can instead be used to directly insert such requests in the queue or preferably | 309 | can instead be used to directly insert such requests in the queue or preferably |
310 | the blk_do_rq routine can be used to place the request on the queue and | 310 | the blk_do_rq routine can be used to place the request on the queue and |
311 | wait for completion. Alternatively, sometimes the caller might just | 311 | wait for completion. Alternatively, sometimes the caller might just |
312 | invoke a lower level driver specific interface with the request as a | 312 | invoke a lower level driver specific interface with the request as a |
313 | parameter. | 313 | parameter. |
314 | 314 | ||
315 | If the request is a means for passing on special information associated with | 315 | If the request is a means for passing on special information associated with |
316 | the command, then such information is associated with the request->special | 316 | the command, then such information is associated with the request->special |
317 | field (rather than misuse the request->buffer field which is meant for the | 317 | field (rather than misuse the request->buffer field which is meant for the |
318 | request data buffer's virtual mapping). | 318 | request data buffer's virtual mapping). |
319 | 319 | ||
320 | For passing request data, the caller must build up a bio descriptor | 320 | For passing request data, the caller must build up a bio descriptor |
321 | representing the concerned memory buffer if the underlying driver interprets | 321 | representing the concerned memory buffer if the underlying driver interprets |
322 | bio segments or uses the block layer end*request* functions for i/o | 322 | bio segments or uses the block layer end*request* functions for i/o |
323 | completion. Alternatively one could directly use the request->buffer field to | 323 | completion. Alternatively one could directly use the request->buffer field to |
324 | specify the virtual address of the buffer, if the driver expects buffer | 324 | specify the virtual address of the buffer, if the driver expects buffer |
325 | addresses passed in this way and ignores bio entries for the request type | 325 | addresses passed in this way and ignores bio entries for the request type |
326 | involved. In the latter case, the driver would modify and manage the | 326 | involved. In the latter case, the driver would modify and manage the |
327 | request->buffer, request->sector and request->nr_sectors or | 327 | request->buffer, request->sector and request->nr_sectors or |
328 | request->current_nr_sectors fields itself rather than using the block layer | 328 | request->current_nr_sectors fields itself rather than using the block layer |
329 | end_request or end_that_request_first completion interfaces. | 329 | end_request or end_that_request_first completion interfaces. |
330 | (See 2.3 or Documentation/block/request.txt for a brief explanation of | 330 | (See 2.3 or Documentation/block/request.txt for a brief explanation of |
331 | the request structure fields) | 331 | the request structure fields) |
332 | 332 | ||
333 | [TBD: end_that_request_last should be usable even in this case; | 333 | [TBD: end_that_request_last should be usable even in this case; |
334 | Perhaps an end_that_direct_request_first routine could be implemented to make | 334 | Perhaps an end_that_direct_request_first routine could be implemented to make |
335 | handling direct requests easier for such drivers; Also for drivers that | 335 | handling direct requests easier for such drivers; Also for drivers that |
336 | expect bios, a helper function could be provided for setting up a bio | 336 | expect bios, a helper function could be provided for setting up a bio |
337 | corresponding to a data buffer] | 337 | corresponding to a data buffer] |
338 | 338 | ||
339 | <JENS: I dont understand the above, why is end_that_request_first() not | 339 | <JENS: I dont understand the above, why is end_that_request_first() not |
340 | usable? Or _last for that matter. I must be missing something> | 340 | usable? Or _last for that matter. I must be missing something> |
341 | <SUP: What I meant here was that if the request doesn't have a bio, then | 341 | <SUP: What I meant here was that if the request doesn't have a bio, then |
342 | end_that_request_first doesn't modify nr_sectors or current_nr_sectors, | 342 | end_that_request_first doesn't modify nr_sectors or current_nr_sectors, |
343 | and hence can't be used for advancing request state settings on the | 343 | and hence can't be used for advancing request state settings on the |
344 | completion of partial transfers. The driver has to modify these fields | 344 | completion of partial transfers. The driver has to modify these fields |
345 | directly by hand. | 345 | directly by hand. |
346 | This is because end_that_request_first only iterates over the bio list, | 346 | This is because end_that_request_first only iterates over the bio list, |
347 | and always returns 0 if there are none associated with the request. | 347 | and always returns 0 if there are none associated with the request. |
348 | _last works OK in this case, and is not a problem, as I mentioned earlier | 348 | _last works OK in this case, and is not a problem, as I mentioned earlier |
349 | > | 349 | > |
350 | 350 | ||
351 | 1.3.1 Pre-built Commands | 351 | 1.3.1 Pre-built Commands |
352 | 352 | ||
353 | A request can be created with a pre-built custom command to be sent directly | 353 | A request can be created with a pre-built custom command to be sent directly |
354 | to the device. The cmd block in the request structure has room for filling | 354 | to the device. The cmd block in the request structure has room for filling |
355 | in the command bytes. (i.e rq->cmd is now 16 bytes in size, and meant for | 355 | in the command bytes. (i.e rq->cmd is now 16 bytes in size, and meant for |
356 | command pre-building, and the type of the request is now indicated | 356 | command pre-building, and the type of the request is now indicated |
357 | through rq->flags instead of via rq->cmd) | 357 | through rq->flags instead of via rq->cmd) |
358 | 358 | ||
359 | The request structure flags can be set up to indicate the type of request | 359 | The request structure flags can be set up to indicate the type of request |
360 | in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC: | 360 | in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC: |
361 | packet command issued via blk_do_rq, REQ_SPECIAL: special request). | 361 | packet command issued via blk_do_rq, REQ_SPECIAL: special request). |
362 | 362 | ||
363 | It can help to pre-build device commands for requests in advance. | 363 | It can help to pre-build device commands for requests in advance. |
364 | Drivers can now specify a request prepare function (q->prep_rq_fn) that the | 364 | Drivers can now specify a request prepare function (q->prep_rq_fn) that the |
365 | block layer would invoke to pre-build device commands for a given request, | 365 | block layer would invoke to pre-build device commands for a given request, |
366 | or perform other preparatory processing for the request. This routine is | 366 | or perform other preparatory processing for the request. This routine is |
367 | called by elv_next_request(), i.e. typically just before servicing a request. | 367 | called by elv_next_request(), i.e. typically just before servicing a request. |
368 | (The prepare function would not be called for requests that have REQ_DONTPREP | 368 | (The prepare function would not be called for requests that have REQ_DONTPREP |
369 | enabled) | 369 | enabled) |
370 | 370 | ||
371 | Aside: | 371 | Aside: |
372 | Pre-building could possibly even be done early, i.e before placing the | 372 | Pre-building could possibly even be done early, i.e before placing the |
373 | request on the queue, rather than construct the command on the fly in the | 373 | request on the queue, rather than construct the command on the fly in the |
374 | driver while servicing the request queue when it may affect latencies in | 374 | driver while servicing the request queue when it may affect latencies in |
375 | interrupt context or responsiveness in general. One way to add early | 375 | interrupt context or responsiveness in general. One way to add early |
376 | pre-building would be to do it whenever we fail to merge on a request. | 376 | pre-building would be to do it whenever we fail to merge on a request. |
377 | Now REQ_NOMERGE is set in the request flags to skip this one in the future, | 377 | Now REQ_NOMERGE is set in the request flags to skip this one in the future, |
378 | which means that it will not change before we feed it to the device. So | 378 | which means that it will not change before we feed it to the device. So |
379 | the pre-builder hook can be invoked there. | 379 | the pre-builder hook can be invoked there. |
380 | 380 | ||
381 | 381 | ||
382 | 2. Flexible and generic but minimalist i/o structure/descriptor. | 382 | 2. Flexible and generic but minimalist i/o structure/descriptor. |
383 | 383 | ||
384 | 2.1 Reason for a new structure and requirements addressed | 384 | 2.1 Reason for a new structure and requirements addressed |
385 | 385 | ||
386 | Prior to 2.5, buffer heads were used as the unit of i/o at the generic block | 386 | Prior to 2.5, buffer heads were used as the unit of i/o at the generic block |
387 | layer, and the low level request structure was associated with a chain of | 387 | layer, and the low level request structure was associated with a chain of |
388 | buffer heads for a contiguous i/o request. This led to certain inefficiencies | 388 | buffer heads for a contiguous i/o request. This led to certain inefficiencies |
389 | when it came to large i/o requests and readv/writev style operations, as it | 389 | when it came to large i/o requests and readv/writev style operations, as it |
390 | forced such requests to be broken up into small chunks before being passed | 390 | forced such requests to be broken up into small chunks before being passed |
391 | on to the generic block layer, only to be merged by the i/o scheduler | 391 | on to the generic block layer, only to be merged by the i/o scheduler |
392 | when the underlying device was capable of handling the i/o in one shot. | 392 | when the underlying device was capable of handling the i/o in one shot. |
393 | Also, using the buffer head as an i/o structure for i/os that didn't originate | 393 | Also, using the buffer head as an i/o structure for i/os that didn't originate |
394 | from the buffer cache unnecessarily added to the weight of the descriptors | 394 | from the buffer cache unnecessarily added to the weight of the descriptors |
395 | which were generated for each such chunk. | 395 | which were generated for each such chunk. |
396 | 396 | ||
397 | The following were some of the goals and expectations considered in the | 397 | The following were some of the goals and expectations considered in the |
398 | redesign of the block i/o data structure in 2.5. | 398 | redesign of the block i/o data structure in 2.5. |
399 | 399 | ||
400 | i. Should be appropriate as a descriptor for both raw and buffered i/o - | 400 | i. Should be appropriate as a descriptor for both raw and buffered i/o - |
401 | avoid cache related fields which are irrelevant in the direct/page i/o path, | 401 | avoid cache related fields which are irrelevant in the direct/page i/o path, |
402 | or filesystem block size alignment restrictions which may not be relevant | 402 | or filesystem block size alignment restrictions which may not be relevant |
403 | for raw i/o. | 403 | for raw i/o. |
404 | ii. Ability to represent high-memory buffers (which do not have a virtual | 404 | ii. Ability to represent high-memory buffers (which do not have a virtual |
405 | address mapping in kernel address space). | 405 | address mapping in kernel address space). |
406 | iii. Ability to represent large i/os w/o unnecessarily breaking them up (i.e | 406 | iii. Ability to represent large i/os w/o unnecessarily breaking them up (i.e |
407 | greater than PAGE_SIZE chunks in one shot) | 407 | greater than PAGE_SIZE chunks in one shot) |
408 | iv. At the same time, ability to retain independent identity of i/os from | 408 | iv. At the same time, ability to retain independent identity of i/os from |
409 | different sources or i/o units requiring individual completion (e.g. for | 409 | different sources or i/o units requiring individual completion (e.g. for |
410 | latency reasons) | 410 | latency reasons) |
411 | v. Ability to represent an i/o involving multiple physical memory segments | 411 | v. Ability to represent an i/o involving multiple physical memory segments |
412 | (including non-page aligned page fragments, as specified via readv/writev) | 412 | (including non-page aligned page fragments, as specified via readv/writev) |
413 | without unnecessarily breaking it up, if the underlying device is capable of | 413 | without unnecessarily breaking it up, if the underlying device is capable of |
414 | handling it. | 414 | handling it. |
415 | vi. Preferably should be based on a memory descriptor structure that can be | 415 | vi. Preferably should be based on a memory descriptor structure that can be |
416 | passed around different types of subsystems or layers, maybe even | 416 | passed around different types of subsystems or layers, maybe even |
417 | networking, without duplication or extra copies of data/descriptor fields | 417 | networking, without duplication or extra copies of data/descriptor fields |
418 | themselves in the process | 418 | themselves in the process |
419 | vii. Ability to handle the possibility of splits/merges as the structure passes | 419 | vii. Ability to handle the possibility of splits/merges as the structure passes |
420 | through layered drivers (lvm, md, evms), with minimal overhead. | 420 | through layered drivers (lvm, md, evms), with minimal overhead. |
421 | 421 | ||
422 | The solution was to define a new structure (bio) for the block layer, | 422 | The solution was to define a new structure (bio) for the block layer, |
423 | instead of using the buffer head structure (bh) directly, the idea being | 423 | instead of using the buffer head structure (bh) directly, the idea being |
424 | avoidance of some associated baggage and limitations. The bio structure | 424 | avoidance of some associated baggage and limitations. The bio structure |
425 | is uniformly used for all i/o at the block layer; it forms a part of the | 425 | is uniformly used for all i/o at the block layer; it forms a part of the |
426 | bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are | 426 | bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are |
427 | mapped to bio structures. | 427 | mapped to bio structures. |
428 | 428 | ||
429 | 2.2 The bio struct | 429 | 2.2 The bio struct |
430 | 430 | ||
431 | The bio structure uses a vector representation pointing to an array of tuples | 431 | The bio structure uses a vector representation pointing to an array of tuples |
432 | of <page, offset, len> to describe the i/o buffer, and has various other | 432 | of <page, offset, len> to describe the i/o buffer, and has various other |
433 | fields describing i/o parameters and state that needs to be maintained for | 433 | fields describing i/o parameters and state that needs to be maintained for |
434 | performing the i/o. | 434 | performing the i/o. |
435 | 435 | ||
436 | Notice that this representation means that a bio has no virtual address | 436 | Notice that this representation means that a bio has no virtual address |
437 | mapping at all (unlike buffer heads). | 437 | mapping at all (unlike buffer heads). |
438 | 438 | ||
439 | struct bio_vec { | 439 | struct bio_vec { |
440 | struct page *bv_page; | 440 | struct page *bv_page; |
441 | unsigned short bv_len; | 441 | unsigned short bv_len; |
442 | unsigned short bv_offset; | 442 | unsigned short bv_offset; |
443 | }; | 443 | }; |
444 | 444 | ||
445 | /* | 445 | /* |
446 | * main unit of I/O for the block layer and lower layers (ie drivers) | 446 | * main unit of I/O for the block layer and lower layers (ie drivers) |
447 | */ | 447 | */ |
448 | struct bio { | 448 | struct bio { |
449 | sector_t bi_sector; | 449 | sector_t bi_sector; |
450 | struct bio *bi_next; /* request queue link */ | 450 | struct bio *bi_next; /* request queue link */ |
451 | struct block_device *bi_bdev; /* target device */ | 451 | struct block_device *bi_bdev; /* target device */ |
452 | unsigned long bi_flags; /* status, command, etc */ | 452 | unsigned long bi_flags; /* status, command, etc */ |
453 | unsigned long bi_rw; /* low bits: r/w, high: priority */ | 453 | unsigned long bi_rw; /* low bits: r/w, high: priority */ |
454 | 454 | ||
455 | unsigned int bi_vcnt; /* how many bio_vec's */ | 455 | unsigned int bi_vcnt; /* how many bio_vec's */ |
456 | unsigned int bi_idx; /* current index into bio_vec array */ | 456 | unsigned int bi_idx; /* current index into bio_vec array */ |
457 | 457 | ||
458 | unsigned int bi_size; /* total size in bytes */ | 458 | unsigned int bi_size; /* total size in bytes */ |
459 | unsigned short bi_phys_segments; /* segments after physaddr coalesce*/ | 459 | unsigned short bi_phys_segments; /* segments after physaddr coalesce*/ |
460 | unsigned short bi_hw_segments; /* segments after DMA remapping */ | 460 | unsigned short bi_hw_segments; /* segments after DMA remapping */ |
461 | unsigned int bi_max; /* max bio_vecs we can hold | 461 | unsigned int bi_max; /* max bio_vecs we can hold |
462 | used as index into pool */ | 462 | used as index into pool */ |
463 | struct bio_vec *bi_io_vec; /* the actual vec list */ | 463 | struct bio_vec *bi_io_vec; /* the actual vec list */ |
464 | bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ | 464 | bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ |
465 | atomic_t bi_cnt; /* pin count: free when it hits zero */ | 465 | atomic_t bi_cnt; /* pin count: free when it hits zero */ |
466 | void *bi_private; | 466 | void *bi_private; |
467 | bio_destructor_t *bi_destructor; /* bi_destructor (bio) */ | 467 | bio_destructor_t *bi_destructor; /* bi_destructor (bio) */ |
468 | }; | 468 | }; |
469 | 469 | ||
470 | With this multipage bio design: | 470 | With this multipage bio design: |
471 | 471 | ||
472 | - Large i/os can be sent down in one go using a bio_vec list consisting | 472 | - Large i/os can be sent down in one go using a bio_vec list consisting |
473 | of an array of <page, offset, len> fragments (similar to the way fragments | 473 | of an array of <page, offset, len> fragments (similar to the way fragments |
474 | are represented in the zero-copy network code) | 474 | are represented in the zero-copy network code) |
475 | - Splitting of an i/o request across multiple devices (as in the case of | 475 | - Splitting of an i/o request across multiple devices (as in the case of |
476 | lvm or raid) is achieved by cloning the bio (where the clone points to | 476 | lvm or raid) is achieved by cloning the bio (where the clone points to |
477 | the same bi_io_vec array, but with the index and size accordingly modified) | 477 | the same bi_io_vec array, but with the index and size accordingly modified) |
478 | - A linked list of bios is used as before for unrelated merges (*) - this | 478 | - A linked list of bios is used as before for unrelated merges (*) - this |
479 | avoids reallocs and makes independent completions easier to handle. | 479 | avoids reallocs and makes independent completions easier to handle. |
480 | - Code that traverses the req list needs to make a distinction between | 480 | - Code that traverses the req list needs to make a distinction between |
481 | segments of a request (bio_for_each_segment) and the distinct completion | 481 | segments of a request (bio_for_each_segment) and the distinct completion |
482 | units/bios (rq_for_each_bio). | 482 | units/bios (rq_for_each_bio). |
483 | - Drivers which can't process a large bio in one shot can use the bi_idx | 483 | - Drivers which can't process a large bio in one shot can use the bi_idx |
484 | field to keep track of the next bio_vec entry to process. | 484 | field to keep track of the next bio_vec entry to process. |
485 | (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE) | 485 | (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE) |
486 | [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying | 486 | [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying |
487 | bv_offset and bv_len fields] | 487 | bv_offset and bv_len fields] |
488 | 488 | ||
489 | (*) unrelated merges -- a request ends up containing two or more bios that | 489 | (*) unrelated merges -- a request ends up containing two or more bios that |
490 | didn't originate from the same place. | 490 | didn't originate from the same place. |
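The use of bi_idx for drivers that cannot take a large bio in one shot can be sketched as follows. This is a simplified userspace model: the toy_ structures keep only the fields needed here, and toy_bio_next_chunk is a hypothetical helper, not a block layer routine. It consumes whole bio_vec entries only, so it never has to modify bv_offset or bv_len, and it assumes max_bytes is at least as large as any single bv_len.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down copies of the structures shown above. */
struct toy_bio_vec {
	void *bv_page;			/* stand-in for struct page * */
	unsigned short bv_len;
	unsigned short bv_offset;
};

struct toy_bio {
	unsigned int bi_vcnt;		/* how many bio_vec's */
	unsigned int bi_idx;		/* current index into bio_vec array */
	struct toy_bio_vec *bi_io_vec;	/* the actual vec list */
};

/*
 * Take the next chunk of at most max_bytes from the bio, advancing
 * bi_idx past every bio_vec that was consumed. Returns the number of
 * bytes in the chunk, or 0 when the bio has been fully processed.
 */
static unsigned int toy_bio_next_chunk(struct toy_bio *bio,
				       unsigned int max_bytes)
{
	unsigned int done = 0;

	assert(bio != NULL);

	while (bio->bi_idx < bio->bi_vcnt) {
		unsigned int len = bio->bi_io_vec[bio->bi_idx].bv_len;

		if (done + len > max_bytes)
			break;		/* next entry goes in the next chunk */
		done += len;
		bio->bi_idx++;
	}
	return done;
}
```

A driver-like caller would invoke this repeatedly, issuing one transfer per returned chunk, until it returns 0.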
491 | 491 | ||
492 | bi_end_io() i/o callback gets called on i/o completion of the entire bio. | 492 | bi_end_io() i/o callback gets called on i/o completion of the entire bio. |
493 | 493 | ||
494 | At a lower level, drivers build a scatter gather list from the merged bios. | 494 | At a lower level, drivers build a scatter gather list from the merged bios. |
495 | The scatter gather list is in the form of an array of <page, offset, len> | 495 | The scatter gather list is in the form of an array of <page, offset, len> |
496 | entries with their corresponding dma address mappings filled in at the | 496 | entries with their corresponding dma address mappings filled in at the |
497 | appropriate time. As an optimization, contiguous physical pages can be | 497 | appropriate time. As an optimization, contiguous physical pages can be |
498 | covered by a single entry where <page> refers to the first page and <len> | 498 | covered by a single entry where <page> refers to the first page and <len> |
499 | covers the range of pages (up to 16 contiguous pages could be covered this | 499 | covers the range of pages (up to 16 contiguous pages could be covered this |
500 | way). There is a helper routine (blk_rq_map_sg) which drivers can use to build | 500 | way). There is a helper routine (blk_rq_map_sg) which drivers can use to build |
501 | the sg list. | 501 | the sg list. |
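The contiguous-page optimization can be illustrated with a userspace sketch. This is not blk_rq_map_sg: pages are modelled as page frame numbers, TOY_PAGE_SIZE is an assumed 4096 bytes, the names toy_seg and toy_map_sg are hypothetical, and the 16-page limit mentioned above is not enforced.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical <page, offset, len> tuple; pfn stands in for the page. */
struct toy_seg {
	unsigned long pfn;
	unsigned int offset;
	unsigned int len;
};

/*
 * Copy n input segments into sg, merging a segment into the previous
 * entry when it starts at the physical address where that entry ends.
 * sg must have room for n entries. Returns the number of entries used.
 */
static size_t toy_map_sg(const struct toy_seg *in, size_t n,
			 struct toy_seg *sg)
{
	size_t nsg = 0, i;

	assert(sg != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (nsg > 0) {
			struct toy_seg *prev = &sg[nsg - 1];
			unsigned long prev_end = prev->pfn * TOY_PAGE_SIZE +
						 prev->offset + prev->len;
			unsigned long start = in[i].pfn * TOY_PAGE_SIZE +
					      in[i].offset;

			if (prev_end == start) {
				prev->len += in[i].len;	/* extend previous entry */
				continue;
			}
		}
		sg[nsg++] = in[i];	/* start a new entry */
	}
	return nsg;
}
```

With two full pages at frames 100 and 101 followed by a fragment in frame 200, the model produces two entries: one of 8192 bytes starting at frame 100, and the fragment on its own.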
502 | 502 | ||
503 | Note: Right now the only user of bios with more than one page is ll_rw_kio, | 503 | Note: Right now the only user of bios with more than one page is ll_rw_kio, |
504 | which in turn means that only raw I/O uses it (direct i/o may not work | 504 | which in turn means that only raw I/O uses it (direct i/o may not work |
505 | right now). The intent however is to enable clustering of pages etc to | 505 | right now). The intent however is to enable clustering of pages etc to |
506 | become possible. The pagebuf abstraction layer from SGI also uses multi-page | 506 | become possible. The pagebuf abstraction layer from SGI also uses multi-page |
507 | bios, but that is currently not included in the stock development kernels. | 507 | bios, but that is currently not included in the stock development kernels. |
508 | The same is true of Andrew Morton's work-in-progress multipage bio writeout | 508 | The same is true of Andrew Morton's work-in-progress multipage bio writeout |
509 | and readahead patches. | 509 | and readahead patches. |
510 | 510 | ||
511 | 2.3 Changes in the Request Structure | 511 | 2.3 Changes in the Request Structure |
512 | 512 | ||
513 | The request structure is the structure that gets passed down to low level | 513 | The request structure is the structure that gets passed down to low level |
514 | drivers. The block layer make_request function builds up a request structure, | 514 | drivers. The block layer make_request function builds up a request structure, |
515 | places it on the queue and invokes the driver's request_fn. The driver makes | 515 | places it on the queue and invokes the driver's request_fn. The driver makes |
516 | use of the block layer helper routine elv_next_request to pull the next request | 516 | use of the block layer helper routine elv_next_request to pull the next request |
517 | off the queue. Control or diagnostic functions might bypass block and directly | 517 | off the queue. Control or diagnostic functions might bypass block and directly |
518 | invoke underlying driver entry points passing in a specially constructed | 518 | invoke underlying driver entry points passing in a specially constructed |
519 | request structure. | 519 | request structure. |
520 | 520 | ||
521 | Only some relevant fields (mainly those which changed or may be referred | 521 | Only some relevant fields (mainly those which changed or may be referred |
522 | to in some of the discussion here) are listed below, not necessarily in | 522 | to in some of the discussion here) are listed below, not necessarily in |
523 | the order in which they occur in the structure (see include/linux/blkdev.h). | 523 | the order in which they occur in the structure (see include/linux/blkdev.h). |
524 | Refer to Documentation/block/request.txt for details about all the request | 524 | Refer to Documentation/block/request.txt for details about all the request |
525 | structure fields and a quick reference about the layers which are | 525 | structure fields and a quick reference about the layers which are |
526 | supposed to use or modify those fields. | 526 | supposed to use or modify those fields. |
527 | 527 | ||
528 | struct request { | 528 | struct request { |
529 | struct list_head queuelist; /* Not meant to be directly accessed by | 529 | struct list_head queuelist; /* Not meant to be directly accessed by |
530 | the driver. | 530 | the driver. |
531 | Used by q->elv_next_request_fn | 531 | Used by q->elv_next_request_fn |
532 | rq->queue is gone | 532 | rq->queue is gone |
533 | */ | 533 | */ |
534 | . | 534 | . |
535 | . | 535 | . |
536 | unsigned char cmd[16]; /* prebuilt command data block */ | 536 | unsigned char cmd[16]; /* prebuilt command data block */ |
537 | unsigned long flags; /* also includes earlier rq->cmd settings */ | 537 | unsigned long flags; /* also includes earlier rq->cmd settings */ |
538 | . | 538 | . |
539 | . | 539 | . |
540 | sector_t sector; /* this field is now of type sector_t instead of int | 540 | sector_t sector; /* this field is now of type sector_t instead of int |
541 | preparation for 64 bit sectors */ | 541 | preparation for 64 bit sectors */ |
542 | . | 542 | . |
543 | . | 543 | . |
544 | 544 | ||
545 | /* Number of scatter-gather DMA addr+len pairs after | 545 | /* Number of scatter-gather DMA addr+len pairs after |
546 | * physical address coalescing is performed. | 546 | * physical address coalescing is performed. |
547 | */ | 547 | */ |
548 | unsigned short nr_phys_segments; | 548 | unsigned short nr_phys_segments; |
549 | 549 | ||
550 | /* Number of scatter-gather addr+len pairs after | 550 | /* Number of scatter-gather addr+len pairs after |
551 | * physical and DMA remapping hardware coalescing is performed. | 551 | * physical and DMA remapping hardware coalescing is performed. |
552 | * This is the number of scatter-gather entries the driver | 552 | * This is the number of scatter-gather entries the driver |
553 | * will actually have to deal with after DMA mapping is done. | 553 | * will actually have to deal with after DMA mapping is done. |
554 | */ | 554 | */ |
555 | unsigned short nr_hw_segments; | 555 | unsigned short nr_hw_segments; |
556 | 556 | ||
557 | /* Various sector counts */ | 557 | /* Various sector counts */ |
558 | unsigned long nr_sectors; /* no. of sectors left: driver modifiable */ | 558 | unsigned long nr_sectors; /* no. of sectors left: driver modifiable */ |
559 | unsigned long hard_nr_sectors; /* block internal copy of above */ | 559 | unsigned long hard_nr_sectors; /* block internal copy of above */ |
560 | unsigned int current_nr_sectors; /* no. of sectors left in the | 560 | unsigned int current_nr_sectors; /* no. of sectors left in the |
561 | current segment:driver modifiable */ | 561 | current segment:driver modifiable */ |
562 | unsigned long hard_cur_sectors; /* block internal copy of the above */ | 562 | unsigned long hard_cur_sectors; /* block internal copy of the above */ |
563 | . | 563 | . |
564 | . | 564 | . |
565 | int tag; /* command tag associated with request */ | 565 | int tag; /* command tag associated with request */ |
566 | void *special; /* same as before */ | 566 | void *special; /* same as before */ |
567 | char *buffer; /* valid only for low memory buffers up to | 567 | char *buffer; /* valid only for low memory buffers up to |
568 | current_nr_sectors */ | 568 | current_nr_sectors */ |
569 | . | 569 | . |
570 | . | 570 | . |
571 | struct bio *bio, *biotail; /* bio list instead of bh */ | 571 | struct bio *bio, *biotail; /* bio list instead of bh */ |
572 | struct request_list *rl; | 572 | struct request_list *rl; |
573 | } | 573 | } |
574 | 574 | ||
575 | See the rq_flag_bits definitions for an explanation of the various flags | 575 | See the rq_flag_bits definitions for an explanation of the various flags |
576 | available. Some bits are used by the block layer or i/o scheduler. | 576 | available. Some bits are used by the block layer or i/o scheduler. |
577 | 577 | ||
578 | The behaviour of the various sector counts is almost the same as before, | 578 | The behaviour of the various sector counts is almost the same as before, |
579 | except that since we have multi-segment bios, current_nr_sectors refers | 579 | except that since we have multi-segment bios, current_nr_sectors refers |
580 | to the number of sectors in the current segment being processed which could | 580 | to the number of sectors in the current segment being processed which could |
581 | be one of the many segments in the current bio (i.e. i/o completion unit). | 581 | be one of the many segments in the current bio (i.e. i/o completion unit). |
582 | The nr_sectors value refers to the total number of sectors in the whole | 582 | The nr_sectors value refers to the total number of sectors in the whole |
583 | request that remain to be transferred (no change). The purpose of the | 583 | request that remain to be transferred (no change). The purpose of the |
584 | hard_xxx values is for block to remember these counts every time it hands | 584 | hard_xxx values is for block to remember these counts every time it hands |
585 | over the request to the driver. These values are updated by block on | 585 | over the request to the driver. These values are updated by block on |
586 | end_that_request_first, i.e. every time the driver completes a part of the | 586 | end_that_request_first, i.e. every time the driver completes a part of the |
587 | transfer and invokes block end*request helpers to mark this. The | 587 | transfer and invokes block end*request helpers to mark this. The |
588 | driver should not modify these values. The block layer sets up the | 588 | driver should not modify these values. The block layer sets up the |
589 | nr_sectors and current_nr_sectors fields (based on the corresponding | 589 | nr_sectors and current_nr_sectors fields (based on the corresponding |
590 | hard_xxx values and the number of bytes transferred) and updates it on | 590 | hard_xxx values and the number of bytes transferred) and updates it on |
591 | every transfer that invokes end_that_request_first. It does the same for the | 591 | every transfer that invokes end_that_request_first. It does the same for the |
592 | buffer, bio, bio->bi_idx fields too. | 592 | buffer, bio, bio->bi_idx fields too. |
593 | 593 | ||
594 | The buffer field is just a virtual address mapping of the current segment | 594 | The buffer field is just a virtual address mapping of the current segment |
595 | of the i/o buffer in cases where the buffer resides in low-memory. For high | 595 | of the i/o buffer in cases where the buffer resides in low-memory. For high |
596 | memory i/o, this field is not valid and must not be used by drivers. | 596 | memory i/o, this field is not valid and must not be used by drivers. |
597 | 597 | ||
598 | Code that sets up its own request structures and passes them down to | 598 | Code that sets up its own request structures and passes them down to |
599 | a driver needs to be careful about interoperation with the block layer helper | 599 | a driver needs to be careful about interoperation with the block layer helper |
600 | functions which the driver uses. (Section 1.3) | 600 | functions which the driver uses. (Section 1.3) |
601 | 601 | ||
602 | 3. Using bios | 602 | 3. Using bios |
603 | 603 | ||
604 | 3.1 Setup/Teardown | 604 | 3.1 Setup/Teardown |
605 | 605 | ||
606 | There are routines for managing the allocation, reference counting, and | 606 | There are routines for managing the allocation, reference counting, and |
607 | freeing of bios (bio_alloc, bio_get, bio_put). | 607 | freeing of bios (bio_alloc, bio_get, bio_put). |
608 | 608 | ||
609 | This makes use of Ingo Molnar's mempool implementation, which enables | 609 | This makes use of Ingo Molnar's mempool implementation, which enables |
610 | subsystems like bio to maintain their own reserve memory pools for guaranteed | 610 | subsystems like bio to maintain their own reserve memory pools for guaranteed |
611 | deadlock-free allocations during extreme VM load. For example, the VM | 611 | deadlock-free allocations during extreme VM load. For example, the VM |
612 | subsystem makes use of the block layer to writeout dirty pages in order to be | 612 | subsystem makes use of the block layer to writeout dirty pages in order to be |
613 | able to free up memory space, a case which needs careful handling. The | 613 | able to free up memory space, a case which needs careful handling. The |
614 | allocation logic draws from the preallocated emergency reserve in situations | 614 | allocation logic draws from the preallocated emergency reserve in situations |
615 | where it cannot allocate through normal means. If the pool is empty and it | 615 | where it cannot allocate through normal means. If the pool is empty and it |
616 | can wait, then it would trigger action that would help free up memory or | 616 | can wait, then it would trigger action that would help free up memory or |
617 | replenish the pool (without deadlocking) and wait for availability in the pool. | 617 | replenish the pool (without deadlocking) and wait for availability in the pool. |
618 | If it is in IRQ context, and hence not in a position to do this, allocation | 618 | If it is in IRQ context, and hence not in a position to do this, allocation |
619 | could fail if the pool is empty. In general mempool always first tries to | 619 | could fail if the pool is empty. In general mempool always first tries to |
620 | perform allocation without having to wait, even if it means digging into the | 620 | perform allocation without having to wait, even if it means digging into the |
621 | pool as long as it is not less than 50% full. | 621 | pool as long as it is not less than 50% full. |
622 | 622 | ||
623 | On a free, memory is released to the pool or directly freed depending on | 623 | On a free, memory is released to the pool or directly freed depending on |
624 | the current availability in the pool. The mempool interface lets the | 624 | the current availability in the pool. The mempool interface lets the |
625 | subsystem specify the routines to be used for normal alloc and free. In the | 625 | subsystem specify the routines to be used for normal alloc and free. In the |
626 | case of bio, these routines make use of the standard slab allocator. | 626 | case of bio, these routines make use of the standard slab allocator. |
627 | 627 | ||
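The allocation and free behaviour described above can be sketched in user space. This is only a toy model under stated assumptions, not the kernel mempool code: the names toy_pool, toy_pool_alloc, toy_pool_free, POOL_MIN and the fail_normal test hook are all hypothetical, malloc() stands in for the slab allocator, and the "wait for availability" and "50% full" rules are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN 4  /* hypothetical size of the emergency reserve */

struct toy_pool {
	void *reserve[POOL_MIN]; /* preallocated emergency elements */
	int curr;                /* elements currently held in the reserve */
	size_t elem_size;        /* size of each element in bytes */
	int fail_normal;         /* test hook: simulate memory pressure */
};

/* Release everything still held in the reserve. */
void toy_pool_destroy(struct toy_pool *p)
{
	while (p->curr > 0)
		free(p->reserve[--p->curr]);
}

/* Fill the reserve up front; returns 0 on success, -1 on failure. */
int toy_pool_init(struct toy_pool *p, size_t elem_size)
{
	p->curr = 0;
	p->elem_size = elem_size;
	p->fail_normal = 0;
	while (p->curr < POOL_MIN) {
		void *e = malloc(elem_size);
		if (!e) {
			toy_pool_destroy(p);
			return -1;
		}
		p->reserve[p->curr++] = e;
	}
	return 0;
}

/* Try the normal allocator first, then fall back to the reserve. */
void *toy_pool_alloc(struct toy_pool *p)
{
	void *e = p->fail_normal ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->curr > 0)
		return p->reserve[--p->curr];
	return NULL; /* a caller that cannot wait sees a failure here */
}

/* Refill the reserve if it is below its minimum, else free normally. */
void toy_pool_free(struct toy_pool *p, void *e)
{
	assert(e != NULL);
	if (p->curr < POOL_MIN)
		p->reserve[p->curr++] = e;
	else
		free(e);
}
```

In this sketch an element obtained from the reserve goes back into the reserve on free, which mirrors the point that memory taken from the pool must be returned within a bounded time so that later allocations can make progress.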
628 | The caller of bio_alloc is expected to take certain steps to avoid | 628 | The caller of bio_alloc is expected to take certain steps to avoid |
629 | deadlocks, e.g. avoid trying to allocate more memory from the pool while | 629 | deadlocks, e.g. avoid trying to allocate more memory from the pool while |
630 | already holding memory obtained from the pool. | 630 | already holding memory obtained from the pool. |
631 | [TBD: This is a potential issue, though a rare possibility | 631 | [TBD: This is a potential issue, though a rare possibility |
632 | in the bounce bio allocation that happens in the current code, since | 632 | in the bounce bio allocation that happens in the current code, since |
633 | it ends up allocating a second bio from the same pool while | 633 | it ends up allocating a second bio from the same pool while |
634 | holding the original bio ] | 634 | holding the original bio ] |
635 | 635 | ||
636 | Memory allocated from the pool should be released back within a limited | 636 | Memory allocated from the pool should be released back within a limited |
637 | amount of time (in the case of bio, that would be after the i/o is completed). | 637 | amount of time (in the case of bio, that would be after the i/o is completed). |
638 | This ensures that if part of the pool has been used up, some work (in this | 638 | This ensures that if part of the pool has been used up, some work (in this |
639 | case i/o) must already be in progress and memory would be available when it | 639 | case i/o) must already be in progress and memory would be available when it |
640 | is over. If allocating from multiple pools in the same code path, the order | 640 | is over. If allocating from multiple pools in the same code path, the order |
641 | or hierarchy of allocation needs to be consistent, just the way one deals | 641 | or hierarchy of allocation needs to be consistent, just the way one deals |
642 | with multiple locks. | 642 | with multiple locks. |
643 | 643 | ||
644 | The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc()) | 644 | The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc()) |
645 | for a non-clone bio. There are 6 pools set up for different size biovecs, | 645 | for a non-clone bio. There are 6 pools set up for different size biovecs, |
646 | so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the | 646 | so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the |
647 | given size from these slabs. | 647 | given size from these slabs. |
648 | 648 | ||
649 | The bi_destructor() routine takes into account the possibility of the bio | 649 | The bi_destructor() routine takes into account the possibility of the bio |
650 | having originated from a different source (see later discussions on | 650 | having originated from a different source (see later discussions on |
651 | n/w to block transfers and kvec_cb) | 651 | n/w to block transfers and kvec_cb) |
652 | 652 | ||
653 | The bio_get() routine may be used to hold an extra reference on a bio prior | 653 | The bio_get() routine may be used to hold an extra reference on a bio prior |
654 | to i/o submission, if the bio fields are likely to be accessed after the | 654 | to i/o submission, if the bio fields are likely to be accessed after the |
655 | i/o is issued (since the bio may otherwise get freed in case i/o completion | 655 | i/o is issued (since the bio may otherwise get freed in case i/o completion |
656 | happens in the meantime). | 656 | happens in the meantime). |
657 | 657 | ||
658 | The bio_clone() routine may be used to duplicate a bio, where the clone | 658 | The bio_clone() routine may be used to duplicate a bio, where the clone |
659 | shares the bio_vec_list with the original bio (i.e. both point to the | 659 | shares the bio_vec_list with the original bio (i.e. both point to the |
660 | same bio_vec_list). This would typically be used for splitting i/o requests | 660 | same bio_vec_list). This would typically be used for splitting i/o requests |
661 | in lvm or md. | 661 | in lvm or md. |
662 | 662 | ||
663 | 3.2 Generic bio helper Routines | 663 | 3.2 Generic bio helper Routines |
664 | 664 | ||
665 | 3.2.1 Traversing segments and completion units in a request | 665 | 3.2.1 Traversing segments and completion units in a request |
666 | 666 | ||
667 | The macros bio_for_each_segment() and rq_for_each_bio() should be used for | 667 | The macros bio_for_each_segment() and rq_for_each_bio() should be used for |
668 | traversing the bios in the request list (drivers should avoid directly | 668 | traversing the bios in the request list (drivers should avoid directly |
669 | trying to do it themselves). Using these helpers should also make it easier | 669 | trying to do it themselves). Using these helpers should also make it easier |
670 | to cope with block changes in the future. | 670 | to cope with block changes in the future. |
671 | 671 | ||
672 | rq_for_each_bio(bio, rq) | 672 | rq_for_each_bio(bio, rq) |
673 | bio_for_each_segment(bio_vec, bio, i) | 673 | bio_for_each_segment(bio_vec, bio, i) |
674 | /* bio_vec is now current segment */ | 674 | /* bio_vec is now current segment */ |
675 | 675 | ||
676 | I/O completion callbacks are per-bio rather than per-segment, so drivers | 676 | I/O completion callbacks are per-bio rather than per-segment, so drivers |
677 | that traverse bio chains on completion need to keep that in mind. Drivers | 677 | that traverse bio chains on completion need to keep that in mind. Drivers |
678 | which don't make a distinction between segments and completion units would | 678 | which don't make a distinction between segments and completion units would |
679 | need to be reorganized to support multi-segment bios. | 679 | need to be reorganized to support multi-segment bios. |
680 | 680 | ||
681 | 3.2.2 Setting up DMA scatterlists | 681 | 3.2.2 Setting up DMA scatterlists |
682 | 682 | ||
683 | The blk_rq_map_sg() helper routine would be used for setting up scatter | 683 | The blk_rq_map_sg() helper routine would be used for setting up scatter |
684 | gather lists from a request, so a driver need not do it on its own. | 684 | gather lists from a request, so a driver need not do it on its own. |
685 | 685 | ||
686 | nr_segments = blk_rq_map_sg(q, rq, scatterlist); | 686 | nr_segments = blk_rq_map_sg(q, rq, scatterlist); |
687 | 687 | ||
688 | The helper routine provides a level of abstraction which makes it easier | 688 | The helper routine provides a level of abstraction which makes it easier |
689 | to modify the internals of request to scatterlist conversion down the line | 689 | to modify the internals of request to scatterlist conversion down the line |
690 | without breaking drivers. The blk_rq_map_sg routine takes care of several | 690 | without breaking drivers. The blk_rq_map_sg routine takes care of several |
691 | things like collapsing physically contiguous segments (if QUEUE_FLAG_CLUSTER | 691 | things like collapsing physically contiguous segments (if QUEUE_FLAG_CLUSTER |
692 | is set) and correct segment accounting to avoid exceeding the limits which | 692 | is set) and correct segment accounting to avoid exceeding the limits which |
693 | the i/o hardware can handle, based on various queue properties. | 693 | the i/o hardware can handle, based on various queue properties. |
694 | 694 | ||
695 | - Prevents a clustered segment from crossing a 4GB mem boundary | 695 | - Prevents a clustered segment from crossing a 4GB mem boundary |
696 | - Avoids building segments that would exceed the number of physical | 696 | - Avoids building segments that would exceed the number of physical |
697 | memory segments that the driver can handle (phys_segments) and the | 697 | memory segments that the driver can handle (phys_segments) and the |
698 | number that the underlying hardware can handle at once, accounting for | 698 | number that the underlying hardware can handle at once, accounting for |
699 | DMA remapping (hw_segments) (i.e. IOMMU aware limits). | 699 | DMA remapping (hw_segments) (i.e. IOMMU aware limits). |
700 | 700 | ||
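The clustering and limit checks listed above can be illustrated with a small user-space model. This is a sketch under assumptions, not the blk_rq_map_sg() implementation: toy_seg, toy_within_boundary and toy_map_sg are hypothetical names, a single max_out limit stands in for the separate phys_segments and hw_segments limits, and the boundary is fixed at 4GB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_seg {
	uint64_t addr; /* physical start address */
	uint32_t len;  /* length in bytes, assumed to be nonzero */
};

#define BOUNDARY_MASK 0xffffffffULL /* 4GB boundary */

/* Nonzero if [addr, addr + len) stays inside one 4GB window. */
int toy_within_boundary(uint64_t addr, uint64_t len)
{
	return (addr | BOUNDARY_MASK) == ((addr + len - 1) | BOUNDARY_MASK);
}

/*
 * Copy 'n' input segments to 'out', merging physically contiguous
 * neighbours when 'cluster' is set. Returns the number of entries
 * written, or -1 if more than 'max_out' entries would be needed.
 */
int toy_map_sg(const struct toy_seg *in, size_t n,
	       struct toy_seg *out, size_t max_out, int cluster)
{
	size_t nout = 0;

	for (size_t i = 0; i < n; i++) {
		assert(in[i].len > 0);
		if (cluster && nout > 0) {
			struct toy_seg *prev = &out[nout - 1];
			uint64_t merged = (uint64_t)prev->len + in[i].len;

			/* Merge only if contiguous and within one 4GB window. */
			if (prev->addr + prev->len == in[i].addr &&
			    merged <= UINT32_MAX &&
			    toy_within_boundary(prev->addr, merged)) {
				prev->len = (uint32_t)merged;
				continue;
			}
		}
		if (nout == max_out)
			return -1; /* would exceed the segment limit */
		out[nout++] = in[i];
	}
	return (int)nout;
}
```

With clustering enabled, two adjacent 4KB pages at 0x1000 and 0x2000 collapse into one 8KB entry, while two pages that straddle the 4GB line stay separate even though they are contiguous.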
701 | Routines which the low level driver can use to set up the segment limits: | 701 | Routines which the low level driver can use to set up the segment limits: |
702 | 702 | ||
703 | blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of | 703 | blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of |
704 | hw data segments in a request (i.e. the maximum number of address/length | 704 | hw data segments in a request (i.e. the maximum number of address/length |
705 | pairs the host adapter can actually hand to the device at once) | 705 | pairs the host adapter can actually hand to the device at once) |
706 | 706 | ||
707 | blk_queue_max_phys_segments() : Sets an upper limit on the maximum number | 707 | blk_queue_max_phys_segments() : Sets an upper limit on the maximum number |
708 | of physical data segments in a request (i.e. the largest sized scatter list | 708 | of physical data segments in a request (i.e. the largest sized scatter list |
709 | a driver could handle) | 709 | a driver could handle) |
710 | 710 | ||
711 | 3.2.3 I/O completion | 711 | 3.2.3 I/O completion |
712 | 712 | ||
713 | The existing generic block layer helper routines end_request, | 713 | The existing generic block layer helper routines end_request, |
714 | end_that_request_first and end_that_request_last can be used for i/o | 714 | end_that_request_first and end_that_request_last can be used for i/o |
715 | completion (and setting things up so the rest of the i/o or the next | 715 | completion (and setting things up so the rest of the i/o or the next |
716 | request can be kicked off) as before. With the introduction of multi-page | 716 | request can be kicked off) as before. With the introduction of multi-page |
717 | bio support, end_that_request_first requires an additional argument indicating | 717 | bio support, end_that_request_first requires an additional argument indicating |
718 | the number of sectors completed. | 718 | the number of sectors completed. |
719 | 719 | ||
720 | 3.2.4 Implications for drivers that do not interpret bios (don't handle | 720 | 3.2.4 Implications for drivers that do not interpret bios (don't handle |
721 | multiple segments) | 721 | multiple segments) |
722 | 722 | ||
723 | Drivers that do not interpret bios e.g. those which do not handle multiple | 723 | Drivers that do not interpret bios e.g. those which do not handle multiple |
724 | segments and do not support i/o into high memory addresses (require bounce | 724 | segments and do not support i/o into high memory addresses (require bounce |
725 | buffers) and expect only virtually mapped buffers, can access the rq->buffer | 725 | buffers) and expect only virtually mapped buffers, can access the rq->buffer |
726 | field. As before the driver should use current_nr_sectors to determine the | 726 | field. As before the driver should use current_nr_sectors to determine the |
727 | size of remaining data in the current segment (that is the maximum it can | 727 | size of remaining data in the current segment (that is the maximum it can |
728 | transfer in one go unless it interprets segments), and rely on the block layer | 728 | transfer in one go unless it interprets segments), and rely on the block layer |
729 | end_request, or end_that_request_first/last to take care of all accounting | 729 | end_request, or end_that_request_first/last to take care of all accounting |
730 | and transparent mapping of the next bio segment when a segment boundary | 730 | and transparent mapping of the next bio segment when a segment boundary |
731 | is crossed on completion of a transfer. (The end*request* functions should | 731 | is crossed on completion of a transfer. (The end*request* functions should |
732 | be used only if the request has come down from block/bio path, not for | 732 | be used only if the request has come down from block/bio path, not for |
733 | direct access requests which only specify rq->buffer without a valid rq->bio) | 733 | direct access requests which only specify rq->buffer without a valid rq->bio) |
734 | 734 | ||
735 | 3.2.5 Generic request command tagging | 735 | 3.2.5 Generic request command tagging |
736 | 736 | ||
737 | 3.2.5.1 Tag helpers | 737 | 3.2.5.1 Tag helpers |
738 | 738 | ||
739 | Block now offers some simple generic functionality to help support command | 739 | Block now offers some simple generic functionality to help support command |
740 | queueing (typically known as tagged command queueing), i.e. manage more than | 740 | queueing (typically known as tagged command queueing), i.e. manage more than |
741 | one outstanding command on a queue at any given time. | 741 | one outstanding command on a queue at any given time. |
742 | 742 | ||
743 | blk_queue_init_tags(request_queue_t *q, int depth) | 743 | blk_queue_init_tags(request_queue_t *q, int depth) |
744 | 744 | ||
745 | Initialize internal command tagging structures for a maximum | 745 | Initialize internal command tagging structures for a maximum |
746 | depth of 'depth'. | 746 | depth of 'depth'. |
747 | 747 | ||
748 | blk_queue_free_tags(request_queue_t *q) | 748 | blk_queue_free_tags(request_queue_t *q) |
749 | 749 | ||
750 | Teardown tag info associated with the queue. This will be done | 750 | Teardown tag info associated with the queue. This will be done |
751 | automatically by block if blk_queue_cleanup() is called on a queue | 751 | automatically by block if blk_queue_cleanup() is called on a queue |
752 | that is using tagging. | 752 | that is using tagging. |
753 | 753 | ||
754 | The above are initialization and exit management, the main helpers during | 754 | The above are initialization and exit management, the main helpers during |
755 | normal operations are: | 755 | normal operations are: |
756 | 756 | ||
757 | blk_queue_start_tag(request_queue_t *q, struct request *rq) | 757 | blk_queue_start_tag(request_queue_t *q, struct request *rq) |
758 | 758 | ||
759 | Start tagged operation for this request. A free tag number between | 759 | Start tagged operation for this request. A free tag number between |
760 | 0 and 'depth' is assigned to the request (rq->tag holds this number), | 760 | 0 and 'depth' is assigned to the request (rq->tag holds this number), |
761 | and 'rq' is added to the internal tag management. If the maximum depth | 761 | and 'rq' is added to the internal tag management. If the maximum depth |
762 | for this queue is already achieved (or if the tag wasn't started for | 762 | for this queue is already achieved (or if the tag wasn't started for |
763 | some other reason), 1 is returned. Otherwise 0 is returned. | 763 | some other reason), 1 is returned. Otherwise 0 is returned. |
764 | 764 | ||
765 | blk_queue_end_tag(request_queue_t *q, struct request *rq) | 765 | blk_queue_end_tag(request_queue_t *q, struct request *rq) |
766 | 766 | ||
767 | End tagged operation on this request. 'rq' is removed from the internal | 767 | End tagged operation on this request. 'rq' is removed from the internal |
768 | book keeping structures. | 768 | book keeping structures. |
769 | 769 | ||
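The start/end tag bookkeeping described above can be modelled in user space as follows. This is a simplified sketch, not the block layer code: toy_rq, toy_tags and the toy_* functions are hypothetical names, the depth is capped at 32, a set bit in tag_map means "in use" here, and tags are handed out from 0 to depth - 1. The return convention (0 on success, 1 when no tag could be started) follows the description of blk_queue_start_tag().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 32 /* hypothetical cap so one word holds the map */

struct toy_rq {
	int tag; /* -1 while the request is untagged */
};

struct toy_tags {
	struct toy_rq *tag_index[TOY_MAX_DEPTH]; /* tag -> request */
	unsigned long tag_map; /* bit set = tag in use (model choice) */
	int busy;              /* number of tags currently in use */
	int max_depth;         /* maximum number of outstanding tags */
};

/* Set up an empty tag map for at most 'depth' outstanding requests. */
void toy_init_tags(struct toy_tags *t, int depth)
{
	assert(depth > 0 && depth <= TOY_MAX_DEPTH);
	for (int i = 0; i < TOY_MAX_DEPTH; i++)
		t->tag_index[i] = NULL;
	t->tag_map = 0;
	t->busy = 0;
	t->max_depth = depth;
}

/* Returns 0 and sets rq->tag on success, 1 if every tag is busy. */
int toy_start_tag(struct toy_tags *t, struct toy_rq *rq)
{
	for (int tag = 0; tag < t->max_depth; tag++) {
		if (!(t->tag_map & (1UL << tag))) {
			t->tag_map |= 1UL << tag;
			t->tag_index[tag] = rq;
			rq->tag = tag;
			t->busy++;
			return 0;
		}
	}
	return 1;
}

/* Release the tag held by 'rq' so it can be reused. */
void toy_end_tag(struct toy_tags *t, struct toy_rq *rq)
{
	int tag = rq->tag;

	assert(tag >= 0 && tag < t->max_depth && t->tag_index[tag] == rq);
	t->tag_map &= ~(1UL << tag);
	t->tag_index[tag] = NULL;
	rq->tag = -1;
	t->busy--;
}
```

A driver-like caller would start a tag before issuing a command and end it before signalling completion; once max_depth requests are tagged, further start attempts return 1 until a tag is released.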
770 | To minimize struct request and queue overhead, the tag helpers utilize some | 770 | To minimize struct request and queue overhead, the tag helpers utilize some |
771 | of the same request members that are used for normal request queue management. | 771 | of the same request members that are used for normal request queue management. |
772 | This means that a request cannot both be an active tag and be on the queue | 772 | This means that a request cannot both be an active tag and be on the queue |
773 | list at the same time. blk_queue_start_tag() will remove the request, but | 773 | list at the same time. blk_queue_start_tag() will remove the request, but |
774 | the driver must remember to call blk_queue_end_tag() before signalling | 774 | the driver must remember to call blk_queue_end_tag() before signalling |
775 | completion of the request to the block layer. This means ending tag | 775 | completion of the request to the block layer. This means ending tag |
776 | operations before calling end_that_request_last()! For an example of a user | 776 | operations before calling end_that_request_last()! For an example of a user |
777 | of these helpers, see the IDE tagged command queueing support. | 777 | of these helpers, see the IDE tagged command queueing support. |
778 | 778 | ||
779 | Certain hardware conditions may dictate a need to invalidate the block tag | 779 | Certain hardware conditions may dictate a need to invalidate the block tag |
780 | queue. For instance, on IDE any tagged request error needs to clear both | 780 | queue. For instance, on IDE any tagged request error needs to clear both |
781 | the hardware and software block queue and enable the driver to sanely restart | 781 | the hardware and software block queue and enable the driver to sanely restart |
782 | all the outstanding requests. There's a third helper to do that: | 782 | all the outstanding requests. There's a third helper to do that: |
783 | 783 | ||
784 | blk_queue_invalidate_tags(request_queue_t *q) | 784 | blk_queue_invalidate_tags(request_queue_t *q) |
785 | 785 | ||
786 | Clear the internal block tag queue and re-add all the pending requests | 786 | Clear the internal block tag queue and re-add all the pending requests |
787 | to the request queue. The driver will receive them again on the | 787 | to the request queue. The driver will receive them again on the |
788 | next request_fn run, just like it did the first time it encountered | 788 | next request_fn run, just like it did the first time it encountered |
789 | them. | 789 | them. |
790 | 790 | ||
791 | 3.2.5.2 Tag info | 791 | 3.2.5.2 Tag info |
792 | 792 | ||
793 | Some block functions exist to query current tag status or to go from a | 793 | Some block functions exist to query current tag status or to go from a |
794 | tag number to the associated request. These are, in no particular order: | 794 | tag number to the associated request. These are, in no particular order: |
795 | 795 | ||
796 | blk_queue_tagged(q) | 796 | blk_queue_tagged(q) |
797 | 797 | ||
798 | Returns 1 if the queue 'q' is using tagging, 0 if not. | 798 | Returns 1 if the queue 'q' is using tagging, 0 if not. |
799 | 799 | ||
800 | blk_queue_tag_request(q, tag) | 800 | blk_queue_tag_request(q, tag) |
801 | 801 | ||
802 | Returns a pointer to the request associated with tag 'tag'. | 802 | Returns a pointer to the request associated with tag 'tag'. |
803 | 803 | ||
804 | blk_queue_tag_depth(q) | 804 | blk_queue_tag_depth(q) |
805 | 805 | ||
806 | Return current queue depth. | 806 | Return current queue depth. |
807 | 807 | ||
808 | blk_queue_tag_queue(q) | 808 | blk_queue_tag_queue(q) |
809 | 809 | ||
810 | Returns 1 if the queue can accept a new queued command, 0 if we are | 810 | Returns 1 if the queue can accept a new queued command, 0 if we are |
811 | at the maximum depth already. | 811 | at the maximum depth already. |
812 | 812 | ||
813 | blk_rq_tagged(rq) | 813 | blk_rq_tagged(rq) |
814 | 814 | ||
815 | Returns 1 if the request 'rq' is tagged. | 815 | Returns 1 if the request 'rq' is tagged. |
816 | 816 | ||
817 | 3.2.5.3 Internal structure | 817 | 3.2.5.3 Internal structure |
818 | 818 | ||
819 | Internally, block manages tags in the blk_queue_tag structure: | 819 | Internally, block manages tags in the blk_queue_tag structure: |
820 | 820 | ||
821 | struct blk_queue_tag { | 821 | struct blk_queue_tag { |
822 | struct request **tag_index; /* array of pointers to rq */ | 822 | struct request **tag_index; /* array of pointers to rq */ |
823 | unsigned long *tag_map; /* bitmap of free tags */ | 823 | unsigned long *tag_map; /* bitmap of free tags */ |
824 | struct list_head busy_list; /* fifo list of busy tags */ | 824 | struct list_head busy_list; /* fifo list of busy tags */ |
825 | int busy; /* queue depth */ | 825 | int busy; /* queue depth */ |
826 | int max_depth; /* max queue depth */ | 826 | int max_depth; /* max queue depth */ |
827 | }; | 827 | }; |
828 | 828 | ||
829 | Most of the above is simple and straightforward, however busy_list may need | 829 | Most of the above is simple and straightforward, however busy_list may need |
830 | a bit of explaining. Normally we don't care too much about request ordering, | 830 | a bit of explaining. Normally we don't care too much about request ordering, |
831 | but in the event of any barrier requests in the tag queue we need to ensure | 831 | but in the event of any barrier requests in the tag queue we need to ensure |
832 | that requests are restarted in the order they were queued. This may happen | 832 | that requests are restarted in the order they were queued. This may happen |
833 | if the driver needs to use blk_queue_invalidate_tags(). | 833 | if the driver needs to use blk_queue_invalidate_tags(). |
834 | 834 | ||
835 | Tagging also defines a new request flag, REQ_QUEUED. This is set whenever | 835 | Tagging also defines a new request flag, REQ_QUEUED. This is set whenever |
836 | a request is currently tagged. You should not test this flag directly; | 836 | a request is currently tagged. You should not test this flag directly; |
837 | blk_rq_tagged(rq) is the portable way to check it. | 837 | blk_rq_tagged(rq) is the portable way to check it. |
838 | 838 | ||
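The bookkeeping in 3.2.5.2 can be illustrated with a small userspace sketch. This is not the kernel implementation: the names tag_state, tag_init, tag_alloc and tag_free, the MAX_TAGS limit, and the convention that a set bit marks a tag as in use are all assumptions made for the example, and busy_list is left out.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_TAGS 64			/* illustrative limit, not a kernel value */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((MAX_TAGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for struct blk_queue_tag (busy_list omitted). */
struct tag_state {
	void *tag_index[MAX_TAGS];		/* tag -> owner (a request in the kernel) */
	unsigned long tag_map[MAP_WORDS];	/* sketch assumption: set bit = tag in use */
	int busy;				/* current queue depth */
	int max_depth;				/* max queue depth */
};

static void tag_init(struct tag_state *t, int max_depth)
{
	assert(max_depth > 0 && max_depth <= MAX_TAGS);
	memset(t, 0, sizeof(*t));
	t->max_depth = max_depth;
}

/* Returns the allocated tag, or -1 if the queue is already at max depth. */
static int tag_alloc(struct tag_state *t, void *owner)
{
	int tag;

	if (t->busy >= t->max_depth)
		return -1;
	for (tag = 0; tag < t->max_depth; tag++) {
		unsigned long bit = 1UL << (tag % BITS_PER_WORD);
		unsigned long *word = &t->tag_map[tag / BITS_PER_WORD];

		if (!(*word & bit)) {
			*word |= bit;
			t->tag_index[tag] = owner;
			t->busy++;
			return tag;
		}
	}
	return -1;
}

/* Releases a tag that was previously handed out by tag_alloc(). */
static void tag_free(struct tag_state *t, int tag)
{
	unsigned long bit;

	assert(tag >= 0 && tag < t->max_depth);
	bit = 1UL << (tag % BITS_PER_WORD);
	assert(t->tag_map[tag / BITS_PER_WORD] & bit);
	t->tag_map[tag / BITS_PER_WORD] &= ~bit;
	t->tag_index[tag] = NULL;
	t->busy--;
}
```

In this sketch, tag_alloc() returning -1 corresponds to the case where blk_queue_tag_queue(q) would return 0, i.e. the queue is at its maximum depth.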
839 | 3.3 I/O Submission | 839 | 3.3 I/O Submission |
840 | 840 | ||
841 | The routine submit_bio() is used to submit a single io. Higher level i/o | 841 | The routine submit_bio() is used to submit a single io. Higher level i/o |
842 | routines make use of this: | 842 | routines make use of this: |
843 | 843 | ||
844 | (a) Buffered i/o: | 844 | (a) Buffered i/o: |
845 | The routine submit_bh() invokes submit_bio() on a bio corresponding to the | 845 | The routine submit_bh() invokes submit_bio() on a bio corresponding to the |
846 | bh, allocating the bio if required. ll_rw_block() uses submit_bh() as before. | 846 | bh, allocating the bio if required. ll_rw_block() uses submit_bh() as before. |
847 | 847 | ||
848 | (b) Kiobuf i/o (for raw/direct i/o): | 848 | (b) Kiobuf i/o (for raw/direct i/o): |
849 | The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and | 849 | The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and |
850 | maps the array to one or more multi-page bios, issuing submit_bio() to | 850 | maps the array to one or more multi-page bios, issuing submit_bio() to |
851 | perform the i/o on each of these. | 851 | perform the i/o on each of these. |
852 | 852 | ||
853 | The embedded bh array in the kiobuf structure has been removed and no | 853 | The embedded bh array in the kiobuf structure has been removed and no |
854 | preallocation of bios is done for kiobufs. [The intent is to remove the | 854 | preallocation of bios is done for kiobufs. [The intent is to remove the |
855 | blocks array as well, but it's currently in there to kludge around direct i/o.] | 855 | blocks array as well, but it's currently in there to kludge around direct i/o.] |
856 | Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc. | 856 | Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc. |
857 | 857 | ||
858 | Todo/Observation: | 858 | Todo/Observation: |
859 | 859 | ||
860 | A single kiobuf structure is assumed to correspond to a contiguous range | 860 | A single kiobuf structure is assumed to correspond to a contiguous range |
861 | of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec. | 861 | of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec. |
862 | So right now it wouldn't work for direct i/o on non-contiguous blocks. | 862 | So right now it wouldn't work for direct i/o on non-contiguous blocks. |
863 | This is to be resolved. The eventual direction is to replace kiobuf | 863 | This is to be resolved. The eventual direction is to replace kiobuf |
864 | by kvec's. | 864 | by kvec's. |
865 | 865 | ||
866 | Badari Pulavarty has a patch to implement direct i/o correctly using | 866 | Badari Pulavarty has a patch to implement direct i/o correctly using |
867 | bio and kvec. | 867 | bio and kvec. |
868 | 868 | ||
869 | 869 | ||
870 | (c) Page i/o: | 870 | (c) Page i/o: |
871 | Todo/Under discussion: | 871 | Todo/Under discussion: |
872 | 872 | ||
873 | Andrew Morton's multi-page bio patches attempt to issue multi-page | 873 | Andrew Morton's multi-page bio patches attempt to issue multi-page |
874 | writeouts (and reads) from the page cache, by directly building up | 874 | writeouts (and reads) from the page cache, by directly building up |
875 | large bios for submission completely bypassing the usage of buffer | 875 | large bios for submission completely bypassing the usage of buffer |
876 | heads. This work is still in progress. | 876 | heads. This work is still in progress. |
877 | 877 | ||
878 | Christoph Hellwig had some code that uses bios for page-io (rather than | 878 | Christoph Hellwig had some code that uses bios for page-io (rather than |
879 | bh). This isn't included in bio as yet. Christoph was also working on a | 879 | bh). This isn't included in bio as yet. Christoph was also working on a |
880 | design for representing virtual/real extents as an entity and modifying | 880 | design for representing virtual/real extents as an entity and modifying |
881 | some of the address space ops interfaces to utilize this abstraction rather | 881 | some of the address space ops interfaces to utilize this abstraction rather |
882 | than buffer_heads. (This is somewhat along the lines of the SGI XFS pagebuf | 882 | than buffer_heads. (This is somewhat along the lines of the SGI XFS pagebuf |
883 | abstraction, but intended to be as lightweight as possible). | 883 | abstraction, but intended to be as lightweight as possible). |
884 | 884 | ||
885 | (d) Direct access i/o: | 885 | (d) Direct access i/o: |
886 | Direct access requests that do not contain bios would be submitted differently | 886 | Direct access requests that do not contain bios would be submitted differently |
887 | as discussed earlier in section 1.3. | 887 | as discussed earlier in section 1.3. |
888 | 888 | ||
889 | Aside: | 889 | Aside: |
890 | 890 | ||
891 | Kvec i/o: | 891 | Kvec i/o: |
892 | 892 | ||
893 | Ben LaHaise's aio code uses a slightly different structure instead | 893 | Ben LaHaise's aio code uses a slightly different structure instead |
894 | of kiobufs, called a kvec_cb. This contains an array of <page, offset, len> | 894 | of kiobufs, called a kvec_cb. This contains an array of <page, offset, len> |
895 | tuples (very much like the networking code), together with a callback function | 895 | tuples (very much like the networking code), together with a callback function |
896 | and data pointer. This is embedded into a brw_cb structure when passed | 896 | and data pointer. This is embedded into a brw_cb structure when passed |
897 | to brw_kvec_async(). | 897 | to brw_kvec_async(). |
898 | 898 | ||
899 | Now it should be possible to directly map these kvecs to a bio. Just as while | 899 | Now it should be possible to directly map these kvecs to a bio. Just as while |
900 | cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec | 900 | cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec |
901 | array pointer to point to the veclet array in kvecs. | 901 | array pointer to point to the veclet array in kvecs. |
902 | 902 | ||
903 | TBD: In order for this to work, some changes are needed in the way multi-page | 903 | TBD: In order for this to work, some changes are needed in the way multi-page |
904 | bios are handled today. The values of the tuples in such a vector passed in | 904 | bios are handled today. The values of the tuples in such a vector passed in |
905 | from higher level code should not be modified by the block layer in the course | 905 | from higher level code should not be modified by the block layer in the course |
906 | of its request processing, since that would make it hard for the higher layer | 906 | of its request processing, since that would make it hard for the higher layer |
907 | to continue to use the vector descriptor (kvec) after i/o completes. Instead, | 907 | to continue to use the vector descriptor (kvec) after i/o completes. Instead, |
908 | all such transient state should be maintained in the request structure | 908 | all such transient state should be maintained in the request structure |
909 | and passed on in some way to the endio completion routine. | 909 | and passed on in some way to the endio completion routine. |
910 | 910 | ||
911 | 911 | ||
912 | 4. The I/O scheduler | 912 | 4. The I/O scheduler |
913 | The I/O scheduler, a.k.a. the elevator, is implemented in two layers: a generic | 913 | The I/O scheduler, a.k.a. the elevator, is implemented in two layers: a generic |
914 | dispatch queue and specific I/O schedulers. Unless stated otherwise, elevator is | 914 | dispatch queue and specific I/O schedulers. Unless stated otherwise, elevator is |
915 | used to refer to both parts and I/O scheduler to the specific I/O schedulers. | 915 | used to refer to both parts and I/O scheduler to the specific I/O schedulers. |
916 | 916 | ||
917 | The block layer implements the generic dispatch queue in ll_rw_blk.c and elevator.c. | 917 | The block layer implements the generic dispatch queue in ll_rw_blk.c and elevator.c. |
918 | The generic dispatch queue is responsible for properly ordering barrier | 918 | The generic dispatch queue is responsible for properly ordering barrier |
919 | requests, requeueing, handling non-fs requests and all other subtleties. | 919 | requests, requeueing, handling non-fs requests and all other subtleties. |
920 | 920 | ||
921 | Specific I/O schedulers are responsible for ordering normal filesystem | 921 | Specific I/O schedulers are responsible for ordering normal filesystem |
922 | requests. They can also choose to delay certain requests to improve | 922 | requests. They can also choose to delay certain requests to improve |
923 | throughput or for some other purpose. As the plural form indicates, there are | 923 | throughput or for some other purpose. As the plural form indicates, there are |
924 | multiple I/O schedulers. They can be built as modules but at least one should | 924 | multiple I/O schedulers. They can be built as modules but at least one should |
925 | be built inside the kernel. Each queue can choose a different one and can also | 925 | be built inside the kernel. Each queue can choose a different one and can also |
926 | change to another one dynamically. | 926 | change to another one dynamically. |
927 | 927 | ||
928 | A block layer call to the i/o scheduler follows the convention elv_xxx(). This | 928 | A block layer call to the i/o scheduler follows the convention elv_xxx(). This |
929 | calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh, | 929 | calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh, |
930 | xxx and xxx might not match exactly, but use your imagination. If an elevator | 930 | xxx and xxx might not match exactly, but use your imagination. If an elevator |
931 | doesn't implement a function, the switch does nothing or does some minimal | 931 | doesn't implement a function, the switch does nothing or does some minimal |
932 | housekeeping work. | 932 | housekeeping work. |
933 | 933 | ||
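The elv_xxx() to elevator_xxx_fn convention can be pictured as a table of optional function pointers. The sketch below is a userspace illustration under that assumption only; the sketch_* types, the wrapper names and the noop_like scheduler are hypothetical and do not match the real signatures in elevator.c.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_request { int id; };
struct sketch_queue;

/* Hypothetical elevator switch: a table of callbacks, some left unset. */
struct sketch_elevator_ops {
	void (*elevator_add_req_fn)(struct sketch_queue *, struct sketch_request *);
	void (*elevator_completed_req_fn)(struct sketch_queue *, struct sketch_request *);
};

struct sketch_queue {
	const struct sketch_elevator_ops *ops;
	int queued;		/* requests currently held by the scheduler */
};

/* elv_xxx()-style wrapper for a hook this sketch treats as mandatory. */
static void sketch_elv_add_request(struct sketch_queue *q,
				   struct sketch_request *rq)
{
	assert(q->ops && q->ops->elevator_add_req_fn);
	q->ops->elevator_add_req_fn(q, rq);
}

/* elv_xxx()-style wrapper for an optional hook: do nothing if unset. */
static void sketch_elv_completed_request(struct sketch_queue *q,
					 struct sketch_request *rq)
{
	if (q->ops->elevator_completed_req_fn)
		q->ops->elevator_completed_req_fn(q, rq);
}

/* A trivial scheduler that only counts what it is given. */
static void noop_like_add(struct sketch_queue *q, struct sketch_request *rq)
{
	(void)rq;
	q->queued++;
}

static const struct sketch_elevator_ops noop_like_ops = {
	.elevator_add_req_fn = noop_like_add,
	/* elevator_completed_req_fn deliberately left NULL */
};
```

Keeping the NULL check in the wrapper rather than in every caller is what lets a scheduler leave out the hooks it does not care about.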
934 | 4.1. I/O scheduler API | 934 | 4.1. I/O scheduler API |
935 | 935 | ||
936 | The functions an elevator may implement are: (* are mandatory) | 936 | The functions an elevator may implement are: (* are mandatory) |
937 | elevator_merge_fn called to query requests for merge with a bio | 937 | elevator_merge_fn called to query requests for merge with a bio |
938 | 938 | ||
939 | elevator_merge_req_fn called when two requests get merged. The one | 939 | elevator_merge_req_fn called when two requests get merged. The one |
940 | which gets merged into the other one will | 940 | which gets merged into the other one will |
941 | never be seen by the I/O scheduler again. IOW, after | 941 | never be seen by the I/O scheduler again. IOW, after |
942 | being merged, the request is gone. | 942 | being merged, the request is gone. |
943 | 943 | ||
944 | elevator_merged_fn called when a request in the scheduler has been | 944 | elevator_merged_fn called when a request in the scheduler has been |
945 | involved in a merge. It is used in the deadline | 945 | involved in a merge. It is used in the deadline |
946 | scheduler for example, to reposition the request | 946 | scheduler for example, to reposition the request |
947 | if its sorting order has changed. | 947 | if its sorting order has changed. |
948 | 948 | ||
949 | elevator_dispatch_fn fills the dispatch queue with ready requests. | 949 | elevator_dispatch_fn fills the dispatch queue with ready requests. |
950 | I/O schedulers are free to postpone requests by | 950 | I/O schedulers are free to postpone requests by |
951 | not filling the dispatch queue unless @force | 951 | not filling the dispatch queue unless @force |
952 | is non-zero. Once dispatched, I/O schedulers | 952 | is non-zero. Once dispatched, I/O schedulers |
953 | are not allowed to manipulate the requests - | 953 | are not allowed to manipulate the requests - |
954 | they belong to the generic dispatch queue. | 954 | they belong to the generic dispatch queue. |
955 | 955 | ||
956 | elevator_add_req_fn called to add a new request into the scheduler | 956 | elevator_add_req_fn called to add a new request into the scheduler |
957 | 957 | ||
958 | elevator_queue_empty_fn returns true if the merge queue is empty. | 958 | elevator_queue_empty_fn returns true if the merge queue is empty. |
959 | Drivers shouldn't use this, but rather check | 959 | Drivers shouldn't use this, but rather check |
960 | if elv_next_request is NULL (without losing the | 960 | if elv_next_request is NULL (without losing the |
961 | request if one exists!) | 961 | request if one exists!) |
962 | 962 | ||
963 | elevator_former_req_fn | 963 | elevator_former_req_fn |
964 | elevator_latter_req_fn These return the request before or after the | 964 | elevator_latter_req_fn These return the request before or after the |
965 | one specified in disk sort order. Used by the | 965 | one specified in disk sort order. Used by the |
966 | block layer to find merge possibilities. | 966 | block layer to find merge possibilities. |
967 | 967 | ||
968 | elevator_completed_req_fn called when a request is completed. | 968 | elevator_completed_req_fn called when a request is completed. |
969 | 969 | ||
970 | elevator_may_queue_fn returns true if the scheduler wants to allow the | 970 | elevator_may_queue_fn returns true if the scheduler wants to allow the |
971 | current context to queue a new request even if | 971 | current context to queue a new request even if |
972 | it is over the queue limit. This must be used | 972 | it is over the queue limit. This must be used |
973 | very carefully!! | 973 | very carefully!! |
974 | 974 | ||
975 | elevator_set_req_fn | 975 | elevator_set_req_fn |
976 | elevator_put_req_fn Must be used to allocate and free any elevator | 976 | elevator_put_req_fn Must be used to allocate and free any elevator |
977 | specific storage for a request. | 977 | specific storage for a request. |
978 | 978 | ||
979 | elevator_activate_req_fn Called when device driver first sees a request. | 979 | elevator_activate_req_fn Called when device driver first sees a request. |
980 | I/O schedulers can use this callback to | 980 | I/O schedulers can use this callback to |
981 | determine when actual execution of a request | 981 | determine when actual execution of a request |
982 | starts. | 982 | starts. |
983 | elevator_deactivate_req_fn Called when device driver decides to delay | 983 | elevator_deactivate_req_fn Called when device driver decides to delay |
984 | a request by requeueing it. | 984 | a request by requeueing it. |
985 | 985 | ||
986 | elevator_init_fn | 986 | elevator_init_fn |
987 | elevator_exit_fn Allocate and free any elevator specific storage | 987 | elevator_exit_fn Allocate and free any elevator specific storage |
988 | for a queue. | 988 | for a queue. |
989 | 989 | ||
990 | 4.2 Request flows seen by I/O schedulers | 990 | 4.2 Request flows seen by I/O schedulers |
991 | All requests seen by I/O schedulers strictly follow one of the following three | 991 | All requests seen by I/O schedulers strictly follow one of the following three |
992 | flows. | 992 | flows. |
993 | 993 | ||
994 | set_req_fn -> | 994 | set_req_fn -> |
995 | 995 | ||
996 | i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn -> | 996 | i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn -> |
997 | (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn | 997 | (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn |
998 | ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn | 998 | ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn |
999 | iii. [none] | 999 | iii. [none] |
1000 | 1000 | ||
1001 | -> put_req_fn | 1001 | -> put_req_fn |
1002 | 1002 | ||
1003 | 4.3 I/O scheduler implementation | 1003 | 4.3 I/O scheduler implementation |
1004 | The generic i/o scheduler algorithm attempts to sort/merge/batch requests for | 1004 | The generic i/o scheduler algorithm attempts to sort/merge/batch requests for |
1005 | optimal disk scan and request servicing performance (based on generic | 1005 | optimal disk scan and request servicing performance (based on generic |
1006 | principles and device capabilities), optimized for: | 1006 | principles and device capabilities), optimized for: |
1007 | i. improved throughput | 1007 | i. improved throughput |
1008 | ii. improved latency | 1008 | ii. improved latency |
1009 | iii. better utilization of h/w & CPU time | 1009 | iii. better utilization of h/w & CPU time |
1010 | 1010 | ||
1011 | Characteristics: | 1011 | Characteristics: |
1012 | 1012 | ||
1013 | i. Binary tree | 1013 | i. Binary tree |
1014 | AS and deadline i/o schedulers use red black binary trees for disk position | 1014 | AS and deadline i/o schedulers use red black binary trees for disk position |
1015 | sorting and searching, and a fifo linked list for time-based searching. This | 1015 | sorting and searching, and a fifo linked list for time-based searching. This |
1016 | gives good scalability and good availability of information. Requests are | 1016 | gives good scalability and good availability of information. Requests are |
1017 | almost always dispatched in disk sort order, so a cache is kept of the next | 1017 | almost always dispatched in disk sort order, so a cache is kept of the next |
1018 | request in sort order to prevent binary tree lookups. | 1018 | request in sort order to prevent binary tree lookups. |
1019 | 1019 | ||
1020 | This arrangement is not a generic block layer characteristic however, so | 1020 | This arrangement is not a generic block layer characteristic however, so |
1021 | elevators may implement queues as they please. | 1021 | elevators may implement queues as they please. |
1022 | 1022 | ||
1023 | ii. Merge hash | 1023 | ii. Merge hash |
1024 | AS and deadline use a hash table indexed by the last sector of a request. This | 1024 | AS and deadline use a hash table indexed by the last sector of a request. This |
1025 | enables merging code to quickly look up "back merge" candidates, even when | 1025 | enables merging code to quickly look up "back merge" candidates, even when |
1026 | multiple I/O streams are being performed at once on one disk. | 1026 | multiple I/O streams are being performed at once on one disk. |
1027 | 1027 | ||
1028 | "Front merges", a new request being merged at the front of an existing request, | 1028 | "Front merges", a new request being merged at the front of an existing request, |
1029 | are far less common than "back merges" due to the nature of most I/O patterns. | 1029 | are far less common than "back merges" due to the nature of most I/O patterns. |
1030 | Front merges are handled by the binary trees in AS and deadline schedulers. | 1030 | Front merges are handled by the binary trees in AS and deadline schedulers. |
1031 | 1031 | ||
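A back-merge lookup of the kind described under "Merge hash" might look roughly like the following userspace sketch. The bucket count, the sketch_rq type and the helper names are invented for illustration, and the hash key used here is the sector just past the end of a request (the sector at which a mergeable new I/O would start), which is an assumption about what "last sector" means for lookup purposes.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_BUCKETS 64		/* illustrative size, not a kernel value */

struct sketch_rq {
	unsigned long long sector;	/* first sector of the request */
	unsigned int nr_sectors;	/* length in sectors */
	struct sketch_rq *hash_next;	/* chain within one bucket */
};

static struct sketch_rq *merge_hash[HASH_BUCKETS];

/* Sector immediately after the request; used as the hash key. */
static unsigned long long rq_end_sector(const struct sketch_rq *rq)
{
	return rq->sector + rq->nr_sectors;
}

static unsigned int hash_bucket(unsigned long long sector)
{
	return (unsigned int)(sector % HASH_BUCKETS);
}

/* Insert a queued request into the hash, keyed by where it ends. */
static void merge_hash_add(struct sketch_rq *rq)
{
	unsigned int b;

	assert(rq->nr_sectors > 0);
	b = hash_bucket(rq_end_sector(rq));
	rq->hash_next = merge_hash[b];
	merge_hash[b] = rq;
}

/* Find a request that ends exactly where a new I/O would start. */
static struct sketch_rq *find_back_merge(unsigned long long start_sector)
{
	struct sketch_rq *rq;

	for (rq = merge_hash[hash_bucket(start_sector)]; rq; rq = rq->hash_next)
		if (rq_end_sector(rq) == start_sector)
			return rq;
	return NULL;
}
```

Because the key depends only on the end of each request, interleaved streams on the same disk land in unrelated buckets and do not slow each other's lookups much.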
1032 | iii. Plugging the queue to batch requests in anticipation of opportunities for | 1032 | iii. Plugging the queue to batch requests in anticipation of opportunities for |
1033 | merge/sort optimizations | 1033 | merge/sort optimizations |
1034 | 1034 | ||
1035 | This is just the same as in 2.4 so far, though per-device unplugging | 1035 | This is just the same as in 2.4 so far, though per-device unplugging |
1036 | support is anticipated for 2.5. Also with a priority-based i/o scheduler, | 1036 | support is anticipated for 2.5. Also with a priority-based i/o scheduler, |
1037 | such decisions could be based on request priorities. | 1037 | such decisions could be based on request priorities. |
1038 | 1038 | ||
1039 | Plugging is an approach that the current i/o scheduling algorithm resorts to so | 1039 | Plugging is an approach that the current i/o scheduling algorithm resorts to so |
1040 | that it collects up enough requests in the queue to be able to take | 1040 | that it collects up enough requests in the queue to be able to take |
1041 | advantage of the sorting/merging logic in the elevator. If the | 1041 | advantage of the sorting/merging logic in the elevator. If the |
1042 | queue is empty when a request comes in, then it plugs the request queue | 1042 | queue is empty when a request comes in, then it plugs the request queue |
1043 | (sort of like plugging the bottom of a vessel to get fluid to build up) | 1043 | (sort of like plugging the bottom of a vessel to get fluid to build up) |
1044 | till it fills up with a few more requests, before starting to service | 1044 | till it fills up with a few more requests, before starting to service |
1045 | the requests. This provides an opportunity to merge/sort the requests before | 1045 | the requests. This provides an opportunity to merge/sort the requests before |
1046 | passing them down to the device. There are various conditions when the queue is | 1046 | passing them down to the device. There are various conditions when the queue is |
1047 | unplugged (to open up the flow again), either through a scheduled task or | 1047 | unplugged (to open up the flow again), either through a scheduled task or |
1048 | on demand. For example, wait_on_buffer sets the unplugging going | 1048 | on demand. For example, wait_on_buffer sets the unplugging going |
1049 | (by running tq_disk) so the read gets satisfied soon. So in the read case, | 1049 | (by running tq_disk) so the read gets satisfied soon. So in the read case, |
1050 | the queue gets explicitly unplugged as part of waiting for completion; | 1050 | the queue gets explicitly unplugged as part of waiting for completion; |
1051 | in fact all queues get unplugged as a side-effect. | 1051 | in fact all queues get unplugged as a side-effect. |
1052 | 1052 | ||
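The plug/unplug behaviour described above can be modelled in a few lines of userspace C. This is only a toy model: the sketch_plug_queue type, the function names and the UNPLUG_THRESHOLD value are assumptions for the example, and the real unplug conditions (tq_disk, waiters, timers) are reduced to an explicit call.

```c
#include <assert.h>
#include <stdbool.h>

#define UNPLUG_THRESHOLD 4	/* illustrative; the text only says "a few more" */

struct sketch_plug_queue {
	int pending;	/* requests building up in the software queue */
	int issued;	/* requests already passed down to the device */
	bool plugged;
};

/* Open up the flow: hand everything that has built up to the device. */
static void sketch_unplug(struct sketch_plug_queue *q)
{
	assert(q->pending >= 0);
	q->plugged = false;
	q->issued += q->pending;
	q->pending = 0;
}

/* Queue one request, plugging first if the queue was empty. */
static void sketch_add_request(struct sketch_plug_queue *q)
{
	if (q->pending == 0)
		q->plugged = true;	/* empty queue: plug so requests can build up */
	q->pending++;
	if (q->pending >= UNPLUG_THRESHOLD)
		sketch_unplug(q);	/* enough built up to sort/merge; start servicing */
}
```

A caller that needs its data right away, like the wait_on_buffer case mentioned above, would correspond to calling sketch_unplug() directly instead of waiting for the threshold.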
1053 | Aside: | 1053 | Aside: |
1054 | This is kind of controversial territory, as it's not clear if plugging is | 1054 | This is kind of controversial territory, as it's not clear if plugging is |
1055 | always the right thing to do. Devices typically have their own queues, | 1055 | always the right thing to do. Devices typically have their own queues, |
1056 | and allowing a big queue to build up in software, while letting the device be | 1056 | and allowing a big queue to build up in software, while letting the device be |
1057 | idle for a while may not always make sense. The trick is to handle the fine | 1057 | idle for a while may not always make sense. The trick is to handle the fine |
1058 | balance between when to plug and when to open up. Also now that we have | 1058 | balance between when to plug and when to open up. Also now that we have |
1059 | multi-page bios being queued in one shot, we may not need to wait to merge | 1059 | multi-page bios being queued in one shot, we may not need to wait to merge |
1060 | a big request from the broken up pieces coming by. | 1060 | a big request from the broken up pieces coming by. |
1061 | 1061 | ||
1062 | Per-queue granularity unplugging (still a Todo) may help reduce some of the | 1062 | Per-queue granularity unplugging (still a Todo) may help reduce some of the |
1063 | concerns with just a single tq_disk flush approach. Something like | 1063 | concerns with just a single tq_disk flush approach. Something like |
1064 | blk_kick_queue() to unplug a specific queue (right away ?) | 1064 | blk_kick_queue() to unplug a specific queue (right away ?) |
1065 | or optionally, all queues, is in the plan. | 1065 | or optionally, all queues, is in the plan. |
1066 | 1066 | ||
1067 | 4.4 I/O contexts | 1067 | 4.4 I/O contexts |
1068 | I/O contexts provide a dynamically allocated per process data area. They may | 1068 | I/O contexts provide a dynamically allocated per process data area. They may |
1069 | be used in I/O schedulers, and in the block layer (they could be used for I/O | 1069 | be used in I/O schedulers, and in the block layer (they could be used for I/O |
1070 | statistics or priorities, for example). See *io_context in block/ll_rw_blk.c, and as-iosched.c | 1070 | statistics or priorities, for example). See *io_context in block/ll_rw_blk.c, and as-iosched.c |
1071 | for an example of usage in an i/o scheduler. | 1071 | for an example of usage in an i/o scheduler. |
1072 | 1072 | ||
1073 | 1073 | ||
1074 | 5. Scalability related changes | 1074 | 5. Scalability related changes |
1075 | 1075 | ||
1076 | 5.1 Granular Locking: io_request_lock replaced by a per-queue lock | 1076 | 5.1 Granular Locking: io_request_lock replaced by a per-queue lock |
1077 | 1077 | ||
1078 | The global io_request_lock has been removed as of 2.5, to avoid | 1078 | The global io_request_lock has been removed as of 2.5, to avoid |
1079 | the scalability bottleneck it was causing, and has been replaced by more | 1079 | the scalability bottleneck it was causing, and has been replaced by more |
1080 | granular locking. The request queue structure has a pointer to the | 1080 | granular locking. The request queue structure has a pointer to the |
1081 | lock to be used for that queue. As a result, locking can now be | 1081 | lock to be used for that queue. As a result, locking can now be |
1082 | per-queue, with a provision for sharing a lock across queues if | 1082 | per-queue, with a provision for sharing a lock across queues if |
1083 | necessary (e.g. the scsi layer sets the queue lock pointers to the | 1083 | necessary (e.g. the scsi layer sets the queue lock pointers to the |
1084 | corresponding adapter lock, which results in a per host locking | 1084 | corresponding adapter lock, which results in a per host locking |
1085 | granularity). The locking semantics are the same, i.e. locking is | 1085 | granularity). The locking semantics are the same, i.e. locking is |
1086 | still imposed by the block layer, grabbing the lock before | 1086 | still imposed by the block layer, grabbing the lock before |
1087 | request_fn execution, which means that lots of older drivers | 1087 | request_fn execution, which means that lots of older drivers |
1088 | should still be SMP safe. Drivers are free to drop the queue | 1088 | should still be SMP safe. Drivers are free to drop the queue |
1089 | lock themselves, if required. Drivers that explicitly used the | 1089 | lock themselves, if required. Drivers that explicitly used the |
1090 | io_request_lock for serialization need to be modified accordingly. | 1090 | io_request_lock for serialization need to be modified accordingly. |
1091 | Usually it's as easy as adding a global lock: | 1091 | Usually it's as easy as adding a global lock: |
1092 | 1092 | ||
1093 | static spinlock_t my_driver_lock = SPIN_LOCK_UNLOCKED; | 1093 | static spinlock_t my_driver_lock = SPIN_LOCK_UNLOCKED; |
1094 | 1094 | ||
1095 | and passing the address of that lock to blk_init_queue(). | 1095 | and passing the address of that lock to blk_init_queue(). |
1096 | 1096 | ||
1097 | 5.2 64 bit sector numbers (sector_t prepares for 64 bit support) | 1097 | 5.2 64 bit sector numbers (sector_t prepares for 64 bit support) |
1098 | 1098 | ||
1099 | The sector number used in the bio structure has been changed to sector_t, | 1099 | The sector number used in the bio structure has been changed to sector_t, |
1100 | which could be defined as 64 bit in preparation for 64 bit sector support. | 1100 | which could be defined as 64 bit in preparation for 64 bit sector support. |
1101 | 1101 | ||
1102 | 6. Other Changes/Implications | 1102 | 6. Other Changes/Implications |
1103 | 1103 | ||
1104 | 6.1 Partition re-mapping handled by the generic block layer | 1104 | 6.1 Partition re-mapping handled by the generic block layer |
1105 | 1105 | ||
1106 | In 2.5 some of the gendisk/partition related code has been reorganized. | 1106 | In 2.5 some of the gendisk/partition related code has been reorganized. |
1107 | Now the generic block layer performs partition-remapping early and thus | 1107 | Now the generic block layer performs partition-remapping early and thus |
1108 | provides drivers with a sector number relative to whole device, rather than | 1108 | provides drivers with a sector number relative to whole device, rather than |
1109 | having to take partition number into account in order to arrive at the true | 1109 | having to take partition number into account in order to arrive at the true |
1110 | sector number. The routine blk_partition_remap() is invoked by | 1110 | sector number. The routine blk_partition_remap() is invoked by |
1111 | generic_make_request even before invoking the queue specific make_request_fn, | 1111 | generic_make_request even before invoking the queue specific make_request_fn, |
1112 | so the i/o scheduler also gets to operate on whole disk sector numbers. This | 1112 | so the i/o scheduler also gets to operate on whole disk sector numbers. This |
1113 | should typically not require changes to block drivers; a driver simply never gets | 1113 | should typically not require changes to block drivers; a driver simply never gets |
1114 | to invoke its own partition sector offset calculations since all bios | 1114 | to invoke its own partition sector offset calculations since all bios |
1115 | sent are offset from the beginning of the device. | 1115 | sent are offset from the beginning of the device. |
1116 | 1116 | ||
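The effect of early partition remapping can be shown with a small userspace sketch. It is not blk_partition_remap() itself: the sketch_* types, the partition table layout, the convention that partno 0 means the whole device, and the bounds check are all assumptions made for the example.

```c
#include <assert.h>

typedef unsigned long long sketch_sector_t;	/* stand-in for sector_t */

struct sketch_partition {
	sketch_sector_t start_sect;	/* offset of the partition on the device */
	sketch_sector_t nr_sects;	/* size of the partition in sectors */
};

struct sketch_bio {
	int partno;			/* 0 means the whole device in this sketch */
	sketch_sector_t sector;		/* start sector, relative to partno */
	sketch_sector_t nr_sectors;	/* length of the I/O in sectors */
};

/*
 * Rewrite a partition-relative bio into a whole-device bio.
 * Returns 0 on success, -1 if the I/O does not fit in the partition.
 */
static int sketch_partition_remap(struct sketch_bio *bio,
				  const struct sketch_partition *parts,
				  int nr_parts)
{
	const struct sketch_partition *p;

	assert(parts && nr_parts > 0);
	if (bio->partno == 0)
		return 0;		/* already relative to the whole device */
	if (bio->partno < 0 || bio->partno >= nr_parts)
		return -1;
	p = &parts[bio->partno];
	if (bio->sector + bio->nr_sectors > p->nr_sects)
		return -1;
	bio->sector += p->start_sect;
	bio->partno = 0;
	return 0;
}
```

After such a remap, everything downstream (the i/o scheduler and the driver) only ever sees whole-device sector numbers, which is the point made in 6.1.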
1117 | 1117 | ||
1118 | 7. A Few Tips on Migration of older drivers | 1118 | 7. A Few Tips on Migration of older drivers |
1119 | 1119 | ||
1120 | Old-style drivers that just use CURRENT and ignore clustered requests | 1120 | Old-style drivers that just use CURRENT and ignore clustered requests |
1121 | may not need much change. The generic layer will automatically handle | 1121 | may not need much change. The generic layer will automatically handle |
1122 | clustered requests, multi-page bios, etc for the driver. | 1122 | clustered requests, multi-page bios, etc for the driver. |
1123 | 1123 | ||
1124 | For a low performance driver or hardware that is PIO driven or just doesn't | 1124 | For a low performance driver or hardware that is PIO driven or just doesn't |
1125 | support scatter-gather, changes should be minimal too. | 1125 | support scatter-gather, changes should be minimal too. |
1126 | 1126 | ||
1127 | The following are some points to keep in mind when converting old drivers | 1127 | The following are some points to keep in mind when converting old drivers |
1128 | to bio. | 1128 | to bio. |
1129 | 1129 | ||
1130 | Drivers should use elv_next_request to pick up requests and are no longer | 1130 | Drivers should use elv_next_request to pick up requests and are no longer |
1131 | supposed to handle looping directly over the request list. | 1131 | supposed to handle looping directly over the request list. |
1132 | (struct request->queue has been removed) | 1132 | (struct request->queue has been removed) |
1133 | 1133 | ||
1134 | Now end_that_request_first takes an additional number_of_sectors argument. | 1134 | Now end_that_request_first takes an additional number_of_sectors argument. |
1135 | It always used to handle just the first buffer_head in a request; now | 1135 | It always used to handle just the first buffer_head in a request; now |
1136 | it will loop and handle as many sectors (on a bio-segment granularity) | 1136 | it will loop and handle as many sectors (on a bio-segment granularity) |
1137 | as specified. | 1137 | as specified. |
1138 | 1138 | ||
1139 | Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the | 1139 | Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the |
1140 | right thing to use is bio_endio(bio, uptodate) instead. | 1140 | right thing to use is bio_endio(bio, uptodate) instead. |
1141 | 1141 | ||
1142 | If the driver is dropping the io_request_lock from its request_fn strategy, | 1142 | If the driver is dropping the io_request_lock from its request_fn strategy, |
1143 | then it just needs to replace that with q->queue_lock instead. | 1143 | then it just needs to replace that with q->queue_lock instead. |
1144 | 1144 | ||
1145 | As described in Sec 1.1, drivers can set max sector size, max segment size | 1145 | As described in Sec 1.1, drivers can set max sector size, max segment size |
1146 | etc per queue now. Drivers that used to define their own merge functions | 1146 | etc per queue now. Drivers that used to define their own merge functions |
1147 | to handle things like this can now just use the blk_queue_* functions at | 1147 | to handle things like this can now just use the blk_queue_* functions at |
1148 | blk_init_queue time. | 1148 | blk_init_queue time. |
1149 | 1149 | ||
1150 | Drivers no longer have to map a {partition, sector offset} into the | 1150 | Drivers no longer have to map a {partition, sector offset} into the |
1151 | correct absolute location; this is done by the block layer. So | 1151 | correct absolute location; this is done by the block layer. So |
1152 | where a driver previously received a request like this: | 1152 | where a driver previously received a request like this: |
1153 | 1153 | ||
1154 | rq->rq_dev = mk_kdev(3, 5); /* /dev/hda5 */ | 1154 | rq->rq_dev = mk_kdev(3, 5); /* /dev/hda5 */ |
1155 | rq->sector = 0; /* first sector on hda5 */ | 1155 | rq->sector = 0; /* first sector on hda5 */ |
1156 | 1156 | ||
1157 | it will now see | 1157 | it will now see |
1158 | 1158 | ||
1159 | rq->rq_dev = mk_kdev(3, 0); /* /dev/hda */ | 1159 | rq->rq_dev = mk_kdev(3, 0); /* /dev/hda */ |
1160 | rq->sector = 123128; /* offset from start of disk */ | 1160 | rq->sector = 123128; /* offset from start of disk */ |
1161 | 1161 | ||
1162 | As mentioned, there is no virtual mapping of a bio. For DMA, this is | 1162 | As mentioned, there is no virtual mapping of a bio. For DMA, this is |
1163 | not a problem as the driver probably never will need a virtual mapping. | 1163 | not a problem as the driver probably never will need a virtual mapping. |
1164 | Instead it needs a bus mapping (pci_map_page for a single segment or | 1164 | Instead it needs a bus mapping (pci_map_page for a single segment or |
1165 | use blk_rq_map_sg for scatter gather) to be able to ship it to the driver. For | 1165 | use blk_rq_map_sg for scatter gather) to be able to ship it to the driver. For |
1166 | PIO drivers (or drivers that need to revert to PIO transfer once in a | 1166 | PIO drivers (or drivers that need to revert to PIO transfer once in a |
1167 | while (IDE for example)), where the CPU is doing the actual data | 1167 | while (IDE for example)), where the CPU is doing the actual data |
1168 | transfer a virtual mapping is needed. If the driver supports highmem I/O, | 1168 | transfer a virtual mapping is needed. If the driver supports highmem I/O, |
1169 | (Sec 1.1, (ii) ) it needs to use __bio_kmap_atomic and bio_kmap_irq to | 1169 | (Sec 1.1, (ii) ) it needs to use __bio_kmap_atomic and bio_kmap_irq to |
1170 | temporarily map a bio into the virtual address space. | 1170 | temporarily map a bio into the virtual address space. |
1171 | 1171 | ||
1172 | 1172 | ||
1173 | 8. Prior/Related/Impacted patches | 1173 | 8. Prior/Related/Impacted patches |
1174 | 1174 | ||
1175 | 8.1. Earlier kiobuf patches (sct/axboe/chait/hch/mkp) | 1175 | 8.1. Earlier kiobuf patches (sct/axboe/chait/hch/mkp) |
1176 | - orig kiobuf & raw i/o patches (now in 2.4 tree) | 1176 | - orig kiobuf & raw i/o patches (now in 2.4 tree) |
1177 | - direct kiobuf based i/o to devices (no intermediate bh's) | 1177 | - direct kiobuf based i/o to devices (no intermediate bh's) |
1178 | - page i/o using kiobuf | 1178 | - page i/o using kiobuf |
1179 | - kiobuf splitting for lvm (mkp) | 1179 | - kiobuf splitting for lvm (mkp) |
1180 | - elevator support for kiobuf request merging (axboe) | 1180 | - elevator support for kiobuf request merging (axboe) |
1181 | 8.2. Zero-copy networking (Dave Miller) | 1181 | 8.2. Zero-copy networking (Dave Miller) |
1182 | 8.3. SGI XFS - pagebuf patches - use of kiobufs | 1182 | 8.3. SGI XFS - pagebuf patches - use of kiobufs |
1183 | 8.4. Multi-page pioent patch for bio (Christoph Hellwig) | 1183 | 8.4. Multi-page pioent patch for bio (Christoph Hellwig) |
1184 | 8.5. Direct i/o implementation (Andrea Arcangeli) since 2.4.10-pre11 | 1184 | 8.5. Direct i/o implementation (Andrea Arcangeli) since 2.4.10-pre11 |
1185 | 8.6. Async i/o implementation patch (Ben LaHaise) | 1185 | 8.6. Async i/o implementation patch (Ben LaHaise) |
1186 | 8.7. EVMS layering design (IBM EVMS team) | 1186 | 8.7. EVMS layering design (IBM EVMS team) |
1187 | 8.8. Larger page cache size patch (Ben LaHaise) and | 1187 | 8.8. Larger page cache size patch (Ben LaHaise) and |
1188 | Large page size (Daniel Phillips) | 1188 | Large page size (Daniel Phillips) |
1189 | => larger contiguous physical memory buffers | 1189 | => larger contiguous physical memory buffers |
1190 | 8.9. VM reservations patch (Ben LaHaise) | 1190 | 8.9. VM reservations patch (Ben LaHaise) |
1191 | 8.10. Write clustering patches ? (Marcelo/Quintela/Riel ?) | 1191 | 8.10. Write clustering patches ? (Marcelo/Quintela/Riel ?) |
1192 | 8.11. Block device in page cache patch (Andrea Arcangeli) - now in 2.4.10+ | 1192 | 8.11. Block device in page cache patch (Andrea Arcangeli) - now in 2.4.10+ |
1193 | 8.12. Multiple block-size transfers for faster raw i/o (Shailabh Nagar, | 1193 | 8.12. Multiple block-size transfers for faster raw i/o (Shailabh Nagar, |
1194 | Badari) | 1194 | Badari) |
1195 | 8.13. Priority based i/o scheduler - prepatches (Arjan van de Ven) | 1195 | 8.13. Priority based i/o scheduler - prepatches (Arjan van de Ven) |
1196 | 8.14. IDE Taskfile i/o patch (Andre Hedrick) | 1196 | 8.14. IDE Taskfile i/o patch (Andre Hedrick) |
1197 | 8.15. Multi-page writeout and readahead patches (Andrew Morton) | 1197 | 8.15. Multi-page writeout and readahead patches (Andrew Morton) |
1198 | 8.16. Direct i/o patches for 2.5 using kvec and bio (Badari Pulavarthy) | 1198 | 8.16. Direct i/o patches for 2.5 using kvec and bio (Badari Pulavarthy) |
1199 | 1199 | ||
1200 | 9. Other References: | 1200 | 9. Other References: |
1201 | 1201 | ||
1202 | 9.1 The Splice I/O Model - Larry McVoy (and subsequent discussions on lkml, | 1202 | 9.1 The Splice I/O Model - Larry McVoy (and subsequent discussions on lkml, |
1203 | and Linus' comments - Jan 2001) | 1203 | and Linus' comments - Jan 2001) |
1204 | 9.2 Discussions about kiobuf and bh design on lkml between sct, linus, alan | 1204 | 9.2 Discussions about kiobuf and bh design on lkml between sct, linus, alan |
1205 | et al - Feb-March 2001 (many of the initial thoughts that led to bio were | 1205 | et al - Feb-March 2001 (many of the initial thoughts that led to bio were |
1206 | brought up in this discussion thread) | 1206 | brought up in this discussion thread) |
1207 | 9.3 Discussions on mempool on lkml - Dec 2001. | 1207 | 9.3 Discussions on mempool on lkml - Dec 2001. |
1208 | 1208 | ||
1209 | 1209 |
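Editorial sketch (not part of the commit): the {partition, sector offset} remapping described in section 7 of biodoc.txt amounts to adding the partition's starting sector to the partition-relative sector. The following is a minimal userspace illustration of that arithmetic only. The names demo_partition and demo_remap_sector are hypothetical and are not kernel interfaces, and the partition start value is simply the number used in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partition descriptor, for illustration only. */
struct demo_partition {
	unsigned long start_sect;	/* first absolute sector of the partition */
	unsigned long nr_sects;		/* partition length in sectors */
};

/*
 * Translate a partition-relative sector into an absolute sector on the
 * whole disk.  Returns 0 on success, or -1 if the relative sector lies
 * outside the partition.
 */
int demo_remap_sector(const struct demo_partition *part,
		      unsigned long rel_sector,
		      unsigned long *abs_sector)
{
	assert(part != NULL && abs_sector != NULL);

	if (rel_sector >= part->nr_sects)
		return -1;

	*abs_sector = part->start_sect + rel_sector;
	return 0;
}
```

With a partition assumed to start at sector 123128, relative sector 0 maps to absolute sector 123128, which matches the "before" and "after" requests shown in the text. A driver converted to bio only ever sees the absolute value.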
Documentation/driver-model/overview.txt
1 | The Linux Kernel Device Model | 1 | The Linux Kernel Device Model |
2 | 2 | ||
3 | Patrick Mochel <mochel@digitalimplant.org> | 3 | Patrick Mochel <mochel@digitalimplant.org> |
4 | 4 | ||
5 | Drafted 26 August 2002 | 5 | Drafted 26 August 2002 |
6 | Updated 31 January 2006 | 6 | Updated 31 January 2006 |
7 | 7 | ||
8 | 8 | ||
9 | Overview | 9 | Overview |
10 | ~~~~~~~~ | 10 | ~~~~~~~~ |
11 | 11 | ||
12 | The Linux Kernel Driver Model is a unification of all the disparate driver | 12 | The Linux Kernel Driver Model is a unification of all the disparate driver |
13 | models that were previously used in the kernel. It is intended to augment the | 13 | models that were previously used in the kernel. It is intended to augment the |
14 | bus-specific drivers for bridges and devices by consolidating a set of data | 14 | bus-specific drivers for bridges and devices by consolidating a set of data |
15 | and operations into globally accessible data structures. | 15 | and operations into globally accessible data structures. |
16 | 16 | ||
17 | Traditional driver models implemented some sort of tree-like structure | 17 | Traditional driver models implemented some sort of tree-like structure |
18 | (sometimes just a list) for the devices they control. There wasn't any | 18 | (sometimes just a list) for the devices they control. There wasn't any |
19 | uniformity across the different bus types. | 19 | uniformity across the different bus types. |
20 | 20 | ||
21 | The current driver model provides a common, uniform data model for describing | 21 | The current driver model provides a common, uniform data model for describing |
22 | a bus and the devices that can appear under the bus. The unified bus | 22 | a bus and the devices that can appear under the bus. The unified bus |
23 | model includes a set of common attributes which all busses carry, and a set | 23 | model includes a set of common attributes which all busses carry, and a set |
24 | of common callbacks, such as device discovery during bus probing, bus | 24 | of common callbacks, such as device discovery during bus probing, bus |
25 | shutdown, bus power management, etc. | 25 | shutdown, bus power management, etc. |
26 | 26 | ||
27 | The common device and bridge interface reflects the goals of the modern | 27 | The common device and bridge interface reflects the goals of the modern |
28 | computer: namely the ability to do seamless device "plug and play", power | 28 | computer: namely the ability to do seamless device "plug and play", power |
29 | management, and hot plug. In particular, the model dictated by Intel and | 29 | management, and hot plug. In particular, the model dictated by Intel and |
30 | Microsoft (namely ACPI) ensures that almost every device on almost any bus | 30 | Microsoft (namely ACPI) ensures that almost every device on almost any bus |
31 | on an x86-compatible system can work within this paradigm. Of course, | 31 | on an x86-compatible system can work within this paradigm. Of course, |
32 | not every bus is able to support all such operations, although most | 32 | not every bus is able to support all such operations, although most |
33 | buses support most of those operations. | 33 | buses support most of those operations. |
34 | 34 | ||
35 | 35 | ||
36 | Downstream Access | 36 | Downstream Access |
37 | ~~~~~~~~~~~~~~~~~ | 37 | ~~~~~~~~~~~~~~~~~ |
38 | 38 | ||
39 | Common data fields have been moved out of individual bus layers into a common | 39 | Common data fields have been moved out of individual bus layers into a common |
40 | data structure. These fields must still be accessed by the bus layers, | 40 | data structure. These fields must still be accessed by the bus layers, |
41 | and sometimes by the device-specific drivers. | 41 | and sometimes by the device-specific drivers. |
42 | 42 | ||
43 | Other bus layers are encouraged to do what has been done for the PCI layer. | 43 | Other bus layers are encouraged to do what has been done for the PCI layer. |
44 | struct pci_dev now looks like this: | 44 | struct pci_dev now looks like this: |
45 | 45 | ||
46 | struct pci_dev { | 46 | struct pci_dev { |
47 | ... | 47 | ... |
48 | 48 | ||
49 | struct device dev; | 49 | struct device dev; |
50 | }; | 50 | }; |
51 | 51 | ||
52 | Note first that it is statically allocated. This means only one allocation on | 52 | Note first that it is statically allocated. This means only one allocation on |
53 | device discovery. Note also that it is at the _end_ of struct pci_dev. This is | 53 | device discovery. Note also that it is at the _end_ of struct pci_dev. This is |
54 | to make people think about what they're doing when switching between the bus | 54 | to make people think about what they're doing when switching between the bus |
55 | driver and the global driver, and to guard against mindless casts between | 55 | driver and the global driver, and to guard against mindless casts between |
56 | the two. | 56 | the two. |
57 | 57 | ||
58 | The PCI bus layer freely accesses the fields of struct device. It knows about | 58 | The PCI bus layer freely accesses the fields of struct device. It knows about |
59 | the structure of struct pci_dev, and it should know the structure of struct | 59 | the structure of struct pci_dev, and it should know the structure of struct |
60 | device. Individual PCI device drivers that have been converted the the current | 60 | device. Individual PCI device drivers that have been converted to the current |
61 | driver model generally do not and should not touch the fields of struct device, | 61 | driver model generally do not and should not touch the fields of struct device, |
62 | unless there is a strong compelling reason to do so. | 62 | unless there is a strong compelling reason to do so. |
63 | 63 | ||
64 | This abstraction prevents unnecessary pain during transitional phases. | 64 | This abstraction prevents unnecessary pain during transitional phases. |
65 | If a field is renamed or removed, then every downstream driver that touches it | 65 | If a field is renamed or removed, then every downstream driver that touches it |
66 | will break. On the other hand, if only the bus layer (and not the device | 66 | will break. On the other hand, if only the bus layer (and not the device |
67 | layer) accesses struct device, it is only that layer that needs to change. | 67 | layer) accesses struct device, it is only that layer that needs to change. |
68 | 68 | ||
69 | 69 | ||
70 | User Interface | 70 | User Interface |
71 | ~~~~~~~~~~~~~~ | 71 | ~~~~~~~~~~~~~~ |
72 | 72 | ||
73 | By virtue of having a complete hierarchical view of all the devices in the | 73 | By virtue of having a complete hierarchical view of all the devices in the |
74 | system, exporting a complete hierarchical view to userspace becomes relatively | 74 | system, exporting a complete hierarchical view to userspace becomes relatively |
75 | easy. This has been accomplished by implementing a special purpose virtual | 75 | easy. This has been accomplished by implementing a special purpose virtual |
76 | file system named sysfs. It is hence possible for the user to mount the | 76 | file system named sysfs. It is hence possible for the user to mount the |
77 | whole sysfs filesystem anywhere in userspace. | 77 | whole sysfs filesystem anywhere in userspace. |
78 | 78 | ||
79 | This can be done permanently by providing the following entry into the | 79 | This can be done permanently by providing the following entry into the |
80 | /etc/fstab (under the provision that the mount point does exist, of course): | 80 | /etc/fstab (under the provision that the mount point does exist, of course): |
81 | 81 | ||
82 | none /sys sysfs defaults 0 0 | 82 | none /sys sysfs defaults 0 0 |
83 | 83 | ||
84 | Or by hand on the command line: | 84 | Or by hand on the command line: |
85 | 85 | ||
86 | # mount -t sysfs sysfs /sys | 86 | # mount -t sysfs sysfs /sys |
87 | 87 | ||
88 | Whenever a device is inserted into the tree, a directory is created for it. | 88 | Whenever a device is inserted into the tree, a directory is created for it. |
89 | This directory may be populated at each layer of discovery - the global layer, | 89 | This directory may be populated at each layer of discovery - the global layer, |
90 | the bus layer, or the device layer. | 90 | the bus layer, or the device layer. |
91 | 91 | ||
92 | The global layer currently creates two files - 'name' and 'power'. The | 92 | The global layer currently creates two files - 'name' and 'power'. The |
93 | former only reports the name of the device. The latter reports the | 93 | former only reports the name of the device. The latter reports the |
94 | current power state of the device. It will also be used to set the current | 94 | current power state of the device. It will also be used to set the current |
95 | power state. | 95 | power state. |
96 | 96 | ||
97 | The bus layer may also create files for the devices it finds while probing the | 97 | The bus layer may also create files for the devices it finds while probing the |
98 | bus. For example, the PCI layer currently creates 'irq' and 'resource' files | 98 | bus. For example, the PCI layer currently creates 'irq' and 'resource' files |
99 | for each PCI device. | 99 | for each PCI device. |
100 | 100 | ||
101 | A device-specific driver may also export files in its directory to expose | 101 | A device-specific driver may also export files in its directory to expose |
102 | device-specific data or tunable interfaces. | 102 | device-specific data or tunable interfaces. |
103 | 103 | ||
104 | More information about the sysfs directory layout can be found in | 104 | More information about the sysfs directory layout can be found in |
105 | the other documents in this directory and in the file | 105 | the other documents in this directory and in the file |
106 | Documentation/filesystems/sysfs.txt. | 106 | Documentation/filesystems/sysfs.txt. |
107 | 107 | ||
108 | 108 |
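Editorial sketch (not part of the commit): because struct device is embedded at the end of struct pci_dev rather than at the start, a pointer to one cannot simply be cast to the other. The usual way to get from the embedded member back to the enclosing structure is pointer arithmetic with offsetof, which is what the kernel's container_of macro does. The userspace illustration below uses simplified, hypothetical stand-in types (demo_device, demo_pci_dev) and a hypothetical helper macro, not the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; fields are illustrative. */
struct demo_device {
	const char *name;
};

struct demo_pci_dev {
	unsigned int vendor;
	unsigned int devid;
	struct demo_device dev;		/* embedded (not a pointer), placed last */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_pci_dev *demo_to_pci_dev(struct demo_device *d)
{
	assert(d != NULL);
	return demo_container_of(d, struct demo_pci_dev, dev);
}
```

Since the member sits at a non-zero offset, a plain cast from struct demo_device * to struct demo_pci_dev * would yield the wrong address. Going through the helper makes the conversion explicit, which is the point the text makes about discouraging mindless casts.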
Documentation/exception.txt
1 | Kernel level exception handling in Linux 2.1.8 | 1 | Kernel level exception handling in Linux 2.1.8 |
2 | Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com> | 2 | Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com> |
3 | 3 | ||
4 | When a process runs in kernel mode, it often has to access user | 4 | When a process runs in kernel mode, it often has to access user |
5 | mode memory whose address has been passed by an untrusted program. | 5 | mode memory whose address has been passed by an untrusted program. |
6 | To protect itself the kernel has to verify this address. | 6 | To protect itself the kernel has to verify this address. |
7 | 7 | ||
8 | In older versions of Linux this was done with the | 8 | In older versions of Linux this was done with the |
9 | int verify_area(int type, const void * addr, unsigned long size) | 9 | int verify_area(int type, const void * addr, unsigned long size) |
10 | function (which has since been replaced by access_ok()). | 10 | function (which has since been replaced by access_ok()). |
11 | 11 | ||
12 | This function verified that the memory area starting at address | 12 | This function verified that the memory area starting at address |
13 | addr and of size size was accessible for the operation specified | 13 | 'addr' and of size 'size' was accessible for the operation specified |
14 | in type (read or write). To do this, verify_area had to look up the | 14 | in type (read or write). To do this, verify_area had to look up the |
15 | virtual memory area (vma) that contained the address addr. In the | 15 | virtual memory area (vma) that contained the address addr. In the |
16 | normal case (correctly working program), this test was successful. | 16 | normal case (correctly working program), this test was successful. |
17 | It only failed for a few buggy programs. In some kernel profiling | 17 | It only failed for a few buggy programs. In some kernel profiling |
18 | tests, this normally unneeded verification used up a considerable | 18 | tests, this normally unneeded verification used up a considerable |
19 | amount of time. | 19 | amount of time. |
20 | 20 | ||
21 | To overcome this situation, Linus decided to let the virtual memory | 21 | To overcome this situation, Linus decided to let the virtual memory |
22 | hardware present in every Linux-capable CPU handle this test. | 22 | hardware present in every Linux-capable CPU handle this test. |
23 | 23 | ||
24 | How does this work? | 24 | How does this work? |
25 | 25 | ||
26 | Whenever the kernel tries to access an address that is currently not | 26 | Whenever the kernel tries to access an address that is currently not |
27 | accessible, the CPU generates a page fault exception and calls the | 27 | accessible, the CPU generates a page fault exception and calls the |
28 | page fault handler | 28 | page fault handler |
29 | 29 | ||
30 | void do_page_fault(struct pt_regs *regs, unsigned long error_code) | 30 | void do_page_fault(struct pt_regs *regs, unsigned long error_code) |
31 | 31 | ||
32 | in arch/i386/mm/fault.c. The parameters on the stack are set up by | 32 | in arch/i386/mm/fault.c. The parameters on the stack are set up by |
33 | the low level assembly glue in arch/i386/kernel/entry.S. The parameter | 33 | the low level assembly glue in arch/i386/kernel/entry.S. The parameter |
34 | regs is a pointer to the saved registers on the stack, error_code | 34 | regs is a pointer to the saved registers on the stack, error_code |
35 | contains a reason code for the exception. | 35 | contains a reason code for the exception. |
36 | 36 | ||
37 | do_page_fault first obtains the inaccessible address from the CPU | 37 | do_page_fault first obtains the inaccessible address from the CPU |
38 | control register CR2. If the address is within the virtual address | 38 | control register CR2. If the address is within the virtual address |
39 | space of the process, the fault probably occurred because the page | 39 | space of the process, the fault probably occurred because the page |
40 | was not swapped in, was write protected, or something similar. However, | 40 | was not swapped in, was write protected, or something similar. However, |
41 | we are interested in the other case: the address is not valid, there | 41 | we are interested in the other case: the address is not valid, there |
42 | is no vma that contains this address. In this case, the kernel jumps | 42 | is no vma that contains this address. In this case, the kernel jumps |
43 | to the bad_area label. | 43 | to the bad_area label. |
44 | 44 | ||
45 | There it uses the address of the instruction that caused the exception | 45 | There it uses the address of the instruction that caused the exception |
46 | (i.e. regs->eip) to find an address where the execution can continue | 46 | (i.e. regs->eip) to find an address where the execution can continue |
47 | (fixup). If this search is successful, the fault handler modifies the | 47 | (fixup). If this search is successful, the fault handler modifies the |
48 | return address (again regs->eip) and returns. The execution will | 48 | return address (again regs->eip) and returns. The execution will |
49 | continue at the address in fixup. | 49 | continue at the address in fixup. |
50 | 50 | ||
51 | Where does fixup point to? | 51 | Where does fixup point to? |
52 | 52 | ||
53 | Since we jump to the contents of fixup, fixup obviously points | 53 | Since we jump to the contents of fixup, fixup obviously points |
54 | to executable code. This code is hidden inside the user access macros. | 54 | to executable code. This code is hidden inside the user access macros. |
55 | I have picked the get_user macro defined in include/asm/uaccess.h as an | 55 | I have picked the get_user macro defined in include/asm/uaccess.h as an |
56 | example. The definition is somewhat hard to follow, so let's peek at | 56 | example. The definition is somewhat hard to follow, so let's peek at |
57 | the code generated by the preprocessor and the compiler. I selected | 57 | the code generated by the preprocessor and the compiler. I selected |
58 | the get_user call in drivers/char/console.c for a detailed examination. | 58 | the get_user call in drivers/char/console.c for a detailed examination. |
59 | 59 | ||
60 | The original code in console.c line 1405: | 60 | The original code in console.c line 1405: |
61 | get_user(c, buf); | 61 | get_user(c, buf); |
62 | 62 | ||
63 | The preprocessor output (edited to become somewhat readable): | 63 | The preprocessor output (edited to become somewhat readable): |
64 | 64 | ||
65 | ( | 65 | ( |
66 | { | 66 | { |
67 | long __gu_err = - 14 , __gu_val = 0; | 67 | long __gu_err = - 14 , __gu_val = 0; |
68 | const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); | 68 | const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); |
69 | if (((((0 + current_set[0])->tss.segment) == 0x18 ) || | 69 | if (((((0 + current_set[0])->tss.segment) == 0x18 ) || |
70 | (((sizeof(*(buf))) <= 0xC0000000UL) && | 70 | (((sizeof(*(buf))) <= 0xC0000000UL) && |
71 | ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) | 71 | ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) |
72 | do { | 72 | do { |
73 | __gu_err = 0; | 73 | __gu_err = 0; |
74 | switch ((sizeof(*(buf)))) { | 74 | switch ((sizeof(*(buf)))) { |
75 | case 1: | 75 | case 1: |
76 | __asm__ __volatile__( | 76 | __asm__ __volatile__( |
77 | "1: mov" "b" " %2,%" "b" "1\n" | 77 | "1: mov" "b" " %2,%" "b" "1\n" |
78 | "2:\n" | 78 | "2:\n" |
79 | ".section .fixup,\"ax\"\n" | 79 | ".section .fixup,\"ax\"\n" |
80 | "3: movl %3,%0\n" | 80 | "3: movl %3,%0\n" |
81 | " xor" "b" " %" "b" "1,%" "b" "1\n" | 81 | " xor" "b" " %" "b" "1,%" "b" "1\n" |
82 | " jmp 2b\n" | 82 | " jmp 2b\n" |
83 | ".section __ex_table,\"a\"\n" | 83 | ".section __ex_table,\"a\"\n" |
84 | " .align 4\n" | 84 | " .align 4\n" |
85 | " .long 1b,3b\n" | 85 | " .long 1b,3b\n" |
86 | ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) | 86 | ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) |
87 | ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; | 87 | ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; |
88 | break; | 88 | break; |
89 | case 2: | 89 | case 2: |
90 | __asm__ __volatile__( | 90 | __asm__ __volatile__( |
91 | "1: mov" "w" " %2,%" "w" "1\n" | 91 | "1: mov" "w" " %2,%" "w" "1\n" |
92 | "2:\n" | 92 | "2:\n" |
93 | ".section .fixup,\"ax\"\n" | 93 | ".section .fixup,\"ax\"\n" |
94 | "3: movl %3,%0\n" | 94 | "3: movl %3,%0\n" |
95 | " xor" "w" " %" "w" "1,%" "w" "1\n" | 95 | " xor" "w" " %" "w" "1,%" "w" "1\n" |
96 | " jmp 2b\n" | 96 | " jmp 2b\n" |
97 | ".section __ex_table,\"a\"\n" | 97 | ".section __ex_table,\"a\"\n" |
98 | " .align 4\n" | 98 | " .align 4\n" |
99 | " .long 1b,3b\n" | 99 | " .long 1b,3b\n" |
100 | ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) | 100 | ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) |
101 | ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); | 101 | ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); |
102 | break; | 102 | break; |
103 | case 4: | 103 | case 4: |
104 | __asm__ __volatile__( | 104 | __asm__ __volatile__( |
105 | "1: mov" "l" " %2,%" "" "1\n" | 105 | "1: mov" "l" " %2,%" "" "1\n" |
106 | "2:\n" | 106 | "2:\n" |
107 | ".section .fixup,\"ax\"\n" | 107 | ".section .fixup,\"ax\"\n" |
108 | "3: movl %3,%0\n" | 108 | "3: movl %3,%0\n" |
109 | " xor" "l" " %" "" "1,%" "" "1\n" | 109 | " xor" "l" " %" "" "1,%" "" "1\n" |
110 | " jmp 2b\n" | 110 | " jmp 2b\n" |
111 | ".section __ex_table,\"a\"\n" | 111 | ".section __ex_table,\"a\"\n" |
112 | " .align 4\n" " .long 1b,3b\n" | 112 | " .align 4\n" " .long 1b,3b\n" |
113 | ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) | 113 | ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) |
114 | ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); | 114 | ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); |
115 | break; | 115 | break; |
116 | default: | 116 | default: |
117 | (__gu_val) = __get_user_bad(); | 117 | (__gu_val) = __get_user_bad(); |
118 | } | 118 | } |
119 | } while (0) ; | 119 | } while (0) ; |
120 | ((c)) = (__typeof__(*((buf))))__gu_val; | 120 | ((c)) = (__typeof__(*((buf))))__gu_val; |
121 | __gu_err; | 121 | __gu_err; |
122 | } | 122 | } |
123 | ); | 123 | ); |
124 | 124 | ||
125 | WOW! Black GCC/assembly magic. This is impossible to follow, so let's | 125 | WOW! Black GCC/assembly magic. This is impossible to follow, so let's |
126 | see what code gcc generates: | 126 | see what code gcc generates: |
127 | 127 | ||
128 | > xorl %edx,%edx | 128 | > xorl %edx,%edx |
129 | > movl current_set,%eax | 129 | > movl current_set,%eax |
130 | > cmpl $24,788(%eax) | 130 | > cmpl $24,788(%eax) |
131 | > je .L1424 | 131 | > je .L1424 |
132 | > cmpl $-1073741825,64(%esp) | 132 | > cmpl $-1073741825,64(%esp) |
133 | > ja .L1423 | 133 | > ja .L1423 |
134 | > .L1424: | 134 | > .L1424: |
135 | > movl %edx,%eax | 135 | > movl %edx,%eax |
136 | > movl 64(%esp),%ebx | 136 | > movl 64(%esp),%ebx |
137 | > #APP | 137 | > #APP |
138 | > 1: movb (%ebx),%dl /* this is the actual user access */ | 138 | > 1: movb (%ebx),%dl /* this is the actual user access */ |
139 | > 2: | 139 | > 2: |
140 | > .section .fixup,"ax" | 140 | > .section .fixup,"ax" |
141 | > 3: movl $-14,%eax | 141 | > 3: movl $-14,%eax |
142 | > xorb %dl,%dl | 142 | > xorb %dl,%dl |
143 | > jmp 2b | 143 | > jmp 2b |
144 | > .section __ex_table,"a" | 144 | > .section __ex_table,"a" |
145 | > .align 4 | 145 | > .align 4 |
146 | > .long 1b,3b | 146 | > .long 1b,3b |
147 | > .text | 147 | > .text |
148 | > #NO_APP | 148 | > #NO_APP |
149 | > .L1423: | 149 | > .L1423: |
150 | > movzbl %dl,%esi | 150 | > movzbl %dl,%esi |
151 | 151 | ||
152 | The optimizer does a good job and gives us something we can actually | 152 | The optimizer does a good job and gives us something we can actually |
153 | understand. Can we? The actual user access is quite obvious. Thanks | 153 | understand. Can we? The actual user access is quite obvious. Thanks |
154 | to the unified address space we can just access the address in user | 154 | to the unified address space we can just access the address in user |
155 | memory. But what does the .section stuff do????? | 155 | memory. But what does the .section stuff do????? |
156 | 156 | ||
157 | To understand this we have to look at the final kernel: | 157 | To understand this we have to look at the final kernel: |
158 | 158 | ||
159 | > objdump --section-headers vmlinux | 159 | > objdump --section-headers vmlinux |
160 | > | 160 | > |
161 | > vmlinux: file format elf32-i386 | 161 | > vmlinux: file format elf32-i386 |
162 | > | 162 | > |
163 | > Sections: | 163 | > Sections: |
164 | > Idx Name Size VMA LMA File off Algn | 164 | > Idx Name Size VMA LMA File off Algn |
165 | > 0 .text 00098f40 c0100000 c0100000 00001000 2**4 | 165 | > 0 .text 00098f40 c0100000 c0100000 00001000 2**4 |
166 | > CONTENTS, ALLOC, LOAD, READONLY, CODE | 166 | > CONTENTS, ALLOC, LOAD, READONLY, CODE |
167 | > 1 .fixup 000016bc c0198f40 c0198f40 00099f40 2**0 | 167 | > 1 .fixup 000016bc c0198f40 c0198f40 00099f40 2**0 |
168 | > CONTENTS, ALLOC, LOAD, READONLY, CODE | 168 | > CONTENTS, ALLOC, LOAD, READONLY, CODE |
169 | > 2 .rodata 0000f127 c019a5fc c019a5fc 0009b5fc 2**2 | 169 | > 2 .rodata 0000f127 c019a5fc c019a5fc 0009b5fc 2**2 |
170 | > CONTENTS, ALLOC, LOAD, READONLY, DATA | 170 | > CONTENTS, ALLOC, LOAD, READONLY, DATA |
171 | > 3 __ex_table 000015c0 c01a9724 c01a9724 000aa724 2**2 | 171 | > 3 __ex_table 000015c0 c01a9724 c01a9724 000aa724 2**2 |
172 | > CONTENTS, ALLOC, LOAD, READONLY, DATA | 172 | > CONTENTS, ALLOC, LOAD, READONLY, DATA |
173 | > 4 .data 0000ea58 c01abcf0 c01abcf0 000abcf0 2**4 | 173 | > 4 .data 0000ea58 c01abcf0 c01abcf0 000abcf0 2**4 |
174 | > CONTENTS, ALLOC, LOAD, DATA | 174 | > CONTENTS, ALLOC, LOAD, DATA |
175 | > 5 .bss 00018e21 c01ba748 c01ba748 000ba748 2**2 | 175 | > 5 .bss 00018e21 c01ba748 c01ba748 000ba748 2**2 |
176 | > ALLOC | 176 | > ALLOC |
177 | > 6 .comment 00000ec4 00000000 00000000 000ba748 2**0 | 177 | > 6 .comment 00000ec4 00000000 00000000 000ba748 2**0 |
178 | > CONTENTS, READONLY | 178 | > CONTENTS, READONLY |
179 | > 7 .note 00001068 00000ec4 00000ec4 000bb60c 2**0 | 179 | > 7 .note 00001068 00000ec4 00000ec4 000bb60c 2**0 |
180 | > CONTENTS, READONLY | 180 | > CONTENTS, READONLY |
181 | 181 | ||
182 | There are obviously 2 non standard ELF sections in the generated object | 182 | There are obviously 2 non standard ELF sections in the generated object |
183 | file. But first we want to find out what happened to our code in the | 183 | file. But first we want to find out what happened to our code in the |
184 | final kernel executable: | 184 | final kernel executable: |
185 | 185 | ||
186 | > objdump --disassemble --section=.text vmlinux | 186 | > objdump --disassemble --section=.text vmlinux |
187 | > | 187 | > |
188 | > c017e785 <do_con_write+c1> xorl %edx,%edx | 188 | > c017e785 <do_con_write+c1> xorl %edx,%edx |
189 | > c017e787 <do_con_write+c3> movl 0xc01c7bec,%eax | 189 | > c017e787 <do_con_write+c3> movl 0xc01c7bec,%eax |
190 | > c017e78c <do_con_write+c8> cmpl $0x18,0x314(%eax) | 190 | > c017e78c <do_con_write+c8> cmpl $0x18,0x314(%eax) |
191 | > c017e793 <do_con_write+cf> je c017e79f <do_con_write+db> | 191 | > c017e793 <do_con_write+cf> je c017e79f <do_con_write+db> |
192 | > c017e795 <do_con_write+d1> cmpl $0xbfffffff,0x40(%esp,1) | 192 | > c017e795 <do_con_write+d1> cmpl $0xbfffffff,0x40(%esp,1) |
193 | > c017e79d <do_con_write+d9> ja c017e7a7 <do_con_write+e3> | 193 | > c017e79d <do_con_write+d9> ja c017e7a7 <do_con_write+e3> |
194 | > c017e79f <do_con_write+db> movl %edx,%eax | 194 | > c017e79f <do_con_write+db> movl %edx,%eax |
195 | > c017e7a1 <do_con_write+dd> movl 0x40(%esp,1),%ebx | 195 | > c017e7a1 <do_con_write+dd> movl 0x40(%esp,1),%ebx |
196 | > c017e7a5 <do_con_write+e1> movb (%ebx),%dl | 196 | > c017e7a5 <do_con_write+e1> movb (%ebx),%dl |
197 | > c017e7a7 <do_con_write+e3> movzbl %dl,%esi | 197 | > c017e7a7 <do_con_write+e3> movzbl %dl,%esi |
198 | 198 | ||
199 | The whole user memory access is reduced to 10 x86 machine instructions. | 199 | The whole user memory access is reduced to 10 x86 machine instructions. |
200 | The instructions bracketed in the .section directives are no longer | 200 | The instructions bracketed in the .section directives are no longer |
201 | in the normal execution path. They are located in a different section | 201 | in the normal execution path. They are located in a different section |
202 | of the executable file: | 202 | of the executable file: |
203 | 203 | ||
204 | > objdump --disassemble --section=.fixup vmlinux | 204 | > objdump --disassemble --section=.fixup vmlinux |
205 | > | 205 | > |
206 | > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax | 206 | > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax |
207 | > c0199ffa <.fixup+10ba> xorb %dl,%dl | 207 | > c0199ffa <.fixup+10ba> xorb %dl,%dl |
208 | > c0199ffc <.fixup+10bc> jmp c017e7a7 <do_con_write+e3> | 208 | > c0199ffc <.fixup+10bc> jmp c017e7a7 <do_con_write+e3> |
209 | 209 | ||
210 | And finally: | 210 | And finally: |
211 | > objdump --full-contents --section=__ex_table vmlinux | 211 | > objdump --full-contents --section=__ex_table vmlinux |
212 | > | 212 | > |
213 | > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................ | 213 | > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................ |
214 | > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................ | 214 | > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................ |
215 | > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................ | 215 | > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................ |
216 | 216 | ||
217 | or in human readable byte order: | 217 | or in human readable byte order: |
218 | 218 | ||
219 | > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................ | 219 | > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................ |
220 | > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ | 220 | > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ |
221 | ^^^^^^^^^^^^^^^^^ | 221 | ^^^^^^^^^^^^^^^^^ |
222 | this is the interesting part! | 222 | this is the interesting part! |
223 | > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................ | 223 | > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................ |
224 | 224 | ||
225 | What happened? The assembly directives | 225 | What happened? The assembly directives |
226 | 226 | ||
227 | .section .fixup,"ax" | 227 | .section .fixup,"ax" |
228 | .section __ex_table,"a" | 228 | .section __ex_table,"a" |
229 | 229 | ||
230 | told the assembler to move the following code to the specified | 230 | told the assembler to move the following code to the specified |
231 | sections in the ELF object file. So the instructions | 231 | sections in the ELF object file. So the instructions |
232 | 3: movl $-14,%eax | 232 | 3: movl $-14,%eax |
233 | xorb %dl,%dl | 233 | xorb %dl,%dl |
234 | jmp 2b | 234 | jmp 2b |
235 | ended up in the .fixup section of the object file and the addresses | 235 | ended up in the .fixup section of the object file and the addresses |
236 | .long 1b,3b | 236 | .long 1b,3b |
237 | ended up in the __ex_table section of the object file. 1b and 3b | 237 | ended up in the __ex_table section of the object file. 1b and 3b |
238 | are local labels. The local label 1b (1b stands for next label 1 | 238 | are local labels. The local label 1b (1b stands for next label 1 |
239 | backward) is the address of the instruction that might fault, i.e. | 239 | backward) is the address of the instruction that might fault, i.e. |
240 | in our case the address of the label 1 is c017e7a5: | 240 | in our case the address of the label 1 is c017e7a5: |
241 | the original assembly code: > 1: movb (%ebx),%dl | 241 | the original assembly code: > 1: movb (%ebx),%dl |
242 | and linked in vmlinux : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl | 242 | and linked in vmlinux : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl |
243 | 243 | ||
244 | The local label 3 (backwards again) is the address of the code to handle | 244 | The local label 3 (backwards again) is the address of the code to handle |
245 | the fault, in our case the actual value is c0199ff5: | 245 | the fault, in our case the actual value is c0199ff5: |
246 | the original assembly code: > 3: movl $-14,%eax | 246 | the original assembly code: > 3: movl $-14,%eax |
247 | and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax | 247 | and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax |
248 | 248 | ||
249 | The assembly code | 249 | The assembly code |
250 | > .section __ex_table,"a" | 250 | > .section __ex_table,"a" |
251 | > .align 4 | 251 | > .align 4 |
252 | > .long 1b,3b | 252 | > .long 1b,3b |
253 | 253 | ||
254 | becomes the value pair | 254 | becomes the value pair |
255 | > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ | 255 | > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ |
256 | ^this is ^this is | 256 | ^this is ^this is |
257 | 1b 3b | 257 | 1b 3b |
258 | c017e7a5,c0199ff5 in the exception table of the kernel. | 258 | c017e7a5,c0199ff5 in the exception table of the kernel. |
259 | 259 | ||
260 | So, what actually happens if a fault from kernel mode with no suitable | 260 | So, what actually happens if a fault from kernel mode with no suitable |
261 | vma occurs? | 261 | vma occurs? |
262 | 262 | ||
263 | 1.) access to invalid address: | 263 | 1.) access to invalid address: |
264 | > c017e7a5 <do_con_write+e1> movb (%ebx),%dl | 264 | > c017e7a5 <do_con_write+e1> movb (%ebx),%dl |
265 | 2.) MMU generates exception | 265 | 2.) MMU generates exception |
266 | 3.) CPU calls do_page_fault | 266 | 3.) CPU calls do_page_fault |
267 | 4.) do_page_fault calls search_exception_table (regs->eip == c017e7a5); | 267 | 4.) do_page_fault calls search_exception_table (regs->eip == c017e7a5); |
268 | 5.) search_exception_table looks up the address c017e7a5 in the | 268 | 5.) search_exception_table looks up the address c017e7a5 in the |
269 | exception table (i.e. the contents of the ELF section __ex_table) | 269 | exception table (i.e. the contents of the ELF section __ex_table) |
270 | and returns the address of the associated fault handle code c0199ff5. | 270 | and returns the address of the associated fault handle code c0199ff5. |
271 | 6.) do_page_fault modifies its own return address to point to the fault | 271 | 6.) do_page_fault modifies its own return address to point to the fault |
272 | handle code and returns. | 272 | handle code and returns. |
273 | 7.) execution continues in the fault handling code. | 273 | 7.) execution continues in the fault handling code. |
274 | 8.) 8a) EAX becomes -EFAULT (== -14) | 274 | 8.) 8a) EAX becomes -EFAULT (== -14) |
275 | 8b) DL becomes zero (the value we "read" from user space) | 275 | 8b) DL becomes zero (the value we "read" from user space) |
276 | 8c) execution continues at local label 2 (address of the | 276 | 8c) execution continues at local label 2 (address of the |
277 | instruction immediately after the faulting user access). | 277 | instruction immediately after the faulting user access). |
278 | 278 | ||
279 | The steps 8a to 8c in a certain way emulate the faulting instruction. | 279 | The steps 8a to 8c in a certain way emulate the faulting instruction. |
280 | 280 | ||
281 | That's it, mostly. If you look at our example, you might ask why | 281 | That's it, mostly. If you look at our example, you might ask why |
282 | we set EAX to -EFAULT in the exception handler code. Well, the | 282 | we set EAX to -EFAULT in the exception handler code. Well, the |
283 | get_user macro actually returns a value: 0, if the user access was | 283 | get_user macro actually returns a value: 0, if the user access was |
284 | successful, -EFAULT on failure. Our original code did not test this | 284 | successful, -EFAULT on failure. Our original code did not test this |
285 | return value, however the inline assembly code in get_user tries to | 285 | return value, however the inline assembly code in get_user tries to |
286 | return -EFAULT. GCC selected EAX to return this value. | 286 | return -EFAULT. GCC selected EAX to return this value. |
287 | 287 | ||
288 | NOTE: | 288 | NOTE: |
289 | Due to the way that the exception table is built and needs to be ordered, | 289 | Due to the way that the exception table is built and needs to be ordered, |
290 | only use exceptions for code in the .text section. Any other section | 290 | only use exceptions for code in the .text section. Any other section |
291 | will cause the exception table to not be sorted correctly, and the | 291 | will cause the exception table to not be sorted correctly, and the |
292 | exceptions will fail. | 292 | exceptions will fail. |
293 | 293 |
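The lookup described in steps 4 and 5 of Documentation/exception.txt can be
sketched in user-space C. This is only an illustration under stated
assumptions, not the kernel's search_exception_table: the names ex_entry,
le32, decode_table and lookup_fixup are hypothetical, and the binary search
is an assumption suggested by the NOTE that the table must be ordered. The
raw bytes are the three __ex_table rows from the objdump output quoted in
the document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: the ".long 1b,3b" pair from the document. */
struct ex_entry {
    uint32_t insn;   /* address of the instruction that may fault (1b) */
    uint32_t fixup;  /* address of the fault handling code (3b) */
};

/* Raw __ex_table bytes exactly as shown by objdump --full-contents. */
static const uint8_t ex_table_raw[48] = {
    0x93, 0xc0, 0x17, 0xc0, 0xe0, 0x9f, 0x19, 0xc0,
    0x97, 0xc0, 0x17, 0xc0, 0x99, 0xc0, 0x17, 0xc0,
    0xf6, 0xc2, 0x17, 0xc0, 0xe9, 0x9f, 0x19, 0xc0,
    0xa5, 0xe7, 0x17, 0xc0, 0xf5, 0x9f, 0x19, 0xc0,
    0x08, 0x0a, 0x18, 0xc0, 0x01, 0xa0, 0x19, 0xc0,
    0x0a, 0x0a, 0x18, 0xc0, 0x04, 0xa0, 0x19, 0xc0,
};

/* Convert 4 little-endian bytes to "human readable byte order". */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode len raw bytes into entries; returns the number of entries. */
static size_t decode_table(const uint8_t *raw, size_t len, struct ex_entry *out)
{
    size_t i, n = len / 8;

    assert(len % 8 == 0);  /* each entry is two 32-bit words */
    for (i = 0; i < n; i++) {
        out[i].insn  = le32(raw + 8 * i);
        out[i].fixup = le32(raw + 8 * i + 4);
    }
    return n;
}

/* Binary search by faulting address; relies on the table being sorted
 * by insn. Returns the fixup address, or 0 if no entry matches. */
static uint32_t lookup_fixup(const struct ex_entry *tab, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].insn == addr)
            return tab[mid].fixup;
        if (tab[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

Decoding the 48 bytes gives six entries. Looking up the faulting address
c017e7a5 returns c0199ff5, the pair marked as "the interesting part" in the
document. An address with no entry returns 0, which in this sketch stands
for "no fixup available".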
Documentation/fb/fbcon.txt
1 | The Framebuffer Console | 1 | The Framebuffer Console |
2 | ======================= | 2 | ======================= |
3 | 3 | ||
4 | The framebuffer console (fbcon), as its name implies, is a text | 4 | The framebuffer console (fbcon), as its name implies, is a text |
5 | console running on top of the framebuffer device. It has the functionality of | 5 | console running on top of the framebuffer device. It has the functionality of |
6 | any standard text console driver, such as the VGA console, with the added | 6 | any standard text console driver, such as the VGA console, with the added |
7 | features that can be attributed to the graphical nature of the framebuffer. | 7 | features that can be attributed to the graphical nature of the framebuffer. |
8 | 8 | ||
9 | In the x86 architecture, the framebuffer console is optional, and | 9 | In the x86 architecture, the framebuffer console is optional, and |
10 | some even treat it as a toy. For other architectures, it is the only available | 10 | some even treat it as a toy. For other architectures, it is the only available |
11 | display device, text or graphical. | 11 | display device, text or graphical. |
12 | 12 | ||
13 | What are the features of fbcon? The framebuffer console supports | 13 | What are the features of fbcon? The framebuffer console supports |
14 | high resolutions, varying font types, display rotation, primitive multihead, | 14 | high resolutions, varying font types, display rotation, primitive multihead, |
15 | etc. Theoretically, multi-colored fonts, blending, aliasing, and any feature | 15 | etc. Theoretically, multi-colored fonts, blending, aliasing, and any feature |
16 | made available by the underlying graphics card are also possible. | 16 | made available by the underlying graphics card are also possible. |
17 | 17 | ||
18 | A. Configuration | 18 | A. Configuration |
19 | 19 | ||
20 | The framebuffer console can be enabled by using your favorite kernel | 20 | The framebuffer console can be enabled by using your favorite kernel |
21 | configuration tool. It is under Device Drivers->Graphics Support->Support for | 21 | configuration tool. It is under Device Drivers->Graphics Support->Support for |
22 | framebuffer devices->Framebuffer Console Support. Select 'y' to compile | 22 | framebuffer devices->Framebuffer Console Support. Select 'y' to compile |
23 | support statically, or 'm' for module support. The module will be fbcon. | 23 | support statically, or 'm' for module support. The module will be fbcon. |
24 | 24 | ||
25 | In order for fbcon to activate, at least one framebuffer driver is | 25 | In order for fbcon to activate, at least one framebuffer driver is |
26 | required, so choose from any of the numerous drivers available. For x86 | 26 | required, so choose from any of the numerous drivers available. For x86 |
27 | systems, they almost universally have VGA cards, so vga16fb and vesafb will | 27 | systems, they almost universally have VGA cards, so vga16fb and vesafb will |
28 | always be available. However, using a chipset-specific driver will give you | 28 | always be available. However, using a chipset-specific driver will give you |
29 | more speed and features, such as the ability to change the video mode | 29 | more speed and features, such as the ability to change the video mode |
30 | dynamically. | 30 | dynamically. |
31 | 31 | ||
32 | To display the penguin logo, choose any logo available in Logo | 32 | To display the penguin logo, choose any logo available in Logo |
33 | Configuration->Boot up logo. | 33 | Configuration->Boot up logo. |
34 | 34 | ||
35 | Also, you will need to select at least one compiled-in font, but if | 35 | Also, you will need to select at least one compiled-in font, but if |
36 | you don't do anything, the kernel configuration tool will select one for you, | 36 | you don't do anything, the kernel configuration tool will select one for you, |
37 | usually an 8x16 font. | 37 | usually an 8x16 font. |
38 | 38 | ||
39 | GOTCHA: A common bug report is enabling the framebuffer without enabling the | 39 | GOTCHA: A common bug report is enabling the framebuffer without enabling the |
40 | framebuffer console. Depending on the driver, you may get a blanked or | 40 | framebuffer console. Depending on the driver, you may get a blanked or |
41 | garbled display, but the system still boots to completion. If you are | 41 | garbled display, but the system still boots to completion. If you are |
42 | fortunate to have a driver that does not alter the graphics chip, then you | 42 | fortunate to have a driver that does not alter the graphics chip, then you |
43 | will still get a VGA console. | 43 | will still get a VGA console. |
44 | 44 | ||
45 | B. Loading | 45 | B. Loading |
46 | 46 | ||
47 | Possible scenarios: | 47 | Possible scenarios: |
48 | 48 | ||
49 | 1. Driver and fbcon are compiled statically | 49 | 1. Driver and fbcon are compiled statically |
50 | 50 | ||
51 | Usually, fbcon will automatically take over your console. The notable | 51 | Usually, fbcon will automatically take over your console. The notable |
52 | exception is vesafb. It needs to be explicitly activated with the | 52 | exception is vesafb. It needs to be explicitly activated with the |
53 | vga= boot option parameter. | 53 | vga= boot option parameter. |
54 | 54 | ||
55 | 2. Driver is compiled statically, fbcon is compiled as a module | 55 | 2. Driver is compiled statically, fbcon is compiled as a module |
56 | 56 | ||
57 | Depending on the driver, you either get a standard console, or a | 57 | Depending on the driver, you either get a standard console, or a |
58 | garbled display, as mentioned above. To get a framebuffer console, | 58 | garbled display, as mentioned above. To get a framebuffer console, |
59 | do a 'modprobe fbcon'. | 59 | do a 'modprobe fbcon'. |
60 | 60 | ||
61 | 3. Driver is compiled as a module, fbcon is compiled statically | 61 | 3. Driver is compiled as a module, fbcon is compiled statically |
62 | 62 | ||
63 | You get your standard console. Once the driver is loaded with | 63 | You get your standard console. Once the driver is loaded with |
64 | 'modprobe xxxfb', fbcon automatically takes over the console with | 64 | 'modprobe xxxfb', fbcon automatically takes over the console with |
65 | the possible exception of using the fbcon=map:n option. See below. | 65 | the possible exception of using the fbcon=map:n option. See below. |
66 | 66 | ||
67 | 4. Driver and fbcon are compiled as a module. | 67 | 4. Driver and fbcon are compiled as a module. |
68 | 68 | ||
69 | You can load them in any order. Once both are loaded, fbcon will take | 69 | You can load them in any order. Once both are loaded, fbcon will take |
70 | over the console. | 70 | over the console. |
71 | 71 | ||
72 | C. Boot options | 72 | C. Boot options |
73 | 73 | ||
74 | The framebuffer console has several, largely unknown, boot options | 74 | The framebuffer console has several, largely unknown, boot options |
75 | that can change its behavior. | 75 | that can change its behavior. |
76 | 76 | ||
77 | 1. fbcon=font:<name> | 77 | 1. fbcon=font:<name> |
78 | 78 | ||
79 | Select the initial font to use. The value 'name' can be any of the | 79 | Select the initial font to use. The value 'name' can be any of the |
80 | compiled-in fonts: VGA8x16, 7x14, 10x18, VGA8x8, MINI4x6, RomanLarge, | 80 | compiled-in fonts: VGA8x16, 7x14, 10x18, VGA8x8, MINI4x6, RomanLarge, |
81 | SUN8x16, SUN12x22, ProFont6x11, Acorn8x8, PEARL8x8. | 81 | SUN8x16, SUN12x22, ProFont6x11, Acorn8x8, PEARL8x8. |
82 | 82 | ||
83 | Note, not all drivers can handle fonts with widths not divisible by 8, | 83 | Note, not all drivers can handle fonts with widths not divisible by 8, |
84 | such as vga16fb. | 84 | such as vga16fb. |
85 | 85 | ||
86 | 2. fbcon=scrollback:<value>[k] | 86 | 2. fbcon=scrollback:<value>[k] |
87 | 87 | ||
88 | The scrollback buffer is memory that is used to preserve display | 88 | The scrollback buffer is memory that is used to preserve display |
89 | contents that have already scrolled past your view. This is accessed | 89 | contents that have already scrolled past your view. This is accessed |
90 | by using the Shift-PageUp key combination. The value 'value' is any | 90 | by using the Shift-PageUp key combination. The value 'value' is any |
91 | integer. It defaults to 32KB. The 'k' suffix is optional, and will | 91 | integer. It defaults to 32KB. The 'k' suffix is optional, and will |
92 | multiply the 'value' by 1024. | 92 | multiply the 'value' by 1024. |
93 | 93 | ||
94 | 3. fbcon=map:<0123> | 94 | 3. fbcon=map:<0123> |
95 | 95 | ||
96 | This is an interesting option. It tells which driver gets mapped to | 96 | This is an interesting option. It tells which driver gets mapped to |
97 | which console. The value '0123' is a sequence that gets repeated until | 97 | which console. The value '0123' is a sequence that gets repeated until |
98 | the total length is 64 which is the number of consoles available. In | 98 | the total length is 64 which is the number of consoles available. In |
99 | the above example, it is expanded to 012301230123... and the mapping | 99 | the above example, it is expanded to 012301230123... and the mapping |
100 | will be: | 100 | will be: |
101 | 101 | ||
102 | tty | 1 2 3 4 5 6 7 8 9 ... | 102 | tty | 1 2 3 4 5 6 7 8 9 ... |
103 | fb | 0 1 2 3 0 1 2 3 0 ... | 103 | fb | 0 1 2 3 0 1 2 3 0 ... |
104 | 104 | ||
105 | ('cat /proc/fb' should tell you what the fb numbers are) | 105 | ('cat /proc/fb' should tell you what the fb numbers are) |
106 | 106 | ||
107 | One side effect that may be useful is using a map value that exceeds | 107 | One side effect that may be useful is using a map value that exceeds |
108 | the number of loaded fb drivers. For example, if only one driver is | 108 | the number of loaded fb drivers. For example, if only one driver is |
109 | available, fb0, adding fbcon=map:1 tells fbcon not to take over the | 109 | available, fb0, adding fbcon=map:1 tells fbcon not to take over the |
110 | console. | 110 | console. |
111 | 111 | ||
112 | Later on, when you want to map the console to the framebuffer | 112 | Later on, when you want to map the console to the framebuffer |
113 | device, you can use the con2fbmap utility. | 113 | device, you can use the con2fbmap utility. |
114 | 114 | ||
115 | 4. fbcon=vc:<n1>-<n2> | 115 | 4. fbcon=vc:<n1>-<n2> |
116 | 116 | ||
117 | This option tells fbcon to take over only a range of consoles as | 117 | This option tells fbcon to take over only a range of consoles as |
118 | specified by the values 'n1' and 'n2'. The rest of the consoles | 118 | specified by the values 'n1' and 'n2'. The rest of the consoles |
119 | outside the given range will still be controlled by the standard | 119 | outside the given range will still be controlled by the standard |
120 | console driver. | 120 | console driver. |
121 | 121 | ||
122 | NOTE: For x86 machines, the standard console is the VGA console which | 122 | NOTE: For x86 machines, the standard console is the VGA console which |
123 | is typically located on the same video card. Thus, the consoles that | 123 | is typically located on the same video card. Thus, the consoles that |
124 | are controlled by the VGA console will be garbled. | 124 | are controlled by the VGA console will be garbled. |
125 | 125 | ||
126 | 5. fbcon=rotate:<n> | 126 | 5. fbcon=rotate:<n> |
127 | 127 | ||
128 | This option changes the orientation angle of the console display. The | 128 | This option changes the orientation angle of the console display. The |
129 | value 'n' accepts the following: | 129 | value 'n' accepts the following: |
130 | 130 | ||
131 | 0 - normal orientation (0 degree) | 131 | 0 - normal orientation (0 degree) |
132 | 1 - clockwise orientation (90 degrees) | 132 | 1 - clockwise orientation (90 degrees) |
133 | 2 - upside down orientation (180 degrees) | 133 | 2 - upside down orientation (180 degrees) |
134 | 3 - counterclockwise orientation (270 degrees) | 134 | 3 - counterclockwise orientation (270 degrees) |
135 | 135 | ||
136 | The angle can be changed anytime afterwards by 'echoing' the same | 136 | The angle can be changed anytime afterwards by 'echoing' the same |
137 | numbers to any one of the 2 attributes found in | 137 | numbers to any one of the 2 attributes found in |
138 | /sys/class/graphics/fbcon | 138 | /sys/class/graphics/fbcon |
139 | 139 | ||
140 | rotate - rotate the display of the active console | 140 | rotate - rotate the display of the active console |
141 | rotate_all - rotate the display of all consoles | 141 | rotate_all - rotate the display of all consoles |
142 | 142 | ||
143 | Console rotation will only become available if Console Rotation | 143 | Console rotation will only become available if Console Rotation |
144 | Support is compiled in your kernel. | 144 | Support is compiled in your kernel. |
145 | 145 | ||
146 | NOTE: This is purely console rotation. Any other applications that | 146 | NOTE: This is purely console rotation. Any other applications that |
147 | use the framebuffer will remain at their 'normal' orientation. | 147 | use the framebuffer will remain at their 'normal' orientation. |
148 | Actually, the underlying fb driver is totally ignorant of console | 148 | Actually, the underlying fb driver is totally ignorant of console |
149 | rotation. | 149 | rotation. |
150 | 150 | ||
151 | D. Attaching, Detaching and Unloading | 151 | D. Attaching, Detaching and Unloading |
152 | 152 | ||
153 | Before going into how to attach, detach and unload the framebuffer console, an | 153 | Before going into how to attach, detach and unload the framebuffer console, an |
154 | illustration of the dependencies may help. | 154 | illustration of the dependencies may help. |
155 | 155 | ||
156 | The console layer, as with most subsystems, needs a driver that interfaces with | 156 | The console layer, as with most subsystems, needs a driver that interfaces with |
157 | the hardware. Thus, in a VGA console: | 157 | the hardware. Thus, in a VGA console: |
158 | 158 | ||
159 | console ---> VGA driver ---> hardware. | 159 | console ---> VGA driver ---> hardware. |
160 | 160 | ||
161 | Assuming the VGA driver can be unloaded, one must first unbind the VGA driver | 161 | Assuming the VGA driver can be unloaded, one must first unbind the VGA driver |
162 | from the console layer before unloading the driver. The VGA driver cannot be | 162 | from the console layer before unloading the driver. The VGA driver cannot be |
163 | unloaded if it is still bound to the console layer. (See | 163 | unloaded if it is still bound to the console layer. (See |
164 | Documentation/console/console.txt for more information). | 164 | Documentation/console/console.txt for more information). |
165 | 165 | ||
166 | This is more complicated in the case of the the framebuffer console (fbcon), | 166 | This is more complicated in the case of the framebuffer console (fbcon), |
167 | because fbcon is an intermediate layer between the console and the drivers: | 167 | because fbcon is an intermediate layer between the console and the drivers: |
168 | 168 | ||
169 | console ---> fbcon ---> fbdev drivers ---> hardware | 169 | console ---> fbcon ---> fbdev drivers ---> hardware |
170 | 170 | ||
171 | An fbdev driver cannot be unloaded if it's bound to fbcon, and fbcon cannot | 171 | An fbdev driver cannot be unloaded if it's bound to fbcon, and fbcon cannot |
172 | be unloaded if it's bound to the console layer. | 172 | be unloaded if it's bound to the console layer. |
173 | 173 | ||
174 | So to unload the fbdev drivers, one must first unbind fbcon from the console, | 174 | So to unload the fbdev drivers, one must first unbind fbcon from the console, |
175 | then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from | 175 | then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from |
176 | the console layer will automatically unbind framebuffer drivers from | 176 | the console layer will automatically unbind framebuffer drivers from |
177 | fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from | 177 | fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from |
178 | fbcon. | 178 | fbcon. |
179 | 179 | ||
180 | So, how do we unbind fbcon from the console? Part of the answer is in | 180 | So, how do we unbind fbcon from the console? Part of the answer is in |
181 | Documentation/console/console.txt. To summarize: | 181 | Documentation/console/console.txt. To summarize: |
182 | 182 | ||
183 | Echo a value to the bind file that represents the framebuffer console | 183 | Echo a value to the bind file that represents the framebuffer console |
184 | driver. So assuming vtcon1 represents fbcon, then: | 184 | driver. So assuming vtcon1 represents fbcon, then: |
185 | 185 | ||
186 | echo 1 > /sys/class/vtconsole/vtcon1/bind - attach framebuffer console to | 186 | echo 1 > /sys/class/vtconsole/vtcon1/bind - attach framebuffer console to |
187 | console layer | 187 | console layer |
188 | echo 0 > /sys/class/vtconsole/vtcon1/bind - detach framebuffer console from | 188 | echo 0 > /sys/class/vtconsole/vtcon1/bind - detach framebuffer console from |
189 | console layer | 189 | console layer |
190 | 190 | ||
191 | If fbcon is detached from the console layer, your boot console driver (which is | 191 | If fbcon is detached from the console layer, your boot console driver (which is |
192 | usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will | 192 | usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will |
193 | restore VGA text mode for you. With the rest, before detaching fbcon, you | 193 | restore VGA text mode for you. With the rest, before detaching fbcon, you |
194 | must take a few additional steps to make sure that your VGA text mode is | 194 | must take a few additional steps to make sure that your VGA text mode is |
195 | restored properly. The following is one of several methods that you can use: | 195 | restored properly. The following is one of several methods that you can use: |
196 | 196 | ||
197 | 1. Download or install vbetool. This utility is included with most | 197 | 1. Download or install vbetool. This utility is included with most |
198 | distributions nowadays, and is usually part of the suspend/resume tool. | 198 | distributions nowadays, and is usually part of the suspend/resume tool. |
199 | 199 | ||
200 | 2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set | 200 | 2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set |
201 | to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers. | 201 | to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers. |
202 | 202 | ||
203 | 3. Boot into text mode and as root run: | 203 | 3. Boot into text mode and as root run: |
204 | 204 | ||
205 | vbetool vbestate save > <vga state file> | 205 | vbetool vbestate save > <vga state file> |
206 | 206 | ||
207 | The above command saves the register contents of your graphics | 207 | The above command saves the register contents of your graphics |
208 | hardware to <vga state file>. You need to do this step only once as | 208 | hardware to <vga state file>. You need to do this step only once as |
209 | the state file can be reused. | 209 | the state file can be reused. |
210 | 210 | ||
211 | 4. If fbcon is compiled as a module, load fbcon by doing: | 211 | 4. If fbcon is compiled as a module, load fbcon by doing: |
212 | 212 | ||
213 | modprobe fbcon | 213 | modprobe fbcon |
214 | 214 | ||
215 | 5. Now to detach fbcon: | 215 | 5. Now to detach fbcon: |
216 | 216 | ||
217 | vbetool vbestate restore < <vga state file> && \ | 217 | vbetool vbestate restore < <vga state file> && \ |
218 | echo 0 > /sys/class/vtconsole/vtcon1/bind | 218 | echo 0 > /sys/class/vtconsole/vtcon1/bind |
219 | 219 | ||
220 | 6. That's it, you're back to VGA mode. And if you compiled fbcon as a module, | 220 | 6. That's it, you're back to VGA mode. And if you compiled fbcon as a module, |
221 | you can unload it by 'rmmod fbcon' | 221 | you can unload it by 'rmmod fbcon' |
222 | 222 | ||
223 | 7. To reattach fbcon: | 223 | 7. To reattach fbcon: |
224 | 224 | ||
225 | echo 1 > /sys/class/vtconsole/vtcon1/bind | 225 | echo 1 > /sys/class/vtconsole/vtcon1/bind |
226 | 226 | ||
227 | 8. Once fbcon is unbound, all drivers registered to the system will also | 227 | 8. Once fbcon is unbound, all drivers registered to the system will also |
228 | become unbound. This means that fbcon and individual framebuffer drivers | 228 | become unbound. This means that fbcon and individual framebuffer drivers |
229 | can be unloaded or reloaded at will. Reloading the drivers or fbcon will | 229 | can be unloaded or reloaded at will. Reloading the drivers or fbcon will |
230 | automatically bind the console, fbcon and the drivers together. Unloading | 230 | automatically bind the console, fbcon and the drivers together. Unloading |
231 | all the drivers without unloading fbcon will make it impossible for the | 231 | all the drivers without unloading fbcon will make it impossible for the |
232 | console to bind fbcon. | 232 | console to bind fbcon. |
233 | 233 | ||
234 | Notes for vesafb users: | 234 | Notes for vesafb users: |
235 | ======================= | 235 | ======================= |
236 | 236 | ||
237 | Unfortunately, if your bootline includes a vga=xxx parameter that sets the | 237 | Unfortunately, if your bootline includes a vga=xxx parameter that sets the |
238 | hardware in graphics mode, such as when loading vesafb, vgacon will not load. | 238 | hardware in graphics mode, such as when loading vesafb, vgacon will not load. |
239 | Instead, vgacon will replace the default boot console with dummycon, and you | 239 | Instead, vgacon will replace the default boot console with dummycon, and you |
240 | won't get any display after detaching fbcon. Your machine is still alive, so | 240 | won't get any display after detaching fbcon. Your machine is still alive, so |
241 | you can reattach vesafb. However, to reattach vesafb, you need to do one of | 241 | you can reattach vesafb. However, to reattach vesafb, you need to do one of |
242 | the following: | 242 | the following: |
243 | 243 | ||
244 | Variation 1: | 244 | Variation 1: |
245 | 245 | ||
246 | a. Before detaching fbcon, do | 246 | a. Before detaching fbcon, do |
247 | 247 | ||
248 | vbetool vbemode save > <vesa state file> # do once for each vesafb mode, | 248 | vbetool vbemode save > <vesa state file> # do once for each vesafb mode, |
249 | # the file can be reused | 249 | # the file can be reused |
250 | 250 | ||
251 | b. Detach fbcon as in step 5. | 251 | b. Detach fbcon as in step 5. |
252 | 252 | ||
253 | c. Attach fbcon | 253 | c. Attach fbcon |
254 | 254 | ||
255 | vbetool vbestate restore < <vesa state file> && \ | 255 | vbetool vbestate restore < <vesa state file> && \ |
256 | echo 1 > /sys/class/vtconsole/vtcon1/bind | 256 | echo 1 > /sys/class/vtconsole/vtcon1/bind |
257 | 257 | ||
258 | Variation 2: | 258 | Variation 2: |
259 | 259 | ||
260 | a. Before detaching fbcon, do: | 260 | a. Before detaching fbcon, do: |
261 | echo <ID> > /sys/class/tty/console/bind | 261 | echo <ID> > /sys/class/tty/console/bind |
262 | 262 | ||
263 | 263 | ||
264 | vbetool vbemode get | 264 | vbetool vbemode get |
265 | 265 | ||
266 | b. Take note of the mode number | 266 | b. Take note of the mode number |
267 | 267 | ||
268 | c. Detach fbcon as in step 5. | 268 | c. Detach fbcon as in step 5. |
269 | 269 | ||
270 | d. Attach fbcon: | 270 | d. Attach fbcon: |
271 | 271 | ||
272 | vbetool vbemode set <mode number> && \ | 272 | vbetool vbemode set <mode number> && \ |
273 | echo 1 > /sys/class/vtconsole/vtcon1/bind | 273 | echo 1 > /sys/class/vtconsole/vtcon1/bind |
274 | 274 | ||
275 | Samples: | 275 | Samples: |
276 | ======== | 276 | ======== |
277 | 277 | ||
278 | Here are 2 sample bash scripts that you can use to bind or unbind the | 278 | Here are 2 sample bash scripts that you can use to bind or unbind the |
279 | framebuffer console driver if you are on an x86 box: | 279 | framebuffer console driver if you are on an x86 box: |
280 | 280 | ||
281 | --------------------------------------------------------------------------- | 281 | --------------------------------------------------------------------------- |
282 | #!/bin/bash | 282 | #!/bin/bash |
283 | # Unbind fbcon | 283 | # Unbind fbcon |
284 | 284 | ||
285 | # Change this to where your actual vgastate file is located | 285 | # Change this to where your actual vgastate file is located |
286 | # Or Use VGASTATE=$1 to indicate the state file at runtime | 286 | # Or Use VGASTATE=$1 to indicate the state file at runtime |
287 | VGASTATE=/tmp/vgastate | 287 | VGASTATE=/tmp/vgastate |
288 | 288 | ||
289 | # path to vbetool | 289 | # path to vbetool |
290 | VBETOOL=/usr/local/bin | 290 | VBETOOL=/usr/local/bin |
291 | 291 | ||
292 | 292 | ||
293 | for (( i = 0; i < 16; i++)) | 293 | for (( i = 0; i < 16; i++)) |
294 | do | 294 | do |
295 | if test -x /sys/class/vtconsole/vtcon$i; then | 295 | if test -x /sys/class/vtconsole/vtcon$i; then |
296 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ | 296 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ |
297 | = 1 ]; then | 297 | = 1 ]; then |
298 | if test -x $VBETOOL/vbetool; then | 298 | if test -x $VBETOOL/vbetool; then |
299 | echo Unbinding vtcon$i | 299 | echo Unbinding vtcon$i |
300 | $VBETOOL/vbetool vbestate restore < $VGASTATE | 300 | $VBETOOL/vbetool vbestate restore < $VGASTATE |
301 | echo 0 > /sys/class/vtconsole/vtcon$i/bind | 301 | echo 0 > /sys/class/vtconsole/vtcon$i/bind |
302 | fi | 302 | fi |
303 | fi | 303 | fi |
304 | fi | 304 | fi |
305 | done | 305 | done |
306 | 306 | ||
307 | --------------------------------------------------------------------------- | 307 | --------------------------------------------------------------------------- |
308 | #!/bin/bash | 308 | #!/bin/bash |
309 | # Bind fbcon | 309 | # Bind fbcon |
310 | 310 | ||
311 | for (( i = 0; i < 16; i++)) | 311 | for (( i = 0; i < 16; i++)) |
312 | do | 312 | do |
313 | if test -x /sys/class/vtconsole/vtcon$i; then | 313 | if test -x /sys/class/vtconsole/vtcon$i; then |
314 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ | 314 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ |
315 | = 1 ]; then | 315 | = 1 ]; then |
316 | echo Binding vtcon$i | 316 | echo Binding vtcon$i |
317 | echo 1 > /sys/class/vtconsole/vtcon$i/bind | 317 | echo 1 > /sys/class/vtconsole/vtcon$i/bind |
318 | fi | 318 | fi |
319 | fi | 319 | fi |
320 | done | 320 | done |
321 | --------------------------------------------------------------------------- | 321 | --------------------------------------------------------------------------- |
322 | 322 | ||
323 | -- | 323 | -- |
324 | Antonino Daplas <adaplas@pol.net> | 324 | Antonino Daplas <adaplas@pol.net> |
325 | 325 |
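The two sample scripts above find the framebuffer console by scanning vtcon0 through vtcon15 for a name containing "frame buffer". A minimal C sketch of the same search follows. The function names (is_fbcon_name, find_fbcon_index, set_vtcon_bind) and the root parameter are hypothetical and exist only for illustration; on a real system root would be /sys/class/vtconsole. The sketch does not perform the vbetool restore step that unbinding requires.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does a vtconsole "name" string describe fbcon?
 * Mirrors the `grep -c "frame buffer"` test in the sample scripts. */
int is_fbcon_name(const char *name)
{
    return name != NULL && strstr(name, "frame buffer") != NULL;
}

/* Hypothetical helper: scan <root>/vtcon0 .. <root>/vtcon15 and return
 * the index of the first entry whose name file mentions "frame buffer",
 * or -1 if there is none. */
int find_fbcon_index(const char *root)
{
    char path[512];
    char name[128];

    for (int i = 0; i < 16; i++) {
        snprintf(path, sizeof(path), "%s/vtcon%d/name", root, i);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;               /* this vtcon entry does not exist */
        char *got = fgets(name, sizeof(name), f);
        fclose(f);
        if (got != NULL && is_fbcon_name(name))
            return i;
    }
    return -1;
}

/* Hypothetical helper: write "1" (bind) or "0" (unbind) to the bind
 * file of vtcon<index>. Returns 0 on success, -1 on failure. */
int set_vtcon_bind(const char *root, int index, int bind)
{
    char path[512];

    snprintf(path, sizeof(path), "%s/vtcon%d/bind", root, index);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(bind ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Passing the sysfs root as a parameter keeps the search testable against a scratch directory; writing to the real bind file needs root privileges.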
Documentation/filesystems/directory-locking
1 | Locking scheme used for directory operations is based on two | 1 | Locking scheme used for directory operations is based on two |
2 | kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem). | 2 | kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem). |
3 | 3 | ||
4 | For our purposes all operations fall in 6 classes: | 4 | For our purposes all operations fall in 6 classes: |
5 | 5 | ||
6 | 1) read access. Locking rules: caller locks directory we are accessing. | 6 | 1) read access. Locking rules: caller locks directory we are accessing. |
7 | 7 | ||
8 | 2) object creation. Locking rules: same as above. | 8 | 2) object creation. Locking rules: same as above. |
9 | 9 | ||
10 | 3) object removal. Locking rules: caller locks parent, finds victim, | 10 | 3) object removal. Locking rules: caller locks parent, finds victim, |
11 | locks victim and calls the method. | 11 | locks victim and calls the method. |
12 | 12 | ||
13 | 4) rename() that is _not_ cross-directory. Locking rules: caller locks | 13 | 4) rename() that is _not_ cross-directory. Locking rules: caller locks |
14 | the parent, finds source and target, if target already exists - locks it | 14 | the parent, finds source and target, if target already exists - locks it |
15 | and then calls the method. | 15 | and then calls the method. |
16 | 16 | ||
17 | 5) link creation. Locking rules: | 17 | 5) link creation. Locking rules: |
18 | * lock parent | 18 | * lock parent |
19 | * check that source is not a directory | 19 | * check that source is not a directory |
20 | * lock source | 20 | * lock source |
21 | * call the method. | 21 | * call the method. |
22 | 22 | ||
23 | 6) cross-directory rename. The trickiest in the whole bunch. Locking | 23 | 6) cross-directory rename. The trickiest in the whole bunch. Locking |
24 | rules: | 24 | rules: |
25 | * lock the filesystem | 25 | * lock the filesystem |
26 | * lock parents in "ancestors first" order. | 26 | * lock parents in "ancestors first" order. |
27 | * find source and target. | 27 | * find source and target. |
28 | * if old parent is equal to or is a descendent of target | 28 | * if old parent is equal to or is a descendent of target |
29 | fail with -ENOTEMPTY | 29 | fail with -ENOTEMPTY |
30 | * if new parent is equal to or is a descendent of source | 30 | * if new parent is equal to or is a descendent of source |
31 | fail with -ELOOP | 31 | fail with -ELOOP |
32 | * if target exists - lock it. | 32 | * if target exists - lock it. |
33 | * call the method. | 33 | * call the method. |
34 | 34 | ||
35 | 35 | ||
36 | The rules above obviously guarantee that all directories that are going to be | 36 | The rules above obviously guarantee that all directories that are going to be |
37 | read, modified or removed by method will be locked by caller. | 37 | read, modified or removed by method will be locked by caller. |
38 | 38 | ||
39 | 39 | ||
40 | If no directory is its own ancestor, the scheme above is deadlock-free. | 40 | If no directory is its own ancestor, the scheme above is deadlock-free. |
41 | Proof: | 41 | Proof: |
42 | 42 | ||
43 | First of all, at any moment we have a partial ordering of the | 43 | First of all, at any moment we have a partial ordering of the |
44 | objects - A < B iff A is an ancestor of B. | 44 | objects - A < B iff A is an ancestor of B. |
45 | 45 | ||
46 | That ordering can change. However, the following is true: | 46 | That ordering can change. However, the following is true: |
47 | 47 | ||
48 | (1) if object removal or non-cross-directory rename holds lock on A and | 48 | (1) if object removal or non-cross-directory rename holds lock on A and |
49 | attempts to acquire lock on B, A will remain the parent of B until we | 49 | attempts to acquire lock on B, A will remain the parent of B until we |
50 | acquire the lock on B. (Proof: only cross-directory rename can change | 50 | acquire the lock on B. (Proof: only cross-directory rename can change |
51 | the parent of object and it would have to lock the parent). | 51 | the parent of object and it would have to lock the parent). |
52 | 52 | ||
53 | (2) if cross-directory rename holds the lock on filesystem, order will not | 53 | (2) if cross-directory rename holds the lock on filesystem, order will not |
54 | change until rename acquires all locks. (Proof: other cross-directory | 54 | change until rename acquires all locks. (Proof: other cross-directory |
55 | renames will be blocked on filesystem lock and we don't start changing | 55 | renames will be blocked on filesystem lock and we don't start changing |
56 | the order until we had acquired all locks). | 56 | the order until we had acquired all locks). |
57 | 57 | ||
58 | (3) any operation holds at most one lock on non-directory object and | 58 | (3) any operation holds at most one lock on non-directory object and |
59 | that lock is acquired after all other locks. (Proof: see descriptions | 59 | that lock is acquired after all other locks. (Proof: see descriptions |
60 | of operations). | 60 | of operations). |
61 | 61 | ||
62 | Now consider the minimal deadlock. Each process is blocked on | 62 | Now consider the minimal deadlock. Each process is blocked on |
63 | attempt to acquire some lock and already holds at least one lock. Let's | 63 | attempt to acquire some lock and already holds at least one lock. Let's |
64 | consider the set of contended locks. First of all, filesystem lock is | 64 | consider the set of contended locks. First of all, filesystem lock is |
65 | not contended, since any process blocked on it is not holding any locks. | 65 | not contended, since any process blocked on it is not holding any locks. |
66 | Thus all processes are blocked on ->i_sem. | 66 | Thus all processes are blocked on ->i_sem. |
67 | 67 | ||
68 | Non-directory objects are not contended due to (3). Thus link | 68 | Non-directory objects are not contended due to (3). Thus link |
69 | creation can't be a part of deadlock - it can't be blocked on source | 69 | creation can't be a part of deadlock - it can't be blocked on source |
70 | and it means that it doesn't hold any locks. | 70 | and it means that it doesn't hold any locks. |
71 | 71 | ||
72 | Any contended object is either held by cross-directory rename or | 72 | Any contended object is either held by cross-directory rename or |
73 | has a child that is also contended. Indeed, suppose that it is held by | 73 | has a child that is also contended. Indeed, suppose that it is held by |
74 | operation other than cross-directory rename. Then the lock this operation | 74 | operation other than cross-directory rename. Then the lock this operation |
75 | is blocked on belongs to child of that object due to (1). | 75 | is blocked on belongs to child of that object due to (1). |
76 | 76 | ||
77 | It means that one of the operations is cross-directory rename. | 77 | It means that one of the operations is cross-directory rename. |
78 | Otherwise the set of contended objects would be infinite - each of them | 78 | Otherwise the set of contended objects would be infinite - each of them |
79 | would have a contended child and we had assumed that no object is its | 79 | would have a contended child and we had assumed that no object is its |
80 | own descendent. Moreover, there is exactly one cross-directory rename | 80 | own descendent. Moreover, there is exactly one cross-directory rename |
81 | (see above). | 81 | (see above). |
82 | 82 | ||
83 | Consider the object blocking the cross-directory rename. One | 83 | Consider the object blocking the cross-directory rename. One |
84 | of its descendents is locked by cross-directory rename (otherwise we | 84 | of its descendents is locked by cross-directory rename (otherwise we |
85 | would again have an infinite set of of contended objects). But that | 85 | would again have an infinite set of contended objects). But that |
86 | means that cross-directory rename is taking locks out of order. Due | 86 | means that cross-directory rename is taking locks out of order. Due |
87 | to (2) the order hadn't changed since we had acquired filesystem lock. | 87 | to (2) the order hadn't changed since we had acquired filesystem lock. |
88 | But locking rules for cross-directory rename guarantee that we do not | 88 | But locking rules for cross-directory rename guarantee that we do not |
89 | try to acquire lock on descendent before the lock on ancestor. | 89 | try to acquire lock on descendent before the lock on ancestor. |
90 | Contradiction. I.e. deadlock is impossible. Q.E.D. | 90 | Contradiction. I.e. deadlock is impossible. Q.E.D. |
91 | 91 | ||
92 | 92 | ||
93 | These operations are guaranteed to avoid loop creation. Indeed, | 93 | These operations are guaranteed to avoid loop creation. Indeed, |
94 | the only operation that could introduce loops is cross-directory rename. | 94 | the only operation that could introduce loops is cross-directory rename. |
95 | Since the only new (parent, child) pair added by rename() is (new parent, | 95 | Since the only new (parent, child) pair added by rename() is (new parent, |
96 | source), such loop would have to contain these objects and the rest of it | 96 | source), such loop would have to contain these objects and the rest of it |
97 | would have to exist before rename(). I.e. at the moment of loop creation | 97 | would have to exist before rename(). I.e. at the moment of loop creation |
98 | rename() responsible for that would be holding filesystem lock and new parent | 98 | rename() responsible for that would be holding filesystem lock and new parent |
99 | would have to be equal to or a descendent of source. But that means that | 99 | would have to be equal to or a descendent of source. But that means that |
100 | new parent had been equal to or a descendent of source since the moment when | 100 | new parent had been equal to or a descendent of source since the moment when |
101 | we had acquired filesystem lock and rename() would fail with -ELOOP in that | 101 | we had acquired filesystem lock and rename() would fail with -ELOOP in that |
102 | case. | 102 | case. |
103 | 103 | ||
104 | While this locking scheme works for arbitrary DAGs, it relies on | 104 | While this locking scheme works for arbitrary DAGs, it relies on |
105 | ability to check that directory is a descendent of another object. Current | 105 | ability to check that directory is a descendent of another object. Current |
106 | implementation assumes that directory graph is a tree. This assumption is | 106 | implementation assumes that directory graph is a tree. This assumption is |
107 | also preserved by all operations (cross-directory rename on a tree that would | 107 | also preserved by all operations (cross-directory rename on a tree that would |
108 | not introduce a cycle will leave it a tree and link() fails for directories). | 108 | not introduce a cycle will leave it a tree and link() fails for directories). |
109 | 109 | ||
110 | Notice that "directory" in the above == "anything that might have | 110 | Notice that "directory" in the above == "anything that might have |
111 | children", so if we are going to introduce hybrid objects we will need | 111 | children", so if we are going to introduce hybrid objects we will need |
112 | either to make sure that link(2) doesn't work for them or to make changes | 112 | either to make sure that link(2) doesn't work for them or to make changes |
113 | in is_subdir() that would make it work even in presence of such beasts. | 113 | in is_subdir() that would make it work even in presence of such beasts. |
114 | 114 |
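The cross-directory rename rules in the file above rest on one primitive: checking whether one directory is equal to, or an ancestor of, another. A minimal user-space sketch of that check, the "ancestors first" lock order and the two failure cases follows. It assumes a tree with plain parent pointers; struct node and all function names are hypothetical, and this is not the kernel's is_subdir() code. When neither parent is an ancestor of the other, the sketch takes the old parent first, which is an arbitrary choice of the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory: only the parent link matters. */
struct node {
    struct node *parent;    /* NULL for the root */
};

/* Returns 1 if anc is d itself or one of d's ancestors, else 0. */
int is_self_or_ancestor(const struct node *anc, const struct node *d)
{
    for (; d != NULL; d = d->parent)
        if (d == anc)
            return 1;
    return 0;
}

/* "Ancestors first": pick the order in which to lock the two parents.
 * Assumes old_parent != new_parent (the rename is cross-directory). */
void lock_order(struct node *old_parent, struct node *new_parent,
                struct node **first, struct node **second)
{
    if (is_self_or_ancestor(new_parent, old_parent)) {
        *first = new_parent;
        *second = old_parent;
    } else {
        /* old_parent is the ancestor, or the two are unrelated */
        *first = old_parent;
        *second = new_parent;
    }
}

/* The two checks from the rules above; target may be NULL. */
int check_rename(const struct node *old_parent, const struct node *source,
                 const struct node *new_parent, const struct node *target)
{
    /* old parent equal to or a descendent of target */
    if (target != NULL && is_self_or_ancestor(target, old_parent))
        return -ENOTEMPTY;
    /* new parent equal to or a descendent of source */
    if (is_self_or_ancestor(source, new_parent))
        return -ELOOP;
    return 0;
}
```

The walk up the parent chain terminates only because the graph is a tree, which is the same assumption the text above attributes to the current implementation.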
Documentation/filesystems/files.txt
1 | File management in the Linux kernel | 1 | File management in the Linux kernel |
2 | ----------------------------------- | 2 | ----------------------------------- |
3 | 3 | ||
4 | This document describes how locking for files (struct file) | 4 | This document describes how locking for files (struct file) |
5 | and file descriptor table (struct files_struct) works. | 5 | and file descriptor table (struct files_struct) works. |
6 | 6 | ||
7 | Up until 2.6.12, the file descriptor table has been protected | 7 | Up until 2.6.12, the file descriptor table has been protected |
8 | with a lock (files->file_lock) and reference count (files->count). | 8 | with a lock (files->file_lock) and reference count (files->count). |
9 | ->file_lock protected accesses to all the file related fields | 9 | ->file_lock protected accesses to all the file related fields |
10 | of the table. ->count was used for sharing the file descriptor | 10 | of the table. ->count was used for sharing the file descriptor |
11 | table between tasks cloned with CLONE_FILES flag. Typically | 11 | table between tasks cloned with CLONE_FILES flag. Typically |
12 | this would be the case for posix threads. As with the common | 12 | this would be the case for posix threads. As with the common |
13 | refcounting model in the kernel, the last task doing | 13 | refcounting model in the kernel, the last task doing |
14 | a put_files_struct() frees the file descriptor (fd) table. | 14 | a put_files_struct() frees the file descriptor (fd) table. |
15 | The files (struct file) themselves are protected using | 15 | The files (struct file) themselves are protected using |
16 | reference count (->f_count). | 16 | reference count (->f_count). |
17 | 17 | ||
18 | In the new lock-free model of file descriptor management, | 18 | In the new lock-free model of file descriptor management, |
19 | the reference counting is similar, but the locking is | 19 | the reference counting is similar, but the locking is |
20 | based on RCU. The file descriptor table contains multiple | 20 | based on RCU. The file descriptor table contains multiple |
21 | elements - the fd sets (open_fds and close_on_exec), the | 21 | elements - the fd sets (open_fds and close_on_exec), the |
22 | array of file pointers, the sizes of the sets and the array, | 22 | array of file pointers, the sizes of the sets and the array, |
23 | etc. In order for the updates to appear atomic to | 23 | etc. In order for the updates to appear atomic to |
24 | a lock-free reader, all the elements of the file descriptor | 24 | a lock-free reader, all the elements of the file descriptor |
25 | table are in a separate structure - struct fdtable. | 25 | table are in a separate structure - struct fdtable. |
26 | files_struct contains a pointer to struct fdtable through | 26 | files_struct contains a pointer to struct fdtable through |
27 | which the actual fd table is accessed. Initially the | 27 | which the actual fd table is accessed. Initially the |
28 | fdtable is embedded in files_struct itself. On a subsequent | 28 | fdtable is embedded in files_struct itself. On a subsequent |
29 | expansion of fdtable, a new fdtable structure is allocated | 29 | expansion of fdtable, a new fdtable structure is allocated |
30 | and files->fdt points to the new structure. The fdtable | 30 | and files->fdt points to the new structure. The fdtable |
31 | structure is freed with RCU and lock-free readers either | 31 | structure is freed with RCU and lock-free readers either |
32 | see the old fdtable or the new fdtable making the update | 32 | see the old fdtable or the new fdtable making the update |
33 | appear atomic. Here are the locking rules for | 33 | appear atomic. Here are the locking rules for |
34 | the fdtable structure - | 34 | the fdtable structure - |
35 | 35 | ||
36 | 1. All references to the fdtable must be done through | 36 | 1. All references to the fdtable must be done through |
37 | the files_fdtable() macro : | 37 | the files_fdtable() macro : |
38 | 38 | ||
39 | struct fdtable *fdt; | 39 | struct fdtable *fdt; |
40 | 40 | ||
41 | rcu_read_lock(); | 41 | rcu_read_lock(); |
42 | 42 | ||
43 | fdt = files_fdtable(files); | 43 | fdt = files_fdtable(files); |
44 | .... | 44 | .... |
45 | if (n <= fdt->max_fds) | 45 | if (n <= fdt->max_fds) |
46 | .... | 46 | .... |
47 | ... | 47 | ... |
48 | rcu_read_unlock(); | 48 | rcu_read_unlock(); |
49 | 49 | ||
50 | files_fdtable() uses rcu_dereference() macro which takes care of | 50 | files_fdtable() uses rcu_dereference() macro which takes care of |
51 | the memory barrier requirements for lock-free dereference. | 51 | the memory barrier requirements for lock-free dereference. |
52 | The fdtable pointer must be read within the read-side | 52 | The fdtable pointer must be read within the read-side |
53 | critical section. | 53 | critical section. |
54 | 54 | ||
55 | 2. Reading of the fdtable as described above must be protected | 55 | 2. Reading of the fdtable as described above must be protected |
56 | by rcu_read_lock()/rcu_read_unlock(). | 56 | by rcu_read_lock()/rcu_read_unlock(). |
57 | 57 | ||
58 | 3. For any update to the the fd table, files->file_lock must | 58 | 3. For any update to the fd table, files->file_lock must |
59 | be held. | 59 | be held. |
60 | 60 | ||
61 | 4. To look up the file structure given an fd, a reader | 61 | 4. To look up the file structure given an fd, a reader |
62 | must use either fcheck() or fcheck_files() APIs. These | 62 | must use either fcheck() or fcheck_files() APIs. These |
63 | take care of barrier requirements due to lock-free lookup. | 63 | take care of barrier requirements due to lock-free lookup. |
64 | An example : | 64 | An example : |
65 | 65 | ||
66 | struct file *file; | 66 | struct file *file; |
67 | 67 | ||
68 | rcu_read_lock(); | 68 | rcu_read_lock(); |
69 | file = fcheck(fd); | 69 | file = fcheck(fd); |
70 | if (file) { | 70 | if (file) { |
71 | ... | 71 | ... |
72 | } | 72 | } |
73 | .... | 73 | .... |
74 | rcu_read_unlock(); | 74 | rcu_read_unlock(); |
75 | 75 | ||
76 | 5. Handling of the file structures is special. Since the look-up | 76 | 5. Handling of the file structures is special. Since the look-up |
77 | of the fd (fget()/fget_light()) is lock-free, it is possible | 77 | of the fd (fget()/fget_light()) is lock-free, it is possible |
78 | that look-up may race with the last put() operation on the | 78 | that look-up may race with the last put() operation on the |
79 | file structure. This is avoided using the rcuref APIs | 79 | file structure. This is avoided using the rcuref APIs |
80 | on ->f_count : | 80 | on ->f_count : |
81 | 81 | ||
82 | rcu_read_lock(); | 82 | rcu_read_lock(); |
83 | file = fcheck_files(files, fd); | 83 | file = fcheck_files(files, fd); |
84 | if (file) { | 84 | if (file) { |
85 | if (rcuref_inc_lf(&file->f_count)) | 85 | if (rcuref_inc_lf(&file->f_count)) |
86 | *fput_needed = 1; | 86 | *fput_needed = 1; |
87 | else | 87 | else |
88 | /* Didn't get the reference, someone's freed */ | 88 | /* Didn't get the reference, someone's freed */ |
89 | file = NULL; | 89 | file = NULL; |
90 | } | 90 | } |
91 | rcu_read_unlock(); | 91 | rcu_read_unlock(); |
92 | .... | 92 | .... |
93 | return file; | 93 | return file; |
94 | 94 | ||
95 | rcuref_inc_lf() detects if the refcount is already zero or | 95 | rcuref_inc_lf() detects if the refcount is already zero or |
96 | goes to zero during increment. If it does, we fail | 96 | goes to zero during increment. If it does, we fail |
97 | fget()/fget_light(). | 97 | fget()/fget_light(). |
98 | 98 | ||
99 | 6. Since both fdtable and file structures can be looked up | 99 | 6. Since both fdtable and file structures can be looked up |
100 | lock-free, they must be installed using rcu_assign_pointer() | 100 | lock-free, they must be installed using rcu_assign_pointer() |
101 | API. If they are looked up lock-free, rcu_dereference() | 101 | API. If they are looked up lock-free, rcu_dereference() |
102 | must be used. However it is advisable to use files_fdtable() | 102 | must be used. However it is advisable to use files_fdtable() |
103 | and fcheck()/fcheck_files() which take care of these issues. | 103 | and fcheck()/fcheck_files() which take care of these issues. |
104 | 104 | ||
105 | 7. While updating, the fdtable pointer must be looked up while | 105 | 7. While updating, the fdtable pointer must be looked up while |
106 | holding files->file_lock. If ->file_lock is dropped, then | 106 | holding files->file_lock. If ->file_lock is dropped, then |
107 | another thread may expand the files, thereby creating a new | 107 | another thread may expand the files, thereby creating a new |
108 | fdtable and making the earlier fdtable pointer stale. | 108 | fdtable and making the earlier fdtable pointer stale. |
109 | For example : | 109 | For example : |
110 | 110 | ||
111 | spin_lock(&files->file_lock); | 111 | spin_lock(&files->file_lock); |
112 | fd = locate_fd(files, file, start); | 112 | fd = locate_fd(files, file, start); |
113 | if (fd >= 0) { | 113 | if (fd >= 0) { |
114 | /* locate_fd() may have expanded fdtable, load the ptr */ | 114 | /* locate_fd() may have expanded fdtable, load the ptr */ |
115 | fdt = files_fdtable(files); | 115 | fdt = files_fdtable(files); |
116 | FD_SET(fd, fdt->open_fds); | 116 | FD_SET(fd, fdt->open_fds); |
117 | FD_CLR(fd, fdt->close_on_exec); | 117 | FD_CLR(fd, fdt->close_on_exec); |
118 | spin_unlock(&files->file_lock); | 118 | spin_unlock(&files->file_lock); |
119 | ..... | 119 | ..... |
120 | 120 | ||
121 | Since locate_fd() can drop ->file_lock (and reacquire ->file_lock), | 121 | Since locate_fd() can drop ->file_lock (and reacquire ->file_lock), |
122 | the fdtable pointer (fdt) must be loaded after locate_fd(). | 122 | the fdtable pointer (fdt) must be loaded after locate_fd(). |
123 | 123 | ||
124 | 124 |
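Rule 5 in the file above depends on an "increment unless already zero" operation on ->f_count, so that a lock-free lookup cannot revive a file whose last reference has been dropped. A minimal user-space sketch of that pattern with C11 atomics follows. It is not the kernel's rcuref implementation; ref_get_unless_zero and ref_put are hypothetical names, and the sketch models only the counter. In the kernel, RCU is what keeps the memory valid while the attempt is made.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is still non-zero.
 * Returns true on success, false if the object is already dying. */
bool ref_get_unless_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        /* On failure the current value is reloaded into old. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop one reference. Returns true when the caller dropped the last
 * reference and is therefore responsible for freeing the object. */
bool ref_put(atomic_int *count)
{
    return atomic_fetch_sub(count, 1) == 1;
}
```

A reader that gets false back behaves like the failed fget() path above: it treats the descriptor as having no file.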
Documentation/filesystems/spufs.txt
1 | SPUFS(2) Linux Programmer's Manual SPUFS(2) | 1 | SPUFS(2) Linux Programmer's Manual SPUFS(2) |
2 | 2 | ||
3 | 3 | ||
4 | 4 | ||
5 | NAME | 5 | NAME |
6 | spufs - the SPU file system | 6 | spufs - the SPU file system |
7 | 7 | ||
8 | 8 | ||
9 | DESCRIPTION | 9 | DESCRIPTION |
10 | The SPU file system is used on PowerPC machines that implement the Cell | 10 | The SPU file system is used on PowerPC machines that implement the Cell |
11 | Broadband Engine Architecture in order to access Synergistic Processor | 11 | Broadband Engine Architecture in order to access Synergistic Processor |
12 | Units (SPUs). | 12 | Units (SPUs). |
13 | 13 | ||
14 | The file system provides a name space similar to posix shared memory or | 14 | The file system provides a name space similar to posix shared memory or |
15 | message queues. Users that have write permissions on the file system | 15 | message queues. Users that have write permissions on the file system |
16 | can use spu_create(2) to establish SPU contexts in the spufs root. | 16 | can use spu_create(2) to establish SPU contexts in the spufs root. |
17 | 17 | ||
18 | Every SPU context is represented by a directory containing a predefined | 18 | Every SPU context is represented by a directory containing a predefined |
19 | set of files. These files can be used for manipulating the state of the | 19 | set of files. These files can be used for manipulating the state of the |
20 | logical SPU. Users can change permissions on those files, but not actu- | 20 | logical SPU. Users can change permissions on those files, but not actu- |
21 | ally add or remove files. | 21 | ally add or remove files. |
22 | 22 | ||
23 | 23 | ||
24 | MOUNT OPTIONS | 24 | MOUNT OPTIONS |
25 | uid=<uid> | 25 | uid=<uid> |
26 | set the user owning the mount point, the default is 0 (root). | 26 | set the user owning the mount point, the default is 0 (root). |
27 | 27 | ||
28 | gid=<gid> | 28 | gid=<gid> |
29 | set the group owning the mount point, the default is 0 (root). | 29 | set the group owning the mount point, the default is 0 (root). |
30 | 30 | ||
31 | 31 | ||
32 | FILES | 32 | FILES |
33 | The files in spufs mostly follow the standard behavior for regular sys- | 33 | The files in spufs mostly follow the standard behavior for regular sys- |
34 | tem calls like read(2) or write(2), but often support only a subset of | 34 | tem calls like read(2) or write(2), but often support only a subset of |
35 | the operations supported on regular file systems. This list details the | 35 | the operations supported on regular file systems. This list details the |
36 | supported operations and the deviations from the behaviour in the | 36 | supported operations and the deviations from the behaviour in the |
37 | respective man pages. | 37 | respective man pages. |
38 | 38 | ||
39 | All files that support the read(2) operation also support readv(2) and | 39 | All files that support the read(2) operation also support readv(2) and |
40 | all files that support the write(2) operation also support writev(2). | 40 | all files that support the write(2) operation also support writev(2). |
41 | All files support the access(2) and stat(2) family of operations, but | 41 | All files support the access(2) and stat(2) family of operations, but |
42 | only the st_mode, st_nlink, st_uid and st_gid fields of struct stat | 42 | only the st_mode, st_nlink, st_uid and st_gid fields of struct stat |
43 | contain reliable information. | 43 | contain reliable information. |
44 | 44 | ||
45 | All files support the chmod(2)/fchmod(2) and chown(2)/fchown(2) opera- | 45 | All files support the chmod(2)/fchmod(2) and chown(2)/fchown(2) opera- |
46 | tions, but will not be able to grant permissions that contradict the | 46 | tions, but will not be able to grant permissions that contradict the |
47 | possible operations, e.g. read access on the wbox file. | 47 | possible operations, e.g. read access on the wbox file. |
48 | 48 | ||
49 | The current set of files is: | 49 | The current set of files is: |
50 | 50 | ||
51 | 51 | ||
52 | /mem | 52 | /mem |
53 | the contents of the local storage memory of the SPU. This can be | 53 | the contents of the local storage memory of the SPU. This can be |
54 | accessed like a regular shared memory file and contains both code and | 54 | accessed like a regular shared memory file and contains both code and |
55 | data in the address space of the SPU. The possible operations on an | 55 | data in the address space of the SPU. The possible operations on an |
56 | open mem file are: | 56 | open mem file are: |
57 | 57 | ||
58 | read(2), pread(2), write(2), pwrite(2), lseek(2) | 58 | read(2), pread(2), write(2), pwrite(2), lseek(2) |
59 | These operate as documented, with the exception that lseek(2), | 59 | These operate as documented, with the exception that lseek(2), |
60 | write(2) and pwrite(2) are not supported beyond the end of the | 60 | write(2) and pwrite(2) are not supported beyond the end of the |
61 | file. The file size is the size of the local storage of the SPU, | 61 | file. The file size is the size of the local storage of the SPU, |
62 | which normally is 256 kilobytes. | 62 | which normally is 256 kilobytes. |
63 | 63 | ||
64 | mmap(2) | 64 | mmap(2) |
65 | Mapping mem into the process address space gives access to the | 65 | Mapping mem into the process address space gives access to the |
66 | SPU local storage within the process address space. Only | 66 | SPU local storage within the process address space. Only |
67 | MAP_SHARED mappings are allowed. | 67 | MAP_SHARED mappings are allowed. |
68 | 68 | ||
69 | 69 | ||
70 | /mbox | 70 | /mbox |
71 | The first SPU to CPU communication mailbox. This file is read-only and | 71 | The first SPU to CPU communication mailbox. This file is read-only and |
72 | can be read in units of 32 bits. The file can only be used in | 72 | can be read in units of 32 bits. The file can only be used in |
73 | non-blocking mode, and even poll() will not block on it. The possible | 73 | non-blocking mode, and even poll() will not block on it. The possible |
74 | operations on an open mbox file are: | 74 | operations on an open mbox file are: |
75 | 75 | ||
76 | read(2) | 76 | read(2) |
77 | If a count smaller than four is requested, read returns -1 and | 77 | If a count smaller than four is requested, read returns -1 and |
78 | sets errno to EINVAL. If there is no data available in the mail | 78 | sets errno to EINVAL. If there is no data available in the mail |
79 | box, the return value is set to -1 and errno becomes EAGAIN. | 79 | box, the return value is set to -1 and errno becomes EAGAIN. |
80 | When data has been read successfully, four bytes are placed in | 80 | When data has been read successfully, four bytes are placed in |
81 | the data buffer and the value four is returned. | 81 | the data buffer and the value four is returned. |
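The read(2) semantics described above for mbox can be sketched as a small helper. This is only an illustration, not part of the patch: the function name mbox_read_word is hypothetical, and the descriptor is assumed to have been obtained by opening the mbox file of an existing SPU context.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 32-bit mailbox word from an already
 * opened descriptor.
 *
 * Returns  1 if a word was stored in *word,
 *          0 if no data was available (read failed with EAGAIN),
 *         -1 on any other error or on an unexpected short read.
 */
int mbox_read_word(int fd, uint32_t *word)
{
    /* Always request exactly four bytes; the text above says that
     * a smaller count fails with EINVAL. */
    ssize_t n = read(fd, word, sizeof(*word));

    if (n == (ssize_t)sizeof(*word))
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Because mbox never blocks, a caller that wants to wait for data has to retry on a return value of 0; the ibox file is the variant that supports blocking reads and poll(2).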
82 | 82 | ||
83 | 83 | ||
84 | /ibox | 84 | /ibox |
85 | The second SPU to CPU communication mailbox. This file is similar to | 85 | The second SPU to CPU communication mailbox. This file is similar to |
86 | the first mailbox file, but can be read in blocking I/O mode, and the | 86 | the first mailbox file, but can be read in blocking I/O mode, and the |
87 | poll family of system calls can be used to wait for it. The possible | 87 | poll family of system calls can be used to wait for it. The possible |
88 | operations on an open ibox file are: | 88 | operations on an open ibox file are: |
89 | 89 | ||
90 | read(2) | 90 | read(2) |
91 | If a count smaller than four is requested, read returns -1 and | 91 | If a count smaller than four is requested, read returns -1 and |
92 | sets errno to EINVAL. If there is no data available in the mail | 92 | sets errno to EINVAL. If there is no data available in the mail |
93 | box and the file descriptor has been opened with O_NONBLOCK, the | 93 | box and the file descriptor has been opened with O_NONBLOCK, the |
94 | return value is set to -1 and errno becomes EAGAIN. | 94 | return value is set to -1 and errno becomes EAGAIN. |
95 | 95 | ||
96 | If there is no data available in the mail box and the file | 96 | If there is no data available in the mail box and the file |
97 | descriptor has been opened without O_NONBLOCK, the call will | 97 | descriptor has been opened without O_NONBLOCK, the call will |
98 | block until the SPU writes to its interrupt mailbox channel. | 98 | block until the SPU writes to its interrupt mailbox channel. |
99 | When data has been read successfully, four bytes are placed in | 99 | When data has been read successfully, four bytes are placed in |
100 | the data buffer and the value four is returned. | 100 | the data buffer and the value four is returned. |
101 | 101 | ||
102 | poll(2) | 102 | poll(2) |
103 | Poll on the ibox file returns (POLLIN | POLLRDNORM) whenever | 103 | Poll on the ibox file returns (POLLIN | POLLRDNORM) whenever |
104 | data is available for reading. | 104 | data is available for reading. |
105 | 105 | ||
106 | 106 | ||
107 | /wbox | 107 | /wbox |
108 | The CPU to SPU communication mailbox. It is write-only can can be written | 108 | The CPU to SPU communication mailbox. It is write-only and can be written |
109 | in units of 32 bits. If the mailbox is full, write() will block and | 109 | in units of 32 bits. If the mailbox is full, write() will block and |
110 | poll can be used to wait for it becoming empty again. The possible | 110 | poll can be used to wait for it becoming empty again. The possible |
111 | operations on an open wbox file are: write(2) If a count smaller than | 111 | operations on an open wbox file are: write(2) If a count smaller than |
112 | four is requested, write returns -1 and sets errno to EINVAL. If there | 112 | four is requested, write returns -1 and sets errno to EINVAL. If there |
113 | is no space available in the mail box and the file descriptor has been | 113 | is no space available in the mail box and the file descriptor has been |
114 | opened with O_NONBLOCK, the return value is set to -1 and errno becomes | 114 | opened with O_NONBLOCK, the return value is set to -1 and errno becomes |
115 | EAGAIN. | 115 | EAGAIN. |
116 | 116 | ||
117 | If there is no space available in the mail box and the file descriptor | 117 | If there is no space available in the mail box and the file descriptor |
118 | has been opened without O_NONBLOCK, the call will block until the SPU | 118 | has been opened without O_NONBLOCK, the call will block until the SPU |
119 | reads from its PPE mailbox channel. When data has been written | 119 | reads from its PPE mailbox channel. When data has been written |
120 | successfully, the value four is returned. | 120 | successfully, the value four is returned. |
121 | 121 | ||
122 | 122 | ||
123 | poll(2) | 123 | poll(2) |
124 | Poll on the wbox file returns (POLLOUT | POLLWRNORM) whenever | 124 | Poll on the wbox file returns (POLLOUT | POLLWRNORM) whenever |
125 | space is available for writing. | 125 | space is available for writing. |
126 | 126 | ||
127 | 127 | ||
128 | /mbox_stat | 128 | /mbox_stat |
129 | /ibox_stat | 129 | /ibox_stat |
130 | /wbox_stat | 130 | /wbox_stat |
131 | Read-only files that contain the length of the current queue, i.e. how | 131 | Read-only files that contain the length of the current queue, i.e. how |
132 | many words can be read from mbox or ibox or how many words can be | 132 | many words can be read from mbox or ibox or how many words can be |
133 | written to wbox without blocking. The files can be read only in 4-byte | 133 | written to wbox without blocking. The files can be read only in 4-byte |
134 | units and return a big-endian binary integer number. The possible | 134 | units and return a big-endian binary integer number. The possible |
135 | operations on an open *box_stat file are: | 135 | operations on an open *box_stat file are: |
136 | 136 | ||
137 | read(2) | 137 | read(2) |
138 | If a count smaller than four is requested, read returns -1 and | 138 | If a count smaller than four is requested, read returns -1 and |
139 | sets errno to EINVAL. Otherwise, a four byte value is placed in | 139 | sets errno to EINVAL. Otherwise, a four byte value is placed in |
140 | the data buffer, containing the number of elements that can be | 140 | the data buffer, containing the number of elements that can be |
141 | read from (for mbox_stat and ibox_stat) or written to (for | 141 | read from (for mbox_stat and ibox_stat) or written to (for |
142 | wbox_stat) the respective mail box without blocking or resulting | 142 | wbox_stat) the respective mail box without blocking or resulting |
143 | in EAGAIN. | 143 | in EAGAIN. |
144 | 144 | ||
145 | 145 | ||
146 | /npc | 146 | /npc |
147 | /decr | 147 | /decr |
148 | /decr_status | 148 | /decr_status |
149 | /spu_tag_mask | 149 | /spu_tag_mask |
150 | /event_mask | 150 | /event_mask |
151 | /srr0 | 151 | /srr0 |
152 | Internal registers of the SPU. The representation is an ASCII string | 152 | Internal registers of the SPU. The representation is an ASCII string |
153 | with the numeric value of the next instruction to be executed. These | 153 | with the numeric value of the next instruction to be executed. These |
154 | can be used in read/write mode for debugging, but normal operation of | 154 | can be used in read/write mode for debugging, but normal operation of |
155 | programs should not rely on them because access to any of them except | 155 | programs should not rely on them because access to any of them except |
156 | npc requires an SPU context save and is therefore very inefficient. | 156 | npc requires an SPU context save and is therefore very inefficient. |
157 | 157 | ||
158 | The contents of these files are: | 158 | The contents of these files are: |
159 | 159 | ||
160 | npc Next Program Counter | 160 | npc Next Program Counter |
161 | 161 | ||
162 | decr SPU Decrementer | 162 | decr SPU Decrementer |
163 | 163 | ||
164 | decr_status Decrementer Status | 164 | decr_status Decrementer Status |
165 | 165 | ||
166 | spu_tag_mask MFC tag mask for SPU DMA | 166 | spu_tag_mask MFC tag mask for SPU DMA |
167 | 167 | ||
168 | event_mask Event mask for SPU interrupts | 168 | event_mask Event mask for SPU interrupts |
169 | 169 | ||
170 | srr0 Interrupt Return address register | 170 | srr0 Interrupt Return address register |
171 | 171 | ||
172 | 172 | ||
173 | The possible operations on an open npc, decr, decr_status, | 173 | The possible operations on an open npc, decr, decr_status, |
174 | spu_tag_mask, event_mask or srr0 file are: | 174 | spu_tag_mask, event_mask or srr0 file are: |
175 | 175 | ||
176 | read(2) | 176 | read(2) |
177 | When the count supplied to the read call is shorter than the | 177 | When the count supplied to the read call is shorter than the |
178 | required length for the pointer value plus a newline character, | 178 | required length for the pointer value plus a newline character, |
179 | subsequent reads from the same file descriptor will result in | 179 | subsequent reads from the same file descriptor will result in |
180 | completing the string, regardless of changes to the register by | 180 | completing the string, regardless of changes to the register by |
181 | a running SPU task. When a complete string has been read, all | 181 | a running SPU task. When a complete string has been read, all |
182 | subsequent read operations will return zero bytes and a new file | 182 | subsequent read operations will return zero bytes and a new file |
183 | descriptor needs to be opened to read the value again. | 183 | descriptor needs to be opened to read the value again. |
184 | 184 | ||
185 | write(2) | 185 | write(2) |
186 | A write operation on the file results in setting the register to | 186 | A write operation on the file results in setting the register to |
187 | the value given in the string. The string is parsed from the | 187 | the value given in the string. The string is parsed from the |
188 | beginning to the first non-numeric character or the end of the | 188 | beginning to the first non-numeric character or the end of the |
189 | buffer. Subsequent writes to the same file descriptor overwrite | 189 | buffer. Subsequent writes to the same file descriptor overwrite |
190 | the previous setting. | 190 | the previous setting. |
191 | 191 | ||
192 | 192 | ||
193 | /fpcr | 193 | /fpcr |
194 | This file gives access to the Floating Point Status and Control Regis- | 194 | This file gives access to the Floating Point Status and Control Regis- |
195 | ter as a four byte long file. The operations on the fpcr file are: | 195 | ter as a four byte long file. The operations on the fpcr file are: |
196 | 196 | ||
197 | read(2) | 197 | read(2) |
198 | If a count smaller than four is requested, read returns -1 and | 198 | If a count smaller than four is requested, read returns -1 and |
199 | sets errno to EINVAL. Otherwise, a four byte value is placed in | 199 | sets errno to EINVAL. Otherwise, a four byte value is placed in |
200 | the data buffer, containing the current value of the fpcr regis- | 200 | the data buffer, containing the current value of the fpcr regis- |
201 | ter. | 201 | ter. |
202 | 202 | ||
203 | write(2) | 203 | write(2) |
204 | If a count smaller than four is requested, write returns -1 and | 204 | If a count smaller than four is requested, write returns -1 and |
205 | sets errno to EINVAL. Otherwise, a four byte value is copied | 205 | sets errno to EINVAL. Otherwise, a four byte value is copied |
206 | from the data buffer, updating the value of the fpcr register. | 206 | from the data buffer, updating the value of the fpcr register. |
207 | 207 | ||
208 | 208 | ||
209 | /signal1 | 209 | /signal1 |
210 | /signal2 | 210 | /signal2 |
211 | The two signal notification channels of an SPU. These are read-write | 211 | The two signal notification channels of an SPU. These are read-write |
212 | files that operate on a 32 bit word. Writing to one of these files | 212 | files that operate on a 32 bit word. Writing to one of these files |
213 | triggers an interrupt on the SPU. The value written to the signal | 213 | triggers an interrupt on the SPU. The value written to the signal |
214 | files can be read from the SPU through a channel read or from host user | 214 | files can be read from the SPU through a channel read or from host user |
215 | space through the file. After the value has been read by the SPU, it | 215 | space through the file. After the value has been read by the SPU, it |
216 | is reset to zero. The possible operations on an open signal1 or sig- | 216 | is reset to zero. The possible operations on an open signal1 or sig- |
217 | nal2 file are: | 217 | nal2 file are: |
218 | 218 | ||
219 | read(2) | 219 | read(2) |
220 | If a count smaller than four is requested, read returns -1 and | 220 | If a count smaller than four is requested, read returns -1 and |
221 | sets errno to EINVAL. Otherwise, a four byte value is placed in | 221 | sets errno to EINVAL. Otherwise, a four byte value is placed in |
222 | the data buffer, containing the current value of the specified | 222 | the data buffer, containing the current value of the specified |
223 | signal notification register. | 223 | signal notification register. |
224 | 224 | ||
225 | write(2) | 225 | write(2) |
226 | If a count smaller than four is requested, write returns -1 and | 226 | If a count smaller than four is requested, write returns -1 and |
227 | sets errno to EINVAL. Otherwise, a four byte value is copied | 227 | sets errno to EINVAL. Otherwise, a four byte value is copied |
228 | from the data buffer, updating the value of the specified signal | 228 | from the data buffer, updating the value of the specified signal |
229 | notification register. The signal notification register will | 229 | notification register. The signal notification register will |
230 | either be replaced with the input data or will be updated to the | 230 | either be replaced with the input data or will be updated to the |
231 | bitwise OR of the old value and the input data, depending on the | 231 | bitwise OR of the old value and the input data, depending on the |
232 | contents of the signal1_type, or signal2_type respectively, | 232 | contents of the signal1_type, or signal2_type respectively, |
233 | file. | 233 | file. |
234 | 234 | ||
235 | 235 | ||
236 | /signal1_type | 236 | /signal1_type |
237 | /signal2_type | 237 | /signal2_type |
238 | These two files change the behavior of the signal1 and signal2 | 238 | These two files change the behavior of the signal1 and signal2 |
239 | notification files. They contain a numerical ASCII string which is | 239 | notification files. They contain a numerical ASCII string which is |
240 | read as either "1" or "0". In mode 0 (overwrite), the hardware replaces | 240 | read as either "1" or "0". In mode 0 (overwrite), the hardware replaces |
241 | the contents of the signal channel with the data that is written to it. | 241 | the contents of the signal channel with the data that is written to it. |
242 | In mode 1 (logical OR), the hardware accumulates the bits that are | 242 | In mode 1 (logical OR), the hardware accumulates the bits that are |
243 | subsequently written to it. The possible operations on an open signal1_type | 243 | subsequently written to it. The possible operations on an open signal1_type |
244 | or signal2_type file are: | 244 | or signal2_type file are: |
245 | 245 | ||
246 | read(2) | 246 | read(2) |
247 | When the count supplied to the read call is shorter than the | 247 | When the count supplied to the read call is shorter than the |
248 | required length for the digit plus a newline character, subse- | 248 | required length for the digit plus a newline character, subse- |
249 | quent reads from the same file descriptor will result in com- | 249 | quent reads from the same file descriptor will result in com- |
250 | pleting the string. When a complete string has been read, all | 250 | pleting the string. When a complete string has been read, all |
251 | subsequent read operations will return zero bytes and a new file | 251 | subsequent read operations will return zero bytes and a new file |
252 | descriptor needs to be opened to read the value again. | 252 | descriptor needs to be opened to read the value again. |
253 | 253 | ||
254 | write(2) | 254 | write(2) |
255 | A write operation on the file results in setting the register to | 255 | A write operation on the file results in setting the register to |
256 | the value given in the string. The string is parsed from the | 256 | the value given in the string. The string is parsed from the |
257 | beginning to the first non-numeric character or the end of the | 257 | beginning to the first non-numeric character or the end of the |
258 | buffer. Subsequent writes to the same file descriptor overwrite | 258 | buffer. Subsequent writes to the same file descriptor overwrite |
259 | the previous setting. | 259 | the previous setting. |
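The two modes selected through signal1_type and signal2_type can be modelled in a few lines. This is a sketch of the behaviour described above, not code from spufs; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the two values of the *_type files. */
enum spu_signal_mode {
    SPU_SIGNAL_OVERWRITE = 0,   /* "0": new data replaces the old value */
    SPU_SIGNAL_OR        = 1    /* "1": new bits are accumulated */
};

/*
 * Model of the value a signal notification register holds after a
 * write, given its previous value, the written word and the mode.
 */
uint32_t spu_signal_after_write(uint32_t old, uint32_t data,
                                enum spu_signal_mode mode)
{
    if (mode == SPU_SIGNAL_OR)
        return old | data;
    return data;
}
```

In OR mode several writers can each set their own bit without losing the others; in overwrite mode only the most recent write is visible to the SPU.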
260 | 260 | ||
261 | 261 | ||
262 | EXAMPLES | 262 | EXAMPLES |
263 | /etc/fstab entry | 263 | /etc/fstab entry |
264 | none /spu spufs gid=spu 0 0 | 264 | none /spu spufs gid=spu 0 0 |
265 | 265 | ||
266 | 266 | ||
267 | AUTHORS | 267 | AUTHORS |
268 | Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>, | 268 | Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>, |
269 | Ulrich Weigand <Ulrich.Weigand@de.ibm.com> | 269 | Ulrich Weigand <Ulrich.Weigand@de.ibm.com> |
270 | 270 | ||
271 | SEE ALSO | 271 | SEE ALSO |
272 | capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7) | 272 | capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7) |
273 | 273 | ||
274 | 274 | ||
275 | 275 | ||
276 | Linux 2005-09-28 SPUFS(2) | 276 | Linux 2005-09-28 SPUFS(2) |
277 | 277 | ||
278 | ------------------------------------------------------------------------------ | 278 | ------------------------------------------------------------------------------ |
279 | 279 | ||
280 | SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2) | 280 | SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2) |
281 | 281 | ||
282 | 282 | ||
283 | 283 | ||
284 | NAME | 284 | NAME |
285 | spu_run - execute an spu context | 285 | spu_run - execute an spu context |
286 | 286 | ||
287 | 287 | ||
288 | SYNOPSIS | 288 | SYNOPSIS |
289 | #include <sys/spu.h> | 289 | #include <sys/spu.h> |
290 | 290 | ||
291 | int spu_run(int fd, unsigned int *npc, unsigned int *event); | 291 | int spu_run(int fd, unsigned int *npc, unsigned int *event); |
292 | 292 | ||
293 | DESCRIPTION | 293 | DESCRIPTION |
294 | The spu_run system call is used on PowerPC machines that implement the | 294 | The spu_run system call is used on PowerPC machines that implement the |
295 | Cell Broadband Engine Architecture in order to access Synergistic Pro- | 295 | Cell Broadband Engine Architecture in order to access Synergistic Pro- |
296 | cessor Units (SPUs). It uses the fd that was returned from spu_cre- | 296 | cessor Units (SPUs). It uses the fd that was returned from spu_cre- |
297 | ate(2) to address a specific SPU context. When the context gets sched- | 297 | ate(2) to address a specific SPU context. When the context gets sched- |
298 | uled to a physical SPU, it starts execution at the instruction pointer | 298 | uled to a physical SPU, it starts execution at the instruction pointer |
299 | passed in npc. | 299 | passed in npc. |
300 | 300 | ||
301 | Execution of SPU code happens synchronously, meaning that spu_run does | 301 | Execution of SPU code happens synchronously, meaning that spu_run does |
302 | not return while the SPU is still running. If there is a need to exe- | 302 | not return while the SPU is still running. If there is a need to exe- |
303 | cute SPU code in parallel with other code on either the main CPU or | 303 | cute SPU code in parallel with other code on either the main CPU or |
304 | other SPUs, you need to create a new thread of execution first, e.g. | 304 | other SPUs, you need to create a new thread of execution first, e.g. |
305 | using the pthread_create(3) call. | 305 | using the pthread_create(3) call. |
306 | 306 | ||
307 | When spu_run returns, the current value of the SPU instruction pointer | 307 | When spu_run returns, the current value of the SPU instruction pointer |
308 | is written back to npc, so you can call spu_run again without updating | 308 | is written back to npc, so you can call spu_run again without updating |
309 | the pointers. | 309 | the pointers. |
310 | 310 | ||
311 | event can be a NULL pointer or point to an extended status code that | 311 | event can be a NULL pointer or point to an extended status code that |
312 | gets filled when spu_run returns. It can be one of the following con- | 312 | gets filled when spu_run returns. It can be one of the following con- |
313 | stants: | 313 | stants: |
314 | 314 | ||
315 | SPE_EVENT_DMA_ALIGNMENT | 315 | SPE_EVENT_DMA_ALIGNMENT |
316 | A DMA alignment error | 316 | A DMA alignment error |
317 | 317 | ||
318 | SPE_EVENT_SPE_DATA_SEGMENT | 318 | SPE_EVENT_SPE_DATA_SEGMENT |
319 | A DMA segmentation error | 319 | A DMA segmentation error |
320 | 320 | ||
321 | SPE_EVENT_SPE_DATA_STORAGE | 321 | SPE_EVENT_SPE_DATA_STORAGE |
322 | A DMA storage error | 322 | A DMA storage error |
323 | 323 | ||
324 | If NULL is passed as the event argument, these errors will result in a | 324 | If NULL is passed as the event argument, these errors will result in a |
325 | signal delivered to the calling process. | 325 | signal delivered to the calling process. |
326 | 326 | ||
327 | RETURN VALUE | 327 | RETURN VALUE |
328 | spu_run returns the value of the spu_status register or -1 to indicate | 328 | spu_run returns the value of the spu_status register or -1 to indicate |
329 | an error, setting errno to one of the error codes listed below. The | 329 | an error, setting errno to one of the error codes listed below. The |
330 | spu_status register value contains a bit mask of status codes and | 330 | spu_status register value contains a bit mask of status codes and |
331 | optionally a 14 bit code returned from the stop-and-signal instruction | 331 | optionally a 14 bit code returned from the stop-and-signal instruction |
332 | on the SPU. The bit masks for the status codes are: | 332 | on the SPU. The bit masks for the status codes are: |
333 | 333 | ||
334 | 0x02 SPU was stopped by stop-and-signal. | 334 | 0x02 SPU was stopped by stop-and-signal. |
335 | 335 | ||
336 | 0x04 SPU was stopped by halt. | 336 | 0x04 SPU was stopped by halt. |
337 | 337 | ||
338 | 0x08 SPU is waiting for a channel. | 338 | 0x08 SPU is waiting for a channel. |
339 | 339 | ||
340 | 0x10 SPU is in single-step mode. | 340 | 0x10 SPU is in single-step mode. |
341 | 341 | ||
342 | 0x20 SPU has tried to execute an invalid instruction. | 342 | 0x20 SPU has tried to execute an invalid instruction. |
343 | 343 | ||
344 | 0x40 SPU has tried to access an invalid channel. | 344 | 0x40 SPU has tried to access an invalid channel. |
345 | 345 | ||
346 | 0x3fff0000 | 346 | 0x3fff0000 |
347 | The bits masked with this value contain the code returned from | 347 | The bits masked with this value contain the code returned from |
348 | stop-and-signal. | 348 | stop-and-signal. |
349 | 349 | ||
350 | There are always one or more of the lower eight bits set or an error | 350 | There are always one or more of the lower eight bits set or an error |
351 | code is returned from spu_run. | 351 | code is returned from spu_run. |
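The bit layout listed above can be decoded as follows. This is a minimal sketch: the macro and function names are hypothetical (the text gives only the numeric masks), and the caller is assumed to have already checked that spu_run did not return -1.

```c
#include <stdint.h>

/* Masks taken from the list above; the symbolic names are made up
 * for this example. */
#define SPU_ST_STOP_AND_SIGNAL 0x02u
#define SPU_ST_HALT            0x04u
#define SPU_ST_WAIT_CHANNEL    0x08u
#define SPU_ST_SINGLE_STEP     0x10u
#define SPU_ST_INVALID_INSTR   0x20u
#define SPU_ST_INVALID_CHANNEL 0x40u
#define SPU_ST_STOP_CODE_MASK  0x3fff0000u

/*
 * Return the 14-bit code passed to stop-and-signal, or -1 if the
 * status does not indicate a stop-and-signal.
 */
int spu_stop_code(uint32_t status)
{
    if (!(status & SPU_ST_STOP_AND_SIGNAL))
        return -1;
    return (int)((status & SPU_ST_STOP_CODE_MASK) >> 16);
}
```

For example, a status of 0x12340002 has the stop-and-signal bit set and carries the stop code 0x1234, while a status of 0x04 (halt) carries no stop code.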
352 | 352 | ||
353 | ERRORS | 353 | ERRORS |
354 | EAGAIN or EWOULDBLOCK | 354 | EAGAIN or EWOULDBLOCK |
355 | fd is in non-blocking mode and spu_run would block. | 355 | fd is in non-blocking mode and spu_run would block. |
356 | 356 | ||
357 | EBADF fd is not a valid file descriptor. | 357 | EBADF fd is not a valid file descriptor. |
358 | 358 | ||
359 | EFAULT npc is not a valid pointer or event is neither NULL nor a valid | 359 | EFAULT npc is not a valid pointer or event is neither NULL nor a valid |
360 | pointer. | 360 | pointer. |
361 | 361 | ||
362 | EINTR A signal occurred while spu_run was in progress. The npc value | 362 | EINTR A signal occurred while spu_run was in progress. The npc value |
363 | has been updated to the new program counter value if necessary. | 363 | has been updated to the new program counter value if necessary. |
364 | 364 | ||
365 | EINVAL fd is not a file descriptor returned from spu_create(2). | 365 | EINVAL fd is not a file descriptor returned from spu_create(2). |
366 | 366 | ||
367 | ENOMEM Insufficient memory was available to handle a page fault result- | 367 | ENOMEM Insufficient memory was available to handle a page fault result- |
368 | ing from an MFC direct memory access. | 368 | ing from an MFC direct memory access. |
369 | 369 | ||
370 | ENOSYS the functionality is not provided by the current system, because | 370 | ENOSYS the functionality is not provided by the current system, because |
371 | either the hardware does not provide SPUs or the spufs module is | 371 | either the hardware does not provide SPUs or the spufs module is |
372 | not loaded. | 372 | not loaded. |
373 | 373 | ||
374 | 374 | ||
375 | NOTES | 375 | NOTES |
376 | spu_run is meant to be used from libraries that implement a more | 376 | spu_run is meant to be used from libraries that implement a more |
377 | abstract interface to SPUs, not to be used from regular applications. | 377 | abstract interface to SPUs, not to be used from regular applications. |
378 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | 378 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- |
379 | ommended libraries. | 379 | ommended libraries. |
380 | 380 | ||
381 | 381 | ||
382 | CONFORMING TO | 382 | CONFORMING TO |
383 | This call is Linux specific and only implemented by the ppc64 architec- | 383 | This call is Linux specific and only implemented by the ppc64 architec- |
384 | ture. Programs using this system call are not portable. | 384 | ture. Programs using this system call are not portable. |
385 | 385 | ||
386 | 386 | ||
387 | BUGS | 387 | BUGS |
388 | The code does not yet fully implement all features outlined here. | 388 | The code does not yet fully implement all features outlined here. |
389 | 389 | ||
390 | 390 | ||
391 | AUTHOR | 391 | AUTHOR |
392 | Arnd Bergmann <arndb@de.ibm.com> | 392 | Arnd Bergmann <arndb@de.ibm.com> |
393 | 393 | ||
394 | SEE ALSO | 394 | SEE ALSO |
395 | capabilities(7), close(2), spu_create(2), spufs(7) | 395 | capabilities(7), close(2), spu_create(2), spufs(7) |
396 | 396 | ||
397 | 397 | ||
398 | 398 | ||
399 | Linux 2005-09-28 SPU_RUN(2) | 399 | Linux 2005-09-28 SPU_RUN(2) |
400 | 400 | ||
401 | ------------------------------------------------------------------------------ | 401 | ------------------------------------------------------------------------------ |
402 | 402 | ||
403 | SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2) | 403 | SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2) |
404 | 404 | ||
405 | 405 | ||
406 | 406 | ||
407 | NAME | 407 | NAME |
408 | spu_create - create a new spu context | 408 | spu_create - create a new spu context |
409 | 409 | ||
410 | 410 | ||
411 | SYNOPSIS | 411 | SYNOPSIS |
412 | #include <sys/types.h> | 412 | #include <sys/types.h> |
413 | #include <sys/spu.h> | 413 | #include <sys/spu.h> |
414 | 414 | ||
415 | int spu_create(const char *pathname, int flags, mode_t mode); | 415 | int spu_create(const char *pathname, int flags, mode_t mode); |
416 | 416 | ||
417 | DESCRIPTION | 417 | DESCRIPTION |
418 | The spu_create system call is used on PowerPC machines that implement | 418 | The spu_create system call is used on PowerPC machines that implement |
419 | the Cell Broadband Engine Architecture in order to access Synergistic | 419 | the Cell Broadband Engine Architecture in order to access Synergistic |
420 | Processor Units (SPUs). It creates a new logical context for an SPU in | 420 | Processor Units (SPUs). It creates a new logical context for an SPU in |
421 | pathname and returns a handle associated with it. pathname must | 421 | pathname and returns a handle associated with it. pathname must |
422 | point to a non-existing directory in the mount point of the SPU file | 422 | point to a non-existing directory in the mount point of the SPU file |
423 | system (spufs). When spu_create is successful, a directory gets cre- | 423 | system (spufs). When spu_create is successful, a directory gets cre- |
424 | ated on pathname and it is populated with files. | 424 | ated on pathname and it is populated with files. |
425 | 425 | ||
426 | The returned file handle can only be passed to spu_run(2) or closed; | 426 | The returned file handle can only be passed to spu_run(2) or closed; |
427 | other operations are not defined on it. When it is closed, all associ- | 427 | other operations are not defined on it. When it is closed, all associ- |
428 | ated directory entries in spufs are removed. When the last file handle | 428 | ated directory entries in spufs are removed. When the last file handle |
429 | pointing either inside of the context directory or to this file | 429 | pointing either inside of the context directory or to this file |
430 | descriptor is closed, the logical SPU context is destroyed. | 430 | descriptor is closed, the logical SPU context is destroyed. |
431 | 431 | ||
432 | The parameter flags can be zero or any bitwise or'd combination of the | 432 | The parameter flags can be zero or any bitwise or'd combination of the |
433 | following constants: | 433 | following constants: |
434 | 434 | ||
435 | SPU_RAWIO | 435 | SPU_RAWIO |
436 | Allow mapping of some of the hardware registers of the SPU into | 436 | Allow mapping of some of the hardware registers of the SPU into |
437 | user space. This flag requires the CAP_SYS_RAWIO capability, see | 437 | user space. This flag requires the CAP_SYS_RAWIO capability, see |
438 | capabilities(7). | 438 | capabilities(7). |
439 | 439 | ||
440 | The mode parameter specifies the permissions used for creating the new | 440 | The mode parameter specifies the permissions used for creating the new |
441 | directory in spufs. mode is modified with the user's umask(2) value | 441 | directory in spufs. mode is modified with the user's umask(2) value |
442 | and then used for both the directory and the files contained in it. The | 442 | and then used for both the directory and the files contained in it. The |
443 | file permissions mask out some more bits of mode because they typically | 443 | file permissions mask out some more bits of mode because they typically |
444 | support only read or write access. See stat(2) for a full list of the | 444 | support only read or write access. See stat(2) for a full list of the |
445 | possible mode values. | 445 | possible mode values. |
446 | 446 | ||
447 | 447 | ||
448 | RETURN VALUE | 448 | RETURN VALUE |
449 | spu_create returns a new file descriptor. It may return -1 to indicate | 449 | spu_create returns a new file descriptor. It may return -1 to indicate |
450 | an error condition and set errno to one of the error codes listed | 450 | an error condition and set errno to one of the error codes listed |
451 | below. | 451 | below. |
452 | 452 | ||
453 | 453 | ||
454 | ERRORS | 454 | ERRORS |
455 | EACCES | 455 | EACCES |
456 | The current user does not have write access on the spufs mount | 456 | The current user does not have write access on the spufs mount |
457 | point. | 457 | point. |
458 | 458 | ||
459 | EEXIST An SPU context already exists at the given path name. | 459 | EEXIST An SPU context already exists at the given path name. |
460 | 460 | ||
461 | EFAULT pathname is not a valid string pointer in the current address | 461 | EFAULT pathname is not a valid string pointer in the current address |
462 | space. | 462 | space. |
463 | 463 | ||
464 | EINVAL pathname is not a directory in the spufs mount point. | 464 | EINVAL pathname is not a directory in the spufs mount point. |
465 | 465 | ||
466 | ELOOP Too many symlinks were found while resolving pathname. | 466 | ELOOP Too many symlinks were found while resolving pathname. |
467 | 467 | ||
468 | EMFILE The process has reached its maximum open file limit. | 468 | EMFILE The process has reached its maximum open file limit. |
469 | 469 | ||
470 | ENAMETOOLONG | 470 | ENAMETOOLONG |
471 | pathname was too long. | 471 | pathname was too long. |
472 | 472 | ||
473 | ENFILE The system has reached the global open file limit. | 473 | ENFILE The system has reached the global open file limit. |
474 | 474 | ||
475 | ENOENT Part of pathname could not be resolved. | 475 | ENOENT Part of pathname could not be resolved. |
476 | 476 | ||
477 | ENOMEM The kernel could not allocate all resources required. | 477 | ENOMEM The kernel could not allocate all resources required. |
478 | 478 | ||
479 | ENOSPC There are not enough SPU resources available to create a new | 479 | ENOSPC There are not enough SPU resources available to create a new |
480 | context or the user specific limit for the number of SPU con- | 480 | context or the user specific limit for the number of SPU con- |
481 | texts has been reached. | 481 | texts has been reached. |
482 | 482 | ||
483 | ENOSYS The functionality is not provided by the current system, because | 483 | ENOSYS The functionality is not provided by the current system, because |
484 | either the hardware does not provide SPUs or the spufs module is | 484 | either the hardware does not provide SPUs or the spufs module is |
485 | not loaded. | 485 | not loaded. |
486 | 486 | ||
487 | ENOTDIR | 487 | ENOTDIR |
488 | A part of pathname is not a directory. | 488 | A part of pathname is not a directory. |
489 | 489 | ||
490 | 490 | ||
491 | 491 | ||
492 | NOTES | 492 | NOTES |
493 | spu_create is meant to be used from libraries that implement a more | 493 | spu_create is meant to be used from libraries that implement a more |
494 | abstract interface to SPUs, not to be used from regular applications. | 494 | abstract interface to SPUs, not to be used from regular applications. |
495 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | 495 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- |
496 | ommended libraries. | 496 | ommended libraries. |
497 | 497 | ||
498 | 498 | ||
499 | FILES | 499 | FILES |
500 | pathname must point to a location beneath the mount point of spufs. By | 500 | pathname must point to a location beneath the mount point of spufs. By |
501 | convention, it gets mounted in /spu. | 501 | convention, it gets mounted in /spu. |
502 | 502 | ||
503 | 503 | ||
504 | CONFORMING TO | 504 | CONFORMING TO |
505 | This call is Linux specific and only implemented by the ppc64 architec- | 505 | This call is Linux specific and only implemented by the ppc64 architec- |
506 | ture. Programs using this system call are not portable. | 506 | ture. Programs using this system call are not portable. |
507 | 507 | ||
508 | 508 | ||
509 | BUGS | 509 | BUGS |
510 | The code does not yet fully implement all features outlined here. | 510 | The code does not yet fully implement all features outlined here. |
511 | 511 | ||
512 | 512 | ||
513 | AUTHOR | 513 | AUTHOR |
514 | Arnd Bergmann <arndb@de.ibm.com> | 514 | Arnd Bergmann <arndb@de.ibm.com> |
515 | 515 | ||
516 | SEE ALSO | 516 | SEE ALSO |
517 | capabilities(7), close(2), spu_run(2), spufs(7) | 517 | capabilities(7), close(2), spu_run(2), spufs(7) |
518 | 518 | ||
519 | 519 | ||
520 | 520 | ||
521 | Linux 2005-09-28 SPU_CREATE(2) | 521 | Linux 2005-09-28 SPU_CREATE(2) |
522 | 522 |
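
The spu_create(2) text above gives no calling example. What follows is a minimal sketch, not code from the spufs documentation: `create_spu_context` is a hypothetical wrapper name, and the sketch assumes the three-argument form (pathname, flags, mode) described in the man page text and that the platform headers define `__NR_spu_create` where the call exists. On architectures without the system call it reports ENOSYS, mirroring the ERRORS list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: ask the kernel to create an SPU context
 * directory at `pathname`, which must lie beneath the spufs mount
 * point (/spu by convention).
 * Returns a file descriptor on success, or -1 with errno set.
 */
static int create_spu_context(const char *pathname, int flags, mode_t mode)
{
    assert(pathname != NULL);
#ifdef __NR_spu_create
    /* Assumption: the three-argument form from the man page text. */
    return (int)syscall(__NR_spu_create, pathname, flags, mode);
#else
    /* This architecture has no spu_create system call number. */
    (void)flags;
    (void)mode;
    errno = ENOSYS;
    return -1;
#endif
}
```

The returned descriptor would then only be passed to spu_run(2) or close(2), as the description states.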
Documentation/filesystems/tmpfs.txt
1 | Tmpfs is a file system which keeps all files in virtual memory. | 1 | Tmpfs is a file system which keeps all files in virtual memory. |
2 | 2 | ||
3 | 3 | ||
4 | Everything in tmpfs is temporary in the sense that no files will be | 4 | Everything in tmpfs is temporary in the sense that no files will be |
5 | created on your hard drive. If you unmount a tmpfs instance, | 5 | created on your hard drive. If you unmount a tmpfs instance, |
6 | everything stored therein is lost. | 6 | everything stored therein is lost. |
7 | 7 | ||
8 | tmpfs puts everything into the kernel internal caches and grows and | 8 | tmpfs puts everything into the kernel internal caches and grows and |
9 | shrinks to accommodate the files it contains and is able to swap | 9 | shrinks to accommodate the files it contains and is able to swap |
10 | unneeded pages out to swap space. It has maximum size limits which can | 10 | unneeded pages out to swap space. It has maximum size limits which can |
11 | be adjusted on the fly via 'mount -o remount ...' | 11 | be adjusted on the fly via 'mount -o remount ...' |
12 | 12 | ||
13 | If you compare it to ramfs (which was the template to create tmpfs) | 13 | If you compare it to ramfs (which was the template to create tmpfs) |
14 | you gain swapping and limit checking. Another similar thing is the RAM | 14 | you gain swapping and limit checking. Another similar thing is the RAM |
15 | disk (/dev/ram*), which simulates a fixed size hard disk in physical | 15 | disk (/dev/ram*), which simulates a fixed size hard disk in physical |
16 | RAM, where you have to create an ordinary filesystem on top. Ramdisks | 16 | RAM, where you have to create an ordinary filesystem on top. Ramdisks |
17 | cannot swap and you do not have the possibility to resize them. | 17 | cannot swap and you do not have the possibility to resize them. |
18 | 18 | ||
19 | Since tmpfs lives completely in the page cache and on swap, all tmpfs | 19 | Since tmpfs lives completely in the page cache and on swap, all tmpfs |
20 | pages currently in memory will show up as cached. It will not show up | 20 | pages currently in memory will show up as cached. It will not show up |
21 | as shared or something like that. Further on you can check the actual | 21 | as shared or something like that. Further on you can check the actual |
22 | RAM+swap use of a tmpfs instance with df(1) and du(1). | 22 | RAM+swap use of a tmpfs instance with df(1) and du(1). |
23 | 23 | ||
24 | 24 | ||
25 | tmpfs has the following uses: | 25 | tmpfs has the following uses: |
26 | 26 | ||
27 | 1) There is always a kernel internal mount which you will not see at | 27 | 1) There is always a kernel internal mount which you will not see at |
28 | all. This is used for shared anonymous mappings and SYSV shared | 28 | all. This is used for shared anonymous mappings and SYSV shared |
29 | memory. | 29 | memory. |
30 | 30 | ||
31 | This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not | 31 | This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not |
32 | set, the user visible part of tmpfs is not built. But the internal | 32 | set, the user visible part of tmpfs is not built. But the internal |
33 | mechanisms are always present. | 33 | mechanisms are always present. |
34 | 34 | ||
35 | 2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for | 35 | 2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for |
36 | POSIX shared memory (shm_open, shm_unlink). Adding the following | 36 | POSIX shared memory (shm_open, shm_unlink). Adding the following |
37 | line to /etc/fstab should take care of this: | 37 | line to /etc/fstab should take care of this: |
38 | 38 | ||
39 | tmpfs /dev/shm tmpfs defaults 0 0 | 39 | tmpfs /dev/shm tmpfs defaults 0 0 |
40 | 40 | ||
41 | Remember to create the directory that you intend to mount tmpfs on | 41 | Remember to create the directory that you intend to mount tmpfs on |
42 | if necessary. | 42 | if necessary. |
43 | 43 | ||
44 | This mount is _not_ needed for SYSV shared memory. The internal | 44 | This mount is _not_ needed for SYSV shared memory. The internal |
45 | mount is used for that. (In the 2.3 kernel versions it was | 45 | mount is used for that. (In the 2.3 kernel versions it was |
46 | necessary to mount the predecessor of tmpfs (shm fs) to use SYSV | 46 | necessary to mount the predecessor of tmpfs (shm fs) to use SYSV |
47 | shared memory) | 47 | shared memory) |
48 | 48 | ||
49 | 3) Some people (including me) find it very convenient to mount it | 49 | 3) Some people (including me) find it very convenient to mount it |
50 | e.g. on /tmp and /var/tmp and have a big swap partition. And now | 50 | e.g. on /tmp and /var/tmp and have a big swap partition. And now |
51 | loop mounts of tmpfs files do work, so mkinitrd shipped by most | 51 | loop mounts of tmpfs files do work, so mkinitrd shipped by most |
52 | distributions should succeed with a tmpfs /tmp. | 52 | distributions should succeed with a tmpfs /tmp. |
53 | 53 | ||
54 | 4) And probably a lot more I do not know about :-) | 54 | 4) And probably a lot more I do not know about :-) |
55 | 55 | ||
56 | 56 | ||
57 | tmpfs has three mount options for sizing: | 57 | tmpfs has three mount options for sizing: |
58 | 58 | ||
59 | size: The limit of allocated bytes for this tmpfs instance. The | 59 | size: The limit of allocated bytes for this tmpfs instance. The |
60 | default is half of your physical RAM without swap. If you | 60 | default is half of your physical RAM without swap. If you |
61 | oversize your tmpfs instances the machine will deadlock | 61 | oversize your tmpfs instances the machine will deadlock |
62 | since the OOM handler will not be able to free that memory. | 62 | since the OOM handler will not be able to free that memory. |
63 | nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE. | 63 | nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE. |
64 | nr_inodes: The maximum number of inodes for this instance. The default | 64 | nr_inodes: The maximum number of inodes for this instance. The default |
65 | is half of the number of your physical RAM pages, or (on a | 65 | is half of the number of your physical RAM pages, or (on a |
66 | a machine with highmem) the number of lowmem RAM pages, | 66 | machine with highmem) the number of lowmem RAM pages, |
67 | whichever is the lower. | 67 | whichever is the lower. |
68 | 68 | ||
69 | These parameters accept a suffix k, m or g for kilo, mega and giga and | 69 | These parameters accept a suffix k, m or g for kilo, mega and giga and |
70 | can be changed on remount. The size parameter also accepts a suffix % | 70 | can be changed on remount. The size parameter also accepts a suffix % |
71 | to limit this tmpfs instance to that percentage of your physical RAM: | 71 | to limit this tmpfs instance to that percentage of your physical RAM: |
72 | the default, when neither size nor nr_blocks is specified, is size=50% | 72 | the default, when neither size nor nr_blocks is specified, is size=50% |
73 | 73 | ||
74 | If nr_blocks=0 (or size=0), blocks will not be limited in that instance; | 74 | If nr_blocks=0 (or size=0), blocks will not be limited in that instance; |
75 | if nr_inodes=0, inodes will not be limited. It is generally unwise to | 75 | if nr_inodes=0, inodes will not be limited. It is generally unwise to |
76 | mount with such options, since it allows any user with write access to | 76 | mount with such options, since it allows any user with write access to |
77 | use up all the memory on the machine; but enhances the scalability of | 77 | use up all the memory on the machine; but enhances the scalability of |
78 | that instance in a system with many cpus making intensive use of it. | 78 | that instance in a system with many cpus making intensive use of it. |
79 | 79 | ||
80 | 80 | ||
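
The suffix rule in the sizing paragraph can be illustrated with a short sketch. This is not the kernel's parser: `parse_tmpfs_count` is a hypothetical helper, it assumes binary multiples (consistent with nr_inodes=10k meaning 10240 inodes later in tmpfs.txt), it accepts both letter cases as an assumption, and it does not handle the % suffix of the size option.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal number with an optional
 * k, m or g suffix (1024, 1024^2, 1024^3).
 * Returns 0 and stores the value in *out, or -1 on malformed or
 * overflowing input.
 */
static int parse_tmpfs_count(const char *text, unsigned long long *out)
{
    char *end;
    unsigned long long value;
    unsigned shift = 0;

    assert(text != NULL && out != NULL);
    if (!isdigit((unsigned char)text[0]))
        return -1;                      /* reject empty, signed or blank input */

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;

    switch (*end) {
    case 'k': case 'K': shift = 10; break;
    case 'm': case 'M': shift = 20; break;
    case 'g': case 'G': shift = 30; break;
    default: break;
    }
    if (shift != 0)
        end++;                          /* consume the suffix letter */
    if (*end != '\0')
        return -1;                      /* trailing garbage */
    if (value > (ULLONG_MAX >> shift))
        return -1;                      /* would overflow when scaled */

    *out = value << shift;
    return 0;
}
```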
81 | tmpfs has a mount option to set the NUMA memory allocation policy for | 81 | tmpfs has a mount option to set the NUMA memory allocation policy for |
82 | all files in that instance (if CONFIG_NUMA is enabled) - which can be | 82 | all files in that instance (if CONFIG_NUMA is enabled) - which can be |
83 | adjusted on the fly via 'mount -o remount ...' | 83 | adjusted on the fly via 'mount -o remount ...' |
84 | 84 | ||
85 | mpol=default prefers to allocate memory from the local node | 85 | mpol=default prefers to allocate memory from the local node |
86 | mpol=prefer:Node prefers to allocate memory from the given Node | 86 | mpol=prefer:Node prefers to allocate memory from the given Node |
87 | mpol=bind:NodeList allocates memory only from nodes in NodeList | 87 | mpol=bind:NodeList allocates memory only from nodes in NodeList |
88 | mpol=interleave prefers to allocate from each node in turn | 88 | mpol=interleave prefers to allocate from each node in turn |
89 | mpol=interleave:NodeList allocates from each node of NodeList in turn | 89 | mpol=interleave:NodeList allocates from each node of NodeList in turn |
90 | 90 | ||
91 | NodeList format is a comma-separated list of decimal numbers and ranges, | 91 | NodeList format is a comma-separated list of decimal numbers and ranges, |
92 | a range being two hyphen-separated decimal numbers, the smallest and | 92 | a range being two hyphen-separated decimal numbers, the smallest and |
93 | largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 | 93 | largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 |
94 | 94 | ||
95 | Note that trying to mount a tmpfs with an mpol option will fail if the | 95 | Note that trying to mount a tmpfs with an mpol option will fail if the |
96 | running kernel does not support NUMA; and will fail if its nodelist | 96 | running kernel does not support NUMA; and will fail if its nodelist |
97 | specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs | 97 | specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs |
98 | being mounted, but from time to time runs a kernel built without NUMA | 98 | being mounted, but from time to time runs a kernel built without NUMA |
99 | capability (perhaps a safe recovery kernel), or configured to support | 99 | capability (perhaps a safe recovery kernel), or configured to support |
100 | fewer nodes, then it is advisable to omit the mpol option from automatic | 100 | fewer nodes, then it is advisable to omit the mpol option from automatic |
101 | mount options. It can be added later, when the tmpfs is already mounted | 101 | mount options. It can be added later, when the tmpfs is already mounted |
102 | on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. | 102 | on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. |
103 | 103 | ||
104 | 104 | ||
105 | To specify the initial root directory you can use the following mount | 105 | To specify the initial root directory you can use the following mount |
106 | options: | 106 | options: |
107 | 107 | ||
108 | mode: The permissions as an octal number | 108 | mode: The permissions as an octal number |
109 | uid: The user id | 109 | uid: The user id |
110 | gid: The group id | 110 | gid: The group id |
111 | 111 | ||
112 | These options do not have any effect on remount. You can change these | 112 | These options do not have any effect on remount. You can change these |
113 | parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem. | 113 | parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem. |
114 | 114 | ||
115 | 115 | ||
116 | So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' | 116 | So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' |
117 | will give you a tmpfs instance on /mytmpfs which can allocate 10GB | 117 | will give you a tmpfs instance on /mytmpfs which can allocate 10GB |
118 | RAM/SWAP in 10240 inodes and it is only accessible by root. | 118 | RAM/SWAP in 10240 inodes and it is only accessible by root. |
119 | 119 | ||
120 | 120 | ||
121 | Author: | 121 | Author: |
122 | Christoph Rohland <cr@sap.com>, 1.12.01 | 122 | Christoph Rohland <cr@sap.com>, 1.12.01 |
123 | Updated: | 123 | Updated: |
124 | Hugh Dickins <hugh@veritas.com>, 19 February 2006 | 124 | Hugh Dickins <hugh@veritas.com>, 19 February 2006 |
125 | 125 |
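
The NodeList format described in tmpfs.txt above (comma-separated decimal numbers and hyphen-separated ranges) lends itself to a small worked example. The sketch below is an illustration only, not the kernel's nodelist parser: `parse_nodelist` and `SKETCH_MAX_NODES` are hypothetical names, and the 64-node limit is an assumption standing in for MAX_NUMNODES so that the result fits in one 64-bit mask.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64  /* hypothetical stand-in for MAX_NUMNODES */

/*
 * Hypothetical helper: convert a NodeList such as "0-3,5,7,9-15"
 * into a bitmask with one bit per node.
 * Returns 0 on success, -1 on malformed input or a node number
 * >= SKETCH_MAX_NODES.
 */
static int parse_nodelist(const char *list, unsigned long long *mask)
{
    unsigned long long result = 0;
    const char *p = list;

    assert(list != NULL && mask != NULL);

    for (;;) {
        char *end;
        unsigned long lo, hi, node;

        if (!isdigit((unsigned char)*p))
            return -1;                  /* every item starts with a number */
        lo = strtoul(p, &end, 10);
        hi = lo;
        p = end;

        if (*p == '-') {                /* a range: smallest-largest */
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            hi = strtoul(p, &end, 10);
            p = end;
        }
        if (lo > hi || hi >= SKETCH_MAX_NODES)
            return -1;

        for (node = lo; node <= hi; node++)
            result |= 1ULL << node;

        if (*p == '\0')
            break;                      /* end of list */
        if (*p != ',')
            return -1;
        p++;                            /* skip the comma */
    }

    *mask = result;
    return 0;
}
```

For the documented example mpol=bind:0-3,5,7,9-15, this sketch sets bits 0 to 3, 5, 7 and 9 to 15.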
Documentation/filesystems/vfat.txt
1 | USING VFAT | 1 | USING VFAT |
2 | ---------------------------------------------------------------------- | 2 | ---------------------------------------------------------------------- |
3 | To use the vfat filesystem, use the filesystem type 'vfat'. i.e. | 3 | To use the vfat filesystem, use the filesystem type 'vfat'. i.e. |
4 | mount -t vfat /dev/fd0 /mnt | 4 | mount -t vfat /dev/fd0 /mnt |
5 | 5 | ||
6 | No special partition formatter is required. mkdosfs will work fine | 6 | No special partition formatter is required. mkdosfs will work fine |
7 | if you want to format from within Linux. | 7 | if you want to format from within Linux. |
8 | 8 | ||
9 | VFAT MOUNT OPTIONS | 9 | VFAT MOUNT OPTIONS |
10 | ---------------------------------------------------------------------- | 10 | ---------------------------------------------------------------------- |
11 | umask=### -- The permission mask (for files and directories, see umask(1)). | 11 | umask=### -- The permission mask (for files and directories, see umask(1)). |
12 | The default is the umask of current process. | 12 | The default is the umask of current process. |
13 | 13 | ||
14 | dmask=### -- The permission mask for the directory. | 14 | dmask=### -- The permission mask for the directory. |
15 | The default is the umask of current process. | 15 | The default is the umask of current process. |
16 | 16 | ||
17 | fmask=### -- The permission mask for files. | 17 | fmask=### -- The permission mask for files. |
18 | The default is the umask of current process. | 18 | The default is the umask of current process. |
19 | 19 | ||
20 | codepage=### -- Sets the codepage number for converting to shortname | 20 | codepage=### -- Sets the codepage number for converting to shortname |
21 | characters on FAT filesystem. | 21 | characters on FAT filesystem. |
22 | By default, FAT_DEFAULT_CODEPAGE setting is used. | 22 | By default, FAT_DEFAULT_CODEPAGE setting is used. |
23 | 23 | ||
24 | iocharset=name -- Character set to use for converting between the | 24 | iocharset=name -- Character set to use for converting between the |
25 | encoding used for user visible filenames and 16 bit | 25 | encoding used for user visible filenames and 16 bit |
26 | Unicode characters. Long filenames are stored on disk | 26 | Unicode characters. Long filenames are stored on disk |
27 | in Unicode format, but Unix for the most part doesn't | 27 | in Unicode format, but Unix for the most part doesn't |
28 | know how to deal with Unicode. | 28 | know how to deal with Unicode. |
29 | By default, FAT_DEFAULT_IOCHARSET setting is used. | 29 | By default, FAT_DEFAULT_IOCHARSET setting is used. |
30 | 30 | ||
31 | There is also an option of doing UTF-8 translations | 31 | There is also an option of doing UTF-8 translations |
32 | with the utf8 option. | 32 | with the utf8 option. |
33 | 33 | ||
34 | NOTE: "iocharset=utf8" is not recommended. If unsure, | 34 | NOTE: "iocharset=utf8" is not recommended. If unsure, |
35 | you should consider the following option instead. | 35 | you should consider the following option instead. |
36 | 36 | ||
37 | utf8=<bool> -- UTF-8 is the filesystem safe version of Unicode that | 37 | utf8=<bool> -- UTF-8 is the filesystem safe version of Unicode that |
38 | is used by the console. It can be be enabled for the | 38 | is used by the console. It can be enabled for the |
39 | filesystem with this option. If 'uni_xlate' gets set, | 39 | filesystem with this option. If 'uni_xlate' gets set, |
40 | UTF-8 gets disabled. | 40 | UTF-8 gets disabled. |
41 | 41 | ||
42 | uni_xlate=<bool> -- Translate unhandled Unicode characters to special | 42 | uni_xlate=<bool> -- Translate unhandled Unicode characters to special |
43 | escaped sequences. This would let you backup and | 43 | escaped sequences. This would let you backup and |
44 | restore filenames that are created with any Unicode | 44 | restore filenames that are created with any Unicode |
45 | characters. Until Linux supports Unicode for real, | 45 | characters. Until Linux supports Unicode for real, |
46 | this gives you an alternative. Without this option, | 46 | this gives you an alternative. Without this option, |
47 | a '?' is used when no translation is possible. The | 47 | a '?' is used when no translation is possible. The |
48 | escape character is ':' because it is otherwise | 48 | escape character is ':' because it is otherwise |
49 | illegal on the vfat filesystem. The escape sequence | 49 | illegal on the vfat filesystem. The escape sequence |
50 | that gets used is ':' and the four digits of hexadecimal | 50 | that gets used is ':' and the four digits of hexadecimal |
51 | unicode. | 51 | unicode. |
52 | 52 | ||
53 | nonumtail=<bool> -- When creating 8.3 aliases, normally the alias will | 53 | nonumtail=<bool> -- When creating 8.3 aliases, normally the alias will |
54 | end in '~1' or tilde followed by some number. If this | 54 | end in '~1' or tilde followed by some number. If this |
55 | option is set, then if the filename is | 55 | option is set, then if the filename is |
56 | "longfilename.txt" and "longfile.txt" does not | 56 | "longfilename.txt" and "longfile.txt" does not |
57 | currently exist in the directory, 'longfile.txt' will | 57 | currently exist in the directory, 'longfile.txt' will |
58 | be the short alias instead of 'longfi~1.txt'. | 58 | be the short alias instead of 'longfi~1.txt'. |
59 | 59 | ||
60 | quiet -- Stops printing certain warning messages. | 60 | quiet -- Stops printing certain warning messages. |
61 | 61 | ||
62 | check=s|r|n -- Case sensitivity checking setting. | 62 | check=s|r|n -- Case sensitivity checking setting. |
63 | s: strict, case sensitive | 63 | s: strict, case sensitive |
64 | r: relaxed, case insensitive | 64 | r: relaxed, case insensitive |
65 | n: normal, default setting, currently case insensitive | 65 | n: normal, default setting, currently case insensitive |
66 | 66 | ||
67 | shortname=lower|win95|winnt|mixed | 67 | shortname=lower|win95|winnt|mixed |
68 | -- Shortname display/create setting. | 68 | -- Shortname display/create setting. |
69 | lower: convert to lowercase for display, | 69 | lower: convert to lowercase for display, |
70 | emulate the Windows 95 rule for create. | 70 | emulate the Windows 95 rule for create. |
71 | win95: emulate the Windows 95 rule for display/create. | 71 | win95: emulate the Windows 95 rule for display/create. |
72 | winnt: emulate the Windows NT rule for display/create. | 72 | winnt: emulate the Windows NT rule for display/create. |
73 | mixed: emulate the Windows NT rule for display, | 73 | mixed: emulate the Windows NT rule for display, |
74 | emulate the Windows 95 rule for create. | 74 | emulate the Windows 95 rule for create. |
75 | Default setting is `lower'. | 75 | Default setting is `lower'. |
76 | 76 | ||
77 | <bool>: 0,1,yes,no,true,false | 77 | <bool>: 0,1,yes,no,true,false |
78 | 78 | ||
79 | TODO | 79 | TODO |
80 | ---------------------------------------------------------------------- | 80 | ---------------------------------------------------------------------- |
81 | * Need to get rid of the raw scanning stuff. Instead, always use | 81 | * Need to get rid of the raw scanning stuff. Instead, always use |
82 | a get next directory entry approach. The only thing left that uses | 82 | a get next directory entry approach. The only thing left that uses |
83 | raw scanning is the directory renaming code. | 83 | raw scanning is the directory renaming code. |
84 | 84 | ||
85 | 85 | ||
86 | POSSIBLE PROBLEMS | 86 | POSSIBLE PROBLEMS |
87 | ---------------------------------------------------------------------- | 87 | ---------------------------------------------------------------------- |
88 | * vfat_valid_longname does not properly check reserved names. | 88 | * vfat_valid_longname does not properly check reserved names. |
89 | * When a volume name is the same as a directory name in the root | 89 | * When a volume name is the same as a directory name in the root |
90 | directory of the filesystem, the directory name sometimes shows | 90 | directory of the filesystem, the directory name sometimes shows |
91 | up as an empty file. | 91 | up as an empty file. |
92 | * autoconv option does not work correctly. | 92 | * autoconv option does not work correctly. |
93 | 93 | ||
94 | BUG REPORTS | 94 | BUG REPORTS |
95 | ---------------------------------------------------------------------- | 95 | ---------------------------------------------------------------------- |
96 | If you have trouble with the VFAT filesystem, mail bug reports to | 96 | If you have trouble with the VFAT filesystem, mail bug reports to |
97 | chaffee@bmrc.cs.berkeley.edu. Please specify the filename | 97 | chaffee@bmrc.cs.berkeley.edu. Please specify the filename |
98 | and the operation that gave you trouble. | 98 | and the operation that gave you trouble. |
99 | 99 | ||
100 | TEST SUITE | 100 | TEST SUITE |
101 | ---------------------------------------------------------------------- | 101 | ---------------------------------------------------------------------- |
102 | If you plan to make any modifications to the vfat filesystem, please | 102 | If you plan to make any modifications to the vfat filesystem, please |
103 | get the test suite that comes with the vfat distribution at | 103 | get the test suite that comes with the vfat distribution at |
104 | 104 | ||
105 | http://bmrc.berkeley.edu/people/chaffee/vfat.html | 105 | http://bmrc.berkeley.edu/people/chaffee/vfat.html |
106 | 106 | ||
107 | This tests quite a few parts of the vfat filesystem and additional | 107 | This tests quite a few parts of the vfat filesystem and additional |
108 | tests for new features or untested features would be appreciated. | 108 | tests for new features or untested features would be appreciated. |
109 | 109 | ||
110 | NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM | 110 | NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM |
111 | ---------------------------------------------------------------------- | 111 | ---------------------------------------------------------------------- |
112 | (This documentation was provided by Galen C. Hunt <gchunt@cs.rochester.edu> | 112 | (This documentation was provided by Galen C. Hunt <gchunt@cs.rochester.edu> |
113 | and lightly annotated by Gordon Chaffee). | 113 | and lightly annotated by Gordon Chaffee). |
114 | 114 | ||
115 | This document presents a very rough, technical overview of my | 115 | This document presents a very rough, technical overview of my |
116 | knowledge of the extended FAT file system used in Windows NT 3.5 and | 116 | knowledge of the extended FAT file system used in Windows NT 3.5 and |
117 | Windows 95. I don't guarantee that any of the following is correct, | 117 | Windows 95. I don't guarantee that any of the following is correct, |
118 | but it appears to be so. | 118 | but it appears to be so. |
119 | 119 | ||
120 | The extended FAT file system is almost identical to the FAT | 120 | The extended FAT file system is almost identical to the FAT |
121 | file system used in DOS versions up to and including 6.223410239847 | 121 | file system used in DOS versions up to and including 6.223410239847 |
122 | :-). The significant change has been the addition of long file names. | 122 | :-). The significant change has been the addition of long file names. |
123 | These names support up to 255 characters including spaces and lower | 123 | These names support up to 255 characters including spaces and lower |
124 | case characters as opposed to the traditional 8.3 short names. | 124 | case characters as opposed to the traditional 8.3 short names. |
125 | 125 | ||
126 | Here is the description of the traditional FAT entry in the current | 126 | Here is the description of the traditional FAT entry in the current |
127 | Windows 95 filesystem: | 127 | Windows 95 filesystem: |
128 | 128 | ||
129 | struct directory { // Short 8.3 names | 129 | struct directory { // Short 8.3 names |
130 | unsigned char name[8]; // file name | 130 | unsigned char name[8]; // file name |
131 | unsigned char ext[3]; // file extension | 131 | unsigned char ext[3]; // file extension |
132 | unsigned char attr; // attribute byte | 132 | unsigned char attr; // attribute byte |
133 | unsigned char lcase; // Case for base and extension | 133 | unsigned char lcase; // Case for base and extension |
134 | unsigned char ctime_ms; // Creation time, milliseconds | 134 | unsigned char ctime_ms; // Creation time, milliseconds |
135 | unsigned char ctime[2]; // Creation time | 135 | unsigned char ctime[2]; // Creation time |
136 | unsigned char cdate[2]; // Creation date | 136 | unsigned char cdate[2]; // Creation date |
137 | unsigned char adate[2]; // Last access date | 137 | unsigned char adate[2]; // Last access date |
138 | unsigned char reserved[2]; // reserved values (ignored) | 138 | unsigned char reserved[2]; // reserved values (ignored) |
139 | unsigned char time[2]; // time stamp | 139 | unsigned char time[2]; // time stamp |
140 | unsigned char date[2]; // date stamp | 140 | unsigned char date[2]; // date stamp |
141 | unsigned char start[2]; // starting cluster number | 141 | unsigned char start[2]; // starting cluster number |
142 | unsigned char size[4]; // size of the file | 142 | unsigned char size[4]; // size of the file |
143 | }; | 143 | }; |
144 | 144 | ||
145 | The lcase field specifies if the base and/or the extension of an 8.3 | 145 | The lcase field specifies if the base and/or the extension of an 8.3 |
146 | name should be capitalized. This field does not seem to be used by | 146 | name should be capitalized. This field does not seem to be used by |
147 | Windows 95 but it is used by Windows NT. The case of filenames is not | 147 | Windows 95 but it is used by Windows NT. The case of filenames is not |
148 | completely compatible from Windows NT to Windows 95. It is not completely | 148 | completely compatible from Windows NT to Windows 95. It is not completely |
149 | compatible in the reverse direction, however. Filenames that fit in | 149 | compatible in the reverse direction, however. Filenames that fit in |
150 | the 8.3 namespace and are written on Windows NT to be lowercase will | 150 | the 8.3 namespace and are written on Windows NT to be lowercase will |
151 | show up as uppercase on Windows 95. | 151 | show up as uppercase on Windows 95. |
152 | 152 | ||
153 | Note that the "start" and "size" values are actually little | 153 | Note that the "start" and "size" values are actually little |
154 | endian integer values. The descriptions of the fields in this | 154 | endian integer values. The descriptions of the fields in this |
155 | structure are public knowledge and can be found elsewhere. | 155 | structure are public knowledge and can be found elsewhere. |
156 | 156 | ||
157 | With the extended FAT system, Microsoft has inserted extra | 157 | With the extended FAT system, Microsoft has inserted extra |
158 | directory entries for any files with extended names. (Any name which | 158 | directory entries for any files with extended names. (Any name which |
159 | legally fits within the old 8.3 encoding scheme does not have extra | 159 | legally fits within the old 8.3 encoding scheme does not have extra |
160 | entries.) I call these extra entries slots. Basically, a slot is a | 160 | entries.) I call these extra entries slots. Basically, a slot is a |
161 | specially formatted directory entry which holds up to 13 characters of | 161 | specially formatted directory entry which holds up to 13 characters of |
162 | a file's extended name. Think of slots as additional labeling for the | 162 | a file's extended name. Think of slots as additional labeling for the |
163 | directory entry of the file to which they correspond. Microsoft | 163 | directory entry of the file to which they correspond. Microsoft |
164 | prefers to refer to the 8.3 entry for a file as its alias and the | 164 | prefers to refer to the 8.3 entry for a file as its alias and the |
165 | extended slot directory entries as the file name. | 165 | extended slot directory entries as the file name. |
166 | 166 | ||
167 | The C structure for a slot directory entry follows: | 167 | The C structure for a slot directory entry follows: |
168 | 168 | ||
169 | struct slot { // Up to 13 characters of a long name | 169 | struct slot { // Up to 13 characters of a long name |
170 | unsigned char id; // sequence number for slot | 170 | unsigned char id; // sequence number for slot |
171 | unsigned char name0_4[10]; // first 5 characters in name | 171 | unsigned char name0_4[10]; // first 5 characters in name |
172 | unsigned char attr; // attribute byte | 172 | unsigned char attr; // attribute byte |
173 | unsigned char reserved; // always 0 | 173 | unsigned char reserved; // always 0 |
174 | unsigned char alias_checksum; // checksum for 8.3 alias | 174 | unsigned char alias_checksum; // checksum for 8.3 alias |
175 | unsigned char name5_10[12]; // 6 more characters in name | 175 | unsigned char name5_10[12]; // 6 more characters in name |
176 | unsigned char start[2]; // starting cluster number | 176 | unsigned char start[2]; // starting cluster number |
177 | unsigned char name11_12[4]; // last 2 characters in name | 177 | unsigned char name11_12[4]; // last 2 characters in name |
178 | }; | 178 | }; |
179 | 179 | ||
180 | If the layout of the slots looks a little odd, it's only | 180 | If the layout of the slots looks a little odd, it's only |
181 | because of Microsoft's efforts to maintain compatibility with old | 181 | because of Microsoft's efforts to maintain compatibility with old |
182 | software. The slots must be disguised to prevent old software from | 182 | software. The slots must be disguised to prevent old software from |
183 | panicking. To this end, a number of measures are taken: | 183 | panicking. To this end, a number of measures are taken: |
184 | 184 | ||
185 | 1) The attribute byte for a slot directory entry is always set | 185 | 1) The attribute byte for a slot directory entry is always set |
186 | to 0x0f. This corresponds to an old directory entry with | 186 | to 0x0f. This corresponds to an old directory entry with |
187 | attributes of "hidden", "system", "read-only", and "volume | 187 | attributes of "hidden", "system", "read-only", and "volume |
188 | label". Most old software will ignore any directory | 188 | label". Most old software will ignore any directory |
189 | entries with the "volume label" bit set. Real volume label | 189 | entries with the "volume label" bit set. Real volume label |
190 | entries don't have the other three bits set. | 190 | entries don't have the other three bits set. |
191 | 191 | ||
192 | 2) The starting cluster is always set to 0, an impossible | 192 | 2) The starting cluster is always set to 0, an impossible |
193 | value for a DOS file. | 193 | value for a DOS file. |
194 | 194 | ||
195 | Because the extended FAT system is backward compatible, it is | 195 | Because the extended FAT system is backward compatible, it is |
196 | possible for old software to modify directory entries. Measures must | 196 | possible for old software to modify directory entries. Measures must |
197 | be taken to ensure the validity of slots. An extended FAT system can | 197 | be taken to ensure the validity of slots. An extended FAT system can |
198 | verify that a slot does in fact belong to an 8.3 directory entry by | 198 | verify that a slot does in fact belong to an 8.3 directory entry by |
199 | the following: | 199 | the following: |
200 | 200 | ||
201 | 1) Positioning. Slots for a file always immediately precede | 201 | 1) Positioning. Slots for a file always immediately precede |
202 | their corresponding 8.3 directory entry. In addition, each | 202 | their corresponding 8.3 directory entry. In addition, each |
203 | slot has an id which marks its order in the extended file | 203 | slot has an id which marks its order in the extended file |
204 | name. Here is a very abbreviated view of an 8.3 directory | 204 | name. Here is a very abbreviated view of an 8.3 directory |
205 | entry and its corresponding long name slots for the file | 205 | entry and its corresponding long name slots for the file |
206 | "My Big File.Extension which is long": | 206 | "My Big File.Extension which is long": |
207 | 207 | ||
208 | <preceding files...> | 208 | <preceding files...> |
209 | <slot #3, id = 0x43, characters = "h is long"> | 209 | <slot #3, id = 0x43, characters = "h is long"> |
210 | <slot #2, id = 0x02, characters = "xtension whic"> | 210 | <slot #2, id = 0x02, characters = "xtension whic"> |
211 | <slot #1, id = 0x01, characters = "My Big File.E"> | 211 | <slot #1, id = 0x01, characters = "My Big File.E"> |
212 | <directory entry, name = "MYBIGFIL.EXT"> | 212 | <directory entry, name = "MYBIGFIL.EXT"> |
213 | 213 | ||
214 | Note that the slots are stored from last to first. Slots | 214 | Note that the slots are stored from last to first. Slots |
215 | are numbered from 1 to N. The Nth slot is or'ed with 0x40 | 215 | are numbered from 1 to N. The Nth slot is or'ed with 0x40 |
216 | to mark it as the last one. | 216 | to mark it as the last one. |
217 | 217 | ||
218 | 2) Checksum. Each slot has an "alias_checksum" value. The | 218 | 2) Checksum. Each slot has an "alias_checksum" value. The |
219 | checksum is calculated from the 8.3 name using the | 219 | checksum is calculated from the 8.3 name using the |
220 | following algorithm: | 220 | following algorithm: |
221 | 221 | ||
222 | for (sum = i = 0; i < 11; i++) { | 222 | for (sum = i = 0; i < 11; i++) { |
223 | sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i]; | 223 | sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i]; |
224 | } | 224 | } |
225 | 225 | ||
226 | 3) If there is free space in the final slot, a Unicode NULL (0x0000) | 226 | 3) If there is free space in the final slot, a Unicode NULL (0x0000) |
227 | is stored after the final character. After that, all unused | 227 | is stored after the final character. After that, all unused |
228 | characters in the final slot are set to Unicode 0xFFFF. | 228 | characters in the final slot are set to Unicode 0xFFFF. |
229 | 229 | ||
230 | Finally, note that the extended name is stored in Unicode. Each Unicode | 230 | Finally, note that the extended name is stored in Unicode. Each Unicode |
231 | character takes two bytes. | 231 | character takes two bytes. |
232 | 232 |
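The slot checks described in vfat.txt above can be sketched in C as follows. This is only an illustration under stated assumptions, not the kernel's implementation: the function and macro names (vfat_alias_checksum, vfat_slot_seq, vfat_slot_is_last, vfat_slots_needed, VFAT_CHARS_PER_SLOT, VFAT_LAST_SLOT_FLAG) are hypothetical, and the 8.3 alias is assumed to be passed as 11 raw bytes (8 name bytes followed by 3 extension bytes, with no dot).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
#define VFAT_CHARS_PER_SLOT 13   /* 5 + 6 + 2 characters per slot */
#define VFAT_LAST_SLOT_FLAG 0x40 /* or'ed into the id of the Nth slot */

/*
 * Checksum over the 11 bytes of the 8.3 alias. Each step rotates the
 * running sum right by one bit and adds the next byte; the result is
 * truncated to 8 bits, matching an unsigned char accumulator.
 */
unsigned char vfat_alias_checksum(const unsigned char name[11])
{
	unsigned char sum = 0;
	int i;

	assert(name != NULL);
	for (i = 0; i < 11; i++)
		sum = (unsigned char)((((sum & 1) << 7) |
				       ((sum & 0xfe) >> 1)) + name[i]);
	return sum;
}

/* Sequence number (1..N) of a slot, with the last-slot flag cleared. */
unsigned int vfat_slot_seq(unsigned char id)
{
	return id & ~VFAT_LAST_SLOT_FLAG;
}

/* Non-zero if this slot is marked as the last (Nth) one. */
int vfat_slot_is_last(unsigned char id)
{
	return (id & VFAT_LAST_SLOT_FLAG) != 0;
}

/* Number of slots needed for a long name of name_len characters. */
size_t vfat_slots_needed(size_t name_len)
{
	return (name_len + VFAT_CHARS_PER_SLOT - 1) / VFAT_CHARS_PER_SLOT;
}
```

For the example name "My Big File.Extension which is long" (35 characters), vfat_slots_needed() gives 3, and the id 0x43 of the first stored slot decodes to sequence number 3 with the last-slot flag set. A reader validating a directory would compare vfat_alias_checksum() of the 8.3 entry against the alias_checksum field of every slot that precedes it.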
Documentation/filesystems/vfs.txt
1 | 1 | ||
2 | Overview of the Linux Virtual File System | 2 | Overview of the Linux Virtual File System |
3 | 3 | ||
4 | Original author: Richard Gooch <rgooch@atnf.csiro.au> | 4 | Original author: Richard Gooch <rgooch@atnf.csiro.au> |
5 | 5 | ||
6 | Last updated on October 28, 2005 | 6 | Last updated on October 28, 2005 |
7 | 7 | ||
8 | Copyright (C) 1999 Richard Gooch | 8 | Copyright (C) 1999 Richard Gooch |
9 | Copyright (C) 2005 Pekka Enberg | 9 | Copyright (C) 2005 Pekka Enberg |
10 | 10 | ||
11 | This file is released under the GPLv2. | 11 | This file is released under the GPLv2. |
12 | 12 | ||
13 | 13 | ||
14 | Introduction | 14 | Introduction |
15 | ============ | 15 | ============ |
16 | 16 | ||
17 | The Virtual File System (also known as the Virtual Filesystem Switch) | 17 | The Virtual File System (also known as the Virtual Filesystem Switch) |
18 | is the software layer in the kernel that provides the filesystem | 18 | is the software layer in the kernel that provides the filesystem |
19 | interface to userspace programs. It also provides an abstraction | 19 | interface to userspace programs. It also provides an abstraction |
20 | within the kernel which allows different filesystem implementations to | 20 | within the kernel which allows different filesystem implementations to |
21 | coexist. | 21 | coexist. |
22 | 22 | ||
23 | VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so | 23 | VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so |
24 | on are called from a process context. Filesystem locking is described | 24 | on are called from a process context. Filesystem locking is described |
25 | in the document Documentation/filesystems/Locking. | 25 | in the document Documentation/filesystems/Locking. |
26 | 26 | ||
27 | 27 | ||
28 | Directory Entry Cache (dcache) | 28 | Directory Entry Cache (dcache) |
29 | ------------------------------ | 29 | ------------------------------ |
30 | 30 | ||
31 | The VFS implements the open(2), stat(2), chmod(2), and similar system | 31 | The VFS implements the open(2), stat(2), chmod(2), and similar system |
32 | calls. The pathname argument that is passed to them is used by the VFS | 32 | calls. The pathname argument that is passed to them is used by the VFS |
33 | to search through the directory entry cache (also known as the dentry | 33 | to search through the directory entry cache (also known as the dentry |
34 | cache or dcache). This provides a very fast look-up mechanism to | 34 | cache or dcache). This provides a very fast look-up mechanism to |
35 | translate a pathname (filename) into a specific dentry. Dentries live | 35 | translate a pathname (filename) into a specific dentry. Dentries live |
36 | in RAM and are never saved to disc: they exist only for performance. | 36 | in RAM and are never saved to disc: they exist only for performance. |
37 | 37 | ||
38 | The dentry cache is meant to be a view into your entire filespace. As | 38 | The dentry cache is meant to be a view into your entire filespace. As |
39 | most computers cannot fit all dentries in the RAM at the same time, | 39 | most computers cannot fit all dentries in the RAM at the same time, |
40 | some bits of the cache are missing. In order to resolve your pathname | 40 | some bits of the cache are missing. In order to resolve your pathname |
41 | into a dentry, the VFS may have to resort to creating dentries along | 41 | into a dentry, the VFS may have to resort to creating dentries along |
42 | the way, and then loading the inode. This is done by looking up the | 42 | the way, and then loading the inode. This is done by looking up the |
43 | inode. | 43 | inode. |
44 | 44 | ||
45 | 45 | ||
46 | The Inode Object | 46 | The Inode Object |
47 | ---------------- | 47 | ---------------- |
48 | 48 | ||
49 | An individual dentry usually has a pointer to an inode. Inodes are | 49 | An individual dentry usually has a pointer to an inode. Inodes are |
50 | filesystem objects such as regular files, directories, FIFOs and other | 50 | filesystem objects such as regular files, directories, FIFOs and other |
51 | beasts. They live either on the disc (for block device filesystems) | 51 | beasts. They live either on the disc (for block device filesystems) |
52 | or in the memory (for pseudo filesystems). Inodes that live on the | 52 | or in the memory (for pseudo filesystems). Inodes that live on the |
53 | disc are copied into the memory when required and changes to the inode | 53 | disc are copied into the memory when required and changes to the inode |
54 | are written back to disc. A single inode can be pointed to by multiple | 54 | are written back to disc. A single inode can be pointed to by multiple |
55 | dentries (hard links, for example, do this). | 55 | dentries (hard links, for example, do this). |
56 | 56 | ||
57 | To look up an inode requires that the VFS calls the lookup() method of | 57 | To look up an inode requires that the VFS calls the lookup() method of |
58 | the parent directory inode. This method is installed by the specific | 58 | the parent directory inode. This method is installed by the specific |
59 | filesystem implementation that the inode lives in. Once the VFS has | 59 | filesystem implementation that the inode lives in. Once the VFS has |
60 | the required dentry (and hence the inode), we can do all those boring | 60 | the required dentry (and hence the inode), we can do all those boring |
61 | things like open(2) the file, or stat(2) it to peek at the inode | 61 | things like open(2) the file, or stat(2) it to peek at the inode |
62 | data. The stat(2) operation is fairly simple: once the VFS has the | 62 | data. The stat(2) operation is fairly simple: once the VFS has the |
63 | dentry, it peeks at the inode data and passes some of it back to | 63 | dentry, it peeks at the inode data and passes some of it back to |
64 | userspace. | 64 | userspace. |
65 | 65 | ||
66 | 66 | ||
67 | The File Object | 67 | The File Object |
68 | --------------- | 68 | --------------- |
69 | 69 | ||
70 | Opening a file requires another operation: allocation of a file | 70 | Opening a file requires another operation: allocation of a file |
71 | structure (this is the kernel-side implementation of file | 71 | structure (this is the kernel-side implementation of file |
72 | descriptors). The freshly allocated file structure is initialized with | 72 | descriptors). The freshly allocated file structure is initialized with |
73 | a pointer to the dentry and a set of file operation member functions. | 73 | a pointer to the dentry and a set of file operation member functions. |
74 | These are taken from the inode data. The open() file method is then | 74 | These are taken from the inode data. The open() file method is then |
75 | called so the specific filesystem implementation can do its work. You | 75 | called so the specific filesystem implementation can do its work. You |
76 | can see that this is another switch performed by the VFS. The file | 76 | can see that this is another switch performed by the VFS. The file |
77 | structure is placed into the file descriptor table for the process. | 77 | structure is placed into the file descriptor table for the process. |
78 | 78 | ||
79 | Reading, writing and closing files (and other assorted VFS operations) | 79 | Reading, writing and closing files (and other assorted VFS operations) |
80 | is done by using the userspace file descriptor to grab the appropriate | 80 | is done by using the userspace file descriptor to grab the appropriate |
81 | file structure, and then calling the required file structure method to | 81 | file structure, and then calling the required file structure method to |
82 | do whatever is required. For as long as the file is open, it keeps the | 82 | do whatever is required. For as long as the file is open, it keeps the |
83 | dentry in use, which in turn means that the VFS inode is still in use. | 83 | dentry in use, which in turn means that the VFS inode is still in use. |
84 | 84 | ||
85 | 85 | ||
86 | Registering and Mounting a Filesystem | 86 | Registering and Mounting a Filesystem |
87 | ===================================== | 87 | ===================================== |
88 | 88 | ||
89 | To register and unregister a filesystem, use the following API | 89 | To register and unregister a filesystem, use the following API |
90 | functions: | 90 | functions: |
91 | 91 | ||
92 | #include <linux/fs.h> | 92 | #include <linux/fs.h> |
93 | 93 | ||
94 | extern int register_filesystem(struct file_system_type *); | 94 | extern int register_filesystem(struct file_system_type *); |
95 | extern int unregister_filesystem(struct file_system_type *); | 95 | extern int unregister_filesystem(struct file_system_type *); |
96 | 96 | ||
97 | The passed struct file_system_type describes your filesystem. When a | 97 | The passed struct file_system_type describes your filesystem. When a |
98 | request is made to mount a device onto a directory in your filespace, | 98 | request is made to mount a device onto a directory in your filespace, |
99 | the VFS will call the appropriate get_sb() method for the specific | 99 | the VFS will call the appropriate get_sb() method for the specific |
100 | filesystem. The dentry for the mount point will then be updated to | 100 | filesystem. The dentry for the mount point will then be updated to |
101 | point to the root inode for the new filesystem. | 101 | point to the root inode for the new filesystem. |
102 | 102 | ||
103 | You can see all filesystems that are registered to the kernel in the | 103 | You can see all filesystems that are registered to the kernel in the |
104 | file /proc/filesystems. | 104 | file /proc/filesystems. |
105 | 105 | ||
106 | 106 | ||
107 | struct file_system_type | 107 | struct file_system_type |
108 | ----------------------- | 108 | ----------------------- |
109 | 109 | ||
110 | This describes the filesystem. As of kernel 2.6.13, the following | 110 | This describes the filesystem. As of kernel 2.6.13, the following |
111 | members are defined: | 111 | members are defined: |
112 | 112 | ||
113 | struct file_system_type { | 113 | struct file_system_type { |
114 | const char *name; | 114 | const char *name; |
115 | int fs_flags; | 115 | int fs_flags; |
116 | int (*get_sb) (struct file_system_type *, int, | 116 | int (*get_sb) (struct file_system_type *, int, |
117 | const char *, void *, struct vfsmount *); | 117 | const char *, void *, struct vfsmount *); |
118 | void (*kill_sb) (struct super_block *); | 118 | void (*kill_sb) (struct super_block *); |
119 | struct module *owner; | 119 | struct module *owner; |
120 | struct file_system_type * next; | 120 | struct file_system_type * next; |
121 | struct list_head fs_supers; | 121 | struct list_head fs_supers; |
122 | }; | 122 | }; |
123 | 123 | ||
124 | name: the name of the filesystem type, such as "ext2", "iso9660", | 124 | name: the name of the filesystem type, such as "ext2", "iso9660", |
125 | "msdos" and so on | 125 | "msdos" and so on |
126 | 126 | ||
127 | fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) | 127 | fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) |
128 | 128 | ||
129 | get_sb: the method to call when a new instance of this | 129 | get_sb: the method to call when a new instance of this |
130 | filesystem should be mounted | 130 | filesystem should be mounted |
131 | 131 | ||
132 | kill_sb: the method to call when an instance of this filesystem | 132 | kill_sb: the method to call when an instance of this filesystem |
133 | should be unmounted | 133 | should be unmounted |
134 | 134 | ||
135 | owner: for internal VFS use: you should initialize this to THIS_MODULE in | 135 | owner: for internal VFS use: you should initialize this to THIS_MODULE in |
136 | most cases. | 136 | most cases. |
137 | 137 | ||
138 | next: for internal VFS use: you should initialize this to NULL | 138 | next: for internal VFS use: you should initialize this to NULL |
139 | 139 | ||
140 | The get_sb() method has the following arguments: | 140 | The get_sb() method has the following arguments: |
141 | 141 | ||
142 | struct super_block *sb: the superblock structure. This is partially | 142 | struct super_block *sb: the superblock structure. This is partially |
143 | initialized by the VFS and the rest must be initialized by the | 143 | initialized by the VFS and the rest must be initialized by the |
144 | get_sb() method | 144 | get_sb() method |
145 | 145 | ||
146 | int flags: mount flags | 146 | int flags: mount flags |
147 | 147 | ||
148 | const char *dev_name: the device name we are mounting. | 148 | const char *dev_name: the device name we are mounting. |
149 | 149 | ||
150 | void *data: arbitrary mount options, usually comes as an ASCII | 150 | void *data: arbitrary mount options, usually comes as an ASCII |
151 | string | 151 | string |
152 | 152 | ||
153 | int silent: whether or not to be silent on error | 153 | int silent: whether or not to be silent on error |
154 | 154 | ||
155 | The get_sb() method must determine if the block device specified | 155 | The get_sb() method must determine if the block device specified |
156 | in the superblock contains a filesystem of the type the method | 156 | in the superblock contains a filesystem of the type the method |
157 | supports. On success the method returns the superblock pointer, on | 157 | supports. On success the method returns the superblock pointer, on |
158 | failure it returns NULL. | 158 | failure it returns NULL. |
159 | 159 | ||
160 | The most interesting member of the superblock structure that the | 160 | The most interesting member of the superblock structure that the |
161 | get_sb() method fills in is the "s_op" field. This is a pointer to | 161 | get_sb() method fills in is the "s_op" field. This is a pointer to |
162 | a "struct super_operations" which describes the next level of the | 162 | a "struct super_operations" which describes the next level of the |
163 | filesystem implementation. | 163 | filesystem implementation. |
164 | 164 | ||
165 | Usually, a filesystem uses one of the generic get_sb() implementations | 165 | Usually, a filesystem uses one of the generic get_sb() implementations |
166 | and provides a fill_super() method instead. The generic methods are: | 166 | and provides a fill_super() method instead. The generic methods are: |
167 | 167 | ||
168 | get_sb_bdev: mount a filesystem residing on a block device | 168 | get_sb_bdev: mount a filesystem residing on a block device |
169 | 169 | ||
170 | get_sb_nodev: mount a filesystem that is not backed by a device | 170 | get_sb_nodev: mount a filesystem that is not backed by a device |
171 | 171 | ||
172 | get_sb_single: mount a filesystem which shares the instance between | 172 | get_sb_single: mount a filesystem which shares the instance between |
173 | all mounts | 173 | all mounts |
174 | 174 | ||
175 | A fill_super() method implementation has the following arguments: | 175 | A fill_super() method implementation has the following arguments: |
176 | 176 | ||
177 | struct super_block *sb: the superblock structure. The method fill_super() | 177 | struct super_block *sb: the superblock structure. The method fill_super() |
178 | must initialize this properly. | 178 | must initialize this properly. |
179 | 179 | ||
180 | void *data: arbitrary mount options, usually comes as an ASCII | 180 | void *data: arbitrary mount options, usually comes as an ASCII |
181 | string | 181 | string |
182 | 182 | ||
183 | int silent: whether or not to be silent on error | 183 | int silent: whether or not to be silent on error |
184 | 184 | ||
185 | 185 | ||
186 | The Superblock Object | 186 | The Superblock Object |
187 | ===================== | 187 | ===================== |
188 | 188 | ||
189 | A superblock object represents a mounted filesystem. | 189 | A superblock object represents a mounted filesystem. |
190 | 190 | ||
191 | 191 | ||
192 | struct super_operations | 192 | struct super_operations |
193 | ----------------------- | 193 | ----------------------- |
194 | 194 | ||
195 | This describes how the VFS can manipulate the superblock of your | 195 | This describes how the VFS can manipulate the superblock of your |
196 | filesystem. As of kernel 2.6.13, the following members are defined: | 196 | filesystem. As of kernel 2.6.13, the following members are defined: |
197 | 197 | ||
198 | struct super_operations { | 198 | struct super_operations { |
199 | struct inode *(*alloc_inode)(struct super_block *sb); | 199 | struct inode *(*alloc_inode)(struct super_block *sb); |
200 | void (*destroy_inode)(struct inode *); | 200 | void (*destroy_inode)(struct inode *); |
201 | 201 | ||
202 | void (*read_inode) (struct inode *); | 202 | void (*read_inode) (struct inode *); |
203 | 203 | ||
204 | void (*dirty_inode) (struct inode *); | 204 | void (*dirty_inode) (struct inode *); |
205 | int (*write_inode) (struct inode *, int); | 205 | int (*write_inode) (struct inode *, int); |
206 | void (*put_inode) (struct inode *); | 206 | void (*put_inode) (struct inode *); |
207 | void (*drop_inode) (struct inode *); | 207 | void (*drop_inode) (struct inode *); |
208 | void (*delete_inode) (struct inode *); | 208 | void (*delete_inode) (struct inode *); |
209 | void (*put_super) (struct super_block *); | 209 | void (*put_super) (struct super_block *); |
210 | void (*write_super) (struct super_block *); | 210 | void (*write_super) (struct super_block *); |
211 | int (*sync_fs)(struct super_block *sb, int wait); | 211 | int (*sync_fs)(struct super_block *sb, int wait); |
212 | void (*write_super_lockfs) (struct super_block *); | 212 | void (*write_super_lockfs) (struct super_block *); |
213 | void (*unlockfs) (struct super_block *); | 213 | void (*unlockfs) (struct super_block *); |
214 | int (*statfs) (struct dentry *, struct kstatfs *); | 214 | int (*statfs) (struct dentry *, struct kstatfs *); |
215 | int (*remount_fs) (struct super_block *, int *, char *); | 215 | int (*remount_fs) (struct super_block *, int *, char *); |
216 | void (*clear_inode) (struct inode *); | 216 | void (*clear_inode) (struct inode *); |
217 | void (*umount_begin) (struct super_block *); | 217 | void (*umount_begin) (struct super_block *); |
218 | 218 | ||
219 | void (*sync_inodes) (struct super_block *sb, | 219 | void (*sync_inodes) (struct super_block *sb, |
220 | struct writeback_control *wbc); | 220 | struct writeback_control *wbc); |
221 | int (*show_options)(struct seq_file *, struct vfsmount *); | 221 | int (*show_options)(struct seq_file *, struct vfsmount *); |
222 | 222 | ||
223 | ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); | 223 | ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); |
224 | ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); | 224 | ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); |
225 | }; | 225 | }; |
226 | 226 | ||
227 | All methods are called without any locks being held, unless otherwise | 227 | All methods are called without any locks being held, unless otherwise |
228 | noted. This means that most methods can block safely. All methods are | 228 | noted. This means that most methods can block safely. All methods are |
229 | only called from a process context (i.e. not from an interrupt handler | 229 | only called from a process context (i.e. not from an interrupt handler |
230 | or bottom half). | 230 | or bottom half). |
231 | 231 | ||
232 | alloc_inode: this method is called by inode_alloc() to allocate memory | 232 | alloc_inode: this method is called by inode_alloc() to allocate memory |
233 | for struct inode and initialize it. If this function is not | 233 | for struct inode and initialize it. If this function is not |
234 | defined, a simple 'struct inode' is allocated. Normally | 234 | defined, a simple 'struct inode' is allocated. Normally |
235 | alloc_inode will be used to allocate a larger structure which | 235 | alloc_inode will be used to allocate a larger structure which |
236 | contains a 'struct inode' embedded within it. | 236 | contains a 'struct inode' embedded within it. |
237 | 237 | ||
238 | destroy_inode: this method is called by destroy_inode() to release | 238 | destroy_inode: this method is called by destroy_inode() to release |
239 | resources allocated for struct inode. It is only required if | 239 | resources allocated for struct inode. It is only required if |
240 | ->alloc_inode was defined and simply undoes anything done by | 240 | ->alloc_inode was defined and simply undoes anything done by |
241 | ->alloc_inode. | 241 | ->alloc_inode. |
242 | 242 | ||
243 | read_inode: this method is called to read a specific inode from the | 243 | read_inode: this method is called to read a specific inode from the |
244 | mounted filesystem. The i_ino member in the struct inode is | 244 | mounted filesystem. The i_ino member in the struct inode is |
245 | initialized by the VFS to indicate which inode to read. Other | 245 | initialized by the VFS to indicate which inode to read. Other |
246 | members are filled in by this method. | 246 | members are filled in by this method. |
247 | 247 | ||
248 | You can set this to NULL and use iget5_locked() instead of iget() | 248 | You can set this to NULL and use iget5_locked() instead of iget() |
249 | to read inodes. This is necessary for filesystems for which the | 249 | to read inodes. This is necessary for filesystems for which the |
250 | inode number is not sufficient to identify an inode. | 250 | inode number is not sufficient to identify an inode. |
251 | 251 | ||
252 | dirty_inode: this method is called by the VFS to mark an inode dirty. | 252 | dirty_inode: this method is called by the VFS to mark an inode dirty. |
253 | 253 | ||
254 | write_inode: this method is called when the VFS needs to write an | 254 | write_inode: this method is called when the VFS needs to write an |
255 | inode to disc. The second parameter indicates whether the write | 255 | inode to disc. The second parameter indicates whether the write |
256 | should be synchronous or not; not all filesystems check this flag. | 256 | should be synchronous or not; not all filesystems check this flag. |
257 | 257 | ||
258 | put_inode: called when the VFS inode is removed from the inode | 258 | put_inode: called when the VFS inode is removed from the inode |
259 | cache. | 259 | cache. |
260 | 260 | ||
261 | drop_inode: called when the last access to the inode is dropped, | 261 | drop_inode: called when the last access to the inode is dropped, |
262 | with the inode_lock spinlock held. | 262 | with the inode_lock spinlock held. |
263 | 263 | ||
264 | This method should be either NULL (normal UNIX filesystem | 264 | This method should be either NULL (normal UNIX filesystem |
265 | semantics) or "generic_delete_inode" (for filesystems that do not | 265 | semantics) or "generic_delete_inode" (for filesystems that do not |
266 | want to cache inodes - causing "delete_inode" to always be | 266 | want to cache inodes - causing "delete_inode" to always be |
267 | called regardless of the value of i_nlink) | 267 | called regardless of the value of i_nlink) |
268 | 268 | ||
269 | The "generic_delete_inode()" behavior is equivalent to the | 269 | The "generic_delete_inode()" behavior is equivalent to the |
270 | old practice of using "force_delete" in the put_inode() case, | 270 | old practice of using "force_delete" in the put_inode() case, |
271 | but does not have the races that the "force_delete()" approach | 271 | but does not have the races that the "force_delete()" approach |
272 | had. | 272 | had. |
273 | 273 | ||
274 | delete_inode: called when the VFS wants to delete an inode | 274 | delete_inode: called when the VFS wants to delete an inode |
275 | 275 | ||
276 | put_super: called when the VFS wishes to free the superblock | 276 | put_super: called when the VFS wishes to free the superblock |
277 | (i.e. unmount). This is called with the superblock lock held | 277 | (i.e. unmount). This is called with the superblock lock held |
278 | 278 | ||
279 | write_super: called when the VFS superblock needs to be written to | 279 | write_super: called when the VFS superblock needs to be written to |
280 | disc. This method is optional | 280 | disc. This method is optional |
281 | 281 | ||
282 | sync_fs: called when VFS is writing out all dirty data associated with | 282 | sync_fs: called when VFS is writing out all dirty data associated with |
283 | a superblock. The second parameter indicates whether the method | 283 | a superblock. The second parameter indicates whether the method |
284 | should wait until the write out has been completed. Optional. | 284 | should wait until the write out has been completed. Optional. |
285 | 285 | ||
286 | write_super_lockfs: called when VFS is locking a filesystem and | 286 | write_super_lockfs: called when VFS is locking a filesystem and |
287 | forcing it into a consistent state. This method is currently | 287 | forcing it into a consistent state. This method is currently |
288 | used by the Logical Volume Manager (LVM). | 288 | used by the Logical Volume Manager (LVM). |
289 | 289 | ||
290 | unlockfs: called when VFS is unlocking a filesystem and making it writable | 290 | unlockfs: called when VFS is unlocking a filesystem and making it writable |
291 | again. | 291 | again. |
292 | 292 | ||
293 | statfs: called when the VFS needs to get filesystem statistics. This | 293 | statfs: called when the VFS needs to get filesystem statistics. This |
294 | is called with the kernel lock held | 294 | is called with the kernel lock held |
295 | 295 | ||
296 | remount_fs: called when the filesystem is remounted. This is called | 296 | remount_fs: called when the filesystem is remounted. This is called |
297 | with the kernel lock held | 297 | with the kernel lock held |
298 | 298 | ||
299 | clear_inode: called when the VFS clears the inode. Optional | 299 | clear_inode: called when the VFS clears the inode. Optional |
300 | 300 | ||
301 | umount_begin: called when the VFS is unmounting a filesystem. | 301 | umount_begin: called when the VFS is unmounting a filesystem. |
302 | 302 | ||
303 | sync_inodes: called when the VFS is writing out dirty data associated with | 303 | sync_inodes: called when the VFS is writing out dirty data associated with |
304 | a superblock. | 304 | a superblock. |
305 | 305 | ||
306 | show_options: called by the VFS to show mount options for /proc/<pid>/mounts. | 306 | show_options: called by the VFS to show mount options for /proc/<pid>/mounts. |
307 | 307 | ||
308 | quota_read: called by the VFS to read from filesystem quota file. | 308 | quota_read: called by the VFS to read from filesystem quota file. |
309 | 309 | ||
310 | quota_write: called by the VFS to write to filesystem quota file. | 310 | quota_write: called by the VFS to write to filesystem quota file. |
311 | 311 | ||
312 | The read_inode() method is responsible for filling in the "i_op" | 312 | The read_inode() method is responsible for filling in the "i_op" |
313 | field. This is a pointer to a "struct inode_operations" which | 313 | field. This is a pointer to a "struct inode_operations" which |
314 | describes the methods that can be performed on individual inodes. | 314 | describes the methods that can be performed on individual inodes. |
315 | 315 | ||
316 | 316 | ||
317 | The Inode Object | 317 | The Inode Object |
318 | ================ | 318 | ================ |
319 | 319 | ||
320 | An inode object represents an object within the filesystem. | 320 | An inode object represents an object within the filesystem. |
321 | 321 | ||
322 | 322 | ||
323 | struct inode_operations | 323 | struct inode_operations |
324 | ----------------------- | 324 | ----------------------- |
325 | 325 | ||
326 | This describes how the VFS can manipulate an inode in your | 326 | This describes how the VFS can manipulate an inode in your |
327 | filesystem. As of kernel 2.6.13, the following members are defined: | 327 | filesystem. As of kernel 2.6.13, the following members are defined: |
328 | 328 | ||
329 | struct inode_operations { | 329 | struct inode_operations { |
330 | int (*create) (struct inode *,struct dentry *,int, struct nameidata *); | 330 | int (*create) (struct inode *,struct dentry *,int, struct nameidata *); |
331 | struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); | 331 | struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); |
332 | int (*link) (struct dentry *,struct inode *,struct dentry *); | 332 | int (*link) (struct dentry *,struct inode *,struct dentry *); |
333 | int (*unlink) (struct inode *,struct dentry *); | 333 | int (*unlink) (struct inode *,struct dentry *); |
334 | int (*symlink) (struct inode *,struct dentry *,const char *); | 334 | int (*symlink) (struct inode *,struct dentry *,const char *); |
335 | int (*mkdir) (struct inode *,struct dentry *,int); | 335 | int (*mkdir) (struct inode *,struct dentry *,int); |
336 | int (*rmdir) (struct inode *,struct dentry *); | 336 | int (*rmdir) (struct inode *,struct dentry *); |
337 | int (*mknod) (struct inode *,struct dentry *,int,dev_t); | 337 | int (*mknod) (struct inode *,struct dentry *,int,dev_t); |
338 | int (*rename) (struct inode *, struct dentry *, | 338 | int (*rename) (struct inode *, struct dentry *, |
339 | struct inode *, struct dentry *); | 339 | struct inode *, struct dentry *); |
340 | int (*readlink) (struct dentry *, char __user *,int); | 340 | int (*readlink) (struct dentry *, char __user *,int); |
341 | void * (*follow_link) (struct dentry *, struct nameidata *); | 341 | void * (*follow_link) (struct dentry *, struct nameidata *); |
342 | void (*put_link) (struct dentry *, struct nameidata *, void *); | 342 | void (*put_link) (struct dentry *, struct nameidata *, void *); |
343 | void (*truncate) (struct inode *); | 343 | void (*truncate) (struct inode *); |
344 | int (*permission) (struct inode *, int, struct nameidata *); | 344 | int (*permission) (struct inode *, int, struct nameidata *); |
345 | int (*setattr) (struct dentry *, struct iattr *); | 345 | int (*setattr) (struct dentry *, struct iattr *); |
346 | int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *); | 346 | int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *); |
347 | int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); | 347 | int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); |
348 | ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); | 348 | ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); |
349 | ssize_t (*listxattr) (struct dentry *, char *, size_t); | 349 | ssize_t (*listxattr) (struct dentry *, char *, size_t); |
350 | int (*removexattr) (struct dentry *, const char *); | 350 | int (*removexattr) (struct dentry *, const char *); |
351 | }; | 351 | }; |
352 | 352 | ||
353 | Again, all methods are called without any locks being held, unless | 353 | Again, all methods are called without any locks being held, unless |
354 | otherwise noted. | 354 | otherwise noted. |
355 | 355 | ||
356 | create: called by the open(2) and creat(2) system calls. Only | 356 | create: called by the open(2) and creat(2) system calls. Only |
357 | required if you want to support regular files. The dentry you | 357 | required if you want to support regular files. The dentry you |
358 | get should not have an inode (i.e. it should be a negative | 358 | get should not have an inode (i.e. it should be a negative |
359 | dentry). Here you will probably call d_instantiate() with the | 359 | dentry). Here you will probably call d_instantiate() with the |
360 | dentry and the newly created inode | 360 | dentry and the newly created inode |
361 | 361 | ||
362 | lookup: called when the VFS needs to look up an inode in a parent | 362 | lookup: called when the VFS needs to look up an inode in a parent |
363 | directory. The name to look for is found in the dentry. This | 363 | directory. The name to look for is found in the dentry. This |
364 | method must call d_add() to insert the found inode into the | 364 | method must call d_add() to insert the found inode into the |
365 | dentry. The "i_count" field in the inode structure should be | 365 | dentry. The "i_count" field in the inode structure should be |
366 | incremented. If the named inode does not exist a NULL inode | 366 | incremented. If the named inode does not exist a NULL inode |
367 | should be inserted into the dentry (this is called a negative | 367 | should be inserted into the dentry (this is called a negative |
368 | dentry). Returning an error code from this routine must only | 368 | dentry). Returning an error code from this routine must only |
369 | be done on a real error, otherwise creating inodes with system | 369 | be done on a real error, otherwise creating inodes with system |
370 | calls like creat(2), mknod(2), mkdir(2) and so on will fail. | 370 | calls like creat(2), mknod(2), mkdir(2) and so on will fail. |
371 | If you wish to overload the dentry methods then you should | 371 | If you wish to overload the dentry methods then you should |
372 | initialise the "d_dop" field in the dentry; this is a pointer | 372 | initialise the "d_dop" field in the dentry; this is a pointer |
373 | to a struct "dentry_operations". | 373 | to a struct "dentry_operations". |
374 | This method is called with the directory inode semaphore held | 374 | This method is called with the directory inode semaphore held |
375 | 375 | ||
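The lookup contract described above (a miss yields a negative dentry and is not an error) can be illustrated with a small user-space model. This is only a sketch: `toy_inode`, `toy_dentry`, `toy_dirent`, `toy_d_add` and `toy_lookup` are hypothetical stand-ins invented for illustration, not the kernel's types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct toy_inode  { unsigned long ino; int i_count; };
struct toy_dentry { const char *name; struct toy_inode *inode; /* NULL = negative */ };
struct toy_dirent { const char *name; struct toy_inode *inode; };

/* Models d_add(): attach an inode (possibly NULL) to the dentry. */
static void toy_d_add(struct toy_dentry *d, struct toy_inode *inode)
{
    d->inode = inode;
}

/*
 * Models ->lookup(): returns 0 for both "found" and "not found".
 * A negative return is reserved for a real error (here: no name given).
 */
static int toy_lookup(const struct toy_dirent *dir, size_t n, struct toy_dentry *d)
{
    if (d->name == NULL)
        return -1;                      /* real error */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dir[i].name, d->name) == 0) {
            dir[i].inode->i_count++;    /* take a reference on the inode */
            toy_d_add(d, dir[i].inode);
            return 0;
        }
    }
    toy_d_add(d, NULL);                 /* negative dentry, still success */
    return 0;
}
```

Because a miss returns success with a NULL inode, a later create-style operation in this model can still proceed on the same dentry.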
376 | link: called by the link(2) system call. Only required if you want | 376 | link: called by the link(2) system call. Only required if you want |
377 | to support hard links. You will probably need to call | 377 | to support hard links. You will probably need to call |
378 | d_instantiate() just as you would in the create() method | 378 | d_instantiate() just as you would in the create() method |
379 | 379 | ||
380 | unlink: called by the unlink(2) system call. Only required if you | 380 | unlink: called by the unlink(2) system call. Only required if you |
381 | want to support deleting inodes | 381 | want to support deleting inodes |
382 | 382 | ||
383 | symlink: called by the symlink(2) system call. Only required if you | 383 | symlink: called by the symlink(2) system call. Only required if you |
384 | want to support symlinks. You will probably need to call | 384 | want to support symlinks. You will probably need to call |
385 | d_instantiate() just as you would in the create() method | 385 | d_instantiate() just as you would in the create() method |
386 | 386 | ||
387 | mkdir: called by the mkdir(2) system call. Only required if you want | 387 | mkdir: called by the mkdir(2) system call. Only required if you want |
388 | to support creating subdirectories. You will probably need to | 388 | to support creating subdirectories. You will probably need to |
389 | call d_instantiate() just as you would in the create() method | 389 | call d_instantiate() just as you would in the create() method |
390 | 390 | ||
391 | rmdir: called by the rmdir(2) system call. Only required if you want | 391 | rmdir: called by the rmdir(2) system call. Only required if you want |
392 | to support deleting subdirectories | 392 | to support deleting subdirectories |
393 | 393 | ||
394 | mknod: called by the mknod(2) system call to create a device (char, | 394 | mknod: called by the mknod(2) system call to create a device (char, |
395 | block) inode or a named pipe (FIFO) or socket. Only required | 395 | block) inode or a named pipe (FIFO) or socket. Only required |
396 | if you want to support creating these types of inodes. You | 396 | if you want to support creating these types of inodes. You |
397 | will probably need to call d_instantiate() just as you would | 397 | will probably need to call d_instantiate() just as you would |
398 | in the create() method | 398 | in the create() method |
399 | 399 | ||
400 | rename: called by the rename(2) system call to rename the object to | 400 | rename: called by the rename(2) system call to rename the object to |
401 | have the parent and name given by the second inode and dentry. | 401 | have the parent and name given by the second inode and dentry. |
402 | 402 | ||
403 | readlink: called by the readlink(2) system call. Only required if | 403 | readlink: called by the readlink(2) system call. Only required if |
404 | you want to support reading symbolic links | 404 | you want to support reading symbolic links |
405 | 405 | ||
406 | follow_link: called by the VFS to follow a symbolic link to the | 406 | follow_link: called by the VFS to follow a symbolic link to the |
407 | inode it points to. Only required if you want to support | 407 | inode it points to. Only required if you want to support |
408 | symbolic links. This method returns a void pointer cookie | 408 | symbolic links. This method returns a void pointer cookie |
409 | that is passed to put_link(). | 409 | that is passed to put_link(). |
410 | 410 | ||
411 | put_link: called by the VFS to release resources allocated by | 411 | put_link: called by the VFS to release resources allocated by |
412 | follow_link(). The cookie returned by follow_link() is passed | 412 | follow_link(). The cookie returned by follow_link() is passed |
413 | to to this method as the last parameter. It is used by | 413 | to this method as the last parameter. It is used by |
414 | filesystems such as NFS where page cache is not stable | 414 | filesystems such as NFS where page cache is not stable |
415 | (i.e. page that was installed when the symbolic link walk | 415 | (i.e. page that was installed when the symbolic link walk |
416 | started might not be in the page cache at the end of the | 416 | started might not be in the page cache at the end of the |
417 | walk). | 417 | walk). |
418 | 418 | ||
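The follow_link()/put_link() pairing described above is an opaque-cookie pattern: whatever the first call allocates is handed back to the second call for release. A minimal user-space sketch follows; `toy_follow_link`, `toy_put_link` and `toy_live_cookies` are hypothetical names, and the real methods take dentry and nameidata arguments that are omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts cookies handed out and not yet released (illustration only). */
static int toy_live_cookies;

/*
 * Models follow_link(): makes a private copy of the link target and
 * returns it as an opaque cookie. *out_path points at the copy.
 */
static void *toy_follow_link(const char *target, const char **out_path)
{
    size_t len = strlen(target) + 1;
    char *copy = malloc(len);

    if (copy == NULL)
        return NULL;
    memcpy(copy, target, len);
    *out_path = copy;
    toy_live_cookies++;
    return copy;
}

/* Models put_link(): releases what follow_link() allocated. */
static void toy_put_link(void *cookie)
{
    if (cookie == NULL)
        return;
    free(cookie);
    toy_live_cookies--;
}
```

The caller never interprets the cookie; it only passes it back once the link walk is finished.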
419 | truncate: called by the VFS to change the size of a file. The | 419 | truncate: called by the VFS to change the size of a file. The |
420 | i_size field of the inode is set to the desired size by the | 420 | i_size field of the inode is set to the desired size by the |
421 | VFS before this method is called. This method is called by | 421 | VFS before this method is called. This method is called by |
422 | the truncate(2) system call and related functionality. | 422 | the truncate(2) system call and related functionality. |
423 | 423 | ||
424 | permission: called by the VFS to check for access rights on a POSIX-like | 424 | permission: called by the VFS to check for access rights on a POSIX-like |
425 | filesystem. | 425 | filesystem. |
426 | 426 | ||
427 | setattr: called by the VFS to set attributes for a file. This method | 427 | setattr: called by the VFS to set attributes for a file. This method |
428 | is called by chmod(2) and related system calls. | 428 | is called by chmod(2) and related system calls. |
429 | 429 | ||
430 | getattr: called by the VFS to get attributes of a file. This method | 430 | getattr: called by the VFS to get attributes of a file. This method |
431 | is called by stat(2) and related system calls. | 431 | is called by stat(2) and related system calls. |
432 | 432 | ||
433 | setxattr: called by the VFS to set an extended attribute for a file. | 433 | setxattr: called by the VFS to set an extended attribute for a file. |
434 | An extended attribute is a name:value pair associated with an | 434 | An extended attribute is a name:value pair associated with an |
435 | inode. This method is called by setxattr(2) system call. | 435 | inode. This method is called by setxattr(2) system call. |
436 | 436 | ||
437 | getxattr: called by the VFS to retrieve the value of an extended | 437 | getxattr: called by the VFS to retrieve the value of an extended |
438 | attribute name. This method is called by getxattr(2) system | 438 | attribute name. This method is called by getxattr(2) system |
439 | call. | 439 | call. |
440 | 440 | ||
441 | listxattr: called by the VFS to list all extended attributes for a | 441 | listxattr: called by the VFS to list all extended attributes for a |
442 | given file. This method is called by listxattr(2) system call. | 442 | given file. This method is called by listxattr(2) system call. |
443 | 443 | ||
444 | removexattr: called by the VFS to remove an extended attribute from | 444 | removexattr: called by the VFS to remove an extended attribute from |
445 | a file. This method is called by removexattr(2) system call. | 445 | a file. This method is called by removexattr(2) system call. |
446 | 446 | ||
447 | 447 | ||
448 | The Address Space Object | 448 | The Address Space Object |
449 | ======================== | 449 | ======================== |
450 | 450 | ||
451 | The address space object is used to group and manage pages in the page | 451 | The address space object is used to group and manage pages in the page |
452 | cache. It can be used to keep track of the pages in a file (or | 452 | cache. It can be used to keep track of the pages in a file (or |
453 | anything else) and also track the mapping of sections of the file into | 453 | anything else) and also track the mapping of sections of the file into |
454 | process address spaces. | 454 | process address spaces. |
455 | 455 | ||
456 | There are a number of distinct yet related services that an | 456 | There are a number of distinct yet related services that an |
457 | address-space can provide. These include communicating memory | 457 | address-space can provide. These include communicating memory |
458 | pressure, page lookup by address, and keeping track of pages tagged as | 458 | pressure, page lookup by address, and keeping track of pages tagged as |
459 | Dirty or Writeback. | 459 | Dirty or Writeback. |
460 | 460 | ||
461 | The first can be used independently of the others. The VM can try to | 461 | The first can be used independently of the others. The VM can try to |
462 | either write dirty pages in order to clean them, or release clean | 462 | either write dirty pages in order to clean them, or release clean |
463 | pages in order to reuse them. To do this it can call the ->writepage | 463 | pages in order to reuse them. To do this it can call the ->writepage |
464 | method on dirty pages, and ->releasepage on clean pages with | 464 | method on dirty pages, and ->releasepage on clean pages with |
465 | PagePrivate set. Clean pages without PagePrivate and with no external | 465 | PagePrivate set. Clean pages without PagePrivate and with no external |
466 | references will be released without notice being given to the | 466 | references will be released without notice being given to the |
467 | address_space. | 467 | address_space. |
468 | 468 | ||
469 | To achieve this functionality, pages need to be placed on an LRU with | 469 | To achieve this functionality, pages need to be placed on an LRU with |
470 | lru_cache_add and mark_page_active needs to be called whenever the | 470 | lru_cache_add and mark_page_active needs to be called whenever the |
471 | page is used. | 471 | page is used. |
472 | 472 | ||
473 | Pages are normally kept in a radix tree indexed by ->index. This tree | 473 | Pages are normally kept in a radix tree indexed by ->index. This tree |
474 | maintains information about the PG_Dirty and PG_Writeback status of | 474 | maintains information about the PG_Dirty and PG_Writeback status of |
475 | each page, so that pages with either of these flags can be found | 475 | each page, so that pages with either of these flags can be found |
476 | quickly. | 476 | quickly. |
477 | 477 | ||
478 | The Dirty tag is primarily used by mpage_writepages - the default | 478 | The Dirty tag is primarily used by mpage_writepages - the default |
479 | ->writepages method. It uses the tag to find dirty pages to call | 479 | ->writepages method. It uses the tag to find dirty pages to call |
480 | ->writepage on. If mpage_writepages is not used (i.e. the address_space | 480 | ->writepage on. If mpage_writepages is not used (i.e. the address_space |
481 | provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is | 481 | provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is |
482 | almost unused. write_inode_now and sync_inode do use it (through | 482 | almost unused. write_inode_now and sync_inode do use it (through |
483 | __sync_single_inode) to check if ->writepages has been successful in | 483 | __sync_single_inode) to check if ->writepages has been successful in |
484 | writing out the whole address_space. | 484 | writing out the whole address_space. |
485 | 485 | ||
486 | The Writeback tag is used by filemap*wait* and sync_page* functions, | 486 | The Writeback tag is used by filemap*wait* and sync_page* functions, |
487 | via wait_on_page_writeback_range, to wait for all writeback to | 487 | via wait_on_page_writeback_range, to wait for all writeback to |
488 | complete. While waiting ->sync_page (if defined) will be called on | 488 | complete. While waiting ->sync_page (if defined) will be called on |
489 | each page that is found to require writeback. | 489 | each page that is found to require writeback. |
490 | 490 | ||
491 | An address_space handler may attach extra information to a page, | 491 | An address_space handler may attach extra information to a page, |
492 | typically using the 'private' field in the 'struct page'. If such | 492 | typically using the 'private' field in the 'struct page'. If such |
493 | information is attached, the PG_Private flag should be set. This will | 493 | information is attached, the PG_Private flag should be set. This will |
494 | cause various VM routines to make extra calls into the address_space | 494 | cause various VM routines to make extra calls into the address_space |
495 | handler to deal with that data. | 495 | handler to deal with that data. |
496 | 496 | ||
497 | An address space acts as an intermediate between storage and | 497 | An address space acts as an intermediate between storage and |
498 | application. Data is read into the address space a whole page at a | 498 | application. Data is read into the address space a whole page at a |
499 | time, and provided to the application either by copying of the page, | 499 | time, and provided to the application either by copying of the page, |
500 | or by memory-mapping the page. | 500 | or by memory-mapping the page. |
501 | Data is written into the address space by the application, and then | 501 | Data is written into the address space by the application, and then |
502 | written back to storage, typically in whole pages; however, the | 502 | written back to storage, typically in whole pages; however, the |
503 | address_space has finer control of write sizes. | 503 | address_space has finer control of write sizes. |
504 | 504 | ||
505 | The read process essentially only requires 'readpage'. The write | 505 | The read process essentially only requires 'readpage'. The write |
506 | process is more complicated and uses prepare_write/commit_write or | 506 | process is more complicated and uses prepare_write/commit_write or |
507 | set_page_dirty to write data into the address_space, and writepage, | 507 | set_page_dirty to write data into the address_space, and writepage, |
508 | sync_page, and writepages to writeback data to storage. | 508 | sync_page, and writepages to writeback data to storage. |
509 | 509 | ||
510 | Adding and removing pages to/from an address_space is protected by the | 510 | Adding and removing pages to/from an address_space is protected by the |
511 | inode's i_mutex. | 511 | inode's i_mutex. |
512 | 512 | ||
513 | When data is written to a page, the PG_Dirty flag should be set. It | 513 | When data is written to a page, the PG_Dirty flag should be set. It |
514 | typically remains set until writepage asks for it to be written. This | 514 | typically remains set until writepage asks for it to be written. This |
515 | should clear PG_Dirty and set PG_Writeback. It can be actually | 515 | should clear PG_Dirty and set PG_Writeback. It can be actually |
516 | written at any point after PG_Dirty is clear. Once it is known to be | 516 | written at any point after PG_Dirty is clear. Once it is known to be |
517 | safe, PG_Writeback is cleared. | 517 | safe, PG_Writeback is cleared. |
518 | 518 | ||
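The flag lifecycle in the paragraph above (dirty, then writeback, then clean) can be modelled as a tiny state machine. The sketch below is a user-space toy under stated assumptions: `toy_page` and its helpers are hypothetical and only mirror the ordering of the PG_Dirty and PG_Writeback transitions, not any kernel locking or I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two page flags discussed above. */
struct toy_page {
    bool dirty;      /* models PG_Dirty */
    bool writeback;  /* models PG_Writeback */
};

/* Data was written into the page: it becomes dirty. */
static void toy_write_data(struct toy_page *p)
{
    p->dirty = true;
}

/*
 * Models the start of writeout: dirty is cleared and writeback is set.
 * Returns false when the page was not dirty and nothing was started.
 */
static bool toy_start_writeout(struct toy_page *p)
{
    if (!p->dirty)
        return false;
    p->dirty = false;
    p->writeback = true;
    return true;
}

/* Models I/O completion: the data is known to be safe, writeback ends. */
static void toy_end_writeout(struct toy_page *p)
{
    assert(p->writeback);
    p->writeback = false;
}
```

In this model a page is never left both clean and unwritten: dirty is cleared only in the same step that sets writeback.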
519 | Writeback makes use of a writeback_control structure... | 519 | Writeback makes use of a writeback_control structure... |
520 | 520 | ||
521 | struct address_space_operations | 521 | struct address_space_operations |
522 | ------------------------------- | 522 | ------------------------------- |
523 | 523 | ||
524 | This describes how the VFS can manipulate mapping of a file to page cache in | 524 | This describes how the VFS can manipulate mapping of a file to page cache in |
525 | your filesystem. As of kernel 2.6.16, the following members are defined: | 525 | your filesystem. As of kernel 2.6.16, the following members are defined: |
526 | 526 | ||
527 | struct address_space_operations { | 527 | struct address_space_operations { |
528 | int (*writepage)(struct page *page, struct writeback_control *wbc); | 528 | int (*writepage)(struct page *page, struct writeback_control *wbc); |
529 | int (*readpage)(struct file *, struct page *); | 529 | int (*readpage)(struct file *, struct page *); |
530 | int (*sync_page)(struct page *); | 530 | int (*sync_page)(struct page *); |
531 | int (*writepages)(struct address_space *, struct writeback_control *); | 531 | int (*writepages)(struct address_space *, struct writeback_control *); |
532 | int (*set_page_dirty)(struct page *page); | 532 | int (*set_page_dirty)(struct page *page); |
533 | int (*readpages)(struct file *filp, struct address_space *mapping, | 533 | int (*readpages)(struct file *filp, struct address_space *mapping, |
534 | struct list_head *pages, unsigned nr_pages); | 534 | struct list_head *pages, unsigned nr_pages); |
535 | int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); | 535 | int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); |
536 | int (*commit_write)(struct file *, struct page *, unsigned, unsigned); | 536 | int (*commit_write)(struct file *, struct page *, unsigned, unsigned); |
537 | sector_t (*bmap)(struct address_space *, sector_t); | 537 | sector_t (*bmap)(struct address_space *, sector_t); |
538 | int (*invalidatepage) (struct page *, unsigned long); | 538 | int (*invalidatepage) (struct page *, unsigned long); |
539 | int (*releasepage) (struct page *, int); | 539 | int (*releasepage) (struct page *, int); |
540 | ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov, | 540 | ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov, |
541 | loff_t offset, unsigned long nr_segs); | 541 | loff_t offset, unsigned long nr_segs); |
542 | struct page* (*get_xip_page)(struct address_space *, sector_t, | 542 | struct page* (*get_xip_page)(struct address_space *, sector_t, |
543 | int); | 543 | int); |
544 | /* migrate the contents of a page to the specified target */ | 544 | /* migrate the contents of a page to the specified target */ |
545 | int (*migratepage) (struct page *, struct page *); | 545 | int (*migratepage) (struct page *, struct page *); |
546 | }; | 546 | }; |
547 | 547 | ||
548 | writepage: called by the VM to write a dirty page to backing store. | 548 | writepage: called by the VM to write a dirty page to backing store. |
549 | This may happen for data integrity reasons (i.e. 'sync'), or | 549 | This may happen for data integrity reasons (i.e. 'sync'), or |
550 | to free up memory (flush). The difference can be seen in | 550 | to free up memory (flush). The difference can be seen in |
551 | wbc->sync_mode. | 551 | wbc->sync_mode. |
552 | The PG_Dirty flag has been cleared and PageLocked is true. | 552 | The PG_Dirty flag has been cleared and PageLocked is true. |
553 | writepage should start writeout, should set PG_Writeback, | 553 | writepage should start writeout, should set PG_Writeback, |
554 | and should make sure the page is unlocked, either synchronously | 554 | and should make sure the page is unlocked, either synchronously |
555 | or asynchronously when the write operation completes. | 555 | or asynchronously when the write operation completes. |
556 | 556 | ||
557 | If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to | 557 | If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to |
558 | try too hard if there are problems, and may choose to write out | 558 | try too hard if there are problems, and may choose to write out |
559 | other pages from the mapping if that is easier (e.g. due to | 559 | other pages from the mapping if that is easier (e.g. due to |
560 | internal dependencies). If it chooses not to start writeout, it | 560 | internal dependencies). If it chooses not to start writeout, it |
561 | should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep | 561 | should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep |
562 | calling ->writepage on that page. | 562 | calling ->writepage on that page. |
563 | 563 | ||
564 | See the file "Locking" for more details. | 564 | See the file "Locking" for more details. |
565 | 565 | ||
566 | readpage: called by the VM to read a page from backing store. | 566 | readpage: called by the VM to read a page from backing store. |
567 | The page will be Locked when readpage is called, and should be | 567 | The page will be Locked when readpage is called, and should be |
568 | unlocked and marked uptodate once the read completes. | 568 | unlocked and marked uptodate once the read completes. |
569 | If ->readpage discovers that it needs to unlock the page for | 569 | If ->readpage discovers that it needs to unlock the page for |
570 | some reason, it can do so, and then return AOP_TRUNCATED_PAGE. | 570 | some reason, it can do so, and then return AOP_TRUNCATED_PAGE. |
571 | In this case, the page will be relocated, relocked and if | 571 | In this case, the page will be relocated, relocked and if |
572 | that all succeeds, ->readpage will be called again. | 572 | that all succeeds, ->readpage will be called again. |
573 | 573 | ||
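The retry behaviour described for readpage above can be sketched as a caller-side loop. This is a user-space illustration only: `TOY_TRUNCATED_PAGE` stands in for AOP_TRUNCATED_PAGE, and `toy_rpage`, `toy_readpage` and `toy_read_with_retry` are hypothetical names, with the `unlock_first` counter added purely to force retries.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TRUNCATED_PAGE 1   /* stand-in for AOP_TRUNCATED_PAGE */

struct toy_rpage {
    bool locked;
    bool uptodate;
    int  unlock_first;  /* how many times readpage bails out early (test knob) */
};

/* Models ->readpage(): entered with the page locked, leaves it unlocked. */
static int toy_readpage(struct toy_rpage *p)
{
    assert(p->locked);
    if (p->unlock_first > 0) {
        p->unlock_first--;
        p->locked = false;          /* had to drop the lock: ask for a retry */
        return TOY_TRUNCATED_PAGE;
    }
    p->uptodate = true;             /* read completed */
    p->locked = false;
    return 0;
}

/* Models the caller: relock and call readpage again until it stops asking. */
static int toy_read_with_retry(struct toy_rpage *p, int *calls)
{
    int err;

    do {
        p->locked = true;           /* models finding and relocking the page */
        (*calls)++;
        err = toy_readpage(p);
    } while (err == TOY_TRUNCATED_PAGE);
    return err;
}
```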
574 | sync_page: called by the VM to notify the backing store to perform all | 574 | sync_page: called by the VM to notify the backing store to perform all |
575 | queued I/O operations for a page. I/O operations for other pages | 575 | queued I/O operations for a page. I/O operations for other pages |
576 | associated with this address_space object may also be performed. | 576 | associated with this address_space object may also be performed. |
577 | 577 | ||
578 | This function is optional and is called only for pages with | 578 | This function is optional and is called only for pages with |
579 | PG_Writeback set while waiting for the writeback to complete. | 579 | PG_Writeback set while waiting for the writeback to complete. |
580 | 580 | ||
581 | writepages: called by the VM to write out pages associated with the | 581 | writepages: called by the VM to write out pages associated with the |
582 | address_space object. If wbc->sync_mode is WB_SYNC_ALL, then | 582 | address_space object. If wbc->sync_mode is WB_SYNC_ALL, then |
583 | the writeback_control will specify a range of pages that must be | 583 | the writeback_control will specify a range of pages that must be |
584 | written out. If it is WB_SYNC_NONE, then a nr_to_write is given | 584 | written out. If it is WB_SYNC_NONE, then a nr_to_write is given |
585 | and that many pages should be written if possible. | 585 | and that many pages should be written if possible. |
586 | If no ->writepages is given, then mpage_writepages is used | 586 | If no ->writepages is given, then mpage_writepages is used |
587 | instead. This will choose pages from the address space that are | 587 | instead. This will choose pages from the address space that are |
588 | tagged as DIRTY and will pass them to ->writepage. | 588 | tagged as DIRTY and will pass them to ->writepage. |
589 | 589 | ||
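The two writepages modes described above can be modelled in userspace. This is a minimal sketch and not kernel code: toy_wbc, its field names and toy_writepages are hypothetical stand-ins, and clearing a dirty flag stands in for passing the page to ->writepage. It shows only the selection rule: WBC_SYNC_ALL writes every dirty page in the given range, while WBC_SYNC_NONE stops once nr_to_write pages have been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
enum toy_sync_mode { TOY_SYNC_NONE, TOY_SYNC_ALL };

struct toy_wbc {
	enum toy_sync_mode sync_mode;
	size_t range_start;	/* first page index, used for TOY_SYNC_ALL */
	size_t range_end;	/* last page index (inclusive), TOY_SYNC_ALL */
	long nr_to_write;	/* page budget, used for TOY_SYNC_NONE */
};

/* Write out dirty pages according to wbc; return the number written. */
static long toy_writepages(bool *dirty, size_t npages, struct toy_wbc *wbc)
{
	size_t first = 0, last, i;
	long written = 0;

	if (npages == 0)
		return 0;
	last = npages - 1;

	if (wbc->sync_mode == TOY_SYNC_ALL) {
		/* Every dirty page in the range must be written. */
		first = wbc->range_start;
		if (wbc->range_end < last)
			last = wbc->range_end;
	}

	for (i = first; i <= last; i++) {
		if (!dirty[i])
			continue;
		/* Best effort only: stop once the budget is used up. */
		if (wbc->sync_mode == TOY_SYNC_NONE && wbc->nr_to_write <= 0)
			break;
		dirty[i] = false;	/* stands in for ->writepage */
		written++;
		if (wbc->sync_mode == TOY_SYNC_NONE)
			wbc->nr_to_write--;
	}
	return written;
}
```

In this model a TOY_SYNC_ALL call ignores nr_to_write entirely and a TOY_SYNC_NONE call ignores the range, which is a simplification chosen to keep the two cases apart.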
590 | set_page_dirty: called by the VM to set a page dirty. | 590 | set_page_dirty: called by the VM to set a page dirty. |
591 | This is particularly needed if an address space attaches | 591 | This is particularly needed if an address space attaches |
592 | private data to a page, and that data needs to be updated when | 592 | private data to a page, and that data needs to be updated when |
593 | a page is dirtied. This is called, for example, when a memory | 593 | a page is dirtied. This is called, for example, when a memory |
594 | mapped page gets modified. | 594 | mapped page gets modified. |
595 | If defined, it should set the PageDirty flag, and the | 595 | If defined, it should set the PageDirty flag, and the |
596 | PAGECACHE_TAG_DIRTY tag in the radix tree. | 596 | PAGECACHE_TAG_DIRTY tag in the radix tree. |
597 | 597 | ||
598 | readpages: called by the VM to read pages associated with the address_space | 598 | readpages: called by the VM to read pages associated with the address_space |
599 | object. This is essentially just a vector version of | 599 | object. This is essentially just a vector version of |
600 | readpage. Instead of just one page, several pages are | 600 | readpage. Instead of just one page, several pages are |
601 | requested. | 601 | requested. |
602 | readpages is only used for read-ahead, so read errors are | 602 | readpages is only used for read-ahead, so read errors are |
603 | ignored. If anything goes wrong, feel free to give up. | 603 | ignored. If anything goes wrong, feel free to give up. |
604 | 604 | ||
605 | prepare_write: called by the generic write path in VM to set up a write | 605 | prepare_write: called by the generic write path in VM to set up a write |
606 | request for a page. This indicates to the address space that | 606 | request for a page. This indicates to the address space that |
607 | the given range of bytes is about to be written. The | 607 | the given range of bytes is about to be written. The |
608 | address_space should check that the write will be able to | 608 | address_space should check that the write will be able to |
609 | complete, by allocating space if necessary and doing any other | 609 | complete, by allocating space if necessary and doing any other |
610 | internal housekeeping. If the write will update parts of | 610 | internal housekeeping. If the write will update parts of |
611 | any basic-blocks on storage, then those blocks should be | 611 | any basic-blocks on storage, then those blocks should be |
612 | pre-read (if they haven't been read already) so that the | 612 | pre-read (if they haven't been read already) so that the |
613 | updated blocks can be written out properly. | 613 | updated blocks can be written out properly. |
614 | The page will be locked. If prepare_write wants to unlock the | 614 | The page will be locked. If prepare_write wants to unlock the |
615 | page it, like readpage, may do so and return | 615 | page it, like readpage, may do so and return |
616 | AOP_TRUNCATED_PAGE. | 616 | AOP_TRUNCATED_PAGE. |
617 | In this case the prepare_write will be retried once the lock is | 617 | In this case the prepare_write will be retried once the lock is |
618 | regained. | 618 | regained. |
619 | 619 | ||
620 | commit_write: If prepare_write succeeds, new data will be copied | 620 | commit_write: If prepare_write succeeds, new data will be copied |
621 | into the page and then commit_write will be called. It will | 621 | into the page and then commit_write will be called. It will |
622 | typically update the size of the file (if appropriate) and | 622 | typically update the size of the file (if appropriate) and |
623 | mark the inode as dirty, and do any other related housekeeping | 623 | mark the inode as dirty, and do any other related housekeeping |
624 | operations. It should avoid returning an error if possible - | 624 | operations. It should avoid returning an error if possible - |
625 | errors should have been handled by prepare_write. | 625 | errors should have been handled by prepare_write. |
626 | 626 | ||
627 | bmap: called by the VFS to map a logical block offset within an object to a | 627 | bmap: called by the VFS to map a logical block offset within an object to a |
628 | physical block number. This method is used by the FIBMAP | 628 | physical block number. This method is used by the FIBMAP |
629 | ioctl and for working with swap-files. To be able to swap to | 629 | ioctl and for working with swap-files. To be able to swap to |
630 | a file, the file must have a stable mapping to a block | 630 | a file, the file must have a stable mapping to a block |
631 | device. The swap system does not go through the filesystem | 631 | device. The swap system does not go through the filesystem |
632 | but instead uses bmap to find out where the blocks in the file | 632 | but instead uses bmap to find out where the blocks in the file |
633 | are and uses those addresses directly. | 633 | are and uses those addresses directly. |
634 | 634 | ||
635 | 635 | ||
636 | invalidatepage: If a page has PagePrivate set, then invalidatepage | 636 | invalidatepage: If a page has PagePrivate set, then invalidatepage |
637 | will be called when part or all of the page is to be removed | 637 | will be called when part or all of the page is to be removed |
638 | from the address space. This generally corresponds to either a | 638 | from the address space. This generally corresponds to either a |
639 | truncation or a complete invalidation of the address space | 639 | truncation or a complete invalidation of the address space |
640 | (in the latter case 'offset' will always be 0). | 640 | (in the latter case 'offset' will always be 0). |
641 | Any private data associated with the page should be updated | 641 | Any private data associated with the page should be updated |
642 | to reflect this truncation. If offset is 0, then | 642 | to reflect this truncation. If offset is 0, then |
643 | the private data should be released, because the page | 643 | the private data should be released, because the page |
644 | must be able to be completely discarded. This may be done by | 644 | must be able to be completely discarded. This may be done by |
645 | calling the ->releasepage function, but in this case the | 645 | calling the ->releasepage function, but in this case the |
646 | release MUST succeed. | 646 | release MUST succeed. |
647 | 647 | ||
648 | releasepage: releasepage is called on PagePrivate pages to indicate | 648 | releasepage: releasepage is called on PagePrivate pages to indicate |
649 | that the page should be freed if possible. ->releasepage | 649 | that the page should be freed if possible. ->releasepage |
650 | should remove any private data from the page and clear the | 650 | should remove any private data from the page and clear the |
651 | PagePrivate flag. It may also remove the page from the | 651 | PagePrivate flag. It may also remove the page from the |
652 | address_space. If this fails for some reason, it may indicate | 652 | address_space. If this fails for some reason, it may indicate |
653 | failure with a 0 return value. | 653 | failure with a 0 return value. |
654 | This is used in two distinct though related cases. The first | 654 | This is used in two distinct though related cases. The first |
655 | is when the VM finds a clean page with no active users and | 655 | is when the VM finds a clean page with no active users and |
656 | wants to make it a free page. If ->releasepage succeeds, the | 656 | wants to make it a free page. If ->releasepage succeeds, the |
657 | page will be removed from the address_space and become free. | 657 | page will be removed from the address_space and become free. |
658 | 658 | ||
659 | The second case is when a request has been made to invalidate | 659 | The second case is when a request has been made to invalidate |
660 | some or all pages in an address_space. This can happen | 660 | some or all pages in an address_space. This can happen |
661 | through the fadvise(POSIX_FADV_DONTNEED) system call or by the | 661 | through the fadvise(POSIX_FADV_DONTNEED) system call or by the |
662 | filesystem explicitly requesting it as nfs and 9fs do (when | 662 | filesystem explicitly requesting it as nfs and 9fs do (when |
663 | they believe the cache may be out of date with storage) by | 663 | they believe the cache may be out of date with storage) by |
664 | calling invalidate_inode_pages2(). | 664 | calling invalidate_inode_pages2(). |
665 | If the filesystem makes such a call, and needs to be certain | 665 | If the filesystem makes such a call, and needs to be certain |
666 | that all pages are invalidated, then its releasepage will | 666 | that all pages are invalidated, then its releasepage will |
667 | need to ensure this. Possibly it can clear the PageUptodate | 667 | need to ensure this. Possibly it can clear the PageUptodate |
668 | bit if it cannot free private data yet. | 668 | bit if it cannot free private data yet. |
669 | 669 | ||
670 | direct_IO: called by the generic read/write routines to perform | 670 | direct_IO: called by the generic read/write routines to perform |
671 | direct_IO - that is IO requests which bypass the page cache | 671 | direct_IO - that is IO requests which bypass the page cache |
672 | and transfer data directly between the storage and the | 672 | and transfer data directly between the storage and the |
673 | application's address space. | 673 | application's address space. |
674 | 674 | ||
675 | get_xip_page: called by the VM to translate a block number to a page. | 675 | get_xip_page: called by the VM to translate a block number to a page. |
676 | The page is valid until the corresponding filesystem is unmounted. | 676 | The page is valid until the corresponding filesystem is unmounted. |
677 | Filesystems that want to use execute-in-place (XIP) need to implement | 677 | Filesystems that want to use execute-in-place (XIP) need to implement |
678 | it. An example implementation can be found in fs/ext2/xip.c. | 678 | it. An example implementation can be found in fs/ext2/xip.c. |
679 | 679 | ||
680 | migrate_page: This is used to compact the physical memory usage. | 680 | migrate_page: This is used to compact the physical memory usage. |
681 | If the VM wants to relocate a page (maybe off a memory card | 681 | If the VM wants to relocate a page (maybe off a memory card |
682 | that is signalling imminent failure) it will pass a new page | 682 | that is signalling imminent failure) it will pass a new page |
683 | and an old page to this function. migrate_page should | 683 | and an old page to this function. migrate_page should |
684 | transfer any private data across and update any references | 684 | transfer any private data across and update any references |
685 | that it has to the page. | 685 | that it has to the page. |
686 | 686 | ||
687 | The File Object | 687 | The File Object |
688 | =============== | 688 | =============== |
689 | 689 | ||
690 | A file object represents a file opened by a process. | 690 | A file object represents a file opened by a process. |
691 | 691 | ||
692 | 692 | ||
693 | struct file_operations | 693 | struct file_operations |
694 | ---------------------- | 694 | ---------------------- |
695 | 695 | ||
696 | This describes how the VFS can manipulate an open file. As of kernel | 696 | This describes how the VFS can manipulate an open file. As of kernel |
697 | 2.6.17, the following members are defined: | 697 | 2.6.17, the following members are defined: |
698 | 698 | ||
699 | struct file_operations { | 699 | struct file_operations { |
700 | loff_t (*llseek) (struct file *, loff_t, int); | 700 | loff_t (*llseek) (struct file *, loff_t, int); |
701 | ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); | 701 | ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); |
702 | ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); | 702 | ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); |
703 | ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t); | 703 | ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t); |
704 | ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); | 704 | ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); |
705 | int (*readdir) (struct file *, void *, filldir_t); | 705 | int (*readdir) (struct file *, void *, filldir_t); |
706 | unsigned int (*poll) (struct file *, struct poll_table_struct *); | 706 | unsigned int (*poll) (struct file *, struct poll_table_struct *); |
707 | int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); | 707 | int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); |
708 | long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); | 708 | long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); |
709 | long (*compat_ioctl) (struct file *, unsigned int, unsigned long); | 709 | long (*compat_ioctl) (struct file *, unsigned int, unsigned long); |
710 | int (*mmap) (struct file *, struct vm_area_struct *); | 710 | int (*mmap) (struct file *, struct vm_area_struct *); |
711 | int (*open) (struct inode *, struct file *); | 711 | int (*open) (struct inode *, struct file *); |
712 | int (*flush) (struct file *); | 712 | int (*flush) (struct file *); |
713 | int (*release) (struct inode *, struct file *); | 713 | int (*release) (struct inode *, struct file *); |
714 | int (*fsync) (struct file *, struct dentry *, int datasync); | 714 | int (*fsync) (struct file *, struct dentry *, int datasync); |
715 | int (*aio_fsync) (struct kiocb *, int datasync); | 715 | int (*aio_fsync) (struct kiocb *, int datasync); |
716 | int (*fasync) (int, struct file *, int); | 716 | int (*fasync) (int, struct file *, int); |
717 | int (*lock) (struct file *, int, struct file_lock *); | 717 | int (*lock) (struct file *, int, struct file_lock *); |
718 | ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); | 718 | ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); |
719 | ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); | 719 | ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); |
720 | ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *); | 720 | ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *); |
721 | ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); | 721 | ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); |
722 | unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); | 722 | unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); |
723 | int (*check_flags)(int); | 723 | int (*check_flags)(int); |
724 | int (*dir_notify)(struct file *filp, unsigned long arg); | 724 | int (*dir_notify)(struct file *filp, unsigned long arg); |
725 | int (*flock) (struct file *, int, struct file_lock *); | 725 | int (*flock) (struct file *, int, struct file_lock *); |
726 | ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned | 726 | ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned |
727 | int); | 727 | int); |
728 | ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned | 728 | ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned |
729 | int); | 729 | int); |
730 | }; | 730 | }; |
731 | 731 | ||
732 | Again, all methods are called without any locks being held, unless | 732 | Again, all methods are called without any locks being held, unless |
733 | otherwise noted. | 733 | otherwise noted. |
734 | 734 | ||
735 | llseek: called when the VFS needs to move the file position index | 735 | llseek: called when the VFS needs to move the file position index |
736 | 736 | ||
737 | read: called by read(2) and related system calls | 737 | read: called by read(2) and related system calls |
738 | 738 | ||
739 | aio_read: called by io_submit(2) and other asynchronous I/O operations | 739 | aio_read: called by io_submit(2) and other asynchronous I/O operations |
740 | 740 | ||
741 | write: called by write(2) and related system calls | 741 | write: called by write(2) and related system calls |
742 | 742 | ||
743 | aio_write: called by io_submit(2) and other asynchronous I/O operations | 743 | aio_write: called by io_submit(2) and other asynchronous I/O operations |
744 | 744 | ||
745 | readdir: called when the VFS needs to read the directory contents | 745 | readdir: called when the VFS needs to read the directory contents |
746 | 746 | ||
747 | poll: called by the VFS when a process wants to check if there is | 747 | poll: called by the VFS when a process wants to check if there is |
748 | activity on this file and (optionally) go to sleep until there | 748 | activity on this file and (optionally) go to sleep until there |
749 | is activity. Called by the select(2) and poll(2) system calls | 749 | is activity. Called by the select(2) and poll(2) system calls |
750 | 750 | ||
751 | ioctl: called by the ioctl(2) system call | 751 | ioctl: called by the ioctl(2) system call |
752 | 752 | ||
753 | unlocked_ioctl: called by the ioctl(2) system call. Filesystems that do not | 753 | unlocked_ioctl: called by the ioctl(2) system call. Filesystems that do not |
754 | require the BKL should use this method instead of the ioctl() above. | 754 | require the BKL should use this method instead of the ioctl() above. |
755 | 755 | ||
756 | compat_ioctl: called by the ioctl(2) system call when 32 bit system calls | 756 | compat_ioctl: called by the ioctl(2) system call when 32 bit system calls |
757 | are used on 64 bit kernels. | 757 | are used on 64 bit kernels. |
758 | 758 | ||
759 | mmap: called by the mmap(2) system call | 759 | mmap: called by the mmap(2) system call |
760 | 760 | ||
761 | open: called by the VFS when an inode should be opened. When the VFS | 761 | open: called by the VFS when an inode should be opened. When the VFS |
762 | opens a file, it creates a new "struct file". It then calls the | 762 | opens a file, it creates a new "struct file". It then calls the |
763 | open method for the newly allocated file structure. You might | 763 | open method for the newly allocated file structure. You might |
764 | think that the open method really belongs in | 764 | think that the open method really belongs in |
765 | "struct inode_operations", and you may be right. I think it's | 765 | "struct inode_operations", and you may be right. I think it's |
766 | done the way it is because it makes filesystems simpler to | 766 | done the way it is because it makes filesystems simpler to |
767 | implement. The open() method is a good place to initialize the | 767 | implement. The open() method is a good place to initialize the |
768 | "private_data" member in the file structure if you want to point | 768 | "private_data" member in the file structure if you want to point |
769 | to a device structure | 769 | to a device structure |
770 | 770 | ||
771 | flush: called by the close(2) system call to flush a file | 771 | flush: called by the close(2) system call to flush a file |
772 | 772 | ||
773 | release: called when the last reference to an open file is closed | 773 | release: called when the last reference to an open file is closed |
774 | 774 | ||
775 | fsync: called by the fsync(2) system call | 775 | fsync: called by the fsync(2) system call |
776 | 776 | ||
777 | fasync: called by the fcntl(2) system call when asynchronous | 777 | fasync: called by the fcntl(2) system call when asynchronous |
778 | (non-blocking) mode is enabled for a file | 778 | (non-blocking) mode is enabled for a file |
779 | 779 | ||
780 | lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW | 780 | lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW |
781 | commands | 781 | commands |
782 | 782 | ||
783 | readv: called by the readv(2) system call | 783 | readv: called by the readv(2) system call |
784 | 784 | ||
785 | writev: called by the writev(2) system call | 785 | writev: called by the writev(2) system call |
786 | 786 | ||
787 | sendfile: called by the sendfile(2) system call | 787 | sendfile: called by the sendfile(2) system call |
788 | 788 | ||
789 | get_unmapped_area: called by the mmap(2) system call | 789 | get_unmapped_area: called by the mmap(2) system call |
790 | 790 | ||
791 | check_flags: called by the fcntl(2) system call for F_SETFL command | 791 | check_flags: called by the fcntl(2) system call for F_SETFL command |
792 | 792 | ||
793 | dir_notify: called by the fcntl(2) system call for F_NOTIFY command | 793 | dir_notify: called by the fcntl(2) system call for F_NOTIFY command |
794 | 794 | ||
795 | flock: called by the flock(2) system call | 795 | flock: called by the flock(2) system call |
796 | 796 | ||
797 | splice_write: called by the VFS to splice data from a pipe to a file. This | 797 | splice_write: called by the VFS to splice data from a pipe to a file. This |
798 | method is used by the splice(2) system call | 798 | method is used by the splice(2) system call |
799 | 799 | ||
800 | splice_read: called by the VFS to splice data from a file to a pipe. This | 800 | splice_read: called by the VFS to splice data from a file to a pipe. This |
801 | method is used by the splice(2) system call | 801 | method is used by the splice(2) system call |
802 | 802 | ||
803 | Note that the file operations are implemented by the specific | 803 | Note that the file operations are implemented by the specific |
804 | filesystem in which the inode resides. When opening a device node | 804 | filesystem in which the inode resides. When opening a device node |
805 | (character or block special) most filesystems will call special | 805 | (character or block special) most filesystems will call special |
806 | support routines in the VFS which will locate the required device | 806 | support routines in the VFS which will locate the required device |
807 | driver information. These support routines replace the filesystem file | 807 | driver information. These support routines replace the filesystem file |
808 | operations with those for the device driver, and then proceed to call | 808 | operations with those for the device driver, and then proceed to call |
809 | the new open() method for the file. This is how opening a device file | 809 | the new open() method for the file. This is how opening a device file |
810 | in the filesystem eventually ends up calling the device driver open() | 810 | in the filesystem eventually ends up calling the device driver open() |
811 | method. | 811 | method. |
812 | 812 | ||
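The hand-over from filesystem to device driver described above can be sketched in userspace. This is an illustrative model under stated assumptions, not kernel code: struct file_ops, lookup_driver_ops(), special_node_open() and the demo_* names are hypothetical, and a real kernel locates the driver by device number rather than returning a fixed table.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock-ups; the names echo the kernel's but are NOT kernel types. */
struct file;

struct file_ops {
	int (*open)(struct file *);
};

struct file {
	const struct file_ops *f_op;	/* current operations table */
	void *private_data;		/* typically set by the driver's open() */
};

/* Hypothetical device state and driver operations. */
static int demo_device_state = 42;

static int demo_driver_open(struct file *filp)
{
	/* open() is a good place to point private_data at device state. */
	filp->private_data = &demo_device_state;
	return 0;
}

static const struct file_ops demo_driver_fops = {
	.open = demo_driver_open,
};

/* Stand-in for the VFS support routine that locates the driver. */
static const struct file_ops *lookup_driver_ops(void)
{
	return &demo_driver_fops;
}

/* open() used for device nodes: swap in the driver's table, then call it. */
static int special_node_open(struct file *filp)
{
	filp->f_op = lookup_driver_ops();
	if (filp->f_op->open)
		return filp->f_op->open(filp);
	return 0;
}

/* Operations the filesystem installs on a device-node file. */
static const struct file_ops fs_special_fops = {
	.open = special_node_open,
};
```

A caller that starts with a file whose f_op is &fs_special_fops and invokes f_op->open() ends up with f_op pointing at the driver's table, so every later operation on that file goes to the driver.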
813 | 813 | ||
814 | Directory Entry Cache (dcache) | 814 | Directory Entry Cache (dcache) |
815 | ============================== | 815 | ============================== |
816 | 816 | ||
817 | 817 | ||
818 | struct dentry_operations | 818 | struct dentry_operations |
819 | ------------------------ | 819 | ------------------------ |
820 | 820 | ||
821 | This describes how a filesystem can overload the standard dentry | 821 | This describes how a filesystem can overload the standard dentry |
822 | operations. Dentries and the dcache are the domain of the VFS and the | 822 | operations. Dentries and the dcache are the domain of the VFS and the |
823 | individual filesystem implementations. Device drivers have no business | 823 | individual filesystem implementations. Device drivers have no business |
824 | here. These methods may be set to NULL, as they are either optional or | 824 | here. These methods may be set to NULL, as they are either optional or |
825 | the VFS uses a default. As of kernel 2.6.13, the following members are | 825 | the VFS uses a default. As of kernel 2.6.13, the following members are |
826 | defined: | 826 | defined: |
827 | 827 | ||
828 | struct dentry_operations { | 828 | struct dentry_operations { |
829 | int (*d_revalidate)(struct dentry *, struct nameidata *); | 829 | int (*d_revalidate)(struct dentry *, struct nameidata *); |
830 | int (*d_hash) (struct dentry *, struct qstr *); | 830 | int (*d_hash) (struct dentry *, struct qstr *); |
831 | int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); | 831 | int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); |
832 | int (*d_delete)(struct dentry *); | 832 | int (*d_delete)(struct dentry *); |
833 | void (*d_release)(struct dentry *); | 833 | void (*d_release)(struct dentry *); |
834 | void (*d_iput)(struct dentry *, struct inode *); | 834 | void (*d_iput)(struct dentry *, struct inode *); |
835 | }; | 835 | }; |
836 | 836 | ||
837 | d_revalidate: called when the VFS needs to revalidate a dentry. This | 837 | d_revalidate: called when the VFS needs to revalidate a dentry. This |
838 | is called whenever a name look-up finds a dentry in the | 838 | is called whenever a name look-up finds a dentry in the |
839 | dcache. Most filesystems leave this as NULL, because all their | 839 | dcache. Most filesystems leave this as NULL, because all their |
840 | dentries in the dcache are valid | 840 | dentries in the dcache are valid |
841 | 841 | ||
842 | d_hash: called when the VFS adds a dentry to the hash table | 842 | d_hash: called when the VFS adds a dentry to the hash table |
843 | 843 | ||
844 | d_compare: called when a dentry should be compared with another | 844 | d_compare: called when a dentry should be compared with another |
845 | 845 | ||
846 | d_delete: called when the last reference to a dentry is | 846 | d_delete: called when the last reference to a dentry is |
847 | deleted. This means no-one is using the dentry, however it is | 847 | deleted. This means no-one is using the dentry, however it is |
848 | still valid and in the dcache | 848 | still valid and in the dcache |
849 | 849 | ||
850 | d_release: called when a dentry is really deallocated | 850 | d_release: called when a dentry is really deallocated |
851 | 851 | ||
852 | d_iput: called when a dentry loses its inode (just prior to its | 852 | d_iput: called when a dentry loses its inode (just prior to its |
853 | being deallocated). The default when this is NULL is that the | 853 | being deallocated). The default when this is NULL is that the |
854 | VFS calls iput(). If you define this method, you must call | 854 | VFS calls iput(). If you define this method, you must call |
855 | iput() yourself | 855 | iput() yourself |
856 | 856 | ||
857 | Each dentry has a pointer to its parent dentry, as well as a hash list | 857 | Each dentry has a pointer to its parent dentry, as well as a hash list |
858 | of child dentries. Child dentries are basically like files in a | 858 | of child dentries. Child dentries are basically like files in a |
859 | directory. | 859 | directory. |
860 | 860 | ||
861 | 861 | ||
862 | Directory Entry Cache API | 862 | Directory Entry Cache API |
863 | -------------------------- | 863 | -------------------------- |
864 | 864 | ||
865 | There are a number of functions defined which permit a filesystem to | 865 | There are a number of functions defined which permit a filesystem to |
866 | manipulate dentries: | 866 | manipulate dentries: |
867 | 867 | ||
868 | dget: open a new handle for an existing dentry (this just increments | 868 | dget: open a new handle for an existing dentry (this just increments |
869 | the usage count) | 869 | the usage count) |
870 | 870 | ||
871 | dput: close a handle for a dentry (decrements the usage count). If | 871 | dput: close a handle for a dentry (decrements the usage count). If |
872 | the usage count drops to 0, the "d_delete" method is called | 872 | the usage count drops to 0, the "d_delete" method is called |
873 | and the dentry is placed on the unused list if the dentry is | 873 | and the dentry is placed on the unused list if the dentry is |
874 | still in its parent's hash list. Putting the dentry on the | 874 | still in its parent's hash list. Putting the dentry on the |
875 | unused list just means that if the system needs some RAM, it | 875 | unused list just means that if the system needs some RAM, it |
876 | goes through the unused list of dentries and deallocates them. | 876 | goes through the unused list of dentries and deallocates them. |
877 | If the dentry has already been unhashed when the usage count | 877 | If the dentry has already been unhashed when the usage count |
878 | drops to 0, the dentry is deallocated after the | 878 | drops to 0, the dentry is deallocated after the |
879 | "d_delete" method is called | 879 | "d_delete" method is called |
880 | 880 | ||
881 | d_drop: this unhashes a dentry from its parent's hash list. A | 881 | d_drop: this unhashes a dentry from its parent's hash list. A |
882 | subsequent call to dput() will deallocate the dentry if its | 882 | subsequent call to dput() will deallocate the dentry if its |
883 | usage count drops to 0 | 883 | usage count drops to 0 |
884 | 884 | ||
885 | d_delete: delete a dentry. If there are no other open references to | 885 | d_delete: delete a dentry. If there are no other open references to |
886 | the dentry then the dentry is turned into a negative dentry | 886 | the dentry then the dentry is turned into a negative dentry |
887 | (the d_iput() method is called). If there are other | 887 | (the d_iput() method is called). If there are other |
888 | references, then d_drop() is called instead | 888 | references, then d_drop() is called instead |
889 | 889 | ||
890 | d_add: adds a dentry to its parent's hash list and then calls | 890 | d_add: adds a dentry to its parent's hash list and then calls |
891 | d_instantiate() | 891 | d_instantiate() |
892 | 892 | ||
893 | d_instantiate: adds a dentry to the alias hash list for the inode and | 893 | d_instantiate: adds a dentry to the alias hash list for the inode and |
894 | updates the "d_inode" member. The "i_count" member in the | 894 | updates the "d_inode" member. The "i_count" member in the |
895 | inode structure should be set/incremented. If the inode | 895 | inode structure should be set/incremented. If the inode |
896 | pointer is NULL, the dentry is called a "negative | 896 | pointer is NULL, the dentry is called a "negative |
897 | dentry". This function is commonly called when an inode is | 897 | dentry". This function is commonly called when an inode is |
898 | created for an existing negative dentry | 898 | created for an existing negative dentry |
899 | 899 | ||
900 | d_lookup: look up a dentry given its parent and path name component. | 900 | d_lookup: look up a dentry given its parent and path name component. |
901 | It looks up the child of that given name from the dcache | 901 | It looks up the child of that given name from the dcache |
902 | hash table. If it is found, the reference count is incremented | 902 | hash table. If it is found, the reference count is incremented |
903 | and the dentry is returned. The caller must use dput() | 903 | and the dentry is returned. The caller must use dput() |
904 | to free the dentry when it finishes using it. | 904 | to free the dentry when it finishes using it. |
905 | 905 | ||
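The usage-count rules for dget, dput and d_drop listed above can be modelled with a toy structure. This is a userspace sketch only: struct toy_dentry and the toy_* functions are hypothetical, locking is ignored, and "freed" is recorded as a flag rather than by releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; fields are illustrative, not the kernel's struct dentry. */
struct toy_dentry {
	int count;		/* usage count */
	bool hashed;		/* still in the parent's hash list? */
	bool on_unused;		/* parked on the unused list */
	bool freed;		/* deallocated */
	int d_delete_calls;	/* times the "d_delete" method has run */
};

/* Open a new handle: just increments the usage count. */
static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->count++;
	return d;
}

/* Unhash the dentry from its parent's hash list. */
static void toy_d_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

/* Close a handle; act only when the usage count drops to 0. */
static void toy_dput(struct toy_dentry *d)
{
	assert(d->count > 0);
	if (--d->count > 0)
		return;

	d->d_delete_calls++;		/* the "d_delete" method is called */
	if (d->hashed)
		d->on_unused = true;	/* kept until the system needs RAM */
	else
		d->freed = true;	/* already unhashed: deallocate */
}
```

In this model, a hashed dentry whose count reaches 0 lands on the unused list, while one that was passed to toy_d_drop() first is marked freed by the final toy_dput().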
906 | For further information on dentry locking, please refer to the document | 906 | For further information on dentry locking, please refer to the document |
907 | Documentation/filesystems/dentry-locking.txt. | 907 | Documentation/filesystems/dentry-locking.txt. |
908 | 908 | ||
909 | 909 | ||
910 | Resources | 910 | Resources |
911 | ========= | 911 | ========= |
912 | 912 | ||
913 | (Note some of these resources are not up-to-date with the latest kernel | 913 | (Note some of these resources are not up-to-date with the latest kernel |
914 | version.) | 914 | version.) |
915 | 915 | ||
916 | Creating Linux virtual filesystems. 2002 | 916 | Creating Linux virtual filesystems. 2002 |
917 | <http://lwn.net/Articles/13325/> | 917 | <http://lwn.net/Articles/13325/> |
918 | 918 | ||
919 | The Linux Virtual File-system Layer by Neil Brown. 1999 | 919 | The Linux Virtual File-system Layer by Neil Brown. 1999 |
920 | <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html> | 920 | <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html> |
921 | 921 | ||
922 | A tour of the Linux VFS by Michael K. Johnson. 1996 | 922 | A tour of the Linux VFS by Michael K. Johnson. 1996 |
923 | <http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html> | 923 | <http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html> |
924 | 924 | ||
925 | A small trail through the Linux kernel by Andries Brouwer. 2001 | 925 | A small trail through the Linux kernel by Andries Brouwer. 2001 |
926 | <http://www.win.tue.nl/~aeb/linux/vfs/trail.html> | 926 | <http://www.win.tue.nl/~aeb/linux/vfs/trail.html> |
927 | 927 |
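The reference-counting contract described for d_lookup and d_instantiate in vfs.txt above can be illustrated with a toy model. This is only a sketch under stated assumptions: the names ToyDentry, ToyDcache, instantiate, lookup and put are hypothetical and exist purely for illustration. They are not kernel APIs, and the real dcache adds locking, hashing details and dentry lifetime handling that are omitted here.

```python
class ToyDentry:
    """Hypothetical stand-in for a dentry: a name under a parent, maybe an inode."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name
        self.inode = None      # None models a "negative dentry"
        self.count = 0         # reference count held by callers

    @property
    def negative(self):
        return self.inode is None


class ToyDcache:
    """Hypothetical hash table keyed by (parent, path name component)."""

    def __init__(self):
        self._table = {}

    def add(self, dentry):
        self._table[(id(dentry.parent), dentry.name)] = dentry

    def instantiate(self, dentry, inode):
        # Models d_instantiate: attach an inode to an existing (negative) dentry.
        dentry.inode = inode

    def lookup(self, parent, name):
        # Models d_lookup: on a hit, take a reference before returning.
        dentry = self._table.get((id(parent), name))
        if dentry is not None:
            dentry.count += 1
        return dentry

    def put(self, dentry):
        # Models the caller dropping the reference taken by lookup().
        dentry.count -= 1


root = ToyDentry(None, "/")
cache = ToyDcache()
child = ToyDentry(root, "etc")
cache.add(child)

found = cache.lookup(root, "etc")   # hit: reference count goes up
missing = cache.lookup(root, "usr") # miss: nothing to release
was_negative = found.negative
cache.instantiate(found, inode=object())
cache.put(found)                    # every successful lookup is paired with a put
```

The point of the sketch is the pairing: each successful lookup takes a reference that the caller must later release, while a miss returns nothing and takes no reference.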
Documentation/fujitsu/frv/mmu-layout.txt
1 | ================================= | 1 | ================================= |
2 | FR451 MMU LINUX MEMORY MANAGEMENT | 2 | FR451 MMU LINUX MEMORY MANAGEMENT |
3 | ================================= | 3 | ================================= |
4 | 4 | ||
5 | ============ | 5 | ============ |
6 | MMU HARDWARE | 6 | MMU HARDWARE |
7 | ============ | 7 | ============ |
8 | 8 | ||
9 | FR451 MMU Linux puts the MMU into EDAT mode whilst running. This means that it uses both the SAT | 9 | FR451 MMU Linux puts the MMU into EDAT mode whilst running. This means that it uses both the SAT |
10 | registers and the DAT TLB to perform address translation. | 10 | registers and the DAT TLB to perform address translation. |
11 | 11 | ||
12 | There are 8 IAMLR/IAMPR register pairs and 16 DAMLR/DAMPR register pairs for SAT mode. | 12 | There are 8 IAMLR/IAMPR register pairs and 16 DAMLR/DAMPR register pairs for SAT mode. |
13 | 13 | ||
14 | In DAT mode, there is also a TLB organised in cache format as 64 lines x 2 ways. Each line spans a | 14 | In DAT mode, there is also a TLB organised in cache format as 64 lines x 2 ways. Each line spans a |
15 | 16KB range of addresses, but can match a larger region. | 15 | 16KB range of addresses, but can match a larger region. |
16 | 16 | ||
17 | 17 | ||
18 | =========================== | 18 | =========================== |
19 | MEMORY MANAGEMENT REGISTERS | 19 | MEMORY MANAGEMENT REGISTERS |
20 | =========================== | 20 | =========================== |
21 | 21 | ||
22 | Certain control registers are used by the kernel memory management routines: | 22 | Certain control registers are used by the kernel memory management routines: |
23 | 23 | ||
24 | REGISTERS USAGE | 24 | REGISTERS USAGE |
25 | ====================== ================================================== | 25 | ====================== ================================================== |
26 | IAMR0, DAMR0 Kernel image and data mappings | 26 | IAMR0, DAMR0 Kernel image and data mappings |
27 | IAMR1, DAMR1 First-chance TLB lookup mapping | 27 | IAMR1, DAMR1 First-chance TLB lookup mapping |
28 | DAMR2 Page attachment for cache flush by page | 28 | DAMR2 Page attachment for cache flush by page |
29 | DAMR3 Current PGD mapping | 29 | DAMR3 Current PGD mapping |
30 | SCR0, DAMR4 Instruction TLB PGE/PTD cache | 30 | SCR0, DAMR4 Instruction TLB PGE/PTD cache |
31 | SCR1, DAMR5 Data TLB PGE/PTD cache | 31 | SCR1, DAMR5 Data TLB PGE/PTD cache |
32 | DAMR6-10 kmap_atomic() mappings | 32 | DAMR6-10 kmap_atomic() mappings |
33 | DAMR11 I/O mapping | 33 | DAMR11 I/O mapping |
34 | CXNR mm_struct context ID | 34 | CXNR mm_struct context ID |
35 | TTBR Page directory (PGD) pointer (physical address) | 35 | TTBR Page directory (PGD) pointer (physical address) |
36 | 36 | ||
37 | 37 | ||
38 | ===================== | 38 | ===================== |
39 | GENERAL MEMORY LAYOUT | 39 | GENERAL MEMORY LAYOUT |
40 | ===================== | 40 | ===================== |
41 | 41 | ||
42 | The physical memory layout is as follows: | 42 | The physical memory layout is as follows: |
43 | 43 | ||
44 | PHYSICAL ADDRESS CONTROLLER DEVICE | 44 | PHYSICAL ADDRESS CONTROLLER DEVICE |
45 | =================== ============== ======================================= | 45 | =================== ============== ======================================= |
46 | 00000000 - BFFFFFFF SDRAM SDRAM area | 46 | 00000000 - BFFFFFFF SDRAM SDRAM area |
47 | E0000000 - EFFFFFFF L-BUS CS2# VDK SLBUS/PCI window | 47 | E0000000 - EFFFFFFF L-BUS CS2# VDK SLBUS/PCI window |
48 | F0000000 - F0FFFFFF L-BUS CS5# MB93493 CSC area (DAV daughter board) | 48 | F0000000 - F0FFFFFF L-BUS CS5# MB93493 CSC area (DAV daughter board) |
49 | F1000000 - F1FFFFFF L-BUS CS7# (CB70 CPU-card PCMCIA port I/O space) | 49 | F1000000 - F1FFFFFF L-BUS CS7# (CB70 CPU-card PCMCIA port I/O space) |
50 | FC000000 - FC0FFFFF L-BUS CS1# VDK MB86943 config space | 50 | FC000000 - FC0FFFFF L-BUS CS1# VDK MB86943 config space |
51 | FC100000 - FC1FFFFF L-BUS CS6# DM9000 NIC I/O space | 51 | FC100000 - FC1FFFFF L-BUS CS6# DM9000 NIC I/O space |
52 | FC200000 - FC2FFFFF L-BUS CS3# MB93493 CSR area (DAV daughter board) | 52 | FC200000 - FC2FFFFF L-BUS CS3# MB93493 CSR area (DAV daughter board) |
53 | FD000000 - FDFFFFFF L-BUS CS4# (CB70 CPU-card extra flash space) | 53 | FD000000 - FDFFFFFF L-BUS CS4# (CB70 CPU-card extra flash space) |
54 | FE000000 - FEFFFFFF Internal CPU peripherals | 54 | FE000000 - FEFFFFFF Internal CPU peripherals |
55 | FF000000 - FF1FFFFF L-BUS CS0# Flash 1 | 55 | FF000000 - FF1FFFFF L-BUS CS0# Flash 1 |
56 | FF200000 - FF3FFFFF L-BUS CS0# Flash 2 | 56 | FF200000 - FF3FFFFF L-BUS CS0# Flash 2 |
57 | FFC00000 - FFC0001F L-BUS CS0# FPGA | 57 | FFC00000 - FFC0001F L-BUS CS0# FPGA |
58 | 58 | ||
59 | The virtual memory layout is: | 59 | The virtual memory layout is: |
60 | 60 | ||
61 | VIRTUAL ADDRESS PHYSICAL TRANSLATOR FLAGS SIZE OCCUPATION | 61 | VIRTUAL ADDRESS PHYSICAL TRANSLATOR FLAGS SIZE OCCUPATION |
62 | ================= ======== ============== ======= ======= =================================== | 62 | ================= ======== ============== ======= ======= =================================== |
63 | 00004000-BFFFFFFF various TLB,xAMR1 D-N-??V 3GB Userspace | 63 | 00004000-BFFFFFFF various TLB,xAMR1 D-N-??V 3GB Userspace |
64 | C0000000-CFFFFFFF 00000000 xAMPR0 -L-S--V 256MB Kernel image and data | 64 | C0000000-CFFFFFFF 00000000 xAMPR0 -L-S--V 256MB Kernel image and data |
65 | D0000000-D7FFFFFF various TLB,xAMR1 D-NS??V 128MB vmalloc area | 65 | D0000000-D7FFFFFF various TLB,xAMR1 D-NS??V 128MB vmalloc area |
66 | D8000000-DBFFFFFF various TLB,xAMR1 D-NS??V 64MB kmap() area | 66 | D8000000-DBFFFFFF various TLB,xAMR1 D-NS??V 64MB kmap() area |
67 | DC000000-DCFFFFFF various TLB 1MB Secondary kmap_atomic() frame | 67 | DC000000-DCFFFFFF various TLB 1MB Secondary kmap_atomic() frame |
68 | DD000000-DD27FFFF various DAMR 160KB Primary kmap_atomic() frame | 68 | DD000000-DD27FFFF various DAMR 160KB Primary kmap_atomic() frame |
69 | DD040000 DAMR2/IAMR2 -L-S--V page Page cache flush attachment point | 69 | DD040000 DAMR2/IAMR2 -L-S--V page Page cache flush attachment point |
70 | DD080000 DAMR3 -L-SC-V page Page Directory (PGD) | 70 | DD080000 DAMR3 -L-SC-V page Page Directory (PGD) |
71 | DD0C0000 DAMR4 -L-SC-V page Cached insn TLB Page Table lookup | 71 | DD0C0000 DAMR4 -L-SC-V page Cached insn TLB Page Table lookup |
72 | DD100000 DAMR5 -L-SC-V page Cached data TLB Page Table lookup | 72 | DD100000 DAMR5 -L-SC-V page Cached data TLB Page Table lookup |
73 | DD140000 DAMR6 -L-S--V page kmap_atomic(KM_BOUNCE_READ) | 73 | DD140000 DAMR6 -L-S--V page kmap_atomic(KM_BOUNCE_READ) |
74 | DD180000 DAMR7 -L-S--V page kmap_atomic(KM_SKB_SUNRPC_DATA) | 74 | DD180000 DAMR7 -L-S--V page kmap_atomic(KM_SKB_SUNRPC_DATA) |
75 | DD1C0000 DAMR8 -L-S--V page kmap_atomic(KM_SKB_DATA_SOFTIRQ) | 75 | DD1C0000 DAMR8 -L-S--V page kmap_atomic(KM_SKB_DATA_SOFTIRQ) |
76 | DD200000 DAMR9 -L-S--V page kmap_atomic(KM_USER0) | 76 | DD200000 DAMR9 -L-S--V page kmap_atomic(KM_USER0) |
77 | DD240000 DAMR10 -L-S--V page kmap_atomic(KM_USER1) | 77 | DD240000 DAMR10 -L-S--V page kmap_atomic(KM_USER1) |
78 | E0000000-FFFFFFFF E0000000 DAMR11 -L-SC-V 512MB I/O region | 78 | E0000000-FFFFFFFF E0000000 DAMR11 -L-SC-V 512MB I/O region |
79 | 79 | ||
80 | IAMPR1 and DAMPR1 are used as an extension to the TLB. | 80 | IAMPR1 and DAMPR1 are used as an extension to the TLB. |
81 | 81 | ||
82 | 82 | ||
83 | ==================== | 83 | ==================== |
84 | KMAP AND KMAP_ATOMIC | 84 | KMAP AND KMAP_ATOMIC |
85 | ==================== | 85 | ==================== |
86 | 86 | ||
87 | To access pages in the page cache (which may not be directly accessible if highmem is available), | 87 | To access pages in the page cache (which may not be directly accessible if highmem is available), |
88 | the kernel calls kmap(), does the access and then calls kunmap(); or it calls kmap_atomic(), does | 88 | the kernel calls kmap(), does the access and then calls kunmap(); or it calls kmap_atomic(), does |
89 | the access and then calls kunmap_atomic(). | 89 | the access and then calls kunmap_atomic(). |
90 | 90 | ||
91 | kmap() creates an attachment between an arbitrary inaccessible page and a range of virtual | 91 | kmap() creates an attachment between an arbitrary inaccessible page and a range of virtual |
92 | addresses by installing a PTE in a special page table. The kernel can then access this page as it | 92 | addresses by installing a PTE in a special page table. The kernel can then access this page as it |
93 | wills. When it's finished, the kernel calls kunmap() to clear the PTE. | 93 | wills. When it's finished, the kernel calls kunmap() to clear the PTE. |
94 | 94 | ||
95 | kmap_atomic() does something slightly different. In the interests of speed, it chooses one of two | 95 | kmap_atomic() does something slightly different. In the interests of speed, it chooses one of two |
96 | strategies: | 96 | strategies: |
97 | 97 | ||
98 | (1) If possible, kmap_atomic() attaches the requested page to one of DAMPR5 through DAMPR10 | 98 | (1) If possible, kmap_atomic() attaches the requested page to one of DAMPR5 through DAMPR10 |
99 | register pairs; and the matching kunmap_atomic() clears the DAMPR. This makes high memory | 99 | register pairs; and the matching kunmap_atomic() clears the DAMPR. This makes high memory |
100 | support really fast as there's no need to flush the TLB or modify the page tables. The DAMLR | 100 | support really fast as there's no need to flush the TLB or modify the page tables. The DAMLR |
101 | registers being used for this are preset during boot and don't change over the lifetime of the | 101 | registers being used for this are preset during boot and don't change over the lifetime of the |
102 | process. There's a direct mapping between the first few kmap_atomic() types, DAMR number and | 102 | process. There's a direct mapping between the first few kmap_atomic() types, DAMR number and |
103 | virtual address slot. | 103 | virtual address slot. |
104 | 104 | ||
105 | However, there are more kmap_atomic() types defined than there are DAMR registers available, | 105 | However, there are more kmap_atomic() types defined than there are DAMR registers available, |
106 | so we fall back to: | 106 | so we fall back to: |
107 | 107 | ||
108 | (2) kmap_atomic() uses a slot in the secondary frame (determined by the type parameter), and then | 108 | (2) kmap_atomic() uses a slot in the secondary frame (determined by the type parameter), and then |
109 | locks an entry in the TLB to translate that slot to the specified page. The number of slots is | 109 | locks an entry in the TLB to translate that slot to the specified page. The number of slots is |
110 | obviously limited, and their positions are controlled such that each slot is matched by a | 110 | obviously limited, and their positions are controlled such that each slot is matched by a |
111 | different line in the TLB. kunmap_atomic() ejects the entry from the TLB. | 111 | different line in the TLB. kunmap_atomic() ejects the entry from the TLB. |
112 | 112 | ||
113 | Note that the first three kmap atomic types are really just declared as placeholders. The DAMPR | 113 | Note that the first three kmap atomic types are really just declared as placeholders. The DAMPR |
114 | registers involved are actually modified directly. | 114 | registers involved are actually modified directly. |
115 | 115 | ||
116 | Also note that kmap() itself may sleep, kmap_atomic() may never sleep and both always succeed; | 116 | Also note that kmap() itself may sleep, kmap_atomic() may never sleep and both always succeed; |
117 | furthermore, a driver using kmap() may sleep before calling kunmap(), but may not sleep before | 117 | furthermore, a driver using kmap() may sleep before calling kunmap(), but may not sleep before |
118 | calling kunmap_atomic() if it had previously called kmap_atomic(). | 118 | calling kunmap_atomic() if it had previously called kmap_atomic(). |
119 | 119 | ||
120 | 120 | ||
121 | =============================== | 121 | =============================== |
122 | USING MORE THAN 256MB OF MEMORY | 122 | USING MORE THAN 256MB OF MEMORY |
123 | =============================== | 123 | =============================== |
124 | 124 | ||
125 | The kernel cannot access more than 256MB of memory directly. The physical layout, however, permits | 125 | The kernel cannot access more than 256MB of memory directly. The physical layout, however, permits |
126 | up to 3GB of SDRAM (possibly 3.25GB) to be made available. By using CONFIG_HIGHMEM, the kernel can | 126 | up to 3GB of SDRAM (possibly 3.25GB) to be made available. By using CONFIG_HIGHMEM, the kernel can |
127 | allow userspace (by way of page tables) and itself (by way of kmap) to deal with the memory | 127 | allow userspace (by way of page tables) and itself (by way of kmap) to deal with the memory |
128 | allocation. | 128 | allocation. |
129 | 129 | ||
130 | External devices can, of course, still DMA to and from all of the SDRAM, even if the kernel can't | 130 | External devices can, of course, still DMA to and from all of the SDRAM, even if the kernel can't |
131 | see it directly. The kernel translates page references into real addresses for communicating to the | 131 | see it directly. The kernel translates page references into real addresses for communicating to the |
132 | devices. | 132 | devices. |
133 | 133 | ||
134 | 134 | ||
135 | =================== | 135 | =================== |
136 | PAGE TABLE TOPOLOGY | 136 | PAGE TABLE TOPOLOGY |
137 | =================== | 137 | =================== |
138 | 138 | ||
139 | The page tables are arranged in 2-layer format. There is a middle layer (PMD) that would be used in | 139 | The page tables are arranged in 2-layer format. There is a middle layer (PMD) that would be used in |
140 | 3-layer format tables but that is folded into the top layer (PGD) and so consumes no extra memory | 140 | 3-layer format tables but that is folded into the top layer (PGD) and so consumes no extra memory |
141 | or processing power. | 141 | or processing power. |
142 | 142 | ||
143 | +------+ PGD PMD | 143 | +------+ PGD PMD |
144 | | TTBR |--->+-------------------+ | 144 | | TTBR |--->+-------------------+ |
145 | +------+ | | : STE | | 145 | +------+ | | : STE | |
146 | | PGE0 | PME0 : STE | | 146 | | PGE0 | PME0 : STE | |
147 | | | : STE | | 147 | | | : STE | |
148 | +-------------------+ Page Table | 148 | +-------------------+ Page Table |
149 | | | : STE -------------->+--------+ +0x0000 | 149 | | | : STE -------------->+--------+ +0x0000 |
150 | | PGE1 | PME0 : STE -----------+ | PTE0 | | 150 | | PGE1 | PME0 : STE -----------+ | PTE0 | |
151 | | | : STE -------+ | +--------+ | 151 | | | : STE -------+ | +--------+ |
152 | +-------------------+ | | | PTE63 | | 152 | +-------------------+ | | | PTE63 | |
153 | | | : STE | | +-->+--------+ +0x0100 | 153 | | | : STE | | +-->+--------+ +0x0100 |
154 | | PGE2 | PME0 : STE | | | PTE64 | | 154 | | PGE2 | PME0 : STE | | | PTE64 | |
155 | | | : STE | | +--------+ | 155 | | | : STE | | +--------+ |
156 | +-------------------+ | | PTE127 | | 156 | +-------------------+ | | PTE127 | |
157 | | | : STE | +------>+--------+ +0x0200 | 157 | | | : STE | +------>+--------+ +0x0200 |
158 | | PGE3 | PME0 : STE | | PTE128 | | 158 | | PGE3 | PME0 : STE | | PTE128 | |
159 | | | : STE | +--------+ | 159 | | | : STE | +--------+ |
160 | +-------------------+ | PTE191 | | 160 | +-------------------+ | PTE191 | |
161 | +--------+ +0x0300 | 161 | +--------+ +0x0300 |
162 | 162 | ||
163 | Each Page Directory (PGD) is 16KB (page size) in size and is divided into 64 entries (PGEs). Each | 163 | Each Page Directory (PGD) is 16KB (page size) in size and is divided into 64 entries (PGEs). Each |
164 | PGE contains one Page Mid Directory (PMD). | 164 | PGE contains one Page Mid Directory (PMD). |
165 | 165 | ||
166 | Each PMD is 256 bytes in size and contains a single entry (PME). Each PME holds 64 FR451 MMU | 166 | Each PMD is 256 bytes in size and contains a single entry (PME). Each PME holds 64 FR451 MMU |
167 | segment table entries of 4 bytes apiece. Each PME "points to" a page table. In practice, each STE | 167 | segment table entries of 4 bytes apiece. Each PME "points to" a page table. In practice, each STE |
168 | points to a subset of the page table, the first to PT+0x0000, the second to PT+0x0100, the third to | 168 | points to a subset of the page table, the first to PT+0x0000, the second to PT+0x0100, the third to |
169 | PT+0x200, and so on. | 169 | PT+0x200, and so on. |
170 | 170 | ||
171 | Each PGE and PME covers 64MB of the total virtual address space. | 171 | Each PGE and PME covers 64MB of the total virtual address space. |
172 | 172 | ||
173 | Each Page Table (PTD) is 16KB (page size) in size, and is divided into 4096 entries (PTEs). Each | 173 | Each Page Table (PTD) is 16KB (page size) in size, and is divided into 4096 entries (PTEs). Each |
174 | entry can point to one 16KB page. In practice, each Linux page table is subdivided into 64 FR451 | 174 | entry can point to one 16KB page. In practice, each Linux page table is subdivided into 64 FR451 |
175 | MMU page tables. But they are all grouped together to make management easier, in particular rmap | 175 | MMU page tables. But they are all grouped together to make management easier, in particular rmap |
176 | support is then trivial. | 176 | support is then trivial. |
177 | 177 | ||
178 | Grouping page tables in this fashion makes PGE caching in SCR0/SCR1 more efficient because the | 178 | Grouping page tables in this fashion makes PGE caching in SCR0/SCR1 more efficient because the |
179 | coverage of the cached item is greater. | 179 | coverage of the cached item is greater. |
180 | 180 | ||
181 | Page tables for the vmalloc area are allocated at boot time and shared between all mm_structs. | 181 | Page tables for the vmalloc area are allocated at boot time and shared between all mm_structs. |
182 | 182 | ||
183 | 183 | ||
184 | ================= | 184 | ================= |
185 | USER SPACE LAYOUT | 185 | USER SPACE LAYOUT |
186 | ================= | 186 | ================= |
187 | 187 | ||
188 | For MMU capable Linux, the regions that userspace code is allowed to access are kept entirely separate | 188 | For MMU capable Linux, the regions that userspace code is allowed to access are kept entirely separate |
189 | from those dedicated to the kernel: | 189 | from those dedicated to the kernel: |
190 | 190 | ||
191 | VIRTUAL ADDRESS SIZE PURPOSE | 191 | VIRTUAL ADDRESS SIZE PURPOSE |
192 | ================= ===== =================================== | 192 | ================= ===== =================================== |
193 | 00000000-00003fff 16KB NULL pointer access trap | 193 | 00000000-00003fff 16KB NULL pointer access trap |
194 | 00004000-01ffffff ~32MB lower mmap space (grows up) | 194 | 00004000-01ffffff ~32MB lower mmap space (grows up) |
195 | 02000000-021fffff 2MB Stack space (grows down from top) | 195 | 02000000-021fffff 2MB Stack space (grows down from top) |
196 | 02200000-nnnnnnnn Executable mapping | 196 | 02200000-nnnnnnnn Executable mapping |
197 | nnnnnnnn- brk space (grows up) | 197 | nnnnnnnn- brk space (grows up) |
198 | -bfffffff upper mmap space (grows down) | 198 | -bfffffff upper mmap space (grows down) |
199 | 199 | ||
200 | This is arranged so as to make the best use of the 16KB page tables and the way in which PGEs/PMEs | 200 | This is arranged so as to make the best use of the 16KB page tables and the way in which PGEs/PMEs |
201 | are cached by the TLB handler. The lower mmap space is filled first, and then the upper mmap space | 201 | are cached by the TLB handler. The lower mmap space is filled first, and then the upper mmap space |
202 | is filled. | 202 | is filled. |
203 | 203 | ||
204 | 204 | ||
205 | =============================== | 205 | =============================== |
206 | GDB-STUB MMU DEBUGGING SERVICES | 206 | GDB-STUB MMU DEBUGGING SERVICES |
207 | =============================== | 207 | =============================== |
208 | 208 | ||
209 | The gdb-stub included in this kernel provides a number of services to aid in the debugging of MMU | 209 | The gdb-stub included in this kernel provides a number of services to aid in the debugging of MMU |
210 | related kernel services: | 210 | related kernel services: |
211 | 211 | ||
212 | (*) Every time the kernel stops, certain state information is dumped into __debug_mmu. This | 212 | (*) Every time the kernel stops, certain state information is dumped into __debug_mmu. This |
213 | variable is defined in arch/frv/kernel/gdb-stub.c. Note that the gdbinit file in this | 213 | variable is defined in arch/frv/kernel/gdb-stub.c. Note that the gdbinit file in this |
214 | directory has some useful macros for dealing with this. | 214 | directory has some useful macros for dealing with this. |
215 | 215 | ||
216 | (*) __debug_mmu.tlb[] | 216 | (*) __debug_mmu.tlb[] |
217 | 217 | ||
218 | This receives the current TLB contents. This can be viewed with the _tlb GDB macro: | 218 | This receives the current TLB contents. This can be viewed with the _tlb GDB macro: |
219 | 219 | ||
220 | (gdb) _tlb | 220 | (gdb) _tlb |
221 | tlb[0x00]: 01000005 00718203 01000002 00718203 | 221 | tlb[0x00]: 01000005 00718203 01000002 00718203 |
222 | tlb[0x01]: 01004002 006d4201 01004005 006d4203 | 222 | tlb[0x01]: 01004002 006d4201 01004005 006d4203 |
223 | tlb[0x02]: 01008002 006d0201 01008006 00004200 | 223 | tlb[0x02]: 01008002 006d0201 01008006 00004200 |
224 | tlb[0x03]: 0100c006 007f4202 0100c002 0064c202 | 224 | tlb[0x03]: 0100c006 007f4202 0100c002 0064c202 |
225 | tlb[0x04]: 01110005 00774201 01110002 00774201 | 225 | tlb[0x04]: 01110005 00774201 01110002 00774201 |
226 | tlb[0x05]: 01114005 00770201 01114002 00770201 | 226 | tlb[0x05]: 01114005 00770201 01114002 00770201 |
227 | tlb[0x06]: 01118002 0076c201 01118005 0076c201 | 227 | tlb[0x06]: 01118002 0076c201 01118005 0076c201 |
228 | ... | 228 | ... |
229 | tlb[0x3d]: 010f4002 00790200 001f4002 0054ca02 | 229 | tlb[0x3d]: 010f4002 00790200 001f4002 0054ca02 |
230 | tlb[0x3e]: 010f8005 0078c201 010f8002 0078c201 | 230 | tlb[0x3e]: 010f8005 0078c201 010f8002 0078c201 |
231 | tlb[0x3f]: 001fc002 0056ca01 001fc005 00538a01 | 231 | tlb[0x3f]: 001fc002 0056ca01 001fc005 00538a01 |
232 | 232 | ||
233 | (*) __debug_mmu.iamr[] | 233 | (*) __debug_mmu.iamr[] |
234 | (*) __debug_mmu.damr[] | 234 | (*) __debug_mmu.damr[] |
235 | 235 | ||
236 | These receive the current IAMR and DAMR contents. These can be viewed with with the _amr | 236 | These receive the current IAMR and DAMR contents. These can be viewed with the _amr |
237 | GDB macro: | 237 | GDB macro: |
238 | 238 | ||
239 | (gdb) _amr | 239 | (gdb) _amr |
240 | AMRx DAMR IAMR | 240 | AMRx DAMR IAMR |
241 | ==== ===================== ===================== | 241 | ==== ===================== ===================== |
242 | amr0 : L:c0000000 P:00000cb9 : L:c0000000 P:000004b9 | 242 | amr0 : L:c0000000 P:00000cb9 : L:c0000000 P:000004b9 |
243 | amr1 : L:01070005 P:006f9203 : L:0102c005 P:006a1201 | 243 | amr1 : L:01070005 P:006f9203 : L:0102c005 P:006a1201 |
244 | amr2 : L:d8d00000 P:00000000 : L:d8d00000 P:00000000 | 244 | amr2 : L:d8d00000 P:00000000 : L:d8d00000 P:00000000 |
245 | amr3 : L:d8d04000 P:00534c0d : L:00000000 P:00000000 | 245 | amr3 : L:d8d04000 P:00534c0d : L:00000000 P:00000000 |
246 | amr4 : L:d8d08000 P:00554c0d : L:00000000 P:00000000 | 246 | amr4 : L:d8d08000 P:00554c0d : L:00000000 P:00000000 |
247 | amr5 : L:d8d0c000 P:00554c0d : L:00000000 P:00000000 | 247 | amr5 : L:d8d0c000 P:00554c0d : L:00000000 P:00000000 |
248 | amr6 : L:d8d10000 P:00000000 : L:00000000 P:00000000 | 248 | amr6 : L:d8d10000 P:00000000 : L:00000000 P:00000000 |
249 | amr7 : L:d8d14000 P:00000000 : L:00000000 P:00000000 | 249 | amr7 : L:d8d14000 P:00000000 : L:00000000 P:00000000 |
250 | amr8 : L:d8d18000 P:00000000 | 250 | amr8 : L:d8d18000 P:00000000 |
251 | amr9 : L:d8d1c000 P:00000000 | 251 | amr9 : L:d8d1c000 P:00000000 |
252 | amr10: L:d8d20000 P:00000000 | 252 | amr10: L:d8d20000 P:00000000 |
253 | amr11: L:e0000000 P:e0000ccd | 253 | amr11: L:e0000000 P:e0000ccd |
254 | 254 | ||
255 | (*) The current task's page directory is bound to DAMR3. | 255 | (*) The current task's page directory is bound to DAMR3. |
256 | 256 | ||
257 | This can be viewed with the _pgd GDB macro: | 257 | This can be viewed with the _pgd GDB macro: |
258 | 258 | ||
259 | (gdb) _pgd | 259 | (gdb) _pgd |
260 | $3 = {{pge = {{ste = {0x554001, 0x554101, 0x554201, 0x554301, 0x554401, | 260 | $3 = {{pge = {{ste = {0x554001, 0x554101, 0x554201, 0x554301, 0x554401, |
261 | 0x554501, 0x554601, 0x554701, 0x554801, 0x554901, 0x554a01, | 261 | 0x554501, 0x554601, 0x554701, 0x554801, 0x554901, 0x554a01, |
262 | 0x554b01, 0x554c01, 0x554d01, 0x554e01, 0x554f01, 0x555001, | 262 | 0x554b01, 0x554c01, 0x554d01, 0x554e01, 0x554f01, 0x555001, |
263 | 0x555101, 0x555201, 0x555301, 0x555401, 0x555501, 0x555601, | 263 | 0x555101, 0x555201, 0x555301, 0x555401, 0x555501, 0x555601, |
264 | 0x555701, 0x555801, 0x555901, 0x555a01, 0x555b01, 0x555c01, | 264 | 0x555701, 0x555801, 0x555901, 0x555a01, 0x555b01, 0x555c01, |
265 | 0x555d01, 0x555e01, 0x555f01, 0x556001, 0x556101, 0x556201, | 265 | 0x555d01, 0x555e01, 0x555f01, 0x556001, 0x556101, 0x556201, |
266 | 0x556301, 0x556401, 0x556501, 0x556601, 0x556701, 0x556801, | 266 | 0x556301, 0x556401, 0x556501, 0x556601, 0x556701, 0x556801, |
267 | 0x556901, 0x556a01, 0x556b01, 0x556c01, 0x556d01, 0x556e01, | 267 | 0x556901, 0x556a01, 0x556b01, 0x556c01, 0x556d01, 0x556e01, |
268 | 0x556f01, 0x557001, 0x557101, 0x557201, 0x557301, 0x557401, | 268 | 0x556f01, 0x557001, 0x557101, 0x557201, 0x557301, 0x557401, |
269 | 0x557501, 0x557601, 0x557701, 0x557801, 0x557901, 0x557a01, | 269 | 0x557501, 0x557601, 0x557701, 0x557801, 0x557901, 0x557a01, |
270 | 0x557b01, 0x557c01, 0x557d01, 0x557e01, 0x557f01}}}}, {pge = {{ | 270 | 0x557b01, 0x557c01, 0x557d01, 0x557e01, 0x557f01}}}}, {pge = {{ |
271 | ste = {0x0 <repeats 64 times>}}}} <repeats 51 times>, {pge = {{ste = { | 271 | ste = {0x0 <repeats 64 times>}}}} <repeats 51 times>, {pge = {{ste = { |
272 | 0x248001, 0x248101, 0x248201, 0x248301, 0x248401, 0x248501, | 272 | 0x248001, 0x248101, 0x248201, 0x248301, 0x248401, 0x248501, |
273 | 0x248601, 0x248701, 0x248801, 0x248901, 0x248a01, 0x248b01, | 273 | 0x248601, 0x248701, 0x248801, 0x248901, 0x248a01, 0x248b01, |
274 | 0x248c01, 0x248d01, 0x248e01, 0x248f01, 0x249001, 0x249101, | 274 | 0x248c01, 0x248d01, 0x248e01, 0x248f01, 0x249001, 0x249101, |
275 | 0x249201, 0x249301, 0x249401, 0x249501, 0x249601, 0x249701, | 275 | 0x249201, 0x249301, 0x249401, 0x249501, 0x249601, 0x249701, |
276 | 0x249801, 0x249901, 0x249a01, 0x249b01, 0x249c01, 0x249d01, | 276 | 0x249801, 0x249901, 0x249a01, 0x249b01, 0x249c01, 0x249d01, |
277 | 0x249e01, 0x249f01, 0x24a001, 0x24a101, 0x24a201, 0x24a301, | 277 | 0x249e01, 0x249f01, 0x24a001, 0x24a101, 0x24a201, 0x24a301, |
278 | 0x24a401, 0x24a501, 0x24a601, 0x24a701, 0x24a801, 0x24a901, | 278 | 0x24a401, 0x24a501, 0x24a601, 0x24a701, 0x24a801, 0x24a901, |
279 | 0x24aa01, 0x24ab01, 0x24ac01, 0x24ad01, 0x24ae01, 0x24af01, | 279 | 0x24aa01, 0x24ab01, 0x24ac01, 0x24ad01, 0x24ae01, 0x24af01, |
280 | 0x24b001, 0x24b101, 0x24b201, 0x24b301, 0x24b401, 0x24b501, | 280 | 0x24b001, 0x24b101, 0x24b201, 0x24b301, 0x24b401, 0x24b501, |
281 | 0x24b601, 0x24b701, 0x24b801, 0x24b901, 0x24ba01, 0x24bb01, | 281 | 0x24b601, 0x24b701, 0x24b801, 0x24b901, 0x24ba01, 0x24bb01, |
282 | 0x24bc01, 0x24bd01, 0x24be01, 0x24bf01}}}}, {pge = {{ste = { | 282 | 0x24bc01, 0x24bd01, 0x24be01, 0x24bf01}}}}, {pge = {{ste = { |
283 | 0x0 <repeats 64 times>}}}} <repeats 11 times>} | 283 | 0x0 <repeats 64 times>}}}} <repeats 11 times>} |
284 | 284 | ||
285 | (*) The PTD last used by the instruction TLB miss handler is attached to DAMR4. | 285 | (*) The PTD last used by the instruction TLB miss handler is attached to DAMR4. |
286 | (*) The PTD last used by the data TLB miss handler is attached to DAMR5. | 286 | (*) The PTD last used by the data TLB miss handler is attached to DAMR5. |
287 | 287 | ||
288 | These can be viewed with the _ptd_i and _ptd_d GDB macros: | 288 | These can be viewed with the _ptd_i and _ptd_d GDB macros: |
289 | 289 | ||
290 | (gdb) _ptd_d | 290 | (gdb) _ptd_d |
291 | $5 = {{pte = 0x0} <repeats 127 times>, {pte = 0x539b01}, { | 291 | $5 = {{pte = 0x0} <repeats 127 times>, {pte = 0x539b01}, { |
292 | pte = 0x0} <repeats 896 times>, {pte = 0x719303}, {pte = 0x6d5303}, { | 292 | pte = 0x0} <repeats 896 times>, {pte = 0x719303}, {pte = 0x6d5303}, { |
293 | pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, { | 293 | pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, { |
294 | pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x6a1303}, { | 294 | pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x6a1303}, { |
295 | pte = 0x0} <repeats 12 times>, {pte = 0x709303}, {pte = 0x0}, {pte = 0x0}, | 295 | pte = 0x0} <repeats 12 times>, {pte = 0x709303}, {pte = 0x0}, {pte = 0x0}, |
296 | {pte = 0x6fd303}, {pte = 0x6f9303}, {pte = 0x6f5303}, {pte = 0x0}, { | 296 | {pte = 0x6fd303}, {pte = 0x6f9303}, {pte = 0x6f5303}, {pte = 0x0}, { |
297 | pte = 0x6ed303}, {pte = 0x531b01}, {pte = 0x50db01}, { | 297 | pte = 0x6ed303}, {pte = 0x531b01}, {pte = 0x50db01}, { |
298 | pte = 0x0} <repeats 13 times>, {pte = 0x5303}, {pte = 0x7f5303}, { | 298 | pte = 0x0} <repeats 13 times>, {pte = 0x5303}, {pte = 0x7f5303}, { |
299 | pte = 0x509b01}, {pte = 0x505b01}, {pte = 0x7c9303}, {pte = 0x7b9303}, { | 299 | pte = 0x509b01}, {pte = 0x505b01}, {pte = 0x7c9303}, {pte = 0x7b9303}, { |
300 | pte = 0x7b5303}, {pte = 0x7b1303}, {pte = 0x7ad303}, {pte = 0x0}, { | 300 | pte = 0x7b5303}, {pte = 0x7b1303}, {pte = 0x7ad303}, {pte = 0x0}, { |
301 | pte = 0x0}, {pte = 0x7a1303}, {pte = 0x0}, {pte = 0x795303}, {pte = 0x0}, { | 301 | pte = 0x0}, {pte = 0x7a1303}, {pte = 0x0}, {pte = 0x795303}, {pte = 0x0}, { |
302 | pte = 0x78d303}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, { | 302 | pte = 0x78d303}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, {pte = 0x0}, { |
303 | pte = 0x0}, {pte = 0x775303}, {pte = 0x771303}, {pte = 0x76d303}, { | 303 | pte = 0x0}, {pte = 0x775303}, {pte = 0x771303}, {pte = 0x76d303}, { |
304 | pte = 0x0}, {pte = 0x765303}, {pte = 0x7c5303}, {pte = 0x501b01}, { | 304 | pte = 0x0}, {pte = 0x765303}, {pte = 0x7c5303}, {pte = 0x501b01}, { |
305 | pte = 0x4f1b01}, {pte = 0x4edb01}, {pte = 0x0}, {pte = 0x4f9b01}, { | 305 | pte = 0x4f1b01}, {pte = 0x4edb01}, {pte = 0x0}, {pte = 0x4f9b01}, { |
306 | pte = 0x4fdb01}, {pte = 0x0} <repeats 2992 times>} | 306 | pte = 0x4fdb01}, {pte = 0x0} <repeats 2992 times>} |
307 | 307 |
Documentation/ia64/efirtc.txt
1 | EFI Real Time Clock driver | 1 | EFI Real Time Clock driver |
2 | ------------------------------- | 2 | ------------------------------- |
3 | S. Eranian <eranian@hpl.hp.com> | 3 | S. Eranian <eranian@hpl.hp.com> |
4 | March 2000 | 4 | March 2000 |
5 | 5 | ||
6 | I/ Introduction | 6 | I/ Introduction |
7 | 7 | ||
8 | This document describes the efirtc.c driver provided for | 8 | This document describes the efirtc.c driver provided for |
9 | the IA-64 platform. | 9 | the IA-64 platform. |
10 | 10 | ||
11 | The purpose of this driver is to supply an API for kernel and user applications | 11 | The purpose of this driver is to supply an API for kernel and user applications |
12 | to get access to the Time Service offered by EFI version 0.92. | 12 | to get access to the Time Service offered by EFI version 0.92. |
13 | 13 | ||
14 | EFI provides 4 calls one can make once the OS is booted: GetTime(), | 14 | EFI provides 4 calls one can make once the OS is booted: GetTime(), |
15 | SetTime(), GetWakeupTime(), SetWakeupTime() which are all supported by this | 15 | SetTime(), GetWakeupTime(), SetWakeupTime() which are all supported by this |
16 | driver. We describe those calls as well as the design of the driver in the | 16 | driver. We describe those calls as well as the design of the driver in the |
17 | following sections. | 17 | following sections. |
18 | 18 | ||
19 | II/ Design Decisions | 19 | II/ Design Decisions |
20 | 20 | ||
21 | The original idea was to provide a very simple driver to get access to, | 21 | The original idea was to provide a very simple driver to get access to, |
22 | at first, the time of day service. This is required in order to access, in a | 22 | at first, the time of day service. This is required in order to access, in a |
23 | portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock | 23 | portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock |
24 | to initialize the system view of the time during boot. | 24 | to initialize the system view of the time during boot. |
25 | 25 | ||
26 | Because we wanted to minimize the impact on existing user-level apps using | 26 | Because we wanted to minimize the impact on existing user-level apps using |
27 | the CMOS clock, we decided to expose an API that was very similar to the one | 27 | the CMOS clock, we decided to expose an API that was very similar to the one |
28 | used today with the legacy RTC driver (driver/char/rtc.c). However, because | 28 | used today with the legacy RTC driver (driver/char/rtc.c). However, because |
29 | EFI provides a simpler services, not all all ioctl() are available. Also | 29 | EFI provides a simpler services, not all ioctl() are available. Also |
30 | new ioctl()s have been introduced for things that EFI provides but not the | 30 | new ioctl()s have been introduced for things that EFI provides but not the |
31 | legacy. | 31 | legacy. |
32 | 32 | ||
33 | EFI uses a slightly different way of representing the time, notably | 33 | EFI uses a slightly different way of representing the time, notably |
34 | the reference date is different. Year uses the full 4-digit format. | 34 | the reference date is different. Year uses the full 4-digit format. |
35 | The Epoch is January 1st 1998. For backward compatibility reasons we don't | 35 | The Epoch is January 1st 1998. For backward compatibility reasons we don't |
36 | expose this new way of representing time. Instead we use something very | 36 | expose this new way of representing time. Instead we use something very |
37 | similar to the struct tm, i.e. struct rtc_time, as used by hwclock. | 37 | similar to the struct tm, i.e. struct rtc_time, as used by hwclock. |
38 | One of the reasons for doing it this way is to allow for EFI to still evolve | 38 | One of the reasons for doing it this way is to allow for EFI to still evolve |
39 | without necessarily impacting any of the user applications. The decoupling | 39 | without necessarily impacting any of the user applications. The decoupling |
40 | enables flexibility and permits writing wrapper code in case things change. | 40 | enables flexibility and permits writing wrapper code in case things change. |
41 | 41 | ||
42 | The driver exposes two interfaces, one via the device file and a set of | 42 | The driver exposes two interfaces, one via the device file and a set of |
43 | ioctl()s. The other is read-only via the /proc filesystem. | 43 | ioctl()s. The other is read-only via the /proc filesystem. |
44 | 44 | ||
45 | As of today we don't offer a /proc/sys interface. | 45 | As of today we don't offer a /proc/sys interface. |
46 | 46 | ||
47 | To allow for a uniform interface between the legacy RTC and EFI time service, | 47 | To allow for a uniform interface between the legacy RTC and EFI time service, |
48 | we have created the include/linux/rtc.h header file to contain only the | 48 | we have created the include/linux/rtc.h header file to contain only the |
49 | "public" API of the two drivers. The specifics of the legacy RTC are still | 49 | "public" API of the two drivers. The specifics of the legacy RTC are still |
50 | in include/linux/mc146818rtc.h. | 50 | in include/linux/mc146818rtc.h. |
51 | 51 | ||
52 | 52 | ||
53 | III/ Time of day service | 53 | III/ Time of day service |
54 | 54 | ||
55 | This part of the driver gives access to the time of day service of EFI. | 55 | This part of the driver gives access to the time of day service of EFI. |
56 | Two ioctl()s, compatible with the legacy RTC calls: | 56 | Two ioctl()s, compatible with the legacy RTC calls: |
57 | 57 | ||
58 | Read the CMOS clock: ioctl(d, RTC_RD_TIME, &rtc); | 58 | Read the CMOS clock: ioctl(d, RTC_RD_TIME, &rtc); |
59 | 59 | ||
60 | Write the CMOS clock: ioctl(d, RTC_SET_TIME, &rtc); | 60 | Write the CMOS clock: ioctl(d, RTC_SET_TIME, &rtc); |
61 | 61 | ||
62 | The rtc is a pointer to a data structure defined in rtc.h which is close | 62 | The rtc is a pointer to a data structure defined in rtc.h which is close |
63 | to a struct tm: | 63 | to a struct tm: |
64 | 64 | ||
65 | struct rtc_time { | 65 | struct rtc_time { |
66 | int tm_sec; | 66 | int tm_sec; |
67 | int tm_min; | 67 | int tm_min; |
68 | int tm_hour; | 68 | int tm_hour; |
69 | int tm_mday; | 69 | int tm_mday; |
70 | int tm_mon; | 70 | int tm_mon; |
71 | int tm_year; | 71 | int tm_year; |
72 | int tm_wday; | 72 | int tm_wday; |
73 | int tm_yday; | 73 | int tm_yday; |
74 | int tm_isdst; | 74 | int tm_isdst; |
75 | }; | 75 | }; |
76 | 76 | ||
77 | The driver takes care of converting back and forth between the EFI time and | 77 | The driver takes care of converting back and forth between the EFI time and |
78 | this format. | 78 | this format. |
79 | 79 | ||
80 | Those two ioctl()s can be exercised with the hwclock command: | 80 | Those two ioctl()s can be exercised with the hwclock command: |
81 | 81 | ||
82 | For reading: | 82 | For reading: |
83 | # /sbin/hwclock --show | 83 | # /sbin/hwclock --show |
84 | Mon Mar 6 15:32:32 2000 -0.910248 seconds | 84 | Mon Mar 6 15:32:32 2000 -0.910248 seconds |
85 | 85 | ||
86 | For setting: | 86 | For setting: |
87 | # /sbin/hwclock --systohc | 87 | # /sbin/hwclock --systohc |
88 | 88 | ||
89 | Root privileges are required to be able to set the time of day. | 89 | Root privileges are required to be able to set the time of day. |
90 | 90 | ||
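The read ioctl() described above can also be called directly from a small C helper. The following is a minimal sketch, not code from the driver: the helper names read_rtc_time() and format_rtc_time() are made up for illustration, the device path is supplied by the caller (/dev/rtc is a common choice but is an assumption here), and the formatting assumes struct rtc_time follows the struct tm conventions (tm_year counted from 1900, tm_mon starting at 0).

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the clock through RTC_RD_TIME.
 * Returns 0 on success, -1 if the device cannot be opened or the
 * ioctl() fails.
 */
int read_rtc_time(const char *dev, struct rtc_time *out)
{
	int fd, ret;

	assert(dev != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, RTC_RD_TIME, out);
	close(fd);
	return ret < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: format as "YYYY-MM-DD HH:MM:SS".
 * Assumes struct tm conventions for tm_year and tm_mon.
 * Returns the snprintf() result.
 */
int format_rtc_time(const struct rtc_time *t, char *buf, size_t len)
{
	assert(t != NULL && buf != NULL);
	return snprintf(buf, len, "%04d-%02d-%02d %02d:%02d:%02d",
			t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
			t->tm_hour, t->tm_min, t->tm_sec);
}
```

A caller would pass the path of the RTC device file to read_rtc_time() and, if it returns 0, hand the result to format_rtc_time(). Setting the clock with RTC_SET_TIME follows the same pattern and needs root privileges.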
91 | IV/ Wakeup Alarm service | 91 | IV/ Wakeup Alarm service |
92 | 92 | ||
93 | EFI provides an API by which one can program when a machine should wakeup, | 93 | EFI provides an API by which one can program when a machine should wakeup, |
94 | i.e. reboot. This is very different from the alarm provided by the legacy | 94 | i.e. reboot. This is very different from the alarm provided by the legacy |
95 | RTC which is some kind of interval timer alarm. For this reason we don't use | 95 | RTC which is some kind of interval timer alarm. For this reason we don't use |
96 | the same ioctl()s to get access to the service. Instead we have | 96 | the same ioctl()s to get access to the service. Instead we have |
97 | introduced 2 new ioctl()s to the interface of an RTC. | 97 | introduced 2 new ioctl()s to the interface of an RTC. |
98 | 98 | ||
99 | We have added 2 new ioctl()s that are specific to the EFI driver: | 99 | We have added 2 new ioctl()s that are specific to the EFI driver: |
100 | 100 | ||
101 | Read the current state of the alarm | 101 | Read the current state of the alarm |
102 | ioctl(d, RTC_WKALM_RD, &wkt) | 102 | ioctl(d, RTC_WKALM_RD, &wkt) |
103 | 103 | ||
104 | Set the alarm or change its status | 104 | Set the alarm or change its status |
105 | ioctl(d, RTC_WKALM_SET, &wkt) | 105 | ioctl(d, RTC_WKALM_SET, &wkt) |
106 | 106 | ||
107 | The wkt structure encapsulates a struct rtc_time + 2 extra fields to get | 107 | The wkt structure encapsulates a struct rtc_time + 2 extra fields to get |
108 | status information: | 108 | status information: |
109 | 109 | ||
110 | struct rtc_wkalrm { | 110 | struct rtc_wkalrm { |
111 | 111 | ||
112 | unsigned char enabled; /* =1 if alarm is enabled */ | 112 | unsigned char enabled; /* =1 if alarm is enabled */ |
113 | unsigned char pending; /* =1 if alarm is pending */ | 113 | unsigned char pending; /* =1 if alarm is pending */ |
114 | 114 | ||
115 | struct rtc_time time; | 115 | struct rtc_time time; |
116 | }; | 116 | }; |
117 | 117 | ||
118 | As of today, none of the existing user-level apps supports this feature. | 118 | As of today, none of the existing user-level apps supports this feature. |
119 | However writing such a program should not be hard by simply using those two | 119 | However writing such a program should not be hard by simply using those two |
120 | ioctl()s. | 120 | ioctl()s. |
121 | 121 | ||
122 | Root privileges are required to be able to set the alarm. | 122 | Root privileges are required to be able to set the alarm. |
123 | 123 | ||
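Since no user-level program is named for the wakeup alarm, here is a minimal sketch of how the two ioctl()s might be wrapped. It uses the RTC_WKALM_RD and RTC_WKALM_SET request names and struct rtc_wkalrm as defined in <linux/rtc.h>. The helper names fill_wkalrm(), set_wakeup_alarm() and read_wakeup_alarm() are hypothetical, and the device path is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: build the RTC_WKALM_SET argument from a time. */
void fill_wkalrm(struct rtc_wkalrm *wkt, const struct rtc_time *when,
		 int enable)
{
	assert(wkt != NULL && when != NULL);
	memset(wkt, 0, sizeof(*wkt));
	wkt->enabled = enable ? 1 : 0;	/* pending stays 0; the driver reports it */
	wkt->time = *when;
}

/* Hypothetical helper: program the alarm. Returns 0 or -1. */
int set_wakeup_alarm(const char *dev, const struct rtc_wkalrm *wkt)
{
	int fd, ret;

	assert(dev != NULL && wkt != NULL);
	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, RTC_WKALM_SET, wkt);
	close(fd);
	return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: read back the alarm state. Returns 0 or -1. */
int read_wakeup_alarm(const char *dev, struct rtc_wkalrm *wkt)
{
	int fd, ret;

	assert(dev != NULL && wkt != NULL);
	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, RTC_WKALM_RD, wkt);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

A program would fill a struct rtc_time with the desired wakeup time, call fill_wkalrm() with enable set to 1, and pass the result to set_wakeup_alarm() while running as root.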
124 | V/ References. | 124 | V/ References. |
125 | 125 | ||
126 | Check out the following Web site for more information on EFI: | 126 | Check out the following Web site for more information on EFI: |
127 | 127 | ||
128 | http://developer.intel.com/technology/efi/ | 128 | http://developer.intel.com/technology/efi/ |
129 | 129 |
Documentation/ia64/mca.txt
1 | An ad-hoc collection of notes on IA64 MCA and INIT processing. Feel | 1 | An ad-hoc collection of notes on IA64 MCA and INIT processing. Feel |
2 | free to update it with notes about any area that is not clear. | 2 | free to update it with notes about any area that is not clear. |
3 | 3 | ||
4 | --- | 4 | --- |
5 | 5 | ||
6 | MCA/INIT are completely asynchronous. They can occur at any time, when | 6 | MCA/INIT are completely asynchronous. They can occur at any time, when |
7 | the OS is in any state. Including when one of the cpus is already | 7 | the OS is in any state. Including when one of the cpus is already |
8 | holding a spinlock. Trying to get any lock from MCA/INIT state is | 8 | holding a spinlock. Trying to get any lock from MCA/INIT state is |
9 | asking for deadlock. Also the state of structures that are protected | 9 | asking for deadlock. Also the state of structures that are protected |
10 | by locks is indeterminate, including linked lists. | 10 | by locks is indeterminate, including linked lists. |
11 | 11 | ||
12 | --- | 12 | --- |
13 | 13 | ||
14 | The complicated ia64 MCA process. All of this is mandated by Intel's | 14 | The complicated ia64 MCA process. All of this is mandated by Intel's |
15 | specification for ia64 SAL, error recovery and and unwind, it is not as | 15 | specification for ia64 SAL, error recovery and unwind, it is not as |
16 | if we have a choice here. | 16 | if we have a choice here. |
17 | 17 | ||
18 | * MCA occurs on one cpu, usually due to a double bit memory error. | 18 | * MCA occurs on one cpu, usually due to a double bit memory error. |
19 | This is the monarch cpu. | 19 | This is the monarch cpu. |
20 | 20 | ||
21 | * SAL sends an MCA rendezvous interrupt (which is a normal interrupt) | 21 | * SAL sends an MCA rendezvous interrupt (which is a normal interrupt) |
22 | to all the other cpus, the slaves. | 22 | to all the other cpus, the slaves. |
23 | 23 | ||
24 | * Slave cpus that receive the MCA interrupt call down into SAL, they | 24 | * Slave cpus that receive the MCA interrupt call down into SAL, they |
25 | end up spinning disabled while the MCA is being serviced. | 25 | end up spinning disabled while the MCA is being serviced. |
26 | 26 | ||
27 | * If any slave cpu was already spinning disabled when the MCA occurred | 27 | * If any slave cpu was already spinning disabled when the MCA occurred |
28 | then it cannot service the MCA interrupt. SAL waits ~20 seconds then | 28 | then it cannot service the MCA interrupt. SAL waits ~20 seconds then |
29 | sends an unmaskable INIT event to the slave cpus that have not | 29 | sends an unmaskable INIT event to the slave cpus that have not |
30 | already rendezvoused. | 30 | already rendezvoused. |
31 | 31 | ||
32 | * Because MCA/INIT can be delivered at any time, including when the cpu | 32 | * Because MCA/INIT can be delivered at any time, including when the cpu |
33 | is down in PAL in physical mode, the registers at the time of the | 33 | is down in PAL in physical mode, the registers at the time of the |
34 | event are _completely_ undefined. In particular the MCA/INIT | 34 | event are _completely_ undefined. In particular the MCA/INIT |
35 | handlers cannot rely on the thread pointer, PAL physical mode can | 35 | handlers cannot rely on the thread pointer, PAL physical mode can |
36 | (and does) modify TP. It is allowed to do that as long as it resets | 36 | (and does) modify TP. It is allowed to do that as long as it resets |
37 | TP on return. However MCA/INIT events expose us to these PAL | 37 | TP on return. However MCA/INIT events expose us to these PAL |
38 | internal TP changes. Hence curr_task(). | 38 | internal TP changes. Hence curr_task(). |
39 | 39 | ||
40 | * If an MCA/INIT event occurs while the kernel was running (not user | 40 | * If an MCA/INIT event occurs while the kernel was running (not user |
41 | space) and the kernel has called PAL then the MCA/INIT handler cannot | 41 | space) and the kernel has called PAL then the MCA/INIT handler cannot |
42 | assume that the kernel stack is in a fit state to be used. Mainly | 42 | assume that the kernel stack is in a fit state to be used. Mainly |
43 | because PAL may or may not maintain the stack pointer internally. | 43 | because PAL may or may not maintain the stack pointer internally. |
44 | Because the MCA/INIT handlers cannot trust the kernel stack, they | 44 | Because the MCA/INIT handlers cannot trust the kernel stack, they |
45 | have to use their own, per-cpu stacks. The MCA/INIT stacks are | 45 | have to use their own, per-cpu stacks. The MCA/INIT stacks are |
46 | preformatted with just enough task state to let the relevant handlers | 46 | preformatted with just enough task state to let the relevant handlers |
47 | do their job. | 47 | do their job. |
48 | 48 | ||
49 | * Unlike most other architectures, the ia64 struct task is embedded in | 49 | * Unlike most other architectures, the ia64 struct task is embedded in |
50 | the kernel stack[1]. So switching to a new kernel stack means that | 50 | the kernel stack[1]. So switching to a new kernel stack means that |
51 | we switch to a new task as well. Because various bits of the kernel | 51 | we switch to a new task as well. Because various bits of the kernel |
52 | assume that current points into the struct task, switching to a new | 52 | assume that current points into the struct task, switching to a new |
53 | stack also means a new value for current. | 53 | stack also means a new value for current. |
54 | 54 | ||
55 | * Once all slaves have rendezvoused and are spinning disabled, the | 55 | * Once all slaves have rendezvoused and are spinning disabled, the |
56 | monarch is entered. The monarch now tries to diagnose the problem | 56 | monarch is entered. The monarch now tries to diagnose the problem |
57 | and decide if it can recover or not. | 57 | and decide if it can recover or not. |
58 | 58 | ||
59 | * Part of the monarch's job is to look at the state of all the other | 59 | * Part of the monarch's job is to look at the state of all the other |
60 | tasks. The only way to do that on ia64 is to call the unwinder, | 60 | tasks. The only way to do that on ia64 is to call the unwinder, |
61 | as mandated by Intel. | 61 | as mandated by Intel. |
62 | 62 | ||
63 | * The starting point for the unwind depends on whether a task is | 63 | * The starting point for the unwind depends on whether a task is |
64 | running or not. That is, whether it is on a cpu or is blocked. The | 64 | running or not. That is, whether it is on a cpu or is blocked. The |
65 | monarch has to determine whether or not a task is on a cpu before it | 65 | monarch has to determine whether or not a task is on a cpu before it |
66 | knows how to start unwinding it. The tasks that received an MCA or | 66 | knows how to start unwinding it. The tasks that received an MCA or |
67 | INIT event are no longer running, they have been converted to blocked | 67 | INIT event are no longer running, they have been converted to blocked |
68 | tasks. But (and it's a big but), the cpus that received the MCA | 68 | tasks. But (and it's a big but), the cpus that received the MCA |
69 | rendezvous interrupt are still running on their normal kernel stacks! | 69 | rendezvous interrupt are still running on their normal kernel stacks! |
70 | 70 | ||
71 | * To distinguish between these two cases, the monarch must know which | 71 | * To distinguish between these two cases, the monarch must know which |
72 | tasks are on a cpu and which are not. Hence each slave cpu that | 72 | tasks are on a cpu and which are not. Hence each slave cpu that |
73 | switches to an MCA/INIT stack, registers its new stack using | 73 | switches to an MCA/INIT stack, registers its new stack using |
74 | set_curr_task(), so the monarch can tell that the _original_ task is | 74 | set_curr_task(), so the monarch can tell that the _original_ task is |
75 | no longer running on that cpu. That gives us a decent chance of | 75 | no longer running on that cpu. That gives us a decent chance of |
76 | getting a valid backtrace of the _original_ task. | 76 | getting a valid backtrace of the _original_ task. |
77 | 77 | ||
78 | * MCA/INIT can be nested, to a depth of 2 on any cpu. In the case of a | 78 | * MCA/INIT can be nested, to a depth of 2 on any cpu. In the case of a |
79 | nested error, we want diagnostics on the MCA/INIT handler that | 79 | nested error, we want diagnostics on the MCA/INIT handler that |
80 | failed, not on the task that was originally running. Again this | 80 | failed, not on the task that was originally running. Again this |
81 | requires set_curr_task() so the MCA/INIT handlers can register their | 81 | requires set_curr_task() so the MCA/INIT handlers can register their |
82 | own stack as running on that cpu. Then a recursive error gets a | 82 | own stack as running on that cpu. Then a recursive error gets a |
83 | trace of the failing handler's "task". | 83 | trace of the failing handler's "task". |
84 | 84 | ||
85 | [1] My (Keith Owens) original design called for ia64 to separate its | 85 | [1] My (Keith Owens) original design called for ia64 to separate its |
86 | struct task and the kernel stacks. Then the MCA/INIT data would be | 86 | struct task and the kernel stacks. Then the MCA/INIT data would be |
87 | chained stacks like i386 interrupt stacks. But that required | 87 | chained stacks like i386 interrupt stacks. But that required |
88 | radical surgery on the rest of ia64, plus extra hard wired TLB | 88 | radical surgery on the rest of ia64, plus extra hard wired TLB |
89 | entries with its associated performance degradation. David | 89 | entries with its associated performance degradation. David |
90 | Mosberger vetoed that approach. Which meant that separate kernel | 90 | Mosberger vetoed that approach. Which meant that separate kernel |
91 | stacks meant separate "tasks" for the MCA/INIT handlers. | 91 | stacks meant separate "tasks" for the MCA/INIT handlers. |
92 | 92 | ||
93 | --- | 93 | --- |
94 | 94 | ||
95 | INIT is less complicated than MCA. Pressing the nmi button or using | 95 | INIT is less complicated than MCA. Pressing the nmi button or using |
96 | the equivalent command on the management console sends INIT to all | 96 | the equivalent command on the management console sends INIT to all |
97 | cpus. SAL picks one one of the cpus as the monarch and the rest are | 97 | cpus. SAL picks one of the cpus as the monarch and the rest are |
98 | slaves. All the OS INIT handlers are entered at approximately the same | 98 | slaves. All the OS INIT handlers are entered at approximately the same |
99 | time. The OS monarch prints the state of all tasks and returns, after | 99 | time. The OS monarch prints the state of all tasks and returns, after |
100 | which the slaves return and the system resumes. | 100 | which the slaves return and the system resumes. |
101 | 101 | ||
102 | At least that is what is supposed to happen. Alas there are broken | 102 | At least that is what is supposed to happen. Alas there are broken |
103 | versions of SAL out there. Some drive all the cpus as monarchs. Some | 103 | versions of SAL out there. Some drive all the cpus as monarchs. Some |
104 | drive them all as slaves. Some drive one cpu as monarch, wait for that | 104 | drive them all as slaves. Some drive one cpu as monarch, wait for that |
105 | cpu to return from the OS then drive the rest as slaves. Some versions | 105 | cpu to return from the OS then drive the rest as slaves. Some versions |
106 | of SAL cannot even cope with returning from the OS, they spin inside | 106 | of SAL cannot even cope with returning from the OS, they spin inside |
107 | SAL on resume. The OS INIT code has workarounds for some of these | 107 | SAL on resume. The OS INIT code has workarounds for some of these |
108 | broken SAL symptoms, but some simply cannot be fixed from the OS side. | 108 | broken SAL symptoms, but some simply cannot be fixed from the OS side. |
109 | 109 | ||
110 | --- | 110 | --- |
111 | 111 | ||
112 | The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer | 112 | The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer |
113 | violations. Unfortunately MCA/INIT start off as massive layer | 113 | violations. Unfortunately MCA/INIT start off as massive layer |
114 | violations (can occur at _any_ time) and they build from there. | 114 | violations (can occur at _any_ time) and they build from there. |
115 | 115 | ||
116 | At least ia64 makes an attempt at recovering from hardware errors, but | 116 | At least ia64 makes an attempt at recovering from hardware errors, but |
117 | it is a difficult problem because of the asynchronous nature of these | 117 | it is a difficult problem because of the asynchronous nature of these |
118 | errors. When processing an unmaskable interrupt we sometimes need | 118 | errors. When processing an unmaskable interrupt we sometimes need |
119 | special code to cope with our inability to take any locks. | 119 | special code to cope with our inability to take any locks. |
120 | 120 | ||
121 | --- | 121 | --- |
122 | 122 | ||
123 | How is ia64 MCA/INIT different from x86 NMI? | 123 | How is ia64 MCA/INIT different from x86 NMI? |
124 | 124 | ||
125 | * x86 NMI typically gets delivered to one cpu. MCA/INIT gets sent to | 125 | * x86 NMI typically gets delivered to one cpu. MCA/INIT gets sent to |
126 | all cpus. | 126 | all cpus. |
127 | 127 | ||
128 | * x86 NMI cannot be nested. MCA/INIT can be nested, to a depth of 2 | 128 | * x86 NMI cannot be nested. MCA/INIT can be nested, to a depth of 2 |
129 | per cpu. | 129 | per cpu. |
130 | 130 | ||
131 | * x86 has a separate struct task which points to one of multiple kernel | 131 | * x86 has a separate struct task which points to one of multiple kernel |
132 | stacks. ia64 has the struct task embedded in the single kernel | 132 | stacks. ia64 has the struct task embedded in the single kernel |
133 | stack, so switching stack means switching task. | 133 | stack, so switching stack means switching task. |
134 | 134 | ||
135 | * x86 does not call the BIOS so the NMI handler does not have to worry | 135 | * x86 does not call the BIOS so the NMI handler does not have to worry |
136 | about any registers having changed. MCA/INIT can occur while the cpu | 136 | about any registers having changed. MCA/INIT can occur while the cpu |
137 | is in PAL in physical mode, with undefined registers and an undefined | 137 | is in PAL in physical mode, with undefined registers and an undefined |
138 | kernel stack. | 138 | kernel stack. |
139 | 139 | ||
140 | * i386 backtrace is not very sensitive to whether a process is running | 140 | * i386 backtrace is not very sensitive to whether a process is running |
141 | or not. ia64 unwind is very, very sensitive to whether a process is | 141 | or not. ia64 unwind is very, very sensitive to whether a process is |
142 | running or not. | 142 | running or not. |
143 | 143 | ||
144 | --- | 144 | --- |
145 | 145 | ||
146 | What happens when MCA/INIT is delivered while a cpu is running user | 146 | What happens when MCA/INIT is delivered while a cpu is running user |
147 | space code? | 147 | space code? |
148 | 148 | ||
149 | The user mode registers are stored in the RSE area of the MCA/INIT on | 149 | The user mode registers are stored in the RSE area of the MCA/INIT on |
150 | entry to the OS and are restored from there on return to SAL, so user | 150 | entry to the OS and are restored from there on return to SAL, so user |
151 | mode registers are preserved across a recoverable MCA/INIT. Since the | 151 | mode registers are preserved across a recoverable MCA/INIT. Since the |
152 | OS has no idea what unwind data is available for the user space stack, | 152 | OS has no idea what unwind data is available for the user space stack, |
153 | MCA/INIT never tries to backtrace user space. Which means that the OS | 153 | MCA/INIT never tries to backtrace user space. Which means that the OS |
154 | does not bother making the user space process look like a blocked task, | 154 | does not bother making the user space process look like a blocked task, |
155 | i.e. the OS does not copy pt_regs and switch_stack to the user space | 155 | i.e. the OS does not copy pt_regs and switch_stack to the user space |
156 | stack. Also the OS has no idea how big the user space RSE and memory | 156 | stack. Also the OS has no idea how big the user space RSE and memory |
157 | stacks are, which makes it too risky to copy the saved state to a user | 157 | stacks are, which makes it too risky to copy the saved state to a user |
158 | mode stack. | 158 | mode stack. |
159 | 159 | ||
160 | --- | 160 | --- |
161 | 161 | ||
162 | How do we get a backtrace on the tasks that were running when MCA/INIT | 162 | How do we get a backtrace on the tasks that were running when MCA/INIT |
163 | was delivered? | 163 | was delivered? |
164 | 164 | ||
165 | mca.c:::ia64_mca_modify_original_stack(). That identifies and | 165 | mca.c:::ia64_mca_modify_original_stack(). That identifies and |
166 | verifies the original kernel stack, copies the dirty registers from | 166 | verifies the original kernel stack, copies the dirty registers from |
167 | the MCA/INIT stack's RSE to the original stack's RSE, copies the | 167 | the MCA/INIT stack's RSE to the original stack's RSE, copies the |
168 | skeleton struct pt_regs and switch_stack to the original stack, fills | 168 | skeleton struct pt_regs and switch_stack to the original stack, fills |
169 | in the skeleton structures from the PAL minstate area and updates the | 169 | in the skeleton structures from the PAL minstate area and updates the |
170 | original stack's thread.ksp. That makes the original stack look | 170 | original stack's thread.ksp. That makes the original stack look |
171 | exactly like any other blocked task, i.e. it now appears to be | 171 | exactly like any other blocked task, i.e. it now appears to be |
172 | sleeping. To get a backtrace, just start with thread.ksp for the | 172 | sleeping. To get a backtrace, just start with thread.ksp for the |
173 | original task and unwind like any other sleeping task. | 173 | original task and unwind like any other sleeping task. |
174 | 174 | ||
175 | --- | 175 | --- |
176 | 176 | ||
177 | How do we identify the tasks that were running when MCA/INIT was | 177 | How do we identify the tasks that were running when MCA/INIT was |
178 | delivered? | 178 | delivered? |
179 | 179 | ||
180 | If the previous task has been verified and converted to a blocked | 180 | If the previous task has been verified and converted to a blocked |
181 | state, then sos->prev_task on the MCA/INIT stack is updated to point to | 181 | state, then sos->prev_task on the MCA/INIT stack is updated to point to |
182 | the previous task. You can look at that field in dumps or debuggers. | 182 | the previous task. You can look at that field in dumps or debuggers. |
183 | To help distinguish between the handler and the original tasks, | 183 | To help distinguish between the handler and the original tasks, |
184 | handlers have _TIF_MCA_INIT set in thread_info.flags. | 184 | handlers have _TIF_MCA_INIT set in thread_info.flags. |
185 | 185 | ||
186 | The sos data is always in the MCA/INIT handler stack, at offset | 186 | The sos data is always in the MCA/INIT handler stack, at offset |
187 | MCA_SOS_OFFSET. You can get that value from mca_asm.h or calculate it | 187 | MCA_SOS_OFFSET. You can get that value from mca_asm.h or calculate it |
188 | as KERNEL_STACK_SIZE - sizeof(struct pt_regs) - sizeof(struct | 188 | as KERNEL_STACK_SIZE - sizeof(struct pt_regs) - sizeof(struct |
189 | ia64_sal_os_state), with 16 byte alignment for all structures. | 189 | ia64_sal_os_state), with 16 byte alignment for all structures. |
190 | 190 | ||
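The offset arithmetic above can be sketched as a small function. This is only an illustration of the formula as written in the text: the function name sos_offset() is hypothetical, the sizes are passed in as parameters rather than taken from the real ia64 structures, and rounding down to 16 bytes after each subtraction is one reading of "16 byte alignment for all structures". The authoritative value remains the one in mca_asm.h.

```c
#include <assert.h>
#include <stddef.h>

/* Round down to a multiple of 16 bytes. */
static size_t align16_down(size_t x)
{
	return x & ~(size_t)15;
}

/*
 * Hypothetical helper: offset of the sos data from the base of the
 * MCA/INIT stack, following the formula in the text:
 *   KERNEL_STACK_SIZE - sizeof(struct pt_regs)
 *                     - sizeof(struct ia64_sal_os_state)
 * with each structure aligned down to 16 bytes.
 */
size_t sos_offset(size_t stack_size, size_t pt_regs_size, size_t sos_size)
{
	size_t off;

	assert(stack_size >= pt_regs_size);
	off = align16_down(stack_size - pt_regs_size);

	assert(off >= sos_size);
	return align16_down(off - sos_size);
}
```

With placeholder sizes, sos_offset(32768, 400, 250) first yields 32368 for pt_regs and then 32112 for the sos data. The real sizes depend on the kernel configuration.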
191 | Also the comm field of the MCA/INIT task is modified to include the pid | 191 | Also the comm field of the MCA/INIT task is modified to include the pid |
192 | of the original task, for humans to use. For example, a comm field of | 192 | of the original task, for humans to use. For example, a comm field of |
193 | 'MCA 12159' means that pid 12159 was running when the MCA was | 193 | 'MCA 12159' means that pid 12159 was running when the MCA was |
194 | delivered. | 194 | delivered. |
195 | 195 |
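A tool that post-processes dumps could recover the original pid from such a comm field. The sketch below is a hypothetical helper, not part of the kernel. It assumes the field has the form of a short word followed by a decimal pid, as in the 'MCA 12159' example above. Other prefixes are accepted but not interpreted.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: split a comm field such as "MCA 12159" into
 * the handler kind and the pid of the original task.
 * Returns 0 on success, -1 if the field does not have that form.
 */
int parse_mca_comm(const char *comm, char kind[8], int *pid)
{
	assert(comm != NULL && kind != NULL && pid != NULL);

	/* %7s leaves room for the terminating NUL in kind[8]. */
	if (sscanf(comm, "%7s %d", kind, pid) != 2)
		return -1;
	return 0;
}
```

An ordinary comm such as "bash" has no trailing pid, so the helper returns -1 for it.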
Documentation/input/input.txt
1 | Linux Input drivers v1.0 | 1 | Linux Input drivers v1.0 |
2 | (c) 1999-2001 Vojtech Pavlik <vojtech@ucw.cz> | 2 | (c) 1999-2001 Vojtech Pavlik <vojtech@ucw.cz> |
3 | Sponsored by SuSE | 3 | Sponsored by SuSE |
4 | $Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $ | 4 | $Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $ |
5 | ---------------------------------------------------------------------------- | 5 | ---------------------------------------------------------------------------- |
6 | 6 | ||
7 | 0. Disclaimer | 7 | 0. Disclaimer |
8 | ~~~~~~~~~~~~~ | 8 | ~~~~~~~~~~~~~ |
9 | This program is free software; you can redistribute it and/or modify it | 9 | This program is free software; you can redistribute it and/or modify it |
10 | under the terms of the GNU General Public License as published by the Free | 10 | under the terms of the GNU General Public License as published by the Free |
11 | Software Foundation; either version 2 of the License, or (at your option) | 11 | Software Foundation; either version 2 of the License, or (at your option) |
12 | any later version. | 12 | any later version. |
13 | 13 | ||
14 | This program is distributed in the hope that it will be useful, but | 14 | This program is distributed in the hope that it will be useful, but |
15 | WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY | 15 | WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY |
16 | or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | 16 | or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
17 | more details. | 17 | more details. |
18 | 18 | ||
19 | You should have received a copy of the GNU General Public License along | 19 | You should have received a copy of the GNU General Public License along |
20 | with this program; if not, write to the Free Software Foundation, Inc., 59 | 20 | with this program; if not, write to the Free Software Foundation, Inc., 59 |
21 | Temple Place, Suite 330, Boston, MA 02111-1307 USA | 21 | Temple Place, Suite 330, Boston, MA 02111-1307 USA |
22 | 22 | ||
23 | Should you need to contact me, the author, you can do so either by e-mail | 23 | Should you need to contact me, the author, you can do so either by e-mail |
24 | - mail your message to <vojtech@ucw.cz>, or by paper mail: Vojtech Pavlik, | 24 | - mail your message to <vojtech@ucw.cz>, or by paper mail: Vojtech Pavlik, |
25 | Simunkova 1594, Prague 8, 182 00 Czech Republic | 25 | Simunkova 1594, Prague 8, 182 00 Czech Republic |
26 | 26 | ||
27 | For your convenience, the GNU General Public License version 2 is included | 27 | For your convenience, the GNU General Public License version 2 is included |
28 | in the package: See the file COPYING. | 28 | in the package: See the file COPYING. |
29 | 29 | ||
30 | 1. Introduction | 30 | 1. Introduction |
31 | ~~~~~~~~~~~~~~~ | 31 | ~~~~~~~~~~~~~~~ |
32 | This is a collection of drivers that is designed to support all input | 32 | This is a collection of drivers that is designed to support all input |
33 | devices under Linux. While it is currently used only for USB input | 33 | devices under Linux. While it is currently used only for USB input |
34 | devices, future use (say 2.5/2.6) is expected to expand to replace | 34 | devices, future use (say 2.5/2.6) is expected to expand to replace |
35 | most of the existing input system, which is why it lives in | 35 | most of the existing input system, which is why it lives in |
36 | drivers/input/ instead of drivers/usb/. | 36 | drivers/input/ instead of drivers/usb/. |
37 | 37 | ||
38 | The centre of the input drivers is the input module, which must be | 38 | The centre of the input drivers is the input module, which must be |
39 | loaded before any other of the input modules - it serves as a way of | 39 | loaded before any other of the input modules - it serves as a way of |
40 | communication between two groups of modules: | 40 | communication between two groups of modules: |
41 | 41 | ||
42 | 1.1 Device drivers | 42 | 1.1 Device drivers |
43 | ~~~~~~~~~~~~~~~~~~ | 43 | ~~~~~~~~~~~~~~~~~~ |
44 | These modules talk to the hardware (for example via USB), and provide | 44 | These modules talk to the hardware (for example via USB), and provide |
45 | events (keystrokes, mouse movements) to the input module. | 45 | events (keystrokes, mouse movements) to the input module. |
46 | 46 | ||
47 | 1.2 Event handlers | 47 | 1.2 Event handlers |
48 | ~~~~~~~~~~~~~~~~~~ | 48 | ~~~~~~~~~~~~~~~~~~ |
49 | These modules get events from input and pass them where needed via | 49 | These modules get events from input and pass them where needed via |
50 | various interfaces - keystrokes to the kernel, mouse movements via a | 50 | various interfaces - keystrokes to the kernel, mouse movements via a |
51 | simulated PS/2 interface to GPM and X and so on. | 51 | simulated PS/2 interface to GPM and X and so on. |
52 | 52 | ||
53 | 2. Simple Usage | 53 | 2. Simple Usage |
54 | ~~~~~~~~~~~~~~~ | 54 | ~~~~~~~~~~~~~~~ |
55 | For the most usual configuration, with one USB mouse and one USB keyboard, | 55 | For the most usual configuration, with one USB mouse and one USB keyboard, |
56 | you'll have to load the following modules (or have them built in to the | 56 | you'll have to load the following modules (or have them built in to the |
57 | kernel): | 57 | kernel): |
58 | 58 | ||
59 | input | 59 | input |
60 | mousedev | 60 | mousedev |
61 | keybdev | 61 | keybdev |
62 | usbcore | 62 | usbcore |
63 | uhci_hcd or ohci_hcd or ehci_hcd | 63 | uhci_hcd or ohci_hcd or ehci_hcd |
64 | usbhid | 64 | usbhid |
65 | 65 | ||
66 | After this, the USB keyboard will work straight away, and the USB mouse | 66 | After this, the USB keyboard will work straight away, and the USB mouse |
67 | will be available as a character device on major 13, minor 63: | 67 | will be available as a character device on major 13, minor 63: |
68 | 68 | ||
69 | crw-r--r-- 1 root root 13, 63 Mar 28 22:45 mice | 69 | crw-r--r-- 1 root root 13, 63 Mar 28 22:45 mice |
70 | 70 | ||
71 | This device has to be created. | 71 | This device has to be created. |
72 | The commands to create it by hand are: | 72 | The commands to create it by hand are: |
73 | 73 | ||
74 | cd /dev | 74 | cd /dev |
75 | mkdir input | 75 | mkdir input |
76 | mknod input/mice c 13 63 | 76 | mknod input/mice c 13 63 |
77 | 77 | ||
78 | After that you have to point GPM (the textmode mouse cut&paste tool) and | 78 | After that you have to point GPM (the textmode mouse cut&paste tool) and |
79 | XFree to this device to use it - GPM should be called like: | 79 | XFree to this device to use it - GPM should be called like: |
80 | 80 | ||
81 | gpm -t ps2 -m /dev/input/mice | 81 | gpm -t ps2 -m /dev/input/mice |
82 | 82 | ||
83 | And in X: | 83 | And in X: |
84 | 84 | ||
85 | Section "Pointer" | 85 | Section "Pointer" |
86 | Protocol "ImPS/2" | 86 | Protocol "ImPS/2" |
87 | Device "/dev/input/mice" | 87 | Device "/dev/input/mice" |
88 | ZAxisMapping 4 5 | 88 | ZAxisMapping 4 5 |
89 | EndSection | 89 | EndSection |
90 | 90 | ||
91 | When you do all of the above, you can use your USB mouse and keyboard. | 91 | When you do all of the above, you can use your USB mouse and keyboard. |
92 | 92 | ||
93 | 3. Detailed Description | 93 | 3. Detailed Description |
94 | ~~~~~~~~~~~~~~~~~~~~~~~ | 94 | ~~~~~~~~~~~~~~~~~~~~~~~ |
95 | 3.1 Device drivers | 95 | 3.1 Device drivers |
96 | ~~~~~~~~~~~~~~~~~~ | 96 | ~~~~~~~~~~~~~~~~~~ |
97 | Device drivers are the modules that generate events. The events are | 97 | Device drivers are the modules that generate events. The events are |
98 | however not useful without being handled, so you also will need to use some | 98 | however not useful without being handled, so you also will need to use some |
99 | of the modules from section 3.2. | 99 | of the modules from section 3.2. |
100 | 100 | ||
101 | 3.1.1 usbhid | 101 | 3.1.1 usbhid |
102 | ~~~~~~~~~~~~ | 102 | ~~~~~~~~~~~~ |
103 | usbhid is the largest and most complex driver of the whole suite. It | 103 | usbhid is the largest and most complex driver of the whole suite. It |
104 | handles all HID devices, and because there is a very wide variety of them, | 104 | handles all HID devices, and because there is a very wide variety of them, |
105 | and because the USB HID specification isn't simple, it needs to be this big. | 105 | and because the USB HID specification isn't simple, it needs to be this big. |
106 | 106 | ||
107 | Currently, it handles USB mice, joysticks, gamepads, steering wheels, | 107 | Currently, it handles USB mice, joysticks, gamepads, steering wheels, |
108 | keyboards, trackballs and digitizers. | 108 | keyboards, trackballs and digitizers. |
109 | 109 | ||
110 | However, USB uses HID also for monitor controls, speaker controls, UPSs, | 110 | However, USB uses HID also for monitor controls, speaker controls, UPSs, |
111 | LCDs and many other purposes. | 111 | LCDs and many other purposes. |
112 | 112 | ||
113 | The monitor and speaker controls should be easy to add to the hid/input | 113 | The monitor and speaker controls should be easy to add to the hid/input |
114 | interface, but for the UPSs and LCDs it doesn't make much sense. For this, | 114 | interface, but for the UPSs and LCDs it doesn't make much sense. For this, |
115 | the hiddev interface was designed. See Documentation/usb/hiddev.txt | 115 | the hiddev interface was designed. See Documentation/usb/hiddev.txt |
116 | for more information about it. | 116 | for more information about it. |
117 | 117 | ||
118 | The usage of the usbhid module is very simple, it takes no parameters, | 118 | The usage of the usbhid module is very simple, it takes no parameters, |
119 | detects everything automatically and when a HID device is inserted, it | 119 | detects everything automatically and when a HID device is inserted, it |
120 | detects it appropriately. | 120 | detects it appropriately. |
121 | 121 | ||
122 | However, because the devices vary wildly, you might happen to have a | 122 | However, because the devices vary wildly, you might happen to have a |
123 | device that doesn't work well. In that case #define DEBUG at the beginning | 123 | device that doesn't work well. In that case #define DEBUG at the beginning |
124 | of hid-core.c and send me the syslog traces. | 124 | of hid-core.c and send me the syslog traces. |
125 | 125 | ||
126 | 3.1.2 usbmouse | 126 | 3.1.2 usbmouse |
127 | ~~~~~~~~~~~~~~ | 127 | ~~~~~~~~~~~~~~ |
128 | For embedded systems, for mice with broken HID descriptors and just any | 128 | For embedded systems, for mice with broken HID descriptors and just any |
129 | other use when the big usbhid wouldn't be a good choice, there is the | 129 | other use when the big usbhid wouldn't be a good choice, there is the |
130 | usbmouse driver. It handles USB mice only. It uses a simpler HIDBP | 130 | usbmouse driver. It handles USB mice only. It uses a simpler HIDBP |
131 | protocol. This also means the mice must support this simpler protocol. Not | 131 | protocol. This also means the mice must support this simpler protocol. Not |
132 | all do. If you don't have any strong reason to use this module, use usbhid | 132 | all do. If you don't have any strong reason to use this module, use usbhid |
133 | instead. | 133 | instead. |
134 | 134 | ||
135 | 3.1.3 usbkbd | 135 | 3.1.3 usbkbd |
136 | ~~~~~~~~~~~~ | 136 | ~~~~~~~~~~~~ |
137 | Much like usbmouse, this module talks to keyboards with a simplified | 137 | Much like usbmouse, this module talks to keyboards with a simplified |
138 | HIDBP protocol. It's smaller, but doesn't support any extra special keys. | 138 | HIDBP protocol. It's smaller, but doesn't support any extra special keys. |
139 | Use usbhid instead if there isn't any special reason to use this. | 139 | Use usbhid instead if there isn't any special reason to use this. |
140 | 140 | ||
141 | 3.1.4 wacom | 141 | 3.1.4 wacom |
142 | ~~~~~~~~~~~ | 142 | ~~~~~~~~~~~ |
143 | This is a driver for Wacom Graphire and Intuos tablets. Not for Wacom | 143 | This is a driver for Wacom Graphire and Intuos tablets. Not for Wacom |
144 | PenPartner, that one is handled by the HID driver. Although the Intuos and | 144 | PenPartner, that one is handled by the HID driver. Although the Intuos and |
145 | Graphire tablets claim that they are HID tablets as well, they are not and | 145 | Graphire tablets claim that they are HID tablets as well, they are not and |
146 | thus need this specific driver. | 146 | thus need this specific driver. |
147 | 147 | ||
148 | 3.1.5 iforce | 148 | 3.1.5 iforce |
149 | ~~~~~~~~~~~~ | 149 | ~~~~~~~~~~~~ |
150 | A driver for I-Force joysticks and wheels, both over USB and RS232. | 150 | A driver for I-Force joysticks and wheels, both over USB and RS232. |
151 | It includes ForceFeedback support now, even though Immersion | 151 | It includes ForceFeedback support now, even though Immersion |
152 | Corp. considers the protocol a trade secret and won't disclose a word | 152 | Corp. considers the protocol a trade secret and won't disclose a word |
153 | about it. | 153 | about it. |
154 | 154 | ||
155 | 3.2 Event handlers | 155 | 3.2 Event handlers |
156 | ~~~~~~~~~~~~~~~~~~ | 156 | ~~~~~~~~~~~~~~~~~~ |
157 | Event handlers distribute the events from the devices to userland and | 157 | Event handlers distribute the events from the devices to userland and |
158 | kernel, as needed. | 158 | kernel, as needed. |
159 | 159 | ||
160 | 3.2.1 keybdev | 160 | 3.2.1 keybdev |
161 | ~~~~~~~~~~~~~ | 161 | ~~~~~~~~~~~~~ |
162 | keybdev is currently a rather ugly hack that translates the input | 162 | keybdev is currently a rather ugly hack that translates the input |
163 | events into architecture-specific keyboard raw mode (Xlated AT Set2 on | 163 | events into architecture-specific keyboard raw mode (Xlated AT Set2 on |
164 | x86), and passes them into the handle_scancode function of the | 164 | x86), and passes them into the handle_scancode function of the |
165 | keyboard.c module. This works well enough on all architectures that | 165 | keyboard.c module. This works well enough on all architectures that |
166 | keybdev can generate rawmode on, other architectures can be added to | 166 | keybdev can generate rawmode on, other architectures can be added to |
167 | it. | 167 | it. |
168 | 168 | ||
169 | The right way would be to pass the events to keyboard.c directly, | 169 | The right way would be to pass the events to keyboard.c directly, |
170 | best if keyboard.c would itself be an event handler. This is done in | 170 | best if keyboard.c would itself be an event handler. This is done in |
171 | the input patch, available on the webpage mentioned below. | 171 | the input patch, available on the webpage mentioned below. |
172 | 172 | ||
173 | 3.2.2 mousedev | 173 | 3.2.2 mousedev |
174 | ~~~~~~~~~~~~~~ | 174 | ~~~~~~~~~~~~~~ |
175 | mousedev is also a hack to make programs that use mouse input | 175 | mousedev is also a hack to make programs that use mouse input |
176 | work. It takes events from either mice or digitizers/tablets and makes | 176 | work. It takes events from either mice or digitizers/tablets and makes |
177 | a PS/2-style (a la /dev/psaux) mouse device available to the | 177 | a PS/2-style (a la /dev/psaux) mouse device available to the |
178 | userland. Ideally, the programs could use a more reasonable interface, | 178 | userland. Ideally, the programs could use a more reasonable interface, |
179 | for example evdev. | 179 | for example evdev. |
180 | 180 | ||
181 | Mousedev devices in /dev/input (as shown above) are: | 181 | Mousedev devices in /dev/input (as shown above) are: |
182 | 182 | ||
183 | crw-r--r-- 1 root root 13, 32 Mar 28 22:45 mouse0 | 183 | crw-r--r-- 1 root root 13, 32 Mar 28 22:45 mouse0 |
184 | crw-r--r-- 1 root root 13, 33 Mar 29 00:41 mouse1 | 184 | crw-r--r-- 1 root root 13, 33 Mar 29 00:41 mouse1 |
185 | crw-r--r-- 1 root root 13, 34 Mar 29 00:41 mouse2 | 185 | crw-r--r-- 1 root root 13, 34 Mar 29 00:41 mouse2 |
186 | crw-r--r-- 1 root root 13, 35 Apr 1 10:50 mouse3 | 186 | crw-r--r-- 1 root root 13, 35 Apr 1 10:50 mouse3 |
187 | ... | 187 | ... |
188 | ... | 188 | ... |
189 | crw-r--r-- 1 root root 13, 62 Apr 1 10:50 mouse30 | 189 | crw-r--r-- 1 root root 13, 62 Apr 1 10:50 mouse30 |
190 | crw-r--r-- 1 root root 13, 63 Apr 1 10:50 mice | 190 | crw-r--r-- 1 root root 13, 63 Apr 1 10:50 mice |
191 | 191 | ||
192 | Each 'mouse' device is assigned to a single mouse or digitizer, except | 192 | Each 'mouse' device is assigned to a single mouse or digitizer, except |
193 | the last one - 'mice'. This single character device is shared by all | 193 | the last one - 'mice'. This single character device is shared by all |
194 | mice and digitizers, and even if none are connected, the device is | 194 | mice and digitizers, and even if none are connected, the device is |
195 | present. This is useful for hotplugging USB mice, so that programs | 195 | present. This is useful for hotplugging USB mice, so that programs |
196 | can open the device even when no mice are present. | 196 | can open the device even when no mice are present. |
197 | 197 | ||
198 | CONFIG_INPUT_MOUSEDEV_SCREEN_[XY] in the kernel configuration are | 198 | CONFIG_INPUT_MOUSEDEV_SCREEN_[XY] in the kernel configuration are |
199 | the size of your screen (in pixels) in XFree86. This is needed if you | 199 | the size of your screen (in pixels) in XFree86. This is needed if you |
200 | want to use your digitizer in X, because its movement is sent to X | 200 | want to use your digitizer in X, because its movement is sent to X |
201 | via a virtual PS/2 mouse and thus needs to be scaled | 201 | via a virtual PS/2 mouse and thus needs to be scaled |
202 | accordingly. These values won't be used if you use a mouse only. | 202 | accordingly. These values won't be used if you use a mouse only. |
203 | 203 | ||
204 | Mousedev will generate either PS/2, ImPS/2 (Microsoft IntelliMouse) or | 204 | Mousedev will generate either PS/2, ImPS/2 (Microsoft IntelliMouse) or |
205 | ExplorerPS/2 (IntelliMouse Explorer) protocols, depending on what the | 205 | ExplorerPS/2 (IntelliMouse Explorer) protocols, depending on what the |
206 | program reading the data wishes. You can set GPM and X to any of | 206 | program reading the data wishes. You can set GPM and X to any of |
207 | these. You'll need ImPS/2 if you want to make use of a wheel on a USB | 207 | these. You'll need ImPS/2 if you want to make use of a wheel on a USB |
208 | mouse and ExplorerPS/2 if you want to use extra (up to 5) buttons. | 208 | mouse and ExplorerPS/2 if you want to use extra (up to 5) buttons. |
209 | 209 | ||
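As an illustration of what a program reading a mousedev device receives in
plain PS/2 mode, here is a minimal sketch of a packet decoder. It assumes the
conventional 3-byte PS/2 packet (button and sign bits in the first byte, then
X and Y deltas); the names ps2_motion and ps2_decode are hypothetical and are
not part of any kernel or library API. ImPS/2 and ExplorerPS/2 add a fourth
byte that this sketch does not handle.

```c
#include <stdint.h>

/* Hypothetical container for one decoded PS/2 packet. */
struct ps2_motion {
	int left, right, middle;	/* 1 if the button is held */
	int dx, dy;			/* signed movement deltas */
};

/*
 * Decode one 3-byte plain PS/2 packet.
 * Returns 0 on success, -1 if the always-set bit 3 is missing,
 * which usually means the reader is out of sync with the stream.
 */
int ps2_decode(const uint8_t pkt[3], struct ps2_motion *m)
{
	if (!(pkt[0] & 0x08))
		return -1;

	m->left   = (pkt[0] & 0x01) != 0;
	m->right  = (pkt[0] & 0x02) != 0;
	m->middle = (pkt[0] & 0x04) != 0;

	/* Bits 4 and 5 are the sign bits of the 9-bit X and Y deltas. */
	m->dx = pkt[1] - ((pkt[0] & 0x10) ? 256 : 0);
	m->dy = pkt[2] - ((pkt[0] & 0x20) ? 256 : 0);
	return 0;
}
```

A caller would read three bytes at a time from the device and pass them to
ps2_decode(); a return of -1 suggests dropping one byte and trying again.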
210 | 3.2.3 joydev | 210 | 3.2.3 joydev |
211 | ~~~~~~~~~~~~ | 211 | ~~~~~~~~~~~~ |
212 | Joydev implements v0.x and v1.x Linux joystick api, much like | 212 | Joydev implements v0.x and v1.x Linux joystick api, much like |
213 | drivers/char/joystick/joystick.c used to in earlier versions. See | 213 | drivers/char/joystick/joystick.c used to in earlier versions. See |
214 | joystick-api.txt in the Documentation subdirectory for details. As | 214 | joystick-api.txt in the Documentation subdirectory for details. As |
215 | soon as any joystick is connected, it can be accessed in /dev/input | 215 | soon as any joystick is connected, it can be accessed in /dev/input |
216 | on: | 216 | on: |
217 | 217 | ||
218 | crw-r--r-- 1 root root 13, 0 Apr 1 10:50 js0 | 218 | crw-r--r-- 1 root root 13, 0 Apr 1 10:50 js0 |
219 | crw-r--r-- 1 root root 13, 1 Apr 1 10:50 js1 | 219 | crw-r--r-- 1 root root 13, 1 Apr 1 10:50 js1 |
220 | crw-r--r-- 1 root root 13, 2 Apr 1 10:50 js2 | 220 | crw-r--r-- 1 root root 13, 2 Apr 1 10:50 js2 |
221 | crw-r--r-- 1 root root 13, 3 Apr 1 10:50 js3 | 221 | crw-r--r-- 1 root root 13, 3 Apr 1 10:50 js3 |
222 | ... | 222 | ... |
223 | 223 | ||
224 | And so on up to js31. | 224 | And so on up to js31. |
225 | 225 | ||
226 | 3.2.4 evdev | 226 | 3.2.4 evdev |
227 | ~~~~~~~~~~~ | 227 | ~~~~~~~~~~~ |
228 | evdev is the generic input event interface. It passes the events | 228 | evdev is the generic input event interface. It passes the events |
229 | generated in the kernel straight to the program, with timestamps. The | 229 | generated in the kernel straight to the program, with timestamps. The |
230 | API is still evolving, but should be useable now. It's described in | 230 | API is still evolving, but should be useable now. It's described in |
231 | section 5. | 231 | section 5. |
232 | 232 | ||
233 | This should be the way for GPM and X to get keyboard and mouse mouse | 233 | This should be the way for GPM and X to get keyboard and mouse |
234 | events. It allows for multihead in X without any specific multihead | 234 | events. It allows for multihead in X without any specific multihead |
235 | kernel support. The event codes are the same on all architectures and | 235 | kernel support. The event codes are the same on all architectures and |
236 | are hardware independent. | 236 | are hardware independent. |
237 | 237 | ||
238 | The devices are in /dev/input: | 238 | The devices are in /dev/input: |
239 | 239 | ||
240 | crw-r--r-- 1 root root 13, 64 Apr 1 10:49 event0 | 240 | crw-r--r-- 1 root root 13, 64 Apr 1 10:49 event0 |
241 | crw-r--r-- 1 root root 13, 65 Apr 1 10:50 event1 | 241 | crw-r--r-- 1 root root 13, 65 Apr 1 10:50 event1 |
242 | crw-r--r-- 1 root root 13, 66 Apr 1 10:50 event2 | 242 | crw-r--r-- 1 root root 13, 66 Apr 1 10:50 event2 |
243 | crw-r--r-- 1 root root 13, 67 Apr 1 10:50 event3 | 243 | crw-r--r-- 1 root root 13, 67 Apr 1 10:50 event3 |
244 | ... | 244 | ... |
245 | 245 | ||
246 | And so on up to event31. | 246 | And so on up to event31. |
247 | 247 | ||
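The device listings in sections 3.2.2 to 3.2.4 all share major 13 and follow
a simple minor numbering: jsN is N, mouseN is 32 + N, mice is 63 and eventN
is 64 + N. The sketch below only restates that numbering in C; the function
name input_minor is hypothetical, and the ranges are taken from the listings
above rather than from any kernel header.

```c
#include <stdio.h>
#include <string.h>

/*
 * Map a node name under /dev/input to its minor number on major 13,
 * using the ranges shown in the listings above.
 * Returns -1 for names outside those ranges.
 */
int input_minor(const char *name)
{
	unsigned int n;
	char tail;	/* catches trailing junk such as "js0x" */

	if (strcmp(name, "mice") == 0)
		return 63;
	if (sscanf(name, "js%u%c", &n, &tail) == 1 && n <= 31)
		return (int)n;
	if (sscanf(name, "mouse%u%c", &n, &tail) == 1 && n <= 30)
		return 32 + (int)n;
	if (sscanf(name, "event%u%c", &n, &tail) == 1 && n <= 31)
		return 64 + (int)n;
	return -1;
}
```

The result can be passed to mknod together with major 13 when the nodes have
to be created by hand, as in section 2.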
248 | 4. Verifying if it works | 248 | 4. Verifying if it works |
249 | ~~~~~~~~~~~~~~~~~~~~~~~~ | 249 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
250 | Typing a couple keys on the keyboard should be enough to check that | 250 | Typing a couple keys on the keyboard should be enough to check that |
251 | a USB keyboard works and is correctly connected to the kernel keyboard | 251 | a USB keyboard works and is correctly connected to the kernel keyboard |
252 | driver. | 252 | driver. |
253 | 253 | ||
254 | Doing a cat /dev/input/mouse0 (c, 13, 32) will verify that a mouse | 254 | Doing a cat /dev/input/mouse0 (c, 13, 32) will verify that a mouse |
255 | is also emulated, characters should appear if you move it. | 255 | is also emulated, characters should appear if you move it. |
256 | 256 | ||
257 | You can test the joystick emulation with the 'jstest' utility, | 257 | You can test the joystick emulation with the 'jstest' utility, |
258 | available in the joystick package (see Documentation/input/joystick.txt). | 258 | available in the joystick package (see Documentation/input/joystick.txt). |
259 | 259 | ||
260 | You can test the event devices with the 'evtest' utility available | 260 | You can test the event devices with the 'evtest' utility available |
261 | in the LinuxConsole project CVS archive (see the URL below). | 261 | in the LinuxConsole project CVS archive (see the URL below). |
262 | 262 | ||
263 | 5. Event interface | 263 | 5. Event interface |
264 | ~~~~~~~~~~~~~~~~~~ | 264 | ~~~~~~~~~~~~~~~~~~ |
265 | Should you want to add event device support into any application (X, gpm, | 265 | Should you want to add event device support into any application (X, gpm, |
266 | svgalib ...) I <vojtech@ucw.cz> will be happy to provide you any help I | 266 | svgalib ...) I <vojtech@ucw.cz> will be happy to provide you any help I |
267 | can. Here goes a description of the current state of things, which is going | 267 | can. Here goes a description of the current state of things, which is going |
268 | to be extended, but not changed incompatibly as time goes: | 268 | to be extended, but not changed incompatibly as time goes: |
269 | 269 | ||
270 | You can use blocking and nonblocking reads, also select() on the | 270 | You can use blocking and nonblocking reads, also select() on the |
271 | /dev/input/eventX devices, and you'll always get a whole number of input | 271 | /dev/input/eventX devices, and you'll always get a whole number of input |
272 | events on a read. Their layout is: | 272 | events on a read. Their layout is: |
273 | 273 | ||
274 | struct input_event { | 274 | struct input_event { |
275 | struct timeval time; | 275 | struct timeval time; |
276 | unsigned short type; | 276 | unsigned short type; |
277 | unsigned short code; | 277 | unsigned short code; |
278 | unsigned int value; | 278 | unsigned int value; |
279 | }; | 279 | }; |
280 | 280 | ||
281 | 'time' is the timestamp, it returns the time at which the event happened. | 281 | 'time' is the timestamp, it returns the time at which the event happened. |
282 | Type is for example EV_REL for relative movement, EV_KEY for a keypress or | 282 | Type is for example EV_REL for relative movement, EV_KEY for a keypress or |
283 | release. More types are defined in include/linux/input.h. | 283 | release. More types are defined in include/linux/input.h. |
284 | 284 | ||
285 | 'code' is event code, for example REL_X or KEY_BACKSPACE, again a complete | 285 | 'code' is event code, for example REL_X or KEY_BACKSPACE, again a complete |
286 | list is in include/linux/input.h. | 286 | list is in include/linux/input.h. |
287 | 287 | ||
288 | 'value' is the value the event carries. Either a relative change for | 288 | 'value' is the value the event carries. Either a relative change for |
289 | EV_REL, absolute new value for EV_ABS (joysticks ...), or 0 for EV_KEY for | 289 | EV_REL, absolute new value for EV_ABS (joysticks ...), or 0 for EV_KEY for |
290 | release, 1 for keypress and 2 for autorepeat. | 290 | release, 1 for keypress and 2 for autorepeat. |
291 | 291 | ||
292 | 292 |
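The event interface described in section 5 of input.txt can be exercised
with a short reader. What follows is a minimal sketch, not the author's
reference code: it assumes the userspace header <linux/input.h> is installed,
and the helper names key_value_name, format_event and read_events are
hypothetical. Note that <linux/input.h> declares 'value' as a signed 32-bit
integer, which is what allows EV_REL to carry negative deltas.

```c
#include <linux/input.h>	/* struct input_event, EV_KEY, EV_REL, EV_ABS */
#include <stdio.h>
#include <unistd.h>

/* Name the three EV_KEY values: 0 release, 1 keypress, 2 autorepeat. */
const char *key_value_name(int value)
{
	switch (value) {
	case 0:  return "release";
	case 1:  return "press";
	case 2:  return "autorepeat";
	default: return "unknown";
	}
}

/* Render one event as text; returns whatever snprintf() returns. */
int format_event(const struct input_event *ev, char *out, size_t n)
{
	unsigned int code = ev->code;
	int value = ev->value;

	switch (ev->type) {
	case EV_KEY:
		return snprintf(out, n, "key %u %s", code,
				key_value_name(value));
	case EV_REL:
		return snprintf(out, n, "rel %u %+d", code, value);
	case EV_ABS:
		return snprintf(out, n, "abs %u %d", code, value);
	default:
		return snprintf(out, n, "type %u code %u value %d",
				(unsigned int)ev->type, code, value);
	}
}

/*
 * Read events from fd until end of file or an error, printing each one.
 * Every successful read yields a whole number of events.
 * Returns the number of events seen, or -1 on a read error.
 */
int read_events(int fd)
{
	struct input_event ev[16];
	char line[64];
	int total = 0;
	ssize_t len;

	while ((len = read(fd, ev, sizeof(ev))) > 0) {
		size_t i, count = (size_t)len / sizeof(ev[0]);

		for (i = 0; i < count; i++) {
			format_event(&ev[i], line, sizeof(line));
			puts(line);
			total++;
		}
	}
	return len < 0 ? -1 : total;
}
```

A caller would open one of the /dev/input/eventX nodes read-only and pass
the descriptor to read_events(). On a real device a blocking read waits for
input rather than returning end of file, so the loop runs until it is
interrupted or the device goes away.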
Documentation/isdn/INTERFACE.fax
1 | $Id: INTERFACE.fax,v 1.2 2000/08/06 09:22:50 armin Exp $ | 1 | $Id: INTERFACE.fax,v 1.2 2000/08/06 09:22:50 armin Exp $ |
2 | 2 | ||
3 | 3 | ||
4 | Description of the fax-subinterface between linklevel and hardwarelevel of | 4 | Description of the fax-subinterface between linklevel and hardwarelevel of |
5 | isdn4linux. | 5 | isdn4linux. |
6 | 6 | ||
7 | The communication between linklevel (LL) and hardwarelevel (HL) for fax | 7 | The communication between linklevel (LL) and hardwarelevel (HL) for fax |
8 | is based on the struct T30_s (defined in isdnif.h). | 8 | is based on the struct T30_s (defined in isdnif.h). |
9 | This struct is allocated in the LL. | 9 | This struct is allocated in the LL. |
10 | In order to use fax, the LL provides the pointer to this struct with the | 10 | In order to use fax, the LL provides the pointer to this struct with the |
11 | command ISDN_CMD_SETL3 (parm.fax). This pointer expires in case of hangup | 11 | command ISDN_CMD_SETL3 (parm.fax). This pointer expires in case of hangup |
12 | and when a new channel to a new connection is assigned. | 12 | and when a new channel to a new connection is assigned. |
13 | 13 | ||
14 | 14 | ||
15 | Data handling: | 15 | Data handling: |
16 | In send-mode the HL-driver has to handle the <DLE> codes and the bit-order | 16 | In send-mode the HL-driver has to handle the <DLE> codes and the bit-order |
17 | conversion by itself. | 17 | conversion by itself. |
18 | In receive-mode the LL-driver takes care of the bit-order conversion | 18 | In receive-mode the LL-driver takes care of the bit-order conversion |
19 | (specified by +FBOR) | 19 | (specified by +FBOR) |
20 | 20 | ||
21 | Structure T30_s description: | 21 | Structure T30_s description: |
22 | 22 | ||
23 | This structure stores the values (set by AT-commands), the remote- | 23 | This structure stores the values (set by AT-commands), the remote- |
24 | capability-values and the command-codes between LL and HL. | 24 | capability-values and the command-codes between LL and HL. |
25 | 25 | ||
26 | If the HL-driver receives ISDN_CMD_FAXCMD, all needed information | 26 | If the HL-driver receives ISDN_CMD_FAXCMD, all needed information |
27 | is in this struct set by the LL. | 27 | is in this struct set by the LL. |
28 | To signal information to the LL, the HL-driver has to set the | 28 | To signal information to the LL, the HL-driver has to set the |
29 | the parameters and use ISDN_STAT_FAXIND. | 29 | parameters and use ISDN_STAT_FAXIND. |
30 | (Please refer to INTERFACE) | 30 | (Please refer to INTERFACE) |
31 | 31 | ||
32 | Structure T30_s: | 32 | Structure T30_s: |
33 | 33 | ||
34 | All members are 8-bit unsigned (__u8) | 34 | All members are 8-bit unsigned (__u8) |
35 | 35 | ||
36 | - resolution | 36 | - resolution |
37 | - rate | 37 | - rate |
38 | - width | 38 | - width |
39 | - length | 39 | - length |
40 | - compression | 40 | - compression |
41 | - ecm | 41 | - ecm |
42 | - binary | 42 | - binary |
43 | - scantime | 43 | - scantime |
44 | - id[] | 44 | - id[] |
45 | Local faxmachine's parameters, set by +FDIS, +FDCS, +FLID, ... | 45 | Local faxmachine's parameters, set by +FDIS, +FDCS, +FLID, ... |
46 | 46 | ||
47 | - r_resolution | 47 | - r_resolution |
48 | - r_rate | 48 | - r_rate |
49 | - r_width | 49 | - r_width |
50 | - r_length | 50 | - r_length |
51 | - r_compression | 51 | - r_compression |
52 | - r_ecm | 52 | - r_ecm |
53 | - r_binary | 53 | - r_binary |
54 | - r_scantime | 54 | - r_scantime |
55 | - r_id[] | 55 | - r_id[] |
56 | Remote faxmachine's parameters. To be set by HL-driver. | 56 | Remote faxmachine's parameters. To be set by HL-driver. |
57 | 57 | ||
58 | - phase | 58 | - phase |
59 | Defines the actual state of fax connection. Set by HL or LL | 59 | Defines the actual state of fax connection. Set by HL or LL |
60 | depending on progress and type of connection. | 60 | depending on progress and type of connection. |
61 | If the phase changes because of an AT command, the LL driver | 61 | If the phase changes because of an AT command, the LL driver |
62 | changes this value. Otherwise the HL-driver takes care of it, but | 62 | changes this value. Otherwise the HL-driver takes care of it, but |
63 | only necessary on call establishment (from IDLE to PHASE_A). | 63 | only necessary on call establishment (from IDLE to PHASE_A). |
64 | (one of the constants ISDN_FAX_PHASE_[IDLE,A,B,C,D,E]) | 64 | (one of the constants ISDN_FAX_PHASE_[IDLE,A,B,C,D,E]) |
65 | 65 | ||
66 | - direction | 66 | - direction |
67 | Defines outgoing/send or incoming/receive connection. | 67 | Defines outgoing/send or incoming/receive connection. |
68 | (ISDN_TTY_FAX_CONN_[IN,OUT]) | 68 | (ISDN_TTY_FAX_CONN_[IN,OUT]) |
69 | 69 | ||
70 | - code | 70 | - code |
71 | Commands from LL to HL; possible constants : | 71 | Commands from LL to HL; possible constants : |
72 | ISDN_TTY_FAX_DR signals +FDR command to HL | 72 | ISDN_TTY_FAX_DR signals +FDR command to HL |
73 | 73 | ||
74 | ISDN_TTY_FAX_DT signals +FDT command to HL | 74 | ISDN_TTY_FAX_DT signals +FDT command to HL |
75 | 75 | ||
76 | ISDN_TTY_FAX_ET signals +FET command to HL | 76 | ISDN_TTY_FAX_ET signals +FET command to HL |
77 | 77 | ||
78 | 78 | ||
79 | Other than that the "code" is set with the hangup-code value at | 79 | Other than that the "code" is set with the hangup-code value at |
80 | the end of connection for the +FHNG message. | 80 | the end of connection for the +FHNG message. |
81 | 81 | ||
82 | - r_code | 82 | - r_code |
83 | Commands from HL to LL; possible constants : | 83 | Commands from HL to LL; possible constants : |
84 | ISDN_TTY_FAX_CFR output of +FCFR message. | 84 | ISDN_TTY_FAX_CFR output of +FCFR message. |
85 | 85 | ||
86 | ISDN_TTY_FAX_RID output of remote ID set in r_id[] | 86 | ISDN_TTY_FAX_RID output of remote ID set in r_id[] |
87 | (+FCSI/+FTSI on send/receive) | 87 | (+FCSI/+FTSI on send/receive) |
88 | 88 | ||
89 | ISDN_TTY_FAX_DCS output of +FDCS and CONNECT message, | 89 | ISDN_TTY_FAX_DCS output of +FDCS and CONNECT message, |
90 | switching to phase C. | 90 | switching to phase C. |
91 | 91 | ||
92 | ISDN_TTY_FAX_ET signals end of data, | 92 | ISDN_TTY_FAX_ET signals end of data, |
93 | switching to phase D. | 93 | switching to phase D. |
94 | 94 | ||
95 | ISDN_TTY_FAX_FCON signals the established, outgoing connection, | 95 | ISDN_TTY_FAX_FCON signals the established, outgoing connection, |
96 | switching to phase B. | 96 | switching to phase B. |
97 | 97 | ||
98 | ISDN_TTY_FAX_FCON_I signals the established, incoming connection, | 98 | ISDN_TTY_FAX_FCON_I signals the established, incoming connection, |
99 | switching to phase B. | 99 | switching to phase B. |
100 | 100 | ||
101 | ISDN_TTY_FAX_DIS output of +FDIS message and values. | 101 | ISDN_TTY_FAX_DIS output of +FDIS message and values. |
102 | 102 | ||
103 | ISDN_TTY_FAX_SENT signals that all data has been sent | 103 | ISDN_TTY_FAX_SENT signals that all data has been sent |
104 | and <DLE><ETX> is acknowledged, | 104 | and <DLE><ETX> is acknowledged, |
105 | OK message will be sent. | 105 | OK message will be sent. |
106 | 106 | ||
107 | ISDN_TTY_FAX_PTS signals a msg-confirmation (page sent successful), | 107 | ISDN_TTY_FAX_PTS signals a msg-confirmation (page sent successful), |
108 | depending on fet value: | 108 | depending on fet value: |
109 | 0: output OK message (more pages follow) | 109 | 0: output OK message (more pages follow) |
110 | 1: switching to phase B (next document) | 110 | 1: switching to phase B (next document) |
111 | 111 | ||
112 | ISDN_TTY_FAX_TRAIN_OK output of +FDCS and OK message (for receive mode). | 112 | ISDN_TTY_FAX_TRAIN_OK output of +FDCS and OK message (for receive mode). |
113 | 113 | ||
114 | ISDN_TTY_FAX_EOP signals end of data in receive mode, | 114 | ISDN_TTY_FAX_EOP signals end of data in receive mode, |
115 | switching to phase D. | 115 | switching to phase D. |
116 | 116 | ||
117 | ISDN_TTY_FAX_HNG output of the +FHNG and value set by code and | 117 | ISDN_TTY_FAX_HNG output of the +FHNG and value set by code and |
118 | OK message, switching to phase E. | 118 | OK message, switching to phase E. |
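Editor's sketch (not part of the commit): the phase switches listed for r_code can be summarised as a lookup. This is a minimal illustration only. The string keys stand in for the real constants, whose numeric values are not given in this text, and the helper name next_phase is hypothetical.

```python
# Phase that the text above documents for each r_code constant.
# Constants that are absent here have no phase switch documented.
R_CODE_NEXT_PHASE = {
    "ISDN_TTY_FAX_DCS": "C",
    "ISDN_TTY_FAX_ET": "D",
    "ISDN_TTY_FAX_FCON": "B",
    "ISDN_TTY_FAX_FCON_I": "B",
    "ISDN_TTY_FAX_EOP": "D",
    "ISDN_TTY_FAX_HNG": "E",
}


def next_phase(r_code, fet=None):
    """Return the documented next phase for r_code, or None if none is listed.

    ISDN_TTY_FAX_PTS depends on fet: 0 only outputs OK (more pages follow),
    1 switches to phase B (next document).
    """
    if r_code == "ISDN_TTY_FAX_PTS":
        return "B" if fet == 1 else None
    return R_CODE_NEXT_PHASE.get(r_code)
```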
119 | 119 | ||
120 | 120 | ||
121 | - badlin | 121 | - badlin |
122 | Value of +FBADLIN | 122 | Value of +FBADLIN |
123 | 123 | ||
124 | - badmul | 124 | - badmul |
125 | Value of +FBADMUL | 125 | Value of +FBADMUL |
126 | 126 | ||
127 | - bor | 127 | - bor |
128 | Value of +FBOR | 128 | Value of +FBOR |
129 | 129 | ||
130 | - fet | 130 | - fet |
131 | Value of +FET command in send-mode. | 131 | Value of +FET command in send-mode. |
132 | Set by HL in receive-mode for +FET message. | 132 | Set by HL in receive-mode for +FET message. |
133 | 133 | ||
134 | - pollid[] | 134 | - pollid[] |
135 | ID-string, set by +FCIG | 135 | ID-string, set by +FCIG |
136 | 136 | ||
137 | - cq | 137 | - cq |
138 | Value of +FCQ | 138 | Value of +FCQ |
139 | 139 | ||
140 | - cr | 140 | - cr |
141 | Value of +FCR | 141 | Value of +FCR |
142 | 142 | ||
143 | - ctcrty | 143 | - ctcrty |
144 | Value of +FCTCRTY | 144 | Value of +FCTCRTY |
145 | 145 | ||
146 | - minsp | 146 | - minsp |
147 | Value of +FMINSP | 147 | Value of +FMINSP |
148 | 148 | ||
149 | - phcto | 149 | - phcto |
150 | Value of +FPHCTO | 150 | Value of +FPHCTO |
151 | 151 | ||
152 | - rel | 152 | - rel |
153 | Value of +FREL | 153 | Value of +FREL |
154 | 154 | ||
155 | - nbc | 155 | - nbc |
156 | Value of +FNBC (0,1) | 156 | Value of +FNBC (0,1) |
157 | (+FNBC is not a known class 2 fax command, I added this to change the | 157 | (+FNBC is not a known class 2 fax command, I added this to change the |
158 | automatic "best capabilities" connection in the eicon HL-driver) | 158 | automatic "best capabilities" connection in the eicon HL-driver) |
159 | 159 | ||
160 | 160 | ||
161 | Armin | 161 | Armin |
162 | mac@melware.de | 162 | mac@melware.de |
163 | 163 | ||
164 | 164 |
Documentation/isdn/README.hysdn
1 | $Id: README.hysdn,v 1.3.6.1 2001/02/10 14:41:19 kai Exp $ | 1 | $Id: README.hysdn,v 1.3.6.1 2001/02/10 14:41:19 kai Exp $ |
2 | The hysdn driver has been written by | 2 | The hysdn driver has been written by |
3 | by Werner Cornelius (werner@isdn4linux.de or werner@titro.de) | 3 | Werner Cornelius (werner@isdn4linux.de or werner@titro.de) |
4 | for Hypercope GmbH Aachen Germany. Hypercope agreed to publish this driver | 4 | for Hypercope GmbH Aachen Germany. Hypercope agreed to publish this driver |
5 | under the GNU General Public License. | 5 | under the GNU General Public License. |
6 | 6 | ||
7 | The CAPI 2.0-support was added by Ulrich Albrecht (ualbrecht@hypercope.de) | 7 | The CAPI 2.0-support was added by Ulrich Albrecht (ualbrecht@hypercope.de) |
8 | for Hypercope GmbH Aachen, Germany. | 8 | for Hypercope GmbH Aachen, Germany. |
9 | 9 | ||
10 | 10 | ||
11 | This program is free software; you can redistribute it and/or modify | 11 | This program is free software; you can redistribute it and/or modify |
12 | it under the terms of the GNU General Public License as published by | 12 | it under the terms of the GNU General Public License as published by |
13 | the Free Software Foundation; either version 2 of the License, or | 13 | the Free Software Foundation; either version 2 of the License, or |
14 | (at your option) any later version. | 14 | (at your option) any later version. |
15 | 15 | ||
16 | This program is distributed in the hope that it will be useful, | 16 | This program is distributed in the hope that it will be useful, |
17 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 17 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
18 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 18 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
19 | GNU General Public License for more details. | 19 | GNU General Public License for more details. |
20 | 20 | ||
21 | You should have received a copy of the GNU General Public License | 21 | You should have received a copy of the GNU General Public License |
22 | along with this program; if not, write to the Free Software | 22 | along with this program; if not, write to the Free Software |
23 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | 23 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |
24 | 24 | ||
25 | Table of contents | 25 | Table of contents |
26 | ================= | 26 | ================= |
27 | 27 | ||
28 | 1. About the driver | 28 | 1. About the driver |
29 | 29 | ||
30 | 2. Loading/Unloading the driver | 30 | 2. Loading/Unloading the driver |
31 | 31 | ||
32 | 3. Entries in the /proc filesystem | 32 | 3. Entries in the /proc filesystem |
33 | 33 | ||
34 | 4. The /proc/net/hysdn/cardconfX file | 34 | 4. The /proc/net/hysdn/cardconfX file |
35 | 35 | ||
36 | 5. The /proc/net/hysdn/cardlogX file | 36 | 5. The /proc/net/hysdn/cardlogX file |
37 | 37 | ||
38 | 6. Where to get additional info and help | 38 | 6. Where to get additional info and help |
39 | 39 | ||
40 | 40 | ||
41 | 1. About the driver | 41 | 1. About the driver |
42 | 42 | ||
43 | The drivers/isdn/hysdn subdir contains a driver for HYPERCOPEs active | 43 | The drivers/isdn/hysdn subdir contains a driver for HYPERCOPEs active |
44 | PCI isdn cards Champ, Ergo and Metro. To enable support for these cards | 44 | PCI isdn cards Champ, Ergo and Metro. To enable support for these cards |
45 | enable ISDN support in the kernel config and support for HYSDN cards in | 45 | enable ISDN support in the kernel config and support for HYSDN cards in |
46 | the active cards submenu. The driver may only be compiled and used if | 46 | the active cards submenu. The driver may only be compiled and used if |
47 | support for loadable modules and the process filesystem have been enabled. | 47 | support for loadable modules and the process filesystem have been enabled. |
48 | 48 | ||
49 | These cards provide two different interfaces to the kernel. Without the | 49 | These cards provide two different interfaces to the kernel. Without the |
50 | optional CAPI 2.0 support, they register as an ethernet card. IP-routing | 50 | optional CAPI 2.0 support, they register as an ethernet card. IP-routing |
51 | to an ISDN-destination is performed on the card itself. All necessary | 51 | to an ISDN-destination is performed on the card itself. All necessary |
52 | handlers for various protocols like ppp and others as well as config info | 52 | handlers for various protocols like ppp and others as well as config info |
53 | and firmware may be fetched from Hypercopes WWW-Site www.hypercope.de. | 53 | and firmware may be fetched from Hypercopes WWW-Site www.hypercope.de. |
54 | 54 | ||
55 | With CAPI 2.0 support enabled, the card can also be used as a CAPI 2.0 | 55 | With CAPI 2.0 support enabled, the card can also be used as a CAPI 2.0 |
56 | compliant device with either CAPI 2.0 applications | 56 | compliant device with either CAPI 2.0 applications |
57 | (check isdn4k-utils) or -using the capidrv module- as a regular | 57 | (check isdn4k-utils) or -using the capidrv module- as a regular |
58 | isdn4linux device. This is done via the same mechanism as with the | 58 | isdn4linux device. This is done via the same mechanism as with the |
59 | active AVM cards and in fact uses the same module. | 59 | active AVM cards and in fact uses the same module. |
60 | 60 | ||
61 | 61 | ||
62 | 2. Loading/Unloading the driver | 62 | 2. Loading/Unloading the driver |
63 | 63 | ||
64 | The module has no command line parameters and auto detects up to 10 cards | 64 | The module has no command line parameters and auto detects up to 10 cards |
65 | in the id-range 0-9. | 65 | in the id-range 0-9. |
66 | If a loaded driver shall be unloaded all open files in the /proc/net/hysdn | 66 | If a loaded driver shall be unloaded all open files in the /proc/net/hysdn |
67 | subdir need to be closed and all ethernet interfaces allocated by this | 67 | subdir need to be closed and all ethernet interfaces allocated by this |
68 | driver must be shut down. Otherwise the module counter will prevent a module | 68 | driver must be shut down. Otherwise the module counter will prevent a module |
69 | unload. | 69 | unload. |
70 | 70 | ||
71 | If you are using the CAPI 2.0-interface, make sure to load/modprobe the | 71 | If you are using the CAPI 2.0-interface, make sure to load/modprobe the |
72 | kernelcapi-module first. | 72 | kernelcapi-module first. |
73 | 73 | ||
74 | If you plan to use the capidrv-link to isdn4linux, make sure to load | 74 | If you plan to use the capidrv-link to isdn4linux, make sure to load |
75 | capidrv.o after all modules using this driver (i.e. after hysdn and | 75 | capidrv.o after all modules using this driver (i.e. after hysdn and |
76 | any avm-specific modules). | 76 | any avm-specific modules). |
77 | 77 | ||
78 | 3. Entries in the /proc filesystem | 78 | 3. Entries in the /proc filesystem |
79 | 79 | ||
80 | When the module has been loaded it adds the directory hysdn in the | 80 | When the module has been loaded it adds the directory hysdn in the |
81 | /proc/net tree. This directory contains exactly 2 file entries for each | 81 | /proc/net tree. This directory contains exactly 2 file entries for each |
82 | card. One is called cardconfX and the other cardlogX, where X is the | 82 | card. One is called cardconfX and the other cardlogX, where X is the |
83 | card id number from 0 to 9. | 83 | card id number from 0 to 9. |
84 | The cards are numbered in the order found in the PCI config data. | 84 | The cards are numbered in the order found in the PCI config data. |
85 | 85 | ||
86 | 4. The /proc/net/hysdn/cardconfX file | 86 | 4. The /proc/net/hysdn/cardconfX file |
87 | 87 | ||
88 | This file may be read by everyone to get info about the card's type, | 88 | This file may be read by everyone to get info about the card's type, |
89 | actual state, available features and used resources. | 89 | actual state, available features and used resources. |
90 | The first 3 entries (id, bus and slot) are PCI info fields, the following | 90 | The first 3 entries (id, bus and slot) are PCI info fields, the following |
91 | type field gives the information about the cards type: | 91 | type field gives the information about the cards type: |
92 | 92 | ||
93 | 4 -> Ergo card (server card with 2 b-chans) | 93 | 4 -> Ergo card (server card with 2 b-chans) |
94 | 5 -> Metro card (server card with 4 or 8 b-chans) | 94 | 5 -> Metro card (server card with 4 or 8 b-chans) |
95 | 6 -> Champ card (client card with 2 b-chans) | 95 | 6 -> Champ card (client card with 2 b-chans) |
96 | 96 | ||
97 | The following 3 fields show the hardware assignments for irq, iobase and the | 97 | The following 3 fields show the hardware assignments for irq, iobase and the |
98 | dual ported memory (dp-mem). | 98 | dual ported memory (dp-mem). |
99 | The fields b-chans and fax-chans announce the available card resources of | 99 | The fields b-chans and fax-chans announce the available card resources of |
100 | these types for the user. | 100 | these types for the user. |
101 | The state variable indicates the actual drivers state for this card with the | 101 | The state variable indicates the actual drivers state for this card with the |
102 | following assignments. | 102 | following assignments. |
103 | 103 | ||
104 | 0 -> card has not been booted since driver load | 104 | 0 -> card has not been booted since driver load |
105 | 1 -> card booting is actually in progress | 105 | 1 -> card booting is actually in progress |
106 | 2 -> card is in an error state due to a previous boot failure | 106 | 2 -> card is in an error state due to a previous boot failure |
107 | 3 -> card is booted and active | 107 | 3 -> card is booted and active |
108 | 108 | ||
109 | And the last field (device) shows the name of the ethernet device assigned | 109 | And the last field (device) shows the name of the ethernet device assigned |
110 | to this card. Up to the first successful boot this field only shows a - | 110 | to this card. Up to the first successful boot this field only shows a - |
111 | to tell that no net device has been allocated up to now. Once a net device | 111 | to tell that no net device has been allocated up to now. Once a net device |
112 | has been allocated it remains assigned to this card, even if a card is | 112 | has been allocated it remains assigned to this card, even if a card is |
113 | rebooted and a boot error occurs. | 113 | rebooted and a boot error occurs. |
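Editor's sketch (not part of the commit): the type and state codes described above can be turned into readable text with two small lookup tables. The codes and descriptions come from the text. The function name describe_card is hypothetical, and the exact layout of the cardconfX file is not assumed here.

```python
# Card type codes reported in the "type" field of cardconfX.
CARD_TYPES = {
    4: "Ergo card (server card with 2 b-chans)",
    5: "Metro card (server card with 4 or 8 b-chans)",
    6: "Champ card (client card with 2 b-chans)",
}

# Driver state codes reported in the "state" field of cardconfX.
CARD_STATES = {
    0: "card has not been booted since driver load",
    1: "card booting is in progress",
    2: "card is in an error state due to a previous boot failure",
    3: "card is booted and active",
}


def describe_card(card_type, state):
    """Return a one-line description for a (type, state) pair."""
    type_text = CARD_TYPES.get(card_type, "unknown card type %d" % card_type)
    state_text = CARD_STATES.get(state, "unknown state %d" % state)
    return "%s: %s" % (type_text, state_text)
```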
114 | 114 | ||
115 | Writing to the cardconfX file boots the card or transfers config lines to | 115 | Writing to the cardconfX file boots the card or transfers config lines to |
116 | the cards firmware. The type of data is automatically detected when the | 116 | the cards firmware. The type of data is automatically detected when the |
117 | first data is written. Only root has write access to this file. | 117 | first data is written. Only root has write access to this file. |
118 | The firmware boot files are normally called hyclient.pof for client cards | 118 | The firmware boot files are normally called hyclient.pof for client cards |
119 | and hyserver.pof for server cards. | 119 | and hyserver.pof for server cards. |
120 | After successfully writing the boot file, complete config files or single | 120 | After successfully writing the boot file, complete config files or single |
121 | config lines may be copied to this file. | 121 | config lines may be copied to this file. |
122 | If an error occurs the return value given to the writing process has the | 122 | If an error occurs the return value given to the writing process has the |
123 | following additional codes (decimal): | 123 | following additional codes (decimal): |
124 | 124 | ||
125 | 1000 Another process is currently booting the card | 125 | 1000 Another process is currently booting the card |
126 | 1001 Invalid firmware header | 126 | 1001 Invalid firmware header |
127 | 1002 Boards dual-port RAM test failed | 127 | 1002 Boards dual-port RAM test failed |
128 | 1003 Internal firmware handler error | 128 | 1003 Internal firmware handler error |
129 | 1004 Boot image size invalid | 129 | 1004 Boot image size invalid |
130 | 1005 First boot stage (bootstrap loader) failed | 130 | 1005 First boot stage (bootstrap loader) failed |
131 | 1006 Second boot stage failure | 131 | 1006 Second boot stage failure |
132 | 1007 Timeout waiting for card ready during boot | 132 | 1007 Timeout waiting for card ready during boot |
133 | 1008 Operation only allowed in booted state | 133 | 1008 Operation only allowed in booted state |
134 | 1009 Config line too long | 134 | 1009 Config line too long |
135 | 1010 Invalid channel number | 135 | 1010 Invalid channel number |
136 | 1011 Timeout sending config data | 136 | 1011 Timeout sending config data |
137 | 137 | ||
138 | Additional info about error reasons may be fetched from the log output. | 138 | Additional info about error reasons may be fetched from the log output. |
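Editor's sketch (not part of the commit): a tool that writes to cardconfX could translate the additional error codes listed above into messages. The codes and texts are taken from the table. How the code reaches the writing process (for example its sign) is not specified here, so this hypothetical helper simply takes the decimal code as documented.

```python
# Additional error codes returned when writing to cardconfX (decimal).
HYSDN_ERRORS = {
    1000: "Another process is currently booting the card",
    1001: "Invalid firmware header",
    1002: "Boards dual-port RAM test failed",
    1003: "Internal firmware handler error",
    1004: "Boot image size invalid",
    1005: "First boot stage (bootstrap loader) failed",
    1006: "Second boot stage failure",
    1007: "Timeout waiting for card ready during boot",
    1008: "Operation only allowed in booted state",
    1009: "Config line too long",
    1010: "Invalid channel number",
    1011: "Timeout sending config data",
}


def describe_error(code):
    """Return the documented text for a code, or None if it is not listed."""
    return HYSDN_ERRORS.get(code)
```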
139 | 139 | ||
140 | 5. The /proc/net/hysdn/cardlogX file | 140 | 5. The /proc/net/hysdn/cardlogX file |
141 | 141 | ||
142 | The cardlogX file entry may be opened multiple times for reading by everyone to | 142 | The cardlogX file entry may be opened multiple times for reading by everyone to |
143 | get the cards and drivers log data. Card messages always start with the | 143 | get the cards and drivers log data. Card messages always start with the |
144 | keyword LOG. All other lines are output from the driver. | 144 | keyword LOG. All other lines are output from the driver. |
145 | The driver log data may be redirected to the syslog by selecting the | 145 | The driver log data may be redirected to the syslog by selecting the |
146 | appropriate bitmask. The cards log messages will always be sent to this | 146 | appropriate bitmask. The cards log messages will always be sent to this |
147 | interface but never to the syslog. | 147 | interface but never to the syslog. |
148 | 148 | ||
149 | A root user may write a decimal or hex (with 0x) value to this file to select | 149 | A root user may write a decimal or hex (with 0x) value to this file to select |
150 | desired output options. As mentioned above the cards log data is always | 150 | desired output options. As mentioned above the cards log data is always |
151 | written to the cardlog file independent of the following options only used | 151 | written to the cardlog file independent of the following options only used |
152 | to check and debug the driver itself: | 152 | to check and debug the driver itself: |
153 | 153 | ||
154 | For example: | 154 | For example: |
155 | echo "0x34560078" > /proc/net/hysdn/cardlog0 | 155 | echo "0x34560078" > /proc/net/hysdn/cardlog0 |
156 | to output the hex log mask 34560078 for card 0. | 156 | to output the hex log mask 34560078 for card 0. |
157 | 157 | ||
158 | The written value is regarded as an unsigned 32-bit value, with bits ORed together for | 158 | The written value is regarded as an unsigned 32-bit value, with bits ORed together for |
159 | desired output. The following bits are already assigned: | 159 | desired output. The following bits are already assigned: |
160 | 160 | ||
161 | 0x80000000 All driver log data is alternatively output via syslog | 161 | 0x80000000 All driver log data is alternatively output via syslog |
162 | 0x00000001 Log memory allocation errors | 162 | 0x00000001 Log memory allocation errors |
163 | 0x00000010 Firmware load start and close are logged | 163 | 0x00000010 Firmware load start and close are logged |
164 | 0x00000020 Log firmware record parser | 164 | 0x00000020 Log firmware record parser |
165 | 0x00000040 Log every firmware write action | 165 | 0x00000040 Log every firmware write action |
166 | 0x00000080 Log all card related boot messages | 166 | 0x00000080 Log all card related boot messages |
167 | 0x00000100 Output all config data sent for debugging purposes | 167 | 0x00000100 Output all config data sent for debugging purposes |
168 | 0x00000200 Only non comment config lines are shown with channel | 168 | 0x00000200 Only non comment config lines are shown with channel |
169 | 0x00000400 Additional conf log output | 169 | 0x00000400 Additional conf log output |
170 | 0x00001000 Log the asynchronous scheduler actions (config and log) | 170 | 0x00001000 Log the asynchronous scheduler actions (config and log) |
171 | 0x00100000 Log all open and close actions to /proc/net/hysdn/card files | 171 | 0x00100000 Log all open and close actions to /proc/net/hysdn/card files |
172 | 0x00200000 Log all actions from /proc file entries | 172 | 0x00200000 Log all actions from /proc file entries |
173 | 0x00010000 Log network interface init and deinit | 173 | 0x00010000 Log network interface init and deinit |
174 | 174 | ||
175 | 6. Where to get additional info and help | 175 | 6. Where to get additional info and help |
176 | 176 | ||
177 | If you have any problems concerning the driver or configuration contact | 177 | If you have any problems concerning the driver or configuration contact |
178 | the Hypercope support team (support@hypercope.de) and or the authors | 178 | the Hypercope support team (support@hypercope.de) and or the authors |
179 | Werner Cornelius (werner@isdn4linux or cornelius@titro.de) or | 179 | Werner Cornelius (werner@isdn4linux or cornelius@titro.de) or |
180 | Ulrich Albrecht (ualbrecht@hypercope.de). | 180 | Ulrich Albrecht (ualbrecht@hypercope.de). |
181 | 181 | ||
182 | 182 | ||
183 | 183 | ||
184 | 184 | ||
185 | 185 | ||
186 | 186 | ||
187 | 187 | ||
188 | 188 | ||
189 | 189 | ||
190 | 190 | ||
191 | 191 | ||
192 | 192 | ||
193 | 193 | ||
194 | 194 | ||
195 | 195 | ||
196 | 196 |
Documentation/kdump/kdump.txt
1 | ================================================================ | 1 | ================================================================ |
2 | Documentation for Kdump - The kexec-based Crash Dumping Solution | 2 | Documentation for Kdump - The kexec-based Crash Dumping Solution |
3 | ================================================================ | 3 | ================================================================ |
4 | 4 | ||
5 | This document includes overview, setup and installation, and analysis | 5 | This document includes overview, setup and installation, and analysis |
6 | information. | 6 | information. |
7 | 7 | ||
8 | Overview | 8 | Overview |
9 | ======== | 9 | ======== |
10 | 10 | ||
11 | Kdump uses kexec to quickly boot to a dump-capture kernel whenever a | 11 | Kdump uses kexec to quickly boot to a dump-capture kernel whenever a |
12 | dump of the system kernel's memory needs to be taken (for example, when | 12 | dump of the system kernel's memory needs to be taken (for example, when |
13 | the system panics). The system kernel's memory image is preserved across | 13 | the system panics). The system kernel's memory image is preserved across |
14 | the reboot and is accessible to the dump-capture kernel. | 14 | the reboot and is accessible to the dump-capture kernel. |
15 | 15 | ||
16 | You can use common Linux commands, such as cp and scp, to copy the | 16 | You can use common Linux commands, such as cp and scp, to copy the |
17 | memory image to a dump file on the local disk, or across the network to | 17 | memory image to a dump file on the local disk, or across the network to |
18 | a remote system. | 18 | a remote system. |
19 | 19 | ||
20 | Kdump and kexec are currently supported on the x86, x86_64, and ppc64 | 20 | Kdump and kexec are currently supported on the x86, x86_64, and ppc64 |
21 | architectures. | 21 | architectures. |
22 | 22 | ||
23 | When the system kernel boots, it reserves a small section of memory for | 23 | When the system kernel boots, it reserves a small section of memory for |
24 | the dump-capture kernel. This ensures that ongoing Direct Memory Access | 24 | the dump-capture kernel. This ensures that ongoing Direct Memory Access |
25 | (DMA) from the system kernel does not corrupt the dump-capture kernel. | 25 | (DMA) from the system kernel does not corrupt the dump-capture kernel. |
26 | The kexec -p command loads the dump-capture kernel into this reserved | 26 | The kexec -p command loads the dump-capture kernel into this reserved |
27 | memory. | 27 | memory. |
28 | 28 | ||
29 | On x86 machines, the first 640 KB of physical memory is needed to boot, | 29 | On x86 machines, the first 640 KB of physical memory is needed to boot, |
30 | regardless of where the kernel loads. Therefore, kexec backs up this | 30 | regardless of where the kernel loads. Therefore, kexec backs up this |
31 | region just before rebooting into the dump-capture kernel. | 31 | region just before rebooting into the dump-capture kernel. |
32 | 32 | ||
33 | All of the necessary information about the system kernel's core image is | 33 | All of the necessary information about the system kernel's core image is |
34 | encoded in the ELF format, and stored in a reserved area of memory | 34 | encoded in the ELF format, and stored in a reserved area of memory |
35 | before a crash. The physical address of the start of the ELF header is | 35 | before a crash. The physical address of the start of the ELF header is |
36 | passed to the dump-capture kernel through the elfcorehdr= boot | 36 | passed to the dump-capture kernel through the elfcorehdr= boot |
37 | parameter. | 37 | parameter. |
38 | 38 | ||
39 | With the dump-capture kernel, you can access the memory image, or "old | 39 | With the dump-capture kernel, you can access the memory image, or "old |
40 | memory," in two ways: | 40 | memory," in two ways: |
41 | 41 | ||
42 | - Through a /dev/oldmem device interface. A capture utility can read the | 42 | - Through a /dev/oldmem device interface. A capture utility can read the |
43 | device file and write out the memory in raw format. This is a raw dump | 43 | device file and write out the memory in raw format. This is a raw dump |
44 | of memory. Analysis and capture tools must be intelligent enough to | 44 | of memory. Analysis and capture tools must be intelligent enough to |
45 | determine where to look for the right information. | 45 | determine where to look for the right information. |
46 | 46 | ||
47 | - Through /proc/vmcore. This exports the dump as an ELF-format file that | 47 | - Through /proc/vmcore. This exports the dump as an ELF-format file that |
48 | you can write out using file copy commands such as cp or scp. Further, | 48 | you can write out using file copy commands such as cp or scp. Further, |
49 | you can use analysis tools such as the GNU Debugger (GDB) and the Crash | 49 | you can use analysis tools such as the GNU Debugger (GDB) and the Crash |
50 | tool to debug the dump file. This method ensures that the dump pages are | 50 | tool to debug the dump file. This method ensures that the dump pages are |
51 | correctly ordered. | 51 | correctly ordered. |
52 | 52 | ||
53 | 53 | ||
54 | Setup and Installation | 54 | Setup and Installation |
55 | ====================== | 55 | ====================== |
56 | 56 | ||
57 | Install kexec-tools and the Kdump patch | 57 | Install kexec-tools and the Kdump patch |
58 | --------------------------------------- | 58 | --------------------------------------- |
59 | 59 | ||
60 | 1) Login as the root user. | 60 | 1) Login as the root user. |
61 | 61 | ||
62 | 2) Download the kexec-tools user-space package from the following URL: | 62 | 2) Download the kexec-tools user-space package from the following URL: |
63 | 63 | ||
64 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz | 64 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz |
65 | 65 | ||
66 | 3) Unpack the tarball with the tar command, as follows: | 66 | 3) Unpack the tarball with the tar command, as follows: |
67 | 67 | ||
68 | tar xvpzf kexec-tools-1.101.tar.gz | 68 | tar xvpzf kexec-tools-1.101.tar.gz |
69 | 69 | ||
70 | 4) Download the latest consolidated Kdump patch from the following URL: | 70 | 4) Download the latest consolidated Kdump patch from the following URL: |
71 | 71 | ||
72 | http://lse.sourceforge.net/kdump/ | 72 | http://lse.sourceforge.net/kdump/ |
73 | 73 | ||
74 | (This location is being used until all the user-space Kdump patches | 74 | (This location is being used until all the user-space Kdump patches |
75 | are integrated with the kexec-tools package.) | 75 | are integrated with the kexec-tools package.) |
76 | 76 | ||
77 | 5) Change to the kexec-tools-1.101 directory, as follows: | 77 | 5) Change to the kexec-tools-1.101 directory, as follows: |
78 | 78 | ||
79 | cd kexec-tools-1.101 | 79 | cd kexec-tools-1.101 |
80 | 80 | ||
81 | 6) Apply the consolidated patch to the kexec-tools-1.101 source tree | 81 | 6) Apply the consolidated patch to the kexec-tools-1.101 source tree |
82 | with the patch command, as follows. (Modify the path to the downloaded | 82 | with the patch command, as follows. (Modify the path to the downloaded |
83 | patch as necessary.) | 83 | patch as necessary.) |
84 | 84 | ||
85 | patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch | 85 | patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch |
86 | 86 | ||
87 | 7) Configure the package, as follows: | 87 | 7) Configure the package, as follows: |
88 | 88 | ||
89 | ./configure | 89 | ./configure |
90 | 90 | ||
91 | 8) Compile the package, as follows: | 91 | 8) Compile the package, as follows: |
92 | 92 | ||
93 | make | 93 | make |
94 | 94 | ||
95 | 9) Install the package, as follows: | 95 | 9) Install the package, as follows: |
96 | 96 | ||
97 | make install | 97 | make install |
98 | 98 | ||
99 | 99 | ||
100 | Download and build the system and dump-capture kernels | 100 | Download and build the system and dump-capture kernels |
101 | ------------------------------------------------------ | 101 | ------------------------------------------------------ |
102 | 102 | ||
103 | Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer) | 103 | Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer) |
104 | from http://www.kernel.org. Two kernels must be built: a system kernel | 104 | from http://www.kernel.org. Two kernels must be built: a system kernel |
105 | and a dump-capture kernel. Use the following steps to configure these | 105 | and a dump-capture kernel. Use the following steps to configure these |
106 | kernels with the necessary kexec and Kdump features: | 106 | kernels with the necessary kexec and Kdump features: |
107 | 107 | ||
108 | System kernel | 108 | System kernel |
109 | ------------- | 109 | ------------- |
110 | 110 | ||
111 | 1) Enable "kexec system call" in "Processor type and features." | 111 | 1) Enable "kexec system call" in "Processor type and features." |
112 | 112 | ||
113 | CONFIG_KEXEC=y | 113 | CONFIG_KEXEC=y |
114 | 114 | ||
115 | 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo | 115 | 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo |
116 | filesystems." This is usually enabled by default. | 116 | filesystems." This is usually enabled by default. |
117 | 117 | ||
118 | CONFIG_SYSFS=y | 118 | CONFIG_SYSFS=y |
119 | 119 | ||
120 | Note that "sysfs file system support" might not appear in the "Pseudo | 120 | Note that "sysfs file system support" might not appear in the "Pseudo |
121 | filesystems" menu if "Configure standard kernel features (for small | 121 | filesystems" menu if "Configure standard kernel features (for small |
122 | systems)" is not enabled in "General Setup." In this case, check the | 122 | systems)" is not enabled in "General Setup." In this case, check the |
123 | .config file itself to ensure that sysfs is turned on, as follows: | 123 | .config file itself to ensure that sysfs is turned on, as follows: |
124 | 124 | ||
125 | grep 'CONFIG_SYSFS' .config | 125 | grep 'CONFIG_SYSFS' .config |
126 | 126 | ||
127 | 3) Enable "Compile the kernel with debug info" in "Kernel hacking." | 127 | 3) Enable "Compile the kernel with debug info" in "Kernel hacking." |
128 | 128 | ||
129 | CONFIG_DEBUG_INFO=y | 129 | CONFIG_DEBUG_INFO=y |
130 | 130 | ||
131 | This causes the kernel to be built with debug symbols. The dump | 131 | This causes the kernel to be built with debug symbols. The dump |
132 | analysis tools require a vmlinux with debug symbols in order to read | 132 | analysis tools require a vmlinux with debug symbols in order to read |
133 | and analyze a dump file. | 133 | and analyze a dump file. |
134 | 134 | ||
135 | 4) Make and install the kernel and its modules. Update the boot loader | 135 | 4) Make and install the kernel and its modules. Update the boot loader |
136 | (such as grub, yaboot, or lilo) configuration files as necessary. | 136 | (such as grub, yaboot, or lilo) configuration files as necessary. |
137 | 137 | ||
138 | 5) Boot the system kernel with the boot parameter "crashkernel=Y@X", | 138 | 5) Boot the system kernel with the boot parameter "crashkernel=Y@X", |
139 | where Y specifies how much memory to reserve for the dump-capture kernel | 139 | where Y specifies how much memory to reserve for the dump-capture kernel |
140 | and X specifies the beginning of this reserved memory. For example, | 140 | and X specifies the beginning of this reserved memory. For example, |
141 | "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory | 141 | "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory |
142 | starting at physical address 0x01000000 for the dump-capture kernel. | 142 | starting at physical address 0x01000000 for the dump-capture kernel. |
143 | 143 | ||
144 | On x86 and x86_64, use "crashkernel=64M@16M". | 144 | On x86 and x86_64, use "crashkernel=64M@16M". |
145 | 145 | ||
146 | On ppc64, use "crashkernel=128M@32M". | 146 | On ppc64, use "crashkernel=128M@32M". |
147 | 147 | ||
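As a rough illustration of the "crashkernel=Y@X" syntax from step 5, the following shell sketch splits a value into its size and start address. The helper names size_to_bytes and describe_crashkernel are hypothetical and not part of any kernel tool, and only the K, M and G suffixes are handled.

```shell
# Convert a size such as "64M" or "512K" to bytes (hypothetical helper).
size_to_bytes() {
    num=${1%[KMGkmg]}              # strip one trailing unit letter, if any
    case "$1" in
        *[Kk]) echo $(($num * 1024)) ;;
        *[Mm]) echo $(($num * 1024 * 1024)) ;;
        *[Gg]) echo $(($num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Print the reserved region described by a "Y@X" value.
describe_crashkernel() {
    size=${1%@*}                   # part before "@": amount to reserve
    start=${1#*@}                  # part after "@": physical start address
    printf 'reserve %d bytes at 0x%08x\n' \
        "$(size_to_bytes "$size")" "$(size_to_bytes "$start")"
}

describe_crashkernel 64M@16M
# prints: reserve 67108864 bytes at 0x01000000
```

The printed start address matches the 0x01000000 quoted in step 5 for "crashkernel=64M@16M".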
148 | 148 | ||
149 | The dump-capture kernel | 149 | The dump-capture kernel |
150 | ----------------------- | 150 | ----------------------- |
151 | 151 | ||
152 | 1) Under "General setup," append "-kdump" to the current string in | 152 | 1) Under "General setup," append "-kdump" to the current string in |
153 | "Local version." | 153 | "Local version." |
154 | 154 | ||
155 | 2) On x86, enable high memory support under "Processor type and | 155 | 2) On x86, enable high memory support under "Processor type and |
156 | features": | 156 | features": |
157 | 157 | ||
158 | CONFIG_HIGHMEM64G=y | 158 | CONFIG_HIGHMEM64G=y |
159 | or | 159 | or |
160 | CONFIG_HIGHMEM4G=y | 160 | CONFIG_HIGHMEM4G=y |
161 | 161 | ||
162 | 3) On x86 and x86_64, disable symmetric multi-processing support | 162 | 3) On x86 and x86_64, disable symmetric multi-processing support |
163 | under "Processor type and features": | 163 | under "Processor type and features": |
164 | 164 | ||
165 | CONFIG_SMP=n | 165 | CONFIG_SMP=n |
166 | (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line | 166 | (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line |
167 | when loading the dump-capture kernel; see section "Load the Dump-capture | 167 | when loading the dump-capture kernel; see section "Load the Dump-capture |
168 | Kernel".) | 168 | Kernel".) |
169 | 169 | ||
170 | 4) On ppc64, disable NUMA support and enable EMBEDDED support: | 170 | 4) On ppc64, disable NUMA support and enable EMBEDDED support: |
171 | 171 | ||
172 | CONFIG_NUMA=n | 172 | CONFIG_NUMA=n |
173 | CONFIG_EMBEDDED=y | 173 | CONFIG_EMBEDDED=y |
174 | CONFIG_EEH=n for the dump-capture kernel | 174 | CONFIG_EEH=n for the dump-capture kernel |
175 | 175 | ||
176 | 5) Enable "kernel crash dumps" support under "Processor type and | 176 | 5) Enable "kernel crash dumps" support under "Processor type and |
177 | features": | 177 | features": |
178 | 178 | ||
179 | CONFIG_CRASH_DUMP=y | 179 | CONFIG_CRASH_DUMP=y |
180 | 180 | ||
181 | 6) Use a suitable value for "Physical address where the kernel is | 181 | 6) Use a suitable value for "Physical address where the kernel is |
182 | loaded" (under "Processor type and features"). This only appears when | 182 | loaded" (under "Processor type and features"). This only appears when |
183 | "kernel crash dumps" is enabled. By default this value is 0x1000000 | 183 | "kernel crash dumps" is enabled. By default this value is 0x1000000 |
184 | (16MB). It should be the same as X in the "crashkernel=Y@X" boot | 184 | (16MB). It should be the same as X in the "crashkernel=Y@X" boot |
185 | parameter discussed above. | 185 | parameter discussed above. |
186 | 186 | ||
187 | On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000". | 187 | On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000". |
188 | 188 | ||
189 | On ppc64 the value is automatically set at 32MB when | 189 | On ppc64 the value is automatically set at 32MB when |
190 | CONFIG_CRASH_DUMP is set. | 190 | CONFIG_CRASH_DUMP is set. |
191 | 191 | ||
192 | 7) Optionally enable "/proc/vmcore support" under "Filesystems" -> | 192 | 7) Optionally enable "/proc/vmcore support" under "Filesystems" -> |
193 | "Pseudo filesystems". | 193 | "Pseudo filesystems". |
194 | 194 | ||
195 | CONFIG_PROC_VMCORE=y | 195 | CONFIG_PROC_VMCORE=y |
196 | (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.) | 196 | (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.) |
197 | 197 | ||
198 | 8) Make and install the kernel and its modules. DO NOT add this kernel | 198 | 8) Make and install the kernel and its modules. DO NOT add this kernel |
199 | to the boot loader configuration files. | 199 | to the boot loader configuration files. |
200 | 200 | ||
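The configuration steps above can be double-checked against the dump-capture kernel's .config, in the same spirit as the grep shown for CONFIG_SYSFS. The sketch below is only an illustration; check_config is a hypothetical helper name, and it treats an option as enabled only when the line reads exactly "OPTION=y".

```shell
# Report whether each listed option is set to "y" in a kernel .config file.
# Returns non-zero if at least one option is not enabled.
check_config() {
    config=$1
    shift
    status=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            status=1
        fi
    done
    return $status
}

# Example (run from the dump-capture kernel's build directory):
#   check_config .config CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE
```

Options that must be disabled, such as CONFIG_SMP or CONFIG_NUMA, would be reported as "NOT enabled", which is the desired result for them.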
201 | 201 | ||
202 | Load the Dump-capture Kernel | 202 | Load the Dump-capture Kernel |
203 | ============================ | 203 | ============================ |
204 | 204 | ||
205 | After booting to the system kernel, load the dump-capture kernel using | 205 | After booting to the system kernel, load the dump-capture kernel using |
206 | the following command: | 206 | the following command: |
207 | 207 | ||
208 | kexec -p <dump-capture-kernel> \ | 208 | kexec -p <dump-capture-kernel> \ |
209 | --initrd=<initrd-for-dump-capture-kernel> --args-linux \ | 209 | --initrd=<initrd-for-dump-capture-kernel> --args-linux \ |
210 | --append="root=<root-dev> init 1 irqpoll" | 210 | --append="root=<root-dev> init 1 irqpoll" |
211 | 211 | ||
212 | 212 | ||
213 | Notes on loading the dump-capture kernel: | 213 | Notes on loading the dump-capture kernel: |
214 | 214 | ||
215 | * <dump-capture-kernel> must be a vmlinux image (that is, an | 215 | * <dump-capture-kernel> must be a vmlinux image (that is, an |
216 | uncompressed ELF image). bzImage does not work at this time. | 216 | uncompressed ELF image). bzImage does not work at this time. |
217 | 217 | ||
218 | * By default, the ELF headers are stored in ELF64 format to support | 218 | * By default, the ELF headers are stored in ELF64 format to support |
219 | systems with more than 4GB memory. The --elf32-core-headers option can | 219 | systems with more than 4GB memory. The --elf32-core-headers option can |
220 | be used to force the generation of ELF32 headers. This is necessary | 220 | be used to force the generation of ELF32 headers. This is necessary |
221 | because GDB currently cannot open vmcore files with ELF64 headers on | 221 | because GDB currently cannot open vmcore files with ELF64 headers on |
222 | 32-bit systems. ELF32 headers can be used on non-PAE systems (that is, | 222 | 32-bit systems. ELF32 headers can be used on non-PAE systems (that is, |
223 | less than 4GB of memory). | 223 | less than 4GB of memory). |
224 | 224 | ||
225 | * The "irqpoll" boot parameter reduces driver initialization failures | 225 | * The "irqpoll" boot parameter reduces driver initialization failures |
226 | due to shared interrupts in the dump-capture kernel. | 226 | due to shared interrupts in the dump-capture kernel. |
227 | 227 | ||
228 | * You must specify <root-dev> in the format corresponding to the root | 228 | * You must specify <root-dev> in the format corresponding to the root |
229 | device name in the output of the mount command. | 229 | device name in the output of the mount command. |
230 | 230 | ||
231 | * "init 1" boots the dump-capture kernel into single-user mode without | 231 | * "init 1" boots the dump-capture kernel into single-user mode without |
232 | networking. If you want networking, use "init 3." | 232 | networking. If you want networking, use "init 3." |
233 | 233 | ||
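To make the placeholders in the kexec command concrete, here is a small dry-run sketch that only prints the command line rather than running it. The function name build_kexec_cmd and the example file names are hypothetical; the kexec options are the ones shown in the command above.

```shell
# Print (do not execute) the kexec command for loading a dump-capture kernel.
# Arguments: vmlinux image, initrd, root device, optional runlevel (default 1).
build_kexec_cmd() {
    kernel=$1
    initrd=$2
    rootdev=$3
    runlevel=${4:-1}               # "1" = single-user, "3" = with networking
    printf 'kexec -p %s --initrd=%s --args-linux --append="root=%s init %s irqpoll"\n' \
        "$kernel" "$initrd" "$rootdev" "$runlevel"
}

# Hypothetical file names, for illustration only.
build_kexec_cmd vmlinux-kdump initrd-kdump.img /dev/sda1
```

Printing the command first makes it easy to review the root device and runlevel before loading the kernel for real.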
234 | 234 | ||
235 | Kernel Panic | 235 | Kernel Panic |
236 | ============ | 236 | ============ |
237 | 237 | ||
238 | After successfully loading the dump-capture kernel as previously | 238 | After successfully loading the dump-capture kernel as previously |
239 | described, the system will reboot into the dump-capture kernel if a | 239 | described, the system will reboot into the dump-capture kernel if a |
240 | system crash is triggered. Trigger points are located in panic(), | 240 | system crash is triggered. Trigger points are located in panic(), |
241 | die(), die_nmi() and in the sysrq handler (ALT-SysRq-c). | 241 | die(), die_nmi() and in the sysrq handler (ALT-SysRq-c). |
242 | 242 | ||
243 | The following conditions will execute a crash trigger point: | 243 | The following conditions will execute a crash trigger point: |
244 | 244 | ||
245 | If a hard lockup is detected and "NMI watchdog" is configured, the system | 245 | If a hard lockup is detected and "NMI watchdog" is configured, the system |
246 | will boot into the dump-capture kernel ( die_nmi() ). | 246 | will boot into the dump-capture kernel ( die_nmi() ). |
247 | 247 | ||
248 | If die() is called, and it happens to be a thread with pid 0 or 1, or die() | 248 | If die() is called, and it happens to be a thread with pid 0 or 1, or die() |
249 | is called inside interrupt context or die() is called and panic_on_oops is set, | 249 | is called inside interrupt context or die() is called and panic_on_oops is set, |
250 | the system will boot into the dump-capture kernel. | 250 | the system will boot into the dump-capture kernel. |
251 | 251 | ||
252 | On powerpc systems, when a soft-reset is generated, die() is called by all cpus and the system system will boot into the dump-capture kernel. | 252 | On powerpc systems, when a soft-reset is generated, die() is called by all cpus and the system will boot into the dump-capture kernel. |
253 | 253 | ||
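The die() conditions listed above can be restated as a simple predicate. This is only a paraphrase of the text in shell form, not kernel code; the function name die_triggers_dump and its 0/1 flag arguments are made up for the illustration.

```shell
# Succeeds (exit status 0) when a die() call would lead to the dump-capture
# kernel being booted, according to the conditions described in the text.
# Arguments: pid, in_interrupt (0 or 1), panic_on_oops (0 or 1).
die_triggers_dump() {
    pid=$1
    in_interrupt=$2
    panic_on_oops=$3
    if [ "$pid" -eq 0 ] || [ "$pid" -eq 1 ] ||
       [ "$in_interrupt" -eq 1 ] || [ "$panic_on_oops" -eq 1 ]; then
        return 0
    fi
    return 1
}

# Example: an ordinary process, outside interrupt context, panic_on_oops unset.
if die_triggers_dump 1234 0 0; then
    echo "dump-capture kernel would boot"
else
    echo "no crash dump is triggered"
fi
```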
254 | For testing purposes, you can trigger a crash by using "ALT-SysRq-c", | 254 | For testing purposes, you can trigger a crash by using "ALT-SysRq-c", |
255 | "echo c > /proc/sysrq-trigger", or write a module to force the panic. | 255 | "echo c > /proc/sysrq-trigger", or write a module to force the panic. |
256 | 256 | ||
257 | Write Out the Dump File | 257 | Write Out the Dump File |
258 | ======================= | 258 | ======================= |
259 | 259 | ||
260 | After the dump-capture kernel is booted, write out the dump file with | 260 | After the dump-capture kernel is booted, write out the dump file with |
261 | the following command: | 261 | the following command: |
262 | 262 | ||
263 | cp /proc/vmcore <dump-file> | 263 | cp /proc/vmcore <dump-file> |
264 | 264 | ||
265 | You can also access dumped memory as a /dev/oldmem device for a linear | 265 | You can also access dumped memory as a /dev/oldmem device for a linear |
266 | and raw view. To create the device, use the following command: | 266 | and raw view. To create the device, use the following command: |
267 | 267 | ||
268 | mknod /dev/oldmem c 1 12 | 268 | mknod /dev/oldmem c 1 12 |
269 | 269 | ||
270 | Use the dd command with suitable options for count, bs, and skip to | 270 | Use the dd command with suitable options for count, bs, and skip to |
271 | access specific portions of the dump. | 271 | access specific portions of the dump. |
272 | 272 | ||
273 | To see the entire memory, use the following command: | 273 | To see the entire memory, use the following command: |
274 | 274 | ||
275 | dd if=/dev/oldmem of=oldmem.001 | 275 | dd if=/dev/oldmem of=oldmem.001 |
276 | 276 | ||
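As an example of the count, bs, and skip options mentioned above, the sketch below derives a dd command for one region of old memory. It only prints the command. The helper name oldmem_dd_cmd is hypothetical, and it assumes the offset and length are multiples of the block size (4096 bytes by default).

```shell
# Print a dd command that copies "length" bytes starting at byte "offset"
# of /dev/oldmem into the file "out".
# Arguments: offset, length, output file, optional block size (default 4096).
oldmem_dd_cmd() {
    offset=$1
    length=$2
    out=$3
    bs=${4:-4096}
    printf 'dd if=/dev/oldmem of=%s bs=%s skip=%s count=%s\n' \
        "$out" "$bs" "$(($offset / $bs))" "$(($length / $bs))"
}

# 1 MB of old memory starting at the 16 MB mark:
oldmem_dd_cmd 16777216 1048576 region.bin
# prints: dd if=/dev/oldmem of=region.bin bs=4096 skip=4096 count=256
```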
277 | 277 | ||
278 | Analysis | 278 | Analysis |
279 | ======== | 279 | ======== |
280 | 280 | ||
281 | Before analyzing the dump image, you should reboot into a stable kernel. | 281 | Before analyzing the dump image, you should reboot into a stable kernel. |
282 | 282 | ||
283 | You can do limited analysis using GDB on the dump file copied out of | 283 | You can do limited analysis using GDB on the dump file copied out of |
284 | /proc/vmcore. Use the debug vmlinux built with -g and run the following | 284 | /proc/vmcore. Use the debug vmlinux built with -g and run the following |
285 | command: | 285 | command: |
286 | 286 | ||
287 | gdb vmlinux <dump-file> | 287 | gdb vmlinux <dump-file> |
288 | 288 | ||
289 | Stack trace for the task on processor 0, register display, and memory | 289 | Stack trace for the task on processor 0, register display, and memory |
290 | display work fine. | 290 | display work fine. |
291 | 291 | ||
292 | Note: GDB cannot analyze core files generated in ELF64 format for x86. | 292 | Note: GDB cannot analyze core files generated in ELF64 format for x86. |
293 | On systems with a maximum of 4GB of memory, you can generate | 293 | On systems with a maximum of 4GB of memory, you can generate |
294 | ELF32-format headers using the --elf32-core-headers kexec option when | 294 | ELF32-format headers using the --elf32-core-headers kexec option when |
295 | loading the dump-capture kernel. | 295 | loading the dump-capture kernel. |
296 | 296 | ||
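Because GDB's ability to open the dump depends on the ELF header format, it can help to check which format a copied dump file uses. The following sketch reads the ELF identification bytes directly: byte 4 (EI_CLASS in the ELF specification) is 1 for ELF32 and 2 for ELF64. The function name elf_class is hypothetical.

```shell
# Print "ELF32" or "ELF64" for the given file, based on its ELF header.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "ELF32" ;;
        2) echo "ELF64" ;;
        *) echo "unknown" ;;
    esac
}

# Example:
#   elf_class <dump-file>
```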
297 | You can also use the Crash utility to analyze dump files in Kdump | 297 | You can also use the Crash utility to analyze dump files in Kdump |
298 | format. Crash is available on Dave Anderson's site at the following URL: | 298 | format. Crash is available on Dave Anderson's site at the following URL: |
299 | 299 | ||
300 | http://people.redhat.com/~anderson/ | 300 | http://people.redhat.com/~anderson/ |
301 | 301 | ||
302 | 302 | ||
303 | To Do | 303 | To Do |
304 | ===== | 304 | ===== |
305 | 305 | ||
306 | 1) Provide a kernel pages filtering mechanism, so core file size is not | 306 | 1) Provide a kernel pages filtering mechanism, so core file size is not |
307 | extreme on systems with huge memory banks. | 307 | extreme on systems with huge memory banks. |
308 | 308 | ||
309 | 2) Relocatable kernel can help in maintaining multiple kernels for | 309 | 2) Relocatable kernel can help in maintaining multiple kernels for |
310 | crash_dump, and the same kernel as the system kernel can be used to | 310 | crash_dump, and the same kernel as the system kernel can be used to |
311 | capture the dump. | 311 | capture the dump. |
312 | 312 | ||
313 | 313 | ||
314 | Contact | 314 | Contact |
315 | ======= | 315 | ======= |
316 | 316 | ||
317 | Vivek Goyal (vgoyal@in.ibm.com) | 317 | Vivek Goyal (vgoyal@in.ibm.com) |
318 | Maneesh Soni (maneesh@in.ibm.com) | 318 | Maneesh Soni (maneesh@in.ibm.com) |
319 | 319 | ||
320 | 320 | ||
321 | Trademark | 321 | Trademark |
322 | ========= | 322 | ========= |
323 | 323 | ||
324 | Linux is a trademark of Linus Torvalds in the United States, other | 324 | Linux is a trademark of Linus Torvalds in the United States, other |
325 | countries, or both. | 325 | countries, or both. |
326 | 326 |
Documentation/keys.txt
1 | ============================ | 1 | ============================ |
2 | KERNEL KEY RETENTION SERVICE | 2 | KERNEL KEY RETENTION SERVICE |
3 | ============================ | 3 | ============================ |
4 | 4 | ||
5 | This service allows cryptographic keys, authentication tokens, cross-domain | 5 | This service allows cryptographic keys, authentication tokens, cross-domain |
6 | user mappings, and similar to be cached in the kernel for the use of | 6 | user mappings, and similar to be cached in the kernel for the use of |
7 | filesystems and other kernel services. | 7 | filesystems and other kernel services. |
8 | 8 | ||
9 | Keyrings are permitted; these are a special type of key that can hold links to | 9 | Keyrings are permitted; these are a special type of key that can hold links to |
10 | other keys. Processes each have three standard keyring subscriptions that a | 10 | other keys. Processes each have three standard keyring subscriptions that a |
11 | kernel service can search for relevant keys. | 11 | kernel service can search for relevant keys. |
12 | 12 | ||
13 | The key service can be configured on by enabling: | 13 | The key service can be configured on by enabling: |
14 | 14 | ||
15 | "Security options"/"Enable access key retention support" (CONFIG_KEYS) | 15 | "Security options"/"Enable access key retention support" (CONFIG_KEYS) |
16 | 16 | ||
17 | This document has the following sections: | 17 | This document has the following sections: |
18 | 18 | ||
19 | - Key overview | 19 | - Key overview |
20 | - Key service overview | 20 | - Key service overview |
21 | - Key access permissions | 21 | - Key access permissions |
22 | - SELinux support | 22 | - SELinux support |
23 | - New procfs files | 23 | - New procfs files |
24 | - Userspace system call interface | 24 | - Userspace system call interface |
25 | - Kernel services | 25 | - Kernel services |
26 | - Notes on accessing payload contents | 26 | - Notes on accessing payload contents |
27 | - Defining a key type | 27 | - Defining a key type |
28 | - Request-key callback service | 28 | - Request-key callback service |
29 | - Key access filesystem | 29 | - Key access filesystem |
30 | 30 | ||
31 | 31 | ||
32 | ============ | 32 | ============ |
33 | KEY OVERVIEW | 33 | KEY OVERVIEW |
34 | ============ | 34 | ============ |
35 | 35 | ||
36 | In this context, keys represent units of cryptographic data, authentication | 36 | In this context, keys represent units of cryptographic data, authentication |
37 | tokens, keyrings, etc. These are represented in the kernel by struct key. | 37 | tokens, keyrings, etc. These are represented in the kernel by struct key. |
38 | 38 | ||
39 | Each key has a number of attributes: | 39 | Each key has a number of attributes: |
40 | 40 | ||
41 | - A serial number. | 41 | - A serial number. |
42 | - A type. | 42 | - A type. |
43 | - A description (for matching a key in a search). | 43 | - A description (for matching a key in a search). |
44 | - Access control information. | 44 | - Access control information. |
45 | - An expiry time. | 45 | - An expiry time. |
46 | - A payload. | 46 | - A payload. |
47 | - State. | 47 | - State. |
48 | 48 | ||
49 | 49 | ||
50 | (*) Each key is issued a serial number of type key_serial_t that is unique for | 50 | (*) Each key is issued a serial number of type key_serial_t that is unique for |
51 | the lifetime of that key. All serial numbers are positive non-zero 32-bit | 51 | the lifetime of that key. All serial numbers are positive non-zero 32-bit |
52 | integers. | 52 | integers. |
53 | 53 | ||
54 | Userspace programs can use a key's serial number as a way to gain access | 54 | Userspace programs can use a key's serial number as a way to gain access |
55 | to it, subject to permission checking. | 55 | to it, subject to permission checking. |
56 | 56 | ||
57 | (*) Each key is of a defined "type". Types must be registered inside the | 57 | (*) Each key is of a defined "type". Types must be registered inside the |
58 | kernel by a kernel service (such as a filesystem) before keys of that type | 58 | kernel by a kernel service (such as a filesystem) before keys of that type |
59 | can be added or used. Userspace programs cannot define new types directly. | 59 | can be added or used. Userspace programs cannot define new types directly. |
60 | 60 | ||
61 | Key types are represented in the kernel by struct key_type. This defines a | 61 | Key types are represented in the kernel by struct key_type. This defines a |
62 | number of operations that can be performed on a key of that type. | 62 | number of operations that can be performed on a key of that type. |
63 | 63 | ||
64 | Should a type be removed from the system, all the keys of that type will | 64 | Should a type be removed from the system, all the keys of that type will |
65 | be invalidated. | 65 | be invalidated. |
66 | 66 | ||
67 | (*) Each key has a description. This should be a printable string. The key | 67 | (*) Each key has a description. This should be a printable string. The key |
68 | type provides an operation to perform a match between the description on a | 68 | type provides an operation to perform a match between the description on a |
69 | key and a criterion string. | 69 | key and a criterion string. |
70 | 70 | ||
71 | (*) Each key has an owner user ID, a group ID and a permissions mask. These | 71 | (*) Each key has an owner user ID, a group ID and a permissions mask. These |
72 | are used to control what a process may do to a key from userspace, and | 72 | are used to control what a process may do to a key from userspace, and |
73 | whether a kernel service will be able to find the key. | 73 | whether a kernel service will be able to find the key. |
74 | 74 | ||
75 | (*) Each key can be set to expire at a specific time by the key type's | 75 | (*) Each key can be set to expire at a specific time by the key type's |
76 | instantiation function. Keys can also be immortal. | 76 | instantiation function. Keys can also be immortal. |
77 | 77 | ||
78 | (*) Each key can have a payload. This is a quantity of data that represents the | 78 | (*) Each key can have a payload. This is a quantity of data that represents the |
79 | actual "key". In the case of a keyring, this is a list of keys to which | 79 | actual "key". In the case of a keyring, this is a list of keys to which |
80 | the keyring links; in the case of a user-defined key, it's an arbitrary | 80 | the keyring links; in the case of a user-defined key, it's an arbitrary |
81 | blob of data. | 81 | blob of data. |
82 | 82 | ||
83 | Having a payload is not required; and the payload can, in fact, just be a | 83 | Having a payload is not required; and the payload can, in fact, just be a |
84 | value stored in the struct key itself. | 84 | value stored in the struct key itself. |
85 | 85 | ||
86 | When a key is instantiated, the key type's instantiation function is | 86 | When a key is instantiated, the key type's instantiation function is |
87 | called with a blob of data, and that then creates the key's payload in | 87 | called with a blob of data, and that then creates the key's payload in |
88 | some way. | 88 | some way. |
89 | 89 | ||
90 | Similarly, when userspace wants to read back the contents of the key, if | 90 | Similarly, when userspace wants to read back the contents of the key, if |
91 | permitted, another key type operation will be called to convert the key's | 91 | permitted, another key type operation will be called to convert the key's |
92 | attached payload back into a blob of data. | 92 | attached payload back into a blob of data. |
93 | 93 | ||
94 | (*) Each key can be in one of a number of basic states: | 94 | (*) Each key can be in one of a number of basic states: |
95 | 95 | ||
96 | (*) Uninstantiated. The key exists, but does not have any data attached. | 96 | (*) Uninstantiated. The key exists, but does not have any data attached. |
97 | Keys being requested from userspace will be in this state. | 97 | Keys being requested from userspace will be in this state. |
98 | 98 | ||
99 | (*) Instantiated. This is the normal state. The key is fully formed, and | 99 | (*) Instantiated. This is the normal state. The key is fully formed, and |
100 | has data attached. | 100 | has data attached. |
101 | 101 | ||
102 | (*) Negative. This is a relatively short-lived state. The key acts as a | 102 | (*) Negative. This is a relatively short-lived state. The key acts as a |
103 | note saying that a previous call out to userspace failed, and acts as | 103 | note saying that a previous call out to userspace failed, and acts as |
104 | a throttle on key lookups. A negative key can be updated to a normal | 104 | a throttle on key lookups. A negative key can be updated to a normal |
105 | state. | 105 | state. |
106 | 106 | ||
107 | (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, | 107 | (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, |
108 | they traverse to this state. An expired key can be updated back to a | 108 | they traverse to this state. An expired key can be updated back to a |
109 | normal state. | 109 | normal state. |
110 | 110 | ||
111 | (*) Revoked. A key is put in this state by userspace action. It can't be | 111 | (*) Revoked. A key is put in this state by userspace action. It can't be |
112 | found or operated upon (apart from by unlinking it). | 112 | found or operated upon (apart from by unlinking it). |
113 | 113 | ||
114 | (*) Dead. The key's type was unregistered, and so the key is now useless. | 114 | (*) Dead. The key's type was unregistered, and so the key is now useless. |
115 | 115 | ||
116 | 116 | ||
117 | ==================== | 117 | ==================== |
118 | KEY SERVICE OVERVIEW | 118 | KEY SERVICE OVERVIEW |
119 | ==================== | 119 | ==================== |
120 | 120 | ||
121 | The key service provides a number of features besides keys: | 121 | The key service provides a number of features besides keys: |
122 | 122 | ||
123 | (*) The key service defines two special key types: | 123 | (*) The key service defines two special key types: |
124 | 124 | ||
125 | (+) "keyring" | 125 | (+) "keyring" |
126 | 126 | ||
127 | Keyrings are special keys that contain a list of other keys. Keyring | 127 | Keyrings are special keys that contain a list of other keys. Keyring |
128 | lists can be modified using various system calls. Keyrings should not | 128 | lists can be modified using various system calls. Keyrings should not |
129 | be given a payload when created. | 129 | be given a payload when created. |
130 | 130 | ||
131 | (+) "user" | 131 | (+) "user" |
132 | 132 | ||
133 | A key of this type has a description and a payload that are arbitrary | 133 | A key of this type has a description and a payload that are arbitrary |
134 | blobs of data. These can be created, updated and read by userspace, | 134 | blobs of data. These can be created, updated and read by userspace, |
135 | and aren't intended for use by kernel services. | 135 | and aren't intended for use by kernel services. |
136 | 136 | ||
137 | (*) Each process subscribes to three keyrings: a thread-specific keyring, a | 137 | (*) Each process subscribes to three keyrings: a thread-specific keyring, a |
138 | process-specific keyring, and a session-specific keyring. | 138 | process-specific keyring, and a session-specific keyring. |
139 | 139 | ||
140 | The thread-specific keyring is discarded from the child when any sort of | 140 | The thread-specific keyring is discarded from the child when any sort of |
141 | clone, fork, vfork or execve occurs. A new keyring is created only when | 141 | clone, fork, vfork or execve occurs. A new keyring is created only when |
142 | required. | 142 | required. |
143 | 143 | ||
144 | The process-specific keyring is replaced with an empty one in the child on | 144 | The process-specific keyring is replaced with an empty one in the child on |
145 | clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is | 145 | clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is |
146 | shared. execve also discards the process's process keyring and creates a | 146 | shared. execve also discards the process's process keyring and creates a |
147 | new one. | 147 | new one. |
148 | 148 | ||
149 | The session-specific keyring is persistent across clone, fork, vfork and | 149 | The session-specific keyring is persistent across clone, fork, vfork and |
150 | execve, even when the latter executes a set-UID or set-GID binary. A | 150 | execve, even when the latter executes a set-UID or set-GID binary. A |
151 | process can, however, replace its current session keyring with a new one | 151 | process can, however, replace its current session keyring with a new one |
152 | by using PR_JOIN_SESSION_KEYRING. It is permitted to request an anonymous | 152 | by using PR_JOIN_SESSION_KEYRING. It is permitted to request an anonymous |
153 | new one, or to attempt to create or join one of a specific name. | 153 | new one, or to attempt to create or join one of a specific name. |
154 | 154 | ||
155 | The ownership of the thread keyring changes when the real UID and GID of | 155 | The ownership of the thread keyring changes when the real UID and GID of |
156 | the thread change. | 156 | the thread change. |
157 | 157 | ||
158 | (*) Each user ID resident in the system holds two special keyrings: a user | 158 | (*) Each user ID resident in the system holds two special keyrings: a user |
159 | specific keyring and a default user session keyring. The default session | 159 | specific keyring and a default user session keyring. The default session |
160 | keyring is initialised with a link to the user-specific keyring. | 160 | keyring is initialised with a link to the user-specific keyring. |
161 | 161 | ||
162 | When a process changes its real UID, if it used to have no session key, it | 162 | When a process changes its real UID, if it used to have no session key, it |
163 | will be subscribed to the default session key for the new UID. | 163 | will be subscribed to the default session key for the new UID. |
164 | 164 | ||
165 | If a process attempts to access its session key when it doesn't have one, | 165 | If a process attempts to access its session key when it doesn't have one, |
166 | it will be subscribed to the default for its current UID. | 166 | it will be subscribed to the default for its current UID. |
167 | 167 | ||
168 | (*) Each user has two quotas against which the keys they own are tracked. One | 168 | (*) Each user has two quotas against which the keys they own are tracked. One |
169 | limits the total number of keys and keyrings, the other limits the total | 169 | limits the total number of keys and keyrings, the other limits the total |
170 | amount of description and payload space that can be consumed. | 170 | amount of description and payload space that can be consumed. |
171 | 171 | ||
172 | The user can view information on this and other statistics through procfs | 172 | The user can view information on this and other statistics through procfs |
173 | files. | 173 | files. |
174 | 174 | ||
175 | Process-specific and thread-specific keyrings are not counted towards a | 175 | Process-specific and thread-specific keyrings are not counted towards a |
176 | user's quota. | 176 | user's quota. |
177 | 177 | ||
178 | If a system call that modifies a key or keyring in some way would put the | 178 | If a system call that modifies a key or keyring in some way would put the |
179 | user over quota, the operation is refused and error EDQUOT is returned. | 179 | user over quota, the operation is refused and error EDQUOT is returned. |
180 | 180 | ||
181 | (*) There's a system call interface by which userspace programs can create and | 181 | (*) There's a system call interface by which userspace programs can create and |
182 | manipulate keys and keyrings. | 182 | manipulate keys and keyrings. |
183 | 183 | ||
184 | (*) There's a kernel interface by which services can register types and search | 184 | (*) There's a kernel interface by which services can register types and search |
185 | for keys. | 185 | for keys. |
186 | 186 | ||
187 | (*) There's a way for a search done from the kernel to call back to | 187 | (*) There's a way for a search done from the kernel to call back to |
188 | userspace to request a key that can't be found in a process's keyrings. | 188 | userspace to request a key that can't be found in a process's keyrings. |
189 | 189 | ||
190 | (*) An optional filesystem is available through which the key database can be | 190 | (*) An optional filesystem is available through which the key database can be |
191 | viewed and manipulated. | 191 | viewed and manipulated. |
192 | 192 | ||
193 | 193 | ||
194 | ====================== | 194 | ====================== |
195 | KEY ACCESS PERMISSIONS | 195 | KEY ACCESS PERMISSIONS |
196 | ====================== | 196 | ====================== |
197 | 197 | ||
198 | Keys have an owner user ID, a group access ID, and a permissions mask. The mask | 198 | Keys have an owner user ID, a group access ID, and a permissions mask. The mask |
199 | has up to eight bits each for possessor, user, group and other access. Only | 199 | has up to eight bits each for possessor, user, group and other access. Only |
200 | six of each set of eight bits are defined. The permissions granted are: | 200 | six of each set of eight bits are defined. The permissions granted are: |
201 | 201 | ||
202 | (*) View | 202 | (*) View |
203 | 203 | ||
204 | This permits a key or keyring's attributes to be viewed - including key | 204 | This permits a key or keyring's attributes to be viewed - including key |
205 | type and description. | 205 | type and description. |
206 | 206 | ||
207 | (*) Read | 207 | (*) Read |
208 | 208 | ||
209 | This permits a key's payload, or a keyring's list of linked keys, to be | 209 | This permits a key's payload, or a keyring's list of linked keys, to be |
210 | viewed. | 210 | viewed. |
211 | 211 | ||
212 | (*) Write | 212 | (*) Write |
213 | 213 | ||
214 | This permits a key's payload to be instantiated or updated, or it allows a | 214 | This permits a key's payload to be instantiated or updated, or it allows a |
215 | link to be added to or removed from a keyring. | 215 | link to be added to or removed from a keyring. |
216 | 216 | ||
217 | (*) Search | 217 | (*) Search |
218 | 218 | ||
219 | This permits keyrings to be searched and keys to be found. Searches can | 219 | This permits keyrings to be searched and keys to be found. Searches can |
220 | only recurse into nested keyrings that have search permission set. | 220 | only recurse into nested keyrings that have search permission set. |
221 | 221 | ||
222 | (*) Link | 222 | (*) Link |
223 | 223 | ||
224 | This permits a key or keyring to be linked to. To create a link from a | 224 | This permits a key or keyring to be linked to. To create a link from a |
225 | keyring to a key, a process must have Write permission on the keyring and | 225 | keyring to a key, a process must have Write permission on the keyring and |
226 | Link permission on the key. | 226 | Link permission on the key. |
227 | 227 | ||
228 | (*) Set Attribute | 228 | (*) Set Attribute |
229 | 229 | ||
230 | This permits a key's UID, GID and permissions mask to be changed. | 230 | This permits a key's UID, GID and permissions mask to be changed. |
231 | 231 | ||
232 | For changing the ownership, group ID or permissions mask, being the owner of | 232 | For changing the ownership, group ID or permissions mask, being the owner of |
233 | the key or having the sysadmin capability is sufficient. | 233 | the key or having the sysadmin capability is sufficient. |
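The mask layout described above can be sketched in C. This is a minimal illustration, not the kernel's code: the names PERM_*, SHIFT_*, perm_set() and perm_to_string() are hypothetical, and the sketch assumes the six defined bits occupy the low bits of each set in the order listed (View lowest), with the possessor set in the top byte followed by user, group and other.

```c
#include <stdint.h>

/* Bit values within one 8-bit permission set, in the order the document
 * lists them. Assumption: View is the lowest bit of each set. */
enum {
    PERM_VIEW    = 0x01,
    PERM_READ    = 0x02,
    PERM_WRITE   = 0x04,
    PERM_SEARCH  = 0x08,
    PERM_LINK    = 0x10,
    PERM_SETATTR = 0x20
};

/* Assumed position of each set within the 32-bit mask. */
enum {
    SHIFT_POSSESSOR = 24,
    SHIFT_USER      = 16,
    SHIFT_GROUP     = 8,
    SHIFT_OTHER     = 0
};

/* Extract one 8-bit permission set from the full mask. */
static unsigned perm_set(uint32_t mask, int shift)
{
    return (mask >> shift) & 0xff;
}

/* Render a set as six characters (view, read, write, search, link,
 * set-attribute), using '-' for a permission that is not granted.
 * The caller supplies a buffer of at least 7 bytes. */
static void perm_to_string(unsigned set, char out[7])
{
    static const char letters[] = "vrwsla";
    int i;

    for (i = 0; i < 6; i++)
        out[i] = (set & (1u << i)) ? letters[i] : '-';
    out[6] = '\0';
}
```

Under these assumptions, a mask of 1f3f0000 gives the possessor every permission except Set Attribute, gives the user all six, and gives group and other none.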
234 | 234 | ||
235 | 235 | ||
236 | =============== | 236 | =============== |
237 | SELINUX SUPPORT | 237 | SELINUX SUPPORT |
238 | =============== | 238 | =============== |
239 | 239 | ||
240 | The security class "key" has been added to SELinux so that mandatory access | 240 | The security class "key" has been added to SELinux so that mandatory access |
241 | controls can be applied to keys created within various contexts. This support | 241 | controls can be applied to keys created within various contexts. This support |
242 | is preliminary, and is likely to change quite significantly in the near future. | 242 | is preliminary, and is likely to change quite significantly in the near future. |
243 | Currently, all of the basic permissions explained above are provided in SELinux | 243 | Currently, all of the basic permissions explained above are provided in SELinux |
244 | as well; SELinux is simply invoked after all basic permission checks have been | 244 | as well; SELinux is simply invoked after all basic permission checks have been |
245 | performed. | 245 | performed. |
246 | 246 | ||
247 | The value of the file /proc/self/attr/keycreate influences the labeling of | 247 | The value of the file /proc/self/attr/keycreate influences the labeling of |
248 | newly-created keys. If the contents of that file correspond to an SELinux | 248 | newly-created keys. If the contents of that file correspond to an SELinux |
249 | security context, then the key will be assigned that context. Otherwise, the | 249 | security context, then the key will be assigned that context. Otherwise, the |
250 | key will be assigned the current context of the task that invoked the key | 250 | key will be assigned the current context of the task that invoked the key |
251 | creation request. Tasks must be granted explicit permission to assign a | 251 | creation request. Tasks must be granted explicit permission to assign a |
252 | particular context to newly-created keys, using the "create" permission in the | 252 | particular context to newly-created keys, using the "create" permission in the |
253 | key security class. | 253 | key security class. |
254 | 254 | ||
255 | The default keyrings associated with users will be labeled with the default | 255 | The default keyrings associated with users will be labeled with the default |
256 | context of the user if and only if the login programs have been instrumented to | 256 | context of the user if and only if the login programs have been instrumented to |
257 | properly initialize keycreate during the login process. Otherwise, they will | 257 | properly initialize keycreate during the login process. Otherwise, they will |
258 | be labeled with the context of the login program itself. | 258 | be labeled with the context of the login program itself. |
259 | 259 | ||
260 | Note, however, that the default keyrings associated with the root user are | 260 | Note, however, that the default keyrings associated with the root user are |
261 | labeled with the default kernel context, since they are created early in the | 261 | labeled with the default kernel context, since they are created early in the |
262 | boot process, before root has a chance to log in. | 262 | boot process, before root has a chance to log in. |
263 | 263 | ||
264 | The keyrings associated with new threads are each labeled with the context of | 264 | The keyrings associated with new threads are each labeled with the context of |
265 | their associated thread, and both session and process keyrings are handled | 265 | their associated thread, and both session and process keyrings are handled |
266 | similarly. | 266 | similarly. |
267 | 267 | ||
268 | 268 | ||
269 | ================ | 269 | ================ |
270 | NEW PROCFS FILES | 270 | NEW PROCFS FILES |
271 | ================ | 271 | ================ |
272 | 272 | ||
273 | Two files have been added to procfs by which an administrator can find out | 273 | Two files have been added to procfs by which an administrator can find out |
274 | about the status of the key service: | 274 | about the status of the key service: |
275 | 275 | ||
276 | (*) /proc/keys | 276 | (*) /proc/keys |
277 | 277 | ||
278 | This lists the keys that are currently viewable by the task reading the | 278 | This lists the keys that are currently viewable by the task reading the |
279 | file, giving information about their type, description and permissions. | 279 | file, giving information about their type, description and permissions. |
280 | It is not possible to view the payload of the key this way, though some | 280 | It is not possible to view the payload of the key this way, though some |
281 | information about it may be given. | 281 | information about it may be given. |
282 | 282 | ||
283 | The only keys included in the list are those that grant View permission to | 283 | The only keys included in the list are those that grant View permission to |
284 | the reading process whether or not it possesses them. Note that LSM | 284 | the reading process whether or not it possesses them. Note that LSM |
285 | security checks are still performed, and may further filter out keys that | 285 | security checks are still performed, and may further filter out keys that |
286 | the current process is not authorised to view. | 286 | the current process is not authorised to view. |
287 | 287 | ||
288 | The contents of the file look like this: | 288 | The contents of the file look like this: |
289 | 289 | ||
290 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY | 290 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY |
291 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 | 291 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 |
292 | 00000002 I----- 2 perm 1f3f0000 0 0 keyring _uid.0: empty | 292 | 00000002 I----- 2 perm 1f3f0000 0 0 keyring _uid.0: empty |
293 | 00000007 I----- 1 perm 1f3f0000 0 0 keyring _pid.1: empty | 293 | 00000007 I----- 1 perm 1f3f0000 0 0 keyring _pid.1: empty |
294 | 0000018d I----- 1 perm 1f3f0000 0 0 keyring _pid.412: empty | 294 | 0000018d I----- 1 perm 1f3f0000 0 0 keyring _pid.412: empty |
295 | 000004d2 I--Q-- 1 perm 1f3f0000 32 -1 keyring _uid.32: 1/4 | 295 | 000004d2 I--Q-- 1 perm 1f3f0000 32 -1 keyring _uid.32: 1/4 |
296 | 000004d3 I--Q-- 3 perm 1f3f0000 32 -1 keyring _uid_ses.32: empty | 296 | 000004d3 I--Q-- 3 perm 1f3f0000 32 -1 keyring _uid_ses.32: empty |
297 | 00000892 I--QU- 1 perm 1f000000 0 0 user metal:copper: 0 | 297 | 00000892 I--QU- 1 perm 1f000000 0 0 user metal:copper: 0 |
298 | 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 | 298 | 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 |
299 | 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 | 299 | 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 |
300 | 300 | ||
301 | The flags are: | 301 | The flags are: |
302 | 302 | ||
303 | I Instantiated | 303 | I Instantiated |
304 | R Revoked | 304 | R Revoked |
305 | D Dead | 305 | D Dead |
306 | Q Contributes to user's quota | 306 | Q Contributes to user's quota |
307 | U Under construction by callback to userspace | 307 | U Under construction by callback to userspace |
308 | N Negative key | 308 | N Negative key |
309 | 309 | ||
310 | This file must be enabled at kernel configuration time as it allows anyone | 310 | This file must be enabled at kernel configuration time as it allows anyone |
311 | to list the keys database. | 311 | to list the keys database. |
312 | 312 | ||
313 | (*) /proc/key-users | 313 | (*) /proc/key-users |
314 | 314 | ||
315 | This file lists the tracking data for each user that has at least one key | 315 | This file lists the tracking data for each user that has at least one key |
316 | on the system. Such data includes quota information and statistics: | 316 | on the system. Such data includes quota information and statistics: |
317 | 317 | ||
318 | [root@andromeda root]# cat /proc/key-users | 318 | [root@andromeda root]# cat /proc/key-users |
319 | 0: 46 45/45 1/100 13/10000 | 319 | 0: 46 45/45 1/100 13/10000 |
320 | 29: 2 2/2 2/100 40/10000 | 320 | 29: 2 2/2 2/100 40/10000 |
321 | 32: 2 2/2 2/100 40/10000 | 321 | 32: 2 2/2 2/100 40/10000 |
322 | 38: 2 2/2 2/100 40/10000 | 322 | 38: 2 2/2 2/100 40/10000 |
323 | 323 | ||
324 | The format of each line is | 324 | The format of each line is |
325 | <UID>: User ID to which this applies | 325 | <UID>: User ID to which this applies |
326 | <usage> Structure refcount | 326 | <usage> Structure refcount |
327 | <inst>/<keys> Number of instantiated keys and total number of keys | 327 | <inst>/<keys> Number of instantiated keys and total number of keys |
328 | <keys>/<max> Key count quota | 328 | <keys>/<max> Key count quota |
329 | <bytes>/<max> Key size quota | 329 | <bytes>/<max> Key size quota |
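A line in this format can be picked apart with sscanf(). The following is a minimal sketch: the struct and function names are hypothetical, and it assumes every field is a non-negative decimal number laid out exactly as in the sample output above.

```c
#include <stdio.h>

/* Hypothetical container for one line of /proc/key-users. */
struct key_user_stats {
    unsigned uid;              /* <UID> */
    unsigned usage;            /* <usage>: structure refcount */
    unsigned inst_keys;        /* <inst>: instantiated keys */
    unsigned total_keys;       /* <keys>: total keys */
    unsigned quota_keys;       /* <keys> counted against the quota */
    unsigned quota_keys_max;   /* <max> key count quota */
    unsigned quota_bytes;      /* <bytes> counted against the quota */
    unsigned quota_bytes_max;  /* <max> key size quota */
};

/* Parse one line; returns 0 on success, -1 if the line does not match. */
static int parse_key_users_line(const char *line, struct key_user_stats *st)
{
    int n = sscanf(line, " %u: %u %u/%u %u/%u %u/%u",
                   &st->uid, &st->usage,
                   &st->inst_keys, &st->total_keys,
                   &st->quota_keys, &st->quota_keys_max,
                   &st->quota_bytes, &st->quota_bytes_max);

    return n == 8 ? 0 : -1;
}
```

Checking that all eight conversions succeeded guards against a line whose layout differs from the one shown here.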
330 | 330 | ||
331 | 331 | ||
332 | =============================== | 332 | =============================== |
333 | USERSPACE SYSTEM CALL INTERFACE | 333 | USERSPACE SYSTEM CALL INTERFACE |
334 | =============================== | 334 | =============================== |
335 | 335 | ||
336 | Userspace can manipulate keys directly through three new syscalls: add_key, | 336 | Userspace can manipulate keys directly through three new syscalls: add_key, |
337 | request_key and keyctl. The latter provides a number of functions for | 337 | request_key and keyctl. The latter provides a number of functions for |
338 | manipulating keys. | 338 | manipulating keys. |
339 | 339 | ||
340 | When referring to a key directly, userspace programs should use the key's | 340 | When referring to a key directly, userspace programs should use the key's |
341 | serial number (a positive 32-bit integer). However, there are some special | 341 | serial number (a positive 32-bit integer). However, there are some special |
342 | values available for referring to special keys and keyrings that relate to the | 342 | values available for referring to special keys and keyrings that relate to the |
343 | process making the call: | 343 | process making the call: |
344 | 344 | ||
345 | CONSTANT VALUE KEY REFERENCED | 345 | CONSTANT VALUE KEY REFERENCED |
346 | ============================== ====== =========================== | 346 | ============================== ====== =========================== |
347 | KEY_SPEC_THREAD_KEYRING -1 thread-specific keyring | 347 | KEY_SPEC_THREAD_KEYRING -1 thread-specific keyring |
348 | KEY_SPEC_PROCESS_KEYRING -2 process-specific keyring | 348 | KEY_SPEC_PROCESS_KEYRING -2 process-specific keyring |
349 | KEY_SPEC_SESSION_KEYRING -3 session-specific keyring | 349 | KEY_SPEC_SESSION_KEYRING -3 session-specific keyring |
350 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring | 350 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring |
351 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring | 351 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring |
352 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring | 352 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring |
353 | KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() | 353 | KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() |
354 | authorisation key | 354 | authorisation key |
355 | 355 | ||
356 | 356 | ||
357 | The main syscalls are: | 357 | The main syscalls are: |
358 | 358 | ||
359 | (*) Create a new key of given type, description and payload and add it to the | 359 | (*) Create a new key of given type, description and payload and add it to the |
360 | nominated keyring: | 360 | nominated keyring: |
361 | 361 | ||
362 | key_serial_t add_key(const char *type, const char *desc, | 362 | key_serial_t add_key(const char *type, const char *desc, |
363 | const void *payload, size_t plen, | 363 | const void *payload, size_t plen, |
364 | key_serial_t keyring); | 364 | key_serial_t keyring); |
365 | 365 | ||
366 | If a key of the same type and description as that proposed already exists | 366 | If a key of the same type and description as that proposed already exists |
367 | in the keyring, this will try to update it with the given payload, or it | 367 | in the keyring, this will try to update it with the given payload, or it |
368 | will return error EEXIST if that function is not supported by the key | 368 | will return error EEXIST if that function is not supported by the key |
369 | type. The process must also have permission to write to the key to be able | 369 | type. The process must also have permission to write to the key to be able |
370 | to update it. The new key will have all user permissions granted and no | 370 | to update it. The new key will have all user permissions granted and no |
371 | group or third party permissions. | 371 | group or third party permissions. |
372 | 372 | ||
373 | Otherwise, this will attempt to create a new key of the specified type and | 373 | Otherwise, this will attempt to create a new key of the specified type and |
374 | description, and to instantiate it with the supplied payload and attach it | 374 | description, and to instantiate it with the supplied payload and attach it |
375 | to the keyring. In this case, an error will be generated if the process | 375 | to the keyring. In this case, an error will be generated if the process |
376 | does not have permission to write to the keyring. | 376 | does not have permission to write to the keyring. |
377 | 377 | ||
378 | The payload is optional, and the pointer can be NULL if not required by | 378 | The payload is optional, and the pointer can be NULL if not required by |
379 | the type. The payload is plen in size, and plen can be zero for an empty | 379 | the type. The payload is plen in size, and plen can be zero for an empty |
380 | payload. | 380 | payload. |
381 | 381 | ||
382 | A new keyring can be generated by setting type "keyring", the keyring name | 382 | A new keyring can be generated by setting type "keyring", the keyring name |
383 | as the description (or NULL) and setting the payload to NULL. | 383 | as the description (or NULL) and setting the payload to NULL. |
384 | 384 | ||
385 | User defined keys can be created by specifying type "user". It is | 385 | User defined keys can be created by specifying type "user". It is |
386 | recommended that a user defined key's description be prefixed with a type | 386 | recommended that a user defined key's description be prefixed with a type |
387 | ID and a colon, such as "krb5tgt:" for a Kerberos 5 ticket granting | 387 | ID and a colon, such as "krb5tgt:" for a Kerberos 5 ticket granting |
388 | ticket. | 388 | ticket. |
389 | 389 | ||
390 | Any other type must have been registered with the kernel in advance by a | 390 | Any other type must have been registered with the kernel in advance by a |
391 | kernel service such as a filesystem. | 391 | kernel service such as a filesystem. |
392 | 392 | ||
393 | The ID of the new or updated key is returned if successful. | 393 | The ID of the new or updated key is returned if successful. |
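As a rough sketch of how add_key() might be called from userspace without a wrapper library, the syscall can be issued through syscall(2). The key_serial_t typedef, the helper name add_user_key() and the example description used below are assumptions made for illustration; the sketch also assumes the C library headers define SYS_add_key. The KEY_SPEC_SESSION_KEYRING value is taken from the table of special IDs above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: key serial numbers fit a signed 32-bit integer. */
typedef int32_t key_serial_t;

/* Special ID for the session keyring, from the table above. */
#define KEY_SPEC_SESSION_KEYRING (-3)

/* Add (or update) a key of type "user" in the caller's session keyring.
 * Returns the key's serial number, or -1 with errno set on failure. */
static key_serial_t add_user_key(const char *desc,
                                 const void *payload, size_t plen)
{
    return (key_serial_t)syscall(SYS_add_key, "user", desc,
                                 payload, plen,
                                 (long)KEY_SPEC_SESSION_KEYRING);
}
```

A call such as add_user_key("example:demo", "secret", 6) follows the recommendation above of prefixing the description with a type ID and a colon. On failure the return value is -1 and errno holds the error code, for instance when the process lacks write permission on the keyring.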
394 | 394 | ||
395 | 395 | ||
396 | (*) Search the process's keyrings for a key, potentially calling out to | 396 | (*) Search the process's keyrings for a key, potentially calling out to |
397 | userspace to create it. | 397 | userspace to create it. |
398 | 398 | ||
399 | key_serial_t request_key(const char *type, const char *description, | 399 | key_serial_t request_key(const char *type, const char *description, |
400 | const char *callout_info, | 400 | const char *callout_info, |
401 | key_serial_t dest_keyring); | 401 | key_serial_t dest_keyring); |
402 | 402 | ||
403 | This function searches all the process's keyrings in the order thread, | 403 | This function searches all the process's keyrings in the order thread, |
404 | process, session for a matching key. This works very much like | 404 | process, session for a matching key. This works very much like |
405 | KEYCTL_SEARCH, including the optional attachment of the discovered key to | 405 | KEYCTL_SEARCH, including the optional attachment of the discovered key to |
406 | a keyring. | 406 | a keyring. |
407 | 407 | ||
408 | If a key cannot be found, and if callout_info is not NULL, then | 408 | If a key cannot be found, and if callout_info is not NULL, then |
409 | /sbin/request-key will be invoked in an attempt to obtain a key. The | 409 | /sbin/request-key will be invoked in an attempt to obtain a key. The |
410 | callout_info string will be passed as an argument to the program. | 410 | callout_info string will be passed as an argument to the program. |
411 | 411 | ||
412 | See also Documentation/keys-request-key.txt. | 412 | See also Documentation/keys-request-key.txt. |
413 | 413 | ||
414 | 414 | ||
415 | The keyctl syscall functions are: | 415 | The keyctl syscall functions are: |
416 | 416 | ||
417 | (*) Map a special key ID to a real key ID for this process: | 417 | (*) Map a special key ID to a real key ID for this process: |
418 | 418 | ||
419 | key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, | 419 | key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, |
420 | int create); | 420 | int create); |
421 | 421 | ||
422 | The special key specified by "id" is looked up (with the key being created | 422 | The special key specified by "id" is looked up (with the key being created |
423 | if necessary) and the ID of the key or keyring thus found is returned if | 423 | if necessary) and the ID of the key or keyring thus found is returned if |
424 | it exists. | 424 | it exists. |
425 | 425 | ||
426 | If the key does not yet exist, the key will be created if "create" is | 426 | If the key does not yet exist, the key will be created if "create" is |
427 | non-zero; and the error ENOKEY will be returned if "create" is zero. | 427 | non-zero; and the error ENOKEY will be returned if "create" is zero. |
428 | 428 | ||
429 | 429 | ||
430 | (*) Replace the session keyring this process subscribes to with a new one: | 430 | (*) Replace the session keyring this process subscribes to with a new one: |
431 | 431 | ||
432 | key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); | 432 | key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); |
433 | 433 | ||
434 | If name is NULL, an anonymous keyring is created attached to the process | 434 | If name is NULL, an anonymous keyring is created attached to the process |
435 | as its session keyring, displacing the old session keyring. | 435 | as its session keyring, displacing the old session keyring. |
436 | 436 | ||
437 | If name is not NULL and a keyring of that name exists, the process | 437 | If name is not NULL and a keyring of that name exists, the process |
438 | attempts to attach it as the session keyring, returning an error if that | 438 | attempts to attach it as the session keyring, returning an error if that |
439 | is not permitted; otherwise a new keyring of that name is created and | 439 | is not permitted; otherwise a new keyring of that name is created and |
440 | attached as the session keyring. | 440 | attached as the session keyring. |
441 | 441 | ||
442 | To attach to a named keyring, the keyring must have search permission for | 442 | To attach to a named keyring, the keyring must have search permission for |
443 | the process's ownership. | 443 | the process's ownership. |
444 | 444 | ||
445 | The ID of the new session keyring is returned if successful. | 445 | The ID of the new session keyring is returned if successful. |
446 | 446 | ||
447 | 447 | ||
448 | (*) Update the specified key: | 448 | (*) Update the specified key: |
449 | 449 | ||
450 | long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, | 450 | long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, |
451 | size_t plen); | 451 | size_t plen); |
452 | 452 | ||
453 | This will try to update the specified key with the given payload, or it | 453 | This will try to update the specified key with the given payload, or it |
454 | will return error EOPNOTSUPP if that function is not supported by the key | 454 | will return error EOPNOTSUPP if that function is not supported by the key |
455 | type. The process must also have permission to write to the key to be able | 455 | type. The process must also have permission to write to the key to be able |
456 | to update it. | 456 | to update it. |
457 | 457 | ||
458 | The payload is of length plen, and may be absent or empty as for | 458 | The payload is of length plen, and may be absent or empty as for |
459 | add_key(). | 459 | add_key(). |
460 | 460 | ||
461 | 461 | ||
462 | (*) Revoke a key: | 462 | (*) Revoke a key: |
463 | 463 | ||
464 | long keyctl(KEYCTL_REVOKE, key_serial_t key); | 464 | long keyctl(KEYCTL_REVOKE, key_serial_t key); |
465 | 465 | ||
466 | This makes a key unavailable for further operations. Further attempts to | 466 | This makes a key unavailable for further operations. Further attempts to |
467 | use the key will be met with error EKEYREVOKED, and the key will no longer | 467 | use the key will be met with error EKEYREVOKED, and the key will no longer |
468 | be findable. | 468 | be findable. |
469 | 469 | ||
470 | 470 | ||
471 | (*) Change the ownership of a key: | 471 | (*) Change the ownership of a key: |
472 | 472 | ||
473 | long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); | 473 | long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); |
474 | 474 | ||
475 | This function permits a key's owner and group ID to be changed. Either one | 475 | This function permits a key's owner and group ID to be changed. Either one |
476 | of uid or gid can be set to -1 to suppress that change. | 476 | of uid or gid can be set to -1 to suppress that change. |
477 | 477 | ||
478 | Only the superuser can change a key's owner to something other than the | 478 | Only the superuser can change a key's owner to something other than the |
479 | key's current owner. Similarly, only the superuser can change a key's | 479 | key's current owner. Similarly, only the superuser can change a key's |
480 | group ID to something other than the calling process's group ID or one of | 480 | group ID to something other than the calling process's group ID or one of |
481 | its group list members. | 481 | its group list members. |
482 | 482 | ||
483 | 483 | ||
484 | (*) Change the permissions mask on a key: | 484 | (*) Change the permissions mask on a key: |
485 | 485 | ||
486 | long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); | 486 | long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); |
487 | 487 | ||
488 | This function permits the owner of a key or the superuser to change the | 488 | This function permits the owner of a key or the superuser to change the |
489 | permissions mask on a key. | 489 | permissions mask on a key. |
490 | 490 | ||
491 | Only the available bits are permitted; if any other bits are set, | 491 | Only the available bits are permitted; if any other bits are set, |
492 | error EINVAL will be returned. | 492 | error EINVAL will be returned. |
493 | 493 | ||
494 | 494 | ||
495 | (*) Describe a key: | 495 | (*) Describe a key: |
496 | 496 | ||
497 | long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, | 497 | long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, |
498 | size_t buflen); | 498 | size_t buflen); |
499 | 499 | ||
500 | This function returns a summary of the key's attributes (but not its | 500 | This function returns a summary of the key's attributes (but not its |
501 | payload data) as a string in the buffer provided. | 501 | payload data) as a string in the buffer provided. |
502 | 502 | ||
503 | Unless there's an error, it always returns the amount of data it could | 503 | Unless there's an error, it always returns the amount of data it could |
504 | produce, even if that's too big for the buffer, but it won't copy more | 504 | produce, even if that's too big for the buffer, but it won't copy more |
505 | than requested to userspace. If the buffer pointer is NULL then no copy | 505 | than requested to userspace. If the buffer pointer is NULL then no copy |
506 | will take place. | 506 | will take place. |
507 | 507 | ||
508 | A process must have view permission on the key for this function to be | 508 | A process must have view permission on the key for this function to be |
509 | successful. | 509 | successful. |
510 | 510 | ||
511 | If successful, a string is placed in the buffer in the following format: | 511 | If successful, a string is placed in the buffer in the following format: |
512 | 512 | ||
513 | <type>;<uid>;<gid>;<perm>;<description> | 513 | <type>;<uid>;<gid>;<perm>;<description> |
514 | 514 | ||
515 | Where type and description are strings, uid and gid are decimal, and perm | 515 | Where type and description are strings, uid and gid are decimal, and perm |
516 | is hexadecimal. A NUL character is included at the end of the string if | 516 | is hexadecimal. A NUL character is included at the end of the string if |
517 | the buffer is sufficiently big. | 517 | the buffer is sufficiently big. |
518 | 518 | ||
519 | This can be parsed with | 519 | This can be parsed with |
520 | 520 | ||
521 | sscanf(buffer, "%[^;];%d;%d;%x;%s", type, &uid, &gid, &mode, desc); | 521 | sscanf(buffer, "%[^;];%d;%d;%x;%s", type, &uid, &gid, &mode, desc); |
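A slightly more defensive variant of this parse is sketched below. The struct, the function name and the field sizes are hypothetical choices. Because perm is described as hexadecimal, the sketch reads it with %x; it also bounds the string conversions and takes the remainder of the buffer as the description so that embedded spaces survive.

```c
#include <stdio.h>

/* Hypothetical container for a parsed KEYCTL_DESCRIBE string. */
struct key_description {
    char type[32];
    int uid;
    int gid;
    unsigned perm;
    char desc[256];
};

/* Parse "<type>;<uid>;<gid>;<perm>;<description>".
 * Returns 0 on success, -1 if any field is missing. An empty
 * description is also reported as -1, because a %[ conversion
 * must match at least one character. */
static int parse_key_description(const char *buffer,
                                 struct key_description *kd)
{
    int n = sscanf(buffer, "%31[^;];%d;%d;%x;%255[^\n]",
                   kd->type, &kd->uid, &kd->gid, &kd->perm, kd->desc);

    return n == 5 ? 0 : -1;
}
```

The width limits (31 and 255) keep an unexpectedly long type or description from overrunning the fixed-size arrays; a caller that needs the full text would size the arrays from the length KEYCTL_DESCRIBE returns.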
522 | 522 | ||
523 | 523 | ||
524 | (*) Clear out a keyring: | 524 | (*) Clear out a keyring: |
525 | 525 | ||
526 | long keyctl(KEYCTL_CLEAR, key_serial_t keyring); | 526 | long keyctl(KEYCTL_CLEAR, key_serial_t keyring); |
527 | 527 | ||
528 | This function clears the list of keys attached to a keyring. The calling | 528 | This function clears the list of keys attached to a keyring. The calling |
529 | process must have write permission on the keyring, and it must be a | 529 | process must have write permission on the keyring, and it must be a |
530 | keyring (or else error ENOTDIR will result). | 530 | keyring (or else error ENOTDIR will result). |
531 | 531 | ||
532 | 532 | ||
533 | (*) Link a key into a keyring: | 533 | (*) Link a key into a keyring: |
534 | 534 | ||
535 | long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); | 535 | long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); |
536 | 536 | ||
537 | This function creates a link from the keyring to the key. The process must | 537 | This function creates a link from the keyring to the key. The process must |
538 | have write permission on the keyring and must have link permission on the | 538 | have write permission on the keyring and must have link permission on the |
539 | key. | 539 | key. |
540 | 540 | ||
541 | Should the keyring not be a keyring, error ENOTDIR will result; and if the | 541 | Should the keyring not be a keyring, error ENOTDIR will result; and if the |
542 | keyring is full, error ENFILE will result. | 542 | keyring is full, error ENFILE will result. |
543 | 543 | ||
544 | The link procedure checks the nesting of the keyrings, returning ELOOP if | 544 | The link procedure checks the nesting of the keyrings, returning ELOOP if |
545 | it appears too deep or EDEADLK if the link would introduce a cycle. | 545 | it appears too deep or EDEADLK if the link would introduce a cycle. |
546 | 546 | ||
547 | Any links within the keyring to keys that match the new key in terms of | 547 | Any links within the keyring to keys that match the new key in terms of |
548 | type and description will be discarded from the keyring as the new one is | 548 | type and description will be discarded from the keyring as the new one is |
549 | added. | 549 | added. |
550 | 550 | ||
551 | 551 | ||
552 | (*) Unlink a key or keyring from another keyring: | 552 | (*) Unlink a key or keyring from another keyring: |
553 | 553 | ||
554 | long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); | 554 | long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); |
555 | 555 | ||
556 | This function looks through the keyring for the first link to the | 556 | This function looks through the keyring for the first link to the |
557 | specified key, and removes it if found. Subsequent links to that key are | 557 | specified key, and removes it if found. Subsequent links to that key are |
558 | ignored. The process must have write permission on the keyring. | 558 | ignored. The process must have write permission on the keyring. |
559 | 559 | ||
560 | If the keyring is not a keyring, error ENOTDIR will result; and if the key | 560 | If the keyring is not a keyring, error ENOTDIR will result; and if the key |
561 | is not present, error ENOENT will be the result. | 561 | is not present, error ENOENT will be the result. |
562 | 562 | ||
563 | 563 | ||
564 | (*) Search a keyring tree for a key: | 564 | (*) Search a keyring tree for a key: |
565 | 565 | ||
566 | key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, | 566 | key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, |
567 | const char *type, const char *description, | 567 | const char *type, const char *description, |
568 | key_serial_t dest_keyring); | 568 | key_serial_t dest_keyring); |
569 | 569 | ||
570 | This searches the keyring tree headed by the specified keyring until a key | 570 | This searches the keyring tree headed by the specified keyring until a key |
571 | is found that matches the type and description criteria. Each keyring is | 571 | is found that matches the type and description criteria. Each keyring is |
572 | checked for keys before recursion into its children occurs. | 572 | checked for keys before recursion into its children occurs. |
573 | 573 | ||
574 | The process must have search permission on the top level keyring, or else | 574 | The process must have search permission on the top level keyring, or else |
575 | error EACCES will result. Only keyrings that the process has search | 575 | error EACCES will result. Only keyrings that the process has search |
576 | permission on will be recursed into, and only keys and keyrings for which | 576 | permission on will be recursed into, and only keys and keyrings for which |
577 | a process has search permission can be matched. If the specified keyring | 577 | a process has search permission can be matched. If the specified keyring |
578 | is not a keyring, ENOTDIR will result. | 578 | is not a keyring, ENOTDIR will result. |
579 | 579 | ||
580 | If the search succeeds, the function will attempt to link the found key | 580 | If the search succeeds, the function will attempt to link the found key |
581 | into the destination keyring if one is supplied (non-zero ID). All the | 581 | into the destination keyring if one is supplied (non-zero ID). All the |
582 | constraints applicable to KEYCTL_LINK apply in this case too. | 582 | constraints applicable to KEYCTL_LINK apply in this case too. |
583 | 583 | ||
584 | Error ENOKEY, EKEYREVOKED or EKEYEXPIRED will be returned if the search | 584 | Error ENOKEY, EKEYREVOKED or EKEYEXPIRED will be returned if the search |
585 | fails. On success, the resulting key ID will be returned. | 585 | fails. On success, the resulting key ID will be returned. |
586 | 586 | ||
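The search order described for KEYCTL_SEARCH can be sketched in userspace C. This is a toy model only, not the kernel's code: struct toy_node, toy_search() and the searchable flag are hypothetical names invented for illustration, and a failed search returns NULL rather than the error codes listed above. It assumes that "checked for keys before recursion" means every direct link of a keyring is tested before any child keyring is entered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LINKS 8

/* Hypothetical model of a key; a keyring is a key that links to others. */
struct toy_node {
	const char *type;
	const char *description;
	int is_keyring;
	int searchable;				/* caller has search permission */
	struct toy_node *links[TOY_MAX_LINKS];	/* only used by keyrings */
	size_t nlinks;
};

/* Returns the first match, or NULL. No cycle or depth protection here. */
static struct toy_node *toy_search(struct toy_node *ring,
				   const char *type, const char *desc)
{
	assert(ring && ring->is_keyring);

	if (!ring->searchable)
		return NULL;	/* never look inside an unsearchable keyring */

	/* Pass 1: check everything linked directly into this keyring. */
	for (size_t i = 0; i < ring->nlinks; i++) {
		struct toy_node *k = ring->links[i];

		if (k->searchable && strcmp(k->type, type) == 0 &&
		    strcmp(k->description, desc) == 0)
			return k;
	}

	/* Pass 2: only now recurse into the child keyrings. */
	for (size_t i = 0; i < ring->nlinks; i++) {
		struct toy_node *k = ring->links[i];

		if (k->is_keyring && k->searchable) {
			struct toy_node *found = toy_search(k, type, desc);

			if (found)
				return found;
		}
	}
	return NULL;
}
```

In this model a matching key linked directly into the top keyring is found ahead of a matching key inside a child keyring, even when the child keyring is linked first.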
587 | 587 | ||
588 | (*) Read the payload data from a key: | 588 | (*) Read the payload data from a key: |
589 | 589 | ||
590 | long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, | 590 | long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, |
591 | size_t buflen); | 591 | size_t buflen); |
592 | 592 | ||
593 | This function attempts to read the payload data from the specified key | 593 | This function attempts to read the payload data from the specified key |
594 | into the buffer. The process must have read permission on the key to | 594 | into the buffer. The process must have read permission on the key to |
595 | succeed. | 595 | succeed. |
596 | 596 | ||
597 | The returned data will be processed for presentation by the key type. For | 597 | The returned data will be processed for presentation by the key type. For |
598 | instance, a keyring will return an array of key_serial_t entries | 598 | instance, a keyring will return an array of key_serial_t entries |
599 | representing the IDs of all the keys to which it is subscribed. The user | 599 | representing the IDs of all the keys to which it is subscribed. The user |
600 | defined key type will return its data as is. If a key type does not | 600 | defined key type will return its data as is. If a key type does not |
601 | implement this function, error EOPNOTSUPP will result. | 601 | implement this function, error EOPNOTSUPP will result. |
602 | 602 | ||
603 | As much of the data as can be fitted into the buffer will be copied to | 603 | As much of the data as can be fitted into the buffer will be copied to |
604 | userspace if the buffer pointer is not NULL. | 604 | userspace if the buffer pointer is not NULL. |
605 | 605 | ||
606 | On a successful return, the function will always return the amount of data | 606 | On a successful return, the function will always return the amount of data |
607 | available rather than the amount copied. | 607 | available rather than the amount copied. |
608 | 608 | ||
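Because KEYCTL_READ reports the amount of data available rather than the amount copied, a caller can size its buffer with a first call and fetch the data with a second. The sketch below models that contract in plain userspace C so it runs without a kernel keyring. struct toy_readable, toy_read() and toy_read_all() are hypothetical stand-ins, not part of any real API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a key with a fixed payload. */
struct toy_readable {
	const char *payload;
	size_t len;
};

/*
 * Mimics the contract described above: copy as much as fits (only if
 * buffer is non-NULL) and return the full amount of data available.
 */
static long toy_read(const struct toy_readable *key, char *buffer,
		     size_t buflen)
{
	assert(key != NULL);

	if (buffer) {
		size_t n = key->len < buflen ? key->len : buflen;

		memcpy(buffer, key->payload, n);
	}
	return (long)key->len;
}

/*
 * Size with a NULL buffer, allocate, then read. Retry if the payload
 * grew between the two calls. Returns a malloc'd buffer or NULL.
 */
static char *toy_read_all(const struct toy_readable *key, size_t *len_out)
{
	long need = toy_read(key, NULL, 0);

	for (;;) {
		char *buf;
		long got;

		if (need < 0)
			return NULL;
		buf = malloc(need ? (size_t)need : 1);
		if (!buf)
			return NULL;
		got = toy_read(key, buf, (size_t)need);
		if (got >= 0 && got <= need) {
			*len_out = (size_t)got;
			return buf;
		}
		free(buf);
		if (got < 0)
			return NULL;
		need = got;	/* payload grew: try again with more room */
	}
}
```

With a real keyring the same loop would wrap the keyctl(KEYCTL_READ, ...) call, and a negative return would carry an error such as the EOPNOTSUPP case mentioned above.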
609 | 609 | ||
610 | (*) Instantiate a partially constructed key. | 610 | (*) Instantiate a partially constructed key. |
611 | 611 | ||
612 | long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, | 612 | long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, |
613 | const void *payload, size_t plen, | 613 | const void *payload, size_t plen, |
614 | key_serial_t keyring); | 614 | key_serial_t keyring); |
615 | 615 | ||
616 | If the kernel calls back to userspace to complete the instantiation of a | 616 | If the kernel calls back to userspace to complete the instantiation of a |
617 | key, userspace should use this call to supply data for the key before the | 617 | key, userspace should use this call to supply data for the key before the |
618 | invoked process returns, or else the key will be marked negative | 618 | invoked process returns, or else the key will be marked negative |
619 | automatically. | 619 | automatically. |
620 | 620 | ||
621 | The process must have write access on the key to be able to instantiate | 621 | The process must have write access on the key to be able to instantiate |
622 | it, and the key must be uninstantiated. | 622 | it, and the key must be uninstantiated. |
623 | 623 | ||
624 | If a keyring is specified (non-zero), the key will also be linked into | 624 | If a keyring is specified (non-zero), the key will also be linked into |
625 | that keyring, however all the constraints applying in KEYCTL_LINK apply in | 625 | that keyring, however all the constraints applying in KEYCTL_LINK apply in |
626 | this case too. | 626 | this case too. |
627 | 627 | ||
628 | The payload and plen arguments describe the payload data as for add_key(). | 628 | The payload and plen arguments describe the payload data as for add_key(). |
629 | 629 | ||
630 | 630 | ||
631 | (*) Negatively instantiate a partially constructed key. | 631 | (*) Negatively instantiate a partially constructed key. |
632 | 632 | ||
633 | long keyctl(KEYCTL_NEGATE, key_serial_t key, | 633 | long keyctl(KEYCTL_NEGATE, key_serial_t key, |
634 | unsigned timeout, key_serial_t keyring); | 634 | unsigned timeout, key_serial_t keyring); |
635 | 635 | ||
636 | If the kernel calls back to userspace to complete the instantiation of a | 636 | If the kernel calls back to userspace to complete the instantiation of a |
637 | key, userspace should use this call to mark the key as negative before the | 637 | key, userspace should use this call to mark the key as negative before the |
638 | invoked process returns if it is unable to fulfil the request. | 638 | invoked process returns if it is unable to fulfil the request. |
639 | 639 | ||
640 | The process must have write access on the key to be able to instantiate | 640 | The process must have write access on the key to be able to instantiate |
641 | it, and the key must be uninstantiated. | 641 | it, and the key must be uninstantiated. |
642 | 642 | ||
643 | If a keyring is specified (non-zero), the key will also be linked into | 643 | If a keyring is specified (non-zero), the key will also be linked into |
644 | that keyring, however all the constraints applying in KEYCTL_LINK apply in | 644 | that keyring, however all the constraints applying in KEYCTL_LINK apply in |
645 | this case too. | 645 | this case too. |
646 | 646 | ||
647 | 647 | ||
648 | (*) Set the default request-key destination keyring. | 648 | (*) Set the default request-key destination keyring. |
649 | 649 | ||
650 | long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); | 650 | long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); |
651 | 651 | ||
652 | This sets the default keyring to which implicitly requested keys will be | 652 | This sets the default keyring to which implicitly requested keys will be |
653 | attached for this thread. reqkey_defl should be one of these constants: | 653 | attached for this thread. reqkey_defl should be one of these constants: |
654 | 654 | ||
655 | CONSTANT VALUE NEW DEFAULT KEYRING | 655 | CONSTANT VALUE NEW DEFAULT KEYRING |
656 | ====================================== ====== ======================= | 656 | ====================================== ====== ======================= |
657 | KEY_REQKEY_DEFL_NO_CHANGE -1 No change | 657 | KEY_REQKEY_DEFL_NO_CHANGE -1 No change |
658 | KEY_REQKEY_DEFL_DEFAULT 0 Default[1] | 658 | KEY_REQKEY_DEFL_DEFAULT 0 Default[1] |
659 | KEY_REQKEY_DEFL_THREAD_KEYRING 1 Thread keyring | 659 | KEY_REQKEY_DEFL_THREAD_KEYRING 1 Thread keyring |
660 | KEY_REQKEY_DEFL_PROCESS_KEYRING 2 Process keyring | 660 | KEY_REQKEY_DEFL_PROCESS_KEYRING 2 Process keyring |
661 | KEY_REQKEY_DEFL_SESSION_KEYRING 3 Session keyring | 661 | KEY_REQKEY_DEFL_SESSION_KEYRING 3 Session keyring |
662 | KEY_REQKEY_DEFL_USER_KEYRING 4 User keyring | 662 | KEY_REQKEY_DEFL_USER_KEYRING 4 User keyring |
663 | KEY_REQKEY_DEFL_USER_SESSION_KEYRING 5 User session keyring | 663 | KEY_REQKEY_DEFL_USER_SESSION_KEYRING 5 User session keyring |
664 | KEY_REQKEY_DEFL_GROUP_KEYRING 6 Group keyring | 664 | KEY_REQKEY_DEFL_GROUP_KEYRING 6 Group keyring |
665 | 665 | ||
666 | The old default will be returned if successful and error EINVAL will be | 666 | The old default will be returned if successful and error EINVAL will be |
667 | returned if reqkey_defl is not one of the above values. | 667 | returned if reqkey_defl is not one of the above values. |
668 | 668 | ||
669 | The default keyring can be overridden by the keyring indicated to the | 669 | The default keyring can be overridden by the keyring indicated to the |
670 | request_key() system call. | 670 | request_key() system call. |
671 | 671 | ||
672 | Note that this setting is inherited across fork/exec. | 672 | Note that this setting is inherited across fork/exec. |
673 | 673 | ||
674 | [1] The default default is: the thread keyring if there is one, otherwise | 674 | [1] The default is: the thread keyring if there is one, otherwise |
675 | the process keyring if there is one, otherwise the session keyring if | 675 | the process keyring if there is one, otherwise the session keyring if |
676 | there is one, otherwise the user default session keyring. | 676 | there is one, otherwise the user default session keyring. |
677 | 677 | ||
678 | 678 | ||
679 | (*) Set the timeout on a key. | 679 | (*) Set the timeout on a key. |
680 | 680 | ||
681 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); | 681 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); |
682 | 682 | ||
683 | This sets or clears the timeout on a key. The timeout can be 0 to clear | 683 | This sets or clears the timeout on a key. The timeout can be 0 to clear |
684 | the timeout or a number of seconds to set the expiry time that far into | 684 | the timeout or a number of seconds to set the expiry time that far into |
685 | the future. | 685 | the future. |
686 | 686 | ||
687 | The process must have attribute modification access on a key to set its | 687 | The process must have attribute modification access on a key to set its |
688 | timeout. Timeouts may not be set with this function on negative, revoked | 688 | timeout. Timeouts may not be set with this function on negative, revoked |
689 | or expired keys. | 689 | or expired keys. |
690 | 690 | ||
691 | 691 | ||
692 | (*) Assume the authority granted to instantiate a key | 692 | (*) Assume the authority granted to instantiate a key |
693 | 693 | ||
694 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); | 694 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); |
695 | 695 | ||
696 | This assumes or divests the authority required to instantiate the | 696 | This assumes or divests the authority required to instantiate the |
697 | specified key. Authority can only be assumed if the thread has the | 697 | specified key. Authority can only be assumed if the thread has the |
698 | authorisation key associated with the specified key in its keyrings | 698 | authorisation key associated with the specified key in its keyrings |
699 | somewhere. | 699 | somewhere. |
700 | 700 | ||
701 | Once authority is assumed, searches for keys will also search the | 701 | Once authority is assumed, searches for keys will also search the |
702 | requester's keyrings using the requester's security label, UID, GID and | 702 | requester's keyrings using the requester's security label, UID, GID and |
703 | groups. | 703 | groups. |
704 | 704 | ||
705 | If the requested authority is unavailable, error EPERM will be returned, | 705 | If the requested authority is unavailable, error EPERM will be returned, |
706 | likewise if the authority has been revoked because the target key is | 706 | likewise if the authority has been revoked because the target key is |
707 | already instantiated. | 707 | already instantiated. |
708 | 708 | ||
709 | If the specified key is 0, then any assumed authority will be divested. | 709 | If the specified key is 0, then any assumed authority will be divested. |
710 | 710 | ||
711 | The assumed authoritative key is inherited across fork and exec. | 711 | The assumed authoritative key is inherited across fork and exec. |
712 | 712 | ||
713 | 713 | ||
714 | =============== | 714 | =============== |
715 | KERNEL SERVICES | 715 | KERNEL SERVICES |
716 | =============== | 716 | =============== |
717 | 717 | ||
718 | The kernel services for key management are fairly simple to deal with. They can | 718 | The kernel services for key management are fairly simple to deal with. They can |
719 | be broken down into two areas: keys and key types. | 719 | be broken down into two areas: keys and key types. |
720 | 720 | ||
721 | Dealing with keys is fairly straightforward. Firstly, the kernel service | 721 | Dealing with keys is fairly straightforward. Firstly, the kernel service |
722 | registers its type, then it searches for a key of that type. It should retain | 722 | registers its type, then it searches for a key of that type. It should retain |
723 | the key as long as it has need of it, and then it should release it. For a | 723 | the key as long as it has need of it, and then it should release it. For a |
724 | filesystem or device file, a search would probably be performed during the open | 724 | filesystem or device file, a search would probably be performed during the open |
725 | call, and the key released upon close. How to deal with conflicting keys due to | 725 | call, and the key released upon close. How to deal with conflicting keys due to |
726 | two different users opening the same file is left to the filesystem author to | 726 | two different users opening the same file is left to the filesystem author to |
727 | solve. | 727 | solve. |
728 | 728 | ||
729 | Note that there are two different types of pointers to keys that may be | 729 | Note that there are two different types of pointers to keys that may be |
730 | encountered: | 730 | encountered: |
731 | 731 | ||
732 | (*) struct key * | 732 | (*) struct key * |
733 | 733 | ||
734 | This simply points to the key structure itself. Key structures will be at | 734 | This simply points to the key structure itself. Key structures will be at |
735 | least four-byte aligned. | 735 | least four-byte aligned. |
736 | 736 | ||
737 | (*) key_ref_t | 737 | (*) key_ref_t |
738 | 738 | ||
739 | This is equivalent to a struct key *, but the least significant bit is set | 739 | This is equivalent to a struct key *, but the least significant bit is set |
740 | if the caller "possesses" the key. By "possession" it is meant that the | 740 | if the caller "possesses" the key. By "possession" it is meant that the |
741 | calling process has a searchable link to the key from one of its | 741 | calling process has a searchable link to the key from one of its |
742 | keyrings. There are three functions for dealing with these: | 742 | keyrings. There are three functions for dealing with these: |
743 | 743 | ||
744 | key_ref_t make_key_ref(const struct key *key, | 744 | key_ref_t make_key_ref(const struct key *key, |
745 | unsigned long possession); | 745 | unsigned long possession); |
746 | 746 | ||
747 | struct key *key_ref_to_ptr(const key_ref_t key_ref); | 747 | struct key *key_ref_to_ptr(const key_ref_t key_ref); |
748 | 748 | ||
749 | unsigned long is_key_possessed(const key_ref_t key_ref); | 749 | unsigned long is_key_possessed(const key_ref_t key_ref); |
750 | 750 | ||
751 | The first function constructs a key reference from a key pointer and | 751 | The first function constructs a key reference from a key pointer and |
752 | possession information (which must be 0 or 1 and not any other value). | 752 | possession information (which must be 0 or 1 and not any other value). |
753 | 753 | ||
754 | The second function retrieves the key pointer from a reference and the | 754 | The second function retrieves the key pointer from a reference and the |
755 | third retrieves the possession flag. | 755 | third retrieves the possession flag. |
756 | 756 | ||
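The way a key_ref_t carries the possession flag in the least significant bit of a key pointer can be illustrated with a small userspace sketch. This is not the kernel's implementation: the toy_ prefixed names are hypothetical, and struct toy_key merely stands in for struct key. It relies on the alignment guarantee stated above, which leaves bit 0 of a valid key pointer clear.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's struct key. The alignment keeps bit 0 of any
 * pointer to it clear, so that bit is free to carry the possession flag.
 */
struct toy_key {
	_Alignas(4) int serial;
};

/* Opaque reference: a key pointer with the flag folded into bit 0. */
typedef uintptr_t toy_key_ref_t;

static toy_key_ref_t toy_make_key_ref(const struct toy_key *key,
				      unsigned long possession)
{
	assert(possession <= 1);		/* must be 0 or 1 */
	assert(((uintptr_t)key & 1) == 0);	/* alignment leaves bit 0 clear */
	return (uintptr_t)key | possession;
}

static struct toy_key *toy_key_ref_to_ptr(toy_key_ref_t ref)
{
	/* Mask the flag off to recover the original pointer. */
	return (struct toy_key *)(ref & ~(uintptr_t)1);
}

static unsigned long toy_is_key_possessed(toy_key_ref_t ref)
{
	return ref & 1;
}
```

Passing any possession value other than 0 or 1 would corrupt the pointer bits, which is why the sketch asserts on it.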
757 | When accessing a key's payload contents, certain precautions must be taken to | 757 | When accessing a key's payload contents, certain precautions must be taken to |
758 | prevent access vs modification races. See the section "Notes on accessing | 758 | prevent access vs modification races. See the section "Notes on accessing |
759 | payload contents" for more information. | 759 | payload contents" for more information. |
760 | 760 | ||
761 | (*) To search for a key, call: | 761 | (*) To search for a key, call: |
762 | 762 | ||
763 | struct key *request_key(const struct key_type *type, | 763 | struct key *request_key(const struct key_type *type, |
764 | const char *description, | 764 | const char *description, |
765 | const char *callout_string); | 765 | const char *callout_string); |
766 | 766 | ||
767 | This is used to request a key or keyring with a description that matches | 767 | This is used to request a key or keyring with a description that matches |
768 | the description specified according to the key type's match function. This | 768 | the description specified according to the key type's match function. This |
769 | permits approximate matching to occur. If callout_string is not NULL, then | 769 | permits approximate matching to occur. If callout_string is not NULL, then |
770 | /sbin/request-key will be invoked in an attempt to obtain the key from | 770 | /sbin/request-key will be invoked in an attempt to obtain the key from |
771 | userspace. In that case, callout_string will be passed as an argument to | 771 | userspace. In that case, callout_string will be passed as an argument to |
772 | the program. | 772 | the program. |
773 | 773 | ||
774 | Should the function fail, error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be | 774 | Should the function fail, error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be |
775 | returned. | 775 | returned. |
776 | 776 | ||
777 | If successful, the key will have been attached to the default keyring for | 777 | If successful, the key will have been attached to the default keyring for |
778 | implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. | 778 | implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. |
779 | 779 | ||
780 | See also Documentation/keys-request-key.txt. | 780 | See also Documentation/keys-request-key.txt. |
781 | 781 | ||
782 | 782 | ||
783 | (*) To search for a key, passing auxiliary data to the upcaller, call: | 783 | (*) To search for a key, passing auxiliary data to the upcaller, call: |
784 | 784 | ||
785 | struct key *request_key_with_auxdata(const struct key_type *type, | 785 | struct key *request_key_with_auxdata(const struct key_type *type, |
786 | const char *description, | 786 | const char *description, |
787 | const char *callout_string, | 787 | const char *callout_string, |
788 | void *aux); | 788 | void *aux); |
789 | 789 | ||
790 | This is identical to request_key(), except that the auxiliary data is | 790 | This is identical to request_key(), except that the auxiliary data is |
791 | passed to the key_type->request_key() op if it exists. | 791 | passed to the key_type->request_key() op if it exists. |
792 | 792 | ||
793 | 793 | ||
794 | (*) When it is no longer required, the key should be released using: | 794 | (*) When it is no longer required, the key should be released using: |
795 | 795 | ||
796 | void key_put(struct key *key); | 796 | void key_put(struct key *key); |
797 | 797 | ||
798 | Or: | 798 | Or: |
799 | 799 | ||
800 | void key_ref_put(key_ref_t key_ref); | 800 | void key_ref_put(key_ref_t key_ref); |
801 | 801 | ||
802 | These can be called from interrupt context. If CONFIG_KEYS is not set then | 802 | These can be called from interrupt context. If CONFIG_KEYS is not set then |
803 | the argument will not be parsed. | 803 | the argument will not be parsed. |
804 | 804 | ||
805 | 805 | ||
806 | (*) Extra references can be made to a key by calling the following function: | 806 | (*) Extra references can be made to a key by calling the following function: |
807 | 807 | ||
808 | struct key *key_get(struct key *key); | 808 | struct key *key_get(struct key *key); |
809 | 809 | ||
810 | These need to be disposed of by calling key_put() when they've been | 810 | These need to be disposed of by calling key_put() when they've been |
811 | finished with. The key pointer passed in will be returned. If the pointer | 811 | finished with. The key pointer passed in will be returned. If the pointer |
812 | is NULL or CONFIG_KEYS is not set then the key will not be dereferenced and | 812 | is NULL or CONFIG_KEYS is not set then the key will not be dereferenced and |
813 | no increment will take place. | 813 | no increment will take place. |
814 | 814 | ||
815 | 815 | ||
816 | (*) A key's serial number can be obtained by calling: | 816 | (*) A key's serial number can be obtained by calling: |
817 | 817 | ||
818 | key_serial_t key_serial(struct key *key); | 818 | key_serial_t key_serial(struct key *key); |
819 | 819 | ||
820 | If key is NULL or if CONFIG_KEYS is not set then 0 will be returned (in the | 820 | If key is NULL or if CONFIG_KEYS is not set then 0 will be returned (in the |
821 | latter case without parsing the argument). | 821 | latter case without parsing the argument). |
822 | 822 | ||
823 | 823 | ||
824 | (*) If a keyring was found in the search, this can be further searched by: | 824 | (*) If a keyring was found in the search, this can be further searched by: |
825 | 825 | ||
826 | key_ref_t keyring_search(key_ref_t keyring_ref, | 826 | key_ref_t keyring_search(key_ref_t keyring_ref, |
827 | const struct key_type *type, | 827 | const struct key_type *type, |
828 | const char *description) | 828 | const char *description) |
829 | 829 | ||
830 | This searches the keyring tree specified for a matching key. Error ENOKEY | 830 | This searches the keyring tree specified for a matching key. Error ENOKEY |
831 | is returned upon failure (use IS_ERR/PTR_ERR to determine). If successful, | 831 | is returned upon failure (use IS_ERR/PTR_ERR to determine). If successful, |
832 | the returned key will need to be released. | 832 | the returned key will need to be released. |
833 | 833 | ||
834 | The possession attribute from the keyring reference is used to control | 834 | The possession attribute from the keyring reference is used to control |
835 | access through the permissions mask and is propagated to the returned key | 835 | access through the permissions mask and is propagated to the returned key |
836 | reference pointer if successful. | 836 | reference pointer if successful. |
837 | 837 | ||
838 | 838 | ||
839 | (*) To check the validity of a key, this function can be called: | 839 | (*) To check the validity of a key, this function can be called: |
840 | 840 | ||
841 | int validate_key(struct key *key); | 841 | int validate_key(struct key *key); |
842 | 842 | ||
843 | This checks that the key in question hasn't expired and hasn't been | 843 | This checks that the key in question hasn't expired and hasn't been |
844 | revoked. Should the key be invalid, error EKEYEXPIRED or EKEYREVOKED will | 844 | revoked. Should the key be invalid, error EKEYEXPIRED or EKEYREVOKED will |
845 | be returned. If the key is NULL or if CONFIG_KEYS is not set then 0 will be | 845 | be returned. If the key is NULL or if CONFIG_KEYS is not set then 0 will be |
846 | returned (in the latter case without parsing the argument). | 846 | returned (in the latter case without parsing the argument). |
847 | 847 | ||
848 | 848 | ||
849 | (*) To register a key type, the following function should be called: | 849 | (*) To register a key type, the following function should be called: |
850 | 850 | ||
851 | int register_key_type(struct key_type *type); | 851 | int register_key_type(struct key_type *type); |
852 | 852 | ||
853 | This will return error EEXIST if a type of the same name is already | 853 | This will return error EEXIST if a type of the same name is already |
854 | present. | 854 | present. |
855 | 855 | ||
856 | 856 | ||
857 | (*) To unregister a key type, call: | 857 | (*) To unregister a key type, call: |
858 | 858 | ||
859 | void unregister_key_type(struct key_type *type); | 859 | void unregister_key_type(struct key_type *type); |
860 | 860 | ||
861 | 861 | ||
862 | =================================== | 862 | =================================== |
863 | NOTES ON ACCESSING PAYLOAD CONTENTS | 863 | NOTES ON ACCESSING PAYLOAD CONTENTS |
864 | =================================== | 864 | =================================== |
865 | 865 | ||
866 | The simplest payload is just a number in key->payload.value. In this case, | 866 | The simplest payload is just a number in key->payload.value. In this case, |
867 | there's no need to indulge in RCU or locking when accessing the payload. | 867 | there's no need to indulge in RCU or locking when accessing the payload. |
868 | 868 | ||
869 | More complex payload contents must be allocated and a pointer to them set in | 869 | More complex payload contents must be allocated and a pointer to them set in |
870 | key->payload.data. One of the following ways must be selected to access the | 870 | key->payload.data. One of the following ways must be selected to access the |
871 | data: | 871 | data: |
872 | 872 | ||
873 | (1) Unmodifiable key type. | 873 | (1) Unmodifiable key type. |
874 | 874 | ||
875 | If the key type does not have a modify method, then the key's payload can | 875 | If the key type does not have a modify method, then the key's payload can |
876 | be accessed without any form of locking, provided that it's known to be | 876 | be accessed without any form of locking, provided that it's known to be |
877 | instantiated (uninstantiated keys cannot be "found"). | 877 | instantiated (uninstantiated keys cannot be "found"). |
878 | 878 | ||
879 | (2) The key's semaphore. | 879 | (2) The key's semaphore. |
880 | 880 | ||
881 | The semaphore could be used to govern access to the payload and to control | 881 | The semaphore could be used to govern access to the payload and to control |
882 | the payload pointer. It must be write-locked for modifications and would | 882 | the payload pointer. It must be write-locked for modifications and would |
883 | have to be read-locked for general access. The disadvantage of doing this | 883 | have to be read-locked for general access. The disadvantage of doing this |
884 | is that the accessor may be required to sleep. | 884 | is that the accessor may be required to sleep. |
885 | 885 | ||
886 | (3) RCU. | 886 | (3) RCU. |
887 | 887 | ||
888 | RCU must be used when the semaphore isn't already held; if the semaphore | 888 | RCU must be used when the semaphore isn't already held; if the semaphore |
889 | is held then the contents can't change under you unexpectedly as the | 889 | is held then the contents can't change under you unexpectedly as the |
890 | semaphore must still be used to serialise modifications to the key. The | 890 | semaphore must still be used to serialise modifications to the key. The |
891 | key management code takes care of this for the key type. | 891 | key management code takes care of this for the key type. |
892 | 892 | ||
893 | However, this means using: | 893 | However, this means using: |
894 | 894 | ||
895 | rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() | 895 | rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() |
896 | 896 | ||
897 | to read the pointer, and: | 897 | to read the pointer, and: |
898 | 898 | ||
899 | rcu_dereference() ... rcu_assign_pointer() ... call_rcu() | 899 | rcu_dereference() ... rcu_assign_pointer() ... call_rcu() |
900 | 900 | ||
901 | to set the pointer and dispose of the old contents after a grace period. | 901 | to set the pointer and dispose of the old contents after a grace period. |
902 | Note that only the key type should ever modify a key's payload. | 902 | Note that only the key type should ever modify a key's payload. |
903 | 903 | ||
904 | Furthermore, an RCU controlled payload must hold a struct rcu_head for the | 904 | Furthermore, an RCU controlled payload must hold a struct rcu_head for the |
905 | use of call_rcu() and, if the payload is of variable size, the length of | 905 | use of call_rcu() and, if the payload is of variable size, the length of |
906 | the payload. key->datalen cannot be relied upon to be consistent with the | 906 | the payload. key->datalen cannot be relied upon to be consistent with the |
907 | payload just dereferenced if the key's semaphore is not held. | 907 | payload just dereferenced if the key's semaphore is not held. |
908 | 908 | ||
909 | 909 | ||
910 | =================== | 910 | =================== |
911 | DEFINING A KEY TYPE | 911 | DEFINING A KEY TYPE |
912 | =================== | 912 | =================== |
913 | 913 | ||
914 | A kernel service may want to define its own key type. For instance, an AFS | 914 | A kernel service may want to define its own key type. For instance, an AFS |
915 | filesystem might want to define a Kerberos 5 ticket key type. To do this, its | 915 | filesystem might want to define a Kerberos 5 ticket key type. To do this, its |
916 | author fills in a struct key_type and registers it with the system. | 916 | author fills in a struct key_type and registers it with the system. |
917 | 917 | ||
918 | The structure has a number of fields, some of which are mandatory: | 918 | The structure has a number of fields, some of which are mandatory: |
919 | 919 | ||
920 | (*) const char *name | 920 | (*) const char *name |
921 | 921 | ||
922 | The name of the key type. This is used to translate a key type name | 922 | The name of the key type. This is used to translate a key type name |
923 | supplied by userspace into a pointer to the structure. | 923 | supplied by userspace into a pointer to the structure. |
924 | 924 | ||
925 | 925 | ||
926 | (*) size_t def_datalen | 926 | (*) size_t def_datalen |
927 | 927 | ||
928 | This is optional - it supplies the default payload data length as | 928 | This is optional - it supplies the default payload data length as |
929 | contributed to the quota. If the key type's payload is always or almost | 929 | contributed to the quota. If the key type's payload is always or almost |
930 | always the same size, then this is a more efficient way to do things. | 930 | always the same size, then this is a more efficient way to do things. |
931 | 931 | ||
932 | The data length (and quota) on a particular key can always be changed | 932 | The data length (and quota) on a particular key can always be changed |
933 | during instantiation or update by calling: | 933 | during instantiation or update by calling: |
934 | 934 | ||
935 | int key_payload_reserve(struct key *key, size_t datalen); | 935 | int key_payload_reserve(struct key *key, size_t datalen); |
936 | 936 | ||
937 | With the revised data length. Error EDQUOT will be returned if this is not | 937 | With the revised data length. Error EDQUOT will be returned if this is not |
938 | viable. | 938 | viable. |
939 | 939 | ||
940 | 940 | ||
941 | (*) int (*instantiate)(struct key *key, const void *data, size_t datalen); | 941 | (*) int (*instantiate)(struct key *key, const void *data, size_t datalen); |
942 | 942 | ||
943 | This method is called to attach a payload to a key during construction. | 943 | This method is called to attach a payload to a key during construction. |
944 | The payload attached need not bear any relation to the data passed to this | 944 | The payload attached need not bear any relation to the data passed to this |
945 | function. | 945 | function. |
946 | 946 | ||
947 | If the amount of data attached to the key differs from the size in | 947 | If the amount of data attached to the key differs from the size in |
948 | keytype->def_datalen, then key_payload_reserve() should be called. | 948 | keytype->def_datalen, then key_payload_reserve() should be called. |
949 | 949 | ||
950 | This method does not have to lock the key in order to attach a payload. | 950 | This method does not have to lock the key in order to attach a payload. |
951 | The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents | 951 | The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents |
952 | anything else from gaining access to the key. | 952 | anything else from gaining access to the key. |
953 | 953 | ||
954 | It is safe to sleep in this method. | 954 | It is safe to sleep in this method. |
955 | 955 | ||
956 | 956 | ||
957 | (*) int (*update)(struct key *key, const void *data, size_t datalen); | 957 | (*) int (*update)(struct key *key, const void *data, size_t datalen); |
958 | 958 | ||
959 | If this type of key can be updated, then this method should be provided. | 959 | If this type of key can be updated, then this method should be provided. |
960 | It is called to update a key's payload from the blob of data provided. | 960 | It is called to update a key's payload from the blob of data provided. |
961 | 961 | ||
962 | key_payload_reserve() should be called if the data length might change | 962 | key_payload_reserve() should be called if the data length might change |
963 | before any changes are actually made. Note that if this succeeds, the type | 963 | before any changes are actually made. Note that if this succeeds, the type |
964 | is committed to changing the key because it's already been altered, so all | 964 | is committed to changing the key because it's already been altered, so all |
965 | memory allocation must be done first. | 965 | memory allocation must be done first. |
966 | 966 | ||
967 | The key will have its semaphore write-locked before this method is called, | 967 | The key will have its semaphore write-locked before this method is called, |
968 | but this only deters other writers; any changes to the key's payload must | 968 | but this only deters other writers; any changes to the key's payload must |
969 | be made under RCU conditions, and call_rcu() must be used to dispose of | 969 | be made under RCU conditions, and call_rcu() must be used to dispose of |
970 | the old payload. | 970 | the old payload. |
971 | 971 | ||
972 | key_payload_reserve() should be called before the changes are made, but | 972 | key_payload_reserve() should be called before the changes are made, but |
973 | after all allocations and other potentially failing function calls are | 973 | after all allocations and other potentially failing function calls are |
974 | made. | 974 | made. |
975 | 975 | ||
976 | It is safe to sleep in this method. | 976 | It is safe to sleep in this method. |
977 | 977 | ||
978 | 978 | ||
979 | (*) int (*match)(const struct key *key, const void *desc); | 979 | (*) int (*match)(const struct key *key, const void *desc); |
980 | 980 | ||
981 | This method is called to match a key against a description. It should | 981 | This method is called to match a key against a description. It should |
982 | return non-zero if the two match, zero if they don't. | 982 | return non-zero if the two match, zero if they don't. |
983 | 983 | ||
984 | This method should not need to lock the key in any way. The type and | 984 | This method should not need to lock the key in any way. The type and |
985 | description can be considered invariant, and the payload should not be | 985 | description can be considered invariant, and the payload should not be |
986 | accessed (the key may not yet be instantiated). | 986 | accessed (the key may not yet be instantiated). |
987 | 987 | ||
988 | It is not safe to sleep in this method; the caller may hold spinlocks. | 988 | It is not safe to sleep in this method; the caller may hold spinlocks. |
989 | 989 | ||
990 | 990 | ||
991 | (*) void (*revoke)(struct key *key); | 991 | (*) void (*revoke)(struct key *key); |
992 | 992 | ||
993 | This method is optional. It is called to discard part of the payload | 993 | This method is optional. It is called to discard part of the payload |
994 | data upon a key being revoked. The caller will have the key semaphore | 994 | data upon a key being revoked. The caller will have the key semaphore |
995 | write-locked. | 995 | write-locked. |
996 | 996 | ||
997 | It is safe to sleep in this method, though care should be taken to avoid | 997 | It is safe to sleep in this method, though care should be taken to avoid |
998 | a deadlock against the key semaphore. | 998 | a deadlock against the key semaphore. |
999 | 999 | ||
1000 | 1000 | ||
1001 | (*) void (*destroy)(struct key *key); | 1001 | (*) void (*destroy)(struct key *key); |
1002 | 1002 | ||
1003 | This method is optional. It is called to discard the payload data on a key | 1003 | This method is optional. It is called to discard the payload data on a key |
1004 | when it is being destroyed. | 1004 | when it is being destroyed. |
1005 | 1005 | ||
1006 | This method does not need to lock the key to access the payload; it can | 1006 | This method does not need to lock the key to access the payload; it can |
1007 | consider the key as being inaccessible at this time. Note that the key's | 1007 | consider the key as being inaccessible at this time. Note that the key's |
1008 | type may have been changed before this function is called. | 1008 | type may have been changed before this function is called. |
1009 | 1009 | ||
1010 | It is not safe to sleep in this method; the caller may hold spinlocks. | 1010 | It is not safe to sleep in this method; the caller may hold spinlocks. |
1011 | 1011 | ||
1012 | 1012 | ||
1013 | (*) void (*describe)(const struct key *key, struct seq_file *p); | 1013 | (*) void (*describe)(const struct key *key, struct seq_file *p); |
1014 | 1014 | ||
1015 | This method is optional. It is called during /proc/keys reading to | 1015 | This method is optional. It is called during /proc/keys reading to |
1016 | summarise a key's description and payload in text form. | 1016 | summarise a key's description and payload in text form. |
1017 | 1017 | ||
1018 | This method will be called with the RCU read lock held. rcu_dereference() | 1018 | This method will be called with the RCU read lock held. rcu_dereference() |
1019 | should be used to read the payload pointer if the payload is to be | 1019 | should be used to read the payload pointer if the payload is to be |
1020 | accessed. key->datalen cannot be trusted to stay consistent with the | 1020 | accessed. key->datalen cannot be trusted to stay consistent with the |
1021 | contents of the payload. | 1021 | contents of the payload. |
1022 | 1022 | ||
1023 | The description will not change, though the key's state may. | 1023 | The description will not change, though the key's state may. |
1024 | 1024 | ||
1025 | It is not safe to sleep in this method; the RCU read lock is held by the | 1025 | It is not safe to sleep in this method; the RCU read lock is held by the |
1026 | caller. | 1026 | caller. |
1027 | 1027 | ||
1028 | 1028 | ||
1029 | (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); | 1029 | (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); |
1030 | 1030 | ||
1031 | This method is optional. It is called by KEYCTL_READ to translate the | 1031 | This method is optional. It is called by KEYCTL_READ to translate the |
1032 | key's payload into a blob of data for userspace to deal with. | 1032 | key's payload into a blob of data for userspace to deal with. |
1033 | Ideally, the blob should be in the same format as that passed in to the | 1033 | Ideally, the blob should be in the same format as that passed in to the |
1034 | instantiate and update methods. | 1034 | instantiate and update methods. |
1035 | 1035 | ||
1036 | If successful, the blob size that could be produced should be returned | 1036 | If successful, the blob size that could be produced should be returned |
1037 | rather than the size copied. | 1037 | rather than the size copied. |
1038 | 1038 | ||
1039 | This method will be called with the key's semaphore read-locked. This will | 1039 | This method will be called with the key's semaphore read-locked. This will |
1040 | prevent the key's payload changing. It is not necessary to use RCU locking | 1040 | prevent the key's payload changing. It is not necessary to use RCU locking |
1041 | when accessing the key's payload. It is safe to sleep in this method, such | 1041 | when accessing the key's payload. It is safe to sleep in this method, such |
1042 | as might happen when the userspace buffer is accessed. | 1042 | as might happen when the userspace buffer is accessed. |
1043 | 1043 | ||
1044 | 1044 | ||
1045 | (*) int (*request_key)(struct key *key, struct key *authkey, const char *op, | 1045 | (*) int (*request_key)(struct key *key, struct key *authkey, const char *op, |
1046 | void *aux); | 1046 | void *aux); |
1047 | 1047 | ||
1048 | This method is optional. If provided, request_key() and | 1048 | This method is optional. If provided, request_key() and |
1049 | request_key_with_auxdata() will invoke this function rather than | 1049 | request_key_with_auxdata() will invoke this function rather than |
1050 | upcalling to /sbin/request-key to operate upon a key of this type. | 1050 | upcalling to /sbin/request-key to operate upon a key of this type. |
1051 | 1051 | ||
1052 | The aux parameter is as passed to request_key_with_auxdata() or is NULL | 1052 | The aux parameter is as passed to request_key_with_auxdata() or is NULL |
1053 | otherwise. Also passed are the key to be operated upon, the | 1053 | otherwise. Also passed are the key to be operated upon, the |
1054 | authorisation key for this operation and the operation type (currently | 1054 | authorisation key for this operation and the operation type (currently |
1055 | only "create"). | 1055 | only "create"). |
1056 | 1056 | ||
1057 | This function should return only when the upcall is complete. Upon return | 1057 | This function should return only when the upcall is complete. Upon return |
1058 | the authorisation key will be revoked, and the target key will be | 1058 | the authorisation key will be revoked, and the target key will be |
1059 | negatively instantiated if it is still uninstantiated. The error will be | 1059 | negatively instantiated if it is still uninstantiated. The error will be |
1060 | returned to the caller of request_key*(). | 1060 | returned to the caller of request_key*(). |
1061 | 1061 | ||
1062 | 1062 | ||
1063 | ============================ | 1063 | ============================ |
1064 | REQUEST-KEY CALLBACK SERVICE | 1064 | REQUEST-KEY CALLBACK SERVICE |
1065 | ============================ | 1065 | ============================ |
1066 | 1066 | ||
1067 | To create a new key, the kernel will attempt to execute the following command | 1067 | To create a new key, the kernel will attempt to execute the following command |
1068 | line: | 1068 | line: |
1069 | 1069 | ||
1070 | /sbin/request-key create <key> <uid> <gid> \ | 1070 | /sbin/request-key create <key> <uid> <gid> \ |
1071 | <threadring> <processring> <sessionring> <callout_info> | 1071 | <threadring> <processring> <sessionring> <callout_info> |
1072 | 1072 | ||
1073 | <key> is the key being constructed, and the three keyrings are the process | 1073 | <key> is the key being constructed, and the three keyrings are the process |
1074 | keyrings from the process that caused the search to be issued. These are | 1074 | keyrings from the process that caused the search to be issued. These are |
1075 | included for two reasons: | 1075 | included for two reasons: |
1076 | 1076 | ||
1077 | (1) There may be an authentication token in one of the keyrings that is | 1077 | (1) There may be an authentication token in one of the keyrings that is |
1078 | required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. | 1078 | required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. |
1079 | 1079 | ||
1080 | (2) The new key should probably be cached in one of these rings. | 1080 | (2) The new key should probably be cached in one of these rings. |
1081 | 1081 | ||
1082 | This program should set its UID and GID to those specified before attempting to | 1082 | This program should set its UID and GID to those specified before attempting to |
1083 | access any more keys. It may then look around for a user specific process to | 1083 | access any more keys. It may then look around for a user specific process to |
1084 | hand the request off to (perhaps a path placed in another key by, for | 1084 | hand the request off to (perhaps a path placed in another key by, for |
1085 | example, the KDE desktop manager). | 1085 | example, the KDE desktop manager). |
1086 | 1086 | ||
1087 | The program (or whatever it calls) should finish construction of the key by | 1087 | The program (or whatever it calls) should finish construction of the key by |
1088 | calling KEYCTL_INSTANTIATE, which also permits it to cache the key in one of | 1088 | calling KEYCTL_INSTANTIATE, which also permits it to cache the key in one of |
1089 | the keyrings (probably the session ring) before returning. Alternatively, the | 1089 | the keyrings (probably the session ring) before returning. Alternatively, the |
1090 | key can be marked as negative with KEYCTL_NEGATE; this also permits the key to | 1090 | key can be marked as negative with KEYCTL_NEGATE; this also permits the key to |
1091 | be cached in one of the keyrings. | 1091 | be cached in one of the keyrings. |
1092 | 1092 | ||
1093 | If it returns with the key remaining in the unconstructed state, the key will | 1093 | If it returns with the key remaining in the unconstructed state, the key will |
1094 | be marked as being negative, it will be added to the session keyring, and an | 1094 | be marked as being negative, it will be added to the session keyring, and an |
1095 | error will be returned to the key requestor. | 1095 | error will be returned to the key requestor. |
1096 | 1096 | ||
1097 | Supplementary information may be provided from whoever or whatever invoked this | 1097 | Supplementary information may be provided from whoever or whatever invoked this |
1098 | service. This will be passed as the <callout_info> parameter. If no such | 1098 | service. This will be passed as the <callout_info> parameter. If no such |
1099 | information was made available, then "-" will be passed as this parameter | 1099 | information was made available, then "-" will be passed as this parameter |
1100 | instead. | 1100 | instead. |
1101 | 1101 | ||
1102 | 1102 | ||
1103 | Similarly, the kernel may attempt to update an expired or a soon to expire key | 1103 | Similarly, the kernel may attempt to update an expired or a soon to expire key |
1104 | by executing: | 1104 | by executing: |
1105 | 1105 | ||
1106 | /sbin/request-key update <key> <uid> <gid> \ | 1106 | /sbin/request-key update <key> <uid> <gid> \ |
1107 | <threadring> <processring> <sessionring> | 1107 | <threadring> <processring> <sessionring> |
1108 | 1108 | ||
1109 | In this case, the program isn't required to actually attach the key to a ring; | 1109 | In this case, the program isn't required to actually attach the key to a ring; |
1110 | the rings are provided for reference. | 1110 | the rings are provided for reference. |
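The two command lines above can be sketched as a small helper that assembles the argument vector. This is only an illustration of the layout described in the text: the function name and parameter names are hypothetical, and how the kernel formats the key and keyring IDs is not stated here, so they are simply converted with str().

```python
def request_key_argv(op, key, uid, gid, threadring, processring,
                     sessionring, callout_info=None):
    """Build the /sbin/request-key argument list (illustrative sketch only)."""
    argv = ["/sbin/request-key", op]
    # Key, credentials and the three process keyrings, in the documented order.
    argv += [str(v) for v in (key, uid, gid,
                              threadring, processring, sessionring)]
    if op == "create":
        # "-" stands in when no supplementary information was supplied.
        argv.append(callout_info if callout_info else "-")
    return argv
```

The "update" form carries no <callout_info> argument, so the sketch only appends it for "create".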
1111 | 1111 |
Documentation/m68k/kernel-options.txt
1 | 1 | ||
2 | 2 | ||
3 | Command Line Options for Linux/m68k | 3 | Command Line Options for Linux/m68k |
4 | =================================== | 4 | =================================== |
5 | 5 | ||
6 | Last Update: 2 May 1999 | 6 | Last Update: 2 May 1999 |
7 | Linux/m68k version: 2.2.6 | 7 | Linux/m68k version: 2.2.6 |
8 | Author: Roman.Hodek@informatik.uni-erlangen.de (Roman Hodek) | 8 | Author: Roman.Hodek@informatik.uni-erlangen.de (Roman Hodek) |
9 | Update: jds@kom.auc.dk (Jes Sorensen) and faq@linux-m68k.org (Chris Lawrence) | 9 | Update: jds@kom.auc.dk (Jes Sorensen) and faq@linux-m68k.org (Chris Lawrence) |
10 | 10 | ||
11 | 0) Introduction | 11 | 0) Introduction |
12 | =============== | 12 | =============== |
13 | 13 | ||
14 | Often I've been asked which command line options the Linux/m68k | 14 | Often I've been asked which command line options the Linux/m68k |
15 | kernel understands, or how the exact syntax for the ... option is, or | 15 | kernel understands, or how the exact syntax for the ... option is, or |
16 | ... about the option ... . I hope, this document supplies all the | 16 | ... about the option ... . I hope, this document supplies all the |
17 | answers... | 17 | answers... |
18 | 18 | ||
19 | Note that some options might be outdated, their descriptions being | 19 | Note that some options might be outdated, their descriptions being |
20 | incomplete or missing. Please update the information and send in the | 20 | incomplete or missing. Please update the information and send in the |
21 | patches. | 21 | patches. |
22 | 22 | ||
23 | 23 | ||
24 | 1) Overview of the Kernel's Option Processing | 24 | 1) Overview of the Kernel's Option Processing |
25 | ============================================= | 25 | ============================================= |
26 | 26 | ||
27 | The kernel knows three kinds of options on its command line: | 27 | The kernel knows three kinds of options on its command line: |
28 | 28 | ||
29 | 1) kernel options | 29 | 1) kernel options |
30 | 2) environment settings | 30 | 2) environment settings |
31 | 3) arguments for init | 31 | 3) arguments for init |
32 | 32 | ||
33 | To which of these classes an argument belongs is determined as | 33 | To which of these classes an argument belongs is determined as |
34 | follows: If the option is known to the kernel itself, i.e. if the name | 34 | follows: If the option is known to the kernel itself, i.e. if the name |
35 | (the part before the '=') or, in some cases, the whole argument string | 35 | (the part before the '=') or, in some cases, the whole argument string |
36 | is known to the kernel, it belongs to class 1. Otherwise, if the | 36 | is known to the kernel, it belongs to class 1. Otherwise, if the |
37 | argument contains an '=', it is of class 2, and the definition is put | 37 | argument contains an '=', it is of class 2, and the definition is put |
38 | into init's environment. All other arguments are passed to init as | 38 | into init's environment. All other arguments are passed to init as |
39 | command line options. | 39 | command line options. |
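The classification rule above can be sketched in a few lines of Python. This is not the kernel's code: the set of known option names is a hypothetical subset chosen for illustration, and the real set depends on the kernel version and configuration.

```python
# Hypothetical subset of option names the kernel recognises (assumption).
KNOWN_KERNEL_OPTIONS = {"root", "ro", "rw", "debug", "ramdisk"}


def classify(arg, known=KNOWN_KERNEL_OPTIONS):
    """Return 1 (kernel option), 2 (environment setting) or 3 (init argument)."""
    name = arg.split("=", 1)[0]
    # Class 1: the part before '=' (or the whole string) is known to the kernel.
    if name in known or arg in known:
        return 1
    # Class 2: any other NAME=VALUE pair is put into init's environment.
    if "=" in arg:
        return 2
    # Class 3: everything else is passed to init as a command line option.
    return 3
```

With this sketch, "root=/dev/sda1" falls into class 1, "TERM=vt100" into class 2, and "single" into class 3.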
40 | 40 | ||
41 | This document describes the valid kernel options for Linux/m68k in | 41 | This document describes the valid kernel options for Linux/m68k in |
42 | the version mentioned at the start of this file. Later revisions may | 42 | the version mentioned at the start of this file. Later revisions may |
43 | add new such options, and some may be missing in older versions. | 43 | add new such options, and some may be missing in older versions. |
44 | 44 | ||
45 | In general, the value (the part after the '=') of an option is a | 45 | In general, the value (the part after the '=') of an option is a |
46 | list of values separated by commas. The interpretation of these values | 46 | list of values separated by commas. The interpretation of these values |
47 | is up to the driver that "owns" the option. This association of | 47 | is up to the driver that "owns" the option. This association of |
48 | options with drivers is also the reason that some are further | 48 | options with drivers is also the reason that some are further |
49 | subdivided. | 49 | subdivided. |
50 | 50 | ||
51 | 51 | ||
52 | 2) General Kernel Options | 52 | 2) General Kernel Options |
53 | ========================= | 53 | ========================= |
54 | 54 | ||
55 | 2.1) root= | 55 | 2.1) root= |
56 | ---------- | 56 | ---------- |
57 | 57 | ||
58 | Syntax: root=/dev/<device> | 58 | Syntax: root=/dev/<device> |
59 | or: root=<hex_number> | 59 | or: root=<hex_number> |
60 | 60 | ||
61 | This tells the kernel which device it should mount as the root | 61 | This tells the kernel which device it should mount as the root |
62 | filesystem. The device must be a block device with a valid filesystem | 62 | filesystem. The device must be a block device with a valid filesystem |
63 | on it. | 63 | on it. |
64 | 64 | ||
65 | The first syntax gives the device by name. These names are converted | 65 | The first syntax gives the device by name. These names are converted |
66 | into a major/minor number internally in the kernel in an unusual way. | 66 | into a major/minor number internally in the kernel in an unusual way. |
67 | Normally, this "conversion" is done by the device files in /dev, but | 67 | Normally, this "conversion" is done by the device files in /dev, but |
68 | this isn't possible here, because the root filesystem (with /dev) | 68 | this isn't possible here, because the root filesystem (with /dev) |
69 | isn't mounted yet... So the kernel parses the name itself, with some | 69 | isn't mounted yet... So the kernel parses the name itself, with some |
70 | hardcoded name to number mappings. The name must always be a | 70 | hardcoded name to number mappings. The name must always be a |
71 | combination of two or three letters, followed by a decimal number. | 71 | combination of two or three letters, followed by a decimal number. |
72 | Valid names are: | 72 | Valid names are: |
73 | 73 | ||
74 | /dev/ram: -> 0x0100 (initial ramdisk) | 74 | /dev/ram: -> 0x0100 (initial ramdisk) |
75 | /dev/hda: -> 0x0300 (first IDE disk) | 75 | /dev/hda: -> 0x0300 (first IDE disk) |
76 | /dev/hdb: -> 0x0340 (second IDE disk) | 76 | /dev/hdb: -> 0x0340 (second IDE disk) |
77 | /dev/sda: -> 0x0800 (first SCSI disk) | 77 | /dev/sda: -> 0x0800 (first SCSI disk) |
78 | /dev/sdb: -> 0x0810 (second SCSI disk) | 78 | /dev/sdb: -> 0x0810 (second SCSI disk) |
79 | /dev/sdc: -> 0x0820 (third SCSI disk) | 79 | /dev/sdc: -> 0x0820 (third SCSI disk) |
80 | /dev/sdd: -> 0x0830 (fourth SCSI disk) | 80 | /dev/sdd: -> 0x0830 (fourth SCSI disk) |
81 | /dev/sde: -> 0x0840 (fifth SCSI disk) | 81 | /dev/sde: -> 0x0840 (fifth SCSI disk) |
82 | /dev/fd : -> 0x0200 (floppy disk) | 82 | /dev/fd : -> 0x0200 (floppy disk) |
83 | /dev/xda: -> 0x0c00 (first XT disk, unused in Linux/m68k) | 83 | /dev/xda: -> 0x0c00 (first XT disk, unused in Linux/m68k) |
84 | /dev/xdb: -> 0x0c40 (second XT disk, unused in Linux/m68k) | 84 | /dev/xdb: -> 0x0c40 (second XT disk, unused in Linux/m68k) |
85 | /dev/ada: -> 0x1c00 (first ACSI device) | 85 | /dev/ada: -> 0x1c00 (first ACSI device) |
86 | /dev/adb: -> 0x1c10 (second ACSI device) | 86 | /dev/adb: -> 0x1c10 (second ACSI device) |
87 | /dev/adc: -> 0x1c20 (third ACSI device) | 87 | /dev/adc: -> 0x1c20 (third ACSI device) |
88 | /dev/add: -> 0x1c30 (fourth ACSI device) | 88 | /dev/add: -> 0x1c30 (fourth ACSI device) |
89 | 89 | ||
90 | The last four names are available only if the kernel has been compiled | 90 | The last four names are available only if the kernel has been compiled |
91 | with Atari and ACSI support. | 91 | with Atari and ACSI support. |
92 | 92 | ||
93 | The name must be followed by a decimal number, that stands for the | 93 | The name must be followed by a decimal number, that stands for the |
94 | partition number. Internally, the value of the number is just | 94 | partition number. Internally, the value of the number is just |
95 | added to the device number mentioned in the table above. The | 95 | added to the device number mentioned in the table above. The |
96 | exceptions are /dev/ram and /dev/fd, where /dev/ram refers to an | 96 | exceptions are /dev/ram and /dev/fd, where /dev/ram refers to an |
97 | initial ramdisk loaded by your bootstrap program (please consult the | 97 | initial ramdisk loaded by your bootstrap program (please consult the |
98 | instructions for your bootstrap program to find out how to load an | 98 | instructions for your bootstrap program to find out how to load an |
99 | initial ramdisk). As of kernel version 2.0.18 you must specify | 99 | initial ramdisk). As of kernel version 2.0.18 you must specify |
100 | /dev/ram as the root device if you want to boot from an initial | 100 | /dev/ram as the root device if you want to boot from an initial |
101 | ramdisk. For the floppy devices, /dev/fd, the number stands for the | 101 | ramdisk. For the floppy devices, /dev/fd, the number stands for the |
102 | floppy drive number (there are no partitions on floppy disks). I.e., | 102 | floppy drive number (there are no partitions on floppy disks). I.e., |
103 | /dev/fd0 stands for the first drive, /dev/fd1 for the second, and so | 103 | /dev/fd0 stands for the first drive, /dev/fd1 for the second, and so |
104 | on. Since the number is just added, you can also force the disk format | 104 | on. Since the number is just added, you can also force the disk format |
105 | by adding a number greater than 3. If you look into your /dev | 105 | by adding a number greater than 3. If you look into your /dev |
106 | directory, you can see that /dev/fd0D720 has major 2 and minor 16. You | 106 | directory, you can see that /dev/fd0D720 has major 2 and minor 16. You |
107 | can specify this device for the root FS by writing "root=/dev/fd16" on | 107 | can specify this device for the root FS by writing "root=/dev/fd16" on |
108 | the kernel command line. | 108 | the kernel command line. |
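The name translation described above can be modelled as a table lookup plus an addition. The following Python sketch is an illustration, not the kernel's parser: the function name is hypothetical, the base numbers are copied from the table above, and treating a missing trailing number as 0 is an assumption.

```python
import re

# Base device numbers copied from the table above. The "ad*" entries only
# exist when the kernel is built with Atari and ACSI support.
ROOT_DEVICES = {
    "ram": 0x0100, "hda": 0x0300, "hdb": 0x0340,
    "sda": 0x0800, "sdb": 0x0810, "sdc": 0x0820, "sdd": 0x0830, "sde": 0x0840,
    "fd": 0x0200, "xda": 0x0C00, "xdb": 0x0C40,
    "ada": 0x1C00, "adb": 0x1C10, "adc": 0x1C20, "add": 0x1C30,
}


def root_name_to_number(name):
    """Translate a root= device name to a device number, or None if unknown."""
    match = re.fullmatch(r"/dev/([a-z]+)([0-9]*)", name)
    if match is None or match.group(1) not in ROOT_DEVICES:
        return None  # models "root device not set"
    # The trailing decimal number is added to the base with no range check.
    offset = int(match.group(2)) if match.group(2) else 0
    return ROOT_DEVICES[match.group(1)] + offset
```

For example, "/dev/sda1" maps to 0x0801, and "/dev/fd16" maps to 0x0210, which is major 2, minor 16 as in the /dev/fd0D720 case.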
109 | 109 | ||
110 | [Strange and maybe uninteresting stuff ON] | 110 | [Strange and maybe uninteresting stuff ON] |
111 | 111 | ||
112 | This unusual translation of device names has some strange | 112 | This unusual translation of device names has some strange |
113 | consequences: If, for example, you have a symbolic link from /dev/fd | 113 | consequences: If, for example, you have a symbolic link from /dev/fd |
114 | to /dev/fd0D720 as an abbreviation for floppy drive #0 in DD format, | 114 | to /dev/fd0D720 as an abbreviation for floppy drive #0 in DD format, |
115 | you cannot use this name for specifying the root device, because the | 115 | you cannot use this name for specifying the root device, because the |
116 | kernel cannot see this symlink before mounting the root FS and it | 116 | kernel cannot see this symlink before mounting the root FS and it |
117 | isn't in the table above. If you use it, the root device will not be | 117 | isn't in the table above. If you use it, the root device will not be |
118 | set at all, without an error message. Another example: You cannot use a | 118 | set at all, without an error message. Another example: You cannot use a |
119 | partition on e.g. the sixth SCSI disk as the root filesystem, if you | 119 | partition on e.g. the sixth SCSI disk as the root filesystem, if you |
120 | want to specify it by name. This is because only the devices up to | 120 | want to specify it by name. This is because only the devices up to |
121 | /dev/sde are in the table above, but not /dev/sdf. You can still | 121 | /dev/sde are in the table above, but not /dev/sdf. You can still |
122 | use the sixth SCSI disk for the root FS, but you have to specify the | 122 | use the sixth SCSI disk for the root FS, but you have to specify the |
123 | device by number... (see below). Or, even more strange, you can use the | 123 | device by number... (see below). Or, even more strange, you can use the |
124 | fact that there is no range checking of the partition number, and your | 124 | fact that there is no range checking of the partition number, and your |
125 | knowledge that each disk uses 16 minors, and write "root=/dev/sde17" | 125 | knowledge that each disk uses 16 minors, and write "root=/dev/sde17" |
126 | (for /dev/sdf1). | 126 | (for /dev/sdf1). |
127 | 127 | ||
128 | [Strange and maybe uninteresting stuff OFF] | 128 | [Strange and maybe uninteresting stuff OFF] |
129 | 129 | ||
130 | If the device containing your root partition isn't in the table | 130 | If the device containing your root partition isn't in the table |
131 | above, you can also specify it by major and minor numbers. These are | 131 | above, you can also specify it by major and minor numbers. These are |
132 | written in hex, with no prefix and no separator between. E.g., if you | 132 | written in hex, with no prefix and no separator between. E.g., if you |
133 | have a CD with contents appropriate as a root filesystem in the first | 133 | have a CD with contents appropriate as a root filesystem in the first |
134 | SCSI CD-ROM drive, you boot from it by "root=0b00". Here, hex "0b" = | 134 | SCSI CD-ROM drive, you boot from it by "root=0b00". Here, hex "0b" = |
135 | decimal 11 is the major of SCSI CD-ROMs, and the minor 0 stands for | 135 | decimal 11 is the major of SCSI CD-ROMs, and the minor 0 stands for |
136 | the first of these. You can find out all valid major numbers by | 136 | the first of these. You can find out all valid major numbers by |
137 | looking into include/linux/major.h. | 137 | looking into include/linux/major.h. |
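The numeric form can be decoded as follows. This is a minimal sketch that assumes the 8-bit major / 8-bit minor layout implied by the table and the "0b00" example above; the function name is hypothetical.

```python
def parse_hex_root(value):
    """Split a root=<hex_number> value into (major, minor)."""
    number = int(value, 16)  # hex digits, written without a "0x" prefix
    # Assumed layout: high byte is the major number, low byte the minor.
    return number >> 8, number & 0xFF
```

Here "0b00" yields major 11 and minor 0, matching the SCSI CD-ROM example.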
138 | 138 | ||
139 | 139 | ||
140 | 2.2) ro, rw | 140 | 2.2) ro, rw |
141 | ----------- | 141 | ----------- |
142 | 142 | ||
143 | Syntax: ro | 143 | Syntax: ro |
144 | or: rw | 144 | or: rw |
145 | 145 | ||
146 | These two options tell the kernel whether it should mount the root | 146 | These two options tell the kernel whether it should mount the root |
147 | filesystem read-only or read-write. The default is read-only, except | 147 | filesystem read-only or read-write. The default is read-only, except |
148 | for ramdisks, which default to read-write. | 148 | for ramdisks, which default to read-write. |
149 | 149 | ||
150 | 150 | ||
151 | 2.3) debug | 151 | 2.3) debug |
152 | ---------- | 152 | ---------- |
153 | 153 | ||
154 | Syntax: debug | 154 | Syntax: debug |
155 | 155 | ||
156 | This raises the kernel log level to 10 (the default is 7). This is the | 156 | This raises the kernel log level to 10 (the default is 7). This is the |
157 | same level as set by the "dmesg" command, just that the maximum level | 157 | same level as set by the "dmesg" command, just that the maximum level |
158 | selectable by dmesg is 8. | 158 | selectable by dmesg is 8. |
159 | 159 | ||
160 | 160 | ||
161 | 2.4) debug= | 161 | 2.4) debug= |
162 | ----------- | 162 | ----------- |
163 | 163 | ||
164 | Syntax: debug=<device> | 164 | Syntax: debug=<device> |
165 | 165 | ||
166 | This option causes certain kernel messages to be printed to the selected | 166 | This option causes certain kernel messages to be printed to the selected |
167 | debugging device. This can aid debugging the kernel, since the | 167 | debugging device. This can aid debugging the kernel, since the |
168 | messages can be captured and analyzed on some other machine. Which | 168 | messages can be captured and analyzed on some other machine. Which |
169 | devices are possible depends on the machine type. There are no checks | 169 | devices are possible depends on the machine type. There are no checks |
170 | for the validity of the device name. If the device isn't implemented, | 170 | for the validity of the device name. If the device isn't implemented, |
171 | nothing happens. | 171 | nothing happens. |
172 | 172 | ||
173 | Messages logged this way are in general stack dumps after kernel | 173 | Messages logged this way are in general stack dumps after kernel |
174 | memory faults or bad kernel traps, and kernel panics. To be exact: all | 174 | memory faults or bad kernel traps, and kernel panics. To be exact: all |
175 | messages of level 0 (panic messages) and all messages printed while | 175 | messages of level 0 (panic messages) and all messages printed while |
176 | the log level is 8 or more (their level doesn't matter). Before stack | 176 | the log level is 8 or more (their level doesn't matter). Before stack |
177 | dumps, the kernel sets the log level to 10 automatically. A level of | 177 | dumps, the kernel sets the log level to 10 automatically. A level of |
178 | at least 8 can also be set by the "debug" command line option (see | 178 | at least 8 can also be set by the "debug" command line option (see |
179 | 2.3) and at run time with "dmesg -n 8". | 179 | 2.3) and at run time with "dmesg -n 8". |
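The selection rule stated above ("To be exact: ...") reduces to a single condition. The sketch below only restates that rule in Python; the function and parameter names are hypothetical and do not correspond to kernel symbols.

```python
def goes_to_debug_device(message_level, console_loglevel):
    """True if a message would be copied to the debug= device."""
    # Panic messages (level 0) are always forwarded; otherwise everything is
    # forwarded while the log level is 8 or more, whatever the message level.
    return message_level == 0 or console_loglevel >= 8
```

With the default log level of 7, only level 0 messages qualify; after "debug" or "dmesg -n 8", every message does.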
180 | 180 | ||
181 | Devices possible for Amiga: | 181 | Devices possible for Amiga: |
182 | 182 | ||
183 | - "ser": built-in serial port; parameters: 9600bps, 8N1 | 183 | - "ser": built-in serial port; parameters: 9600bps, 8N1 |
184 | - "mem": Save the messages to a reserved area in chip mem. After | 184 | - "mem": Save the messages to a reserved area in chip mem. After |
185 | rebooting, they can be read under AmigaOS with the tool | 185 | rebooting, they can be read under AmigaOS with the tool |
186 | 'dmesg'. | 186 | 'dmesg'. |
187 | 187 | ||
188 | Devices possible for Atari: | 188 | Devices possible for Atari: |
189 | 189 | ||
190 | - "ser1": ST-MFP serial port ("Modem1"); parameters: 9600bps, 8N1 | 190 | - "ser1": ST-MFP serial port ("Modem1"); parameters: 9600bps, 8N1 |
191 | - "ser2": SCC channel B serial port ("Modem2"); parameters: 9600bps, 8N1 | 191 | - "ser2": SCC channel B serial port ("Modem2"); parameters: 9600bps, 8N1 |
192 | - "ser" : default serial port | 192 | - "ser" : default serial port |
193 | This is "ser2" for a Falcon, and "ser1" for any other machine | 193 | This is "ser2" for a Falcon, and "ser1" for any other machine |
194 | - "midi": The MIDI port; parameters: 31250bps, 8N1 | 194 | - "midi": The MIDI port; parameters: 31250bps, 8N1 |
195 | - "par" : parallel port | 195 | - "par" : parallel port |
196 | The printing routine for this implements a timeout for the | 196 | The printing routine for this implements a timeout for the |
197 | case there's no printer connected (else the kernel would | 197 | case there's no printer connected (else the kernel would |
198 | lock up). The timeout is not exact, but usually a few | 198 | lock up). The timeout is not exact, but usually a few |
199 | seconds. | 199 | seconds. |
200 | 200 | ||
201 | 201 | ||
202 | 2.6) ramdisk= | 202 | 2.6) ramdisk= |
203 | ------------- | 203 | ------------- |
204 | 204 | ||
205 | Syntax: ramdisk=<size> | 205 | Syntax: ramdisk=<size> |
206 | 206 | ||
207 | This option instructs the kernel to set up a ramdisk of the given | 207 | This option instructs the kernel to set up a ramdisk of the given |
208 | size in KBytes. Do not use this option if the ramdisk contents are | 208 | size in KBytes. Do not use this option if the ramdisk contents are |
209 | passed by bootstrap! In this case, the size is selected automatically | 209 | passed by bootstrap! In this case, the size is selected automatically |
210 | and should not be overwritten. | 210 | and should not be overwritten. |
211 | 211 | ||
212 | The only application is for root filesystems on floppy disks, that | 212 | The only application is for root filesystems on floppy disks, that |
213 | should be loaded into memory. To do that, select the corresponding | 213 | should be loaded into memory. To do that, select the corresponding |
214 | size of the disk as ramdisk size, and set the root device to the disk | 214 | size of the disk as ramdisk size, and set the root device to the disk |
215 | drive (with "root="). | 215 | drive (with "root="). |
216 | 216 | ||
217 | 217 | ||
218 | 2.7) swap= | 218 | 2.7) swap= |
219 | 2.8) buff= | 219 | 2.8) buff= |
220 | ----------- | 220 | ----------- |
221 | 221 | ||
222 | I can't find any sign of these options in 2.2.6. | 222 | I can't find any sign of these options in 2.2.6. |
223 | 223 | ||
224 | 224 | ||
225 | 3) General Device Options (Amiga and Atari) | 225 | 3) General Device Options (Amiga and Atari) |
226 | =========================================== | 226 | =========================================== |
227 | 227 | ||
228 | 3.1) ether= | 228 | 3.1) ether= |
229 | ----------- | 229 | ----------- |
230 | 230 | ||
231 | Syntax: ether=[<irq>[,<base_addr>[,<mem_start>[,<mem_end>]]]],<dev-name> | 231 | Syntax: ether=[<irq>[,<base_addr>[,<mem_start>[,<mem_end>]]]],<dev-name> |
232 | 232 | ||
233 | <dev-name> is the name of a net driver, as specified in | 233 | <dev-name> is the name of a net driver, as specified in |
234 | drivers/net/Space.c in the Linux source. Most prominent are eth0, ... | 234 | drivers/net/Space.c in the Linux source. Most prominent are eth0, ... |
235 | eth3, sl0, ... sl3, ppp0, ..., ppp3, dummy, and lo. | 235 | eth3, sl0, ... sl3, ppp0, ..., ppp3, dummy, and lo. |
236 | 236 | ||
237 | The non-ethernet drivers (sl, ppp, dummy, lo) obviously ignore the | 237 | The non-ethernet drivers (sl, ppp, dummy, lo) obviously ignore the |
238 | settings by this option. Also, the existing ethernet drivers for | 238 | settings by this option. Also, the existing ethernet drivers for |
239 | Linux/m68k (ariadne, a2065, hydra) don't use them because Zorro boards | 239 | Linux/m68k (ariadne, a2065, hydra) don't use them because Zorro boards |
240 | are really Plug-'n-Play, so the "ether=" option is useless altogether | 240 | are really Plug-'n-Play, so the "ether=" option is useless altogether |
241 | for Linux/m68k. | 241 | for Linux/m68k. |
242 | 242 | ||
243 | 243 | ||
244 | 3.2) hd= | 244 | 3.2) hd= |
245 | -------- | 245 | -------- |
246 | 246 | ||
247 | Syntax: hd=<cylinders>,<heads>,<sectors> | 247 | Syntax: hd=<cylinders>,<heads>,<sectors> |
248 | 248 | ||
249 | This option sets the disk geometry of an IDE disk. The first hd= | 249 | This option sets the disk geometry of an IDE disk. The first hd= |
250 | option is for the first IDE disk, the second for the second one. | 250 | option is for the first IDE disk, the second for the second one. |
251 | (I.e., you can give this option twice.) In most cases, you won't have | 251 | (I.e., you can give this option twice.) In most cases, you won't have |
252 | to use this option, since the kernel can obtain the geometry data | 252 | to use this option, since the kernel can obtain the geometry data |
253 | itself. It exists just for the case that this fails for one of your | 253 | itself. It exists just for the case that this fails for one of your |
254 | disks. | 254 | disks. |
255 | 255 | ||
256 | 256 | ||
257 | 3.3) max_scsi_luns= | 257 | 3.3) max_scsi_luns= |
258 | ------------------- | 258 | ------------------- |
259 | 259 | ||
260 | Syntax: max_scsi_luns=<n> | 260 | Syntax: max_scsi_luns=<n> |
261 | 261 | ||
262 | Sets the maximum number of LUNs (logical units) of SCSI devices to | 262 | Sets the maximum number of LUNs (logical units) of SCSI devices to |
263 | be scanned. Valid values for <n> are between 1 and 8. Default is 8 if | 263 | be scanned. Valid values for <n> are between 1 and 8. Default is 8 if |
264 | "Probe all LUNs on each SCSI device" was selected during the kernel | 264 | "Probe all LUNs on each SCSI device" was selected during the kernel |
265 | configuration, else 1. | 265 | configuration, else 1. |
266 | 266 | ||
267 | 267 | ||
268 | 3.4) st= | 268 | 3.4) st= |
269 | -------- | 269 | -------- |
270 | 270 | ||
271 | Syntax: st=<buffer_size>,[<write_thres>,[<max_buffers>]] | 271 | Syntax: st=<buffer_size>,[<write_thres>,[<max_buffers>]] |
272 | 272 | ||
273 | Sets several parameters of the SCSI tape driver. <buffer_size> is | 273 | Sets several parameters of the SCSI tape driver. <buffer_size> is |
274 | the number of 512-byte buffers reserved for tape operations for each | 274 | the number of 512-byte buffers reserved for tape operations for each |
275 | device. <write_thres> sets the number of blocks which must be filled | 275 | device. <write_thres> sets the number of blocks which must be filled |
276 | to start an actual write operation to the tape. Maximum value is the | 276 | to start an actual write operation to the tape. Maximum value is the |
277 | total number of buffers. <max_buffers> limits the total number of | 277 | total number of buffers. <max_buffers> limits the total number of |
278 | buffers allocated for all tape devices. | 278 | buffers allocated for all tape devices. |
279 | 279 | ||
280 | 280 | ||
281 | 3.5) dmasound= | 281 | 3.5) dmasound= |
282 | -------------- | 282 | -------------- |
283 | 283 | ||
284 | Syntax: dmasound=[<buffers>,<buffer-size>[,<catch-radius>]] | 284 | Syntax: dmasound=[<buffers>,<buffer-size>[,<catch-radius>]] |
285 | 285 | ||
286 | This option controls some configurations of the Linux/m68k DMA sound | 286 | This option controls some configurations of the Linux/m68k DMA sound |
287 | driver (Amiga and Atari): <buffers> is the number of buffers you want | 287 | driver (Amiga and Atari): <buffers> is the number of buffers you want |
288 | to use (minimum 4, default 4), <buffer-size> is the size of each | 288 | to use (minimum 4, default 4), <buffer-size> is the size of each |
289 | buffer in kilobytes (minimum 4, default 32) and <catch-radius> says | 289 | buffer in kilobytes (minimum 4, default 32) and <catch-radius> says |
290 | how much percent of error will be tolerated when setting a frequency | 290 | how much percent of error will be tolerated when setting a frequency |
291 | (maximum 10, default 0). For example with 3% you can play 8000Hz | 291 | (maximum 10, default 0). For example with 3% you can play 8000Hz |
292 | AU-Files on the Falcon with its hardware frequency of 8195Hz and thus | 292 | AU-Files on the Falcon with its hardware frequency of 8195Hz and thus |
293 | don't need to expand the sound. | 293 | don't need to expand the sound. |
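
Reviewer note: the <catch-radius> example above is simple arithmetic. The sketch below assumes the tolerance is measured relative to the requested frequency; the text does not say which frequency is the reference, and the function name is hypothetical.

```python
def within_catch_radius(requested_hz: float, hardware_hz: float,
                        catch_radius_percent: float) -> bool:
    """Check whether the hardware frequency is close enough to the request.

    Assumption: the deviation is expressed as a percentage of the
    requested frequency.
    """
    deviation = abs(hardware_hz - requested_hz) / requested_hz * 100.0
    return deviation <= catch_radius_percent


if __name__ == "__main__":
    # 8000 Hz file on a Falcon with its 8195 Hz hardware frequency:
    # the deviation is 195 / 8000 = 2.4375 %, which fits inside 3 %.
    print(within_catch_radius(8000, 8195, 3))
    # With the default catch radius of 0 the same request does not fit.
    print(within_catch_radius(8000, 8195, 0))
```
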
294 | 294 | ||
295 | 295 | ||
296 | 296 | ||
297 | 4) Options for Atari Only | 297 | 4) Options for Atari Only |
298 | ========================= | 298 | ========================= |
299 | 299 | ||
300 | 4.1) video= | 300 | 4.1) video= |
301 | ----------- | 301 | ----------- |
302 | 302 | ||
303 | Syntax: video=<fbname>:<sub-options...> | 303 | Syntax: video=<fbname>:<sub-options...> |
304 | 304 | ||
305 | The <fbname> parameter specifies the name of the frame buffer, | 305 | The <fbname> parameter specifies the name of the frame buffer, |
306 | eg. most atari users will want to specify `atafb' here. The | 306 | eg. most atari users will want to specify `atafb' here. The |
307 | <sub-options> is a comma-separated list of the sub-options listed | 307 | <sub-options> is a comma-separated list of the sub-options listed |
308 | below. | 308 | below. |
309 | 309 | ||
310 | NB: Please notice that this option was renamed from `atavideo' to | 310 | NB: Please notice that this option was renamed from `atavideo' to |
311 | `video' during the development of the 1.3.x kernels, thus you | 311 | `video' during the development of the 1.3.x kernels, thus you |
312 | might need to update your boot-scripts if upgrading to 2.x from | 312 | might need to update your boot-scripts if upgrading to 2.x from |
313 | an 1.2.x kernel. | 313 | an 1.2.x kernel. |
314 | 314 | ||
315 | NBB: The behavior of video= was changed in 2.1.57 so the recommended | 315 | NBB: The behavior of video= was changed in 2.1.57 so the recommended |
316 | option is to specify the name of the frame buffer. | 316 | option is to specify the name of the frame buffer. |
317 | 317 | ||
318 | 4.1.1) Video Mode | 318 | 4.1.1) Video Mode |
319 | ----------------- | 319 | ----------------- |
320 | 320 | ||
321 | This sub-option may be any of the predefined video modes, as listed | 321 | This sub-option may be any of the predefined video modes, as listed |
322 | in atari/atafb.c in the Linux/m68k source tree. The kernel will | 322 | in atari/atafb.c in the Linux/m68k source tree. The kernel will |
323 | activate the given video mode at boot time and make it the default | 323 | activate the given video mode at boot time and make it the default |
324 | mode, if the hardware allows. Currently defined names are: | 324 | mode, if the hardware allows. Currently defined names are: |
325 | 325 | ||
326 | - stlow : 320x200x4 | 326 | - stlow : 320x200x4 |
327 | - stmid, default5 : 640x200x2 | 327 | - stmid, default5 : 640x200x2 |
328 | - sthigh, default4: 640x400x1 | 328 | - sthigh, default4: 640x400x1 |
329 | - ttlow : 320x480x8, TT only | 329 | - ttlow : 320x480x8, TT only |
330 | - ttmid, default1 : 640x480x4, TT only | 330 | - ttmid, default1 : 640x480x4, TT only |
331 | - tthigh, default2: 1280x960x1, TT only | 331 | - tthigh, default2: 1280x960x1, TT only |
332 | - vga2 : 640x480x1, Falcon only | 332 | - vga2 : 640x480x1, Falcon only |
333 | - vga4 : 640x480x2, Falcon only | 333 | - vga4 : 640x480x2, Falcon only |
334 | - vga16, default3 : 640x480x4, Falcon only | 334 | - vga16, default3 : 640x480x4, Falcon only |
335 | - vga256 : 640x480x8, Falcon only | 335 | - vga256 : 640x480x8, Falcon only |
336 | - falh2 : 896x608x1, Falcon only | 336 | - falh2 : 896x608x1, Falcon only |
337 | - falh16 : 896x608x4, Falcon only | 337 | - falh16 : 896x608x4, Falcon only |
338 | 338 | ||
339 | If no video mode is given on the command line, the kernel tries the | 339 | If no video mode is given on the command line, the kernel tries the |
340 | modes named "default<n>" in turn, until one is possible with the | 340 | modes named "default<n>" in turn, until one is possible with the |
341 | hardware in use. | 341 | hardware in use. |
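
Reviewer note: the fallback over the "default<n>" modes can be pictured as follows. This is a simplified sketch built only from the mode table above: it assumes the modes are tried in ascending <n>, and it models "possible with the hardware" purely by the "TT only" / "Falcon only" remarks, ignoring the attached monitor. All names are hypothetical.

```python
# (default name, mode name, machine restriction or None) from the table above.
DEFAULT_MODES = [
    ("default1", "ttmid", "TT"),
    ("default2", "tthigh", "TT"),
    ("default3", "vga16", "Falcon"),
    ("default4", "sthigh", None),
    ("default5", "stmid", None),
]


def pick_default_mode(machine: str) -> str:
    """Return the first default<n> mode whose restriction matches the machine."""
    for _default_name, mode, restriction in DEFAULT_MODES:
        if restriction is None or restriction == machine:
            return mode
    raise LookupError("no default mode is possible on this machine")


if __name__ == "__main__":
    for machine in ("TT", "Falcon", "ST"):
        print(machine, pick_default_mode(machine))
```
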
342 | 342 | ||
343 | A video mode setting doesn't make sense, if the external driver is | 343 | A video mode setting doesn't make sense, if the external driver is |
344 | activated by a "external:" sub-option. | 344 | activated by a "external:" sub-option. |
345 | 345 | ||
346 | 4.1.2) inverse | 346 | 4.1.2) inverse |
347 | -------------- | 347 | -------------- |
348 | 348 | ||
349 | Invert the display. This affects both, text (consoles) and graphics | 349 | Invert the display. This affects both, text (consoles) and graphics |
350 | (X) display. Usually, the background is chosen to be black. With this | 350 | (X) display. Usually, the background is chosen to be black. With this |
351 | option, you can make the background white. | 351 | option, you can make the background white. |
352 | 352 | ||
353 | 4.1.3) font | 353 | 4.1.3) font |
354 | ----------- | 354 | ----------- |
355 | 355 | ||
356 | Syntax: font:<fontname> | 356 | Syntax: font:<fontname> |
357 | 357 | ||
358 | Specify the font to use in text modes. Currently you can choose only | 358 | Specify the font to use in text modes. Currently you can choose only |
359 | between `VGA8x8', `VGA8x16' and `PEARL8x8'. `VGA8x8' is default, if the | 359 | between `VGA8x8', `VGA8x16' and `PEARL8x8'. `VGA8x8' is default, if the |
360 | vertical size of the display is less than 400 pixel rows. Otherwise, the | 360 | vertical size of the display is less than 400 pixel rows. Otherwise, the |
361 | `VGA8x16' font is the default. | 361 | `VGA8x16' font is the default. |
362 | 362 | ||
363 | 4.1.4) hwscroll_ | 363 | 4.1.4) hwscroll_ |
364 | ---------------- | 364 | ---------------- |
365 | 365 | ||
366 | Syntax: hwscroll_<n> | 366 | Syntax: hwscroll_<n> |
367 | 367 | ||
368 | The number of additional lines of video memory to reserve for | 368 | The number of additional lines of video memory to reserve for |
369 | speeding up the scrolling ("hardware scrolling"). Hardware scrolling | 369 | speeding up the scrolling ("hardware scrolling"). Hardware scrolling |
370 | is possible only if the kernel can set the video base address in steps | 370 | is possible only if the kernel can set the video base address in steps |
371 | fine enough. This is true for STE, MegaSTE, TT, and Falcon. It is not | 371 | fine enough. This is true for STE, MegaSTE, TT, and Falcon. It is not |
372 | possible with plain STs and graphics cards (The former because the | 372 | possible with plain STs and graphics cards (The former because the |
373 | base address must be on a 256 byte boundary there, the latter because | 373 | base address must be on a 256 byte boundary there, the latter because |
374 | the kernel doesn't know how to set the base address at all.) | 374 | the kernel doesn't know how to set the base address at all.) |
375 | 375 | ||
376 | By default, <n> is set to the number of visible text lines on the | 376 | By default, <n> is set to the number of visible text lines on the |
377 | display. Thus, the amount of video memory is doubled, compared to no | 377 | display. Thus, the amount of video memory is doubled, compared to no |
378 | hardware scrolling. You can turn off the hardware scrolling altogether | 378 | hardware scrolling. You can turn off the hardware scrolling altogether |
379 | by setting <n> to 0. | 379 | by setting <n> to 0. |
380 | 380 | ||
381 | 4.1.5) internal: | 381 | 4.1.5) internal: |
382 | ---------------- | 382 | ---------------- |
383 | 383 | ||
384 | Syntax: internal:<xres>;<yres>[;<xres_max>;<yres_max>;<offset>] | 384 | Syntax: internal:<xres>;<yres>[;<xres_max>;<yres_max>;<offset>] |
385 | 385 | ||
386 | This option specifies the capabilities of some extended internal video | 386 | This option specifies the capabilities of some extended internal video |
387 | hardware, like e.g. OverScan. <xres> and <yres> give the (extended) | 387 | hardware, like e.g. OverScan. <xres> and <yres> give the (extended) |
388 | dimensions of the screen. | 388 | dimensions of the screen. |
389 | 389 | ||
390 | If your OverScan needs a black border, you have to write the last | 390 | If your OverScan needs a black border, you have to write the last |
391 | three arguments of the "internal:". <xres_max> is the maximum line | 391 | three arguments of the "internal:". <xres_max> is the maximum line |
392 | length the hardware allows, <yres_max> the maximum number of lines. | 392 | length the hardware allows, <yres_max> the maximum number of lines. |
393 | <offset> is the offset of the visible part of the screen memory to its | 393 | <offset> is the offset of the visible part of the screen memory to its |
394 | physical start, in bytes. | 394 | physical start, in bytes. |
395 | 395 | ||
396 | Often, extended internal video hardware has to be activated somehow. | 396 | Often, extended internal video hardware has to be activated somehow. |
397 | For this, see the "sw_*" options below. | 397 | For this, see the "sw_*" options below. |
398 | 398 | ||
399 | 4.1.6) external: | 399 | 4.1.6) external: |
400 | ---------------- | 400 | ---------------- |
401 | 401 | ||
402 | Syntax: | 402 | Syntax: |
403 | external:<xres>;<yres>;<depth>;<org>;<scrmem>[;<scrlen>[;<vgabase>\ | 403 | external:<xres>;<yres>;<depth>;<org>;<scrmem>[;<scrlen>[;<vgabase>\ |
404 | [;<colw>[;<coltype>[;<xres_virtual>]]]]] | 404 | [;<colw>[;<coltype>[;<xres_virtual>]]]]] |
405 | 405 | ||
406 | [I had to break this line...] | 406 | [I had to break this line...] |
407 | 407 | ||
408 | This is probably the most complicated parameter... It specifies that | 408 | This is probably the most complicated parameter... It specifies that |
409 | you have some external video hardware (a graphics board), and how to | 409 | you have some external video hardware (a graphics board), and how to |
410 | use it under Linux/m68k. The kernel cannot know more about the hardware | 410 | use it under Linux/m68k. The kernel cannot know more about the hardware |
411 | than you tell it here! The kernel also is unable to set or change any | 411 | than you tell it here! The kernel also is unable to set or change any |
412 | video modes, since it doesn't know about any board internal. So, you | 412 | video modes, since it doesn't know about any board internal. So, you |
413 | have to switch to that video mode before you start Linux, and cannot | 413 | have to switch to that video mode before you start Linux, and cannot |
414 | switch to another mode once Linux has started. | 414 | switch to another mode once Linux has started. |
415 | 415 | ||
416 | The first 3 parameters of this sub-option should be obvious: <xres>, | 416 | The first 3 parameters of this sub-option should be obvious: <xres>, |
417 | <yres> and <depth> give the dimensions of the screen and the number of | 417 | <yres> and <depth> give the dimensions of the screen and the number of |
418 | planes (depth). The depth is is the logarithm to base 2 of the number | 418 | planes (depth). The depth is the logarithm to base 2 of the number |
419 | of colors possible. (Or, the other way round: The number of colors is | 419 | of colors possible. (Or, the other way round: The number of colors is |
420 | 2^depth). | 420 | 2^depth). |
421 | 421 | ||
422 | You have to tell the kernel furthermore how the video memory is | 422 | You have to tell the kernel furthermore how the video memory is |
423 | organized. This is done by a letter as <org> parameter: | 423 | organized. This is done by a letter as <org> parameter: |
424 | 424 | ||
425 | 'n': "normal planes", i.e. one whole plane after another | 425 | 'n': "normal planes", i.e. one whole plane after another |
426 | 'i': "interleaved planes", i.e. 16 bit of the first plane, then 16 bit | 426 | 'i': "interleaved planes", i.e. 16 bit of the first plane, then 16 bit |
427 | of the next, and so on... This mode is used only with the | 427 | of the next, and so on... This mode is used only with the |
428 | built-in Atari video modes, I think there is no card that | 428 | built-in Atari video modes, I think there is no card that |
429 | supports this mode. | 429 | supports this mode. |
430 | 'p': "packed pixels", i.e. <depth> consecutive bits stand for all | 430 | 'p': "packed pixels", i.e. <depth> consecutive bits stand for all |
431 | planes of one pixel; this is the most common mode for 8 planes | 431 | planes of one pixel; this is the most common mode for 8 planes |
432 | (256 colors) on graphic cards | 432 | (256 colors) on graphic cards |
433 | 't': "true color" (more or less packed pixels, but without a color | 433 | 't': "true color" (more or less packed pixels, but without a color |
434 | lookup table); usually depth is 24 | 434 | lookup table); usually depth is 24 |
435 | 435 | ||
436 | For monochrome modes (i.e., <depth> is 1), the <org> letter has a | 436 | For monochrome modes (i.e., <depth> is 1), the <org> letter has a |
437 | different meaning: | 437 | different meaning: |
438 | 438 | ||
439 | 'n': normal colors, i.e. 0=white, 1=black | 439 | 'n': normal colors, i.e. 0=white, 1=black |
440 | 'i': inverted colors, i.e. 0=black, 1=white | 440 | 'i': inverted colors, i.e. 0=black, 1=white |
441 | 441 | ||
442 | The next important information about the video hardware is the base | 442 | The next important information about the video hardware is the base |
443 | address of the video memory. That is given in the <scrmem> parameter, | 443 | address of the video memory. That is given in the <scrmem> parameter, |
444 | as a hexadecimal number with a "0x" prefix. You have to find out this | 444 | as a hexadecimal number with a "0x" prefix. You have to find out this |
445 | address in the documentation of your hardware. | 445 | address in the documentation of your hardware. |
446 | 446 | ||
447 | The next parameter, <scrlen>, tells the kernel about the size of the | 447 | The next parameter, <scrlen>, tells the kernel about the size of the |
448 | video memory. If it's missing, the size is calculated from <xres>, | 448 | video memory. If it's missing, the size is calculated from <xres>, |
449 | <yres>, and <depth>. For now, it is not useful to write a value here. | 449 | <yres>, and <depth>. For now, it is not useful to write a value here. |
450 | It would be used only for hardware scrolling (which isn't possible | 450 | It would be used only for hardware scrolling (which isn't possible |
451 | with the external driver, because the kernel cannot set the video base | 451 | with the external driver, because the kernel cannot set the video base |
452 | address), or for virtual resolutions under X (which the X server | 452 | address), or for virtual resolutions under X (which the X server |
453 | doesn't support yet). So, it's currently best to leave this field | 453 | doesn't support yet). So, it's currently best to leave this field |
454 | empty, either by ending the "external:" after the video address or by | 454 | empty, either by ending the "external:" after the video address or by |
455 | writing two consecutive semicolons, if you want to give a <vgabase> | 455 | writing two consecutive semicolons, if you want to give a <vgabase> |
456 | (it is allowed to leave this parameter empty). | 456 | (it is allowed to leave this parameter empty). |
457 | 457 | ||
458 | The <vgabase> parameter is optional. If it is not given, the kernel | 458 | The <vgabase> parameter is optional. If it is not given, the kernel |
459 | cannot read or write any color registers of the video hardware, and | 459 | cannot read or write any color registers of the video hardware, and |
460 | thus you have to set appropriate colors before you start Linux. But if | 460 | thus you have to set appropriate colors before you start Linux. But if |
461 | your card is somehow VGA compatible, you can tell the kernel the base | 461 | your card is somehow VGA compatible, you can tell the kernel the base |
462 | address of the VGA register set, so it can change the color lookup | 462 | address of the VGA register set, so it can change the color lookup |
463 | table. You have to look up this address in your board's documentation. | 463 | table. You have to look up this address in your board's documentation. |
464 | To avoid misunderstandings: <vgabase> is the _base_ address, i.e. a 4k | 464 | To avoid misunderstandings: <vgabase> is the _base_ address, i.e. a 4k |
465 | aligned address. For read/writing the color registers, the kernel | 465 | aligned address. For read/writing the color registers, the kernel |
466 | uses the addresses vgabase+0x3c7...vgabase+0x3c9. The <vgabase> | 466 | uses the addresses vgabase+0x3c7...vgabase+0x3c9. The <vgabase> |
467 | parameter is written in hexadecimal with a "0x" prefix, just as | 467 | parameter is written in hexadecimal with a "0x" prefix, just as |
468 | <scrmem>. | 468 | <scrmem>. |
469 | 469 | ||
470 | <colw> is meaningful only if <vgabase> is specified. It tells the | 470 | <colw> is meaningful only if <vgabase> is specified. It tells the |
471 | kernel how wide each of the color registers is, i.e. the number of bits | 471 | kernel how wide each of the color registers is, i.e. the number of bits |
472 | per single color (red/green/blue). Default is 6, another quite usual | 472 | per single color (red/green/blue). Default is 6, another quite usual |
473 | value is 8. | 473 | value is 8. |
474 | 474 | ||
475 | Also <coltype> is used together with <vgabase>. It tells the kernel | 475 | Also <coltype> is used together with <vgabase>. It tells the kernel |
476 | about the color register model of your gfx board. Currently, the types | 476 | about the color register model of your gfx board. Currently, the types |
477 | "vga" (which is also the default) and "mv300" (SANG MV300) are | 477 | "vga" (which is also the default) and "mv300" (SANG MV300) are |
478 | implemented. | 478 | implemented. |
479 | 479 | ||
480 | Parameter <xres_virtual> is required for ProMST or ET4000 cards where | 480 | Parameter <xres_virtual> is required for ProMST or ET4000 cards where |
481 | the physical linelength differs from the visible length. With ProMST, | 481 | the physical linelength differs from the visible length. With ProMST, |
482 | xres_virtual must be set to 2048. For ET4000, xres_virtual depends on the | 482 | xres_virtual must be set to 2048. For ET4000, xres_virtual depends on the |
483 | initialisation of the video-card. | 483 | initialisation of the video-card. |
484 | If you're missing a corresponding yres_virtual: the external part is legacy, | 484 | If you're missing a corresponding yres_virtual: the external part is legacy, |
485 | therefore we don't support hardware-dependent functions like hardware-scroll, | 485 | therefore we don't support hardware-dependent functions like hardware-scroll, |
486 | panning or blanking. | 486 | panning or blanking. |
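
Reviewer note: to make the field order of "external:" easier to follow, here is a small Python sketch that splits such a sub-option into its parts. It is not the kernel's parser. The default for <scrlen> assumes the size is xres * yres * depth / 8 bytes (the text only says it is "calculated from" these values), and the addresses in the usage example are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExternalMode:
    xres: int
    yres: int
    depth: int                      # number of planes; colors = 2 ** depth
    org: str                        # 'n', 'i', 'p' or 't'
    scrmem: int                     # base address of the video memory
    scrlen: int                     # size of the video memory in bytes
    vgabase: Optional[int] = None   # base of the VGA register set, if any
    colw: int = 6                   # bits per color register (documented default)
    coltype: str = "vga"            # color register model (documented default)
    xres_virtual: Optional[int] = None


def parse_external(subopt: str) -> ExternalMode:
    """Split an 'external:...' sub-option into its semicolon-separated fields."""
    prefix = "external:"
    if not subopt.startswith(prefix):
        raise ValueError("not an external: sub-option")
    fields = subopt[len(prefix):].split(";")
    if not 5 <= len(fields) <= 10:
        raise ValueError("expected between 5 and 10 fields")
    fields += [""] * (10 - len(fields))      # trailing optional fields are empty

    xres, yres, depth = (int(f) for f in fields[:3])
    org = fields[3]
    if org not in ("n", "i", "p", "t"):
        raise ValueError("unknown <org> letter: %r" % org)
    scrmem = int(fields[4], 16)              # hexadecimal with "0x" prefix
    # Assumption: an empty <scrlen> defaults to xres * yres * depth / 8 bytes.
    scrlen = int(fields[5], 0) if fields[5] else xres * yres * depth // 8
    return ExternalMode(
        xres=xres, yres=yres, depth=depth, org=org, scrmem=scrmem, scrlen=scrlen,
        vgabase=int(fields[6], 16) if fields[6] else None,
        colw=int(fields[7]) if fields[7] else 6,
        coltype=fields[8] or "vga",
        xres_virtual=int(fields[9]) if fields[9] else None,
    )


if __name__ == "__main__":
    # Invented addresses; <scrlen> is left empty with two consecutive semicolons.
    mode = parse_external("external:640;480;8;p;0xc00000;;0xd00000")
    print(mode)
    print("colors:", 2 ** mode.depth)
```

Leaving <scrlen> empty while still giving <vgabase> is the case the text describes as "two consecutive semicolons".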
487 | 487 | ||
488 | 4.1.7) eclock: | 488 | 4.1.7) eclock: |
489 | -------------- | 489 | -------------- |
490 | 490 | ||
491 | The external pixel clock attached to the Falcon VIDEL shifter. This | 491 | The external pixel clock attached to the Falcon VIDEL shifter. This |
492 | currently works only with the ScreenWonder! | 492 | currently works only with the ScreenWonder! |
493 | 493 | ||
494 | 4.1.8) monitorcap: | 494 | 4.1.8) monitorcap: |
495 | ------------------- | 495 | ------------------- |
496 | 496 | ||
497 | Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax> | 497 | Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax> |
498 | 498 | ||
499 | This describes the capabilities of a multisync monitor. Don't use it | 499 | This describes the capabilities of a multisync monitor. Don't use it |
500 | with a fixed-frequency monitor! For now, only the Falcon frame buffer | 500 | with a fixed-frequency monitor! For now, only the Falcon frame buffer |
501 | uses the settings of "monitorcap:". | 501 | uses the settings of "monitorcap:". |
502 | 502 | ||
503 | <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies | 503 | <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies |
504 | your monitor can work with, in Hz. <hmin> and <hmax> are the same for | 504 | your monitor can work with, in Hz. <hmin> and <hmax> are the same for |
505 | the horizontal frequency, in kHz. | 505 | the horizontal frequency, in kHz. |
506 | 506 | ||
507 | The defaults are 58;62;31;32 (VGA compatible). | 507 | The defaults are 58;62;31;32 (VGA compatible). |
508 | 508 | ||
509 | The defaults for TV/SC1224/SC1435 cover both PAL and NTSC standards. | 509 | The defaults for TV/SC1224/SC1435 cover both PAL and NTSC standards. |
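
Reviewer note: a "monitorcap:" setting is just four limits. The sketch below parses one and checks whether a given pair of frequencies lies inside it. The function names and the sample frequencies are illustrative assumptions, not taken from the driver.

```python
from typing import NamedTuple


class MonitorCap(NamedTuple):
    vmin: float  # minimum vertical frequency, Hz
    vmax: float  # maximum vertical frequency, Hz
    hmin: float  # minimum horizontal frequency, kHz
    hmax: float  # maximum horizontal frequency, kHz


def parse_monitorcap(subopt: str = "monitorcap:58;62;31;32") -> MonitorCap:
    """Parse 'monitorcap:<vmin>;<vmax>;<hmin>;<hmax>' (default: VGA values)."""
    prefix = "monitorcap:"
    if not subopt.startswith(prefix):
        raise ValueError("not a monitorcap: sub-option")
    values = [float(v) for v in subopt[len(prefix):].split(";")]
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    return MonitorCap(*values)


def mode_fits(cap: MonitorCap, vfreq_hz: float, hfreq_khz: float) -> bool:
    """Return True if both frequencies are within the monitor's limits."""
    return cap.vmin <= vfreq_hz <= cap.vmax and cap.hmin <= hfreq_khz <= cap.hmax


if __name__ == "__main__":
    cap = parse_monitorcap()
    print(mode_fits(cap, 60.0, 31.5))  # inside the documented VGA defaults
    print(mode_fits(cap, 72.0, 37.9))  # outside both ranges
```
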
510 | 510 | ||
511 | 4.1.9) keep | 511 | 4.1.9) keep |
512 | ------------ | 512 | ------------ |
513 | 513 | ||
514 | If this option is given, the framebuffer device doesn't do any video | 514 | If this option is given, the framebuffer device doesn't do any video |
515 | mode calculations and settings on its own. The only Atari fb device | 515 | mode calculations and settings on its own. The only Atari fb device |
516 | that does this currently is the Falcon. | 516 | that does this currently is the Falcon. |
517 | 517 | ||
518 | What you reach with this: Settings for unknown video extensions | 518 | What you reach with this: Settings for unknown video extensions |
519 | aren't overridden by the driver, so you can still use the mode found | 519 | aren't overridden by the driver, so you can still use the mode found |
520 | when booting, when the driver doesn't know to set this mode itself. | 520 | when booting, when the driver doesn't know to set this mode itself. |
521 | But this also means, that you can't switch video modes anymore... | 521 | But this also means, that you can't switch video modes anymore... |
522 | 522 | ||
523 | An example where you may want to use "keep" is the ScreenBlaster for | 523 | An example where you may want to use "keep" is the ScreenBlaster for |
524 | the Falcon. | 524 | the Falcon. |
525 | 525 | ||
526 | 526 | ||
527 | 4.2) atamouse= | 527 | 4.2) atamouse= |
528 | -------------- | 528 | -------------- |
529 | 529 | ||
530 | Syntax: atamouse=<x-threshold>,[<y-threshold>] | 530 | Syntax: atamouse=<x-threshold>,[<y-threshold>] |
531 | 531 | ||
532 | With this option, you can set the mouse movement reporting threshold. | 532 | With this option, you can set the mouse movement reporting threshold. |
533 | This is the number of pixels of mouse movement that have to accumulate | 533 | This is the number of pixels of mouse movement that have to accumulate |
534 | before the IKBD sends a new mouse packet to the kernel. Higher values | 534 | before the IKBD sends a new mouse packet to the kernel. Higher values |
535 | reduce the mouse interrupt load and thus reduce the chance of keyboard | 535 | reduce the mouse interrupt load and thus reduce the chance of keyboard |
536 | overruns. Lower values give a slightly faster mouse response and | 536 | overruns. Lower values give a slightly faster mouse response and |
537 | slightly better mouse tracking. | 537 | slightly better mouse tracking. |
538 | 538 | ||
539 | You can set the threshold in x and y separately, but usually this is | 539 | You can set the threshold in x and y separately, but usually this is |
540 | of little practical use. If there's just one number in the option, it | 540 | of little practical use. If there's just one number in the option, it |
541 | is used for both dimensions. The default value is 2 for both | 541 | is used for both dimensions. The default value is 2 for both |
542 | thresholds. | 542 | thresholds. |
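
The threshold rule above (one number applies to both axes, default 2) can be illustrated with a small sketch. This is not the kernel's parser; the function name `parse_atamouse` and its handling of an empty value are assumptions made for illustration only.

```python
def parse_atamouse(arg, default=2):
    """Return (x_threshold, y_threshold) for the text after 'atamouse='.

    Hypothetical helper: mirrors the documented rule that a single
    number is used for both dimensions and that the default is 2.
    """
    parts = [p.strip() for p in arg.split(",")]
    if not parts[0]:
        # Nothing given: fall back to the documented default for both axes.
        return default, default
    x = int(parts[0])
    # A missing or empty second field means "same as x".
    y = int(parts[1]) if len(parts) > 1 and parts[1] else x
    return x, y
```

For instance, "atamouse=4" would be read as a threshold of 4 in both directions, while "atamouse=4,6" sets x and y separately.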
543 | 543 | ||
544 | 544 | ||
545 | 4.3) ataflop= | 545 | 4.3) ataflop= |
546 | ------------- | 546 | ------------- |
547 | 547 | ||
548 | Syntax: ataflop=<drive type>[,<trackbuffering>[,<steprateA>[,<steprateB>]]] | 548 | Syntax: ataflop=<drive type>[,<trackbuffering>[,<steprateA>[,<steprateB>]]] |
549 | 549 | ||
550 | The drive type may be 0, 1, or 2, for DD, HD, and ED, resp. This | 550 | The drive type may be 0, 1, or 2, for DD, HD, and ED, resp. This |
551 | setting affects how many buffers are reserved and which formats are | 551 | setting affects how many buffers are reserved and which formats are |
552 | probed (see also below). The default is 1 (HD). Only one drive type | 552 | probed (see also below). The default is 1 (HD). Only one drive type |
553 | can be selected. If you have two disk drives, select the "better" | 553 | can be selected. If you have two disk drives, select the "better" |
554 | type. | 554 | type. |
555 | 555 | ||
556 | The second parameter <trackbuffering> tells the kernel whether to use | 556 | The second parameter <trackbuffering> tells the kernel whether to use |
557 | track buffering (1) or not (0). The default is machine-dependent: | 557 | track buffering (1) or not (0). The default is machine-dependent: |
558 | no for the Medusa and yes for all others. | 558 | no for the Medusa and yes for all others. |
559 | 559 | ||
560 | With the two following parameters, you can change the default | 560 | With the two following parameters, you can change the default |
561 | steprate used for drive A and B, resp. | 561 | steprate used for drive A and B, resp. |
562 | 562 | ||
563 | 563 | ||
564 | 4.4) atascsi= | 564 | 4.4) atascsi= |
565 | ------------- | 565 | ------------- |
566 | 566 | ||
567 | Syntax: atascsi=<can_queue>[,<cmd_per_lun>[,<scat-gat>[,<host-id>[,<tagged>]]]] | 567 | Syntax: atascsi=<can_queue>[,<cmd_per_lun>[,<scat-gat>[,<host-id>[,<tagged>]]]] |
568 | 568 | ||
569 | This option sets some parameters for the Atari native SCSI driver. | 569 | This option sets some parameters for the Atari native SCSI driver. |
570 | Generally, any number of arguments can be omitted from the end. And | 570 | Generally, any number of arguments can be omitted from the end. And |
571 | for each of the numbers, a negative value means "use default". The | 571 | for each of the numbers, a negative value means "use default". The |
572 | defaults depend on whether TT-style or Falcon-style SCSI is used. | 572 | defaults depend on whether TT-style or Falcon-style SCSI is used. |
573 | Below, defaults are noted as n/m, where the first value refers to | 573 | Below, defaults are noted as n/m, where the first value refers to |
574 | TT-SCSI and the latter to Falcon-SCSI. If an illegal value is given | 574 | TT-SCSI and the latter to Falcon-SCSI. If an illegal value is given |
575 | for one parameter, an error message is printed and that one setting is | 575 | for one parameter, an error message is printed and that one setting is |
576 | ignored (others aren't affected). | 576 | ignored (others aren't affected). |
577 | 577 | ||
578 | <can_queue>: | 578 | <can_queue>: |
579 | This is the maximum number of SCSI commands queued internally to the | 579 | This is the maximum number of SCSI commands queued internally to the |
580 | Atari SCSI driver. A value of 1 effectively turns off the driver | 580 | Atari SCSI driver. A value of 1 effectively turns off the driver |
581 | internal multitasking (if it causes problems). Legal values are >= | 581 | internal multitasking (if it causes problems). Legal values are >= |
582 | 1. <can_queue> can be as high as you like, but values greater than | 582 | 1. <can_queue> can be as high as you like, but values greater than |
583 | <cmd_per_lun> times the number of SCSI targets (LUNs) you have | 583 | <cmd_per_lun> times the number of SCSI targets (LUNs) you have |
584 | don't make sense. Default: 16/8. | 584 | don't make sense. Default: 16/8. |
585 | 585 | ||
586 | <cmd_per_lun>: | 586 | <cmd_per_lun>: |
587 | Maximum number of SCSI commands issued to the driver for one | 587 | Maximum number of SCSI commands issued to the driver for one |
588 | logical unit (LUN, usually one SCSI target). Legal values start | 588 | logical unit (LUN, usually one SCSI target). Legal values start |
589 | from 1. If tagged queuing (see below) is not used, values greater | 589 | from 1. If tagged queuing (see below) is not used, values greater |
590 | than 2 don't make sense, but waste memory. Otherwise, the maximum | 590 | than 2 don't make sense, but waste memory. Otherwise, the maximum |
591 | is the number of command tags available to the driver (currently | 591 | is the number of command tags available to the driver (currently |
592 | 32). Default: 8/1. (Note: Values > 1 seem to cause problems on a | 592 | 32). Default: 8/1. (Note: Values > 1 seem to cause problems on a |
593 | Falcon, cause not yet known.) | 593 | Falcon, cause not yet known.) |
594 | 594 | ||
595 | The <cmd_per_lun> value largely determines the amount of | 595 | The <cmd_per_lun> value largely determines the amount of |
596 | memory SCSI reserves for itself. The formula is rather | 596 | memory SCSI reserves for itself. The formula is rather |
597 | complicated, but I can give you some hints: | 597 | complicated, but I can give you some hints: |
598 | no scatter-gather : cmd_per_lun * 232 bytes | 598 | no scatter-gather : cmd_per_lun * 232 bytes |
599 | full scatter-gather: cmd_per_lun * approx. 17 Kbytes | 599 | full scatter-gather: cmd_per_lun * approx. 17 Kbytes |
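
As a rough worked example of the two hints above, the following sketch multiplies them out. It only covers the two extreme cases the text names, and it assumes "Kbytes" means 1024 bytes; the function name is made up for this illustration.

```python
def atascsi_mem_estimate(cmd_per_lun, full_scatter_gather):
    """Rough memory estimate in bytes from the two documented hints.

    Assumption: 1 Kbyte = 1024 bytes. The real formula is described
    as "rather complicated", so treat the result as an approximation.
    """
    per_command = 17 * 1024 if full_scatter_gather else 232
    return cmd_per_lun * per_command
```

With the TT default of 8 commands per LUN, this gives 1856 bytes without scatter-gather and roughly 136 Kbytes with full scatter-gather.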
600 | 600 | ||
601 | <scat-gat>: | 601 | <scat-gat>: |
602 | Size of the scatter-gather table, i.e. the number of requests | 602 | Size of the scatter-gather table, i.e. the number of requests |
603 | consecutive on the disk that can be merged into one SCSI command. | 603 | consecutive on the disk that can be merged into one SCSI command. |
604 | Legal values are between 0 and 255. Default: 255/0. Note: This | 604 | Legal values are between 0 and 255. Default: 255/0. Note: This |
605 | value is forced to 0 on a Falcon, since scatter-gather isn't | 605 | value is forced to 0 on a Falcon, since scatter-gather isn't |
606 | possible with the ST-DMA. Not using scatter-gather hurts | 606 | possible with the ST-DMA. Not using scatter-gather hurts |
607 | performance significantly. | 607 | performance significantly. |
608 | 608 | ||
609 | <host-id>: | 609 | <host-id>: |
610 | The SCSI ID to be used by the initiator (your Atari). This is | 610 | The SCSI ID to be used by the initiator (your Atari). This is |
611 | usually 7, the highest possible ID. Every ID on the SCSI bus must | 611 | usually 7, the highest possible ID. Every ID on the SCSI bus must |
612 | be unique. Default: determined at run time: If the NV-RAM checksum | 612 | be unique. Default: determined at run time: If the NV-RAM checksum |
613 | is valid, and bit 7 in byte 30 of the NV-RAM is set, the lower 3 | 613 | is valid, and bit 7 in byte 30 of the NV-RAM is set, the lower 3 |
614 | bits of this byte are used as the host ID. (This method is defined | 614 | bits of this byte are used as the host ID. (This method is defined |
615 | by Atari and also used by some TOS HD drivers.) If the above | 615 | by Atari and also used by some TOS HD drivers.) If the above |
616 | isn't given, the default ID is 7 (for both TT and Falcon). | 616 | isn't given, the default ID is 7 (for both TT and Falcon). |
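
The default host ID selection described above can be sketched as follows. The function and its arguments are hypothetical; in particular, how the NV-RAM checksum is verified is not covered here and is simply passed in as a flag.

```python
def default_host_id(nvram_byte30, checksum_valid):
    """Pick the default SCSI host ID as the text describes.

    Hypothetical helper: bit 7 of NV-RAM byte 30 marks the stored ID
    as usable, and the lower 3 bits hold the ID itself. Otherwise the
    default is 7.
    """
    if checksum_valid and (nvram_byte30 & 0x80):
        return nvram_byte30 & 0x07
    return 7
```

So a byte value of 0x85 with a valid checksum selects host ID 5, while 0x05 (bit 7 clear) or an invalid checksum falls back to 7.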
617 | 617 | ||
618 | <tagged>: | 618 | <tagged>: |
619 | 0 means turn off tagged queuing support, all other values > 0 mean | 619 | 0 means turn off tagged queuing support, all other values > 0 mean |
620 | use tagged queuing for targets that support it. Default: currently | 620 | use tagged queuing for targets that support it. Default: currently |
621 | off, but this may change when tagged queuing handling has been | 621 | off, but this may change when tagged queuing handling has been |
622 | proved to be reliable. | 622 | proved to be reliable. |
623 | 623 | ||
624 | Tagged queuing means that more than one command can be issued to | 624 | Tagged queuing means that more than one command can be issued to |
625 | one LUN, and the SCSI device itself orders the requests so they | 625 | one LUN, and the SCSI device itself orders the requests so they |
626 | can be performed in optimal order. Not all SCSI devices support | 626 | can be performed in optimal order. Not all SCSI devices support |
627 | tagged queuing (:-(). | 627 | tagged queuing (:-(). |
628 | 628 | ||
629 | 4.5) switches= | 629 | 4.5) switches= |
630 | ------------- | 630 | ------------- |
631 | 631 | ||
632 | Syntax: switches=<list of switches> | 632 | Syntax: switches=<list of switches> |
633 | 633 | ||
634 | With this option you can switch some hardware lines that are often | 634 | With this option you can switch some hardware lines that are often |
635 | used to enable/disable certain hardware extensions. Examples are | 635 | used to enable/disable certain hardware extensions. Examples are |
636 | OverScan, overclocking, ... | 636 | OverScan, overclocking, ... |
637 | 637 | ||
638 | The <list of switches> is a comma-separated list of the following | 638 | The <list of switches> is a comma-separated list of the following |
639 | items: | 639 | items: |
640 | 640 | ||
641 | ikbd: set RTS of the keyboard ACIA high | 641 | ikbd: set RTS of the keyboard ACIA high |
642 | midi: set RTS of the MIDI ACIA high | 642 | midi: set RTS of the MIDI ACIA high |
643 | snd6: set bit 6 of the PSG port A | 643 | snd6: set bit 6 of the PSG port A |
644 | snd7: set bit 7 of the PSG port A | 644 | snd7: set bit 7 of the PSG port A |
645 | 645 | ||
646 | It doesn't make sense to mention a switch more than once (no | 646 | It doesn't make sense to mention a switch more than once (no |
647 | difference to only once), but you can give as many switches as you | 647 | difference to only once), but you can give as many switches as you |
648 | want to enable different features. The switch lines are set as early | 648 | want to enable different features. The switch lines are set as early |
649 | as possible during kernel initialization (even before determining the | 649 | as possible during kernel initialization (even before determining the |
650 | present hardware.) | 650 | present hardware.) |
651 | 651 | ||
652 | All of the items can also be prefixed with "ov_", i.e. "ov_ikbd", | 652 | All of the items can also be prefixed with "ov_", i.e. "ov_ikbd", |
653 | "ov_midi", ... These options are meant for switching on an OverScan | 653 | "ov_midi", ... These options are meant for switching on an OverScan |
654 | video extension. The difference to the bare option is that the | 654 | video extension. The difference to the bare option is that the |
655 | switch-on is done after video initialization, and somehow synchronized | 655 | switch-on is done after video initialization, and somehow synchronized |
656 | to the HBLANK. A speciality is that ov_ikbd and ov_midi are switched | 656 | to the HBLANK. A speciality is that ov_ikbd and ov_midi are switched |
657 | off before rebooting, so that OverScan is disabled and TOS boots | 657 | off before rebooting, so that OverScan is disabled and TOS boots |
658 | correctly. | 658 | correctly. |
659 | 659 | ||
660 | If you give an option both with and without the "ov_" prefix, the | 660 | If you give an option both with and without the "ov_" prefix, the |
661 | earlier initialization ("ov_"-less) takes precedence. But the | 661 | earlier initialization ("ov_"-less) takes precedence. But the |
662 | switching-off on reset still happens in this case. | 662 | switching-off on reset still happens in this case. |
663 | 663 | ||
664 | 5) Options for Amiga Only: | 664 | 5) Options for Amiga Only: |
665 | ========================== | 665 | ========================== |
666 | 666 | ||
667 | 5.1) video= | 667 | 5.1) video= |
668 | ----------- | 668 | ----------- |
669 | 669 | ||
670 | Syntax: video=<fbname>:<sub-options...> | 670 | Syntax: video=<fbname>:<sub-options...> |
671 | 671 | ||
672 | The <fbname> parameter specifies the name of the frame buffer, valid | 672 | The <fbname> parameter specifies the name of the frame buffer, valid |
673 | options are `amifb', `cyber', `virge', `retz3' and `clgen', provided | 673 | options are `amifb', `cyber', `virge', `retz3' and `clgen', provided |
674 | that the respective frame buffer devices have been compiled into the | 674 | that the respective frame buffer devices have been compiled into the |
675 | kernel (or compiled as loadable modules). The behavior of the <fbname> | 675 | kernel (or compiled as loadable modules). The behavior of the <fbname> |
676 | option was changed in 2.1.57 so it is now recommended to specify this | 676 | option was changed in 2.1.57 so it is now recommended to specify this |
677 | option. | 677 | option. |
678 | 678 | ||
679 | The <sub-options> is a comma-separated list of the sub-options listed | 679 | The <sub-options> is a comma-separated list of the sub-options listed |
680 | below. This option is organized similar to the Atari version of the | 680 | below. This option is organized similar to the Atari version of the |
681 | "video"-option (4.1), but knows fewer sub-options. | 681 | "video"-option (4.1), but knows fewer sub-options. |
682 | 682 | ||
683 | 5.1.1) video mode | 683 | 5.1.1) video mode |
684 | ----------------- | 684 | ----------------- |
685 | 685 | ||
686 | Again, similar to the video mode for the Atari (see 4.1.1). Predefined | 686 | Again, similar to the video mode for the Atari (see 4.1.1). Predefined |
687 | modes depend on the used frame buffer device. | 687 | modes depend on the used frame buffer device. |
688 | 688 | ||
689 | OCS, ECS and AGA machines all use the color frame buffer. The following | 689 | OCS, ECS and AGA machines all use the color frame buffer. The following |
690 | predefined video modes are available: | 690 | predefined video modes are available: |
691 | 691 | ||
692 | NTSC modes: | 692 | NTSC modes: |
693 | - ntsc : 640x200, 15 kHz, 60 Hz | 693 | - ntsc : 640x200, 15 kHz, 60 Hz |
694 | - ntsc-lace : 640x400, 15 kHz, 60 Hz interlaced | 694 | - ntsc-lace : 640x400, 15 kHz, 60 Hz interlaced |
695 | PAL modes: | 695 | PAL modes: |
696 | - pal : 640x256, 15 kHz, 50 Hz | 696 | - pal : 640x256, 15 kHz, 50 Hz |
697 | - pal-lace : 640x512, 15 kHz, 50 Hz interlaced | 697 | - pal-lace : 640x512, 15 kHz, 50 Hz interlaced |
698 | ECS modes: | 698 | ECS modes: |
699 | - multiscan : 640x480, 29 kHz, 57 Hz | 699 | - multiscan : 640x480, 29 kHz, 57 Hz |
700 | - multiscan-lace : 640x960, 29 kHz, 57 Hz interlaced | 700 | - multiscan-lace : 640x960, 29 kHz, 57 Hz interlaced |
701 | - euro36 : 640x200, 15 kHz, 72 Hz | 701 | - euro36 : 640x200, 15 kHz, 72 Hz |
702 | - euro36-lace : 640x400, 15 kHz, 72 Hz interlaced | 702 | - euro36-lace : 640x400, 15 kHz, 72 Hz interlaced |
703 | - euro72 : 640x400, 29 kHz, 68 Hz | 703 | - euro72 : 640x400, 29 kHz, 68 Hz |
704 | - euro72-lace : 640x800, 29 kHz, 68 Hz interlaced | 704 | - euro72-lace : 640x800, 29 kHz, 68 Hz interlaced |
705 | - super72 : 800x300, 23 kHz, 70 Hz | 705 | - super72 : 800x300, 23 kHz, 70 Hz |
706 | - super72-lace : 800x600, 23 kHz, 70 Hz interlaced | 706 | - super72-lace : 800x600, 23 kHz, 70 Hz interlaced |
707 | - dblntsc-ff : 640x400, 27 kHz, 57 Hz | 707 | - dblntsc-ff : 640x400, 27 kHz, 57 Hz |
708 | - dblntsc-lace : 640x800, 27 kHz, 57 Hz interlaced | 708 | - dblntsc-lace : 640x800, 27 kHz, 57 Hz interlaced |
709 | - dblpal-ff : 640x512, 27 kHz, 47 Hz | 709 | - dblpal-ff : 640x512, 27 kHz, 47 Hz |
710 | - dblpal-lace : 640x1024, 27 kHz, 47 Hz interlaced | 710 | - dblpal-lace : 640x1024, 27 kHz, 47 Hz interlaced |
711 | - dblntsc : 640x200, 27 kHz, 57 Hz doublescan | 711 | - dblntsc : 640x200, 27 kHz, 57 Hz doublescan |
712 | - dblpal : 640x256, 27 kHz, 47 Hz doublescan | 712 | - dblpal : 640x256, 27 kHz, 47 Hz doublescan |
713 | VGA modes: | 713 | VGA modes: |
714 | - vga : 640x480, 31 kHz, 60 Hz | 714 | - vga : 640x480, 31 kHz, 60 Hz |
715 | - vga70 : 640x400, 31 kHz, 70 Hz | 715 | - vga70 : 640x400, 31 kHz, 70 Hz |
716 | 716 | ||
717 | Please notice that the ECS and VGA modes require either an ECS or AGA | 717 | Please notice that the ECS and VGA modes require either an ECS or AGA |
718 | chipset, and that these modes are limited to 2-bit color for the ECS | 718 | chipset, and that these modes are limited to 2-bit color for the ECS |
719 | chipset and 8-bit color for the AGA chipset. | 719 | chipset and 8-bit color for the AGA chipset. |
720 | 720 | ||
721 | 5.1.2) depth | 721 | 5.1.2) depth |
722 | ------------ | 722 | ------------ |
723 | 723 | ||
724 | Syntax: depth:<nr. of bit-planes> | 724 | Syntax: depth:<nr. of bit-planes> |
725 | 725 | ||
726 | Specify the number of bit-planes for the selected video-mode. | 726 | Specify the number of bit-planes for the selected video-mode. |
727 | 727 | ||
728 | 5.1.3) inverse | 728 | 5.1.3) inverse |
729 | -------------- | 729 | -------------- |
730 | 730 | ||
731 | Use inverted display (black on white). Functionally the same as the | 731 | Use inverted display (black on white). Functionally the same as the |
732 | "inverse" sub-option for the Atari. | 732 | "inverse" sub-option for the Atari. |
733 | 733 | ||
734 | 5.1.4) font | 734 | 5.1.4) font |
735 | ----------- | 735 | ----------- |
736 | 736 | ||
737 | Syntax: font:<fontname> | 737 | Syntax: font:<fontname> |
738 | 738 | ||
739 | Specify the font to use in text modes. Functionally the same as the | 739 | Specify the font to use in text modes. Functionally the same as the |
740 | "font" sub-option for the Atari, except that `PEARL8x8' is used instead | 740 | "font" sub-option for the Atari, except that `PEARL8x8' is used instead |
741 | of `VGA8x8' if the vertical size of the display is less than 400 pixel | 741 | of `VGA8x8' if the vertical size of the display is less than 400 pixel |
742 | rows. | 742 | rows. |
743 | 743 | ||
744 | 5.1.5) monitorcap: | 744 | 5.1.5) monitorcap: |
745 | ------------------- | 745 | ------------------- |
746 | 746 | ||
747 | Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax> | 747 | Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax> |
748 | 748 | ||
749 | This describes the capabilities of a multisync monitor. For now, only | 749 | This describes the capabilities of a multisync monitor. For now, only |
750 | the color frame buffer uses the settings of "monitorcap:". | 750 | the color frame buffer uses the settings of "monitorcap:". |
751 | 751 | ||
752 | <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies | 752 | <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies |
753 | your monitor can work with, in Hz. <hmin> and <hmax> are the same for | 753 | your monitor can work with, in Hz. <hmin> and <hmax> are the same for |
754 | the horizontal frequency, in kHz. | 754 | the horizontal frequency, in kHz. |
755 | 755 | ||
756 | The defaults are 50;90;15;38 (Generic Amiga multisync monitor). | 756 | The defaults are 50;90;15;38 (Generic Amiga multisync monitor). |
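
To make the "monitorcap:" values concrete, here is a small sketch that parses the four fields and checks a mode's frequencies against them. Both function names are invented for this example, and the inclusive range check is an assumption; the text does not say how the frame buffer driver compares the limits.

```python
def parse_monitorcap(arg):
    """Split '<vmin>;<vmax>;<hmin>;<hmax>' into four integers.

    vmin/vmax are in Hz, hmin/hmax in kHz, as documented above.
    """
    vmin, vmax, hmin, hmax = (int(field) for field in arg.split(";"))
    return vmin, vmax, hmin, hmax


def mode_fits(cap, hfreq_khz, vfreq_hz):
    """Return True if a mode lies within the monitor limits.

    Assumption: limits are treated as inclusive.
    """
    vmin, vmax, hmin, hmax = cap
    return vmin <= vfreq_hz <= vmax and hmin <= hfreq_khz <= hmax
```

Under these assumptions, the default 50;90;15;38 would accept the `vga' mode (31 kHz, 60 Hz) but not `dblpal' (27 kHz, 47 Hz), whose vertical frequency is below 50 Hz.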
757 | 757 | ||
758 | 758 | ||
759 | 5.2) fd_def_df0= | 759 | 5.2) fd_def_df0= |
760 | ---------------- | 760 | ---------------- |
761 | 761 | ||
762 | Syntax: fd_def_df0=<value> | 762 | Syntax: fd_def_df0=<value> |
763 | 763 | ||
764 | Sets the df0 value for "silent" floppy drives. The value should be in | 764 | Sets the df0 value for "silent" floppy drives. The value should be in |
765 | hexadecimal with "0x" prefix. | 765 | hexadecimal with "0x" prefix. |
766 | 766 | ||
767 | 767 | ||
768 | 5.3) wd33c93= | 768 | 5.3) wd33c93= |
769 | ------------- | 769 | ------------- |
770 | 770 | ||
771 | Syntax: wd33c93=<sub-options...> | 771 | Syntax: wd33c93=<sub-options...> |
772 | 772 | ||
773 | These options affect the A590/A2091, A3000 and GVP Series II SCSI | 773 | These options affect the A590/A2091, A3000 and GVP Series II SCSI |
774 | controllers. | 774 | controllers. |
775 | 775 | ||
776 | The <sub-options> is a comma-separated list of the sub-options listed | 776 | The <sub-options> is a comma-separated list of the sub-options listed |
777 | below. | 777 | below. |
778 | 778 | ||
779 | 5.3.1) nosync | 779 | 5.3.1) nosync |
780 | ------------- | 780 | ------------- |
781 | 781 | ||
782 | Syntax: nosync:bitmask | 782 | Syntax: nosync:bitmask |
783 | 783 | ||
784 | bitmask is a byte where the first 7 bits correspond with the 7 | 784 | bitmask is a byte where the first 7 bits correspond with the 7 |
785 | possible SCSI devices. Set a bit to prevent sync negotiation on that | 785 | possible SCSI devices. Set a bit to prevent sync negotiation on that |
786 | device. To maintain backwards compatibility, a command-line such as | 786 | device. To maintain backwards compatibility, a command-line such as |
787 | "wd33c93=255" will be automatically translated to | 787 | "wd33c93=255" will be automatically translated to |
788 | "wd33c93=nosync:0xff". The default is to disable sync negotiation for | 788 | "wd33c93=nosync:0xff". The default is to disable sync negotiation for |
789 | all devices, e.g. nosync:0xff. | 789 | all devices, e.g. nosync:0xff. |
790 | 790 | ||
791 | 5.3.2) period | 791 | 5.3.2) period |
792 | ------------- | 792 | ------------- |
793 | 793 | ||
794 | Syntax: period:ns | 794 | Syntax: period:ns |
795 | 795 | ||
796 | `ns' is the minimum # of nanoseconds in a SCSI data transfer | 796 | `ns' is the minimum # of nanoseconds in a SCSI data transfer |
797 | period. Default is 500; acceptable values are 250 - 1000. | 797 | period. Default is 500; acceptable values are 250 - 1000. |
798 | 798 | ||
799 | 5.3.3) disconnect | 799 | 5.3.3) disconnect |
800 | ----------------- | 800 | ----------------- |
801 | 801 | ||
802 | Syntax: disconnect:x | 802 | Syntax: disconnect:x |
803 | 803 | ||
804 | Specify x = 0 to never allow disconnects, 2 to always allow them. | 804 | Specify x = 0 to never allow disconnects, 2 to always allow them. |
805 | x = 1 does 'adaptive' disconnects, which is the default and generally | 805 | x = 1 does 'adaptive' disconnects, which is the default and generally |
806 | the best choice. | 806 | the best choice. |
807 | 807 | ||
808 | 5.3.4) debug | 808 | 5.3.4) debug |
809 | ------------ | 809 | ------------ |
810 | 810 | ||
811 | Syntax: debug:x | 811 | Syntax: debug:x |
812 | 812 | ||
813 | If `DEBUGGING_ON' is defined, x is a bit mask that causes various | 813 | If `DEBUGGING_ON' is defined, x is a bit mask that causes various |
814 | types of debug output to be printed - see the DB_xxx defines in | 814 | types of debug output to be printed - see the DB_xxx defines in |
815 | wd33c93.h. | 815 | wd33c93.h. |
816 | 816 | ||
817 | 5.3.5) clock | 817 | 5.3.5) clock |
818 | ------------ | 818 | ------------ |
819 | 819 | ||
820 | Syntax: clock:x | 820 | Syntax: clock:x |
821 | 821 | ||
822 | x = clock input in MHz for WD33c93 chip. Normal values would be from | 822 | x = clock input in MHz for WD33c93 chip. Normal values would be from |
823 | 8 through 20. The default value depends on your hostadapter(s), | 823 | 8 through 20. The default value depends on your hostadapter(s), |
824 | default for the A3000 internal controller is 14, for the A2091 it's 8 | 824 | default for the A3000 internal controller is 14, for the A2091 it's 8 |
825 | and for the GVP hostadapters it's either 8 or 14, depending on the | 825 | and for the GVP hostadapters it's either 8 or 14, depending on the |
826 | hostadapter and the SCSI-clock jumper present on some GVP | 826 | hostadapter and the SCSI-clock jumper present on some GVP |
827 | hostadapters. | 827 | hostadapters. |
828 | 828 | ||
829 | 5.3.6) next | 829 | 5.3.6) next |
830 | ----------- | 830 | ----------- |
831 | 831 | ||
832 | No argument. Used to separate blocks of keywords when there's more | 832 | No argument. Used to separate blocks of keywords when there's more |
833 | than one wd33c93-based host adapter in the system. | 833 | than one wd33c93-based host adapter in the system. |
834 | 834 | ||
835 | 5.3.7) nodma | 835 | 5.3.7) nodma |
836 | ------------ | 836 | ------------ |
837 | 837 | ||
838 | Syntax: nodma:x | 838 | Syntax: nodma:x |
839 | 839 | ||
840 | If x is 1 (or if the option is just written as "nodma"), the WD33c93 | 840 | If x is 1 (or if the option is just written as "nodma"), the WD33c93 |
841 | controller will not use DMA (= direct memory access) to access the | 841 | controller will not use DMA (= direct memory access) to access the |
842 | Amiga's memory. This is useful for some systems (like A3000's and | 842 | Amiga's memory. This is useful for some systems (like A3000's and |
843 | A4000's with the A3640 accelerator, revision 3.0) that have problems | 843 | A4000's with the A3640 accelerator, revision 3.0) that have problems |
844 | using DMA to chip memory. The default is 0, i.e. to use DMA if | 844 | using DMA to chip memory. The default is 0, i.e. to use DMA if |
845 | possible. | 845 | possible. |
846 | 846 | ||
847 | 847 | ||
848 | 5.4) gvp11= | 848 | 5.4) gvp11= |
849 | ----------- | 849 | ----------- |
850 | 850 | ||
851 | Syntax: gvp11=<addr-mask> | 851 | Syntax: gvp11=<addr-mask> |
852 | 852 | ||
853 | The earlier versions of the GVP driver did not handle DMA | 853 | The earlier versions of the GVP driver did not handle DMA |
854 | address-mask settings correctly which made it necessary for some | 854 | address-mask settings correctly which made it necessary for some |
855 | people to use this option, in order to get their GVP controller | 855 | people to use this option, in order to get their GVP controller |
856 | running under Linux. These problems have hopefully been solved and the | 856 | running under Linux. These problems have hopefully been solved and the |
857 | use of this option is now strongly discouraged! | 857 | use of this option is now strongly discouraged! |
858 | 858 | ||
859 | Incorrect use can lead to unpredictable behavior, so please only use | 859 | Incorrect use can lead to unpredictable behavior, so please only use |
860 | this option if you *know* what you are doing and have a reason to do | 860 | this option if you *know* what you are doing and have a reason to do |
861 | so. In any case if you experience problems and need to use this | 861 | so. In any case if you experience problems and need to use this |
862 | option, please inform us about it by mailing to the Linux/68k kernel | 862 | option, please inform us about it by mailing to the Linux/68k kernel |
863 | mailing list. | 863 | mailing list. |
864 | 864 | ||
865 | The address mask set by this option specifies which addresses are | 865 | The address mask set by this option specifies which addresses are |
866 | valid for DMA with the GVP Series II SCSI controller. An address is | 866 | valid for DMA with the GVP Series II SCSI controller. An address is |
867 | valid if it has no bits set other than those that are also set in | 867 | valid if it has no bits set other than those that are also set in |
868 | the mask. | 868 | the mask. |
869 | 869 | ||
870 | Some versions of the GVP can only DMA into a 24 bit address range, | 870 | Some versions of the GVP can only DMA into a 24 bit address range, |
871 | some can address a 25 bit address range while others can use the whole | 871 | some can address a 25 bit address range while others can use the whole |
872 | 32 bit address range for DMA. The correct setting depends on your | 872 | 32 bit address range for DMA. The correct setting depends on your |
873 | controller and should be autodetected by the driver. An example is the | 873 | controller and should be autodetected by the driver. An example is the |
874 | 24 bit region which is specified by a mask of 0x00fffffe. | 874 | 24 bit region which is specified by a mask of 0x00fffffe. |
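
The validity rule for the address mask can be written as a one-line bit test. The sketch below is an illustration only, with a made-up function name; it assumes 32-bit addresses and is not taken from the GVP driver.

```python
def dma_address_valid(addr, mask):
    """Return True if addr has no bits set outside of mask.

    Assumption: addresses and masks are 32 bits wide.
    """
    return (addr & ~mask & 0xFFFFFFFF) == 0
```

With the 24 bit mask 0x00fffffe from the text, 0x00123456 would be usable for DMA, whereas 0x01000000 (above the 24 bit range) and 0x00000001 (bit 0 is not in the mask) would not.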
875 | 875 | ||
876 | 876 | ||
877 | 5.5) 53c7xx= | 877 | 5.5) 53c7xx= |
878 | ------------ | 878 | ------------ |
879 | 879 | ||
880 | Syntax: 53c7xx=<sub-options...> | 880 | Syntax: 53c7xx=<sub-options...> |
881 | 881 | ||
882 | These options affect the A4000T, A4091, WarpEngine, Blizzard 603e+, | 882 | These options affect the A4000T, A4091, WarpEngine, Blizzard 603e+, |
883 | and GForce 040/060 SCSI controllers on the Amiga, as well as the | 883 | and GForce 040/060 SCSI controllers on the Amiga, as well as the |
884 | builtin MVME 16x SCSI controller. | 884 | builtin MVME 16x SCSI controller. |
885 | 885 | ||
886 | The <sub-options> is a comma-separated list of the sub-options listed | 886 | The <sub-options> is a comma-separated list of the sub-options listed |
887 | below. | 887 | below. |
888 | 888 | ||
889 | 5.5.1) nosync | 889 | 5.5.1) nosync |
890 | ------------- | 890 | ------------- |
891 | 891 | ||
892 | Syntax: nosync:0 | 892 | Syntax: nosync:0 |
893 | 893 | ||
894 | Disables sync negotiation for all devices. Any value after the | 894 | Disables sync negotiation for all devices. Any value after the |
895 | colon is acceptable (and has the same effect). | 895 | colon is acceptable (and has the same effect). |
896 | 896 | ||
897 | 5.5.2) noasync | 897 | 5.5.2) noasync |
898 | -------------- | 898 | -------------- |
899 | 899 | ||
900 | Syntax: noasync:0 | 900 | Syntax: noasync:0 |
901 | 901 | ||
902 | Disables async and sync negotiation for all devices. Any value | 902 | Disables async and sync negotiation for all devices. Any value |
903 | after the colon is acceptable (and has the same effect). | 903 | after the colon is acceptable (and has the same effect). |
904 | 904 | ||
905 | 5.5.3) nodisconnect | 905 | 5.5.3) nodisconnect |
906 | ------------------- | 906 | ------------------- |
907 | 907 | ||
908 | Syntax: nodisconnect:0 | 908 | Syntax: nodisconnect:0 |
909 | 909 | ||
910 | Disables SCSI disconnects. Any value after the colon is acceptable | 910 | Disables SCSI disconnects. Any value after the colon is acceptable |
911 | (and has the same effect). | 911 | (and has the same effect). |
912 | 912 | ||
913 | 5.5.4) validids | 913 | 5.5.4) validids |
914 | --------------- | 914 | --------------- |
915 | 915 | ||
916 | Syntax: validids:0xNN | 916 | Syntax: validids:0xNN |
917 | 917 | ||
918 | Specify which SCSI ids the driver should pay attention to. This is | 918 | Specify which SCSI ids the driver should pay attention to. This is |
919 | a bitmask (e.g. to only pay attention to ID#4, you'd use 0x10). | 919 | a bitmask (e.g. to only pay attention to ID#4, you'd use 0x10). |
920 | Default is 0x7f (devices 0-6). | 920 | Default is 0x7f (devices 0-6). |
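
The bitmask works by setting bit n for SCSI ID n. A minimal sketch for building such a value follows; the helper name is hypothetical and only the bit arithmetic reflects the text.

```python
def validids_mask(ids):
    """Build a validids bitmask with bit n set for each SCSI ID n.

    Hypothetical helper; IDs outside 0..7 are rejected.
    """
    mask = 0
    for scsi_id in ids:
        if not 0 <= scsi_id <= 7:
            raise ValueError("SCSI ID out of range: %d" % scsi_id)
        mask |= 1 << scsi_id
    return mask
```

Here `validids_mask([4])` yields 0x10, matching the example above, and IDs 0 through 6 together yield the default of 0x7f.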
921 | 921 | ||
922 | 5.5.5) opthi | 922 | 5.5.5) opthi |
923 | 5.5.6) optlo | 923 | 5.5.6) optlo |
924 | ------------ | 924 | ------------ |
925 | 925 | ||
926 | Syntax: opthi:M,optlo:N | 926 | Syntax: opthi:M,optlo:N |
927 | 927 | ||
928 | Specify options for "hostdata->options". The acceptable definitions | 928 | Specify options for "hostdata->options". The acceptable definitions |
929 | are listed in drivers/scsi/53c7xx.h; the 32 high bits should be in | 929 | are listed in drivers/scsi/53c7xx.h; the 32 high bits should be in |
930 | opthi and the 32 low bits in optlo. They must be specified in the | 930 | opthi and the 32 low bits in optlo. They must be specified in the |
931 | order opthi:M,optlo:N. | 931 | order opthi:M,optlo:N. |
932 | 932 | ||
933 | 5.5.7) next | 933 | 5.5.7) next |
934 | ----------- | 934 | ----------- |
935 | 935 | ||
936 | No argument. Used to separate blocks of keywords when there's more | 936 | No argument. Used to separate blocks of keywords when there's more |
937 | than one 53c7xx host adapter in the system. | 937 | than one 53c7xx host adapter in the system. |
938 | 938 | ||
939 | 939 | ||
940 | /* Local Variables: */ | 940 | /* Local Variables: */ |
941 | /* mode: text */ | 941 | /* mode: text */ |
942 | /* End: */ | 942 | /* End: */ |
943 | 943 |
Documentation/memory-barriers.txt
1 | ============================ | 1 | ============================ |
2 | LINUX KERNEL MEMORY BARRIERS | 2 | LINUX KERNEL MEMORY BARRIERS |
3 | ============================ | 3 | ============================ |
4 | 4 | ||
5 | By: David Howells <dhowells@redhat.com> | 5 | By: David Howells <dhowells@redhat.com> |
6 | 6 | ||
7 | Contents: | 7 | Contents: |
8 | 8 | ||
9 | (*) Abstract memory access model. | 9 | (*) Abstract memory access model. |
10 | 10 | ||
11 | - Device operations. | 11 | - Device operations. |
12 | - Guarantees. | 12 | - Guarantees. |
13 | 13 | ||
14 | (*) What are memory barriers? | 14 | (*) What are memory barriers? |
15 | 15 | ||
16 | - Varieties of memory barrier. | 16 | - Varieties of memory barrier. |
17 | - What may not be assumed about memory barriers? | 17 | - What may not be assumed about memory barriers? |
18 | - Data dependency barriers. | 18 | - Data dependency barriers. |
19 | - Control dependencies. | 19 | - Control dependencies. |
20 | - SMP barrier pairing. | 20 | - SMP barrier pairing. |
21 | - Examples of memory barrier sequences. | 21 | - Examples of memory barrier sequences. |
22 | - Read memory barriers vs load speculation. | 22 | - Read memory barriers vs load speculation. |
23 | 23 | ||
24 | (*) Explicit kernel barriers. | 24 | (*) Explicit kernel barriers. |
25 | 25 | ||
26 | - Compiler barrier. | 26 | - Compiler barrier. |
27 | - The CPU memory barriers. | 27 | - The CPU memory barriers. |
28 | - MMIO write barrier. | 28 | - MMIO write barrier. |
29 | 29 | ||
30 | (*) Implicit kernel memory barriers. | 30 | (*) Implicit kernel memory barriers. |
31 | 31 | ||
32 | - Locking functions. | 32 | - Locking functions. |
33 | - Interrupt disabling functions. | 33 | - Interrupt disabling functions. |
34 | - Miscellaneous functions. | 34 | - Miscellaneous functions. |
35 | 35 | ||
36 | (*) Inter-CPU locking barrier effects. | 36 | (*) Inter-CPU locking barrier effects. |
37 | 37 | ||
38 | - Locks vs memory accesses. | 38 | - Locks vs memory accesses. |
39 | - Locks vs I/O accesses. | 39 | - Locks vs I/O accesses. |
40 | 40 | ||
41 | (*) Where are memory barriers needed? | 41 | (*) Where are memory barriers needed? |
42 | 42 | ||
43 | - Interprocessor interaction. | 43 | - Interprocessor interaction. |
44 | - Atomic operations. | 44 | - Atomic operations. |
45 | - Accessing devices. | 45 | - Accessing devices. |
46 | - Interrupts. | 46 | - Interrupts. |
47 | 47 | ||
48 | (*) Kernel I/O barrier effects. | 48 | (*) Kernel I/O barrier effects. |
49 | 49 | ||
50 | (*) Assumed minimum execution ordering model. | 50 | (*) Assumed minimum execution ordering model. |
51 | 51 | ||
52 | (*) The effects of the cpu cache. | 52 | (*) The effects of the cpu cache. |
53 | 53 | ||
54 | - Cache coherency. | 54 | - Cache coherency. |
55 | - Cache coherency vs DMA. | 55 | - Cache coherency vs DMA. |
56 | - Cache coherency vs MMIO. | 56 | - Cache coherency vs MMIO. |
57 | 57 | ||
58 | (*) The things CPUs get up to. | 58 | (*) The things CPUs get up to. |
59 | 59 | ||
60 | - And then there's the Alpha. | 60 | - And then there's the Alpha. |
61 | 61 | ||
62 | (*) References. | 62 | (*) References. |
63 | 63 | ||
64 | 64 | ||
65 | ============================ | 65 | ============================ |
66 | ABSTRACT MEMORY ACCESS MODEL | 66 | ABSTRACT MEMORY ACCESS MODEL |
67 | ============================ | 67 | ============================ |
68 | 68 | ||
69 | Consider the following abstract model of the system: | 69 | Consider the following abstract model of the system: |
70 | 70 | ||
71 |             :                : | 71 |             :                : |
72 |             :                : | 72 |             :                : |
73 |             :                : | 73 |             :                : |
74 | +-------+   :   +--------+   :   +-------+ | 74 | +-------+   :   +--------+   :   +-------+ |
75 | |       |   :   |        |   :   |       | | 75 | |       |   :   |        |   :   |       | |
76 | |       |   :   |        |   :   |       | | 76 | |       |   :   |        |   :   |       | |
77 | | CPU 1 |<----->| Memory |<----->| CPU 2 | | 77 | | CPU 1 |<----->| Memory |<----->| CPU 2 | |
78 | |       |   :   |        |   :   |       | | 78 | |       |   :   |        |   :   |       | |
79 | |       |   :   |        |   :   |       | | 79 | |       |   :   |        |   :   |       | |
80 | +-------+   :   +--------+   :   +-------+ | 80 | +-------+   :   +--------+   :   +-------+ |
81 |     ^       :       ^        :       ^ | 81 |     ^       :       ^        :       ^ |
82 |     |       :       |        :       | | 82 |     |       :       |        :       | |
83 |     |       :       |        :       | | 83 |     |       :       |        :       | |
84 |     |       :       v        :       | | 84 |     |       :       v        :       | |
85 |     |       :   +--------+   :       | | 85 |     |       :   +--------+   :       | |
86 |     |       :   |        |   :       | | 86 |     |       :   |        |   :       | |
87 |     |       :   |        |   :       | | 87 |     |       :   |        |   :       | |
88 |     +---------->| Device |<----------+ | 88 |     +---------->| Device |<----------+ |
89 |             :   |        |   : | 89 |             :   |        |   : |
90 |             :   |        |   : | 90 |             :   |        |   : |
91 |             :   +--------+   : | 91 |             :   +--------+   : |
92 |             :                : | 92 |             :                : |
93 | 93 | ||
94 | Each CPU executes a program that generates memory access operations. In the | 94 | Each CPU executes a program that generates memory access operations. In the |
95 | abstract CPU, memory operation ordering is very relaxed, and a CPU may actually | 95 | abstract CPU, memory operation ordering is very relaxed, and a CPU may actually |
96 | perform the memory operations in any order it likes, provided program causality | 96 | perform the memory operations in any order it likes, provided program causality |
97 | appears to be maintained. Similarly, the compiler may also arrange the | 97 | appears to be maintained. Similarly, the compiler may also arrange the |
98 | instructions it emits in any order it likes, provided it doesn't affect the | 98 | instructions it emits in any order it likes, provided it doesn't affect the |
99 | apparent operation of the program. | 99 | apparent operation of the program. |
100 | 100 | ||
101 | So in the above diagram, the effects of the memory operations performed by a | 101 | So in the above diagram, the effects of the memory operations performed by a |
102 | CPU are perceived by the rest of the system as the operations cross the | 102 | CPU are perceived by the rest of the system as the operations cross the |
103 | interface between the CPU and rest of the system (the dotted lines). | 103 | interface between the CPU and rest of the system (the dotted lines). |
104 | 104 | ||
105 | 105 | ||
106 | For example, consider the following sequence of events: | 106 | For example, consider the following sequence of events: |
107 | 107 | ||
108 | CPU 1           CPU 2 | 108 | CPU 1           CPU 2 |
109 | =============== =============== | 109 | =============== =============== |
110 | { A == 1; B == 2 } | 110 | { A == 1; B == 2 } |
111 | A = 3;          x = A; | 111 | A = 3;          x = A; |
112 | B = 4;          y = B; | 112 | B = 4;          y = B; |
113 | 113 | ||
114 | The set of accesses as seen by the memory system in the middle can be arranged | 114 | The set of accesses as seen by the memory system in the middle can be arranged |
115 | in 24 different combinations: | 115 | in 24 different combinations: |
116 | 116 | ||
117 | STORE A=3, STORE B=4, x=LOAD A->3, y=LOAD B->4 | 117 | STORE A=3, STORE B=4, x=LOAD A->3, y=LOAD B->4 |
118 | STORE A=3, STORE B=4, y=LOAD B->4, x=LOAD A->3 | 118 | STORE A=3, STORE B=4, y=LOAD B->4, x=LOAD A->3 |
119 | STORE A=3, x=LOAD A->3, STORE B=4, y=LOAD B->4 | 119 | STORE A=3, x=LOAD A->3, STORE B=4, y=LOAD B->4 |
120 | STORE A=3, x=LOAD A->3, y=LOAD B->2, STORE B=4 | 120 | STORE A=3, x=LOAD A->3, y=LOAD B->2, STORE B=4 |
121 | STORE A=3, y=LOAD B->2, STORE B=4, x=LOAD A->3 | 121 | STORE A=3, y=LOAD B->2, STORE B=4, x=LOAD A->3 |
122 | STORE A=3, y=LOAD B->2, x=LOAD A->3, STORE B=4 | 122 | STORE A=3, y=LOAD B->2, x=LOAD A->3, STORE B=4 |
123 | STORE B=4, STORE A=3, x=LOAD A->3, y=LOAD B->4 | 123 | STORE B=4, STORE A=3, x=LOAD A->3, y=LOAD B->4 |
124 | STORE B=4, ... | 124 | STORE B=4, ... |
125 | ... | 125 | ... |
126 | 126 | ||
127 | and can thus result in four different combinations of values: | 127 | and can thus result in four different combinations of values: |
128 | 128 | ||
129 | x == 1, y == 2 | 129 | x == 1, y == 2 |
130 | x == 1, y == 4 | 130 | x == 1, y == 4 |
131 | x == 3, y == 2 | 131 | x == 3, y == 2 |
132 | x == 3, y == 4 | 132 | x == 3, y == 4 |
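The counts above can be checked mechanically. The following is a minimal sketch, not part of the original document, that replays every ordering of the four accesses against a fresh copy of A and B and records which (x, y) pairs result. It treats all 24 permutations as possible, as the text does; the type and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { STORE_A, STORE_B, LOAD_A, LOAD_B };

struct tally {
	int orderings;		/* permutations visited */
	bool seen[2][2];	/* indexed by [x == 3][y == 4] */
};

/* Replay one ordering against a fresh memory image and record (x, y). */
static void replay(const enum op *seq, struct tally *t)
{
	int A = 1, B = 2, x = 0, y = 0;

	for (size_t i = 0; i < 4; i++) {
		switch (seq[i]) {
		case STORE_A: A = 3; break;
		case STORE_B: B = 4; break;
		case LOAD_A:  x = A; break;
		case LOAD_B:  y = B; break;
		}
	}
	t->orderings++;
	t->seen[x == 3][y == 4] = true;
}

/* Visit every permutation of seq[k..3] by swapping elements in place. */
static void permute(enum op *seq, size_t k, struct tally *t)
{
	if (k == 4) {
		replay(seq, t);
		return;
	}
	for (size_t i = k; i < 4; i++) {
		enum op tmp = seq[k]; seq[k] = seq[i]; seq[i] = tmp;
		permute(seq, k + 1, t);
		tmp = seq[k]; seq[k] = seq[i]; seq[i] = tmp;
	}
}

static struct tally enumerate_orderings(void)
{
	enum op seq[4] = { STORE_A, STORE_B, LOAD_A, LOAD_B };
	struct tally t = { 0 };

	permute(seq, 0, &t);
	assert(t.orderings == 24);	/* 4! orderings of four accesses */
	return t;
}

/* Number of distinct (x, y) results observed across all orderings. */
static int count_outcomes(const struct tally *t)
{
	return t->seen[0][0] + t->seen[0][1] + t->seen[1][0] + t->seen[1][1];
}
```

Each load either precedes or follows the store to the same variable, so x is 1 or 3 and y is 2 or 4 independently, which gives the four combinations listed.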
133 | 133 | ||
134 | 134 | ||
135 | Furthermore, the stores committed by a CPU to the memory system may not be | 135 | Furthermore, the stores committed by a CPU to the memory system may not be |
136 | perceived by the loads made by another CPU in the same order as the stores were | 136 | perceived by the loads made by another CPU in the same order as the stores were |
137 | committed. | 137 | committed. |
138 | 138 | ||
139 | 139 | ||
140 | As a further example, consider this sequence of events: | 140 | As a further example, consider this sequence of events: |
141 | 141 | ||
142 | CPU 1           CPU 2 | 142 | CPU 1           CPU 2 |
143 | =============== =============== | 143 | =============== =============== |
144 | { A == 1, B == 2, C == 3, P == &A, Q == &C } | 144 | { A == 1, B == 2, C == 3, P == &A, Q == &C } |
145 | B = 4;          Q = P; | 145 | B = 4;          Q = P; |
146 | P = &B;         D = *Q; | 146 | P = &B;         D = *Q; |
147 | 147 | ||
148 | There is an obvious data dependency here, as the value loaded into D depends on | 148 | There is an obvious data dependency here, as the value loaded into D depends on |
149 | the address retrieved from P by CPU 2. At the end of the sequence, any of the | 149 | the address retrieved from P by CPU 2. At the end of the sequence, any of the |
150 | following results are possible: | 150 | following results are possible: |
151 | 151 | ||
152 | (Q == &A) and (D == 1) | 152 | (Q == &A) and (D == 1) |
153 | (Q == &B) and (D == 2) | 153 | (Q == &B) and (D == 2) |
154 | (Q == &B) and (D == 4) | 154 | (Q == &B) and (D == 4) |
155 | 155 | ||
156 | Note that CPU 2 will never try to load C into D because the CPU will load P | 156 | Note that CPU 2 will never try to load C into D because the CPU will load P |
157 | into Q before issuing the load of *Q. | 157 | into Q before issuing the load of *Q. |
158 | 158 | ||
159 | 159 | ||
160 | DEVICE OPERATIONS | 160 | DEVICE OPERATIONS |
161 | ----------------- | 161 | ----------------- |
162 | 162 | ||
163 | Some devices present their control interfaces as collections of memory | 163 | Some devices present their control interfaces as collections of memory |
164 | locations, but the order in which the control registers are accessed is very | 164 | locations, but the order in which the control registers are accessed is very |
165 | important. For instance, imagine an ethernet card with a set of internal | 165 | important. For instance, imagine an ethernet card with a set of internal |
166 | registers that are accessed through an address port register (A) and a data | 166 | registers that are accessed through an address port register (A) and a data |
167 | port register (D). To read internal register 5, the following code might then | 167 | port register (D). To read internal register 5, the following code might then |
168 | be used: | 168 | be used: |
169 | 169 | ||
170 | *A = 5; | 170 | *A = 5; |
171 | x = *D; | 171 | x = *D; |
172 | 172 | ||
173 | but this might show up as either of the following two sequences: | 173 | but this might show up as either of the following two sequences: |
174 | 174 | ||
175 | STORE *A = 5, x = LOAD *D | 175 | STORE *A = 5, x = LOAD *D |
176 | x = LOAD *D, STORE *A = 5 | 176 | x = LOAD *D, STORE *A = 5 |
177 | 177 | ||
178 | the second of which will almost certainly result in a malfunction, since it sets | 178 | the second of which will almost certainly result in a malfunction, since it sets |
179 | the address _after_ attempting to read the register. | 179 | the address _after_ attempting to read the register. |
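To make the failure concrete, here is a small userspace simulation of such a card. It is only a sketch: struct fake_nic, its field names and the two read helpers are hypothetical stand-ins for the address port (A) and data port (D) described above, and no real hardware or kernel API is involved.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8	/* assumption: the simulated card has 8 internal registers */

/* Hypothetical device model: the data port returns internal[addr_port]. */
struct fake_nic {
	uint8_t addr_port;		/* register (A) in the text */
	uint32_t internal[NREGS];	/* registers reached through (D) */
};

static void write_addr_port(struct fake_nic *nic, uint8_t index)
{
	assert(index < NREGS);
	nic->addr_port = index;
}

static uint32_t read_data_port(const struct fake_nic *nic)
{
	return nic->internal[nic->addr_port];
}

/* Intended order: STORE *A = index, x = LOAD *D */
static uint32_t read_reg_in_order(struct fake_nic *nic, uint8_t index)
{
	write_addr_port(nic, index);
	return read_data_port(nic);
}

/* Reordered: x = LOAD *D, STORE *A = index */
static uint32_t read_reg_reordered(struct fake_nic *nic, uint8_t index)
{
	uint32_t x = read_data_port(nic);

	write_addr_port(nic, index);
	return x;
}
```

With the address port initially selecting register 0, the in-order variant returns the contents of register 5, while the reordered variant returns whatever register 0 held and only then updates the address port.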
180 | 180 | ||
181 | 181 | ||
182 | GUARANTEES | 182 | GUARANTEES |
183 | ---------- | 183 | ---------- |
184 | 184 | ||
185 | There are some minimal guarantees that may be expected of a CPU: | 185 | There are some minimal guarantees that may be expected of a CPU: |
186 | 186 | ||
187 | (*) On any given CPU, dependent memory accesses will be issued in order, with | 187 | (*) On any given CPU, dependent memory accesses will be issued in order, with |
188 | respect to itself. This means that for: | 188 | respect to itself. This means that for: |
189 | 189 | ||
190 | Q = P; D = *Q; | 190 | Q = P; D = *Q; |
191 | 191 | ||
192 | the CPU will issue the following memory operations: | 192 | the CPU will issue the following memory operations: |
193 | 193 | ||
194 | Q = LOAD P, D = LOAD *Q | 194 | Q = LOAD P, D = LOAD *Q |
195 | 195 | ||
196 | and always in that order. | 196 | and always in that order. |
197 | 197 | ||
198 | (*) Overlapping loads and stores within a particular CPU will appear to be | 198 | (*) Overlapping loads and stores within a particular CPU will appear to be |
199 | ordered within that CPU. This means that for: | 199 | ordered within that CPU. This means that for: |
200 | 200 | ||
201 | a = *X; *X = b; | 201 | a = *X; *X = b; |
202 | 202 | ||
203 | the CPU will only issue the following sequence of memory operations: | 203 | the CPU will only issue the following sequence of memory operations: |
204 | 204 | ||
205 | a = LOAD *X, STORE *X = b | 205 | a = LOAD *X, STORE *X = b |
206 | 206 | ||
207 | And for: | 207 | And for: |
208 | 208 | ||
209 | *X = c; d = *X; | 209 | *X = c; d = *X; |
210 | 210 | ||
211 | the CPU will only issue: | 211 | the CPU will only issue: |
212 | 212 | ||
213 | STORE *X = c, d = LOAD *X | 213 | STORE *X = c, d = LOAD *X |
214 | 214 | ||
215 | (Loads and stores overlap if they are targeted at overlapping pieces of | 215 | (Loads and stores overlap if they are targeted at overlapping pieces of |
216 | memory). | 216 | memory). |
217 | 217 | ||
218 | And there are a number of things that _must_ or _must_not_ be assumed: | 218 | And there are a number of things that _must_ or _must_not_ be assumed: |
219 | 219 | ||
220 | (*) It _must_not_ be assumed that independent loads and stores will be issued | 220 | (*) It _must_not_ be assumed that independent loads and stores will be issued |
221 | in the order given. This means that for: | 221 | in the order given. This means that for: |
222 | 222 | ||
223 | X = *A; Y = *B; *D = Z; | 223 | X = *A; Y = *B; *D = Z; |
224 | 224 | ||
225 | we may get any of the following sequences: | 225 | we may get any of the following sequences: |
226 | 226 | ||
227 | X = LOAD *A, Y = LOAD *B, STORE *D = Z | 227 | X = LOAD *A, Y = LOAD *B, STORE *D = Z |
228 | X = LOAD *A, STORE *D = Z, Y = LOAD *B | 228 | X = LOAD *A, STORE *D = Z, Y = LOAD *B |
229 | Y = LOAD *B, X = LOAD *A, STORE *D = Z | 229 | Y = LOAD *B, X = LOAD *A, STORE *D = Z |
230 | Y = LOAD *B, STORE *D = Z, X = LOAD *A | 230 | Y = LOAD *B, STORE *D = Z, X = LOAD *A |
231 | STORE *D = Z, X = LOAD *A, Y = LOAD *B | 231 | STORE *D = Z, X = LOAD *A, Y = LOAD *B |
232 | STORE *D = Z, Y = LOAD *B, X = LOAD *A | 232 | STORE *D = Z, Y = LOAD *B, X = LOAD *A |
233 | 233 | ||
234 | (*) It _must_ be assumed that overlapping memory accesses may be merged or | 234 | (*) It _must_ be assumed that overlapping memory accesses may be merged or |
235 | discarded. This means that for: | 235 | discarded. This means that for: |
236 | 236 | ||
237 | X = *A; Y = *(A + 4); | 237 | X = *A; Y = *(A + 4); |
238 | 238 | ||
239 | we may get any one of the following sequences: | 239 | we may get any one of the following sequences: |
240 | 240 | ||
241 | X = LOAD *A; Y = LOAD *(A + 4); | 241 | X = LOAD *A; Y = LOAD *(A + 4); |
242 | Y = LOAD *(A + 4); X = LOAD *A; | 242 | Y = LOAD *(A + 4); X = LOAD *A; |
243 | {X, Y} = LOAD {*A, *(A + 4) }; | 243 | {X, Y} = LOAD {*A, *(A + 4) }; |
244 | 244 | ||
245 | And for: | 245 | And for: |
246 | 246 | ||
247 | *A = X; Y = *A; | 247 | *A = X; Y = *A; |
248 | 248 | ||
249 | we may get either of: | 249 | we may get either of: |
250 | 250 | ||
251 | STORE *A = X; Y = LOAD *A; | 251 | STORE *A = X; Y = LOAD *A; |
252 | STORE *A = Y = X; | 252 | STORE *A = Y = X; |
253 | 253 | ||
254 | 254 | ||
255 | ========================= | 255 | ========================= |
256 | WHAT ARE MEMORY BARRIERS? | 256 | WHAT ARE MEMORY BARRIERS? |
257 | ========================= | 257 | ========================= |
258 | 258 | ||
259 | As can be seen above, independent memory operations are effectively performed | 259 | As can be seen above, independent memory operations are effectively performed |
260 | in random order, but this can be a problem for CPU-CPU interaction and for I/O. | 260 | in random order, but this can be a problem for CPU-CPU interaction and for I/O. |
261 | What is required is some way of intervening to instruct the compiler and the | 261 | What is required is some way of intervening to instruct the compiler and the |
262 | CPU to restrict the order. | 262 | CPU to restrict the order. |
263 | 263 | ||
264 | Memory barriers are such interventions. They impose a perceived partial | 264 | Memory barriers are such interventions. They impose a perceived partial |
265 | ordering over the memory operations on either side of the barrier. | 265 | ordering over the memory operations on either side of the barrier. |
266 | 266 | ||
267 | Such enforcement is important because the CPUs and other devices in a system | 267 | Such enforcement is important because the CPUs and other devices in a system |
268 | can use a variety of tricks to improve performance - including reordering, | 268 | can use a variety of tricks to improve performance - including reordering, |
269 | deferral and combination of memory operations; speculative loads; speculative | 269 | deferral and combination of memory operations; speculative loads; speculative |
270 | branch prediction and various types of caching. Memory barriers are used to | 270 | branch prediction and various types of caching. Memory barriers are used to |
271 | override or suppress these tricks, allowing the code to sanely control the | 271 | override or suppress these tricks, allowing the code to sanely control the |
272 | interaction of multiple CPUs and/or devices. | 272 | interaction of multiple CPUs and/or devices. |
273 | 273 | ||
274 | 274 | ||
275 | VARIETIES OF MEMORY BARRIER | 275 | VARIETIES OF MEMORY BARRIER |
276 | --------------------------- | 276 | --------------------------- |
277 | 277 | ||
278 | Memory barriers come in four basic varieties: | 278 | Memory barriers come in four basic varieties: |
279 | 279 | ||
280 | (1) Write (or store) memory barriers. | 280 | (1) Write (or store) memory barriers. |
281 | 281 | ||
282 | A write memory barrier gives a guarantee that all the STORE operations | 282 | A write memory barrier gives a guarantee that all the STORE operations |
283 | specified before the barrier will appear to happen before all the STORE | 283 | specified before the barrier will appear to happen before all the STORE |
284 | operations specified after the barrier with respect to the other | 284 | operations specified after the barrier with respect to the other |
285 | components of the system. | 285 | components of the system. |
286 | 286 | ||
287 | A write barrier is a partial ordering on stores only; it is not required | 287 | A write barrier is a partial ordering on stores only; it is not required |
288 | to have any effect on loads. | 288 | to have any effect on loads. |
289 | 289 | ||
290 | A CPU can be viewed as committing a sequence of store operations to the | 290 | A CPU can be viewed as committing a sequence of store operations to the |
291 | memory system as time progresses. All stores before a write barrier will | 291 | memory system as time progresses. All stores before a write barrier will |
292 | occur in the sequence _before_ all the stores after the write barrier. | 292 | occur in the sequence _before_ all the stores after the write barrier. |
293 | 293 | ||
294 | [!] Note that write barriers should normally be paired with read or data | 294 | [!] Note that write barriers should normally be paired with read or data |
295 | dependency barriers; see the "SMP barrier pairing" subsection. | 295 | dependency barriers; see the "SMP barrier pairing" subsection. |
296 | 296 | ||
297 | 297 | ||
298 | (2) Data dependency barriers. | 298 | (2) Data dependency barriers. |
299 | 299 | ||
300 | A data dependency barrier is a weaker form of read barrier. In the case | 300 | A data dependency barrier is a weaker form of read barrier. In the case |
301 | where two loads are performed such that the second depends on the result | 301 | where two loads are performed such that the second depends on the result |
302 | of the first (eg: the first load retrieves the address to which the second | 302 | of the first (eg: the first load retrieves the address to which the second |
303 | load will be directed), a data dependency barrier would be required to | 303 | load will be directed), a data dependency barrier would be required to |
304 | make sure that the target of the second load is updated before the address | 304 | make sure that the target of the second load is updated before the address |
305 | obtained by the first load is accessed. | 305 | obtained by the first load is accessed. |
306 | 306 | ||
307 | A data dependency barrier is a partial ordering on interdependent loads | 307 | A data dependency barrier is a partial ordering on interdependent loads |
308 | only; it is not required to have any effect on stores, independent loads | 308 | only; it is not required to have any effect on stores, independent loads |
309 | or overlapping loads. | 309 | or overlapping loads. |
310 | 310 | ||
311 | As mentioned in (1), the other CPUs in the system can be viewed as | 311 | As mentioned in (1), the other CPUs in the system can be viewed as |
312 | committing sequences of stores to the memory system that the CPU being | 312 | committing sequences of stores to the memory system that the CPU being |
313 | considered can then perceive. A data dependency barrier issued by the CPU | 313 | considered can then perceive. A data dependency barrier issued by the CPU |
314 | under consideration guarantees that for any load preceding it, if that | 314 | under consideration guarantees that for any load preceding it, if that |
315 | load touches one of a sequence of stores from another CPU, then by the | 315 | load touches one of a sequence of stores from another CPU, then by the |
316 | time the barrier completes, the effects of all the stores prior to that | 316 | time the barrier completes, the effects of all the stores prior to that |
317 | touched by the load will be perceptible to any loads issued after the data | 317 | touched by the load will be perceptible to any loads issued after the data |
318 | dependency barrier. | 318 | dependency barrier. |
319 | 319 | ||
320 | See the "Examples of memory barrier sequences" subsection for diagrams | 320 | See the "Examples of memory barrier sequences" subsection for diagrams |
321 | showing the ordering constraints. | 321 | showing the ordering constraints. |
322 | 322 | ||
323 | [!] Note that the first load really has to have a _data_ dependency and | 323 | [!] Note that the first load really has to have a _data_ dependency and |
324 | not a control dependency. If the address for the second load is dependent | 324 | not a control dependency. If the address for the second load is dependent |
325 | on the first load, but the dependency is through a conditional rather than | 325 | on the first load, but the dependency is through a conditional rather than |
326 | actually loading the address itself, then it's a _control_ dependency and | 326 | actually loading the address itself, then it's a _control_ dependency and |
327 | a full read barrier or better is required. See the "Control dependencies" | 327 | a full read barrier or better is required. See the "Control dependencies" |
328 | subsection for more information. | 328 | subsection for more information. |
329 | 329 | ||
330 | [!] Note that data dependency barriers should normally be paired with | 330 | [!] Note that data dependency barriers should normally be paired with |
331 | write barriers; see the "SMP barrier pairing" subsection. | 331 | write barriers; see the "SMP barrier pairing" subsection. |
332 | 332 | ||
333 | 333 | ||
334 | (3) Read (or load) memory barriers. | 334 | (3) Read (or load) memory barriers. |
335 | 335 | ||
336 | A read barrier is a data dependency barrier plus a guarantee that all the | 336 | A read barrier is a data dependency barrier plus a guarantee that all the |
337 | LOAD operations specified before the barrier will appear to happen before | 337 | LOAD operations specified before the barrier will appear to happen before |
338 | all the LOAD operations specified after the barrier with respect to the | 338 | all the LOAD operations specified after the barrier with respect to the |
339 | other components of the system. | 339 | other components of the system. |
340 | 340 | ||
341 | A read barrier is a partial ordering on loads only; it is not required to | 341 | A read barrier is a partial ordering on loads only; it is not required to |
342 | have any effect on stores. | 342 | have any effect on stores. |
343 | 343 | ||
344 | Read memory barriers imply data dependency barriers, and so can substitute | 344 | Read memory barriers imply data dependency barriers, and so can substitute |
345 | for them. | 345 | for them. |
346 | 346 | ||
347 | [!] Note that read barriers should normally be paired with write barriers; | 347 | [!] Note that read barriers should normally be paired with write barriers; |
348 | see the "SMP barrier pairing" subsection. | 348 | see the "SMP barrier pairing" subsection. |
349 | 349 | ||
350 | 350 | ||
351 | (4) General memory barriers. | 351 | (4) General memory barriers. |
352 | 352 | ||
353 | A general memory barrier gives a guarantee that all the LOAD and STORE | 353 | A general memory barrier gives a guarantee that all the LOAD and STORE |
354 | operations specified before the barrier will appear to happen before all | 354 | operations specified before the barrier will appear to happen before all |
355 | the LOAD and STORE operations specified after the barrier with respect to | 355 | the LOAD and STORE operations specified after the barrier with respect to |
356 | the other components of the system. | 356 | the other components of the system. |
357 | 357 | ||
358 | A general memory barrier is a partial ordering over both loads and stores. | 358 | A general memory barrier is a partial ordering over both loads and stores. |
359 | 359 | ||
360 | General memory barriers imply both read and write memory barriers, and so | 360 | General memory barriers imply both read and write memory barriers, and so |
361 | can substitute for either. | 361 | can substitute for either. |
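The pairing of a write barrier on one side with a read barrier on the other can be sketched in userspace with C11 fences. This is a loose analogy only: atomic_thread_fence() with release and acquire ordering is not the same thing as the kernel's own barrier primitives, and the names payload, ready, publish() and try_consume() are invented for the example. In real use the two functions would run on different CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;		/* plain data published by the writer */
static atomic_bool ready;	/* flag polled by the reader */

/* Writer side: store the data, then the flag, with a fence in between. */
static void publish(int value)
{
	payload = value;
	/* Keep the payload store ahead of the flag store (write-barrier role). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: load the flag, then the data, with a matching fence. */
static bool try_consume(int *out)
{
	assert(out != NULL);
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return false;
	/* Keep the flag load ahead of the payload load (read-barrier role). */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return true;
}
```

If either fence is dropped, a reader that sees the flag set is no longer guaranteed to see the published payload, which is why the barriers are described as needing to be paired.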
362 | 362 | ||
363 | 363 | ||
364 | And a couple of implicit varieties: | 364 | And a couple of implicit varieties: |
365 | 365 | ||
366 | (5) LOCK operations. | 366 | (5) LOCK operations. |
367 | 367 | ||
368 | This acts as a one-way permeable barrier. It guarantees that all memory | 368 | This acts as a one-way permeable barrier. It guarantees that all memory |
369 | operations after the LOCK operation will appear to happen after the LOCK | 369 | operations after the LOCK operation will appear to happen after the LOCK |
370 | operation with respect to the other components of the system. | 370 | operation with respect to the other components of the system. |
371 | 371 | ||
372 | Memory operations that occur before a LOCK operation may appear to happen | 372 | Memory operations that occur before a LOCK operation may appear to happen |
373 | after it completes. | 373 | after it completes. |
374 | 374 | ||
375 | A LOCK operation should almost always be paired with an UNLOCK operation. | 375 | A LOCK operation should almost always be paired with an UNLOCK operation. |
376 | 376 | ||
377 | 377 | ||
378 | (6) UNLOCK operations. | 378 | (6) UNLOCK operations. |
379 | 379 | ||
380 | This also acts as a one-way permeable barrier. It guarantees that all | 380 | This also acts as a one-way permeable barrier. It guarantees that all |
381 | memory operations before the UNLOCK operation will appear to happen before | 381 | memory operations before the UNLOCK operation will appear to happen before |
382 | the UNLOCK operation with respect to the other components of the system. | 382 | the UNLOCK operation with respect to the other components of the system. |
383 | 383 | ||
384 | Memory operations that occur after an UNLOCK operation may appear to | 384 | Memory operations that occur after an UNLOCK operation may appear to |
385 | happen before it completes. | 385 | happen before it completes. |
386 | 386 | ||
387 | LOCK and UNLOCK operations are guaranteed to appear with respect to each | 387 | LOCK and UNLOCK operations are guaranteed to appear with respect to each |
388 | other strictly in the order specified. | 388 | other strictly in the order specified. |
389 | 389 | ||
390 | The use of LOCK and UNLOCK operations generally precludes the need for | 390 | The use of LOCK and UNLOCK operations generally precludes the need for |
391 | other sorts of memory barrier (but note the exceptions mentioned in the | 391 | other sorts of memory barrier (but note the exceptions mentioned in the |
392 | subsection "MMIO write barrier"). | 392 | subsection "MMIO write barrier"). |
393 | 393 | ||
394 | 394 | ||
395 | Memory barriers are only required where there's a possibility of interaction | 395 | Memory barriers are only required where there's a possibility of interaction |
396 | between two CPUs or between a CPU and a device. If it can be guaranteed that | 396 | between two CPUs or between a CPU and a device. If it can be guaranteed that |
397 | there won't be any such interaction in any particular piece of code, then | 397 | there won't be any such interaction in any particular piece of code, then |
398 | memory barriers are unnecessary in that piece of code. | 398 | memory barriers are unnecessary in that piece of code. |
399 | 399 | ||
400 | 400 | ||
401 | Note that these are the _minimum_ guarantees. Different architectures may give | 401 | Note that these are the _minimum_ guarantees. Different architectures may give |
402 | more substantial guarantees, but they may _not_ be relied upon outside of arch | 402 | more substantial guarantees, but they may _not_ be relied upon outside of arch |
403 | specific code. | 403 | specific code. |
404 | 404 | ||
405 | 405 | ||
406 | WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS? | 406 | WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS? |
407 | ---------------------------------------------- | 407 | ---------------------------------------------- |
408 | 408 | ||
409 | There are certain things that the Linux kernel memory barriers do not guarantee: | 409 | There are certain things that the Linux kernel memory barriers do not guarantee: |
410 | 410 | ||
411 | (*) There is no guarantee that any of the memory accesses specified before a | 411 | (*) There is no guarantee that any of the memory accesses specified before a |
412 | memory barrier will be _complete_ by the completion of a memory barrier | 412 | memory barrier will be _complete_ by the completion of a memory barrier |
413 | instruction; the barrier can be considered to draw a line in that CPU's | 413 | instruction; the barrier can be considered to draw a line in that CPU's |
414 | access queue that accesses of the appropriate type may not cross. | 414 | access queue that accesses of the appropriate type may not cross. |
415 | 415 | ||
416 | (*) There is no guarantee that issuing a memory barrier on one CPU will have | 416 | (*) There is no guarantee that issuing a memory barrier on one CPU will have |
417 | any direct effect on another CPU or any other hardware in the system. The | 417 | any direct effect on another CPU or any other hardware in the system. The |
418 | indirect effect will be the order in which the second CPU sees the effects | 418 | indirect effect will be the order in which the second CPU sees the effects |
419 | of the first CPU's accesses occur, but see the next point: | 419 | of the first CPU's accesses occur, but see the next point: |
420 | 420 | ||
421 | (*) There is no guarantee that a CPU will see the correct order of effects | 421 | (*) There is no guarantee that a CPU will see the correct order of effects |
422 | from a second CPU's accesses, even _if_ the second CPU uses a memory | 422 | from a second CPU's accesses, even _if_ the second CPU uses a memory |
423 | barrier, unless the first CPU _also_ uses a matching memory barrier (see | 423 | barrier, unless the first CPU _also_ uses a matching memory barrier (see |
424 | the subsection on "SMP Barrier Pairing"). | 424 | the subsection on "SMP Barrier Pairing"). |
425 | 425 | ||
426 | (*) There is no guarantee that some intervening piece of off-the-CPU | 426 | (*) There is no guarantee that some intervening piece of off-the-CPU |
427 | hardware[*] will not reorder the memory accesses. CPU cache coherency | 427 | hardware[*] will not reorder the memory accesses. CPU cache coherency |
428 | mechanisms should propagate the indirect effects of a memory barrier | 428 | mechanisms should propagate the indirect effects of a memory barrier |
429 | between CPUs, but might not do so in order. | 429 | between CPUs, but might not do so in order. |
430 | 430 | ||
431 | [*] For information on bus mastering DMA and coherency please read: | 431 | [*] For information on bus mastering DMA and coherency please read: |
432 | 432 | ||
433 | Documentation/pci.txt | 433 | Documentation/pci.txt |
434 | Documentation/DMA-mapping.txt | 434 | Documentation/DMA-mapping.txt |
435 | Documentation/DMA-API.txt | 435 | Documentation/DMA-API.txt |
436 | 436 | ||
437 | 437 | ||
438 | DATA DEPENDENCY BARRIERS | 438 | DATA DEPENDENCY BARRIERS |
439 | ------------------------ | 439 | ------------------------ |
440 | 440 | ||
441 | The usage requirements of data dependency barriers are a little subtle, and | 441 | The usage requirements of data dependency barriers are a little subtle, and |
442 | it's not always obvious that they're needed. To illustrate, consider the | 442 | it's not always obvious that they're needed. To illustrate, consider the |
443 | following sequence of events: | 443 | following sequence of events: |
444 | 444 | ||
445 | CPU 1                   CPU 2 | 445 | CPU 1                   CPU 2 |
446 | ===============         =============== | 446 | ===============         =============== |
447 | { A == 1, B == 2, C = 3, P == &A, Q == &C } | 447 | { A == 1, B == 2, C = 3, P == &A, Q == &C } |
448 | B = 4; | 448 | B = 4; |
449 | <write barrier> | 449 | <write barrier> |
450 | P = &B | 450 | P = &B |
451 |                         Q = P; | 451 |                         Q = P; |
452 |                         D = *Q; | 452 |                         D = *Q; |
453 | 453 | ||
454 | There's a clear data dependency here, and it would seem that by the end of the | 454 | There's a clear data dependency here, and it would seem that by the end of the |
455 | sequence, Q must be either &A or &B, and that: | 455 | sequence, Q must be either &A or &B, and that: |
456 | 456 | ||
457 | (Q == &A) implies (D == 1) | 457 | (Q == &A) implies (D == 1) |
458 | (Q == &B) implies (D == 4) | 458 | (Q == &B) implies (D == 4) |
459 | 459 | ||
460 | But! CPU 2's perception of P may be updated _before_ its perception of B, thus | 460 | But! CPU 2's perception of P may be updated _before_ its perception of B, thus |
461 | leading to the following situation: | 461 | leading to the following situation: |
462 | 462 | ||
463 | (Q == &B) and (D == 2) ???? | 463 | (Q == &B) and (D == 2) ???? |
464 | 464 | ||
465 | Whilst this may seem like a failure of coherency or causality maintenance, it | 465 | Whilst this may seem like a failure of coherency or causality maintenance, it |
466 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC | 466 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC |
467 | Alpha). | 467 | Alpha). |
468 | 468 | ||
469 | To deal with this, a data dependency barrier or better must be inserted | 469 | To deal with this, a data dependency barrier or better must be inserted |
470 | between the address load and the data load: | 470 | between the address load and the data load: |
471 | 471 | ||
472 | CPU 1                   CPU 2 | 472 | CPU 1                   CPU 2 |
473 | ===============         =============== | 473 | ===============         =============== |
474 | { A == 1, B == 2, C = 3, P == &A, Q == &C } | 474 | { A == 1, B == 2, C = 3, P == &A, Q == &C } |
475 | B = 4; | 475 | B = 4; |
476 | <write barrier> | 476 | <write barrier> |
477 | P = &B | 477 | P = &B |
478 |                         Q = P; | 478 |                         Q = P; |
479 |                         <data dependency barrier> | 479 |                         <data dependency barrier> |
480 |                         D = *Q; | 480 |                         D = *Q; |
481 | 481 | ||
482 | This enforces the occurrence of one of the two implications, and prevents the | 482 | This enforces the occurrence of one of the two implications, and prevents the |
483 | third possibility from arising. | 483 | third possibility from arising. |
484 | 484 | ||
485 | [!] Note that this extremely counterintuitive situation arises most easily on | 485 | [!] Note that this extremely counterintuitive situation arises most easily on |
486 | machines with split caches, so that, for example, one cache bank processes | 486 | machines with split caches, so that, for example, one cache bank processes |
487 | even-numbered cache lines and the other bank processes odd-numbered cache | 487 | even-numbered cache lines and the other bank processes odd-numbered cache |
488 | lines. The pointer P might be stored in an odd-numbered cache line, and the | 488 | lines. The pointer P might be stored in an odd-numbered cache line, and the |
489 | variable B might be stored in an even-numbered cache line. Then, if the | 489 | variable B might be stored in an even-numbered cache line. Then, if the |
490 | even-numbered bank of the reading CPU's cache is extremely busy while the | 490 | even-numbered bank of the reading CPU's cache is extremely busy while the |
491 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), | 491 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), |
492 | but the old value of the variable B (2). | 492 | but the old value of the variable B (2). |
493 | 493 | ||
494 | 494 | ||
495 | Another example of where data dependency barriers might be required is where a | 495 | Another example of where data dependency barriers might be required is where a |
496 | number is read from memory and then used to calculate the index for an array | 496 | number is read from memory and then used to calculate the index for an array |
497 | access: | 497 | access: |
498 | 498 | ||
499 | CPU 1                   CPU 2 | 499 | CPU 1                   CPU 2 |
500 | ===============         =============== | 500 | ===============         =============== |
501 | { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 } | 501 | { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 } |
502 | M[1] = 4; | 502 | M[1] = 4; |
503 | <write barrier> | 503 | <write barrier> |
504 | P = 1 | 504 | P = 1 |
505 |                         Q = P; | 505 |                         Q = P; |
506 |                         <data dependency barrier> | 506 |                         <data dependency barrier> |
507 |                         D = M[Q]; | 507 |                         D = M[Q]; |
508 | 508 | ||
509 | 509 | ||
510 | The data dependency barrier is very important to the RCU system, for example. | 510 | The data dependency barrier is very important to the RCU system, for example. |
511 | See rcu_dereference() in include/linux/rcupdate.h. This permits the current | 511 | See rcu_dereference() in include/linux/rcupdate.h. This permits the current |
512 | target of an RCU'd pointer to be replaced with a new modified target, without | 512 | target of an RCU'd pointer to be replaced with a new modified target, without |
513 | the replacement target appearing to be incompletely initialised. | 513 | the replacement target appearing to be incompletely initialised. |
514 | 514 | ||
515 | See also the subsection on "Cache Coherency" for a more thorough example. | 515 | See also the subsection on "Cache Coherency" for a more thorough example. |
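As a rough user-space analogue of the pointer-publication example above, the
following sketch uses C11 atomics in place of the kernel's barrier primitives.
This substitution is an assumption made for illustration only: the release
fence stands in for <write barrier>, and the acquire fence is a stronger
stand-in for <data dependency barrier>. The function names cpu1_publish() and
cpu2_read() are hypothetical.

```c
#include <stdatomic.h>

/* Initial state from the example: A == 1, B == 2, P == &A. */
atomic_int A = 1;
atomic_int B = 2;
atomic_int *_Atomic P = &A;

/* CPU 1: update B, then publish its address through P. */
void cpu1_publish(void)
{
	atomic_store_explicit(&B, 4, memory_order_relaxed);
	/* Stand-in for <write barrier>: B = 4 is ordered before P = &B. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&P, &B, memory_order_relaxed);
}

/* CPU 2: Q = P, then D = *Q.  Q is returned via q_out, D as the result. */
int cpu2_read(atomic_int **q_out)
{
	atomic_int *q = atomic_load_explicit(&P, memory_order_relaxed);
	/*
	 * Stand-in for <data dependency barrier>.  An acquire fence is
	 * stronger than strictly needed, but it is portable C11.
	 */
	atomic_thread_fence(memory_order_acquire);
	*q_out = q;
	return atomic_load_explicit(q, memory_order_relaxed);
}
```

If cpu1_publish() and cpu2_read() run on different threads, the reader should
only ever observe (Q == &A, D == 1) or (Q == &B, D == 4) under these
assumptions, never (Q == &B, D == 2).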
516 | 516 | ||
517 | 517 | ||
518 | CONTROL DEPENDENCIES | 518 | CONTROL DEPENDENCIES |
519 | -------------------- | 519 | -------------------- |
520 | 520 | ||
521 | A control dependency requires a full read memory barrier, not simply a data | 521 | A control dependency requires a full read memory barrier, not simply a data |
522 | dependency barrier to make it work correctly. Consider the following bit of | 522 | dependency barrier to make it work correctly. Consider the following bit of |
523 | code: | 523 | code: |
524 | 524 | ||
525 | q = &a; | 525 | q = &a; |
526 | if (p) | 526 | if (p) |
527 | q = &b; | 527 | q = &b; |
528 | <data dependency barrier> | 528 | <data dependency barrier> |
529 | x = *q; | 529 | x = *q; |
530 | 530 | ||
531 | This will not have the desired effect because there is no actual data | 531 | This will not have the desired effect because there is no actual data |
532 | dependency, but rather a control dependency that the CPU may short-circuit by | 532 | dependency, but rather a control dependency that the CPU may short-circuit by |
533 | attempting to predict the outcome in advance. In such a case what's actually | 533 | attempting to predict the outcome in advance. In such a case what's actually |
534 | required is: | 534 | required is: |
535 | 535 | ||
536 | q = &a; | 536 | q = &a; |
537 | if (p) | 537 | if (p) |
538 | q = &b; | 538 | q = &b; |
539 | <read barrier> | 539 | <read barrier> |
540 | x = *q; | 540 | x = *q; |
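The corrected sequence can be sketched in portable C11, again as an assumption
rather than as kernel code: an acquire fence stands in for <read barrier>. The
writer side is not part of the example above; writer_update() is a
hypothetical counterpart added so that the reader has something to pair with,
and read_selected() is likewise an illustrative name.

```c
#include <stdatomic.h>

atomic_int a = 1;
atomic_int b = 2;
atomic_int p = 0;	/* flag that selects which variable to read */

/* Hypothetical writer: update b, then raise the flag p. */
void writer_update(void)
{
	atomic_store_explicit(&b, 4, memory_order_relaxed);
	/* Stand-in for a write barrier pairing with the reader's fence. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p, 1, memory_order_relaxed);
}

/* Reader: q = &a; if (p) q = &b; <read barrier>; x = *q; */
int read_selected(void)
{
	atomic_int *q = &a;

	if (atomic_load_explicit(&p, memory_order_relaxed))
		q = &b;
	/*
	 * Stand-in for <read barrier>: the load of p is ordered before
	 * the load through q, even if the branch was predicted.
	 */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(q, memory_order_relaxed);
}
```

With the fence in place, a reader that sees p set should also see the updated
value of b; without it, the load through q could be satisfied early.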
541 | 541 | ||
542 | 542 | ||
543 | SMP BARRIER PAIRING | 543 | SMP BARRIER PAIRING |
544 | ------------------- | 544 | ------------------- |
545 | 545 | ||
546 | When dealing with CPU-CPU interactions, certain types of memory barrier should | 546 | When dealing with CPU-CPU interactions, certain types of memory barrier should |
547 | always be paired. A lack of appropriate pairing is almost certainly an error. | 547 | always be paired. A lack of appropriate pairing is almost certainly an error. |
548 | 548 | ||
549 | A write barrier should always be paired with a data dependency barrier or read | 549 | A write barrier should always be paired with a data dependency barrier or read |
550 | barrier, though a general barrier would also be viable. Similarly a read | 550 | barrier, though a general barrier would also be viable. Similarly a read |
551 | barrier or a data dependency barrier should always be paired with at least a | 551 | barrier or a data dependency barrier should always be paired with at least a |
552 | write barrier, though, again, a general barrier is viable: | 552 | write barrier, though, again, a general barrier is viable: |
553 | 553 | ||
554 | CPU 1                   CPU 2 | 554 | CPU 1                   CPU 2 |
555 | ===============         =============== | 555 | ===============         =============== |
556 | a = 1; | 556 | a = 1; |
557 | <write barrier> | 557 | <write barrier> |
558 | b = 2;                  x = b; | 558 | b = 2;                  x = b; |
559 |                         <read barrier> | 559 |                         <read barrier> |
560 |                         y = a; | 560 |                         y = a; |
561 | 561 | ||
562 | Or: | 562 | Or: |
563 | 563 | ||
564 | CPU 1                   CPU 2 | 564 | CPU 1                   CPU 2 |
565 | ===============         =============================== | 565 | ===============         =============================== |
566 | a = 1; | 566 | a = 1; |
567 | <write barrier> | 567 | <write barrier> |
568 | b = &a;                 x = b; | 568 | b = &a;                 x = b; |
569 |                         <data dependency barrier> | 569 |                         <data dependency barrier> |
570 |                         y = *x; | 570 |                         y = *x; |
571 | 571 | ||
572 | Basically, the read barrier always has to be there, even though it can be of | 572 | Basically, the read barrier always has to be there, even though it can be of |
573 | the "weaker" type. | 573 | the "weaker" type. |
574 | 574 | ||
575 | [!] Note that the stores before the write barrier would normally be expected to | 575 | [!] Note that the stores before the write barrier would normally be expected to |
576 | match the loads after the read barrier or data dependency barrier, and vice | 576 | match the loads after the read barrier or data dependency barrier, and vice |
577 | versa: | 577 | versa: |
578 | 578 | ||
579 | CPU 1                           CPU 2 | 579 | CPU 1                           CPU 2 |
580 | ===============                 =============== | 580 | ===============                 =============== |
581 | a = 1;           }----   --->{  v = c | 581 | a = 1;           }----   --->{  v = c |
582 | b = 2;           }    \ /    {  w = d | 582 | b = 2;           }    \ /    {  w = d |
583 | <write barrier>        \        <read barrier> | 583 | <write barrier>        \        <read barrier> |
584 | c = 3;           }    / \    {  x = a; | 584 | c = 3;           }    / \    {  x = a; |
585 | d = 4;           }----   --->{  y = b; | 585 | d = 4;           }----   --->{  y = b; |
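The first pairing shown in this subsection (a write barrier on CPU 1 matched
by a read barrier on CPU 2) might be approximated in C11 as follows. The
fences are assumptions standing in for the kernel barriers: a release fence is
at least as strong as <write barrier> and an acquire fence at least as strong
as <read barrier>. The names cpu1_store(), cpu2_load() and struct seen are
hypothetical.

```c
#include <stdatomic.h>

atomic_int a;	/* zero-initialised */
atomic_int b;	/* zero-initialised */

struct seen {
	int x;	/* value loaded from b */
	int y;	/* value loaded from a */
};

/* CPU 1: a = 1; <write barrier>; b = 2; */
void cpu1_store(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&b, 2, memory_order_relaxed);
}

/* CPU 2: x = b; <read barrier>; y = a; */
struct seen cpu2_load(void)
{
	struct seen s;

	s.x = atomic_load_explicit(&b, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	s.y = atomic_load_explicit(&a, memory_order_relaxed);
	return s;
}
```

Under these assumptions, a result with x == 2 implies y == 1. Dropping either
fence removes that implication, which is why the barriers have to be paired.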
586 | 586 | ||
587 | 587 | ||
588 | EXAMPLES OF MEMORY BARRIER SEQUENCES | 588 | EXAMPLES OF MEMORY BARRIER SEQUENCES |
589 | ------------------------------------ | 589 | ------------------------------------ |
590 | 590 | ||
591 | Firstly, write barriers act as a partial ordering on store operations. | 591 | Firstly, write barriers act as a partial ordering on store operations. |
592 | Consider the following sequence of events: | 592 | Consider the following sequence of events: |
593 | 593 | ||
594 | CPU 1 | 594 | CPU 1 |
595 | ======================= | 595 | ======================= |
596 | STORE A = 1 | 596 | STORE A = 1 |
597 | STORE B = 2 | 597 | STORE B = 2 |
598 | STORE C = 3 | 598 | STORE C = 3 |
599 | <write barrier> | 599 | <write barrier> |
600 | STORE D = 4 | 600 | STORE D = 4 |
601 | STORE E = 5 | 601 | STORE E = 5 |
602 | 602 | ||
603 | This sequence of events is committed to the memory coherence system in an order | 603 | This sequence of events is committed to the memory coherence system in an order |
604 | that the rest of the system might perceive as the unordered set of { STORE A, | 604 | that the rest of the system might perceive as the unordered set of { STORE A, |
605 | STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E | 605 | STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E |
606 | }: | 606 | }: |
607 | 607 | ||
608 | +-------+ : : | 608 | +-------+ : : |
609 | | | +------+ | 609 | | | +------+ |
610 | | |------>| C=3 | } /\ | 610 | | |------>| C=3 | } /\ |
611 | | | : +------+ }----- \ -----> Events perceptible | 611 | | | : +------+ }----- \ -----> Events perceptible |
612 | | | : | A=1 | } \/ to rest of system | 612 | | | : | A=1 | } \/ to rest of system |
613 | | | : +------+ } | 613 | | | : +------+ } |
614 | | CPU 1 | : | B=2 | } | 614 | | CPU 1 | : | B=2 | } |
615 | | | +------+ } | 615 | | | +------+ } |
616 | | | wwwwwwwwwwwwwwww } <--- At this point the write barrier | 616 | | | wwwwwwwwwwwwwwww } <--- At this point the write barrier |
617 | | | +------+ } requires all stores prior to the | 617 | | | +------+ } requires all stores prior to the |
618 | | | : | E=5 | } barrier to be committed before | 618 | | | : | E=5 | } barrier to be committed before |
619 | | | : +------+ } further stores may take place. | 619 | | | : +------+ } further stores may take place. |
620 | | |------>| D=4 | } | 620 | | |------>| D=4 | } |
621 | | | +------+ | 621 | | | +------+ |
622 | +-------+ : : | 622 | +-------+ : : |
623 | | | 623 | | |
624 | | Sequence in which stores are committed to the | 624 | | Sequence in which stores are committed to the |
625 | | memory system by CPU 1 | 625 | | memory system by CPU 1 |
626 | V | 626 | V |
627 | 627 | ||
628 | 628 | ||
629 | Secondly, data dependency barriers act as a partial ordering on data-dependent | 629 | Secondly, data dependency barriers act as a partial ordering on data-dependent |
630 | loads. Consider the following sequence of events: | 630 | loads. Consider the following sequence of events: |
631 | 631 | ||
632 | CPU 1                   CPU 2 | 632 | CPU 1                   CPU 2 |
633 | ======================= ======================= | 633 | ======================= ======================= |
634 |         { B = 7; X = 9; Y = 8; C = &Y } | 634 |         { B = 7; X = 9; Y = 8; C = &Y } |
635 | STORE A = 1 | 635 | STORE A = 1 |
636 | STORE B = 2 | 636 | STORE B = 2 |
637 | <write barrier> | 637 | <write barrier> |
638 | STORE C = &B            LOAD X | 638 | STORE C = &B            LOAD X |
639 | STORE D = 4             LOAD C (gets &B) | 639 | STORE D = 4             LOAD C (gets &B) |
640 |                         LOAD *C (reads B) | 640 |                         LOAD *C (reads B) |
641 | 641 | ||
642 | Without intervention, CPU 2 may perceive the events on CPU 1 in some | 642 | Without intervention, CPU 2 may perceive the events on CPU 1 in some |
643 | effectively random order, despite the write barrier issued by CPU 1: | 643 | effectively random order, despite the write barrier issued by CPU 1: |
644 | 644 | ||
645 | +-------+ : : : : | 645 | +-------+ : : : : |
646 | | | +------+ +-------+ | Sequence of update | 646 | | | +------+ +-------+ | Sequence of update |
647 | | |------>| B=2 |----- --->| Y->8 | | of perception on | 647 | | |------>| B=2 |----- --->| Y->8 | | of perception on |
648 | | | : +------+ \ +-------+ | CPU 2 | 648 | | | : +------+ \ +-------+ | CPU 2 |
649 | | CPU 1 | : | A=1 | \ --->| C->&Y | V | 649 | | CPU 1 | : | A=1 | \ --->| C->&Y | V |
650 | | | +------+ | +-------+ | 650 | | | +------+ | +-------+ |
651 | | | wwwwwwwwwwwwwwww | : : | 651 | | | wwwwwwwwwwwwwwww | : : |
652 | | | +------+ | : : | 652 | | | +------+ | : : |
653 | | | : | C=&B |--- | : : +-------+ | 653 | | | : | C=&B |--- | : : +-------+ |
654 | | | : +------+ \ | +-------+ | | | 654 | | | : +------+ \ | +-------+ | | |
655 | | |------>| D=4 | ----------->| C->&B |------>| | | 655 | | |------>| D=4 | ----------->| C->&B |------>| | |
656 | | | +------+ | +-------+ | | | 656 | | | +------+ | +-------+ | | |
657 | +-------+ : : | : : | | | 657 | +-------+ : : | : : | | |
658 | | : : | | | 658 | | : : | | |
659 | | : : | CPU 2 | | 659 | | : : | CPU 2 | |
660 | | +-------+ | | | 660 | | +-------+ | | |
661 | Apparently incorrect ---> | | B->7 |------>| | | 661 | Apparently incorrect ---> | | B->7 |------>| | |
662 | perception of B (!) | +-------+ | | | 662 | perception of B (!) | +-------+ | | |
663 | | : : | | | 663 | | : : | | |
664 | | +-------+ | | | 664 | | +-------+ | | |
665 | The load of X holds ---> \ | X->9 |------>| | | 665 | The load of X holds ---> \ | X->9 |------>| | |
666 | up the maintenance \ +-------+ | | | 666 | up the maintenance \ +-------+ | | |
667 | of coherence of B ----->| B->2 | +-------+ | 667 | of coherence of B ----->| B->2 | +-------+ |
668 | +-------+ | 668 | +-------+ |
669 | : : | 669 | : : |
670 | 670 | ||
671 | 671 | ||
672 | In the above example, CPU 2 perceives that B is 7, despite the load of *C | 672 | In the above example, CPU 2 perceives that B is 7, despite the load of *C |
673 | (which would be B) coming after the the LOAD of C. | 673 | (which would be B) coming after the LOAD of C. |
674 | 674 | ||
675 | If, however, a data dependency barrier were to be placed between the load of C | 675 | If, however, a data dependency barrier were to be placed between the load of C |
676 | and the load of *C (ie: B) on CPU 2: | 676 | and the load of *C (ie: B) on CPU 2: |
677 | 677 | ||
678 | CPU 1                   CPU 2 | 678 | CPU 1                   CPU 2 |
679 | ======================= ======================= | 679 | ======================= ======================= |
680 |         { B = 7; X = 9; Y = 8; C = &Y } | 680 |         { B = 7; X = 9; Y = 8; C = &Y } |
681 | STORE A = 1 | 681 | STORE A = 1 |
682 | STORE B = 2 | 682 | STORE B = 2 |
683 | <write barrier> | 683 | <write barrier> |
684 | STORE C = &B            LOAD X | 684 | STORE C = &B            LOAD X |
685 | STORE D = 4             LOAD C (gets &B) | 685 | STORE D = 4             LOAD C (gets &B) |
686 |                         <data dependency barrier> | 686 |                         <data dependency barrier> |
687 |                         LOAD *C (reads B) | 687 |                         LOAD *C (reads B) |
688 | 688 | ||
689 | then the following will occur: | 689 | then the following will occur: |
690 | 690 | ||
691 | +-------+ : : : : | 691 | +-------+ : : : : |
692 | | | +------+ +-------+ | 692 | | | +------+ +-------+ |
693 | | |------>| B=2 |----- --->| Y->8 | | 693 | | |------>| B=2 |----- --->| Y->8 | |
694 | | | : +------+ \ +-------+ | 694 | | | : +------+ \ +-------+ |
695 | | CPU 1 | : | A=1 | \ --->| C->&Y | | 695 | | CPU 1 | : | A=1 | \ --->| C->&Y | |
696 | | | +------+ | +-------+ | 696 | | | +------+ | +-------+ |
697 | | | wwwwwwwwwwwwwwww | : : | 697 | | | wwwwwwwwwwwwwwww | : : |
698 | | | +------+ | : : | 698 | | | +------+ | : : |
699 | | | : | C=&B |--- | : : +-------+ | 699 | | | : | C=&B |--- | : : +-------+ |
700 | | | : +------+ \ | +-------+ | | | 700 | | | : +------+ \ | +-------+ | | |
701 | | |------>| D=4 | ----------->| C->&B |------>| | | 701 | | |------>| D=4 | ----------->| C->&B |------>| | |
702 | | | +------+ | +-------+ | | | 702 | | | +------+ | +-------+ | | |
703 | +-------+ : : | : : | | | 703 | +-------+ : : | : : | | |
704 | | : : | | | 704 | | : : | | |
705 | | : : | CPU 2 | | 705 | | : : | CPU 2 | |
706 | | +-------+ | | | 706 | | +-------+ | | |
707 | | | X->9 |------>| | | 707 | | | X->9 |------>| | |
708 | | +-------+ | | | 708 | | +-------+ | | |
709 | Makes sure all effects ---> \ ddddddddddddddddd | | | 709 | Makes sure all effects ---> \ ddddddddddddddddd | | |
710 | prior to the store of C \ +-------+ | | | 710 | prior to the store of C \ +-------+ | | |
711 | are perceptible to ----->| B->2 |------>| | | 711 | are perceptible to ----->| B->2 |------>| | |
712 | subsequent loads +-------+ | | | 712 | subsequent loads +-------+ | | |
713 | : : +-------+ | 713 | : : +-------+ |
714 | 714 | ||
715 | 715 | ||
716 | And thirdly, a read barrier acts as a partial order on loads. Consider the | 716 | And thirdly, a read barrier acts as a partial order on loads. Consider the |
717 | following sequence of events: | 717 | following sequence of events: |
718 | 718 | ||
719 | CPU 1                   CPU 2 | 719 | CPU 1                   CPU 2 |
720 | ======================= ======================= | 720 | ======================= ======================= |
721 |         { A = 0, B = 9 } | 721 |         { A = 0, B = 9 } |
722 | STORE A=1 | 722 | STORE A=1 |
723 | <write barrier> | 723 | <write barrier> |
724 | STORE B=2 | 724 | STORE B=2 |
725 |                         LOAD B | 725 |                         LOAD B |
726 |                         LOAD A | 726 |                         LOAD A |
727 | 727 | ||
728 | Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in | 728 | Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in |
729 | some effectively random order, despite the write barrier issued by CPU 1: | 729 | some effectively random order, despite the write barrier issued by CPU 1: |
730 | 730 | ||
731 | +-------+ : : : : | 731 | +-------+ : : : : |
732 | | | +------+ +-------+ | 732 | | | +------+ +-------+ |
733 | | |------>| A=1 |------ --->| A->0 | | 733 | | |------>| A=1 |------ --->| A->0 | |
734 | | | +------+ \ +-------+ | 734 | | | +------+ \ +-------+ |
735 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | | 735 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | |
736 | | | +------+ | +-------+ | 736 | | | +------+ | +-------+ |
737 | | |------>| B=2 |--- | : : | 737 | | |------>| B=2 |--- | : : |
738 | | | +------+ \ | : : +-------+ | 738 | | | +------+ \ | : : +-------+ |
739 | +-------+ : : \ | +-------+ | | | 739 | +-------+ : : \ | +-------+ | | |
740 | ---------->| B->2 |------>| | | 740 | ---------->| B->2 |------>| | |
741 | | +-------+ | CPU 2 | | 741 | | +-------+ | CPU 2 | |
742 | | | A->0 |------>| | | 742 | | | A->0 |------>| | |
743 | | +-------+ | | | 743 | | +-------+ | | |
744 | | : : +-------+ | 744 | | : : +-------+ |
745 | \ : : | 745 | \ : : |
746 | \ +-------+ | 746 | \ +-------+ |
747 | ---->| A->1 | | 747 | ---->| A->1 | |
748 | +-------+ | 748 | +-------+ |
749 | : : | 749 | : : |
750 | 750 | ||
751 | 751 | ||
752 | If, however, a read barrier were to be placed between the load of B and the | 752 | If, however, a read barrier were to be placed between the load of B and the |
753 | load of A on CPU 2: | 753 | load of A on CPU 2: |
754 | 754 | ||
755 | CPU 1                   CPU 2 | 755 | CPU 1                   CPU 2 |
756 | ======================= ======================= | 756 | ======================= ======================= |
757 |         { A = 0, B = 9 } | 757 |         { A = 0, B = 9 } |
758 | STORE A=1 | 758 | STORE A=1 |
759 | <write barrier> | 759 | <write barrier> |
760 | STORE B=2 | 760 | STORE B=2 |
761 |                         LOAD B | 761 |                         LOAD B |
762 |                         <read barrier> | 762 |                         <read barrier> |
763 |                         LOAD A | 763 |                         LOAD A |
764 | 764 | ||
765 | then the partial ordering imposed by CPU 1 will be perceived correctly by CPU | 765 | then the partial ordering imposed by CPU 1 will be perceived correctly by CPU |
766 | 2: | 766 | 2: |
767 | 767 | ||
768 | +-------+ : : : : | 768 | +-------+ : : : : |
769 | | | +------+ +-------+ | 769 | | | +------+ +-------+ |
770 | | |------>| A=1 |------ --->| A->0 | | 770 | | |------>| A=1 |------ --->| A->0 | |
771 | | | +------+ \ +-------+ | 771 | | | +------+ \ +-------+ |
772 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | | 772 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | |
773 | | | +------+ | +-------+ | 773 | | | +------+ | +-------+ |
774 | | |------>| B=2 |--- | : : | 774 | | |------>| B=2 |--- | : : |
775 | | | +------+ \ | : : +-------+ | 775 | | | +------+ \ | : : +-------+ |
776 | +-------+ : : \ | +-------+ | | | 776 | +-------+ : : \ | +-------+ | | |
777 | ---------->| B->2 |------>| | | 777 | ---------->| B->2 |------>| | |
778 | | +-------+ | CPU 2 | | 778 | | +-------+ | CPU 2 | |
779 | | : : | | | 779 | | : : | | |
780 | | : : | | | 780 | | : : | | |
781 | At this point the read ----> \ rrrrrrrrrrrrrrrrr | | | 781 | At this point the read ----> \ rrrrrrrrrrrrrrrrr | | |
782 | barrier causes all effects \ +-------+ | | | 782 | barrier causes all effects \ +-------+ | | |
783 | prior to the storage of B ---->| A->1 |------>| | | 783 | prior to the storage of B ---->| A->1 |------>| | |
784 | to be perceptible to CPU 2 +-------+ | | | 784 | to be perceptible to CPU 2 +-------+ | | |
785 | : : +-------+ | 785 | : : +-------+ |
786 | 786 | ||
787 | 787 | ||
788 | To illustrate this more completely, consider what could happen if the code | 788 | To illustrate this more completely, consider what could happen if the code |
789 | contained a load of A either side of the read barrier: | 789 | contained a load of A either side of the read barrier: |
790 | 790 | ||
791 | CPU 1                   CPU 2 | 791 | CPU 1                   CPU 2 |
792 | ======================= ======================= | 792 | ======================= ======================= |
793 |         { A = 0, B = 9 } | 793 |         { A = 0, B = 9 } |
794 | STORE A=1 | 794 | STORE A=1 |
795 | <write barrier> | 795 | <write barrier> |
796 | STORE B=2 | 796 | STORE B=2 |
797 |                         LOAD B | 797 |                         LOAD B |
798 |                         LOAD A [first load of A] | 798 |                         LOAD A [first load of A] |
799 |                         <read barrier> | 799 |                         <read barrier> |
800 |                         LOAD A [second load of A] | 800 |                         LOAD A [second load of A] |
801 | 801 | ||
802 | Even though the two loads of A both occur after the load of B, they may | 802 | Even though the two loads of A both occur after the load of B, they may |
803 | come up with different values: | 803 | come up with different values: |
804 | 804 | ||
805 | +-------+ : : : : | 805 | +-------+ : : : : |
806 | | | +------+ +-------+ | 806 | | | +------+ +-------+ |
807 | | |------>| A=1 |------ --->| A->0 | | 807 | | |------>| A=1 |------ --->| A->0 | |
808 | | | +------+ \ +-------+ | 808 | | | +------+ \ +-------+ |
809 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | | 809 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | |
810 | | | +------+ | +-------+ | 810 | | | +------+ | +-------+ |
811 | | |------>| B=2 |--- | : : | 811 | | |------>| B=2 |--- | : : |
812 | | | +------+ \ | : : +-------+ | 812 | | | +------+ \ | : : +-------+ |
813 | +-------+ : : \ | +-------+ | | | 813 | +-------+ : : \ | +-------+ | | |
814 | ---------->| B->2 |------>| | | 814 | ---------->| B->2 |------>| | |
815 | | +-------+ | CPU 2 | | 815 | | +-------+ | CPU 2 | |
816 | | : : | | | 816 | | : : | | |
817 | | : : | | | 817 | | : : | | |
818 | | +-------+ | | | 818 | | +-------+ | | |
819 | | | A->0 |------>| 1st | | 819 | | | A->0 |------>| 1st | |
820 | | +-------+ | | | 820 | | +-------+ | | |
821 | At this point the read ----> \ rrrrrrrrrrrrrrrrr | | | 821 | At this point the read ----> \ rrrrrrrrrrrrrrrrr | | |
822 | barrier causes all effects \ +-------+ | | | 822 | barrier causes all effects \ +-------+ | | |
823 | prior to the storage of B ---->| A->1 |------>| 2nd | | 823 | prior to the storage of B ---->| A->1 |------>| 2nd | |
824 | to be perceptible to CPU 2 +-------+ | | | 824 | to be perceptible to CPU 2 +-------+ | | |
825 | : : +-------+ | 825 | : : +-------+ |
826 | 826 | ||
827 | 827 | ||
828 | But it may be that the update to A from CPU 1 becomes perceptible to CPU 2 | 828 | But it may be that the update to A from CPU 1 becomes perceptible to CPU 2 |
829 | before the read barrier completes anyway: | 829 | before the read barrier completes anyway: |
830 | 830 | ||
831 | +-------+ : : : : | 831 | +-------+ : : : : |
832 | | | +------+ +-------+ | 832 | | | +------+ +-------+ |
833 | | |------>| A=1 |------ --->| A->0 | | 833 | | |------>| A=1 |------ --->| A->0 | |
834 | | | +------+ \ +-------+ | 834 | | | +------+ \ +-------+ |
835 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | | 835 | | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | |
836 | | | +------+ | +-------+ | 836 | | | +------+ | +-------+ |
837 | | |------>| B=2 |--- | : : | 837 | | |------>| B=2 |--- | : : |
838 | | | +------+ \ | : : +-------+ | 838 | | | +------+ \ | : : +-------+ |
839 | +-------+ : : \ | +-------+ | | | 839 | +-------+ : : \ | +-------+ | | |
840 | ---------->| B->2 |------>| | | 840 | ---------->| B->2 |------>| | |
841 | | +-------+ | CPU 2 | | 841 | | +-------+ | CPU 2 | |
842 | | : : | | | 842 | | : : | | |
843 | \ : : | | | 843 | \ : : | | |
844 | \ +-------+ | | | 844 | \ +-------+ | | |
845 | ---->| A->1 |------>| 1st | | 845 | ---->| A->1 |------>| 1st | |
846 | +-------+ | | | 846 | +-------+ | | |
847 | rrrrrrrrrrrrrrrrr | | | 847 | rrrrrrrrrrrrrrrrr | | |
848 | +-------+ | | | 848 | +-------+ | | |
849 | | A->1 |------>| 2nd | | 849 | | A->1 |------>| 2nd | |
850 | +-------+ | | | 850 | +-------+ | | |
851 | : : +-------+ | 851 | : : +-------+ |
852 | 852 | ||
853 | 853 | ||
854 | The guarantee is that the second load will always come up with A == 1 if the | 854 | The guarantee is that the second load will always come up with A == 1 if the |
855 | load of B came up with B == 2. No such guarantee exists for the first load of | 855 | load of B came up with B == 2. No such guarantee exists for the first load of |
856 | A; that may come up with either A == 0 or A == 1. | 856 | A; that may come up with either A == 0 or A == 1. |
857 | 857 | ||
858 | 858 | ||
859 | READ MEMORY BARRIERS VS LOAD SPECULATION | 859 | READ MEMORY BARRIERS VS LOAD SPECULATION |
860 | ---------------------------------------- | 860 | ---------------------------------------- |
861 | 861 | ||
862 | Many CPUs speculate with loads: that is they see that they will need to load an | 862 | Many CPUs speculate with loads: that is they see that they will need to load an |
863 | item from memory, and they find a time where they're not using the bus for any | 863 | item from memory, and they find a time where they're not using the bus for any |
864 | other loads, and so do the load in advance - even though they haven't actually | 864 | other loads, and so do the load in advance - even though they haven't actually |
865 | got to that point in the instruction execution flow yet. This permits the | 865 | got to that point in the instruction execution flow yet. This permits the |
866 | actual load instruction to potentially complete immediately because the CPU | 866 | actual load instruction to potentially complete immediately because the CPU |
867 | already has the value to hand. | 867 | already has the value to hand. |
868 | 868 | ||
869 | It may turn out that the CPU didn't actually need the value - perhaps because a | 869 | It may turn out that the CPU didn't actually need the value - perhaps because a |
870 | branch circumvented the load - in which case it can discard the value or just | 870 | branch circumvented the load - in which case it can discard the value or just |
871 | cache it for later use. | 871 | cache it for later use. |
872 | 872 | ||
873 | Consider: | 873 | Consider: |
874 | 874 | ||
875 | CPU 1                   CPU 2 | 875 | CPU 1                   CPU 2 |
876 | ======================= ======================= | 876 | ======================= ======================= |
877 |                         LOAD B | 877 |                         LOAD B |
878 |                         DIVIDE          } Divide instructions generally | 878 |                         DIVIDE          } Divide instructions generally |
879 |                         DIVIDE          } take a long time to perform | 879 |                         DIVIDE          } take a long time to perform |
880 |                         LOAD A | 880 |                         LOAD A |
881 | 881 | ||
882 | Which might appear as this: | 882 | Which might appear as this: |
883 | 883 | ||
884 | : : +-------+ | 884 | : : +-------+ |
885 | +-------+ | | | 885 | +-------+ | | |
886 | --->| B->2 |------>| | | 886 | --->| B->2 |------>| | |
887 | +-------+ | CPU 2 | | 887 | +-------+ | CPU 2 | |
888 | : :DIVIDE | | | 888 | : :DIVIDE | | |
889 | +-------+ | | | 889 | +-------+ | | |
890 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | | 890 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | |
891 | division speculates on the +-------+ ~ | | | 891 | division speculates on the +-------+ ~ | | |
892 | LOAD of A : : ~ | | | 892 | LOAD of A : : ~ | | |
893 | : :DIVIDE | | | 893 | : :DIVIDE | | |
894 | : : ~ | | | 894 | : : ~ | | |
895 | Once the divisions are complete --> : : ~-->| | | 895 | Once the divisions are complete --> : : ~-->| | |
896 | the CPU can then perform the : : | | | 896 | the CPU can then perform the : : | | |
897 | LOAD with immediate effect : : +-------+ | 897 | LOAD with immediate effect : : +-------+ |
898 | 898 | ||
899 | 899 | ||
900 | Placing a read barrier or a data dependency barrier just before the second | 900 | Placing a read barrier or a data dependency barrier just before the second |
901 | load: | 901 | load: |
902 | 902 | ||
903 | CPU 1                   CPU 2 | 903 | CPU 1                   CPU 2 |
904 | ======================= ======================= | 904 | ======================= ======================= |
905 |                         LOAD B | 905 |                         LOAD B |
906 |                         DIVIDE | 906 |                         DIVIDE |
907 |                         DIVIDE | 907 |                         DIVIDE |
908 |                         <read barrier> | 908 |                         <read barrier> |
909 |                         LOAD A | 909 |                         LOAD A |
910 | 910 | ||
911 | will force any value speculatively obtained to be reconsidered to an extent | 911 | will force any value speculatively obtained to be reconsidered to an extent |
912 | dependent on the type of barrier used. If there was no change made to the | 912 | dependent on the type of barrier used. If there was no change made to the |
913 | speculated memory location, then the speculated value will just be used: | 913 | speculated memory location, then the speculated value will just be used: |
914 | 914 | ||
915 | : : +-------+ | 915 | : : +-------+ |
916 | +-------+ | | | 916 | +-------+ | | |
917 | --->| B->2 |------>| | | 917 | --->| B->2 |------>| | |
918 | +-------+ | CPU 2 | | 918 | +-------+ | CPU 2 | |
919 | : :DIVIDE | | | 919 | : :DIVIDE | | |
920 | +-------+ | | | 920 | +-------+ | | |
921 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | | 921 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | |
922 | division speculates on the +-------+ ~ | | | 922 | division speculates on the +-------+ ~ | | |
923 | LOAD of A : : ~ | | | 923 | LOAD of A : : ~ | | |
924 | : :DIVIDE | | | 924 | : :DIVIDE | | |
925 | : : ~ | | | 925 | : : ~ | | |
926 | : : ~ | | | 926 | : : ~ | | |
927 | rrrrrrrrrrrrrrrr~ | | | 927 | rrrrrrrrrrrrrrrr~ | | |
928 | : : ~ | | | 928 | : : ~ | | |
929 | : : ~-->| | | 929 | : : ~-->| | |
930 | : : | | | 930 | : : | | |
931 | : : +-------+ | 931 | : : +-------+ |
932 | 932 | ||
933 | 933 | ||
934 | but if there was an update or an invalidation from another CPU pending, then | 934 | but if there was an update or an invalidation from another CPU pending, then |
935 | the speculation will be cancelled and the value reloaded: | 935 | the speculation will be cancelled and the value reloaded: |
936 | 936 | ||
937 | : : +-------+ | 937 | : : +-------+ |
938 | +-------+ | | | 938 | +-------+ | | |
939 | --->| B->2 |------>| | | 939 | --->| B->2 |------>| | |
940 | +-------+ | CPU 2 | | 940 | +-------+ | CPU 2 | |
941 | : :DIVIDE | | | 941 | : :DIVIDE | | |
942 | +-------+ | | | 942 | +-------+ | | |
943 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | | 943 | The CPU being busy doing a ---> --->| A->0 |~~~~ | | |
944 | division speculates on the +-------+ ~ | | | 944 | division speculates on the +-------+ ~ | | |
945 | LOAD of A : : ~ | | | 945 | LOAD of A : : ~ | | |
946 | : :DIVIDE | | | 946 | : :DIVIDE | | |
947 | : : ~ | | | 947 | : : ~ | | |
948 | : : ~ | | | 948 | : : ~ | | |
949 | rrrrrrrrrrrrrrrrr | | | 949 | rrrrrrrrrrrrrrrrr | | |
950 | +-------+ | | | 950 | +-------+ | | |
951 | The speculation is discarded ---> --->| A->1 |------>| | | 951 | The speculation is discarded ---> --->| A->1 |------>| | |
952 | and an updated value is +-------+ | | | 952 | and an updated value is +-------+ | | |
953 | retrieved : : +-------+ | 953 | retrieved : : +-------+ |
954 | 954 | ||
955 | 955 | ||
956 | ======================== | 956 | ======================== |
957 | EXPLICIT KERNEL BARRIERS | 957 | EXPLICIT KERNEL BARRIERS |
958 | ======================== | 958 | ======================== |
959 | 959 | ||
960 | The Linux kernel has a variety of different barriers that act at different | 960 | The Linux kernel has a variety of different barriers that act at different |
961 | levels: | 961 | levels: |
962 | 962 | ||
963 | (*) Compiler barrier. | 963 | (*) Compiler barrier. |
964 | 964 | ||
965 | (*) CPU memory barriers. | 965 | (*) CPU memory barriers. |
966 | 966 | ||
967 | (*) MMIO write barrier. | 967 | (*) MMIO write barrier. |
968 | 968 | ||
969 | 969 | ||
970 | COMPILER BARRIER | 970 | COMPILER BARRIER |
971 | ---------------- | 971 | ---------------- |
972 | 972 | ||
973 | The Linux kernel has an explicit compiler barrier function that prevents the | 973 | The Linux kernel has an explicit compiler barrier function that prevents the |
974 | compiler from moving the memory accesses either side of it to the other side: | 974 | compiler from moving the memory accesses either side of it to the other side: |
975 | 975 | ||
976 | barrier(); | 976 | barrier(); |
977 | 977 | ||
978 | This is a general barrier - lesser varieties of compiler barrier do not exist. | 978 | This is a general barrier - lesser varieties of compiler barrier do not exist. |
979 | 979 | ||
980 | The compiler barrier has no direct effect on the CPU, which may then reorder | 980 | The compiler barrier has no direct effect on the CPU, which may then reorder |
981 | things however it wishes. | 981 | things however it wishes. |
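As a rough userspace illustration of the compiler barrier described above, the sketch below defines a stand-in barrier() macro and uses it between two plain stores. It assumes GCC or Clang extended asm syntax; the names data, ready, publish() and consume() are hypothetical and exist only for this example. It constrains the compiler only and, as noted above, does nothing to stop the CPU from reordering the accesses.

```c
/*
 * Userspace stand-in for a compiler barrier (assumes GCC or Clang).
 * The empty asm with a "memory" clobber tells the compiler that memory
 * may have changed, so it may not move memory accesses across it.
 * It emits no CPU instruction.
 */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical shared variables, for illustration only. */
int data;
int ready;

/* Store the payload, then the flag, in program order as far as the
 * compiler is concerned. */
void publish(int value)
{
	data = value;
	barrier();	/* the store to data is emitted before the store to ready */
	ready = 1;
}

/* Read the flag, then the payload; returns 1 if a value was read. */
int consume(int *out)
{
	if (!ready)
		return 0;
	barrier();	/* the load of data is not hoisted above the load of ready */
	*out = data;
	return 1;
}
```

On an SMP system this alone would not be enough to order the accesses as seen by another CPU; a CPU memory barrier would be needed as well.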
982 | 982 | ||
983 | 983 | ||
984 | CPU MEMORY BARRIERS | 984 | CPU MEMORY BARRIERS |
985 | ------------------- | 985 | ------------------- |
986 | 986 | ||
987 | The Linux kernel has eight basic CPU memory barriers: | 987 | The Linux kernel has eight basic CPU memory barriers: |
988 | 988 | ||
989 | TYPE MANDATORY SMP CONDITIONAL | 989 | TYPE MANDATORY SMP CONDITIONAL |
990 | =============== ======================= =========================== | 990 | =============== ======================= =========================== |
991 | GENERAL mb() smp_mb() | 991 | GENERAL mb() smp_mb() |
992 | WRITE wmb() smp_wmb() | 992 | WRITE wmb() smp_wmb() |
993 | READ rmb() smp_rmb() | 993 | READ rmb() smp_rmb() |
994 | DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends() | 994 | DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends() |
995 | 995 | ||
996 | 996 | ||
997 | All CPU memory barriers unconditionally imply compiler barriers. | 997 | All CPU memory barriers unconditionally imply compiler barriers. |
998 | 998 | ||
999 | SMP memory barriers are reduced to compiler barriers on uniprocessor compiled | 999 | SMP memory barriers are reduced to compiler barriers on uniprocessor compiled |
1000 | systems because it is assumed that a CPU will appear to be self-consistent, | 1000 | systems because it is assumed that a CPU will appear to be self-consistent, |
1001 | and will order overlapping accesses correctly with respect to itself. | 1001 | and will order overlapping accesses correctly with respect to itself. |
1002 | 1002 | ||
1003 | [!] Note that SMP memory barriers _must_ be used to control the ordering of | 1003 | [!] Note that SMP memory barriers _must_ be used to control the ordering of |
1004 | references to shared memory on SMP systems, though the use of locking instead | 1004 | references to shared memory on SMP systems, though the use of locking instead |
1005 | is sufficient. | 1005 | is sufficient. |
1006 | 1006 | ||
1007 | Mandatory barriers should not be used to control SMP effects, since mandatory | 1007 | Mandatory barriers should not be used to control SMP effects, since mandatory |
1008 | barriers unnecessarily impose overhead on UP systems. They may, however, be | 1008 | barriers unnecessarily impose overhead on UP systems. They may, however, be |
1009 | used to control MMIO effects on accesses through relaxed memory I/O windows. | 1009 | used to control MMIO effects on accesses through relaxed memory I/O windows. |
1010 | These are required even on non-SMP systems as they affect the order in which | 1010 | These are required even on non-SMP systems as they affect the order in which |
1011 | memory operations appear to a device by prohibiting both the compiler and the | 1011 | memory operations appear to a device by prohibiting both the compiler and the |
1012 | CPU from reordering them. | 1012 | CPU from reordering them. |
1013 | 1013 | ||
1014 | 1014 | ||
1015 | There are some more advanced barrier functions: | 1015 | There are some more advanced barrier functions: |
1016 | 1016 | ||
1017 | (*) set_mb(var, value) | 1017 | (*) set_mb(var, value) |
1018 | 1018 | ||
1019 | This assigns the value to the variable and then inserts at least a write | 1019 | This assigns the value to the variable and then inserts at least a write |
1020 | barrier after it, depending on the function. It isn't guaranteed to | 1020 | barrier after it, depending on the function. It isn't guaranteed to |
1021 | insert anything more than a compiler barrier in a UP compilation. | 1021 | insert anything more than a compiler barrier in a UP compilation. |
1022 | 1022 | ||
1023 | 1023 | ||
1024 | (*) smp_mb__before_atomic_dec(); | 1024 | (*) smp_mb__before_atomic_dec(); |
1025 | (*) smp_mb__after_atomic_dec(); | 1025 | (*) smp_mb__after_atomic_dec(); |
1026 | (*) smp_mb__before_atomic_inc(); | 1026 | (*) smp_mb__before_atomic_inc(); |
1027 | (*) smp_mb__after_atomic_inc(); | 1027 | (*) smp_mb__after_atomic_inc(); |
1028 | 1028 | ||
1029 | These are for use with atomic add, subtract, increment and decrement | 1029 | These are for use with atomic add, subtract, increment and decrement |
1030 | functions that don't return a value, especially when used for reference | 1030 | functions that don't return a value, especially when used for reference |
1031 | counting. These functions do not imply memory barriers. | 1031 | counting. These functions do not imply memory barriers. |
1032 | 1032 | ||
1033 | As an example, consider a piece of code that marks an object as being dead | 1033 | As an example, consider a piece of code that marks an object as being dead |
1034 | and then decrements the object's reference count: | 1034 | and then decrements the object's reference count: |
1035 | 1035 | ||
1036 | obj->dead = 1; | 1036 | obj->dead = 1; |
1037 | smp_mb__before_atomic_dec(); | 1037 | smp_mb__before_atomic_dec(); |
1038 | atomic_dec(&obj->ref_count); | 1038 | atomic_dec(&obj->ref_count); |
1039 | 1039 | ||
1040 | This makes sure that the death mark on the object is perceived to be set | 1040 | This makes sure that the death mark on the object is perceived to be set |
1041 | *before* the reference counter is decremented. | 1041 | *before* the reference counter is decremented. |
1042 | 1042 | ||
1043 | See Documentation/atomic_ops.txt for more information. See the "Atomic | 1043 | See Documentation/atomic_ops.txt for more information. See the "Atomic |
1044 | operations" subsection for information on where to use these. | 1044 | operations" subsection for information on where to use these. |
1045 | 1045 | ||
1046 | 1046 | ||
1047 | (*) smp_mb__before_clear_bit(void); | 1047 | (*) smp_mb__before_clear_bit(void); |
1048 | (*) smp_mb__after_clear_bit(void); | 1048 | (*) smp_mb__after_clear_bit(void); |
1049 | 1049 | ||
1050 | These are used in a similar way to the atomic inc/dec barriers. These are | 1050 | These are used in a similar way to the atomic inc/dec barriers. These are |
1051 | typically used for bitwise unlocking operations, so care must be taken as | 1051 | typically used for bitwise unlocking operations, so care must be taken as |
1052 | there are no implicit memory barriers here either. | 1052 | there are no implicit memory barriers here either. |
1053 | 1053 | ||
1054 | Consider implementing an unlock operation of some nature by clearing a | 1054 | Consider implementing an unlock operation of some nature by clearing a |
1055 | locking bit. The clear_bit() would then need to be barriered like this: | 1055 | locking bit. The clear_bit() would then need to be barriered like this: |
1056 | 1056 | ||
1057 | smp_mb__before_clear_bit(); | 1057 | smp_mb__before_clear_bit(); |
1058 | clear_bit( ... ); | 1058 | clear_bit( ... ); |
1059 | 1059 | ||
1060 | This prevents memory operations before the clear leaking to after it. See | 1060 | This prevents memory operations before the clear leaking to after it. See |
1061 | the subsection on "Locking Functions" with reference to UNLOCK operation | 1061 | the subsection on "Locking Functions" with reference to UNLOCK operation |
1062 | implications. | 1062 | implications. |
1063 | 1063 | ||
1064 | See Documentation/atomic_ops.txt for more information. See the "Atomic | 1064 | See Documentation/atomic_ops.txt for more information. See the "Atomic |
1065 | operations" subsection for information on where to use these. | 1065 | operations" subsection for information on where to use these. |
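The two patterns above (marking an object dead before dropping a reference, and clearing a lock bit) can be approximated in userspace with C11 atomics. This is only an analogue, not the kernel API: a sequentially consistent atomic_thread_fence() is assumed to play the role of smp_mb__before_atomic_dec() and smp_mb__before_clear_bit(), and struct object, object_kill() and bit_unlock() are hypothetical names. Unlike the kernel's atomic_dec(), object_kill() returns the new count, purely so that the result can be inspected.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct object {
	atomic_bool dead;
	atomic_int ref_count;
};

/* Mark the object dead, then drop one reference.
 * Returns the reference count after the decrement. */
int object_kill(struct object *obj)
{
	atomic_store_explicit(&obj->dead, true, memory_order_relaxed);
	/* Full fence: the death mark is ordered before the decrement. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_fetch_sub_explicit(&obj->ref_count, 1,
					 memory_order_relaxed) - 1;
}

/* Release a bit lock by clearing the given bit in *word. */
void bit_unlock(atomic_uint *word, unsigned int bit)
{
	/* Full fence: accesses made while holding the bit lock are
	 * ordered before the clear. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_and_explicit(word, ~(1u << bit), memory_order_relaxed);
}
```

The relaxed atomic operations here imply no ordering of their own, which mirrors the point made above that the kernel operations in question do not imply memory barriers.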
1066 | 1066 | ||
1067 | 1067 | ||
1068 | MMIO WRITE BARRIER | 1068 | MMIO WRITE BARRIER |
1069 | ------------------ | 1069 | ------------------ |
1070 | 1070 | ||
1071 | The Linux kernel also has a special barrier for use with memory-mapped I/O | 1071 | The Linux kernel also has a special barrier for use with memory-mapped I/O |
1072 | writes: | 1072 | writes: |
1073 | 1073 | ||
1074 | mmiowb(); | 1074 | mmiowb(); |
1075 | 1075 | ||
1076 | This is a variation on the mandatory write barrier that causes writes to weakly | 1076 | This is a variation on the mandatory write barrier that causes writes to weakly |
1077 | ordered I/O regions to be partially ordered. Its effects may go beyond the | 1077 | ordered I/O regions to be partially ordered. Its effects may go beyond the |
1078 | CPU->Hardware interface and actually affect the hardware at some level. | 1078 | CPU->Hardware interface and actually affect the hardware at some level. |
1079 | 1079 | ||
1080 | See the subsection "Locks vs I/O accesses" for more information. | 1080 | See the subsection "Locks vs I/O accesses" for more information. |
1081 | 1081 | ||
1082 | 1082 | ||
1083 | =============================== | 1083 | =============================== |
1084 | IMPLICIT KERNEL MEMORY BARRIERS | 1084 | IMPLICIT KERNEL MEMORY BARRIERS |
1085 | =============================== | 1085 | =============================== |
1086 | 1086 | ||
1087 | Some of the other functions in the Linux kernel imply memory barriers, amongst | 1087 | Some of the other functions in the Linux kernel imply memory barriers, amongst |
1088 | which are locking and scheduling functions. | 1088 | which are locking and scheduling functions. |
1089 | 1089 | ||
1090 | This specification is a _minimum_ guarantee; any particular architecture may | 1090 | This specification is a _minimum_ guarantee; any particular architecture may |
1091 | provide more substantial guarantees, but these may not be relied upon outside | 1091 | provide more substantial guarantees, but these may not be relied upon outside |
1092 | of arch specific code. | 1092 | of arch specific code. |
1093 | 1093 | ||
1094 | 1094 | ||
1095 | LOCKING FUNCTIONS | 1095 | LOCKING FUNCTIONS |
1096 | ----------------- | 1096 | ----------------- |
1097 | 1097 | ||
1098 | The Linux kernel has a number of locking constructs: | 1098 | The Linux kernel has a number of locking constructs: |
1099 | 1099 | ||
1100 | (*) spin locks | 1100 | (*) spin locks |
1101 | (*) R/W spin locks | 1101 | (*) R/W spin locks |
1102 | (*) mutexes | 1102 | (*) mutexes |
1103 | (*) semaphores | 1103 | (*) semaphores |
1104 | (*) R/W semaphores | 1104 | (*) R/W semaphores |
1105 | (*) RCU | 1105 | (*) RCU |
1106 | 1106 | ||
1107 | In all cases there are variants on "LOCK" operations and "UNLOCK" operations | 1107 | In all cases there are variants on "LOCK" operations and "UNLOCK" operations |
1108 | for each construct. These operations all imply certain barriers: | 1108 | for each construct. These operations all imply certain barriers: |
1109 | 1109 | ||
1110 | (1) LOCK operation implication: | 1110 | (1) LOCK operation implication: |
1111 | 1111 | ||
1112 | Memory operations issued after the LOCK will be completed after the LOCK | 1112 | Memory operations issued after the LOCK will be completed after the LOCK |
1113 | operation has completed. | 1113 | operation has completed. |
1114 | 1114 | ||
1115 | Memory operations issued before the LOCK may be completed after the LOCK | 1115 | Memory operations issued before the LOCK may be completed after the LOCK |
1116 | operation has completed. | 1116 | operation has completed. |
1117 | 1117 | ||
1118 | (2) UNLOCK operation implication: | 1118 | (2) UNLOCK operation implication: |
1119 | 1119 | ||
1120 | Memory operations issued before the UNLOCK will be completed before the | 1120 | Memory operations issued before the UNLOCK will be completed before the |
1121 | UNLOCK operation has completed. | 1121 | UNLOCK operation has completed. |
1122 | 1122 | ||
1123 | Memory operations issued after the UNLOCK may be completed before the | 1123 | Memory operations issued after the UNLOCK may be completed before the |
1124 | UNLOCK operation has completed. | 1124 | UNLOCK operation has completed. |
1125 | 1125 | ||
1126 | (3) LOCK vs LOCK implication: | 1126 | (3) LOCK vs LOCK implication: |
1127 | 1127 | ||
1128 | All LOCK operations issued before another LOCK operation will be completed | 1128 | All LOCK operations issued before another LOCK operation will be completed |
1129 | before that LOCK operation. | 1129 | before that LOCK operation. |
1130 | 1130 | ||
1131 | (4) LOCK vs UNLOCK implication: | 1131 | (4) LOCK vs UNLOCK implication: |
1132 | 1132 | ||
1133 | All LOCK operations issued before an UNLOCK operation will be completed | 1133 | All LOCK operations issued before an UNLOCK operation will be completed |
1134 | before the UNLOCK operation. | 1134 | before the UNLOCK operation. |
1135 | 1135 | ||
1136 | All UNLOCK operations issued before a LOCK operation will be completed | 1136 | All UNLOCK operations issued before a LOCK operation will be completed |
1137 | before the LOCK operation. | 1137 | before the LOCK operation. |
1138 | 1138 | ||
1139 | (5) Failed conditional LOCK implication: | 1139 | (5) Failed conditional LOCK implication: |
1140 | 1140 | ||
1141 | Certain variants of the LOCK operation may fail, either due to being | 1141 | Certain variants of the LOCK operation may fail, either due to being |
1142 | unable to get the lock immediately, or due to receiving an unblocked | 1142 | unable to get the lock immediately, or due to receiving an unblocked |
1143 | signal whilst asleep waiting for the lock to become available. Failed | 1143 | signal whilst asleep waiting for the lock to become available. Failed |
1144 | locks do not imply any sort of barrier. | 1144 | locks do not imply any sort of barrier. |
1145 | 1145 | ||
1146 | Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is | 1146 | Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is |
1147 | equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. | 1147 | equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. |
1148 | 1148 | ||
1149 | [!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way | 1149 | [!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way |
1150 | barriers is that the effects of instructions outside of a critical section may | 1150 | barriers is that the effects of instructions outside of a critical section may |
1151 | seep into the inside of the critical section. | 1151 | seep into the inside of the critical section. |
1152 | 1152 | ||
1153 | A LOCK followed by an UNLOCK may not be assumed to be a full memory barrier | 1153 | A LOCK followed by an UNLOCK may not be assumed to be a full memory barrier |
1154 | because it is possible for an access preceding the LOCK to happen after the | 1154 | because it is possible for an access preceding the LOCK to happen after the |
1155 | LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the | 1155 | LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the |
1156 | two accesses can themselves then cross: | 1156 | two accesses can themselves then cross: |
1157 | 1157 | ||
1158 | *A = a; | 1158 | *A = a; |
1159 | LOCK | 1159 | LOCK |
1160 | UNLOCK | 1160 | UNLOCK |
1161 | *B = b; | 1161 | *B = b; |
1162 | 1162 | ||
1163 | may occur as: | 1163 | may occur as: |
1164 | 1164 | ||
1165 | LOCK, STORE *B, STORE *A, UNLOCK | 1165 | LOCK, STORE *B, STORE *A, UNLOCK |
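The one-way behaviour of LOCK and UNLOCK described above corresponds roughly to acquire and release ordering in C11. The toy spinlock below is a userspace sketch under that assumption; it is not how any kernel lock is implemented, and struct toy_spinlock and the toy_spin_*() functions are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical test-and-set spinlock, for illustration only. */
struct toy_spinlock {
	atomic_flag held;
};

/* LOCK: acquire ordering. Accesses after the lock cannot move before
 * it, but accesses before the lock may still move after it. */
void toy_spin_lock(struct toy_spinlock *lock)
{
	while (atomic_flag_test_and_set_explicit(&lock->held,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

/* Conditional LOCK: returns 1 if the lock was taken, 0 otherwise. */
int toy_spin_trylock(struct toy_spinlock *lock)
{
	return !atomic_flag_test_and_set_explicit(&lock->held,
						  memory_order_acquire);
}

/* UNLOCK: release ordering. Accesses before the unlock cannot move
 * after it, but accesses after the unlock may still move before it. */
void toy_spin_unlock(struct toy_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

With a lock of this shape, a store placed before toy_spin_lock() and a store placed after toy_spin_unlock() are both free to drift into the critical section and cross there, which is the reordering shown above.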
1166 | 1166 | ||
1167 | Locks and semaphores may not provide any guarantee of ordering on UP compiled | 1167 | Locks and semaphores may not provide any guarantee of ordering on UP compiled |
1168 | systems, and so cannot be counted on in such a situation to actually achieve | 1168 | systems, and so cannot be counted on in such a situation to actually achieve |
1169 | anything at all - especially with respect to I/O accesses - unless combined | 1169 | anything at all - especially with respect to I/O accesses - unless combined |
1170 | with interrupt disabling operations. | 1170 | with interrupt disabling operations. |
1171 | 1171 | ||
1172 | See also the section on "Inter-CPU locking barrier effects". | 1172 | See also the section on "Inter-CPU locking barrier effects". |
1173 | 1173 | ||
1174 | 1174 | ||
1175 | As an example, consider the following: | 1175 | As an example, consider the following: |
1176 | 1176 | ||
1177 | *A = a; | 1177 | *A = a; |
1178 | *B = b; | 1178 | *B = b; |
1179 | LOCK | 1179 | LOCK |
1180 | *C = c; | 1180 | *C = c; |
1181 | *D = d; | 1181 | *D = d; |
1182 | UNLOCK | 1182 | UNLOCK |
1183 | *E = e; | 1183 | *E = e; |
1184 | *F = f; | 1184 | *F = f; |
1185 | 1185 | ||
1186 | The following sequence of events is acceptable: | 1186 | The following sequence of events is acceptable: |
1187 | 1187 | ||
1188 | LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK | 1188 | LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK |
1189 | 1189 | ||
1190 | [+] Note that {*F,*A} indicates a combined access. | 1190 | [+] Note that {*F,*A} indicates a combined access. |
1191 | 1191 | ||
1192 | But none of the following are: | 1192 | But none of the following are: |
1193 | 1193 | ||
1194 | {*F,*A}, *B, LOCK, *C, *D, UNLOCK, *E | 1194 | {*F,*A}, *B, LOCK, *C, *D, UNLOCK, *E |
1195 | *A, *B, *C, LOCK, *D, UNLOCK, *E, *F | 1195 | *A, *B, *C, LOCK, *D, UNLOCK, *E, *F |
1196 | *A, *B, LOCK, *C, UNLOCK, *D, *E, *F | 1196 | *A, *B, LOCK, *C, UNLOCK, *D, *E, *F |
1197 | *B, LOCK, *C, *D, UNLOCK, {*F,*A}, *E | 1197 | *B, LOCK, *C, *D, UNLOCK, {*F,*A}, *E |
1198 | 1198 | ||
1199 | 1199 | ||
1200 | 1200 | ||
1201 | INTERRUPT DISABLING FUNCTIONS | 1201 | INTERRUPT DISABLING FUNCTIONS |
1202 | ----------------------------- | 1202 | ----------------------------- |
1203 | 1203 | ||
1204 | Functions that disable interrupts (LOCK equivalent) and enable interrupts | 1204 | Functions that disable interrupts (LOCK equivalent) and enable interrupts |
1205 | (UNLOCK equivalent) will act as compiler barriers only. So if memory or I/O | 1205 | (UNLOCK equivalent) will act as compiler barriers only. So if memory or I/O |
1206 | barriers are required in such a situation, they must be provided from some | 1206 | barriers are required in such a situation, they must be provided from some |
1207 | other means. | 1207 | other means. |
1208 | 1208 | ||
1209 | 1209 | ||
1210 | MISCELLANEOUS FUNCTIONS | 1210 | MISCELLANEOUS FUNCTIONS |
1211 | ----------------------- | 1211 | ----------------------- |
1212 | 1212 | ||
1213 | Other functions that imply barriers: | 1213 | Other functions that imply barriers: |
1214 | 1214 | ||
1215 | (*) schedule() and similar imply full memory barriers. | 1215 | (*) schedule() and similar imply full memory barriers. |
1216 | 1216 | ||
1217 | 1217 | ||
1218 | ================================= | 1218 | ================================= |
1219 | INTER-CPU LOCKING BARRIER EFFECTS | 1219 | INTER-CPU LOCKING BARRIER EFFECTS |
1220 | ================================= | 1220 | ================================= |
1221 | 1221 | ||
1222 | On SMP systems locking primitives give a more substantial form of barrier: one | 1222 | On SMP systems locking primitives give a more substantial form of barrier: one |
1223 | that does affect memory access ordering on other CPUs, within the context of | 1223 | that does affect memory access ordering on other CPUs, within the context of |
1224 | conflict on any particular lock. | 1224 | conflict on any particular lock. |
1225 | 1225 | ||
1226 | 1226 | ||
1227 | LOCKS VS MEMORY ACCESSES | 1227 | LOCKS VS MEMORY ACCESSES |
1228 | ------------------------ | 1228 | ------------------------ |
1229 | 1229 | ||
1230 | Consider the following: the system has a pair of spinlocks (M) and (Q), and | 1230 | Consider the following: the system has a pair of spinlocks (M) and (Q), and |
1231 | three CPUs; then should the following sequence of events occur: | 1231 | three CPUs; then should the following sequence of events occur: |
1232 | 1232 | ||
1233 | CPU 1 CPU 2 | 1233 | CPU 1 CPU 2 |
1234 | =============================== =============================== | 1234 | =============================== =============================== |
1235 | *A = a; *E = e; | 1235 | *A = a; *E = e; |
1236 | LOCK M LOCK Q | 1236 | LOCK M LOCK Q |
1237 | *B = b; *F = f; | 1237 | *B = b; *F = f; |
1238 | *C = c; *G = g; | 1238 | *C = c; *G = g; |
1239 | UNLOCK M UNLOCK Q | 1239 | UNLOCK M UNLOCK Q |
1240 | *D = d; *H = h; | 1240 | *D = d; *H = h; |
1241 | 1241 | ||
1242 | Then there is no guarantee as to what order CPU #3 will see the accesses to *A | 1242 | Then there is no guarantee as to what order CPU #3 will see the accesses to *A |
1243 | through *H occur in, other than the constraints imposed by the separate locks | 1243 | through *H occur in, other than the constraints imposed by the separate locks |
1244 | on the separate CPUs. It might, for example, see: | 1244 | on the separate CPUs. It might, for example, see: |
1245 | 1245 | ||
1246 | *E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M | 1246 | *E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M |
1247 | 1247 | ||
1248 | But it won't see any of: | 1248 | But it won't see any of: |
1249 | 1249 | ||
1250 | *B, *C or *D preceding LOCK M | 1250 | *B, *C or *D preceding LOCK M |
1251 | *A, *B or *C following UNLOCK M | 1251 | *A, *B or *C following UNLOCK M |
1252 | *F, *G or *H preceding LOCK Q | 1252 | *F, *G or *H preceding LOCK Q |
1253 | *E, *F or *G following UNLOCK Q | 1253 | *E, *F or *G following UNLOCK Q |
1254 | 1254 | ||
1255 | 1255 | ||
1256 | However, if the following occurs: | 1256 | However, if the following occurs: |
1257 | 1257 | ||
1258 | CPU 1                           CPU 2 | 1258 | CPU 1                           CPU 2 |
1259 | =============================== =============================== | 1259 | =============================== =============================== |
1260 | *A = a; | 1260 | *A = a; |
1261 | LOCK M [1] | 1261 | LOCK M [1] |
1262 | *B = b; | 1262 | *B = b; |
1263 | *C = c; | 1263 | *C = c; |
1264 | UNLOCK M [1] | 1264 | UNLOCK M [1] |
1265 | *D = d;                         *E = e; | 1265 | *D = d;                         *E = e; |
1266 |                                 LOCK M [2] | 1266 |                                 LOCK M [2] |
1267 |                                 *F = f; | 1267 |                                 *F = f; |
1268 |                                 *G = g; | 1268 |                                 *G = g; |
1269 |                                 UNLOCK M [2] | 1269 |                                 UNLOCK M [2] |
1270 |                                 *H = h; | 1270 |                                 *H = h; |
1271 | 1271 | ||
1272 | CPU #3 might see: | 1272 | CPU #3 might see: |
1273 | 1273 | ||
1274 | *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], | 1274 | *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], |
1275 | LOCK M [2], *H, *F, *G, UNLOCK M [2], *D | 1275 | LOCK M [2], *H, *F, *G, UNLOCK M [2], *D |
1276 | 1276 | ||
1277 | But assuming CPU #1 gets the lock first, it won't see any of: | 1277 | But assuming CPU #1 gets the lock first, it won't see any of: |
1278 | 1278 | ||
1279 | *B, *C, *D, *F, *G or *H preceding LOCK M [1] | 1279 | *B, *C, *D, *F, *G or *H preceding LOCK M [1] |
1280 | *A, *B or *C following UNLOCK M [1] | 1280 | *A, *B or *C following UNLOCK M [1] |
1281 | *F, *G or *H preceding LOCK M [2] | 1281 | *F, *G or *H preceding LOCK M [2] |
1282 | *A, *B, *C, *E, *F or *G following UNLOCK M [2] | 1282 | *A, *B, *C, *E, *F or *G following UNLOCK M [2] |
1283 | 1283 | ||
1284 | 1284 | ||
1285 | LOCKS VS I/O ACCESSES | 1285 | LOCKS VS I/O ACCESSES |
1286 | --------------------- | 1286 | --------------------- |
1287 | 1287 | ||
1288 | Under certain circumstances (especially involving NUMA), I/O accesses within | 1288 | Under certain circumstances (especially involving NUMA), I/O accesses within |
1289 | two spinlocked sections on two different CPUs may be seen as interleaved by the | 1289 | two spinlocked sections on two different CPUs may be seen as interleaved by the |
1290 | PCI bridge, because the PCI bridge does not necessarily participate in the | 1290 | PCI bridge, because the PCI bridge does not necessarily participate in the |
1291 | cache-coherence protocol, and is therefore incapable of issuing the required | 1291 | cache-coherence protocol, and is therefore incapable of issuing the required |
1292 | read memory barriers. | 1292 | read memory barriers. |
1293 | 1293 | ||
1294 | For example: | 1294 | For example: |
1295 | 1295 | ||
1296 | CPU 1                           CPU 2 | 1296 | CPU 1                           CPU 2 |
1297 | =============================== =============================== | 1297 | =============================== =============================== |
1298 | spin_lock(Q); | 1298 | spin_lock(Q); |
1299 | writel(0, ADDR); | 1299 | writel(0, ADDR); |
1300 | writel(1, DATA); | 1300 | writel(1, DATA); |
1301 | spin_unlock(Q); | 1301 | spin_unlock(Q); |
1302 |                                 spin_lock(Q); | 1302 |                                 spin_lock(Q); |
1303 |                                 writel(4, ADDR); | 1303 |                                 writel(4, ADDR); |
1304 |                                 writel(5, DATA); | 1304 |                                 writel(5, DATA); |
1305 |                                 spin_unlock(Q); | 1305 |                                 spin_unlock(Q); |
1306 | 1306 | ||
1307 | may be seen by the PCI bridge as follows: | 1307 | may be seen by the PCI bridge as follows: |
1308 | 1308 | ||
1309 | STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5 | 1309 | STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5 |
1310 | 1310 | ||
1311 | which would probably cause the hardware to malfunction. | 1311 | which would probably cause the hardware to malfunction. |
1312 | 1312 | ||
1313 | 1313 | ||
1314 | What is necessary here is to intervene with an mmiowb() before dropping the | 1314 | What is necessary here is to intervene with an mmiowb() before dropping the |
1315 | spinlock, for example: | 1315 | spinlock, for example: |
1316 | 1316 | ||
1317 | CPU 1                           CPU 2 | 1317 | CPU 1                           CPU 2 |
1318 | =============================== =============================== | 1318 | =============================== =============================== |
1319 | spin_lock(Q); | 1319 | spin_lock(Q); |
1320 | writel(0, ADDR); | 1320 | writel(0, ADDR); |
1321 | writel(1, DATA); | 1321 | writel(1, DATA); |
1322 | mmiowb(); | 1322 | mmiowb(); |
1323 | spin_unlock(Q); | 1323 | spin_unlock(Q); |
1324 |                                 spin_lock(Q); | 1324 |                                 spin_lock(Q); |
1325 |                                 writel(4, ADDR); | 1325 |                                 writel(4, ADDR); |
1326 |                                 writel(5, DATA); | 1326 |                                 writel(5, DATA); |
1327 |                                 mmiowb(); | 1327 |                                 mmiowb(); |
1328 |                                 spin_unlock(Q); | 1328 |                                 spin_unlock(Q); |
1329 | 1329 | ||
1330 | this will ensure that the two stores issued on CPU #1 appear at the PCI bridge | 1330 | this will ensure that the two stores issued on CPU #1 appear at the PCI bridge |
1331 | before either of the stores issued on CPU #2. | 1331 | before either of the stores issued on CPU #2. |
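
The effect of mmiowb() in the example above can be illustrated with a toy
userspace model. This is only a sketch under loud assumptions: struct cpu,
struct bridge, model_writel() and model_mmiowb() are hypothetical names invented
here, and real hardware does not expose a per-CPU posted-write queue like this.
The model only captures one idea: a store is merely posted by writel() and is
pushed to the bridge, in issue order, when mmiowb() is called.

```c
#include <assert.h>
#include <stddef.h>

enum reg { REG_ADDR, REG_DATA };

struct store {
	enum reg reg;
	int val;
};

#define MAX_STORES 8

/* Toy model only: stores a CPU has issued but the bridge has not yet seen. */
struct cpu {
	struct store posted[MAX_STORES];
	size_t n;
};

/* Toy model only: the order in which stores arrive at the bridge. */
struct bridge {
	struct store seen[MAX_STORES];
	size_t n;
};

/* Models writel(): the store is posted, not yet visible at the bridge. */
void model_writel(struct cpu *c, enum reg reg, int val)
{
	assert(c->n < MAX_STORES);
	c->posted[c->n].reg = reg;
	c->posted[c->n].val = val;
	c->n++;
}

/* Models mmiowb(): every store posted so far reaches the bridge, in order. */
void model_mmiowb(struct cpu *c, struct bridge *b)
{
	for (size_t i = 0; i < c->n; i++) {
		assert(b->n < MAX_STORES);
		b->seen[b->n++] = c->posted[i];
	}
	c->n = 0;
}
```

In this model, if each CPU calls model_mmiowb() before it would drop the lock,
both of CPU 1's stores are recorded at the bridge before either of CPU 2's,
which is the ordering the text above asks for.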
1332 | 1332 | ||
1333 | 1333 | ||
1334 | Furthermore, following a store by a load to the same device obviates the need | 1334 | Furthermore, following a store by a load to the same device obviates the need |
1335 | for an mmiowb(), because the load forces the store to complete before the load | 1335 | for an mmiowb(), because the load forces the store to complete before the load |
1336 | is performed: | 1336 | is performed: |
1337 | 1337 | ||
1338 | CPU 1 CPU 2 | 1338 | CPU 1 CPU 2 |
1339 | =============================== =============================== | 1339 | =============================== =============================== |
1340 | spin_lock(Q); | 1340 | spin_lock(Q); |
1341 | writel(0, ADDR); | 1341 | writel(0, ADDR); |
1342 | a = readl(DATA); | 1342 | a = readl(DATA); |
1343 | spin_unlock(Q); | 1343 | spin_unlock(Q); |
1344 | spin_lock(Q); | 1344 | spin_lock(Q); |
1345 | writel(4, ADDR); | 1345 | writel(4, ADDR); |
1346 | b = readl(DATA); | 1346 | b = readl(DATA); |
1347 | spin_unlock(Q); | 1347 | spin_unlock(Q); |
1348 | 1348 | ||
1349 | 1349 | ||
1350 | See Documentation/DocBook/deviceiobook.tmpl for more information. | 1350 | See Documentation/DocBook/deviceiobook.tmpl for more information. |
1351 | 1351 | ||
1352 | 1352 | ||
1353 | ================================= | 1353 | ================================= |
1354 | WHERE ARE MEMORY BARRIERS NEEDED? | 1354 | WHERE ARE MEMORY BARRIERS NEEDED? |
1355 | ================================= | 1355 | ================================= |
1356 | 1356 | ||
1357 | Under normal operation, memory operation reordering is generally not going to | 1357 | Under normal operation, memory operation reordering is generally not going to |
1358 | be a problem as a single-threaded linear piece of code will still appear to | 1358 | be a problem as a single-threaded linear piece of code will still appear to |
1359 | work correctly, even if it's in an SMP kernel. There are, however, four | 1359 | work correctly, even if it's in an SMP kernel. There are, however, four |
1360 | circumstances in which reordering definitely _could_ be a problem: | 1360 | circumstances in which reordering definitely _could_ be a problem: |
1361 | 1361 | ||
1362 | (*) Interprocessor interaction. | 1362 | (*) Interprocessor interaction. |
1363 | 1363 | ||
1364 | (*) Atomic operations. | 1364 | (*) Atomic operations. |
1365 | 1365 | ||
1366 | (*) Accessing devices (I/O). | 1366 | (*) Accessing devices (I/O). |
1367 | 1367 | ||
1368 | (*) Interrupts. | 1368 | (*) Interrupts. |
1369 | 1369 | ||
1370 | 1370 | ||
1371 | INTERPROCESSOR INTERACTION | 1371 | INTERPROCESSOR INTERACTION |
1372 | -------------------------- | 1372 | -------------------------- |
1373 | 1373 | ||
1374 | When there's a system with more than one processor, more than one CPU in the | 1374 | When there's a system with more than one processor, more than one CPU in the |
1375 | system may be working on the same data set at the same time. This can cause | 1375 | system may be working on the same data set at the same time. This can cause |
1376 | synchronisation problems, and the usual way of dealing with them is to use | 1376 | synchronisation problems, and the usual way of dealing with them is to use |
1377 | locks. Locks, however, are quite expensive, and so it may be preferable to | 1377 | locks. Locks, however, are quite expensive, and so it may be preferable to |
1378 | operate without the use of a lock if at all possible. In such a case | 1378 | operate without the use of a lock if at all possible. In such a case |
1379 | operations that affect both CPUs may have to be carefully ordered to prevent | 1379 | operations that affect both CPUs may have to be carefully ordered to prevent |
1380 | a malfunction. | 1380 | a malfunction. |
1381 | 1381 | ||
1382 | Consider, for example, the R/W semaphore slow path. Here a waiting process is | 1382 | Consider, for example, the R/W semaphore slow path. Here a waiting process is |
1383 | queued on the semaphore, by virtue of it having a piece of its stack linked to | 1383 | queued on the semaphore, by virtue of it having a piece of its stack linked to |
1384 | the semaphore's list of waiting processes: | 1384 | the semaphore's list of waiting processes: |
1385 | 1385 | ||
1386 | struct rw_semaphore { | 1386 | struct rw_semaphore { |
1387 | ... | 1387 | ... |
1388 | spinlock_t lock; | 1388 | spinlock_t lock; |
1389 | struct list_head waiters; | 1389 | struct list_head waiters; |
1390 | }; | 1390 | }; |
1391 | 1391 | ||
1392 | struct rwsem_waiter { | 1392 | struct rwsem_waiter { |
1393 | struct list_head list; | 1393 | struct list_head list; |
1394 | struct task_struct *task; | 1394 | struct task_struct *task; |
1395 | }; | 1395 | }; |
1396 | 1396 | ||
1397 | To wake up a particular waiter, the up_read() or up_write() functions have to: | 1397 | To wake up a particular waiter, the up_read() or up_write() functions have to: |
1398 | 1398 | ||
1399 | (1) read the next pointer from this waiter's record to find out where the | 1399 | (1) read the next pointer from this waiter's record to find out where the |
1400 | next waiter record is; | 1400 | next waiter record is; |
1401 | 1401 | ||
1402 | (2) read the pointer to the waiter's task structure; | 1402 | (2) read the pointer to the waiter's task structure; |
1403 | 1403 | ||
1404 | (3) clear the task pointer to tell the waiter it has been given the semaphore; | 1404 | (3) clear the task pointer to tell the waiter it has been given the semaphore; |
1405 | 1405 | ||
1406 | (4) call wake_up_process() on the task; and | 1406 | (4) call wake_up_process() on the task; and |
1407 | 1407 | ||
1408 | (5) release the reference held on the waiter's task struct. | 1408 | (5) release the reference held on the waiter's task struct. |
1409 | 1409 | ||
1410 | In other words, it has to perform this sequence of events: | 1410 | In other words, it has to perform this sequence of events: |
1411 | 1411 | ||
1412 | LOAD waiter->list.next; | 1412 | LOAD waiter->list.next; |
1413 | LOAD waiter->task; | 1413 | LOAD waiter->task; |
1414 | STORE waiter->task; | 1414 | STORE waiter->task; |
1415 | CALL wakeup | 1415 | CALL wakeup |
1416 | RELEASE task | 1416 | RELEASE task |
1417 | 1417 | ||
1418 | and if any of these steps occur out of order, then the whole thing may | 1418 | and if any of these steps occur out of order, then the whole thing may |
1419 | malfunction. | 1419 | malfunction. |
1420 | 1420 | ||
1421 | Once it has queued itself and dropped the semaphore lock, the waiter does not | 1421 | Once it has queued itself and dropped the semaphore lock, the waiter does not |
1422 | get the lock again; it instead just waits for its task pointer to be cleared | 1422 | get the lock again; it instead just waits for its task pointer to be cleared |
1423 | before proceeding. Since the record is on the waiter's stack, this means that | 1423 | before proceeding. Since the record is on the waiter's stack, this means that |
1424 | if the task pointer is cleared _before_ the next pointer in the list is read, | 1424 | if the task pointer is cleared _before_ the next pointer in the list is read, |
1425 | another CPU might start processing the waiter and might clobber the waiter's | 1425 | another CPU might start processing the waiter and might clobber the waiter's |
1426 | stack before the up*() function has a chance to read the next pointer. | 1426 | stack before the up*() function has a chance to read the next pointer. |
1427 | 1427 | ||
1428 | Consider then what might happen to the above sequence of events: | 1428 | Consider then what might happen to the above sequence of events: |
1429 | 1429 | ||
1430 | CPU 1 CPU 2 | 1430 | CPU 1 CPU 2 |
1431 | =============================== =============================== | 1431 | =============================== =============================== |
1432 | down_xxx() | 1432 | down_xxx() |
1433 | Queue waiter | 1433 | Queue waiter |
1434 | Sleep | 1434 | Sleep |
1435 | up_yyy() | 1435 | up_yyy() |
1436 | LOAD waiter->task; | 1436 | LOAD waiter->task; |
1437 | STORE waiter->task; | 1437 | STORE waiter->task; |
1438 | Woken up by other event | 1438 | Woken up by other event |
1439 | <preempt> | 1439 | <preempt> |
1440 | Resume processing | 1440 | Resume processing |
1441 | down_xxx() returns | 1441 | down_xxx() returns |
1442 | call foo() | 1442 | call foo() |
1443 | foo() clobbers *waiter | 1443 | foo() clobbers *waiter |
1444 | </preempt> | 1444 | </preempt> |
1445 | LOAD waiter->list.next; | 1445 | LOAD waiter->list.next; |
1446 | --- OOPS --- | 1446 | --- OOPS --- |
1447 | 1447 | ||
1448 | This could be dealt with using the semaphore lock, but then the down_xxx() | 1448 | This could be dealt with using the semaphore lock, but then the down_xxx() |
1449 | function has to needlessly get the spinlock again after being woken up. | 1449 | function has to needlessly get the spinlock again after being woken up. |
1450 | 1450 | ||
1451 | The way to deal with this is to insert a general SMP memory barrier: | 1451 | The way to deal with this is to insert a general SMP memory barrier: |
1452 | 1452 | ||
1453 | LOAD waiter->list.next; | 1453 | LOAD waiter->list.next; |
1454 | LOAD waiter->task; | 1454 | LOAD waiter->task; |
1455 | smp_mb(); | 1455 | smp_mb(); |
1456 | STORE waiter->task; | 1456 | STORE waiter->task; |
1457 | CALL wakeup | 1457 | CALL wakeup |
1458 | RELEASE task | 1458 | RELEASE task |
1459 | 1459 | ||
1460 | In this case, the barrier makes a guarantee that all memory accesses before the | 1460 | In this case, the barrier makes a guarantee that all memory accesses before the |
1461 | barrier will appear to happen before all the memory accesses after the barrier | 1461 | barrier will appear to happen before all the memory accesses after the barrier |
1462 | with respect to the other CPUs on the system. It does _not_ guarantee that all | 1462 | with respect to the other CPUs on the system. It does _not_ guarantee that all |
1463 | the memory accesses before the barrier will be complete by the time the barrier | 1463 | the memory accesses before the barrier will be complete by the time the barrier |
1464 | instruction itself is complete. | 1464 | instruction itself is complete. |
1465 | 1465 | ||
1466 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a | 1466 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a |
1467 | compiler barrier, thus making sure the compiler emits the instructions in the | 1467 | compiler barrier, thus making sure the compiler emits the instructions in the |
1468 | right order without actually intervening in the CPU. Since there's only one | 1468 | right order without actually intervening in the CPU. Since there's only one |
1469 | CPU, that CPU's dependency ordering logic will take care of everything else. | 1469 | CPU, that CPU's dependency ordering logic will take care of everything else. |
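
The ordering required of the waker can be sketched in portable C11 as a
userspace analogy. This is not the kernel's rwsem code: struct waiter and
wake_one() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst)
is assumed here as a stand-in for smp_mb(). The point it shows is that
everything needed from the waiter's record is read before the task pointer is
cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rwsem_waiter; lives on the waiter's stack. */
struct waiter {
	struct waiter *next;	/* plays the role of list.next */
	_Atomic(void *) task;	/* non-NULL while the waiter is still queued */
};

/*
 * Hypothetical waker step: hand the semaphore to *w, return the next
 * waiter record and pass the task back through *task_out.
 */
struct waiter *wake_one(struct waiter *w, void **task_out)
{
	assert(w != NULL && task_out != NULL);

	/* LOAD waiter->list.next */
	struct waiter *next = w->next;
	/* LOAD waiter->task */
	void *task = atomic_load_explicit(&w->task, memory_order_relaxed);

	/* Stand-in for smp_mb(): the loads above are ordered before the store. */
	atomic_thread_fence(memory_order_seq_cst);

	/* STORE waiter->task; the waiter may reuse its stack from here on. */
	atomic_store_explicit(&w->task, NULL, memory_order_relaxed);

	/* *w must not be dereferenced after this point. */
	*task_out = task;	/* the caller would now wake and release the task */
	return next;
}
```

A waiter in this sketch would poll its task field with an acquire load and
return only once it reads NULL, at which point the waker no longer touches the
record.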
1470 | 1470 | ||
1471 | 1471 | ||
1472 | ATOMIC OPERATIONS | 1472 | ATOMIC OPERATIONS |
1473 | ----------------- | 1473 | ----------------- |
1474 | 1474 | ||
1475 | Whilst they are technically interprocessor interaction considerations, atomic | 1475 | Whilst they are technically interprocessor interaction considerations, atomic |
1476 | operations are noted specially as some of them imply full memory barriers and | 1476 | operations are noted specially as some of them imply full memory barriers and |
1477 | some don't, but they're very heavily relied on as a group throughout the | 1477 | some don't, but they're very heavily relied on as a group throughout the |
1478 | kernel. | 1478 | kernel. |
1479 | 1479 | ||
1480 | Any atomic operation that modifies some state in memory and returns information | 1480 | Any atomic operation that modifies some state in memory and returns information |
1481 | about the state (old or new) implies an SMP-conditional general memory barrier | 1481 | about the state (old or new) implies an SMP-conditional general memory barrier |
1482 | (smp_mb()) on each side of the actual operation. These include: | 1482 | (smp_mb()) on each side of the actual operation. These include: |
1483 | 1483 | ||
1484 | xchg(); | 1484 | xchg(); |
1485 | cmpxchg(); | 1485 | cmpxchg(); |
1486 | atomic_cmpxchg(); | 1486 | atomic_cmpxchg(); |
1487 | atomic_inc_return(); | 1487 | atomic_inc_return(); |
1488 | atomic_dec_return(); | 1488 | atomic_dec_return(); |
1489 | atomic_add_return(); | 1489 | atomic_add_return(); |
1490 | atomic_sub_return(); | 1490 | atomic_sub_return(); |
1491 | atomic_inc_and_test(); | 1491 | atomic_inc_and_test(); |
1492 | atomic_dec_and_test(); | 1492 | atomic_dec_and_test(); |
1493 | atomic_sub_and_test(); | 1493 | atomic_sub_and_test(); |
1494 | atomic_add_negative(); | 1494 | atomic_add_negative(); |
1495 | atomic_add_unless(); | 1495 | atomic_add_unless(); |
1496 | test_and_set_bit(); | 1496 | test_and_set_bit(); |
1497 | test_and_clear_bit(); | 1497 | test_and_clear_bit(); |
1498 | test_and_change_bit(); | 1498 | test_and_change_bit(); |
1499 | 1499 | ||
1500 | These are used for such things as implementing LOCK-class and UNLOCK-class | 1500 | These are used for such things as implementing LOCK-class and UNLOCK-class |
1501 | operations and adjusting reference counters towards object destruction, and as | 1501 | operations and adjusting reference counters towards object destruction, and as |
1502 | such the implicit memory barrier effects are necessary. | 1502 | such the implicit memory barrier effects are necessary. |
1503 | 1503 | ||
1504 | 1504 | ||
1505 | The following operations are potential problems as they do _not_ imply memory | 1505 | The following operations are potential problems as they do _not_ imply memory |
1506 | barriers, but might be used for implementing such things as UNLOCK-class | 1506 | barriers, but might be used for implementing such things as UNLOCK-class |
1507 | operations: | 1507 | operations: |
1508 | 1508 | ||
1509 | atomic_set(); | 1509 | atomic_set(); |
1510 | set_bit(); | 1510 | set_bit(); |
1511 | clear_bit(); | 1511 | clear_bit(); |
1512 | change_bit(); | 1512 | change_bit(); |
1513 | 1513 | ||
1514 | With these the appropriate explicit memory barrier should be used if necessary | 1514 | With these the appropriate explicit memory barrier should be used if necessary |
1515 | (smp_mb__before_clear_bit() for instance). | 1515 | (smp_mb__before_clear_bit() for instance). |
1516 | 1516 | ||
1517 | 1517 | ||
1518 | The following also do _not_ imply memory barriers, and so may require explicit | 1518 | The following also do _not_ imply memory barriers, and so may require explicit |
1519 | memory barriers under some circumstances (smp_mb__before_atomic_dec() for | 1519 | memory barriers under some circumstances (smp_mb__before_atomic_dec() for |
1520 | instance): | 1520 | instance): |
1521 | 1521 | ||
1522 | atomic_add(); | 1522 | atomic_add(); |
1523 | atomic_sub(); | 1523 | atomic_sub(); |
1524 | atomic_inc(); | 1524 | atomic_inc(); |
1525 | atomic_dec(); | 1525 | atomic_dec(); |
1526 | 1526 | ||
1527 | If they're used for statistics generation, then they probably don't need memory | 1527 | If they're used for statistics generation, then they probably don't need memory |
1528 | barriers, unless there's a coupling between statistical data. | 1528 | barriers, unless there's a coupling between statistical data. |
1529 | 1529 | ||
1530 | If they're used for reference counting on an object to control its lifetime, | 1530 | If they're used for reference counting on an object to control its lifetime, |
1531 | they probably don't need memory barriers because either the reference count | 1531 | they probably don't need memory barriers because either the reference count |
1532 | will be adjusted inside a locked section, or the caller will already hold | 1532 | will be adjusted inside a locked section, or the caller will already hold |
1533 | sufficient references to make the lock, and thus a memory barrier, unnecessary. | 1533 | sufficient references to make the lock, and thus a memory barrier, unnecessary. |
1534 | 1534 | ||
1535 | If they're used for constructing a lock of some description, then they probably | 1535 | If they're used for constructing a lock of some description, then they probably |
1536 | do need memory barriers as a lock primitive generally has to do things in a | 1536 | do need memory barriers as a lock primitive generally has to do things in a |
1537 | specific order. | 1537 | specific order. |
1538 | 1538 | ||
1539 | 1539 | ||
1540 | Basically, each usage case has to be carefully considered as to whether memory | 1540 | Basically, each usage case has to be carefully considered as to whether memory |
1541 | barriers are needed or not. | 1541 | barriers are needed or not. |
1542 | 1542 | ||
1543 | [!] Note that special memory barrier primitives are available for these | 1543 | [!] Note that special memory barrier primitives are available for these |
1544 | situations because on some CPUs the atomic instructions used imply full memory | 1544 | situations because on some CPUs the atomic instructions used imply full memory |
1545 | barriers, and so barrier instructions are superfluous in conjunction with them, | 1545 | barriers, and so barrier instructions are superfluous in conjunction with them, |
1546 | and in such cases the special barrier primitives will be no-ops. | 1546 | and in such cases the special barrier primitives will be no-ops. |
1547 | 1547 | ||
1548 | See Documentation/atomic_ops.txt for more information. | 1548 | See Documentation/atomic_ops.txt for more information. |
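
The distinction between value-returning and non-value-returning operations can
be sketched with C11 atomics in userspace. This is an analogy rather than
kernel code: struct object, object_get() and object_put() are hypothetical
names, the relaxed increment is assumed to play the role of atomic_inc(), and
the sequentially consistent atomic_fetch_sub() is assumed to play the role of
atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct object {
	atomic_int refcount;
	int payload;
};

/*
 * Take a reference.  The caller already holds one, so no ordering is
 * requested here (the analogue of a plain atomic_inc()).
 */
void object_get(struct object *obj)
{
	atomic_fetch_add_explicit(&obj->refcount, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  The default ordering of atomic_fetch_sub() is
 * memory_order_seq_cst, so earlier accesses to the object are ordered
 * before the decrement.  Returns true if the caller dropped the last
 * reference and is now responsible for destroying the object.
 */
bool object_put(struct object *obj)
{
	int old = atomic_fetch_sub(&obj->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

Only the final object_put() reports that the object may be destroyed, which is
why the decrement, unlike the increment, needs ordering guarantees.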
1549 | 1549 | ||
1550 | 1550 | ||
1551 | ACCESSING DEVICES | 1551 | ACCESSING DEVICES |
1552 | ----------------- | 1552 | ----------------- |
1553 | 1553 | ||
1554 | Many devices can be memory mapped, and so appear to the CPU as if they're just | 1554 | Many devices can be memory mapped, and so appear to the CPU as if they're just |
1555 | a set of memory locations. To control such a device, the driver usually has to | 1555 | a set of memory locations. To control such a device, the driver usually has to |
1556 | make the right memory accesses in exactly the right order. | 1556 | make the right memory accesses in exactly the right order. |
1557 | 1557 | ||
1558 | However, having a clever CPU or a clever compiler creates a potential problem | 1558 | However, having a clever CPU or a clever compiler creates a potential problem |
1559 | in that the carefully sequenced accesses in the driver code won't reach the | 1559 | in that the carefully sequenced accesses in the driver code won't reach the |
1560 | device in the requisite order if the CPU or the compiler thinks it is more | 1560 | device in the requisite order if the CPU or the compiler thinks it is more |
1561 | efficient to reorder, combine or merge accesses - something that would cause | 1561 | efficient to reorder, combine or merge accesses - something that would cause |
1562 | the device to malfunction. | 1562 | the device to malfunction. |
1563 | 1563 | ||
1564 | Inside of the Linux kernel, I/O should be done through the appropriate accessor | 1564 | Inside of the Linux kernel, I/O should be done through the appropriate accessor |
1565 | routines - such as inb() or writel() - which know how to make such accesses | 1565 | routines - such as inb() or writel() - which know how to make such accesses |
1566 | appropriately sequential. Whilst this, for the most part, renders the explicit | 1566 | appropriately sequential. Whilst this, for the most part, renders the explicit |
1567 | use of memory barriers unnecessary, there are a couple of situations where they | 1567 | use of memory barriers unnecessary, there are a couple of situations where they |
1568 | might be needed: | 1568 | might be needed: |
1569 | 1569 | ||
1570 | (1) On some systems, I/O stores are not strongly ordered across all CPUs, and | 1570 | (1) On some systems, I/O stores are not strongly ordered across all CPUs, and |
1571 | so for _all_ general drivers locks should be used and mmiowb() must be | 1571 | so for _all_ general drivers locks should be used and mmiowb() must be |
1572 | issued prior to unlocking the critical section. | 1572 | issued prior to unlocking the critical section. |
1573 | 1573 | ||
1574 | (2) If the accessor functions are used to refer to an I/O memory window with | 1574 | (2) If the accessor functions are used to refer to an I/O memory window with |
1575 | relaxed memory access properties, then _mandatory_ memory barriers are | 1575 | relaxed memory access properties, then _mandatory_ memory barriers are |
1576 | required to enforce ordering. | 1576 | required to enforce ordering. |
1577 | 1577 | ||
1578 | See Documentation/DocBook/deviceiobook.tmpl for more information. | 1578 | See Documentation/DocBook/deviceiobook.tmpl for more information. |
1579 | 1579 | ||
1580 | 1580 | ||
1581 | INTERRUPTS | 1581 | INTERRUPTS |
1582 | ---------- | 1582 | ---------- |
1583 | 1583 | ||
1584 | A driver may be interrupted by its own interrupt service routine, and thus the | 1584 | A driver may be interrupted by its own interrupt service routine, and thus the |
1585 | two parts of the driver may interfere with each other's attempts to control or | 1585 | two parts of the driver may interfere with each other's attempts to control or |
1586 | access the device. | 1586 | access the device. |
1587 | 1587 | ||
1588 | This may be alleviated - at least in part - by disabling local interrupts (a | 1588 | This may be alleviated - at least in part - by disabling local interrupts (a |
1589 | form of locking), such that the critical operations are all contained within | 1589 | form of locking), such that the critical operations are all contained within |
1590 | the interrupt-disabled section in the driver. Whilst the driver's interrupt | 1590 | the interrupt-disabled section in the driver. Whilst the driver's interrupt |
1591 | routine is executing, the driver's core may not run on the same CPU, and its | 1591 | routine is executing, the driver's core may not run on the same CPU, and its |
1592 | interrupt is not permitted to happen again until the current interrupt has been | 1592 | interrupt is not permitted to happen again until the current interrupt has been |
1593 | handled, thus the interrupt handler does not need to lock against that. | 1593 | handled, thus the interrupt handler does not need to lock against that. |
1594 | 1594 | ||
1595 | However, consider a driver that was talking to an ethernet card that sports an | 1595 | However, consider a driver that was talking to an ethernet card that sports an |
1596 | address register and a data register. If that driver's core talks to the card | 1596 | address register and a data register. If that driver's core talks to the card |
1597 | under interrupt-disablement and then the driver's interrupt handler is invoked: | 1597 | under interrupt-disablement and then the driver's interrupt handler is invoked: |
1598 | 1598 | ||
1599 | LOCAL IRQ DISABLE | 1599 | LOCAL IRQ DISABLE |
1600 | writew(3, ADDR); | 1600 | writew(3, ADDR); |
1601 | writew(y, DATA); | 1601 | writew(y, DATA); |
1602 | LOCAL IRQ ENABLE | 1602 | LOCAL IRQ ENABLE |
1603 | <interrupt> | 1603 | <interrupt> |
1604 | writew(4, ADDR); | 1604 | writew(4, ADDR); |
1605 | q = readw(DATA); | 1605 | q = readw(DATA); |
1606 | </interrupt> | 1606 | </interrupt> |
1607 | 1607 | ||
1608 | The store to the data register might happen after the second store to the | 1608 | The store to the data register might happen after the second store to the |
1609 | address register if ordering rules are sufficiently relaxed: | 1609 | address register if ordering rules are sufficiently relaxed: |
1610 | 1610 | ||
1611 | STORE *ADDR = 3, STORE *ADDR = 4, STORE *DATA = y, q = LOAD *DATA | 1611 | STORE *ADDR = 3, STORE *ADDR = 4, STORE *DATA = y, q = LOAD *DATA |
1612 | 1612 | ||
1613 | 1613 | ||
1614 | If ordering rules are relaxed, it must be assumed that accesses done inside an | 1614 | If ordering rules are relaxed, it must be assumed that accesses done inside an |
1615 | interrupt-disabled section may leak outside of it and may interleave with | 1615 | interrupt-disabled section may leak outside of it and may interleave with |
1616 | accesses performed in an interrupt - and vice versa - unless implicit or | 1616 | accesses performed in an interrupt - and vice versa - unless implicit or |
1617 | explicit barriers are used. | 1617 | explicit barriers are used. |
1618 | 1618 | ||
1619 | Normally this won't be a problem because the I/O accesses done inside such | 1619 | Normally this won't be a problem because the I/O accesses done inside such |
1620 | sections will include synchronous load operations on strictly ordered I/O | 1620 | sections will include synchronous load operations on strictly ordered I/O |
1621 | registers that form implicit I/O barriers. If this isn't sufficient then an | 1621 | registers that form implicit I/O barriers. If this isn't sufficient then an |
1622 | mmiowb() may need to be used explicitly. | 1622 | mmiowb() may need to be used explicitly. |
1623 | 1623 | ||
1624 | 1624 | ||
1625 | A similar situation may occur between an interrupt routine and two routines | 1625 | A similar situation may occur between an interrupt routine and two routines |
1626 | running on separate CPUs that communicate with each other. If such a case is | 1626 | running on separate CPUs that communicate with each other. If such a case is |
1627 | likely, then interrupt-disabling locks should be used to guarantee ordering. | 1627 | likely, then interrupt-disabling locks should be used to guarantee ordering. |
1628 | 1628 | ||
1629 | 1629 | ||
1630 | ========================== | 1630 | ========================== |
1631 | KERNEL I/O BARRIER EFFECTS | 1631 | KERNEL I/O BARRIER EFFECTS |
1632 | ========================== | 1632 | ========================== |
1633 | 1633 | ||
1634 | When accessing I/O memory, drivers should use the appropriate accessor | 1634 | When accessing I/O memory, drivers should use the appropriate accessor |
1635 | functions: | 1635 | functions: |
1636 | 1636 | ||
1637 | (*) inX(), outX(): | 1637 | (*) inX(), outX(): |
1638 | 1638 | ||
1639 | These are intended to talk to I/O space rather than memory space, but | 1639 | These are intended to talk to I/O space rather than memory space, but |
1640 | that's primarily a CPU-specific concept. The i386 and x86_64 processors do | 1640 | that's primarily a CPU-specific concept. The i386 and x86_64 processors do |
1641 | indeed have special I/O space access cycles and instructions, but many | 1641 | indeed have special I/O space access cycles and instructions, but many |
1642 | CPUs don't have such a concept. | 1642 | CPUs don't have such a concept. |
1643 | 1643 | ||
1644 | The PCI bus, amongst others, defines an I/O space concept - which on such | 1644 | The PCI bus, amongst others, defines an I/O space concept - which on such |
1645 | CPUs as i386 and x86_64 readily maps to the CPU's concept of I/O | 1645 | CPUs as i386 and x86_64 readily maps to the CPU's concept of I/O |
1646 | space. However, it may also be mapped as a virtual I/O space in the CPU's | 1646 | space. However, it may also be mapped as a virtual I/O space in the CPU's |
1647 | memory map, particularly on those CPUs that don't support alternate I/O | 1647 | memory map, particularly on those CPUs that don't support alternate I/O |
1648 | spaces. | 1648 | spaces. |
1649 | 1649 | ||
1650 | Accesses to this space may be fully synchronous (as on i386), but | 1650 | Accesses to this space may be fully synchronous (as on i386), but |
1651 | intermediary bridges (such as the PCI host bridge) may not fully honour | 1651 | intermediary bridges (such as the PCI host bridge) may not fully honour |
1652 | that. | 1652 | that. |
1653 | 1653 | ||
1654 | They are guaranteed to be fully ordered with respect to each other. | 1654 | They are guaranteed to be fully ordered with respect to each other. |
1655 | 1655 | ||
1656 | They are not guaranteed to be fully ordered with respect to other types of | 1656 | They are not guaranteed to be fully ordered with respect to other types of |
1657 | memory and I/O operation. | 1657 | memory and I/O operation. |
1658 | 1658 | ||
1659 | (*) readX(), writeX(): | 1659 | (*) readX(), writeX(): |
1660 | 1660 | ||
1661 | Whether these are guaranteed to be fully ordered and uncombined with | 1661 | Whether these are guaranteed to be fully ordered and uncombined with |
1662 | respect to each other on the issuing CPU depends on the characteristics | 1662 | respect to each other on the issuing CPU depends on the characteristics |
1663 | defined for the memory window through which they're accessing. On later | 1663 | defined for the memory window through which they're accessing. On later |
1664 | i386 architecture machines, for example, this is controlled by way of the | 1664 | i386 architecture machines, for example, this is controlled by way of the |
1665 | MTRR registers. | 1665 | MTRR registers. |
1666 | 1666 | ||
1667 | Ordinarily, these will be guaranteed to be fully ordered and uncombined, | 1667 | Ordinarily, these will be guaranteed to be fully ordered and uncombined, |
1668 | provided they're not accessing a prefetchable device. | 1668 | provided they're not accessing a prefetchable device. |
1669 | 1669 | ||
1670 | However, intermediary hardware (such as a PCI bridge) may indulge in | 1670 | However, intermediary hardware (such as a PCI bridge) may indulge in |
1671 | deferral if it so wishes; to flush a store, a load from the same location | 1671 | deferral if it so wishes; to flush a store, a load from the same location |
1672 | is preferred[*], but a load from the same device or from configuration | 1672 | is preferred[*], but a load from the same device or from configuration |
1673 | space should suffice for PCI. | 1673 | space should suffice for PCI. |
1674 | 1674 | ||
1675 | [*] NOTE! attempting to load from the same location as was written to may | 1675 | [*] NOTE! attempting to load from the same location as was written to may |
1676 | cause a malfunction - consider the 16550 Rx/Tx serial registers for | 1676 | cause a malfunction - consider the 16550 Rx/Tx serial registers for |
1677 | example. | 1677 | example. |
1678 | 1678 | ||
1679 | Used with prefetchable I/O memory, an mmiowb() barrier may be required to | 1679 | Used with prefetchable I/O memory, an mmiowb() barrier may be required to |
1680 | force stores to be ordered. | 1680 | force stores to be ordered. |
1681 | 1681 | ||
1682 | Please refer to the PCI specification for more information on interactions | 1682 | Please refer to the PCI specification for more information on interactions |
1683 | between PCI transactions. | 1683 | between PCI transactions. |
1684 | 1684 | ||
1685 | (*) readX_relaxed() | 1685 | (*) readX_relaxed() |
1686 | 1686 | ||
1687 | These are similar to readX(), but are not guaranteed to be ordered in any | 1687 | These are similar to readX(), but are not guaranteed to be ordered in any |
1688 | way. Be aware that there is no I/O read barrier available. | 1688 | way. Be aware that there is no I/O read barrier available. |
1689 | 1689 | ||
1690 | (*) ioreadX(), iowriteX() | 1690 | (*) ioreadX(), iowriteX() |
1691 | 1691 | ||
1692 | These will perform as appropriate for the type of access they're actually | 1692 | These will perform as appropriate for the type of access they're actually |
1693 | doing, be it inX()/outX() or readX()/writeX(). | 1693 | doing, be it inX()/outX() or readX()/writeX(). |
1694 | 1694 | ||
1695 | 1695 | ||
1696 | ======================================== | 1696 | ======================================== |
1697 | ASSUMED MINIMUM EXECUTION ORDERING MODEL | 1697 | ASSUMED MINIMUM EXECUTION ORDERING MODEL |
1698 | ======================================== | 1698 | ======================================== |
1699 | 1699 | ||
1700 | It has to be assumed that the conceptual CPU is weakly-ordered but that it will | 1700 | It has to be assumed that the conceptual CPU is weakly-ordered but that it will |
1701 | maintain the appearance of program causality with respect to itself. Some CPUs | 1701 | maintain the appearance of program causality with respect to itself. Some CPUs |
1702 | (such as i386 or x86_64) are more constrained than others (such as powerpc or | 1702 | (such as i386 or x86_64) are more constrained than others (such as powerpc or |
1703 | frv), and so the most relaxed case (namely DEC Alpha) must be assumed outside | 1703 | frv), and so the most relaxed case (namely DEC Alpha) must be assumed outside |
1704 | of arch-specific code. | 1704 | of arch-specific code. |
1705 | 1705 | ||
1706 | This means that it must be considered that the CPU will execute its instruction | 1706 | This means that it must be considered that the CPU will execute its instruction |
1707 | stream in any order it feels like - or even in parallel - provided that if an | 1707 | stream in any order it feels like - or even in parallel - provided that if an |
1708 | instruction in the stream depends on an earlier instruction, then that | 1708 | instruction in the stream depends on an earlier instruction, then that |
1709 | earlier instruction must be sufficiently complete[*] before the later | 1709 | earlier instruction must be sufficiently complete[*] before the later |
1710 | instruction may proceed; in other words: provided that the appearance of | 1710 | instruction may proceed; in other words: provided that the appearance of |
1711 | causality is maintained. | 1711 | causality is maintained. |
1712 | 1712 | ||
1713 | [*] Some instructions have more than one effect - such as changing the | 1713 | [*] Some instructions have more than one effect - such as changing the |
1714 | condition codes, changing registers or changing memory - and different | 1714 | condition codes, changing registers or changing memory - and different |
1715 | instructions may depend on different effects. | 1715 | instructions may depend on different effects. |
1716 | 1716 | ||
1717 | A CPU may also discard any instruction sequence that winds up having no | 1717 | A CPU may also discard any instruction sequence that winds up having no |
1718 | ultimate effect. For example, if two adjacent instructions both load an | 1718 | ultimate effect. For example, if two adjacent instructions both load an |
1719 | immediate value into the same register, the first may be discarded. | 1719 | immediate value into the same register, the first may be discarded. |
1720 | 1720 | ||
1721 | 1721 | ||
1722 | Similarly, it has to be assumed that the compiler might reorder the instruction | 1722 | Similarly, it has to be assumed that the compiler might reorder the instruction |
1723 | stream in any way it sees fit, again provided the appearance of causality is | 1723 | stream in any way it sees fit, again provided the appearance of causality is |
1724 | maintained. | 1724 | maintained. |
1725 | 1725 | ||
1726 | 1726 | ||
1727 | ============================ | 1727 | ============================ |
1728 | THE EFFECTS OF THE CPU CACHE | 1728 | THE EFFECTS OF THE CPU CACHE |
1729 | ============================ | 1729 | ============================ |
1730 | 1730 | ||
1731 | The way cached memory operations are perceived across the system is affected to | 1731 | The way cached memory operations are perceived across the system is affected to |
1732 | a certain extent by the caches that lie between CPUs and memory, and by the | 1732 | a certain extent by the caches that lie between CPUs and memory, and by the |
1733 | memory coherence system that maintains the consistency of state in the system. | 1733 | memory coherence system that maintains the consistency of state in the system. |
1734 | 1734 | ||
1735 | As far as the way a CPU interacts with another part of the system through the | 1735 | As far as the way a CPU interacts with another part of the system through the |
1736 | caches goes, the memory system has to include the CPU's caches, and memory | 1736 | caches goes, the memory system has to include the CPU's caches, and memory |
1737 | barriers for the most part act at the interface between the CPU and its cache | 1737 | barriers for the most part act at the interface between the CPU and its cache |
1738 | (memory barriers logically act on the dotted line in the following diagram): | 1738 | (memory barriers logically act on the dotted line in the following diagram): |
1739 | 1739 | ||
1740 | <--- CPU ---> : <----------- Memory -----------> | 1740 | <--- CPU ---> : <----------- Memory -----------> |
1741 | : | 1741 | : |
1742 | +--------+ +--------+ : +--------+ +-----------+ | 1742 | +--------+ +--------+ : +--------+ +-----------+ |
1743 | | | | | : | | | | +--------+ | 1743 | | | | | : | | | | +--------+ |
1744 | | CPU | | Memory | : | CPU | | | | | | 1744 | | CPU | | Memory | : | CPU | | | | | |
1745 | | Core |--->| Access |----->| Cache |<-->| | | | | 1745 | | Core |--->| Access |----->| Cache |<-->| | | | |
1746 | | | | Queue | : | | | |--->| Memory | | 1746 | | | | Queue | : | | | |--->| Memory | |
1747 | | | | | : | | | | | | | 1747 | | | | | : | | | | | | |
1748 | +--------+ +--------+ : +--------+ | | | | | 1748 | +--------+ +--------+ : +--------+ | | | | |
1749 | : | Cache | +--------+ | 1749 | : | Cache | +--------+ |
1750 | : | Coherency | | 1750 | : | Coherency | |
1751 | : | Mechanism | +--------+ | 1751 | : | Mechanism | +--------+ |
1752 | +--------+ +--------+ : +--------+ | | | | | 1752 | +--------+ +--------+ : +--------+ | | | | |
1753 | | | | | : | | | | | | | 1753 | | | | | : | | | | | | |
1754 | | CPU | | Memory | : | CPU | | |--->| Device | | 1754 | | CPU | | Memory | : | CPU | | |--->| Device | |
1755 | | Core |--->| Access |----->| Cache |<-->| | | | | 1755 | | Core |--->| Access |----->| Cache |<-->| | | | |
1756 | | | | Queue | : | | | | | | | 1756 | | | | Queue | : | | | | | | |
1757 | | | | | : | | | | +--------+ | 1757 | | | | | : | | | | +--------+ |
1758 | +--------+ +--------+ : +--------+ +-----------+ | 1758 | +--------+ +--------+ : +--------+ +-----------+ |
1759 | : | 1759 | : |
1760 | : | 1760 | : |
1761 | 1761 | ||
1762 | Although any particular load or store may not actually appear outside of the | 1762 | Although any particular load or store may not actually appear outside of the |
1763 | CPU that issued it since it may have been satisfied within the CPU's own cache, | 1763 | CPU that issued it since it may have been satisfied within the CPU's own cache, |
1764 | it will still appear as if the full memory access had taken place as far as the | 1764 | it will still appear as if the full memory access had taken place as far as the |
1765 | other CPUs are concerned since the cache coherency mechanisms will migrate the | 1765 | other CPUs are concerned since the cache coherency mechanisms will migrate the |
1766 | cacheline over to the accessing CPU and propagate the effects upon conflict. | 1766 | cacheline over to the accessing CPU and propagate the effects upon conflict. |
1767 | 1767 | ||
1768 | The CPU core may execute instructions in any order it deems fit, provided the | 1768 | The CPU core may execute instructions in any order it deems fit, provided the |
1769 | expected program causality appears to be maintained. Some of the instructions | 1769 | expected program causality appears to be maintained. Some of the instructions |
1770 | generate load and store operations which then go into the queue of memory | 1770 | generate load and store operations which then go into the queue of memory |
1771 | accesses to be performed. The core may place these in the queue in any order | 1771 | accesses to be performed. The core may place these in the queue in any order |
1772 | it wishes, and continue execution until it is forced to wait for an instruction | 1772 | it wishes, and continue execution until it is forced to wait for an instruction |
1773 | to complete. | 1773 | to complete. |
1774 | 1774 | ||
1775 | What memory barriers are concerned with is controlling the order in which | 1775 | What memory barriers are concerned with is controlling the order in which |
1776 | accesses cross from the CPU side of things to the memory side of things, and | 1776 | accesses cross from the CPU side of things to the memory side of things, and |
1777 | the order in which the effects are perceived to happen by the other observers | 1777 | the order in which the effects are perceived to happen by the other observers |
1778 | in the system. | 1778 | in the system. |
1779 | 1779 | ||
1780 | [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see | 1780 | [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see |
1781 | their own loads and stores as if they had happened in program order. | 1781 | their own loads and stores as if they had happened in program order. |
1782 | 1782 | ||
1783 | [!] MMIO or other device accesses may bypass the cache system. This depends on | 1783 | [!] MMIO or other device accesses may bypass the cache system. This depends on |
1784 | the properties of the memory window through which devices are accessed and/or | 1784 | the properties of the memory window through which devices are accessed and/or |
1785 | the use of any special device communication instructions the CPU may have. | 1785 | the use of any special device communication instructions the CPU may have. |
1786 | 1786 | ||
1787 | 1787 | ||
1788 | CACHE COHERENCY | 1788 | CACHE COHERENCY |
1789 | --------------- | 1789 | --------------- |
1790 | 1790 | ||
1791 | Life isn't quite as simple as it may appear above, however: for while the | 1791 | Life isn't quite as simple as it may appear above, however: for while the |
1792 | caches are expected to be coherent, there's no guarantee that that coherency | 1792 | caches are expected to be coherent, there's no guarantee that that coherency |
1793 | will be ordered. This means that whilst changes made on one CPU will | 1793 | will be ordered. This means that whilst changes made on one CPU will |
1794 | eventually become visible on all CPUs, there's no guarantee that they will | 1794 | eventually become visible on all CPUs, there's no guarantee that they will |
1795 | become apparent in the same order on those other CPUs. | 1795 | become apparent in the same order on those other CPUs. |
1796 | 1796 | ||
1797 | 1797 | ||
1798 | Consider dealing with a system that has a pair of CPUs (1 & 2), each of which has | 1798 | Consider dealing with a system that has a pair of CPUs (1 & 2), each of which has |
1799 | a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): | 1799 | a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): |
1800 | 1800 | ||
1801 | : | 1801 | : |
1802 | : +--------+ | 1802 | : +--------+ |
1803 | : +---------+ | | | 1803 | : +---------+ | | |
1804 | +--------+ : +--->| Cache A |<------->| | | 1804 | +--------+ : +--->| Cache A |<------->| | |
1805 | | | : | +---------+ | | | 1805 | | | : | +---------+ | | |
1806 | | CPU 1 |<---+ | | | 1806 | | CPU 1 |<---+ | | |
1807 | | | : | +---------+ | | | 1807 | | | : | +---------+ | | |
1808 | +--------+ : +--->| Cache B |<------->| | | 1808 | +--------+ : +--->| Cache B |<------->| | |
1809 | : +---------+ | | | 1809 | : +---------+ | | |
1810 | : | Memory | | 1810 | : | Memory | |
1811 | : +---------+ | System | | 1811 | : +---------+ | System | |
1812 | +--------+ : +--->| Cache C |<------->| | | 1812 | +--------+ : +--->| Cache C |<------->| | |
1813 | | | : | +---------+ | | | 1813 | | | : | +---------+ | | |
1814 | | CPU 2 |<---+ | | | 1814 | | CPU 2 |<---+ | | |
1815 | | | : | +---------+ | | | 1815 | | | : | +---------+ | | |
1816 | +--------+ : +--->| Cache D |<------->| | | 1816 | +--------+ : +--->| Cache D |<------->| | |
1817 | : +---------+ | | | 1817 | : +---------+ | | |
1818 | : +--------+ | 1818 | : +--------+ |
1819 | : | 1819 | : |
1820 | 1820 | ||
1821 | Imagine the system has the following properties: | 1821 | Imagine the system has the following properties: |
1822 | 1822 | ||
1823 | (*) an odd-numbered cache line may be in cache A, cache C or it may still be | 1823 | (*) an odd-numbered cache line may be in cache A, cache C or it may still be |
1824 | resident in memory; | 1824 | resident in memory; |
1825 | 1825 | ||
1826 | (*) an even-numbered cache line may be in cache B, cache D or it may still be | 1826 | (*) an even-numbered cache line may be in cache B, cache D or it may still be |
1827 | resident in memory; | 1827 | resident in memory; |
1828 | 1828 | ||
1829 | (*) whilst the CPU core is interrogating one cache, the other cache may be | 1829 | (*) whilst the CPU core is interrogating one cache, the other cache may be |
1830 | making use of the bus to access the rest of the system - perhaps to | 1830 | making use of the bus to access the rest of the system - perhaps to |
1831 | displace a dirty cacheline or to do a speculative load; | 1831 | displace a dirty cacheline or to do a speculative load; |
1832 | 1832 | ||
1833 | (*) each cache has a queue of operations that need to be applied to that cache | 1833 | (*) each cache has a queue of operations that need to be applied to that cache |
1834 | to maintain coherency with the rest of the system; | 1834 | to maintain coherency with the rest of the system; |
1835 | 1835 | ||
1836 | (*) the coherency queue is not flushed by normal loads to lines already | 1836 | (*) the coherency queue is not flushed by normal loads to lines already |
1837 | present in the cache, even though the contents of the queue may | 1837 | present in the cache, even though the contents of the queue may |
1838 | potentially affect those loads. | 1838 | potentially affect those loads. |
1839 | 1839 | ||
1840 | Imagine, then, that two writes are made on the first CPU, with a write barrier | 1840 | Imagine, then, that two writes are made on the first CPU, with a write barrier |
1841 | between them to guarantee that they will appear to reach that CPU's caches in | 1841 | between them to guarantee that they will appear to reach that CPU's caches in |
1842 | the requisite order: | 1842 | the requisite order: |
1843 | 1843 | ||
1844 | CPU 1 CPU 2 COMMENT | 1844 | CPU 1 CPU 2 COMMENT |
1845 | =============== =============== ======================================= | 1845 | =============== =============== ======================================= |
1846 | u == 0, v == 1 and p == &u, q == &u | 1846 | u == 0, v == 1 and p == &u, q == &u |
1847 | v = 2; | 1847 | v = 2; |
1848 | smp_wmb(); Make sure change to v visible before | 1848 | smp_wmb(); Make sure change to v visible before |
1849 | change to p | 1849 | change to p |
1850 | <A:modify v=2> v is now in cache A exclusively | 1850 | <A:modify v=2> v is now in cache A exclusively |
1851 | p = &v; | 1851 | p = &v; |
1852 | <B:modify p=&v> p is now in cache B exclusively | 1852 | <B:modify p=&v> p is now in cache B exclusively |
1853 | 1853 | ||
1854 | The write memory barrier forces the other CPUs in the system to perceive that | 1854 | The write memory barrier forces the other CPUs in the system to perceive that |
1855 | the local CPU's caches have apparently been updated in the correct order. But | 1855 | the local CPU's caches have apparently been updated in the correct order. But |
1856 | now imagine that the second CPU wants to read those values: | 1856 | now imagine that the second CPU wants to read those values: |
1857 | 1857 | ||
1858 | CPU 1 CPU 2 COMMENT | 1858 | CPU 1 CPU 2 COMMENT |
1859 | =============== =============== ======================================= | 1859 | =============== =============== ======================================= |
1860 | ... | 1860 | ... |
1861 | q = p; | 1861 | q = p; |
1862 | x = *q; | 1862 | x = *q; |
1863 | 1863 | ||
1864 | The above pair of reads may then fail to happen in the expected order, as the | 1864 | The above pair of reads may then fail to happen in the expected order, as the |
1865 | cacheline holding p may get updated in one of the second CPU's caches whilst | 1865 | cacheline holding p may get updated in one of the second CPU's caches whilst |
1866 | the update to the cacheline holding v is delayed in the other of the second | 1866 | the update to the cacheline holding v is delayed in the other of the second |
1867 | CPU's caches by some other cache event: | 1867 | CPU's caches by some other cache event: |
1868 | 1868 | ||
1869 | CPU 1 CPU 2 COMMENT | 1869 | CPU 1 CPU 2 COMMENT |
1870 | =============== =============== ======================================= | 1870 | =============== =============== ======================================= |
1871 | u == 0, v == 1 and p == &u, q == &u | 1871 | u == 0, v == 1 and p == &u, q == &u |
1872 | v = 2; | 1872 | v = 2; |
1873 | smp_wmb(); | 1873 | smp_wmb(); |
1874 | <A:modify v=2> <C:busy> | 1874 | <A:modify v=2> <C:busy> |
1875 | <C:queue v=2> | 1875 | <C:queue v=2> |
1876 | p = &v; q = p; | 1876 | p = &v; q = p; |
1877 | <D:request p> | 1877 | <D:request p> |
1878 | <B:modify p=&v> <D:commit p=&v> | 1878 | <B:modify p=&v> <D:commit p=&v> |
1879 | <D:read p> | 1879 | <D:read p> |
1880 | x = *q; | 1880 | x = *q; |
1881 | <C:read *q> Reads from v before v updated in cache | 1881 | <C:read *q> Reads from v before v updated in cache |
1882 | <C:unbusy> | 1882 | <C:unbusy> |
1883 | <C:commit v=2> | 1883 | <C:commit v=2> |
1884 | 1884 | ||
1885 | Basically, whilst both cachelines will be updated on CPU 2 eventually, there's | 1885 | Basically, whilst both cachelines will be updated on CPU 2 eventually, there's |
1886 | no guarantee that, without intervention, the order of update will be the same | 1886 | no guarantee that, without intervention, the order of update will be the same |
1887 | as that committed on CPU 1. | 1887 | as that committed on CPU 1. |
1888 | 1888 | ||
1889 | 1889 | ||
1890 | To intervene, we need to interpolate a data dependency barrier or a read | 1890 | To intervene, we need to interpolate a data dependency barrier or a read |
1891 | barrier between the loads. This will force the cache to commit its coherency | 1891 | barrier between the loads. This will force the cache to commit its coherency |
1892 | queue before processing any further requests: | 1892 | queue before processing any further requests: |
1893 | 1893 | ||
1894 | CPU 1 CPU 2 COMMENT | 1894 | CPU 1 CPU 2 COMMENT |
1895 | =============== =============== ======================================= | 1895 | =============== =============== ======================================= |
1896 | u == 0, v == 1 and p == &u, q == &u | 1896 | u == 0, v == 1 and p == &u, q == &u |
1897 | v = 2; | 1897 | v = 2; |
1898 | smp_wmb(); | 1898 | smp_wmb(); |
1899 | <A:modify v=2> <C:busy> | 1899 | <A:modify v=2> <C:busy> |
1900 | <C:queue v=2> | 1900 | <C:queue v=2> |
1901 | p = &v; q = p; | 1901 | p = &v; q = p; |
1902 | <D:request p> | 1902 | <D:request p> |
1903 | <B:modify p=&v> <D:commit p=&v> | 1903 | <B:modify p=&v> <D:commit p=&v> |
1904 | <D:read p> | 1904 | <D:read p> |
1905 | smp_read_barrier_depends() | 1905 | smp_read_barrier_depends() |
1906 | <C:unbusy> | 1906 | <C:unbusy> |
1907 | <C:commit v=2> | 1907 | <C:commit v=2> |
1908 | x = *q; | 1908 | x = *q; |
1909 | <C:read *q> Reads from v after v updated in cache | 1909 | <C:read *q> Reads from v after v updated in cache |
1910 | 1910 | ||
1911 | 1911 | ||
1912 | This sort of problem can be encountered on DEC Alpha processors as they have a | 1912 | This sort of problem can be encountered on DEC Alpha processors as they have a |
1913 | split cache that improves performance by making better use of the data bus. | 1913 | split cache that improves performance by making better use of the data bus. |
1914 | Whilst most CPUs do imply a data dependency barrier on the read when a memory | 1914 | Whilst most CPUs do imply a data dependency barrier on the read when a memory |
1915 | access depends on a read, not all do, so it may not be relied on. | 1915 | access depends on a read, not all do, so it may not be relied on. |
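The publish-and-read pattern in the tables above can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions, not kernel code: the release store stands in for smp_wmb(), the acquire load stands in for a read barrier (which is stronger than smp_read_barrier_depends()), and the function names publish_v() and read_through_p() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int u = 0;
static int v = 1;
static _Atomic(int *) p = &u;	/* initially p == &u, as in the tables */

/* Writer side (CPU 1 in the tables). Hypothetical helper name. */
static void publish_v(void)
{
	v = 2;
	/* Release ordering stands in for smp_wmb(): the store to v is
	 * ordered before the store that makes p point at v. */
	atomic_store_explicit(&p, &v, memory_order_release);
}

/* Reader side (CPU 2 in the tables). Hypothetical helper name. */
static int read_through_p(void)
{
	/* Acquire ordering stands in for the barrier between the load of
	 * p and the dependent load of *q. */
	int *q = atomic_load_explicit(&p, memory_order_acquire);

	assert(q != NULL);
	return *q;	/* sees 2 whenever q == &v */
}
```

With this pairing, a reader that observes p == &v is also guaranteed to observe v == 2; a reader that still observes p == &u simply reads u.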
1916 | 1916 | ||
1917 | Other CPUs may also have split caches, but must coordinate between the various | 1917 | Other CPUs may also have split caches, but must coordinate between the various |
1918 | cachelets for normal memory accesses. The semantics of the Alpha removes the | 1918 | cachelets for normal memory accesses. The semantics of the Alpha removes the |
1919 | need for coordination in the absence of memory barriers. | 1919 | need for coordination in the absence of memory barriers. |
1920 | 1920 | ||
1921 | 1921 | ||
1922 | CACHE COHERENCY VS DMA | 1922 | CACHE COHERENCY VS DMA |
1923 | ---------------------- | 1923 | ---------------------- |
1924 | 1924 | ||
1925 | Not all systems maintain cache coherency with respect to devices doing DMA. In | 1925 | Not all systems maintain cache coherency with respect to devices doing DMA. In |
1926 | such cases, a device attempting DMA may obtain stale data from RAM because | 1926 | such cases, a device attempting DMA may obtain stale data from RAM because |
1927 | dirty cache lines may be resident in the caches of various CPUs, and may not | 1927 | dirty cache lines may be resident in the caches of various CPUs, and may not |
1928 | have been written back to RAM yet. To deal with this, the appropriate part of | 1928 | have been written back to RAM yet. To deal with this, the appropriate part of |
1929 | the kernel must flush the overlapping bits of cache on each CPU (and maybe | 1929 | the kernel must flush the overlapping bits of cache on each CPU (and maybe |
1930 | invalidate them as well). | 1930 | invalidate them as well). |
1931 | 1931 | ||
1932 | In addition, the data DMA'd to RAM by a device may be overwritten by dirty | 1932 | In addition, the data DMA'd to RAM by a device may be overwritten by dirty |
1933 | cache lines being written back to RAM from a CPU's cache after the device has | 1933 | cache lines being written back to RAM from a CPU's cache after the device has |
1934 | installed its own data, or cache lines present in a CPU's cache may | 1934 | installed its own data, or cache lines present in a CPU's cache may |
1935 | simply obscure the fact that RAM has been updated, until such time as the | 1935 | simply obscure the fact that RAM has been updated, until such time as the |
1936 | cacheline is discarded from the CPU's cache and reloaded. To deal with this, | 1936 | cacheline is discarded from the CPU's cache and reloaded. To deal with this, |
1937 | the appropriate part of the kernel must invalidate the overlapping bits of the | 1937 | the appropriate part of the kernel must invalidate the overlapping bits of the |
1938 | cache on each CPU. | 1938 | cache on each CPU. |
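The two DMA hazards described above can be illustrated with a toy, single-line cache model. This is a sketch for intuition only: the struct and every function name below are hypothetical, and none of it corresponds to a real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cache line and the RAM location behind it. */
struct toy_line {
	int ram;	/* what a DMA-capable device sees */
	int cached;	/* what the CPU sees while the line is valid */
	bool valid;
	bool dirty;
};

/* CPU store: lands in the cache only (write-back behaviour). */
static void cpu_write(struct toy_line *l, int val)
{
	assert(l != NULL);
	l->cached = val;
	l->valid = true;
	l->dirty = true;
}

/* CPU load: served from the cache if valid, else refilled from RAM. */
static int cpu_read(struct toy_line *l)
{
	assert(l != NULL);
	if (!l->valid) {
		l->cached = l->ram;
		l->valid = true;
		l->dirty = false;
	}
	return l->cached;
}

/* Flush: write a dirty line back so the device can see it. */
static void flush(struct toy_line *l)
{
	assert(l != NULL);
	if (l->valid && l->dirty) {
		l->ram = l->cached;
		l->dirty = false;
	}
}

/* Invalidate: drop the line so the next CPU load refetches from RAM. */
static void invalidate(struct toy_line *l)
{
	assert(l != NULL);
	l->valid = false;
	l->dirty = false;
}

/* Device accesses go straight to RAM in a non-coherent system. */
static int dma_read(const struct toy_line *l) { return l->ram; }
static void dma_write(struct toy_line *l, int val) { l->ram = val; }
```

In this model, dma_read() returns stale data after cpu_write() until flush() is called, and cpu_read() returns stale data after dma_write() until invalidate() is called.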
1939 | 1939 | ||
1940 | See Documentation/cachetlb.txt for more information on cache management. | 1940 | See Documentation/cachetlb.txt for more information on cache management. |
1941 | 1941 | ||
1942 | 1942 | ||
1943 | CACHE COHERENCY VS MMIO | 1943 | CACHE COHERENCY VS MMIO |
1944 | ----------------------- | 1944 | ----------------------- |
1945 | 1945 | ||
1946 | Memory mapped I/O usually takes place through memory locations that are part of | 1946 | Memory mapped I/O usually takes place through memory locations that are part of |
1947 | a window in the CPU's memory space that has different properties assigned than | 1947 | a window in the CPU's memory space that has different properties assigned than |
1948 | the usual RAM directed window. | 1948 | the usual RAM directed window. |
1949 | 1949 | ||
1950 | Amongst these properties is usually the fact that such accesses bypass the | 1950 | Amongst these properties is usually the fact that such accesses bypass the |
1951 | caching entirely and go directly to the device buses. This means MMIO accesses | 1951 | caching entirely and go directly to the device buses. This means MMIO accesses |
1952 | may, in effect, overtake accesses to cached memory that were emitted earlier. | 1952 | may, in effect, overtake accesses to cached memory that were emitted earlier. |
1953 | A memory barrier isn't sufficient in such a case, but rather the cache must be | 1953 | A memory barrier isn't sufficient in such a case, but rather the cache must be |
1954 | flushed between the cached memory write and the MMIO access if the two are in | 1954 | flushed between the cached memory write and the MMIO access if the two are in |
1955 | any way dependent. | 1955 | any way dependent. |
1956 | 1956 | ||
1957 | 1957 | ||
1958 | ========================= | 1958 | ========================= |
1959 | THE THINGS CPUS GET UP TO | 1959 | THE THINGS CPUS GET UP TO |
1960 | ========================= | 1960 | ========================= |
1961 | 1961 | ||
1962 | A programmer might take it for granted that the CPU will perform memory | 1962 | A programmer might take it for granted that the CPU will perform memory |
1963 | operations in exactly the order specified, so that if a CPU is, for example, | 1963 | operations in exactly the order specified, so that if a CPU is, for example, |
1964 | given the following piece of code to execute: | 1964 | given the following piece of code to execute: |
1965 | 1965 | ||
1966 | a = *A; | 1966 | a = *A; |
1967 | *B = b; | 1967 | *B = b; |
1968 | c = *C; | 1968 | c = *C; |
1969 | d = *D; | 1969 | d = *D; |
1970 | *E = e; | 1970 | *E = e; |
1971 | 1971 | ||
1972 | They would then expect that the CPU will complete the memory operation for each | 1972 | They would then expect that the CPU will complete the memory operation for each |
1973 | instruction before moving on to the next one, leading to a definite sequence of | 1973 | instruction before moving on to the next one, leading to a definite sequence of |
1974 | operations as seen by external observers in the system: | 1974 | operations as seen by external observers in the system: |
1975 | 1975 | ||
1976 | LOAD *A, STORE *B, LOAD *C, LOAD *D, STORE *E. | 1976 | LOAD *A, STORE *B, LOAD *C, LOAD *D, STORE *E. |
1977 | 1977 | ||
1978 | 1978 | ||
1979 | Reality is, of course, much messier. With many CPUs and compilers, the above | 1979 | Reality is, of course, much messier. With many CPUs and compilers, the above |
1980 | assumption doesn't hold because: | 1980 | assumption doesn't hold because: |
1981 | 1981 | ||
1982 | (*) loads are more likely to need to be completed immediately to permit | 1982 | (*) loads are more likely to need to be completed immediately to permit |
1983 | execution progress, whereas stores can often be deferred without a | 1983 | execution progress, whereas stores can often be deferred without a |
1984 | problem; | 1984 | problem; |
1985 | 1985 | ||
1986 | (*) loads may be done speculatively, and the result discarded should it prove | 1986 | (*) loads may be done speculatively, and the result discarded should it prove |
1987 | to have been unnecessary; | 1987 | to have been unnecessary; |
1988 | 1988 | ||
1989 | (*) loads may be done speculatively, leading to the result having been | 1989 | (*) loads may be done speculatively, leading to the result having been |
1990 | fetched at the wrong time in the expected sequence of events; | 1990 | fetched at the wrong time in the expected sequence of events; |
1991 | 1991 | ||
1992 | (*) the order of the memory accesses may be rearranged to promote better use | 1992 | (*) the order of the memory accesses may be rearranged to promote better use |
1993 | of the CPU buses and caches; | 1993 | of the CPU buses and caches; |
1994 | 1994 | ||
1995 | (*) loads and stores may be combined to improve performance when talking to | 1995 | (*) loads and stores may be combined to improve performance when talking to |
1996 | memory or I/O hardware that can do batched accesses of adjacent locations, | 1996 | memory or I/O hardware that can do batched accesses of adjacent locations, |
1997 | thus cutting down on transaction setup costs (memory and PCI devices may | 1997 | thus cutting down on transaction setup costs (memory and PCI devices may |
1998 | both be able to do this); and | 1998 | both be able to do this); and |
1999 | 1999 | ||
2000 | (*) the CPU's data cache may affect the ordering, and whilst cache-coherency | 2000 | (*) the CPU's data cache may affect the ordering, and whilst cache-coherency |
2001 | mechanisms may alleviate this - once the store has actually hit the cache | 2001 | mechanisms may alleviate this - once the store has actually hit the cache |
2002 | - there's no guarantee that the coherency management will be propagated in | 2002 | - there's no guarantee that the coherency management will be propagated in |
2003 | order to other CPUs. | 2003 | order to other CPUs. |
2004 | 2004 | ||
2005 | So what another CPU, say, might actually observe from the above piece of code | 2005 | So what another CPU, say, might actually observe from the above piece of code |
2006 | is: | 2006 | is: |
2007 | 2007 | ||
2008 | LOAD *A, ..., LOAD {*C,*D}, STORE *E, STORE *B | 2008 | LOAD *A, ..., LOAD {*C,*D}, STORE *E, STORE *B |
2009 | 2009 | ||
2010 | (Where "LOAD {*C,*D}" is a combined load) | 2010 | (Where "LOAD {*C,*D}" is a combined load) |
2011 | 2011 | ||
2012 | 2012 | ||
2013 | However, it is guaranteed that a CPU will be self-consistent: it will see its | 2013 | However, it is guaranteed that a CPU will be self-consistent: it will see its |
2014 | _own_ accesses appear to be correctly ordered, without the need for a memory | 2014 | _own_ accesses appear to be correctly ordered, without the need for a memory |
2015 | barrier. For instance with the following code: | 2015 | barrier. For instance with the following code: |
2016 | 2016 | ||
2017 | U = *A; | 2017 | U = *A; |
2018 | *A = V; | 2018 | *A = V; |
2019 | *A = W; | 2019 | *A = W; |
2020 | X = *A; | 2020 | X = *A; |
2021 | *A = Y; | 2021 | *A = Y; |
2022 | Z = *A; | 2022 | Z = *A; |
2023 | 2023 | ||
2024 | and assuming no intervention by an external influence, it can be assumed that | 2024 | and assuming no intervention by an external influence, it can be assumed that |
2025 | the final result will appear to be: | 2025 | the final result will appear to be: |
2026 | 2026 | ||
2027 | U == the original value of *A | 2027 | U == the original value of *A |
2028 | X == W | 2028 | X == W |
2029 | Z == Y | 2029 | Z == Y |
2030 | *A == Y | 2030 | *A == Y |
2031 | 2031 | ||
2032 | The code above may cause the CPU to generate the full sequence of memory | 2032 | The code above may cause the CPU to generate the full sequence of memory |
2033 | accesses: | 2033 | accesses: |
2034 | 2034 | ||
2035 | U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A | 2035 | U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A |
2036 | 2036 | ||
2037 | in that order, but, without intervention, the sequence may have almost any | 2037 | in that order, but, without intervention, the sequence may have almost any |
2038 | combination of elements combined or discarded, provided the program's view of | 2038 | combination of elements combined or discarded, provided the program's view of |
2039 | the world remains consistent. | 2039 | the world remains consistent. |
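The self-consistency guarantee can be checked from within a single thread with a small sketch. The wrapper function and result struct below are hypothetical names introduced only for illustration; the sequence of accesses is the one given above.

```c
#include <assert.h>
#include <stddef.h>

/* Values observed by the accessing CPU itself. */
struct observed {
	int U, X, Z, final;
};

/* Run the six accesses from the text against *A. */
static struct observed self_consistent(int *A, int V, int W, int Y)
{
	struct observed r;

	assert(A != NULL);
	r.U = *A;	/* U == the original value of *A */
	*A = V;
	*A = W;
	r.X = *A;	/* X == W */
	*A = Y;
	r.Z = *A;	/* Z == Y */
	r.final = *A;	/* *A == Y */
	return r;
}
```

However the compiler or CPU combines or discards the individual accesses, the values returned to the calling thread match the list above; nothing is implied about what another CPU sees in the meantime.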
2040 | 2040 | ||
2041 | The compiler may also combine, discard or defer elements of the sequence before | 2041 | The compiler may also combine, discard or defer elements of the sequence before |
2042 | the CPU even sees them. | 2042 | the CPU even sees them. |
2043 | 2043 | ||
2044 | For instance: | 2044 | For instance: |
2045 | 2045 | ||
2046 | *A = V; | 2046 | *A = V; |
2047 | *A = W; | 2047 | *A = W; |
2048 | 2048 | ||
2049 | may be reduced to: | 2049 | may be reduced to: |
2050 | 2050 | ||
2051 | *A = W; | 2051 | *A = W; |
2052 | 2052 | ||
2053 | since, without a write barrier, it can be assumed that the effect of the | 2053 | since, without a write barrier, it can be assumed that the effect of the |
2054 | storage of V to *A is lost. Similarly: | 2054 | storage of V to *A is lost. Similarly: |
2055 | 2055 | ||
2056 | *A = Y; | 2056 | *A = Y; |
2057 | Z = *A; | 2057 | Z = *A; |
2058 | 2058 | ||
2059 | may, without a memory barrier, be reduced to: | 2059 | may, without a memory barrier, be reduced to: |
2060 | 2060 | ||
2061 | *A = Y; | 2061 | *A = Y; |
2062 | Z = Y; | 2062 | Z = Y; |
2063 | 2063 | ||
2064 | and the LOAD operation never appears outside of the CPU. | 2064 | and the LOAD operation never appears outside of the CPU. |
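As a rough illustration of the compiler-side merging described above, the following C sketch contrasts plain accesses with accesses made through a volatile-qualified pointer. The function names are hypothetical, and the use of volatile is an assumption made for illustration rather than something this section prescribes. It constrains only what the compiler may emit (as GCC and Clang treat such accesses in practice), not how the CPU orders or combines the resulting operations.

```c
#include <assert.h>

/*
 * Plain accesses: the compiler may discard the store of v and may
 * satisfy the final load from the value it has just stored.
 */
int store_twice_plain(int *a, int v, int w)
{
	assert(a);	/* caller must pass a valid pointer */
	*a = v;		/* may be dropped: overwritten immediately */
	*a = w;
	return *a;	/* may become "return w" with no load */
}

/*
 * Volatile accesses: the compiler is expected to emit both stores and
 * the load, in program order.  The CPU may still combine them.
 */
int store_twice_volatile(volatile int *a, int v, int w)
{
	assert(a);	/* caller must pass a valid pointer */
	*a = v;
	*a = w;
	return *a;
}
```

Both functions return w and leave w in *a, which is the "program's view of the world" that must stay consistent. They differ only in which memory accesses the compiler is obliged to generate.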
2065 | 2065 | ||
2066 | 2066 | ||
2067 | AND THEN THERE'S THE ALPHA | 2067 | AND THEN THERE'S THE ALPHA |
2068 | -------------------------- | 2068 | -------------------------- |
2069 | 2069 | ||
2070 | The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that, | 2070 | The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that, |
2071 | some versions of the Alpha CPU have a split data cache, permitting them to have | 2071 | some versions of the Alpha CPU have a split data cache, permitting them to have |
2072 | two semantically related cache lines updating at separate times. This is where | 2072 | two semantically related cache lines updating at separate times. This is where |
2073 | the data dependency barrier really becomes necessary as this synchronises both | 2073 | the data dependency barrier really becomes necessary as this synchronises both |
2074 | caches with the memory coherence system, thus making it seem like pointer | 2074 | caches with the memory coherence system, thus making it seem like pointer |
2075 | changes vs new data occur in the right order. | 2075 | changes vs new data occur in the right order. |
2076 | 2076 | ||
2077 | The Alpha defines the Linux kernel's memory barrier model. | 2077 | The Alpha defines the Linux kernel's memory barrier model. |
2078 | 2078 | ||
2079 | See the subsection on "Cache Coherency" above. | 2079 | See the subsection on "Cache Coherency" above. |
2080 | 2080 | ||
2081 | 2081 | ||
2082 | ========== | 2082 | ========== |
2083 | REFERENCES | 2083 | REFERENCES |
2084 | ========== | 2084 | ========== |
2085 | 2085 | ||
2086 | Alpha AXP Architecture Reference Manual, Second Edition (Sites & Witek, | 2086 | Alpha AXP Architecture Reference Manual, Second Edition (Sites & Witek, |
2087 | Digital Press) | 2087 | Digital Press) |
2088 | Chapter 5.2: Physical Address Space Characteristics | 2088 | Chapter 5.2: Physical Address Space Characteristics |
2089 | Chapter 5.4: Caches and Write Buffers | 2089 | Chapter 5.4: Caches and Write Buffers |
2090 | Chapter 5.5: Data Sharing | 2090 | Chapter 5.5: Data Sharing |
2091 | Chapter 5.6: Read/Write Ordering | 2091 | Chapter 5.6: Read/Write Ordering |
2092 | 2092 | ||
2093 | AMD64 Architecture Programmer's Manual Volume 2: System Programming | 2093 | AMD64 Architecture Programmer's Manual Volume 2: System Programming |
2094 | Chapter 7.1: Memory-Access Ordering | 2094 | Chapter 7.1: Memory-Access Ordering |
2095 | Chapter 7.4: Buffering and Combining Memory Writes | 2095 | Chapter 7.4: Buffering and Combining Memory Writes |
2096 | 2096 | ||
2097 | IA-32 Intel Architecture Software Developer's Manual, Volume 3: | 2097 | IA-32 Intel Architecture Software Developer's Manual, Volume 3: |
2098 | System Programming Guide | 2098 | System Programming Guide |
2099 | Chapter 7.1: Locked Atomic Operations | 2099 | Chapter 7.1: Locked Atomic Operations |
2100 | Chapter 7.2: Memory Ordering | 2100 | Chapter 7.2: Memory Ordering |
2101 | Chapter 7.4: Serializing Instructions | 2101 | Chapter 7.4: Serializing Instructions |
2102 | 2102 | ||
2103 | The SPARC Architecture Manual, Version 9 | 2103 | The SPARC Architecture Manual, Version 9 |
2104 | Chapter 8: Memory Models | 2104 | Chapter 8: Memory Models |
2105 | Appendix D: Formal Specification of the Memory Models | 2105 | Appendix D: Formal Specification of the Memory Models |
2106 | Appendix J: Programming with the Memory Models | 2106 | Appendix J: Programming with the Memory Models |
2107 | 2107 | ||
2108 | UltraSPARC Programmer Reference Manual | 2108 | UltraSPARC Programmer Reference Manual |
2109 | Chapter 5: Memory Accesses and Cacheability | 2109 | Chapter 5: Memory Accesses and Cacheability |
2110 | Chapter 15: Sparc-V9 Memory Models | 2110 | Chapter 15: Sparc-V9 Memory Models |
2111 | 2111 | ||
2112 | UltraSPARC III Cu User's Manual | 2112 | UltraSPARC III Cu User's Manual |
2113 | Chapter 9: Memory Models | 2113 | Chapter 9: Memory Models |
2114 | 2114 | ||
2115 | UltraSPARC IIIi Processor User's Manual | 2115 | UltraSPARC IIIi Processor User's Manual |
2116 | Chapter 8: Memory Models | 2116 | Chapter 8: Memory Models |
2117 | 2117 | ||
2118 | UltraSPARC Architecture 2005 | 2118 | UltraSPARC Architecture 2005 |
2119 | Chapter 9: Memory | 2119 | Chapter 9: Memory |
2120 | Appendix D: Formal Specifications of the Memory Models | 2120 | Appendix D: Formal Specifications of the Memory Models |
2121 | 2121 | ||
2122 | UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005 | 2122 | UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005 |
2123 | Chapter 8: Memory Models | 2123 | Chapter 8: Memory Models |
2124 | Appendix F: Caches and Cache Coherency | 2124 | Appendix F: Caches and Cache Coherency |
2125 | 2125 | ||
2126 | Solaris Internals, Core Kernel Architecture, p63-68: | 2126 | Solaris Internals, Core Kernel Architecture, p63-68: |
2127 | Chapter 3.3: Hardware Considerations for Locks and | 2127 | Chapter 3.3: Hardware Considerations for Locks and |
2128 | Synchronization | 2128 | Synchronization |
2129 | 2129 | ||
2130 | Unix Systems for Modern Architectures, Symmetric Multiprocessing and Caching | 2130 | Unix Systems for Modern Architectures, Symmetric Multiprocessing and Caching |
2131 | for Kernel Programmers: | 2131 | for Kernel Programmers: |
2132 | Chapter 13: Other Memory Models | 2132 | Chapter 13: Other Memory Models |
2133 | 2133 | ||
2134 | Intel Itanium Architecture Software Developer's Manual: Volume 1: | 2134 | Intel Itanium Architecture Software Developer's Manual: Volume 1: |
2135 | Section 2.6: Speculation | 2135 | Section 2.6: Speculation |
2136 | Section 4.4: Memory Access | 2136 | Section 4.4: Memory Access |
2137 | 2137 |
Documentation/networking/bonding.txt
1 | 1 | ||
2 | Linux Ethernet Bonding Driver HOWTO | 2 | Linux Ethernet Bonding Driver HOWTO |
3 | 3 | ||
4 | Latest update: 24 April 2006 | 4 | Latest update: 24 April 2006 |
5 | 5 | ||
6 | Initial release : Thomas Davis <tadavis at lbl.gov> | 6 | Initial release : Thomas Davis <tadavis at lbl.gov> |
7 | Corrections, HA extensions : 2000/10/03-15 : | 7 | Corrections, HA extensions : 2000/10/03-15 : |
8 | - Willy Tarreau <willy at meta-x.org> | 8 | - Willy Tarreau <willy at meta-x.org> |
9 | - Constantine Gavrilov <const-g at xpert.com> | 9 | - Constantine Gavrilov <const-g at xpert.com> |
10 | - Chad N. Tindel <ctindel at ieee dot org> | 10 | - Chad N. Tindel <ctindel at ieee dot org> |
11 | - Janice Girouard <girouard at us dot ibm dot com> | 11 | - Janice Girouard <girouard at us dot ibm dot com> |
12 | - Jay Vosburgh <fubar at us dot ibm dot com> | 12 | - Jay Vosburgh <fubar at us dot ibm dot com> |
13 | 13 | ||
14 | Reorganized and updated Feb 2005 by Jay Vosburgh | 14 | Reorganized and updated Feb 2005 by Jay Vosburgh |
15 | Added Sysfs information: 2006/04/24 | 15 | Added Sysfs information: 2006/04/24 |
16 | - Mitch Williams <mitch.a.williams at intel.com> | 16 | - Mitch Williams <mitch.a.williams at intel.com> |
17 | 17 | ||
18 | Introduction | 18 | Introduction |
19 | ============ | 19 | ============ |
20 | 20 | ||
21 | The Linux bonding driver provides a method for aggregating | 21 | The Linux bonding driver provides a method for aggregating |
22 | multiple network interfaces into a single logical "bonded" interface. | 22 | multiple network interfaces into a single logical "bonded" interface. |
23 | The behavior of the bonded interfaces depends upon the mode; generally | 23 | The behavior of the bonded interfaces depends upon the mode; generally |
24 | speaking, modes provide either hot standby or load balancing services. | 24 | speaking, modes provide either hot standby or load balancing services. |
25 | Additionally, link integrity monitoring may be performed. | 25 | Additionally, link integrity monitoring may be performed. |
26 | 26 | ||
27 | The bonding driver originally came from Donald Becker's | 27 | The bonding driver originally came from Donald Becker's |
28 | beowulf patches for kernel 2.0. It has changed quite a bit since, and | 28 | beowulf patches for kernel 2.0. It has changed quite a bit since, and |
29 | the original tools from extreme-linux and beowulf sites will not work | 29 | the original tools from extreme-linux and beowulf sites will not work |
30 | with this version of the driver. | 30 | with this version of the driver. |
31 | 31 | ||
32 | For new versions of the driver, updated userspace tools, and | 32 | For new versions of the driver, updated userspace tools, and |
33 | who to ask for help, please follow the links at the end of this file. | 33 | who to ask for help, please follow the links at the end of this file. |
34 | 34 | ||
35 | Table of Contents | 35 | Table of Contents |
36 | ================= | 36 | ================= |
37 | 37 | ||
38 | 1. Bonding Driver Installation | 38 | 1. Bonding Driver Installation |
39 | 39 | ||
40 | 2. Bonding Driver Options | 40 | 2. Bonding Driver Options |
41 | 41 | ||
42 | 3. Configuring Bonding Devices | 42 | 3. Configuring Bonding Devices |
43 | 3.1 Configuration with Sysconfig Support | 43 | 3.1 Configuration with Sysconfig Support |
44 | 3.1.1 Using DHCP with Sysconfig | 44 | 3.1.1 Using DHCP with Sysconfig |
45 | 3.1.2 Configuring Multiple Bonds with Sysconfig | 45 | 3.1.2 Configuring Multiple Bonds with Sysconfig |
46 | 3.2 Configuration with Initscripts Support | 46 | 3.2 Configuration with Initscripts Support |
47 | 3.2.1 Using DHCP with Initscripts | 47 | 3.2.1 Using DHCP with Initscripts |
48 | 3.2.2 Configuring Multiple Bonds with Initscripts | 48 | 3.2.2 Configuring Multiple Bonds with Initscripts |
49 | 3.3 Configuring Bonding Manually with Ifenslave | 49 | 3.3 Configuring Bonding Manually with Ifenslave |
50 | 3.3.1 Configuring Multiple Bonds Manually | 50 | 3.3.1 Configuring Multiple Bonds Manually |
51 | 3.4 Configuring Bonding Manually via Sysfs | 51 | 3.4 Configuring Bonding Manually via Sysfs |
52 | 52 | ||
53 | 4. Querying Bonding Configuration | 53 | 4. Querying Bonding Configuration |
54 | 4.1 Bonding Configuration | 54 | 4.1 Bonding Configuration |
55 | 4.2 Network Configuration | 55 | 4.2 Network Configuration |
56 | 56 | ||
57 | 5. Switch Configuration | 57 | 5. Switch Configuration |
58 | 58 | ||
59 | 6. 802.1q VLAN Support | 59 | 6. 802.1q VLAN Support |
60 | 60 | ||
61 | 7. Link Monitoring | 61 | 7. Link Monitoring |
62 | 7.1 ARP Monitor Operation | 62 | 7.1 ARP Monitor Operation |
63 | 7.2 Configuring Multiple ARP Targets | 63 | 7.2 Configuring Multiple ARP Targets |
64 | 7.3 MII Monitor Operation | 64 | 7.3 MII Monitor Operation |
65 | 65 | ||
66 | 8. Potential Trouble Sources | 66 | 8. Potential Trouble Sources |
67 | 8.1 Adventures in Routing | 67 | 8.1 Adventures in Routing |
68 | 8.2 Ethernet Device Renaming | 68 | 8.2 Ethernet Device Renaming |
69 | 8.3 Painfully Slow Or No Failed Link Detection By Miimon | 69 | 8.3 Painfully Slow Or No Failed Link Detection By Miimon |
70 | 70 | ||
71 | 9. SNMP agents | 71 | 9. SNMP agents |
72 | 72 | ||
73 | 10. Promiscuous mode | 73 | 10. Promiscuous mode |
74 | 74 | ||
75 | 11. Configuring Bonding for High Availability | 75 | 11. Configuring Bonding for High Availability |
76 | 11.1 High Availability in a Single Switch Topology | 76 | 11.1 High Availability in a Single Switch Topology |
77 | 11.2 High Availability in a Multiple Switch Topology | 77 | 11.2 High Availability in a Multiple Switch Topology |
78 | 11.2.1 HA Bonding Mode Selection for Multiple Switch Topology | 78 | 11.2.1 HA Bonding Mode Selection for Multiple Switch Topology |
79 | 11.2.2 HA Link Monitoring for Multiple Switch Topology | 79 | 11.2.2 HA Link Monitoring for Multiple Switch Topology |
80 | 80 | ||
81 | 12. Configuring Bonding for Maximum Throughput | 81 | 12. Configuring Bonding for Maximum Throughput |
82 | 12.1 Maximum Throughput in a Single Switch Topology | 82 | 12.1 Maximum Throughput in a Single Switch Topology |
83 | 12.1.1 MT Bonding Mode Selection for Single Switch Topology | 83 | 12.1.1 MT Bonding Mode Selection for Single Switch Topology |
84 | 12.1.2 MT Link Monitoring for Single Switch Topology | 84 | 12.1.2 MT Link Monitoring for Single Switch Topology |
85 | 12.2 Maximum Throughput in a Multiple Switch Topology | 85 | 12.2 Maximum Throughput in a Multiple Switch Topology |
86 | 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology | 86 | 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology |
87 | 12.2.2 MT Link Monitoring for Multiple Switch Topology | 87 | 12.2.2 MT Link Monitoring for Multiple Switch Topology |
88 | 88 | ||
89 | 13. Switch Behavior Issues | 89 | 13. Switch Behavior Issues |
90 | 13.1 Link Establishment and Failover Delays | 90 | 13.1 Link Establishment and Failover Delays |
91 | 13.2 Duplicated Incoming Packets | 91 | 13.2 Duplicated Incoming Packets |
92 | 92 | ||
93 | 14. Hardware Specific Considerations | 93 | 14. Hardware Specific Considerations |
94 | 14.1 IBM BladeCenter | 94 | 14.1 IBM BladeCenter |
95 | 95 | ||
96 | 15. Frequently Asked Questions | 96 | 15. Frequently Asked Questions |
97 | 97 | ||
98 | 16. Resources and Links | 98 | 16. Resources and Links |
99 | 99 | ||
100 | 100 | ||
101 | 1. Bonding Driver Installation | 101 | 1. Bonding Driver Installation |
102 | ============================== | 102 | ============================== |
103 | 103 | ||
104 | Most popular distro kernels ship with the bonding driver | 104 | Most popular distro kernels ship with the bonding driver |
105 | already available as a module and the ifenslave user level control | 105 | already available as a module and the ifenslave user level control |
106 | program installed and ready for use. If your distro does not, or you | 106 | program installed and ready for use. If your distro does not, or you |
107 | have need to compile bonding from source (e.g., configuring and | 107 | have need to compile bonding from source (e.g., configuring and |
108 | installing a mainline kernel from kernel.org), you'll need to perform | 108 | installing a mainline kernel from kernel.org), you'll need to perform |
109 | the following steps: | 109 | the following steps: |
110 | 110 | ||
111 | 1.1 Configure and build the kernel with bonding | 111 | 1.1 Configure and build the kernel with bonding |
112 | ----------------------------------------------- | 112 | ----------------------------------------------- |
113 | 113 | ||
114 | The current version of the bonding driver is available in the | 114 | The current version of the bonding driver is available in the |
115 | drivers/net/bonding subdirectory of the most recent kernel source | 115 | drivers/net/bonding subdirectory of the most recent kernel source |
116 | (which is available on http://kernel.org). Most users "rolling their | 116 | (which is available on http://kernel.org). Most users "rolling their |
117 | own" will want to use the most recent kernel from kernel.org. | 117 | own" will want to use the most recent kernel from kernel.org. |
118 | 118 | ||
119 | Configure kernel with "make menuconfig" (or "make xconfig" or | 119 | Configure kernel with "make menuconfig" (or "make xconfig" or |
120 | "make config"), then select "Bonding driver support" in the "Network | 120 | "make config"), then select "Bonding driver support" in the "Network |
121 | device support" section. It is recommended that you configure the | 121 | device support" section. It is recommended that you configure the |
122 | driver as a module since it is currently the only way to pass parameters | 122 | driver as a module since it is currently the only way to pass parameters |
123 | to the driver or configure more than one bonding device. | 123 | to the driver or configure more than one bonding device. |
124 | 124 | ||
125 | Build and install the new kernel and modules, then continue | 125 | Build and install the new kernel and modules, then continue |
126 | below to install ifenslave. | 126 | below to install ifenslave. |
127 | 127 | ||
128 | 1.2 Install ifenslave Control Utility | 128 | 1.2 Install ifenslave Control Utility |
129 | ------------------------------------- | 129 | ------------------------------------- |
130 | 130 | ||
131 | The ifenslave user level control program is included in the | 131 | The ifenslave user level control program is included in the |
132 | kernel source tree, in the file Documentation/networking/ifenslave.c. | 132 | kernel source tree, in the file Documentation/networking/ifenslave.c. |
133 | It is generally recommended that you use the ifenslave that | 133 | It is generally recommended that you use the ifenslave that |
134 | corresponds to the kernel that you are using (either from the same | 134 | corresponds to the kernel that you are using (either from the same |
135 | source tree or supplied with the distro); however, ifenslave | 135 | source tree or supplied with the distro); however, ifenslave |
136 | executables from older kernels should function (but features newer | 136 | executables from older kernels should function (but features newer |
137 | than the ifenslave release are not supported). Running an ifenslave | 137 | than the ifenslave release are not supported). Running an ifenslave |
138 | that is newer than the kernel is not supported, and may or may not | 138 | that is newer than the kernel is not supported, and may or may not |
139 | work. | 139 | work. |
140 | 140 | ||
141 | To install ifenslave, do the following: | 141 | To install ifenslave, do the following: |
142 | 142 | ||
143 | # gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave | 143 | # gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave |
144 | # cp ifenslave /sbin/ifenslave | 144 | # cp ifenslave /sbin/ifenslave |
145 | 145 | ||
146 | If your kernel source is not in "/usr/src/linux," then replace | 146 | If your kernel source is not in "/usr/src/linux," then replace |
147 | "/usr/src/linux/include" in the above with the location of your kernel | 147 | "/usr/src/linux/include" in the above with the location of your kernel |
148 | source include directory. | 148 | source include directory. |
149 | 149 | ||
150 | You may wish to back up any existing /sbin/ifenslave, or, for | 150 | You may wish to back up any existing /sbin/ifenslave, or, for |
151 | testing or informal use, tag the ifenslave to the kernel version | 151 | testing or informal use, tag the ifenslave to the kernel version |
152 | (e.g., name the ifenslave executable /sbin/ifenslave-2.6.10). | 152 | (e.g., name the ifenslave executable /sbin/ifenslave-2.6.10). |
153 | 153 | ||
154 | IMPORTANT NOTE: | 154 | IMPORTANT NOTE: |
155 | 155 | ||
156 | If you omit the "-I" or specify an incorrect directory, you | 156 | If you omit the "-I" or specify an incorrect directory, you |
157 | may end up with an ifenslave that is incompatible with the kernel | 157 | may end up with an ifenslave that is incompatible with the kernel |
158 | you're trying to build it for. Some distros (e.g., Red Hat from 7.1 | 158 | you're trying to build it for. Some distros (e.g., Red Hat from 7.1 |
159 | onwards) do not have /usr/include/linux symbolically linked to the | 159 | onwards) do not have /usr/include/linux symbolically linked to the |
160 | default kernel source include directory. | 160 | default kernel source include directory. |
161 | 161 | ||
162 | SECOND IMPORTANT NOTE: | 162 | SECOND IMPORTANT NOTE: |
163 | If you plan to configure bonding using sysfs, you do not need | 163 | If you plan to configure bonding using sysfs, you do not need |
164 | to use ifenslave. | 164 | to use ifenslave. |
165 | 165 | ||
166 | 2. Bonding Driver Options | 166 | 2. Bonding Driver Options |
167 | ========================= | 167 | ========================= |
168 | 168 | ||
169 | Options for the bonding driver are supplied as parameters to | 169 | Options for the bonding driver are supplied as parameters to |
170 | the bonding module at load time. They may be given as command line | 170 | the bonding module at load time. They may be given as command line |
171 | arguments to the insmod or modprobe command, but are usually specified | 171 | arguments to the insmod or modprobe command, but are usually specified |
172 | in either the /etc/modules.conf or /etc/modprobe.conf configuration | 172 | in either the /etc/modules.conf or /etc/modprobe.conf configuration |
173 | file, or in a distro-specific configuration file (some of which are | 173 | file, or in a distro-specific configuration file (some of which are |
174 | detailed in the next section). | 174 | detailed in the next section). |
175 | 175 | ||
176 | The available bonding driver parameters are listed below. If a | 176 | The available bonding driver parameters are listed below. If a |
177 | parameter is not specified the default value is used. When initially | 177 | parameter is not specified the default value is used. When initially |
178 | configuring a bond, it is recommended "tail -f /var/log/messages" be | 178 | configuring a bond, it is recommended "tail -f /var/log/messages" be |
179 | run in a separate window to watch for bonding driver error messages. | 179 | run in a separate window to watch for bonding driver error messages. |
180 | 180 | ||
181 | It is critical that either the miimon or arp_interval and | 181 | It is critical that either the miimon or arp_interval and |
182 | arp_ip_target parameters be specified, otherwise serious network | 182 | arp_ip_target parameters be specified, otherwise serious network |
183 | degradation will occur during link failures. Very few devices do not | 183 | degradation will occur during link failures. Very few devices do not |
184 | support at least miimon, so there is really no reason not to use it. | 184 | support at least miimon, so there is really no reason not to use it. |
185 | 185 | ||
186 | Options with textual values will accept either the text name | 186 | Options with textual values will accept either the text name |
187 | or, for backwards compatibility, the option value. E.g., | 187 | or, for backwards compatibility, the option value. E.g., |
188 | "mode=802.3ad" and "mode=4" set the same mode. | 188 | "mode=802.3ad" and "mode=4" set the same mode. |
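The equivalence of the textual and numeric forms can be sketched as a small lookup. This is a hypothetical helper, not the driver's own parsing code. The table assumes the usual bonding mode numbering (balance-rr is 0 and 802.3ad is 4, as stated in this file), and the function name is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mode names indexed by their numeric value (illustrative table). */
static const char *const bond_mode_names[] = {
	"balance-rr", "active-backup", "balance-xor", "broadcast",
	"802.3ad", "balance-tlb", "balance-alb",
};

#define BOND_MODE_COUNT \
	(sizeof(bond_mode_names) / sizeof(bond_mode_names[0]))

/* Return the numeric mode for a textual or numeric value, or -1. */
int parse_bond_mode(const char *value)
{
	char *end;
	long n;
	size_t i;

	assert(value != NULL);

	/* Try the text names first, so that "802.3ad" is not read as 802. */
	for (i = 0; i < BOND_MODE_COUNT; i++)
		if (strcmp(value, bond_mode_names[i]) == 0)
			return (int)i;

	/* Otherwise accept a plain decimal number within range. */
	n = strtol(value, &end, 10);
	if (end == value || *end != '\0' ||
	    n < 0 || n >= (long)BOND_MODE_COUNT)
		return -1;
	return (int)n;
}
```

With this sketch, parse_bond_mode("802.3ad") and parse_bond_mode("4") yield the same value, mirroring the behavior described for "mode=802.3ad" and "mode=4".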
189 | 189 | ||
190 | The parameters are as follows: | 190 | The parameters are as follows: |
191 | 191 | ||
192 | arp_interval | 192 | arp_interval |
193 | 193 | ||
194 | Specifies the ARP link monitoring frequency in milliseconds. | 194 | Specifies the ARP link monitoring frequency in milliseconds. |
195 | 195 | ||
196 | The ARP monitor works by periodically checking the slave | 196 | The ARP monitor works by periodically checking the slave |
197 | devices to determine whether they have sent or received | 197 | devices to determine whether they have sent or received |
198 | traffic recently (the precise criteria depend upon the | 198 | traffic recently (the precise criteria depend upon the |
199 | bonding mode, and the state of the slave). Regular traffic is | 199 | bonding mode, and the state of the slave). Regular traffic is |
200 | generated via ARP probes issued for the addresses specified by | 200 | generated via ARP probes issued for the addresses specified by |
201 | the arp_ip_target option. | 201 | the arp_ip_target option. |
202 | 202 | ||
203 | This behavior can be modified by the arp_validate option, | 203 | This behavior can be modified by the arp_validate option, |
204 | below. | 204 | below. |
205 | 205 | ||
206 | If ARP monitoring is used in an etherchannel compatible mode | 206 | If ARP monitoring is used in an etherchannel compatible mode |
207 | (modes 0 and 2), the switch should be configured in a mode | 207 | (modes 0 and 2), the switch should be configured in a mode |
208 | that evenly distributes packets across all links. If the | 208 | that evenly distributes packets across all links. If the |
209 | switch is configured to distribute the packets in an XOR | 209 | switch is configured to distribute the packets in an XOR |
210 | fashion, all replies from the ARP targets will be received on | 210 | fashion, all replies from the ARP targets will be received on |
211 | the same link which could cause the other team members to | 211 | the same link which could cause the other team members to |
212 | fail. ARP monitoring should not be used in conjunction with | 212 | fail. ARP monitoring should not be used in conjunction with |
213 | miimon. A value of 0 disables ARP monitoring. The default | 213 | miimon. A value of 0 disables ARP monitoring. The default |
214 | value is 0. | 214 | value is 0. |
215 | 215 | ||
216 | arp_ip_target | 216 | arp_ip_target |
217 | 217 | ||
218 | Specifies the IP addresses to use as ARP monitoring peers when | 218 | Specifies the IP addresses to use as ARP monitoring peers when |
219 | arp_interval is > 0. These are the targets of the ARP request | 219 | arp_interval is > 0. These are the targets of the ARP request |
220 | sent to determine the health of the link to the targets. | 220 | sent to determine the health of the link to the targets. |
221 | Specify these values in ddd.ddd.ddd.ddd format. Multiple IP | 221 | Specify these values in ddd.ddd.ddd.ddd format. Multiple IP |
222 | addresses must be separated by a comma. At least one IP | 222 | addresses must be separated by a comma. At least one IP |
223 | address must be given for ARP monitoring to function. The | 223 | address must be given for ARP monitoring to function. The |
224 | maximum number of targets that can be specified is 16. The | 224 | maximum number of targets that can be specified is 16. The |
225 | default value is no IP addresses. | 225 | default value is no IP addresses. |
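The format rules for arp_ip_target (comma-separated dotted-quad addresses, at most 16 of them) can be checked before the value is passed to the module. The following is a minimal sketch of such a check, assuming a POSIX system that provides inet_pton(). The function name and the MAX_ARP_TARGETS macro are hypothetical and are not taken from the driver.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>

#define MAX_ARP_TARGETS 16	/* limit stated in the text above */

/*
 * Count the addresses in a comma-separated arp_ip_target list.
 * Returns -1 if an entry is malformed or there are too many entries.
 */
int count_arp_ip_targets(const char *list)
{
	char buf[INET_ADDRSTRLEN];
	struct in_addr addr;
	int count = 0;

	assert(list != NULL);

	while (*list != '\0') {
		size_t len = strcspn(list, ",");

		if (len == 0 || len >= sizeof(buf))
			return -1;	/* empty or over-long entry */
		memcpy(buf, list, len);
		buf[len] = '\0';
		if (inet_pton(AF_INET, buf, &addr) != 1)
			return -1;	/* not a dotted-quad address */
		if (++count > MAX_ARP_TARGETS)
			return -1;	/* more than 16 targets */
		list += len;
		if (*list == ',') {
			list++;
			if (*list == '\0')
				return -1;	/* trailing comma */
		}
	}
	return count;
}
```

An empty string yields 0, which corresponds to the default of no IP addresses. In that case ARP monitoring cannot function even if arp_interval is non-zero.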
226 | 226 | ||
227 | arp_validate | 227 | arp_validate |
228 | 228 | ||
229 | Specifies whether or not ARP probes and replies should be | 229 | Specifies whether or not ARP probes and replies should be |
230 | validated in the active-backup mode. This causes the ARP | 230 | validated in the active-backup mode. This causes the ARP |
231 | monitor to examine the incoming ARP requests and replies, and | 231 | monitor to examine the incoming ARP requests and replies, and |
232 | only consider a slave to be up if it is receiving the | 232 | only consider a slave to be up if it is receiving the |
233 | appropriate ARP traffic. | 233 | appropriate ARP traffic. |
234 | 234 | ||
235 | Possible values are: | 235 | Possible values are: |
236 | 236 | ||
237 | none or 0 | 237 | none or 0 |
238 | 238 | ||
239 | No validation is performed. This is the default. | 239 | No validation is performed. This is the default. |
240 | 240 | ||
241 | active or 1 | 241 | active or 1 |
242 | 242 | ||
243 | Validation is performed only for the active slave. | 243 | Validation is performed only for the active slave. |
244 | 244 | ||
245 | backup or 2 | 245 | backup or 2 |
246 | 246 | ||
247 | Validation is performed only for backup slaves. | 247 | Validation is performed only for backup slaves. |
248 | 248 | ||
249 | all or 3 | 249 | all or 3 |
250 | 250 | ||
251 | Validation is performed for all slaves. | 251 | Validation is performed for all slaves. |
252 | 252 | ||
253 | For the active slave, the validation checks ARP replies to | 253 | For the active slave, the validation checks ARP replies to |
254 | confirm that they were generated by an arp_ip_target. Since | 254 | confirm that they were generated by an arp_ip_target. Since |
255 | backup slaves do not typically receive these replies, the | 255 | backup slaves do not typically receive these replies, the |
256 | validation performed for backup slaves is on the ARP request | 256 | validation performed for backup slaves is on the ARP request |
257 | sent out via the active slave. It is possible that some | 257 | sent out via the active slave. It is possible that some |
258 | switch or network configurations may result in situations | 258 | switch or network configurations may result in situations |
259 | wherein the backup slaves do not receive the ARP requests; in | 259 | wherein the backup slaves do not receive the ARP requests; in |
260 | such a situation, validation of backup slaves must be | 260 | such a situation, validation of backup slaves must be |
261 | disabled. | 261 | disabled. |
262 | 262 | ||
263 | This option is useful in network configurations in which | 263 | This option is useful in network configurations in which |
264 | multiple bonding hosts are concurrently issuing ARPs to one or | 264 | multiple bonding hosts are concurrently issuing ARPs to one or |
265 | more targets beyond a common switch. Should the link between | 265 | more targets beyond a common switch. Should the link between |
266 | the switch and target fail (but not the switch itself), the | 266 | the switch and target fail (but not the switch itself), the |
267 | probe traffic generated by the multiple bonding instances will | 267 | probe traffic generated by the multiple bonding instances will |
268 | fool the standard ARP monitor into considering the links as | 268 | fool the standard ARP monitor into considering the links as |
269 | still up. Use of the arp_validate option can resolve this, as | 269 | still up. Use of the arp_validate option can resolve this, as |
270 | the ARP monitor will only consider ARP requests and replies | 270 | the ARP monitor will only consider ARP requests and replies |
271 | associated with its own instance of bonding. | 271 | associated with its own instance of bonding. |
272 | 272 | ||
273 | This option was added in bonding version 3.1.0. | 273 | This option was added in bonding version 3.1.0. |
274 | 274 | ||
275 | downdelay | 275 | downdelay |
276 | 276 | ||
277 | Specifies the time, in milliseconds, to wait before disabling | 277 | Specifies the time, in milliseconds, to wait before disabling |
278 | a slave after a link failure has been detected. This option | 278 | a slave after a link failure has been detected. This option |
279 | is only valid for the miimon link monitor. The downdelay | 279 | is only valid for the miimon link monitor. The downdelay |
280 | value should be a multiple of the miimon value; if not, it | 280 | value should be a multiple of the miimon value; if not, it |
281 | will be rounded down to the nearest multiple. The default | 281 | will be rounded down to the nearest multiple. The default |
282 | value is 0. | 282 | value is 0. |
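The rounding rule for downdelay is plain integer arithmetic. The sketch below shows the documented behavior, not the driver's actual code. The function name is hypothetical, and returning 0 when miimon is 0 is an assumption based on the statement that the option is only valid with the miimon link monitor.

```c
#include <assert.h>

/*
 * Round downdelay down to the nearest multiple of miimon, both in
 * milliseconds.  With MII monitoring disabled (miimon == 0) the delay
 * has no effect, so 0 is returned.
 */
unsigned int effective_downdelay(unsigned int downdelay, unsigned int miimon)
{
	unsigned int rounded;

	if (miimon == 0)
		return 0;

	/* Integer division discards the remainder. */
	rounded = (downdelay / miimon) * miimon;
	assert(rounded <= downdelay);	/* rounding never increases the delay */
	return rounded;
}
```

For example, with miimon=100 a downdelay of 250 takes effect as 200 ms, while a downdelay below 100 is rounded down to 0.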
283 | 283 | ||
284 | lacp_rate | 284 | lacp_rate |
285 | 285 | ||
286 | Option specifying the rate at which we'll ask our link partner | 286 | Option specifying the rate at which we'll ask our link partner |
287 | to transmit LACPDU packets in 802.3ad mode. Possible values | 287 | to transmit LACPDU packets in 802.3ad mode. Possible values |
288 | are: | 288 | are: |
289 | 289 | ||
290 | slow or 0 | 290 | slow or 0 |
291 | Request partner to transmit LACPDUs every 30 seconds | 291 | Request partner to transmit LACPDUs every 30 seconds |
292 | 292 | ||
293 | fast or 1 | 293 | fast or 1 |
294 | Request partner to transmit LACPDUs every 1 second | 294 | Request partner to transmit LACPDUs every 1 second |
295 | 295 | ||
296 | The default is slow. | 296 | The default is slow. |
297 | 297 | ||
298 | max_bonds | 298 | max_bonds |
299 | 299 | ||
300 | Specifies the number of bonding devices to create for this | 300 | Specifies the number of bonding devices to create for this |
301 | instance of the bonding driver. E.g., if max_bonds is 3, and | 301 | instance of the bonding driver. E.g., if max_bonds is 3, and |
302 | the bonding driver is not already loaded, then bond0, bond1 | 302 | the bonding driver is not already loaded, then bond0, bond1 |
303 | and bond2 will be created. The default value is 1. | 303 | and bond2 will be created. The default value is 1. |
304 | 304 | ||
305 | miimon | 305 | miimon |
306 | 306 | ||
307 | Specifies the MII link monitoring frequency in milliseconds. | 307 | Specifies the MII link monitoring frequency in milliseconds. |
308 | This determines how often the link state of each slave is | 308 | This determines how often the link state of each slave is |
309 | inspected for link failures. A value of zero disables MII | 309 | inspected for link failures. A value of zero disables MII |
310 | link monitoring. A value of 100 is a good starting point. | 310 | link monitoring. A value of 100 is a good starting point. |
311 | The use_carrier option, below, affects how the link state is | 311 | The use_carrier option, below, affects how the link state is |
312 | determined. See the High Availability section for additional | 312 | determined. See the High Availability section for additional |
313 | information. The default value is 0. | 313 | information. The default value is 0. |
314 | 314 | ||
315 | mode | 315 | mode |
316 | 316 | ||
317 | Specifies one of the bonding policies. The default is | 317 | Specifies one of the bonding policies. The default is |
318 | balance-rr (round robin). Possible values are: | 318 | balance-rr (round robin). Possible values are: |
319 | 319 | ||
320 | balance-rr or 0 | 320 | balance-rr or 0 |
321 | 321 | ||
322 | Round-robin policy: Transmit packets in sequential | 322 | Round-robin policy: Transmit packets in sequential |
323 | order from the first available slave through the | 323 | order from the first available slave through the |
324 | last. This mode provides load balancing and fault | 324 | last. This mode provides load balancing and fault |
325 | tolerance. | 325 | tolerance. |
326 | 326 | ||
327 | active-backup or 1 | 327 | active-backup or 1 |
328 | 328 | ||
329 | Active-backup policy: Only one slave in the bond is | 329 | Active-backup policy: Only one slave in the bond is |
330 | active. A different slave becomes active if, and only | 330 | active. A different slave becomes active if, and only |
331 | if, the active slave fails. The bond's MAC address is | 331 | if, the active slave fails. The bond's MAC address is |
332 | externally visible on only one port (network adapter) | 332 | externally visible on only one port (network adapter) |
333 | to avoid confusing the switch. | 333 | to avoid confusing the switch. |
334 | 334 | ||
335 | In bonding version 2.6.2 or later, when a failover | 335 | In bonding version 2.6.2 or later, when a failover |
336 | occurs in active-backup mode, bonding will issue one | 336 | occurs in active-backup mode, bonding will issue one |
337 | or more gratuitous ARPs on the newly active slave. | 337 | or more gratuitous ARPs on the newly active slave. |
338 | One gratuitous ARP is issued for the bonding master | 338 | One gratuitous ARP is issued for the bonding master |
339 | interface and each VLAN interface configured above | 339 | interface and each VLAN interface configured above |
340 | it, provided that the interface has at least one IP | 340 | it, provided that the interface has at least one IP |
341 | address configured. Gratuitous ARPs issued for VLAN | 341 | address configured. Gratuitous ARPs issued for VLAN |
342 | interfaces are tagged with the appropriate VLAN id. | 342 | interfaces are tagged with the appropriate VLAN id. |
343 | 343 | ||
344 | This mode provides fault tolerance. The primary | 344 | This mode provides fault tolerance. The primary |
345 | option, documented below, affects the behavior of this | 345 | option, documented below, affects the behavior of this |
346 | mode. | 346 | mode. |
347 | 347 | ||
348 | balance-xor or 2 | 348 | balance-xor or 2 |
349 | 349 | ||
350 | XOR policy: Transmit based on the selected transmit | 350 | XOR policy: Transmit based on the selected transmit |
351 | hash policy. The default policy is a simple [(source | 351 | hash policy. The default policy is a simple [(source |
352 | MAC address XOR'd with destination MAC address) modulo | 352 | MAC address XOR'd with destination MAC address) modulo |
353 | slave count]. Alternate transmit policies may be | 353 | slave count]. Alternate transmit policies may be |
354 | selected via the xmit_hash_policy option, described | 354 | selected via the xmit_hash_policy option, described |
355 | below. | 355 | below. |
356 | 356 | ||
357 | This mode provides load balancing and fault tolerance. | 357 | This mode provides load balancing and fault tolerance. |
358 | 358 | ||
359 | broadcast or 3 | 359 | broadcast or 3 |
360 | 360 | ||
361 | Broadcast policy: transmits everything on all slave | 361 | Broadcast policy: transmits everything on all slave |
362 | interfaces. This mode provides fault tolerance. | 362 | interfaces. This mode provides fault tolerance. |
363 | 363 | ||
364 | 802.3ad or 4 | 364 | 802.3ad or 4 |
365 | 365 | ||
366 | IEEE 802.3ad Dynamic link aggregation. Creates | 366 | IEEE 802.3ad Dynamic link aggregation. Creates |
367 | aggregation groups that share the same speed and | 367 | aggregation groups that share the same speed and |
368 | duplex settings. Utilizes all slaves in the active | 368 | duplex settings. Utilizes all slaves in the active |
369 | aggregator according to the 802.3ad specification. | 369 | aggregator according to the 802.3ad specification. |
370 | 370 | ||
371 | Slave selection for outgoing traffic is done according | 371 | Slave selection for outgoing traffic is done according |
372 | to the transmit hash policy, which may be changed from | 372 | to the transmit hash policy, which may be changed from |
373 | the default simple XOR policy via the xmit_hash_policy | 373 | the default simple XOR policy via the xmit_hash_policy |
374 | option, documented below. Note that not all transmit | 374 | option, documented below. Note that not all transmit |
375 | policies may be 802.3ad compliant, particularly in | 375 | policies may be 802.3ad compliant, particularly in |
376 | regards to the packet mis-ordering requirements of | 376 | regards to the packet mis-ordering requirements of |
377 | section 43.2.4 of the 802.3ad standard. Differing | 377 | section 43.2.4 of the 802.3ad standard. Differing |
378 | peer implementations will have varying tolerances for | 378 | peer implementations will have varying tolerances for |
379 | noncompliance. | 379 | noncompliance. |
380 | 380 | ||
381 | Prerequisites: | 381 | Prerequisites: |
382 | 382 | ||
383 | 1. Ethtool support in the base drivers for retrieving | 383 | 1. Ethtool support in the base drivers for retrieving |
384 | the speed and duplex of each slave. | 384 | the speed and duplex of each slave. |
385 | 385 | ||
386 | 2. A switch that supports IEEE 802.3ad Dynamic link | 386 | 2. A switch that supports IEEE 802.3ad Dynamic link |
387 | aggregation. | 387 | aggregation. |
388 | 388 | ||
389 | Most switches will require some type of configuration | 389 | Most switches will require some type of configuration |
390 | to enable 802.3ad mode. | 390 | to enable 802.3ad mode. |
391 | 391 | ||
392 | balance-tlb or 5 | 392 | balance-tlb or 5 |
393 | 393 | ||
394 | Adaptive transmit load balancing: channel bonding that | 394 | Adaptive transmit load balancing: channel bonding that |
395 | does not require any special switch support. The | 395 | does not require any special switch support. The |
396 | outgoing traffic is distributed according to the | 396 | outgoing traffic is distributed according to the |
397 | current load (computed relative to the speed) on each | 397 | current load (computed relative to the speed) on each |
398 | slave. Incoming traffic is received by the current | 398 | slave. Incoming traffic is received by the current |
399 | slave. If the receiving slave fails, another slave | 399 | slave. If the receiving slave fails, another slave |
400 | takes over the MAC address of the failed receiving | 400 | takes over the MAC address of the failed receiving |
401 | slave. | 401 | slave. |
402 | 402 | ||
403 | Prerequisite: | 403 | Prerequisite: |
404 | 404 | ||
405 | Ethtool support in the base drivers for retrieving the | 405 | Ethtool support in the base drivers for retrieving the |
406 | speed of each slave. | 406 | speed of each slave. |
407 | 407 | ||
408 | balance-alb or 6 | 408 | balance-alb or 6 |
409 | 409 | ||
410 | Adaptive load balancing: includes balance-tlb plus | 410 | Adaptive load balancing: includes balance-tlb plus |
411 | receive load balancing (rlb) for IPv4 traffic, and | 411 | receive load balancing (rlb) for IPv4 traffic, and |
412 | does not require any special switch support. The | 412 | does not require any special switch support. The |
413 | receive load balancing is achieved by ARP negotiation. | 413 | receive load balancing is achieved by ARP negotiation. |
414 | The bonding driver intercepts the ARP Replies sent by | 414 | The bonding driver intercepts the ARP Replies sent by |
415 | the local system on their way out and overwrites the | 415 | the local system on their way out and overwrites the |
416 | source hardware address with the unique hardware | 416 | source hardware address with the unique hardware |
417 | address of one of the slaves in the bond such that | 417 | address of one of the slaves in the bond such that |
418 | different peers use different hardware addresses for | 418 | different peers use different hardware addresses for |
419 | the server. | 419 | the server. |
420 | 420 | ||
421 | Receive traffic from connections created by the server | 421 | Receive traffic from connections created by the server |
422 | is also balanced. When the local system sends an ARP | 422 | is also balanced. When the local system sends an ARP |
423 | Request the bonding driver copies and saves the peer's | 423 | Request the bonding driver copies and saves the peer's |
424 | IP information from the ARP packet. When the ARP | 424 | IP information from the ARP packet. When the ARP |
425 | Reply arrives from the peer, its hardware address is | 425 | Reply arrives from the peer, its hardware address is |
426 | retrieved and the bonding driver initiates an ARP | 426 | retrieved and the bonding driver initiates an ARP |
427 | reply to this peer assigning it to one of the slaves | 427 | reply to this peer assigning it to one of the slaves |
428 | in the bond. A problematic outcome of using ARP | 428 | in the bond. A problematic outcome of using ARP |
429 | negotiation for balancing is that each time that an | 429 | negotiation for balancing is that each time that an |
430 | ARP request is broadcast it uses the hardware address | 430 | ARP request is broadcast it uses the hardware address |
431 | of the bond. Hence, peers learn the hardware address | 431 | of the bond. Hence, peers learn the hardware address |
432 | of the bond and the balancing of receive traffic | 432 | of the bond and the balancing of receive traffic |
433 | collapses to the current slave. This is handled by | 433 | collapses to the current slave. This is handled by |
434 | sending updates (ARP Replies) to all the peers with | 434 | sending updates (ARP Replies) to all the peers with |
435 | their individually assigned hardware address such that | 435 | their individually assigned hardware address such that |
436 | the traffic is redistributed. Receive traffic is also | 436 | the traffic is redistributed. Receive traffic is also |
437 | redistributed when a new slave is added to the bond | 437 | redistributed when a new slave is added to the bond |
438 | and when an inactive slave is re-activated. The | 438 | and when an inactive slave is re-activated. The |
439 | receive load is distributed sequentially (round robin) | 439 | receive load is distributed sequentially (round robin) |
440 | among the group of highest speed slaves in the bond. | 440 | among the group of highest speed slaves in the bond. |
441 | 441 | ||
442 | When a link is reconnected or a new slave joins the | 442 | When a link is reconnected or a new slave joins the |
443 | bond the receive traffic is redistributed among all | 443 | bond the receive traffic is redistributed among all |
444 | active slaves in the bond by initiating ARP Replies | 444 | active slaves in the bond by initiating ARP Replies |
445 | with the selected MAC address to each of the | 445 | with the selected MAC address to each of the |
446 | clients. The updelay parameter (detailed below) must | 446 | clients. The updelay parameter (detailed below) must |
447 | be set to a value equal to or greater than the switch's | 447 | be set to a value equal to or greater than the switch's |
448 | forwarding delay so that the ARP Replies sent to the | 448 | forwarding delay so that the ARP Replies sent to the |
449 | peers will not be blocked by the switch. | 449 | peers will not be blocked by the switch. |
450 | 450 | ||
451 | Prerequisites: | 451 | Prerequisites: |
452 | 452 | ||
453 | 1. Ethtool support in the base drivers for retrieving | 453 | 1. Ethtool support in the base drivers for retrieving |
454 | the speed of each slave. | 454 | the speed of each slave. |
455 | 455 | ||
456 | 2. Base driver support for setting the hardware | 456 | 2. Base driver support for setting the hardware |
457 | address of a device while it is open. This is | 457 | address of a device while it is open. This is |
458 | required so that there will always be one slave in the | 458 | required so that there will always be one slave in the |
459 | team using the bond hardware address (the | 459 | team using the bond hardware address (the |
460 | curr_active_slave) while having a unique hardware | 460 | curr_active_slave) while having a unique hardware |
461 | address for each slave in the bond. If the | 461 | address for each slave in the bond. If the |
462 | curr_active_slave fails its hardware address is | 462 | curr_active_slave fails its hardware address is |
463 | swapped with the new curr_active_slave that was | 463 | swapped with the new curr_active_slave that was |
464 | chosen. | 464 | chosen. |
465 | 465 | ||
466 | primary | 466 | primary |
467 | 467 | ||
468 | A string (eth0, eth2, etc) specifying which slave is the | 468 | A string (eth0, eth2, etc) specifying which slave is the |
469 | primary device. The specified device will always be the | 469 | primary device. The specified device will always be the |
470 | active slave while it is available. Only when the primary is | 470 | active slave while it is available. Only when the primary is |
471 | off-line will alternate devices be used. This is useful when | 471 | off-line will alternate devices be used. This is useful when |
472 | one slave is preferred over another, e.g., when one slave has | 472 | one slave is preferred over another, e.g., when one slave has |
473 | higher throughput than another. | 473 | higher throughput than another. |
474 | 474 | ||
475 | The primary option is only valid for active-backup mode. | 475 | The primary option is only valid for active-backup mode. |
476 | 476 | ||
477 | updelay | 477 | updelay |
478 | 478 | ||
479 | Specifies the time, in milliseconds, to wait before enabling a | 479 | Specifies the time, in milliseconds, to wait before enabling a |
480 | slave after a link recovery has been detected. This option is | 480 | slave after a link recovery has been detected. This option is |
481 | only valid for the miimon link monitor. The updelay value | 481 | only valid for the miimon link monitor. The updelay value |
482 | should be a multiple of the miimon value; if not, it will be | 482 | should be a multiple of the miimon value; if not, it will be |
483 | rounded down to the nearest multiple. The default value is 0. | 483 | rounded down to the nearest multiple. The default value is 0. |
484 | 484 | ||
485 | use_carrier | 485 | use_carrier |
486 | 486 | ||
487 | Specifies whether or not miimon should use MII or ETHTOOL | 487 | Specifies whether or not miimon should use MII or ETHTOOL |
488 | ioctls vs. netif_carrier_ok() to determine the link | 488 | ioctls vs. netif_carrier_ok() to determine the link |
489 | status. The MII or ETHTOOL ioctls are less efficient and | 489 | status. The MII or ETHTOOL ioctls are less efficient and |
490 | utilize a deprecated calling sequence within the kernel. The | 490 | utilize a deprecated calling sequence within the kernel. The |
491 | netif_carrier_ok() relies on the device driver to maintain its | 491 | netif_carrier_ok() relies on the device driver to maintain its |
492 | state with netif_carrier_on/off; at this writing, most, but | 492 | state with netif_carrier_on/off; at this writing, most, but |
493 | not all, device drivers support this facility. | 493 | not all, device drivers support this facility. |
494 | 494 | ||
495 | If bonding insists that the link is up when it should not be, | 495 | If bonding insists that the link is up when it should not be, |
496 | it may be that your network device driver does not support | 496 | it may be that your network device driver does not support |
497 | netif_carrier_on/off. The default state for netif_carrier is | 497 | netif_carrier_on/off. The default state for netif_carrier is |
498 | "carrier on," so if a driver does not support netif_carrier, | 498 | "carrier on," so if a driver does not support netif_carrier, |
499 | it will appear as if the link is always up. In this case, | 499 | it will appear as if the link is always up. In this case, |
500 | setting use_carrier to 0 will cause bonding to revert to the | 500 | setting use_carrier to 0 will cause bonding to revert to the |
501 | MII / ETHTOOL ioctl method to determine the link state. | 501 | MII / ETHTOOL ioctl method to determine the link state. |
502 | 502 | ||
503 | A value of 1 enables the use of netif_carrier_ok(); a value of | 503 | A value of 1 enables the use of netif_carrier_ok(); a value of |
504 | 0 will use the deprecated MII / ETHTOOL ioctls. The default | 504 | 0 will use the deprecated MII / ETHTOOL ioctls. The default |
505 | value is 1. | 505 | value is 1. |
506 | 506 | ||
507 | xmit_hash_policy | 507 | xmit_hash_policy |
508 | 508 | ||
509 | Selects the transmit hash policy to use for slave selection in | 509 | Selects the transmit hash policy to use for slave selection in |
510 | balance-xor and 802.3ad modes. Possible values are: | 510 | balance-xor and 802.3ad modes. Possible values are: |
511 | 511 | ||
512 | layer2 | 512 | layer2 |
513 | 513 | ||
514 | Uses XOR of hardware MAC addresses to generate the | 514 | Uses XOR of hardware MAC addresses to generate the |
515 | hash. The formula is | 515 | hash. The formula is |
516 | 516 | ||
517 | (source MAC XOR destination MAC) modulo slave count | 517 | (source MAC XOR destination MAC) modulo slave count |
518 | 518 | ||
519 | This algorithm will place all traffic to a particular | 519 | This algorithm will place all traffic to a particular |
520 | network peer on the same slave. | 520 | network peer on the same slave. |
521 | 521 | ||
522 | This algorithm is 802.3ad compliant. | 522 | This algorithm is 802.3ad compliant. |
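A literal reading of the layer2 formula can be sketched in Python as below. The helper names mac_to_int and layer2_hash are hypothetical, and treating the whole MAC address as a single 48-bit integer is an assumption made for illustration; the driver may operate on only part of the address.

```python
def mac_to_int(mac: str) -> int:
    """Parse a MAC such as '00:11:22:33:44:55' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def layer2_hash(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """(source MAC XOR destination MAC) modulo slave count.

    Illustrative only: returns the index of the slave that would carry
    traffic between the two addresses under this reading of the formula.
    """
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count
```

Because the result depends only on the two MAC addresses and the slave count, every frame sent to a given peer maps to the same slave index, which is the property the text describes.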
523 | 523 | ||
524 | layer3+4 | 524 | layer3+4 |
525 | 525 | ||
526 | This policy uses upper layer protocol information, | 526 | This policy uses upper layer protocol information, |
527 | when available, to generate the hash. This allows for | 527 | when available, to generate the hash. This allows for |
528 | traffic to a particular network peer to span multiple | 528 | traffic to a particular network peer to span multiple |
529 | slaves, although a single connection will not span | 529 | slaves, although a single connection will not span |
530 | multiple slaves. | 530 | multiple slaves. |
531 | 531 | ||
532 | The formula for unfragmented TCP and UDP packets is | 532 | The formula for unfragmented TCP and UDP packets is |
533 | 533 | ||
534 | ((source port XOR dest port) XOR | 534 | ((source port XOR dest port) XOR |
535 | ((source IP XOR dest IP) AND 0xffff)) | 535 | ((source IP XOR dest IP) AND 0xffff)) |
536 | modulo slave count | 536 | modulo slave count |
537 | 537 | ||
538 | For fragmented TCP or UDP packets and all other IP | 538 | For fragmented TCP or UDP packets and all other IP |
539 | protocol traffic, the source and destination port | 539 | protocol traffic, the source and destination port |
540 | information is omitted. For non-IP traffic, the | 540 | information is omitted. For non-IP traffic, the |
541 | formula is the same as for the layer2 transmit hash | 541 | formula is the same as for the layer2 transmit hash |
542 | policy. | 542 | policy. |
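The layer3+4 formula for IPv4 traffic can be sketched in Python as follows. The function name layer34_hash and its fragmented flag are hypothetical, and converting each IPv4 address to a host-order integer is an assumption; the driver's byte-order handling may differ, so the exact slave index chosen by a real system may not match.

```python
import ipaddress


def layer34_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 slave_count: int, fragmented: bool = False) -> int:
    """((sport XOR dport) XOR ((src IP XOR dst IP) AND 0xffff)) mod slaves.

    For fragmented packets the port term is omitted, as the text
    describes. Illustrative sketch, not driver code.
    """
    ip_bits = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    port_bits = 0 if fragmented else (src_port ^ dst_port)
    return (port_bits ^ ip_bits) % slave_count
```

Under these assumptions, a connection from 10.0.2.10 port 40000 to 10.0.2.20 port 80 over three slaves maps to index 2 when unfragmented and index 0 when fragmented, which shows how one conversation can end up on two different slaves.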
543 | 543 | ||
544 | This policy is intended to mimic the behavior of | 544 | This policy is intended to mimic the behavior of |
545 | certain switches, notably Cisco switches with PFC2 as | 545 | certain switches, notably Cisco switches with PFC2 as |
546 | well as some Foundry and IBM products. | 546 | well as some Foundry and IBM products. |
547 | 547 | ||
548 | This algorithm is not fully 802.3ad compliant. A | 548 | This algorithm is not fully 802.3ad compliant. A |
549 | single TCP or UDP conversation containing both | 549 | single TCP or UDP conversation containing both |
550 | fragmented and unfragmented packets will see packets | 550 | fragmented and unfragmented packets will see packets |
551 | striped across two interfaces. This may result in out | 551 | striped across two interfaces. This may result in out |
552 | of order delivery. Most traffic types will not meet | 552 | of order delivery. Most traffic types will not meet |
553 | this criterion, as TCP rarely fragments traffic, and | 553 | this criterion, as TCP rarely fragments traffic, and |
554 | most UDP traffic is not involved in extended | 554 | most UDP traffic is not involved in extended |
555 | conversations. Other implementations of 802.3ad may | 555 | conversations. Other implementations of 802.3ad may |
556 | or may not tolerate this noncompliance. | 556 | or may not tolerate this noncompliance. |
557 | 557 | ||
558 | The default value is layer2. This option was added in bonding | 558 | The default value is layer2. This option was added in bonding |
559 | version 2.6.3. In earlier versions of bonding, this parameter does | 559 | version 2.6.3. In earlier versions of bonding, this parameter does |
560 | not exist, and the layer2 policy is the only policy. | 560 | not exist, and the layer2 policy is the only policy. |
561 | 561 | ||
562 | 562 | ||
563 | 3. Configuring Bonding Devices | 563 | 3. Configuring Bonding Devices |
564 | ============================== | 564 | ============================== |
565 | 565 | ||
566 | You can configure bonding using either your distro's network | 566 | You can configure bonding using either your distro's network |
567 | initialization scripts, or manually using either ifenslave or the | 567 | initialization scripts, or manually using either ifenslave or the |
568 | sysfs interface. Distros generally use one of two packages for the | 568 | sysfs interface. Distros generally use one of two packages for the |
569 | network initialization scripts: initscripts or sysconfig. Recent | 569 | network initialization scripts: initscripts or sysconfig. Recent |
570 | versions of these packages have support for bonding, while older | 570 | versions of these packages have support for bonding, while older |
571 | versions do not. | 571 | versions do not. |
572 | 572 | ||
573 | We will first describe the options for configuring bonding for | 573 | We will first describe the options for configuring bonding for |
574 | distros using versions of initscripts and sysconfig with full or | 574 | distros using versions of initscripts and sysconfig with full or |
575 | partial support for bonding, then provide information on enabling | 575 | partial support for bonding, then provide information on enabling |
576 | bonding without support from the network initialization scripts (i.e., | 576 | bonding without support from the network initialization scripts (i.e., |
577 | older versions of initscripts or sysconfig). | 577 | older versions of initscripts or sysconfig). |
578 | 578 | ||
579 | If you're unsure whether your distro uses sysconfig or | 579 | If you're unsure whether your distro uses sysconfig or |
580 | initscripts, or don't know if it's new enough, have no fear. | 580 | initscripts, or don't know if it's new enough, have no fear. |
581 | Determining this is fairly straightforward. | 581 | Determining this is fairly straightforward. |
582 | 582 | ||
583 | First, issue the command: | 583 | First, issue the command: |
584 | 584 | ||
585 | $ rpm -qf /sbin/ifup | 585 | $ rpm -qf /sbin/ifup |
586 | 586 | ||
587 | It will respond with a line of text starting with either | 587 | It will respond with a line of text starting with either |
588 | "initscripts" or "sysconfig," followed by some numbers. This is the | 588 | "initscripts" or "sysconfig," followed by some numbers. This is the |
589 | package that provides your network initialization scripts. | 589 | package that provides your network initialization scripts. |
590 | 590 | ||
591 | Next, to determine if your installation supports bonding, | 591 | Next, to determine if your installation supports bonding, |
592 | issue the command: | 592 | issue the command: |
593 | 593 | ||
594 | $ grep ifenslave /sbin/ifup | 594 | $ grep ifenslave /sbin/ifup |
595 | 595 | ||
596 | If this returns any matches, then your initscripts or | 596 | If this returns any matches, then your initscripts or |
597 | sysconfig has support for bonding. | 597 | sysconfig has support for bonding. |
598 | 598 | ||
599 | 3.1 Configuration with Sysconfig Support | 599 | 3.1 Configuration with Sysconfig Support |
600 | ---------------------------------------- | 600 | ---------------------------------------- |
601 | 601 | ||
602 | This section applies to distros using a version of sysconfig | 602 | This section applies to distros using a version of sysconfig |
603 | with bonding support, for example, SuSE Linux Enterprise Server 9. | 603 | with bonding support, for example, SuSE Linux Enterprise Server 9. |
604 | 604 | ||
605 | SuSE SLES 9's networking configuration system does support | 605 | SuSE SLES 9's networking configuration system does support |
606 | bonding; however, at this writing, the YaST system configuration | 606 | bonding; however, at this writing, the YaST system configuration |
607 | front end does not provide any means to work with bonding devices. | 607 | front end does not provide any means to work with bonding devices. |
608 | Bonding devices can be managed by hand, however, as follows. | 608 | Bonding devices can be managed by hand, however, as follows. |
609 | 609 | ||
610 | First, if they have not already been configured, configure the | 610 | First, if they have not already been configured, configure the |
611 | slave devices. On SLES 9, this is most easily done by running the | 611 | slave devices. On SLES 9, this is most easily done by running the |
612 | yast2 sysconfig configuration utility. The goal is to create an | 612 | yast2 sysconfig configuration utility. The goal is to create an |
613 | ifcfg-id file for each slave device. The simplest way to accomplish | 613 | ifcfg-id file for each slave device. The simplest way to accomplish |
614 | this is to configure the devices for DHCP (this is only to get the | 614 | this is to configure the devices for DHCP (this is only to get the |
615 | ifcfg-id file created; see below for some issues with DHCP). The | 615 | ifcfg-id file created; see below for some issues with DHCP). The |
616 | name of the configuration file for each device will be of the form: | 616 | name of the configuration file for each device will be of the form: |
617 | 617 | ||
618 | ifcfg-id-xx:xx:xx:xx:xx:xx | 618 | ifcfg-id-xx:xx:xx:xx:xx:xx |
619 | 619 | ||
620 | Where the "xx" portion will be replaced with the digits from | 620 | Where the "xx" portion will be replaced with the digits from |
621 | the device's permanent MAC address. | 621 | the device's permanent MAC address. |
622 | 622 | ||
623 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been | 623 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been |
624 | created, it is necessary to edit the configuration files for the slave | 624 | created, it is necessary to edit the configuration files for the slave |
625 | devices (the MAC addresses correspond to those of the slave devices). | 625 | devices (the MAC addresses correspond to those of the slave devices). |
626 | Before editing, the file will contain multiple lines, and will look | 626 | Before editing, the file will contain multiple lines, and will look |
627 | something like this: | 627 | something like this: |
628 | 628 | ||
629 | BOOTPROTO='dhcp' | 629 | BOOTPROTO='dhcp' |
630 | STARTMODE='on' | 630 | STARTMODE='on' |
631 | USERCTL='no' | 631 | USERCTL='no' |
632 | UNIQUE='XNzu.WeZGOGF+4wE' | 632 | UNIQUE='XNzu.WeZGOGF+4wE' |
633 | _nm_name='bus-pci-0001:61:01.0' | 633 | _nm_name='bus-pci-0001:61:01.0' |
634 | 634 | ||
635 | Change the BOOTPROTO and STARTMODE lines to the following: | 635 | Change the BOOTPROTO and STARTMODE lines to the following: |
636 | 636 | ||
637 | BOOTPROTO='none' | 637 | BOOTPROTO='none' |
638 | STARTMODE='off' | 638 | STARTMODE='off' |
639 | 639 | ||
640 | Do not alter the UNIQUE or _nm_name lines. Remove any other | 640 | Do not alter the UNIQUE or _nm_name lines. Remove any other |
641 | lines (USERCTL, etc). | 641 | lines (USERCTL, etc). |
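The edit described above (set BOOTPROTO and STARTMODE, keep UNIQUE and _nm_name, drop everything else) could be scripted roughly as follows. The function name prepare_slave_ifcfg is hypothetical, and the sketch works on the file contents as a string rather than touching any real path, since the location of the ifcfg-id files is not given here.

```python
def prepare_slave_ifcfg(text: str) -> str:
    """Rewrite the contents of an ifcfg-id file for use as a bonding slave.

    Illustrative sketch: BOOTPROTO and STARTMODE are overridden, UNIQUE
    and _nm_name are preserved verbatim, and all other lines are removed.
    """
    overrides = {"BOOTPROTO": "'none'", "STARTMODE": "'off'"}
    keep = {"UNIQUE", "_nm_name"}
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            out.append(f"{key}={overrides[key]}")
        elif key in keep:
            out.append(line)
        # Any other line (USERCTL, etc.) is intentionally dropped.
    return "\n".join(out) + "\n"
```

Applied to the five-line sample shown earlier, this would yield a four-line file containing the new BOOTPROTO and STARTMODE values followed by the untouched UNIQUE and _nm_name lines.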
642 | 642 | ||
643 | Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified, | 643 | Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified, |
644 | it's time to create the configuration file for the bonding device | 644 | it's time to create the configuration file for the bonding device |
645 | itself. This file is named ifcfg-bondX, where X is the number of the | 645 | itself. This file is named ifcfg-bondX, where X is the number of the |
646 | bonding device to create, starting at 0. The first such file is | 646 | bonding device to create, starting at 0. The first such file is |
647 | ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig | 647 | ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig |
648 | network configuration system will correctly start multiple instances | 648 | network configuration system will correctly start multiple instances |
649 | of bonding. | 649 | of bonding. |
650 | 650 | ||
651 | The contents of the ifcfg-bondX file are as follows: | 651 | The contents of the ifcfg-bondX file are as follows: |
652 | 652 | ||
653 | BOOTPROTO="static" | 653 | BOOTPROTO="static" |
654 | BROADCAST="10.0.2.255" | 654 | BROADCAST="10.0.2.255" |
655 | IPADDR="10.0.2.10" | 655 | IPADDR="10.0.2.10" |
656 | NETMASK="255.255.255.0" | 656 | NETMASK="255.255.255.0" |
657 | NETWORK="10.0.2.0" | 657 | NETWORK="10.0.2.0" |
658 | REMOTE_IPADDR="" | 658 | REMOTE_IPADDR="" |
659 | STARTMODE="onboot" | 659 | STARTMODE="onboot" |
660 | BONDING_MASTER="yes" | 660 | BONDING_MASTER="yes" |
661 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" | 661 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" |
662 | BONDING_SLAVE0="eth0" | 662 | BONDING_SLAVE0="eth0" |
663 | BONDING_SLAVE1="bus-pci-0000:06:08.1" | 663 | BONDING_SLAVE1="bus-pci-0000:06:08.1" |
664 | 664 | ||
665 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK | 665 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK |
666 | values with the appropriate values for your network. | 666 | values with the appropriate values for your network. |
667 | 667 | ||
668 | The STARTMODE specifies when the device is brought online. | 668 | The STARTMODE specifies when the device is brought online. |
669 | The possible values are: | 669 | The possible values are: |
670 | 670 | ||
671 | onboot: The device is started at boot time. If you're not | 671 | onboot: The device is started at boot time. If you're not |
672 | sure, this is probably what you want. | 672 | sure, this is probably what you want. |
673 | 673 | ||
674 | manual: The device is started only when ifup is called | 674 | manual: The device is started only when ifup is called |
675 | manually. Bonding devices may be configured this | 675 | manually. Bonding devices may be configured this |
676 | way if you do not wish them to start automatically | 676 | way if you do not wish them to start automatically |
677 | at boot for some reason. | 677 | at boot for some reason. |
678 | 678 | ||
679 | hotplug: The device is started by a hotplug event. This is not | 679 | hotplug: The device is started by a hotplug event. This is not |
680 | a valid choice for a bonding device. | 680 | a valid choice for a bonding device. |
681 | 681 | ||
682 | off or ignore: The device configuration is ignored. | 682 | off or ignore: The device configuration is ignored. |
683 | 683 | ||
684 | The line BONDING_MASTER='yes' indicates that the device is a | 684 | The line BONDING_MASTER='yes' indicates that the device is a |
685 | bonding master device. The only useful value is "yes." | 685 | bonding master device. The only useful value is "yes." |
686 | 686 | ||
687 | The contents of BONDING_MODULE_OPTS are supplied to the | 687 | The contents of BONDING_MODULE_OPTS are supplied to the |
688 | instance of the bonding module for this device. Specify the options | 688 | instance of the bonding module for this device. Specify the options |
689 | for the bonding mode, link monitoring, and so on here. Do not include | 689 | for the bonding mode, link monitoring, and so on here. Do not include |
690 | the max_bonds bonding parameter; this will confuse the configuration | 690 | the max_bonds bonding parameter; this will confuse the configuration |
691 | system if you have multiple bonding devices. | 691 | system if you have multiple bonding devices. |
692 | 692 | ||
693 | Finally, supply one BONDING_SLAVEn="slave device" for each | 693 | Finally, supply one BONDING_SLAVEn="slave device" for each |
694 | slave, where "n" is an increasing value, one for each slave. The | 694 | slave, where "n" is an increasing value, one for each slave. The |
695 | "slave device" is either an interface name, e.g., "eth0", or a device | 695 | "slave device" is either an interface name, e.g., "eth0", or a device |
696 | specifier for the network device. The interface name is easier to | 696 | specifier for the network device. The interface name is easier to |
697 | find, but the ethN names are subject to change at boot time if, e.g., | 697 | find, but the ethN names are subject to change at boot time if, e.g., |
698 | a device early in the sequence has failed. The device specifiers | 698 | a device early in the sequence has failed. The device specifiers |
699 | (bus-pci-0000:06:08.1 in the example above) specify the physical | 699 | (bus-pci-0000:06:08.1 in the example above) specify the physical |
700 | network device, and will not change unless the device's bus location | 700 | network device, and will not change unless the device's bus location |
701 | changes (for example, it is moved from one PCI slot to another). The | 701 | changes (for example, it is moved from one PCI slot to another). The |
702 | example above uses one of each type for demonstration purposes; most | 702 | example above uses one of each type for demonstration purposes; most |
703 | configurations will choose one or the other for all slave devices. | 703 | configurations will choose one or the other for all slave devices. |
704 | 704 | ||
705 | When all configuration files have been modified or created, | 705 | When all configuration files have been modified or created, |
706 | networking must be restarted for the configuration changes to take | 706 | networking must be restarted for the configuration changes to take |
707 | effect. This can be accomplished via the following: | 707 | effect. This can be accomplished via the following: |
708 | 708 | ||
709 | # /etc/init.d/network restart | 709 | # /etc/init.d/network restart |
710 | 710 | ||
711 | Note that the network control script (/sbin/ifdown) will | 711 | Note that the network control script (/sbin/ifdown) will |
712 | remove the bonding module as part of the network shutdown processing, | 712 | remove the bonding module as part of the network shutdown processing, |
713 | so it is not necessary to remove the module by hand if, e.g., the | 713 | so it is not necessary to remove the module by hand if, e.g., the |
714 | module parameters have changed. | 714 | module parameters have changed. |
715 | 715 | ||
716 | Also, at this writing, YaST/YaST2 will not manage bonding | 716 | Also, at this writing, YaST/YaST2 will not manage bonding |
717 | devices (they do not show bonding interfaces on their list of network | 717 | devices (they do not show bonding interfaces on their list of network |
718 | devices). It is necessary to edit the configuration file by hand to | 718 | devices). It is necessary to edit the configuration file by hand to |
719 | change the bonding configuration. | 719 | change the bonding configuration. |
720 | 720 | ||
721 | Additional general options and details of the ifcfg file | 721 | Additional general options and details of the ifcfg file |
722 | format can be found in an example ifcfg template file: | 722 | format can be found in an example ifcfg template file: |
723 | 723 | ||
724 | /etc/sysconfig/network/ifcfg.template | 724 | /etc/sysconfig/network/ifcfg.template |
725 | 725 | ||
726 | Note that the template does not document the various BONDING_ | 726 | Note that the template does not document the various BONDING_ |
727 | settings described above, but does describe many of the other options. | 727 | settings described above, but does describe many of the other options. |
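
As a rough illustration of the layout described above, the following
shell sketch writes an ifcfg-bondX file into a directory of your
choice. The function name write_bond_cfg and its argument order are
hypothetical and not part of sysconfig. The BROADCAST, NETWORK and
REMOTE_IPADDR keys from the sample are omitted for brevity and would
need to be added for a real configuration.

```shell
#!/bin/sh
# Sketch only: write_bond_cfg is a hypothetical helper, not a sysconfig tool.
# Usage: write_bond_cfg DIR INDEX IPADDR NETMASK SLAVE...
write_bond_cfg() {
    dir=$1; idx=$2; ip=$3; mask=$4
    shift 4
    out="$dir/ifcfg-bond$idx"
    {
        echo 'BOOTPROTO="static"'
        echo "IPADDR=\"$ip\""
        echo "NETMASK=\"$mask\""
        echo 'STARTMODE="onboot"'
        echo 'BONDING_MASTER="yes"'
        # Example options taken from the sample file; adjust as needed.
        echo 'BONDING_MODULE_OPTS="mode=active-backup miimon=100"'
        # One BONDING_SLAVEn line per remaining argument, n counting from 0.
        n=0
        for slave in "$@"; do
            echo "BONDING_SLAVE$n=\"$slave\""
            n=$((n + 1))
        done
    } > "$out"
}
```

For example, "write_bond_cfg /etc/sysconfig/network 0 10.0.2.10
255.255.0.0 eth0 eth1" would produce ifcfg-bond0 with two slaves,
assuming the configuration files live in the same directory as the
template file named above.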
728 | 728 | ||
729 | 3.1.1 Using DHCP with Sysconfig | 729 | 3.1.1 Using DHCP with Sysconfig |
730 | ------------------------------- | 730 | ------------------------------- |
731 | 731 | ||
732 | Under sysconfig, configuring a device with BOOTPROTO='dhcp' | 732 | Under sysconfig, configuring a device with BOOTPROTO='dhcp' |
733 | will cause it to query DHCP for its IP address information. At this | 733 | will cause it to query DHCP for its IP address information. At this |
734 | writing, this does not function for bonding devices; the scripts | 734 | writing, this does not function for bonding devices; the scripts |
735 | attempt to obtain the device address from DHCP prior to adding any of | 735 | attempt to obtain the device address from DHCP prior to adding any of |
736 | the slave devices. Without active slaves, the DHCP requests are not | 736 | the slave devices. Without active slaves, the DHCP requests are not |
737 | sent to the network. | 737 | sent to the network. |
738 | 738 | ||
739 | 3.1.2 Configuring Multiple Bonds with Sysconfig | 739 | 3.1.2 Configuring Multiple Bonds with Sysconfig |
740 | ----------------------------------------------- | 740 | ----------------------------------------------- |
741 | 741 | ||
742 | The sysconfig network initialization system is capable of | 742 | The sysconfig network initialization system is capable of |
743 | handling multiple bonding devices. All that is necessary is for each | 743 | handling multiple bonding devices. All that is necessary is for each |
744 | bonding instance to have an appropriately configured ifcfg-bondX file | 744 | bonding instance to have an appropriately configured ifcfg-bondX file |
745 | (as described above). Do not specify the "max_bonds" parameter to any | 745 | (as described above). Do not specify the "max_bonds" parameter to any |
746 | instance of bonding, as this will confuse sysconfig. If you require | 746 | instance of bonding, as this will confuse sysconfig. If you require |
747 | multiple bonding devices with identical parameters, create multiple | 747 | multiple bonding devices with identical parameters, create multiple |
748 | ifcfg-bondX files. | 748 | ifcfg-bondX files. |
749 | 749 | ||
750 | Because the sysconfig scripts supply the bonding module | 750 | Because the sysconfig scripts supply the bonding module |
751 | options in the ifcfg-bondX file, it is not necessary to add them to | 751 | options in the ifcfg-bondX file, it is not necessary to add them to |
752 | the system /etc/modules.conf or /etc/modprobe.conf configuration file. | 752 | the system /etc/modules.conf or /etc/modprobe.conf configuration file. |
753 | 753 | ||
754 | 3.2 Configuration with Initscripts Support | 754 | 3.2 Configuration with Initscripts Support |
755 | ------------------------------------------ | 755 | ------------------------------------------ |
756 | 756 | ||
757 | This section applies to distros using a version of initscripts | 757 | This section applies to distros using a version of initscripts |
758 | with bonding support, for example, Red Hat Linux 9 or Red Hat | 758 | with bonding support, for example, Red Hat Linux 9 or Red Hat |
759 | Enterprise Linux version 3 or 4. On these systems, the network | 759 | Enterprise Linux version 3 or 4. On these systems, the network |
760 | initialization scripts have some knowledge of bonding, and can be | 760 | initialization scripts have some knowledge of bonding, and can be |
761 | configured to control bonding devices. | 761 | configured to control bonding devices. |
762 | 762 | ||
763 | These distros will not automatically load the network adapter | 763 | These distros will not automatically load the network adapter |
764 | driver unless the ethX device is configured with an IP address. | 764 | driver unless the ethX device is configured with an IP address. |
765 | Because of this constraint, users must manually configure a | 765 | Because of this constraint, users must manually configure a |
766 | network-script file for all physical adapters that will be members of | 766 | network-script file for all physical adapters that will be members of |
767 | a bondX link. Network script files are located in the directory: | 767 | a bondX link. Network script files are located in the directory: |
768 | 768 | ||
769 | /etc/sysconfig/network-scripts | 769 | /etc/sysconfig/network-scripts |
770 | 770 | ||
771 | The file name must be prefixed with "ifcfg-eth" and suffixed | 771 | The file name must be prefixed with "ifcfg-eth" and suffixed |
772 | with the adapter's physical adapter number. For example, the script | 772 | with the adapter's physical adapter number. For example, the script |
773 | for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0. | 773 | for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0. |
774 | Place the following text in the file: | 774 | Place the following text in the file: |
775 | 775 | ||
776 | DEVICE=eth0 | 776 | DEVICE=eth0 |
777 | USERCTL=no | 777 | USERCTL=no |
778 | ONBOOT=yes | 778 | ONBOOT=yes |
779 | MASTER=bond0 | 779 | MASTER=bond0 |
780 | SLAVE=yes | 780 | SLAVE=yes |
781 | BOOTPROTO=none | 781 | BOOTPROTO=none |
782 | 782 | ||
783 | The DEVICE= line will be different for every ethX device and | 783 | The DEVICE= line will be different for every ethX device and |
784 | must correspond with the name of the file, i.e., ifcfg-eth1 must have | 784 | must correspond with the name of the file, i.e., ifcfg-eth1 must have |
785 | a device line of DEVICE=eth1. The setting of the MASTER= line will | 785 | a device line of DEVICE=eth1. The setting of the MASTER= line will |
786 | also depend on the final bonding interface name chosen for your bond. | 786 | also depend on the final bonding interface name chosen for your bond. |
787 | As with other network devices, these typically start at 0, and go up | 787 | As with other network devices, these typically start at 0, and go up |
788 | one for each device, i.e., the first bonding instance is bond0, the | 788 | one for each device, i.e., the first bonding instance is bond0, the |
789 | second is bond1, and so on. | 789 | second is bond1, and so on. |
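
Because every slave needs a nearly identical file, the files can be
generated rather than typed. The sketch below is one way to do that;
write_slave_cfg is a hypothetical helper name, and the output
directory is passed in so that the result can be inspected before it
is copied into /etc/sysconfig/network-scripts.

```shell
#!/bin/sh
# Sketch only: write_slave_cfg is a hypothetical helper, not part of initscripts.
# Usage: write_slave_cfg DIR MASTER DEVICE...
write_slave_cfg() {
    dir=$1; master=$2
    shift 2
    for dev in "$@"; do
        # The DEVICE= value must match the file name suffix (ifcfg-eth1 -> eth1).
        cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
USERCTL=no
ONBOOT=yes
MASTER=$master
SLAVE=yes
BOOTPROTO=none
EOF
    done
}
```

For example, "write_slave_cfg /tmp/bondcfg bond0 eth0 eth1" writes
ifcfg-eth0 and ifcfg-eth1, both naming bond0 as their master (the
directory /tmp/bondcfg is only an example and must already exist).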
790 | 790 | ||
791 | Next, create a bond network script. The file name for this | 791 | Next, create a bond network script. The file name for this |
792 | script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is | 792 | script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is |
793 | the number of the bond. For bond0 the file is named "ifcfg-bond0", | 793 | the number of the bond. For bond0 the file is named "ifcfg-bond0", |
794 | for bond1 it is named "ifcfg-bond1", and so on. Within that file, | 794 | for bond1 it is named "ifcfg-bond1", and so on. Within that file, |
795 | place the following text: | 795 | place the following text: |
796 | 796 | ||
797 | DEVICE=bond0 | 797 | DEVICE=bond0 |
798 | IPADDR=192.168.1.1 | 798 | IPADDR=192.168.1.1 |
799 | NETMASK=255.255.255.0 | 799 | NETMASK=255.255.255.0 |
800 | NETWORK=192.168.1.0 | 800 | NETWORK=192.168.1.0 |
801 | BROADCAST=192.168.1.255 | 801 | BROADCAST=192.168.1.255 |
802 | ONBOOT=yes | 802 | ONBOOT=yes |
803 | BOOTPROTO=none | 803 | BOOTPROTO=none |
804 | USERCTL=no | 804 | USERCTL=no |
805 | 805 | ||
806 | Be sure to change the network-specific lines (IPADDR, | 806 | Be sure to change the network-specific lines (IPADDR, |
807 | NETMASK, NETWORK and BROADCAST) to match your network configuration. | 807 | NETMASK, NETWORK and BROADCAST) to match your network configuration. |
808 | 808 | ||
809 | Finally, it is necessary to edit /etc/modules.conf (or | 809 | Finally, it is necessary to edit /etc/modules.conf (or |
810 | /etc/modprobe.conf, depending upon your distro) to load the bonding | 810 | /etc/modprobe.conf, depending upon your distro) to load the bonding |
811 | module with your desired options when the bond0 interface is brought | 811 | module with your desired options when the bond0 interface is brought |
812 | up. The following lines in /etc/modules.conf (or modprobe.conf) will | 812 | up. The following lines in /etc/modules.conf (or modprobe.conf) will |
813 | load the bonding module, and select its options: | 813 | load the bonding module, and select its options: |
814 | 814 | ||
815 | alias bond0 bonding | 815 | alias bond0 bonding |
816 | options bond0 mode=balance-alb miimon=100 | 816 | options bond0 mode=balance-alb miimon=100 |
817 | 817 | ||
818 | Replace the sample parameters with the appropriate set of | 818 | Replace the sample parameters with the appropriate set of |
819 | options for your configuration. | 819 | options for your configuration. |
820 | 820 | ||
821 | Finally, run "/etc/rc.d/init.d/network restart" as root. This | 821 | Finally, run "/etc/rc.d/init.d/network restart" as root. This |
822 | will restart the networking subsystem and your bond link should now be | 822 | will restart the networking subsystem and your bond link should now be |
823 | up and running. | 823 | up and running. |
824 | 824 | ||
825 | 3.2.1 Using DHCP with Initscripts | 825 | 3.2.1 Using DHCP with Initscripts |
826 | --------------------------------- | 826 | --------------------------------- |
827 | 827 | ||
828 | Recent versions of initscripts (the version supplied with | 828 | Recent versions of initscripts (the version supplied with |
829 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do | 829 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do |
830 | have support for assigning IP information to bonding devices via DHCP. | 830 | have support for assigning IP information to bonding devices via DHCP. |
831 | 831 | ||
832 | To configure bonding for DHCP, configure it as described | 832 | To configure bonding for DHCP, configure it as described |
833 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" | 833 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" |
834 | and add a line consisting of "TYPE=Bonding". Note that the TYPE value | 834 | and add a line consisting of "TYPE=Bonding". Note that the TYPE value |
835 | is case sensitive. | 835 | is case sensitive. |
836 | 836 | ||
837 | 3.2.2 Configuring Multiple Bonds with Initscripts | 837 | 3.2.2 Configuring Multiple Bonds with Initscripts |
838 | ------------------------------------------------- | 838 | ------------------------------------------------- |
839 | 839 | ||
840 | At this writing, the initscripts package does not directly | 840 | At this writing, the initscripts package does not directly |
841 | support loading the bonding driver multiple times, so the process for | 841 | support loading the bonding driver multiple times, so the process for |
842 | doing so is the same as described in the "Configuring Multiple Bonds | 842 | doing so is the same as described in the "Configuring Multiple Bonds |
843 | Manually" section, below. | 843 | Manually" section, below. |
844 | 844 | ||
845 | NOTE: It has been observed that some Red Hat supplied kernels | 845 | NOTE: It has been observed that some Red Hat supplied kernels |
846 | are apparently unable to rename modules at load time (the "-o bond1" | 846 | are apparently unable to rename modules at load time (the "-o bond1" |
847 | part). Attempts to pass that option to modprobe will produce an | 847 | part). Attempts to pass that option to modprobe will produce an |
848 | "Operation not permitted" error. This has been reported on some | 848 | "Operation not permitted" error. This has been reported on some |
849 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels | 849 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels |
850 | exhibiting this problem, it will be impossible to configure multiple | 850 | exhibiting this problem, it will be impossible to configure multiple |
851 | bonds with differing parameters. | 851 | bonds with differing parameters. |
852 | 852 | ||
853 | 3.3 Configuring Bonding Manually with Ifenslave | 853 | 3.3 Configuring Bonding Manually with Ifenslave |
854 | ----------------------------------------------- | 854 | ----------------------------------------------- |
855 | 855 | ||
856 | This section applies to distros whose network initialization | 856 | This section applies to distros whose network initialization |
857 | scripts (the sysconfig or initscripts package) do not have specific | 857 | scripts (the sysconfig or initscripts package) do not have specific |
858 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server | 858 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server |
859 | version 8. | 859 | version 8. |
860 | 860 | ||
861 | The general method for these systems is to place the bonding | 861 | The general method for these systems is to place the bonding |
862 | module parameters into /etc/modules.conf or /etc/modprobe.conf (as | 862 | module parameters into /etc/modules.conf or /etc/modprobe.conf (as |
863 | appropriate for the installed distro), then add modprobe and/or | 863 | appropriate for the installed distro), then add modprobe and/or |
864 | ifenslave commands to the system's global init script. The name of | 864 | ifenslave commands to the system's global init script. The name of |
865 | the global init script differs; for sysconfig, it is | 865 | the global init script differs; for sysconfig, it is |
866 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. | 866 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. |
867 | 867 | ||
868 | For example, if you wanted to make a simple bond of two e100 | 868 | For example, if you wanted to make a simple bond of two e100 |
869 | devices (presumed to be eth0 and eth1), and have it persist across | 869 | devices (presumed to be eth0 and eth1), and have it persist across |
870 | reboots, edit the appropriate file (/etc/init.d/boot.local or | 870 | reboots, edit the appropriate file (/etc/init.d/boot.local or |
871 | /etc/rc.d/rc.local), and add the following: | 871 | /etc/rc.d/rc.local), and add the following: |
872 | 872 | ||
873 | modprobe bonding mode=balance-alb miimon=100 | 873 | modprobe bonding mode=balance-alb miimon=100 |
874 | modprobe e100 | 874 | modprobe e100 |
875 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up | 875 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up |
876 | ifenslave bond0 eth0 | 876 | ifenslave bond0 eth0 |
877 | ifenslave bond0 eth1 | 877 | ifenslave bond0 eth1 |
878 | 878 | ||
879 | Replace the example bonding module parameters and bond0 | 879 | Replace the example bonding module parameters and bond0 |
880 | network configuration (IP address, netmask, etc.) with the appropriate | 880 | network configuration (IP address, netmask, etc.) with the appropriate |
881 | values for your configuration. | 881 | values for your configuration. |
882 | 882 | ||
883 | Unfortunately, this method will not provide support for the | 883 | Unfortunately, this method will not provide support for the |
884 | ifup and ifdown scripts on the bond devices. To reload the bonding | 884 | ifup and ifdown scripts on the bond devices. To reload the bonding |
885 | configuration, it is necessary to run the initialization script, e.g., | 885 | configuration, it is necessary to run the initialization script, e.g., |
886 | 886 | ||
887 | # /etc/init.d/boot.local | 887 | # /etc/init.d/boot.local |
888 | 888 | ||
889 | or | 889 | or |
890 | 890 | ||
891 | # /etc/rc.d/rc.local | 891 | # /etc/rc.d/rc.local |
892 | 892 | ||
893 | It may be desirable in such a case to create a separate script | 893 | It may be desirable in such a case to create a separate script |
894 | which only initializes the bonding configuration, then call that | 894 | which only initializes the bonding configuration, then call that |
895 | separate script from within boot.local. This allows for bonding to be | 895 | separate script from within boot.local. This allows for bonding to be |
896 | enabled without re-running the entire global init script. | 896 | enabled without re-running the entire global init script. |
897 | 897 | ||
898 | To shut down the bonding devices, it is necessary to first | 898 | To shut down the bonding devices, it is necessary to first |
899 | mark the bonding device itself as being down, then remove the | 899 | mark the bonding device itself as being down, then remove the |
900 | appropriate device driver modules. For our example above, you can do | 900 | appropriate device driver modules. For our example above, you can do |
901 | the following: | 901 | the following: |
902 | 902 | ||
903 | # ifconfig bond0 down | 903 | # ifconfig bond0 down |
904 | # rmmod bonding | 904 | # rmmod bonding |
905 | # rmmod e100 | 905 | # rmmod e100 |
906 | 906 | ||
907 | Again, for convenience, it may be desirable to create a script | 907 | Again, for convenience, it may be desirable to create a script |
908 | with these commands. | 908 | with these commands. |
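
A separate script of the kind suggested above might look like the
following sketch. It reuses the example commands from this section
unchanged; the function names bond_start, bond_stop and run, and the
DRY_RUN variable, are assumptions made for illustration. With
DRY_RUN=1 the commands are printed instead of executed, which allows
the sequence to be checked without root privileges.

```shell
#!/bin/sh
# Sketch only: a standalone bonding helper; function names are hypothetical.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring up bond0 with two e100 slaves (values from the example above).
bond_start() {
    run modprobe bonding mode=balance-alb miimon=100
    run modprobe e100
    run ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
    run ifenslave bond0 eth0
    run ifenslave bond0 eth1
}

# Mark the bond down first, then remove the driver modules.
bond_stop() {
    run ifconfig bond0 down
    run rmmod bonding
    run rmmod e100
}
```

The block only defines functions. To use it from boot.local or
rc.local, source the file and call bond_start, or add a small
dispatcher that maps "start" and "stop" arguments to the two
functions.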
909 | 909 | ||
910 | 910 | ||
911 | 3.3.1 Configuring Multiple Bonds Manually | 911 | 3.3.1 Configuring Multiple Bonds Manually |
912 | ----------------------------------------- | 912 | ----------------------------------------- |
913 | 913 | ||
914 | This section contains information on configuring multiple | 914 | This section contains information on configuring multiple |
915 | bonding devices with differing options for those systems whose network | 915 | bonding devices with differing options for those systems whose network |
916 | initialization scripts lack support for configuring multiple bonds. | 916 | initialization scripts lack support for configuring multiple bonds. |
917 | 917 | ||
918 | If you require multiple bonding devices, but all with the same | 918 | If you require multiple bonding devices, but all with the same |
919 | options, you may wish to use the "max_bonds" module parameter, | 919 | options, you may wish to use the "max_bonds" module parameter, |
920 | documented above. | 920 | documented above. |
921 | 921 | ||
922 | To create multiple bonding devices with differing options, it | 922 | To create multiple bonding devices with differing options, it |
923 | is necessary to load the bonding driver multiple times. Note that | 923 | is necessary to load the bonding driver multiple times. Note that |
924 | current versions of the sysconfig network initialization scripts | 924 | current versions of the sysconfig network initialization scripts |
925 | handle this automatically; if your distro uses these scripts, no | 925 | handle this automatically; if your distro uses these scripts, no |
926 | special action is needed. See the section Configuring Bonding | 926 | special action is needed. See the section Configuring Bonding |
927 | Devices, above, if you're not sure about your network initialization | 927 | Devices, above, if you're not sure about your network initialization |
928 | scripts. | 928 | scripts. |
929 | 929 | ||
930 | To load multiple instances of the module, it is necessary to | 930 | To load multiple instances of the module, it is necessary to |
931 | specify a different name for each instance (the module loading system | 931 | specify a different name for each instance (the module loading system |
932 | requires that every loaded module, even multiple instances of the same | 932 | requires that every loaded module, even multiple instances of the same |
933 | module, have a unique name). This is accomplished by supplying | 933 | module, have a unique name). This is accomplished by supplying |
934 | multiple sets of bonding options in /etc/modprobe.conf, for example: | 934 | multiple sets of bonding options in /etc/modprobe.conf, for example: |
935 | 935 | ||
936 | alias bond0 bonding | 936 | alias bond0 bonding |
937 | options bond0 -o bond0 mode=balance-rr miimon=100 | 937 | options bond0 -o bond0 mode=balance-rr miimon=100 |
938 | 938 | ||
939 | alias bond1 bonding | 939 | alias bond1 bonding |
940 | options bond1 -o bond1 mode=balance-alb miimon=50 | 940 | options bond1 -o bond1 mode=balance-alb miimon=50 |
941 | 941 | ||
942 | will load the bonding module two times. The first instance is | 942 | will load the bonding module two times. The first instance is |
943 | named "bond0" and creates the bond0 device in balance-rr mode with an | 943 | named "bond0" and creates the bond0 device in balance-rr mode with an |
944 | miimon of 100. The second instance is named "bond1" and creates the | 944 | miimon of 100. The second instance is named "bond1" and creates the |
945 | bond1 device in balance-alb mode with an miimon of 50. | 945 | bond1 device in balance-alb mode with an miimon of 50. |
946 | 946 | ||
947 | In some circumstances (typically with older distributions), | 947 | In some circumstances (typically with older distributions), |
948 | the above does not work, and the second bonding instance never sees | 948 | the above does not work, and the second bonding instance never sees |
949 | its options. In that case, the second options line can be substituted | 949 | its options. In that case, the second options line can be substituted |
950 | as follows: | 950 | as follows: |
951 | 951 | ||
952 | install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ | 952 | install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ |
953 | mode=balance-alb miimon=50 | 953 | mode=balance-alb miimon=50 |
954 | 954 | ||
955 | This may be repeated any number of times, specifying a new and | 955 | This may be repeated any number of times, specifying a new and |
956 | unique name in place of bond1 for each subsequent instance. | 956 | unique name in place of bond1 for each subsequent instance. |
957 | 957 | ||
958 | 3.4 Configuring Bonding Manually via Sysfs | 958 | 3.4 Configuring Bonding Manually via Sysfs |
959 | ------------------------------------------ | 959 | ------------------------------------------ |
960 | 960 | ||
961 | Starting with version 3.0, Channel Bonding may be configured | 961 | Starting with version 3.0, Channel Bonding may be configured |
962 | via the sysfs interface. This interface allows dynamic configuration | 962 | via the sysfs interface. This interface allows dynamic configuration |
963 | of all bonds in the system without unloading the module. It also | 963 | of all bonds in the system without unloading the module. It also |
964 | allows for adding and removing bonds at runtime. Ifenslave is no | 964 | allows for adding and removing bonds at runtime. Ifenslave is no |
965 | longer required, though it is still supported. | 965 | longer required, though it is still supported. |
966 | 966 | ||
967 | Use of the sysfs interface allows you to use multiple bonds | 967 | Use of the sysfs interface allows you to use multiple bonds |
968 | with different configurations without having to reload the module. | 968 | with different configurations without having to reload the module. |
969 | It also allows you to use multiple, differently configured bonds when | 969 | It also allows you to use multiple, differently configured bonds when |
970 | bonding is compiled into the kernel. | 970 | bonding is compiled into the kernel. |
971 | 971 | ||
972 | You must have the sysfs filesystem mounted to configure | 972 | You must have the sysfs filesystem mounted to configure |
973 | bonding this way. The examples in this document assume that you | 973 | bonding this way. The examples in this document assume that you |
974 | are using the standard mount point for sysfs, i.e., /sys. If your | 974 | are using the standard mount point for sysfs, i.e., /sys. If your |
975 | sysfs filesystem is mounted elsewhere, you will need to adjust the | 975 | sysfs filesystem is mounted elsewhere, you will need to adjust the |
976 | example paths accordingly. | 976 | example paths accordingly. |
977 | 977 | ||
978 | Creating and Destroying Bonds | 978 | Creating and Destroying Bonds |
979 | ----------------------------- | 979 | ----------------------------- |
980 | To add a new bond foo: | 980 | To add a new bond foo: |
981 | # echo +foo > /sys/class/net/bonding_masters | 981 | # echo +foo > /sys/class/net/bonding_masters |
982 | 982 | ||
983 | To remove an existing bond bar: | 983 | To remove an existing bond bar: |
984 | # echo -bar > /sys/class/net/bonding_masters | 984 | # echo -bar > /sys/class/net/bonding_masters |
985 | 985 | ||
986 | To show all existing bonds: | 986 | To show all existing bonds: |
987 | # cat /sys/class/net/bonding_masters | 987 | # cat /sys/class/net/bonding_masters |
988 | 988 | ||
989 | NOTE: due to the 4K size limitation of sysfs files, this list may be | 989 | NOTE: due to the 4K size limitation of sysfs files, this list may be |
990 | truncated if you have more than a few hundred bonds. This is unlikely | 990 | truncated if you have more than a few hundred bonds. This is unlikely |
991 | to occur under normal operating conditions. | 991 | to occur under normal operating conditions. |
992 | 992 | ||
993 | Adding and Removing Slaves | 993 | Adding and Removing Slaves |
994 | -------------------------- | 994 | -------------------------- |
995 | Interfaces may be enslaved to a bond using the file | 995 | Interfaces may be enslaved to a bond using the file |
996 | /sys/class/net/<bond>/bonding/slaves. The semantics for this file | 996 | /sys/class/net/<bond>/bonding/slaves. The semantics for this file |
997 | are the same as for the bonding_masters file. | 997 | are the same as for the bonding_masters file. |
998 | 998 | ||
999 | To enslave interface eth0 to bond bond0: | 999 | To enslave interface eth0 to bond bond0: |
1000 | # ifconfig bond0 up | 1000 | # ifconfig bond0 up |
1001 | # echo +eth0 > /sys/class/net/bond0/bonding/slaves | 1001 | # echo +eth0 > /sys/class/net/bond0/bonding/slaves |
1002 | 1002 | ||
1003 | To free slave eth0 from bond bond0: | 1003 | To free slave eth0 from bond bond0: |
1004 | # echo -eth0 > /sys/class/net/bond0/bonding/slaves | 1004 | # echo -eth0 > /sys/class/net/bond0/bonding/slaves |
1005 | 1005 | ||
1006 | NOTE: The bond must be up before slaves can be added. All | 1006 | NOTE: The bond must be up before slaves can be added. All |
1007 | slaves are freed when the interface is brought down. | 1007 | slaves are freed when the interface is brought down. |
1008 | 1008 | ||
1009 | When an interface is enslaved to a bond, symlinks between the | 1009 | When an interface is enslaved to a bond, symlinks between the |
1010 | two are created in the sysfs filesystem. In this case, you would get | 1010 | two are created in the sysfs filesystem. In this case, you would get |
1011 | /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and | 1011 | /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and |
1012 | /sys/class/net/eth0/master pointing to /sys/class/net/bond0. | 1012 | /sys/class/net/eth0/master pointing to /sys/class/net/bond0. |
1013 | 1013 | ||
1014 | This means that you can tell quickly whether or not an | 1014 | This means that you can tell quickly whether or not an |
1015 | interface is enslaved by looking for the master symlink. Thus: | 1015 | interface is enslaved by looking for the master symlink. Thus: |
1016 | # echo -eth0 > /sys/class/net/eth0/master/bonding/slaves | 1016 | # echo -eth0 > /sys/class/net/eth0/master/bonding/slaves |
1017 | will free eth0 from whatever bond it is enslaved to, regardless of | 1017 | will free eth0 from whatever bond it is enslaved to, regardless of |
1018 | the name of the bond interface. | 1018 | the name of the bond interface. |
1019 | 1019 | ||
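The master symlink lookup described above can be wrapped in a small helper. This is a minimal sketch, not part of the bonding documentation itself: the function name bond_of and the SYSFS_NET override variable are assumptions introduced here so the lookup can be tried against a scratch directory. When SYSFS_NET is unset, the real /sys/class/net is used.

```shell
# bond_of IFACE: print the bond that IFACE is enslaved to, or return
# non-zero if IFACE has no master symlink (i.e. it is not enslaved).
# SYSFS_NET is a hypothetical override for experiments; it defaults to
# the real sysfs location.
bond_of() {
    link="${SYSFS_NET:-/sys/class/net}/$1/master"
    [ -L "$link" ] || return 1
    # The symlink target ends in the bond's name; keep only that part.
    basename "$(readlink "$link")"
}
```

With such a helper, a script can test "bond_of eth0" before deciding whether to write -eth0 to the slaves file.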
1020 | Changing a Bond's Configuration | 1020 | Changing a Bond's Configuration |
1021 | ------------------------------- | 1021 | ------------------------------- |
1022 | Each bond may be configured individually by manipulating the | 1022 | Each bond may be configured individually by manipulating the |
1023 | files located in /sys/class/net/<bond name>/bonding | 1023 | files located in /sys/class/net/<bond name>/bonding |
1024 | 1024 | ||
1025 | The names of these files correspond directly with the command- | 1025 | The names of these files correspond directly with the command- |
1026 | line parameters described elsewhere in in this file, and, with the | 1026 | line parameters described elsewhere in this file, and, with the |
1027 | exception of arp_ip_target, they accept the same values. To see the | 1027 | exception of arp_ip_target, they accept the same values. To see the |
1028 | current setting, simply cat the appropriate file. | 1028 | current setting, simply cat the appropriate file. |
1029 | 1029 | ||
1030 | A few examples will be given here; for specific usage | 1030 | A few examples will be given here; for specific usage |
1031 | guidelines for each parameter, see the appropriate section in this | 1031 | guidelines for each parameter, see the appropriate section in this |
1032 | document. | 1032 | document. |
1033 | 1033 | ||
1034 | To configure bond0 for balance-alb mode: | 1034 | To configure bond0 for balance-alb mode: |
1035 | # ifconfig bond0 down | 1035 | # ifconfig bond0 down |
1036 | # echo 6 > /sys/class/net/bond0/bonding/mode | 1036 | # echo 6 > /sys/class/net/bond0/bonding/mode |
1037 | - or - | 1037 | - or - |
1038 | # echo balance-alb > /sys/class/net/bond0/bonding/mode | 1038 | # echo balance-alb > /sys/class/net/bond0/bonding/mode |
1039 | NOTE: The bond interface must be down before the mode can be | 1039 | NOTE: The bond interface must be down before the mode can be |
1040 | changed. | 1040 | changed. |
1041 | 1041 | ||
1042 | To enable MII monitoring on bond0 with a 1 second interval: | 1042 | To enable MII monitoring on bond0 with a 1 second interval: |
1043 | # echo 1000 > /sys/class/net/bond0/bonding/miimon | 1043 | # echo 1000 > /sys/class/net/bond0/bonding/miimon |
1044 | NOTE: If ARP monitoring is enabled, it will be disabled when MII | 1044 | NOTE: If ARP monitoring is enabled, it will be disabled when MII |
1045 | monitoring is enabled, and vice-versa. | 1045 | monitoring is enabled, and vice-versa. |
1046 | 1046 | ||
1047 | To add ARP targets: | 1047 | To add ARP targets: |
1048 | # echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target | 1048 | # echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target |
1049 | # echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target | 1049 | # echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target |
1050 | NOTE: up to 10 target addresses may be specified. | 1050 | NOTE: up to 10 target addresses may be specified. |
1051 | 1051 | ||
1052 | To remove an ARP target: | 1052 | To remove an ARP target: |
1053 | # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target | 1053 | # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target |
1054 | 1054 | ||
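Since the current setting of each parameter can be read by cat-ing the corresponding file, several settings can be collected in one pass. The following is a sketch under stated assumptions: the function name show_bond_settings and the SYSFS_NET override are hypothetical, and only the four files named in the examples above (mode, miimon, arp_interval, arp_ip_target) are read.

```shell
# show_bond_settings BOND: print "name: value" for a few bonding sysfs
# files. Files that are absent or unreadable are skipped silently.
# SYSFS_NET is a hypothetical override; it defaults to /sys/class/net.
show_bond_settings() {
    dir="${SYSFS_NET:-/sys/class/net}/$1/bonding"
    for name in mode miimon arp_interval arp_ip_target; do
        [ -r "$dir/$name" ] || continue
        printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
    done
}
```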
1055 | Example Configuration | 1055 | Example Configuration |
1056 | --------------------- | 1056 | --------------------- |
1057 | We begin with the same example that is shown in section 3.3, | 1057 | We begin with the same example that is shown in section 3.3, |
1058 | executed with sysfs, and without using ifenslave. | 1058 | executed with sysfs, and without using ifenslave. |
1059 | 1059 | ||
1060 | To make a simple bond of two e100 devices (presumed to be eth0 | 1060 | To make a simple bond of two e100 devices (presumed to be eth0 |
1061 | and eth1), and have it persist across reboots, edit the appropriate | 1061 | and eth1), and have it persist across reboots, edit the appropriate |
1062 | file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the | 1062 | file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the |
1063 | following: | 1063 | following: |
1064 | 1064 | ||
1065 | modprobe bonding | 1065 | modprobe bonding |
1066 | modprobe e100 | 1066 | modprobe e100 |
1067 | echo balance-alb > /sys/class/net/bond0/bonding/mode | 1067 | echo balance-alb > /sys/class/net/bond0/bonding/mode |
1068 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up | 1068 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up |
1069 | echo 100 > /sys/class/net/bond0/bonding/miimon | 1069 | echo 100 > /sys/class/net/bond0/bonding/miimon |
1070 | echo +eth0 > /sys/class/net/bond0/bonding/slaves | 1070 | echo +eth0 > /sys/class/net/bond0/bonding/slaves |
1071 | echo +eth1 > /sys/class/net/bond0/bonding/slaves | 1071 | echo +eth1 > /sys/class/net/bond0/bonding/slaves |
1072 | 1072 | ||
1073 | To add a second bond, with two e1000 interfaces in | 1073 | To add a second bond, with two e1000 interfaces in |
1074 | active-backup mode, using ARP monitoring, add the following lines to | 1074 | active-backup mode, using ARP monitoring, add the following lines to |
1075 | your init script: | 1075 | your init script: |
1076 | 1076 | ||
1077 | modprobe e1000 | 1077 | modprobe e1000 |
1078 | echo +bond1 > /sys/class/net/bonding_masters | 1078 | echo +bond1 > /sys/class/net/bonding_masters |
1079 | echo active-backup > /sys/class/net/bond1/bonding/mode | 1079 | echo active-backup > /sys/class/net/bond1/bonding/mode |
1080 | ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up | 1080 | ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up |
1081 | echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target | 1081 | echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target |
1082 | echo 2000 > /sys/class/net/bond1/bonding/arp_interval | 1082 | echo 2000 > /sys/class/net/bond1/bonding/arp_interval |
1083 | echo +eth2 > /sys/class/net/bond1/bonding/slaves | 1083 | echo +eth2 > /sys/class/net/bond1/bonding/slaves |
1084 | echo +eth3 > /sys/class/net/bond1/bonding/slaves | 1084 | echo +eth3 > /sys/class/net/bond1/bonding/slaves |
1085 | 1085 | ||
1086 | 1086 | ||
1087 | 4. Querying Bonding Configuration | 1087 | 4. Querying Bonding Configuration |
1088 | ================================= | 1088 | ================================= |
1089 | 1089 | ||
1090 | 4.1 Bonding Configuration | 1090 | 4.1 Bonding Configuration |
1091 | ------------------------- | 1091 | ------------------------- |
1092 | 1092 | ||
1093 | Each bonding device has a read-only file residing in the | 1093 | Each bonding device has a read-only file residing in the |
1094 | /proc/net/bonding directory. The file contents include information | 1094 | /proc/net/bonding directory. The file contents include information |
1095 | about the bonding configuration, options and state of each slave. | 1095 | about the bonding configuration, options and state of each slave. |
1096 | 1096 | ||
1097 | For example, the contents of /proc/net/bonding/bond0 after the | 1097 | For example, the contents of /proc/net/bonding/bond0 after the |
1098 | driver is loaded with parameters of mode=0 and miimon=1000 is | 1098 | driver is loaded with parameters of mode=0 and miimon=1000 is |
1099 | generally as follows: | 1099 | generally as follows: |
1100 | 1100 | ||
1101 | Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004) | 1101 | Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004) |
1102 | Bonding Mode: load balancing (round-robin) | 1102 | Bonding Mode: load balancing (round-robin) |
1103 | Currently Active Slave: eth0 | 1103 | Currently Active Slave: eth0 |
1104 | MII Status: up | 1104 | MII Status: up |
1105 | MII Polling Interval (ms): 1000 | 1105 | MII Polling Interval (ms): 1000 |
1106 | Up Delay (ms): 0 | 1106 | Up Delay (ms): 0 |
1107 | Down Delay (ms): 0 | 1107 | Down Delay (ms): 0 |
1108 | 1108 | ||
1109 | Slave Interface: eth1 | 1109 | Slave Interface: eth1 |
1110 | MII Status: up | 1110 | MII Status: up |
1111 | Link Failure Count: 1 | 1111 | Link Failure Count: 1 |
1112 | 1112 | ||
1113 | Slave Interface: eth0 | 1113 | Slave Interface: eth0 |
1114 | MII Status: up | 1114 | MII Status: up |
1115 | Link Failure Count: 1 | 1115 | Link Failure Count: 1 |
1116 | 1116 | ||
1117 | The precise format and contents will change depending upon the | 1117 | The precise format and contents will change depending upon the |
1118 | bonding configuration, state, and version of the bonding driver. | 1118 | bonding configuration, state, and version of the bonding driver. |
1119 | 1119 | ||
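For scripted checks, the per-slave link state can be pulled out of the file with awk. This is only a sketch: it assumes the "Slave Interface:" and "MII Status:" labels appear as in the sample above, which, as noted, may differ between driver versions. The function name slave_status is hypothetical.

```shell
# slave_status FILE: print "<slave> <MII status>" for each slave listed
# in a /proc/net/bonding/<bond> style file.
slave_status() {
    awk -F': ' '
        # Remember the most recent slave name.
        /^[ \t]*Slave Interface:/ { slave = $2 }
        # Report only the MII Status lines that follow a slave entry,
        # skipping the bond-wide status near the top of the file.
        /^[ \t]*MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "$1"
}
```

A typical call would be "slave_status /proc/net/bonding/bond0".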
1120 | 4.2 Network configuration | 1120 | 4.2 Network configuration |
1121 | ------------------------- | 1121 | ------------------------- |
1122 | 1122 | ||
1123 | The network configuration can be inspected using the ifconfig | 1123 | The network configuration can be inspected using the ifconfig |
1124 | command. Bonding devices will have the MASTER flag set; Bonding slave | 1124 | command. Bonding devices will have the MASTER flag set; Bonding slave |
1125 | devices will have the SLAVE flag set. The ifconfig output does not | 1125 | devices will have the SLAVE flag set. The ifconfig output does not |
1126 | contain information on which slaves are associated with which masters. | 1126 | contain information on which slaves are associated with which masters. |
1127 | 1127 | ||
1128 | In the example below, the bond0 interface is the master | 1128 | In the example below, the bond0 interface is the master |
1129 | (MASTER) while eth0 and eth1 are slaves (SLAVE). Notice all slaves of | 1129 | (MASTER) while eth0 and eth1 are slaves (SLAVE). Notice all slaves of |
1130 | bond0 have the same MAC address (HWaddr) as bond0 for all modes except | 1130 | bond0 have the same MAC address (HWaddr) as bond0 for all modes except |
1131 | TLB and ALB that require a unique MAC address for each slave. | 1131 | TLB and ALB that require a unique MAC address for each slave. |
1132 | 1132 | ||
1133 | # /sbin/ifconfig | 1133 | # /sbin/ifconfig |
1134 | bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 1134 | bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
1135 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 | 1135 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 |
1136 | UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 | 1136 | UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 |
1137 | RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0 | 1137 | RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0 |
1138 | TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0 | 1138 | TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0 |
1139 | collisions:0 txqueuelen:0 | 1139 | collisions:0 txqueuelen:0 |
1140 | 1140 | ||
1141 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 1141 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
1142 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 1142 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
1143 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 | 1143 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 |
1144 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 | 1144 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 |
1145 | collisions:0 txqueuelen:100 | 1145 | collisions:0 txqueuelen:100 |
1146 | Interrupt:10 Base address:0x1080 | 1146 | Interrupt:10 Base address:0x1080 |
1147 | 1147 | ||
1148 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 1148 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
1149 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 1149 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
1150 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 | 1150 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 |
1151 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 | 1151 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 |
1152 | collisions:0 txqueuelen:100 | 1152 | collisions:0 txqueuelen:100 |
1153 | Interrupt:9 Base address:0x1400 | 1153 | Interrupt:9 Base address:0x1400 |
1154 | 1154 | ||
1155 | 5. Switch Configuration | 1155 | 5. Switch Configuration |
1156 | ======================= | 1156 | ======================= |
1157 | 1157 | ||
1158 | For this section, "switch" refers to whatever system the | 1158 | For this section, "switch" refers to whatever system the |
1159 | bonded devices are directly connected to (i.e., where the other end of | 1159 | bonded devices are directly connected to (i.e., where the other end of |
1160 | the cable plugs into). This may be an actual dedicated switch device, | 1160 | the cable plugs into). This may be an actual dedicated switch device, |
1161 | or it may be another regular system (e.g., another computer running | 1161 | or it may be another regular system (e.g., another computer running |
1162 | Linux). | 1162 | Linux). |
1163 | 1163 | ||
1164 | The active-backup, balance-tlb and balance-alb modes do not | 1164 | The active-backup, balance-tlb and balance-alb modes do not |
1165 | require any specific configuration of the switch. | 1165 | require any specific configuration of the switch. |
1166 | 1166 | ||
1167 | The 802.3ad mode requires that the switch have the appropriate | 1167 | The 802.3ad mode requires that the switch have the appropriate |
1168 | ports configured as an 802.3ad aggregation. The precise method used | 1168 | ports configured as an 802.3ad aggregation. The precise method used |
1169 | to configure this varies from switch to switch, but, for example, a | 1169 | to configure this varies from switch to switch, but, for example, a |
1170 | Cisco 3550 series switch requires that the appropriate ports first be | 1170 | Cisco 3550 series switch requires that the appropriate ports first be |
1171 | grouped together in a single etherchannel instance, then that | 1171 | grouped together in a single etherchannel instance, then that |
1172 | etherchannel is set to mode "lacp" to enable 802.3ad (instead of | 1172 | etherchannel is set to mode "lacp" to enable 802.3ad (instead of |
1173 | standard EtherChannel). | 1173 | standard EtherChannel). |
1174 | 1174 | ||
1175 | The balance-rr, balance-xor and broadcast modes generally | 1175 | The balance-rr, balance-xor and broadcast modes generally |
1176 | require that the switch have the appropriate ports grouped together. | 1176 | require that the switch have the appropriate ports grouped together. |
1177 | The nomenclature for such a group differs between switches; it may be | 1177 | The nomenclature for such a group differs between switches; it may be |
1178 | called an "etherchannel" (as in the Cisco example, above), a "trunk | 1178 | called an "etherchannel" (as in the Cisco example, above), a "trunk |
1179 | group" or some other similar variation. For these modes, each switch | 1179 | group" or some other similar variation. For these modes, each switch |
1180 | will also have its own configuration options for the switch's transmit | 1180 | will also have its own configuration options for the switch's transmit |
1181 | policy to the bond. Typical choices include XOR of either the MAC or | 1181 | policy to the bond. Typical choices include XOR of either the MAC or |
1182 | IP addresses. The transmit policy of the two peers does not need to | 1182 | IP addresses. The transmit policy of the two peers does not need to |
1183 | match. For these three modes, the bonding mode really selects a | 1183 | match. For these three modes, the bonding mode really selects a |
1184 | transmit policy for an EtherChannel group; all three will interoperate | 1184 | transmit policy for an EtherChannel group; all three will interoperate |
1185 | with another EtherChannel group. | 1185 | with another EtherChannel group. |
1186 | 1186 | ||
1187 | 1187 | ||
1188 | 6. 802.1q VLAN Support | 1188 | 6. 802.1q VLAN Support |
1189 | ====================== | 1189 | ====================== |
1190 | 1190 | ||
1191 | It is possible to configure VLAN devices over a bond interface | 1191 | It is possible to configure VLAN devices over a bond interface |
1192 | using the 8021q driver. However, only packets coming from the 8021q | 1192 | using the 8021q driver. However, only packets coming from the 8021q |
1193 | driver and passing through bonding will be tagged by default. Self | 1193 | driver and passing through bonding will be tagged by default. Self |
1194 | generated packets, for example, bonding's learning packets or ARP | 1194 | generated packets, for example, bonding's learning packets or ARP |
1195 | packets generated by either ALB mode or the ARP monitor mechanism, are | 1195 | packets generated by either ALB mode or the ARP monitor mechanism, are |
1196 | tagged internally by bonding itself. As a result, bonding must | 1196 | tagged internally by bonding itself. As a result, bonding must |
1197 | "learn" the VLAN IDs configured above it, and use those IDs to tag | 1197 | "learn" the VLAN IDs configured above it, and use those IDs to tag |
1198 | self generated packets. | 1198 | self generated packets. |
1199 | 1199 | ||
1200 | For reasons of simplicity, and to support the use of adapters | 1200 | For reasons of simplicity, and to support the use of adapters |
1201 | that can do VLAN hardware acceleration offloading, the bonding | 1201 | that can do VLAN hardware acceleration offloading, the bonding |
1202 | interface declares itself as fully hardware offloading capable, it gets | 1202 | interface declares itself as fully hardware offloading capable, it gets |
1203 | the add_vid/kill_vid notifications to gather the necessary | 1203 | the add_vid/kill_vid notifications to gather the necessary |
1204 | information, and it propagates those actions to the slaves. In case | 1204 | information, and it propagates those actions to the slaves. In case |
1205 | of mixed adapter types, hardware accelerated tagged packets that | 1205 | of mixed adapter types, hardware accelerated tagged packets that |
1206 | should go through an adapter that is not offloading capable are | 1206 | should go through an adapter that is not offloading capable are |
1207 | "un-accelerated" by the bonding driver so the VLAN tag sits in the | 1207 | "un-accelerated" by the bonding driver so the VLAN tag sits in the |
1208 | regular location. | 1208 | regular location. |
1209 | 1209 | ||
1210 | VLAN interfaces *must* be added on top of a bonding interface | 1210 | VLAN interfaces *must* be added on top of a bonding interface |
1211 | only after enslaving at least one slave. The bonding interface has a | 1211 | only after enslaving at least one slave. The bonding interface has a |
1212 | hardware address of 00:00:00:00:00:00 until the first slave is added. | 1212 | hardware address of 00:00:00:00:00:00 until the first slave is added. |
1213 | If the VLAN interface is created prior to the first enslavement, it | 1213 | If the VLAN interface is created prior to the first enslavement, it |
1214 | would pick up the all-zeroes hardware address. Once the first slave | 1214 | would pick up the all-zeroes hardware address. Once the first slave |
1215 | is attached to the bond, the bond device itself will pick up the | 1215 | is attached to the bond, the bond device itself will pick up the |
1216 | slave's hardware address, which is then available for the VLAN device. | 1216 | slave's hardware address, which is then available for the VLAN device. |
1217 | 1217 | ||
1218 | Also, be aware that a similar problem can occur if all slaves | 1218 | Also, be aware that a similar problem can occur if all slaves |
1219 | are released from a bond that still has one or more VLAN interfaces on | 1219 | are released from a bond that still has one or more VLAN interfaces on |
1220 | top of it. When a new slave is added, the bonding interface will | 1220 | top of it. When a new slave is added, the bonding interface will |
1221 | obtain its hardware address from the first slave, which might not | 1221 | obtain its hardware address from the first slave, which might not |
1222 | match the hardware address of the VLAN interfaces (which was | 1222 | match the hardware address of the VLAN interfaces (which was |
1223 | ultimately copied from an earlier slave). | 1223 | ultimately copied from an earlier slave). |
1224 | 1224 | ||
1225 | There are two methods to ensure that the VLAN device operates | 1225 | There are two methods to ensure that the VLAN device operates |
1226 | with the correct hardware address if all slaves are removed from a | 1226 | with the correct hardware address if all slaves are removed from a |
1227 | bond interface: | 1227 | bond interface: |
1228 | 1228 | ||
1229 | 1. Remove all VLAN interfaces then recreate them | 1229 | 1. Remove all VLAN interfaces then recreate them |
1230 | 1230 | ||
1231 | 2. Set the bonding interface's hardware address so that it | 1231 | 2. Set the bonding interface's hardware address so that it |
1232 | matches the hardware address of the VLAN interfaces. | 1232 | matches the hardware address of the VLAN interfaces. |
1233 | 1233 | ||
1234 | Note that changing a VLAN interface's HW address would set the | 1234 | Note that changing a VLAN interface's HW address would set the |
1235 | underlying device -- i.e. the bonding interface -- to promiscuous | 1235 | underlying device -- i.e. the bonding interface -- to promiscuous |
1236 | mode, which might not be what you want. | 1236 | mode, which might not be what you want. |
1237 | 1237 | ||
1238 | 1238 | ||
1239 | 7. Link Monitoring | 1239 | 7. Link Monitoring |
1240 | ================== | 1240 | ================== |
1241 | 1241 | ||
1242 | The bonding driver at present supports two schemes for | 1242 | The bonding driver at present supports two schemes for |
1243 | monitoring a slave device's link state: the ARP monitor and the MII | 1243 | monitoring a slave device's link state: the ARP monitor and the MII |
1244 | monitor. | 1244 | monitor. |
1245 | 1245 | ||
1246 | At the present time, due to implementation restrictions in the | 1246 | At the present time, due to implementation restrictions in the |
1247 | bonding driver itself, it is not possible to enable both ARP and MII | 1247 | bonding driver itself, it is not possible to enable both ARP and MII |
1248 | monitoring simultaneously. | 1248 | monitoring simultaneously. |
1249 | 1249 | ||
1250 | 7.1 ARP Monitor Operation | 1250 | 7.1 ARP Monitor Operation |
1251 | ------------------------- | 1251 | ------------------------- |
1252 | 1252 | ||
1253 | The ARP monitor operates as its name suggests: it sends ARP | 1253 | The ARP monitor operates as its name suggests: it sends ARP |
1254 | queries to one or more designated peer systems on the network, and | 1254 | queries to one or more designated peer systems on the network, and |
1255 | uses the response as an indication that the link is operating. This | 1255 | uses the response as an indication that the link is operating. This |
1256 | gives some assurance that traffic is actually flowing to and from one | 1256 | gives some assurance that traffic is actually flowing to and from one |
1257 | or more peers on the local network. | 1257 | or more peers on the local network. |
1258 | 1258 | ||
1259 | The ARP monitor relies on the device driver itself to verify | 1259 | The ARP monitor relies on the device driver itself to verify |
1260 | that traffic is flowing. In particular, the driver must keep up to | 1260 | that traffic is flowing. In particular, the driver must keep up to |
1261 | date the last receive time, dev->last_rx, and transmit start time, | 1261 | date the last receive time, dev->last_rx, and transmit start time, |
1262 | dev->trans_start. If these are not updated by the driver, then the | 1262 | dev->trans_start. If these are not updated by the driver, then the |
1263 | ARP monitor will immediately fail any slaves using that driver, and | 1263 | ARP monitor will immediately fail any slaves using that driver, and |
1264 | those slaves will stay down. If network monitoring (tcpdump, etc.) | 1264 | those slaves will stay down. If network monitoring (tcpdump, etc.) |
1265 | shows the ARP requests and replies on the network, then it may be that | 1265 | shows the ARP requests and replies on the network, then it may be that |
1266 | your device driver is not updating last_rx and trans_start. | 1266 | your device driver is not updating last_rx and trans_start. |
1267 | 1267 | ||
1268 | 7.2 Configuring Multiple ARP Targets | 1268 | 7.2 Configuring Multiple ARP Targets |
1269 | ------------------------------------ | 1269 | ------------------------------------ |
1270 | 1270 | ||
1271 | While ARP monitoring can be done with just one target, it can | 1271 | While ARP monitoring can be done with just one target, it can |
1272 | be useful in a High Availability setup to have several targets to | 1272 | be useful in a High Availability setup to have several targets to |
1273 | monitor. In the case of just one target, the target itself may go | 1273 | monitor. In the case of just one target, the target itself may go |
1274 | down or have a problem making it unresponsive to ARP requests. Having | 1274 | down or have a problem making it unresponsive to ARP requests. Having |
1275 | an additional target (or several) increases the reliability of the ARP | 1275 | an additional target (or several) increases the reliability of the ARP |
1276 | monitoring. | 1276 | monitoring. |
1277 | 1277 | ||
1278 | Multiple ARP targets must be separated by commas as follows: | 1278 | Multiple ARP targets must be separated by commas as follows: |
1279 | 1279 | ||
1280 | # example options for ARP monitoring with three targets | 1280 | # example options for ARP monitoring with three targets |
1281 | alias bond0 bonding | 1281 | alias bond0 bonding |
1282 | options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9 | 1282 | options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9 |
1283 | 1283 | ||
1284 | For just a single target the options would resemble: | 1284 | For just a single target the options would resemble: |
1285 | 1285 | ||
1286 | # example options for ARP monitoring with one target | 1286 | # example options for ARP monitoring with one target |
1287 | alias bond0 bonding | 1287 | alias bond0 bonding |
1288 | options bond0 arp_interval=60 arp_ip_target=192.168.0.100 | 1288 | options bond0 arp_interval=60 arp_ip_target=192.168.0.100 |
1289 | 1289 | ||
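The comma-separated form above is used for module options, whereas the sysfs arp_ip_target file takes one "+address" write per target. A helper can bridge the two. This is a hedged sketch: the function name arp_target_tokens is hypothetical, and the limit of 10 is taken from the note in the sysfs section of this file.

```shell
# arp_target_tokens LIST: split a comma-separated target list and print
# one "+address" token per line, suitable for writing to arp_ip_target.
# Fails if more than 10 targets are given.
arp_target_tokens() {
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas into $1, $2, ...
    IFS=$old_ifs
    if [ "$#" -gt 10 ]; then
        echo "too many ARP targets: $#" >&2
        return 1
    fi
    for ip in "$@"; do
        printf '+%s\n' "$ip"
    done
}

# Possible use (as root, with bond0 already created):
#   arp_target_tokens 192.168.0.1,192.168.0.3,192.168.0.9 |
#   while read -r tok; do
#       echo "$tok" > /sys/class/net/bond0/bonding/arp_ip_target
#   done
```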
1290 | 1290 | ||
1291 | 7.3 MII Monitor Operation | 1291 | 7.3 MII Monitor Operation |
1292 | ------------------------- | 1292 | ------------------------- |
1293 | 1293 | ||
1294 | The MII monitor monitors only the carrier state of the local | 1294 | The MII monitor monitors only the carrier state of the local |
1295 | network interface. It accomplishes this in one of three ways: by | 1295 | network interface. It accomplishes this in one of three ways: by |
1296 | depending upon the device driver to maintain its carrier state, by | 1296 | depending upon the device driver to maintain its carrier state, by |
1297 | querying the device's MII registers, or by making an ethtool query to | 1297 | querying the device's MII registers, or by making an ethtool query to |
1298 | the device. | 1298 | the device. |
1299 | 1299 | ||
1300 | If the use_carrier module parameter is 1 (the default value), | 1300 | If the use_carrier module parameter is 1 (the default value), |
1301 | then the MII monitor will rely on the driver for carrier state | 1301 | then the MII monitor will rely on the driver for carrier state |
1302 | information (via the netif_carrier subsystem). As explained in the | 1302 | information (via the netif_carrier subsystem). As explained in the |
1303 | use_carrier parameter information, above, if the MII monitor fails to | 1303 | use_carrier parameter information, above, if the MII monitor fails to |
1304 | detect carrier loss on the device (e.g., when the cable is physically | 1304 | detect carrier loss on the device (e.g., when the cable is physically |
1305 | disconnected), it may be that the driver does not support | 1305 | disconnected), it may be that the driver does not support |
1306 | netif_carrier. | 1306 | netif_carrier. |
1307 | 1307 | ||
1308 | If use_carrier is 0, then the MII monitor will first query the | 1308 | If use_carrier is 0, then the MII monitor will first query the |
1309 | device's (via ioctl) MII registers and check the link state. If that | 1309 | device's (via ioctl) MII registers and check the link state. If that |
1310 | request fails (not just that it returns carrier down), then the MII | 1310 | request fails (not just that it returns carrier down), then the MII |
1311 | monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain | 1311 | monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain |
1312 | the same information. If both methods fail (i.e., the driver either | 1312 | the same information. If both methods fail (i.e., the driver either |
1313 | does not support or had some error in processing both the MII register | 1313 | does not support or had some error in processing both the MII register |
1314 | and ethtool requests), then the MII monitor will assume the link is | 1314 | and ethtool requests), then the MII monitor will assume the link is |
1315 | up. | 1315 | up. |
1316 | 1316 | ||
1317 | 8. Potential Sources of Trouble | 1317 | 8. Potential Sources of Trouble |
1318 | =============================== | 1318 | =============================== |
1319 | 1319 | ||
1320 | 8.1 Adventures in Routing | 1320 | 8.1 Adventures in Routing |
1321 | ------------------------- | 1321 | ------------------------- |
1322 | 1322 | ||
1323 | When bonding is configured, it is important that the slave | 1323 | When bonding is configured, it is important that the slave |
1324 | devices not have routes that supersede routes of the master (or, | 1324 | devices not have routes that supersede routes of the master (or, |
1325 | generally, not have routes at all). For example, suppose the bonding | 1325 | generally, not have routes at all). For example, suppose the bonding |
1326 | device bond0 has two slaves, eth0 and eth1, and the routing table is | 1326 | device bond0 has two slaves, eth0 and eth1, and the routing table is |
1327 | as follows: | 1327 | as follows: |
1328 | 1328 | ||
1329 | Kernel IP routing table | 1329 | Kernel IP routing table |
1330 | Destination Gateway Genmask Flags MSS Window irtt Iface | 1330 | Destination Gateway Genmask Flags MSS Window irtt Iface |
1331 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0 | 1331 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0 |
1332 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1 | 1332 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1 |
1333 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0 | 1333 | 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0 |
1334 | 127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo | 1334 | 127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo |
1335 | 1335 | ||
1336 | This routing configuration will likely still update the | 1336 | This routing configuration will likely still update the |
1337 | receive/transmit times in the driver (needed by the ARP monitor), but | 1337 | receive/transmit times in the driver (needed by the ARP monitor), but |
1338 | may bypass the bonding driver (because outgoing traffic to, in this | 1338 | may bypass the bonding driver (because outgoing traffic to, in this |
1339 | case, another host on network 10 would use eth0 or eth1 before bond0). | 1339 | case, another host on network 10 would use eth0 or eth1 before bond0). |
1340 | 1340 | ||
1341 | The ARP monitor (and ARP itself) may become confused by this | 1341 | The ARP monitor (and ARP itself) may become confused by this |
1342 | configuration, because ARP requests (generated by the ARP monitor) | 1342 | configuration, because ARP requests (generated by the ARP monitor) |
1343 | will be sent on one interface (bond0), but the corresponding reply | 1343 | will be sent on one interface (bond0), but the corresponding reply |
1344 | will arrive on a different interface (eth0). This reply looks to ARP | 1344 | will arrive on a different interface (eth0). This reply looks to ARP |
1345 | as an unsolicited ARP reply (because ARP matches replies on an | 1345 | as an unsolicited ARP reply (because ARP matches replies on an |
1346 | interface basis), and is discarded. The MII monitor is not affected | 1346 | interface basis), and is discarded. The MII monitor is not affected |
1347 | by the state of the routing table. | 1347 | by the state of the routing table. |
1348 | 1348 | ||
1349 | The solution here is simply to ensure that slaves do not have | 1349 | The solution here is simply to ensure that slaves do not have |
1350 | routes of their own, and if for some reason they must, those routes do | 1350 | routes of their own, and if for some reason they must, those routes do |
1351 | not supersede routes of their master. This should generally be the | 1351 | not supersede routes of their master. This should generally be the |
1352 | case, but unusual configurations or errant manual or automatic static | 1352 | case, but unusual configurations or errant manual or automatic static |
1353 | route additions may cause trouble. | 1353 | route additions may cause trouble. |
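As a rough aid for spotting this condition, the sketch below scans routing table text in the column layout shown above and lists every route attached to a slave device. The function name and the eight-column assumption are illustrative only; real route output may differ between tools and versions.

```python
def slave_routes(route_lines, slaves):
    """Return (destination, genmask, iface) for every route on a slave."""
    hits = []
    for line in route_lines:
        fields = line.split()
        # Data rows are assumed to have 8 columns; skip the title line,
        # the column header, and anything malformed.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, _gateway, mask, *_rest, iface = fields
        if iface in slaves:
            hits.append((dest, mask, iface))
    return hits


table = """\
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0
127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo
""".splitlines()

offending = slave_routes(table, {"eth0", "eth1"})
```

For the example table, offending contains the two 10.0.0.0 routes on eth0 and eth1; an empty list would indicate that no slave has a route of its own.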
1354 | 1354 | ||
1355 | 8.2 Ethernet Device Renaming | 1355 | 8.2 Ethernet Device Renaming |
1356 | ---------------------------- | 1356 | ---------------------------- |
1357 | 1357 | ||
1358 | On systems with network configuration scripts that do not | 1358 | On systems with network configuration scripts that do not |
1359 | associate physical devices directly with network interface names (so | 1359 | associate physical devices directly with network interface names (so |
1360 | that the same physical device always has the same "ethX" name), it may | 1360 | that the same physical device always has the same "ethX" name), it may |
1361 | be necessary to add some special logic to either /etc/modules.conf or | 1361 | be necessary to add some special logic to either /etc/modules.conf or |
1362 | /etc/modprobe.conf (depending upon which is installed on the system). | 1362 | /etc/modprobe.conf (depending upon which is installed on the system). |
1363 | 1363 | ||
1364 | For example, given a modules.conf containing the following: | 1364 | For example, given a modules.conf containing the following: |
1365 | 1365 | ||
1366 | alias bond0 bonding | 1366 | alias bond0 bonding |
1367 | options bond0 mode=some-mode miimon=50 | 1367 | options bond0 mode=some-mode miimon=50 |
1368 | alias eth0 tg3 | 1368 | alias eth0 tg3 |
1369 | alias eth1 tg3 | 1369 | alias eth1 tg3 |
1370 | alias eth2 e1000 | 1370 | alias eth2 e1000 |
1371 | alias eth3 e1000 | 1371 | alias eth3 e1000 |
1372 | 1372 | ||
1373 | If neither eth0 nor eth1 is a slave to bond0, then when the | 1373 | If neither eth0 nor eth1 is a slave to bond0, then when the |
1374 | bond0 interface comes up, the devices may end up reordered. This | 1374 | bond0 interface comes up, the devices may end up reordered. This |
1375 | happens because bonding is loaded first, then its slave devices' | 1375 | happens because bonding is loaded first, then its slave devices' |
1376 | drivers are loaded next. Since no other drivers have been loaded, | 1376 | drivers are loaded next. Since no other drivers have been loaded, |
1377 | when the e1000 driver loads, it will receive eth0 and eth1 for its | 1377 | when the e1000 driver loads, it will receive eth0 and eth1 for its |
1378 | devices, but the bonding configuration tries to enslave eth2 and eth3 | 1378 | devices, but the bonding configuration tries to enslave eth2 and eth3 |
1379 | (which may later be assigned to the tg3 devices). | 1379 | (which may later be assigned to the tg3 devices). |
1380 | 1380 | ||
1381 | Adding the following: | 1381 | Adding the following: |
1382 | 1382 | ||
1383 | add above bonding e1000 tg3 | 1383 | add above bonding e1000 tg3 |
1384 | 1384 | ||
1385 | causes modprobe to load e1000 then tg3, in that order, when | 1385 | causes modprobe to load e1000 then tg3, in that order, when |
1386 | bonding is loaded. This command is fully documented in the | 1386 | bonding is loaded. This command is fully documented in the |
1387 | modules.conf manual page. | 1387 | modules.conf manual page. |
1388 | 1388 | ||
1389 | On systems utilizing modprobe.conf (or modprobe.conf.local), | 1389 | On systems utilizing modprobe.conf (or modprobe.conf.local), |
1390 | an equivalent problem can occur. In this case, add the following to | 1390 | an equivalent problem can occur. In this case, add the following to |
1391 | modprobe.conf (or modprobe.conf.local, as appropriate), all on one | 1391 | modprobe.conf (or modprobe.conf.local, as appropriate), all on one |
1392 | line (it has been split here for clarity): | 1392 | line (it has been split here for clarity): |
1393 | 1393 | ||
1394 | install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; | 1394 | install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; |
1395 | /sbin/modprobe --ignore-install bonding | 1395 | /sbin/modprobe --ignore-install bonding |
1396 | 1396 | ||
1397 | When the bonding module is loaded, modprobe will execute the | 1397 | When the bonding module is loaded, modprobe will execute the |
1398 | provided command instead of performing its normal action. | 1398 | provided command instead of performing its normal action. |
1399 | This command loads the device drivers in the order needed, then calls | 1399 | This command loads the device drivers in the order needed, then calls |
1400 | modprobe with --ignore-install to cause the normal action to then take | 1400 | modprobe with --ignore-install to cause the normal action to then take |
1401 | place. Full documentation on this can be found in the modprobe.conf | 1401 | place. Full documentation on this can be found in the modprobe.conf |
1402 | and modprobe manual pages. | 1402 | and modprobe manual pages. |
1403 | 1403 | ||
1404 | 8.3 Painfully Slow Or No Failed Link Detection By Miimon | 1404 | 8.3 Painfully Slow Or No Failed Link Detection By Miimon |
1405 | -------------------------------------------------------- | 1405 | -------------------------------------------------------- |
1406 | 1406 | ||
1407 | By default, bonding enables the use_carrier option, which | 1407 | By default, bonding enables the use_carrier option, which |
1408 | instructs bonding to trust the driver to maintain carrier state. | 1408 | instructs bonding to trust the driver to maintain carrier state. |
1409 | 1409 | ||
1410 | As discussed in the options section, above, some drivers do | 1410 | As discussed in the options section, above, some drivers do |
1411 | not support the netif_carrier_on/_off link state tracking system. | 1411 | not support the netif_carrier_on/_off link state tracking system. |
1412 | With use_carrier enabled, bonding will always see these links as up, | 1412 | With use_carrier enabled, bonding will always see these links as up, |
1413 | regardless of their actual state. | 1413 | regardless of their actual state. |
1414 | 1414 | ||
1415 | Additionally, other drivers do support netif_carrier, but do | 1415 | Additionally, other drivers do support netif_carrier, but do |
1416 | not maintain it in real time, e.g., only polling the link state at | 1416 | not maintain it in real time, e.g., only polling the link state at |
1417 | some fixed interval. In this case, miimon will detect failures, but | 1417 | some fixed interval. In this case, miimon will detect failures, but |
1418 | only after some long period of time has expired. If it appears that | 1418 | only after some long period of time has expired. If it appears that |
1419 | miimon is very slow in detecting link failures, try specifying | 1419 | miimon is very slow in detecting link failures, try specifying |
1420 | use_carrier=0 to see if that improves the failure detection time. If | 1420 | use_carrier=0 to see if that improves the failure detection time. If |
1421 | it does, then it may be that the driver checks the carrier state at a | 1421 | it does, then it may be that the driver checks the carrier state at a |
1422 | fixed interval, but does not cache the MII register values (so the | 1422 | fixed interval, but does not cache the MII register values (so the |
1423 | use_carrier=0 method of querying the registers directly works). If | 1423 | use_carrier=0 method of querying the registers directly works). If |
1424 | use_carrier=0 does not improve the failover, then the driver may cache | 1424 | use_carrier=0 does not improve the failover, then the driver may cache |
1425 | the registers, or the problem may be elsewhere. | 1425 | the registers, or the problem may be elsewhere. |
1426 | 1426 | ||
1427 | Also, remember that miimon only checks for the device's | 1427 | Also, remember that miimon only checks for the device's |
1428 | carrier state. It has no way to determine the state of devices on or | 1428 | carrier state. It has no way to determine the state of devices on or |
1429 | beyond other ports of a switch, or if a switch is refusing to pass | 1429 | beyond other ports of a switch, or if a switch is refusing to pass |
1430 | traffic while still maintaining carrier on. | 1430 | traffic while still maintaining carrier on. |
1431 | 1431 | ||
1432 | 9. SNMP agents | 1432 | 9. SNMP agents |
1433 | =============== | 1433 | =============== |
1434 | 1434 | ||
1435 | If running SNMP agents, the bonding driver should be loaded | 1435 | If running SNMP agents, the bonding driver should be loaded |
1436 | before any network drivers participating in a bond. This requirement | 1436 | before any network drivers participating in a bond. This requirement |
1437 | is due to the interface index (ipAdEntIfIndex) being associated with | 1437 | is due to the interface index (ipAdEntIfIndex) being associated with |
1438 | the first interface found with a given IP address. That is, there is | 1438 | the first interface found with a given IP address. That is, there is |
1439 | only one ipAdEntIfIndex for each IP address. For example, if eth0 and | 1439 | only one ipAdEntIfIndex for each IP address. For example, if eth0 and |
1440 | eth1 are slaves of bond0 and the driver for eth0 is loaded before the | 1440 | eth1 are slaves of bond0 and the driver for eth0 is loaded before the |
1441 | bonding driver, the interface for the IP address will be associated | 1441 | bonding driver, the interface for the IP address will be associated |
1442 | with the eth0 interface. This configuration is shown below: the IP | 1442 | with the eth0 interface. This configuration is shown below: the IP |
1443 | address 192.168.1.1 has an interface index of 2 which indexes to eth0 | 1443 | address 192.168.1.1 has an interface index of 2 which indexes to eth0 |
1444 | in the ifDescr table (ifDescr.2). | 1444 | in the ifDescr table (ifDescr.2). |
1445 | 1445 | ||
1446 | interfaces.ifTable.ifEntry.ifDescr.1 = lo | 1446 | interfaces.ifTable.ifEntry.ifDescr.1 = lo |
1447 | interfaces.ifTable.ifEntry.ifDescr.2 = eth0 | 1447 | interfaces.ifTable.ifEntry.ifDescr.2 = eth0 |
1448 | interfaces.ifTable.ifEntry.ifDescr.3 = eth1 | 1448 | interfaces.ifTable.ifEntry.ifDescr.3 = eth1 |
1449 | interfaces.ifTable.ifEntry.ifDescr.4 = eth2 | 1449 | interfaces.ifTable.ifEntry.ifDescr.4 = eth2 |
1450 | interfaces.ifTable.ifEntry.ifDescr.5 = eth3 | 1450 | interfaces.ifTable.ifEntry.ifDescr.5 = eth3 |
1451 | interfaces.ifTable.ifEntry.ifDescr.6 = bond0 | 1451 | interfaces.ifTable.ifEntry.ifDescr.6 = bond0 |
1452 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5 | 1452 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5 |
1453 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 | 1453 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 |
1454 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4 | 1454 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4 |
1455 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 | 1455 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 |
1456 | 1456 | ||
1457 | This problem is avoided by loading the bonding driver before | 1457 | This problem is avoided by loading the bonding driver before |
1458 | any network drivers participating in a bond. Below is an example of | 1458 | any network drivers participating in a bond. Below is an example of |
1459 | loading the bonding driver first; the IP address 192.168.1.1 is | 1459 | loading the bonding driver first; the IP address 192.168.1.1 is |
1460 | correctly associated with ifDescr.2. | 1460 | correctly associated with ifDescr.2. |
1461 | 1461 | ||
1462 | interfaces.ifTable.ifEntry.ifDescr.1 = lo | 1462 | interfaces.ifTable.ifEntry.ifDescr.1 = lo |
1463 | interfaces.ifTable.ifEntry.ifDescr.2 = bond0 | 1463 | interfaces.ifTable.ifEntry.ifDescr.2 = bond0 |
1464 | interfaces.ifTable.ifEntry.ifDescr.3 = eth0 | 1464 | interfaces.ifTable.ifEntry.ifDescr.3 = eth0 |
1465 | interfaces.ifTable.ifEntry.ifDescr.4 = eth1 | 1465 | interfaces.ifTable.ifEntry.ifDescr.4 = eth1 |
1466 | interfaces.ifTable.ifEntry.ifDescr.5 = eth2 | 1466 | interfaces.ifTable.ifEntry.ifDescr.5 = eth2 |
1467 | interfaces.ifTable.ifEntry.ifDescr.6 = eth3 | 1467 | interfaces.ifTable.ifEntry.ifDescr.6 = eth3 |
1468 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6 | 1468 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6 |
1469 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 | 1469 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 |
1470 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5 | 1470 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5 |
1471 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 | 1471 | ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 |
1472 | 1472 | ||
1473 | While some distributions may not report the interface name in | 1473 | While some distributions may not report the interface name in |
1474 | ifDescr, the association between the IP address and IfIndex remains | 1474 | ifDescr, the association between the IP address and IfIndex remains |
1475 | and SNMP functions such as Interface_Scan_Next will report that | 1475 | and SNMP functions such as Interface_Scan_Next will report that |
1476 | association. | 1476 | association. |
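The "first interface found" rule can be modeled in a few lines. This is a simplified sketch, not the SNMP agent's code: it assumes interface indexes are handed out in registration order starting at 1, and that the slaves are seen as carrying the same IP address as the bond, which is what the listings above imply.

```python
def ip_to_ifindex(interfaces):
    """Map each IP address to the index of the first interface carrying it.

    interfaces is a list of (name, [ip, ...]) pairs in registration order.
    """
    mapping = {}
    for index, (_name, addrs) in enumerate(interfaces, start=1):
        for ip in addrs:
            # setdefault keeps the first index seen for an address.
            mapping.setdefault(ip, index)
    return mapping


# Network drivers loaded before bonding: 192.168.1.1 lands on eth0 (index 2).
drivers_first = ip_to_ifindex([
    ("lo", ["127.0.0.1"]),
    ("eth0", ["192.168.1.1"]),
    ("eth1", ["192.168.1.1"]),
    ("eth2", ["10.74.20.94"]),
    ("eth3", ["10.10.10.10"]),
    ("bond0", ["192.168.1.1"]),
])

# Bonding loaded first: 192.168.1.1 lands on bond0 (index 2).
bonding_first = ip_to_ifindex([
    ("lo", ["127.0.0.1"]),
    ("bond0", ["192.168.1.1"]),
    ("eth0", ["192.168.1.1"]),
    ("eth1", ["192.168.1.1"]),
    ("eth2", ["10.74.20.94"]),
    ("eth3", ["10.10.10.10"]),
])
```

In both orderings the address maps to index 2; what changes is which interface holds that index, which is why the load order matters.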
1477 | 1477 | ||
1478 | 10. Promiscuous mode | 1478 | 10. Promiscuous mode |
1479 | ==================== | 1479 | ==================== |
1480 | 1480 | ||
1481 | When running network monitoring tools, e.g., tcpdump, it is | 1481 | When running network monitoring tools, e.g., tcpdump, it is |
1482 | common to enable promiscuous mode on the device, so that all traffic | 1482 | common to enable promiscuous mode on the device, so that all traffic |
1483 | is seen (instead of seeing only traffic destined for the local host). | 1483 | is seen (instead of seeing only traffic destined for the local host). |
1484 | The bonding driver handles promiscuous mode changes to the bonding | 1484 | The bonding driver handles promiscuous mode changes to the bonding |
1485 | master device (e.g., bond0), and propagates the setting to the slave | 1485 | master device (e.g., bond0), and propagates the setting to the slave |
1486 | devices. | 1486 | devices. |
1487 | 1487 | ||
1488 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, | 1488 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, |
1489 | the promiscuous mode setting is propagated to all slaves. | 1489 | the promiscuous mode setting is propagated to all slaves. |
1490 | 1490 | ||
1491 | For the active-backup, balance-tlb and balance-alb modes, the | 1491 | For the active-backup, balance-tlb and balance-alb modes, the |
1492 | promiscuous mode setting is propagated only to the active slave. | 1492 | promiscuous mode setting is propagated only to the active slave. |
1493 | 1493 | ||
1494 | For balance-tlb mode, the active slave is the slave currently | 1494 | For balance-tlb mode, the active slave is the slave currently |
1495 | receiving inbound traffic. | 1495 | receiving inbound traffic. |
1496 | 1496 | ||
1497 | For balance-alb mode, the active slave is the slave used as a | 1497 | For balance-alb mode, the active slave is the slave used as a |
1498 | "primary." This slave is used for mode-specific control traffic, for | 1498 | "primary." This slave is used for mode-specific control traffic, for |
1499 | sending to peers that are unassigned, or when the load is unbalanced. | 1499 | sending to peers that are unassigned, or when the load is unbalanced. |
1500 | 1500 | ||
1501 | For the active-backup, balance-tlb and balance-alb modes, when | 1501 | For the active-backup, balance-tlb and balance-alb modes, when |
1502 | the active slave changes (e.g., due to a link failure), the | 1502 | the active slave changes (e.g., due to a link failure), the |
1503 | promiscuous setting will be propagated to the new active slave. | 1503 | promiscuous setting will be propagated to the new active slave. |
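The propagation rules in this section reduce to a small lookup. The sketch below is illustrative only; the function and constant names are made up for this example, and the mode strings are the ones used in the text above.

```python
# Modes that propagate the promiscuous setting to every slave.
ALL_SLAVES_MODES = {"balance-rr", "balance-xor", "broadcast", "802.3ad"}
# Modes that propagate it only to the currently active slave.
ACTIVE_ONLY_MODES = {"active-backup", "balance-tlb", "balance-alb"}


def promisc_targets(mode, slaves, active_slave=None):
    """Return the slaves that receive the promiscuous mode setting."""
    if mode in ALL_SLAVES_MODES:
        return list(slaves)
    if mode in ACTIVE_ONLY_MODES:
        # No target if there is no active slave or it is not in the bond.
        return [active_slave] if active_slave in slaves else []
    raise ValueError(f"unknown bonding mode: {mode}")
```

After a failover in one of the active-only modes, calling the function again with the new active slave gives the device to which the setting moves.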
1504 | 1504 | ||
1505 | 11. Configuring Bonding for High Availability | 1505 | 11. Configuring Bonding for High Availability |
1506 | ============================================= | 1506 | ============================================= |
1507 | 1507 | ||
1508 | High Availability refers to configurations that provide | 1508 | High Availability refers to configurations that provide |
1509 | maximum network availability by having redundant or backup devices, | 1509 | maximum network availability by having redundant or backup devices, |
1510 | links or switches between the host and the rest of the world. The | 1510 | links or switches between the host and the rest of the world. The |
1511 | goal is to provide the maximum availability of network connectivity | 1511 | goal is to provide the maximum availability of network connectivity |
1512 | (i.e., the network always works), even though other configurations | 1512 | (i.e., the network always works), even though other configurations |
1513 | could provide higher throughput. | 1513 | could provide higher throughput. |
1514 | 1514 | ||
1515 | 11.1 High Availability in a Single Switch Topology | 1515 | 11.1 High Availability in a Single Switch Topology |
1516 | -------------------------------------------------- | 1516 | -------------------------------------------------- |
1517 | 1517 | ||
1518 | If two hosts (or a host and a single switch) are directly | 1518 | If two hosts (or a host and a single switch) are directly |
1519 | connected via multiple physical links, then there is no availability | 1519 | connected via multiple physical links, then there is no availability |
1520 | penalty to optimizing for maximum bandwidth. In this case, there is | 1520 | penalty to optimizing for maximum bandwidth. In this case, there is |
1521 | only one switch (or peer), so if it fails, there is no alternative | 1521 | only one switch (or peer), so if it fails, there is no alternative |
1522 | access to fail over to. Additionally, the bonding load balance modes | 1522 | access to fail over to. Additionally, the bonding load balance modes |
1523 | support link monitoring of their members, so if individual links fail, | 1523 | support link monitoring of their members, so if individual links fail, |
1524 | the load will be rebalanced across the remaining devices. | 1524 | the load will be rebalanced across the remaining devices. |
1525 | 1525 | ||
1526 | See Section 12, "Configuring Bonding for Maximum Throughput" | 1526 | See Section 12, "Configuring Bonding for Maximum Throughput" |
1527 | for information on configuring bonding with one peer device. | 1527 | for information on configuring bonding with one peer device. |
1528 | 1528 | ||
1529 | 11.2 High Availability in a Multiple Switch Topology | 1529 | 11.2 High Availability in a Multiple Switch Topology |
1530 | ---------------------------------------------------- | 1530 | ---------------------------------------------------- |
1531 | 1531 | ||
1532 | With multiple switches, the configuration of bonding and the | 1532 | With multiple switches, the configuration of bonding and the |
1533 | network changes dramatically. In multiple switch topologies, there is | 1533 | network changes dramatically. In multiple switch topologies, there is |
1534 | a trade-off between network availability and usable bandwidth. | 1534 | a trade-off between network availability and usable bandwidth. |
1535 | 1535 | ||
1536 | Below is a sample network, configured to maximize the | 1536 | Below is a sample network, configured to maximize the |
1537 | availability of the network: | 1537 | availability of the network: |
1538 | 1538 | ||
1539 | | | | 1539 | | | |
1540 | |port3 port3| | 1540 | |port3 port3| |
1541 | +-----+----+ +-----+----+ | 1541 | +-----+----+ +-----+----+ |
1542 | | |port2 ISL port2| | | 1542 | | |port2 ISL port2| | |
1543 | | switch A +--------------------------+ switch B | | 1543 | | switch A +--------------------------+ switch B | |
1544 | | | | | | 1544 | | | | | |
1545 | +-----+----+ +-----+----+ | 1545 | +-----+----+ +-----+----+ |
1546 | |port1 port1| | 1546 | |port1 port1| |
1547 | | +-------+ | | 1547 | | +-------+ | |
1548 | +-------------+ host1 +---------------+ | 1548 | +-------------+ host1 +---------------+ |
1549 | eth0 +-------+ eth1 | 1549 | eth0 +-------+ eth1 |
1550 | 1550 | ||
1551 | In this configuration, there is a link between the two | 1551 | In this configuration, there is a link between the two |
1552 | switches (ISL, or inter switch link), and multiple ports connecting to | 1552 | switches (ISL, or inter switch link), and multiple ports connecting to |
1553 | the outside world ("port3" on each switch). There is no technical | 1553 | the outside world ("port3" on each switch). There is no technical |
1554 | reason that this could not be extended to a third switch. | 1554 | reason that this could not be extended to a third switch. |
1555 | 1555 | ||
1556 | 11.2.1 HA Bonding Mode Selection for Multiple Switch Topology | 1556 | 11.2.1 HA Bonding Mode Selection for Multiple Switch Topology |
1557 | ------------------------------------------------------------- | 1557 | ------------------------------------------------------------- |
1558 | 1558 | ||
1559 | In a topology such as the example above, the active-backup and | 1559 | In a topology such as the example above, the active-backup and |
1560 | broadcast modes are the only useful bonding modes when optimizing for | 1560 | broadcast modes are the only useful bonding modes when optimizing for |
1561 | availability; the other modes require all links to terminate on the | 1561 | availability; the other modes require all links to terminate on the |
1562 | same peer for them to behave rationally. | 1562 | same peer for them to behave rationally. |
1563 | 1563 | ||
1564 | active-backup: This is generally the preferred mode, particularly if | 1564 | active-backup: This is generally the preferred mode, particularly if |
1565 | the switches have an ISL and play together well. If the | 1565 | the switches have an ISL and play together well. If the |
1566 | network configuration is such that one switch is specifically | 1566 | network configuration is such that one switch is specifically |
1567 | a backup switch (e.g., has lower capacity, higher cost, etc.), | 1567 | a backup switch (e.g., has lower capacity, higher cost, etc.), |
1568 | then the primary option can be used to ensure that the | 1568 | then the primary option can be used to ensure that the |
1569 | preferred link is always used when it is available. | 1569 | preferred link is always used when it is available. |
1570 | 1570 | ||
1571 | broadcast: This mode is really a special purpose mode, and is suitable | 1571 | broadcast: This mode is really a special purpose mode, and is suitable |
1572 | only for very specific needs. One example is when the two | 1572 | only for very specific needs. One example is when the two |
1573 | switches are not connected (no ISL), and the networks beyond | 1573 | switches are not connected (no ISL), and the networks beyond |
1574 | them are totally independent. In this case, if it is | 1574 | them are totally independent. In this case, if it is |
1575 | necessary for some specific one-way traffic to reach both | 1575 | necessary for some specific one-way traffic to reach both |
1576 | independent networks, then the broadcast mode may be suitable. | 1576 | independent networks, then the broadcast mode may be suitable. |
1577 | 1577 | ||
1578 | 11.2.2 HA Link Monitoring Selection for Multiple Switch Topology | 1578 | 11.2.2 HA Link Monitoring Selection for Multiple Switch Topology |
1579 | ---------------------------------------------------------------- | 1579 | ---------------------------------------------------------------- |
1580 | 1580 | ||
1581 | The choice of link monitoring ultimately depends upon your | 1581 | The choice of link monitoring ultimately depends upon your |
1582 | switch. If the switch can reliably fail ports in response to other | 1582 | switch. If the switch can reliably fail ports in response to other |
1583 | failures, then either the MII or ARP monitor should work. For | 1583 | failures, then either the MII or ARP monitor should work. For |
1584 | instance, in the example above, if the "port3" link fails at the remote | 1584 | instance, in the example above, if the "port3" link fails at the remote |
1585 | end, the MII monitor has no direct means to detect this. The ARP | 1585 | end, the MII monitor has no direct means to detect this. The ARP |
1586 | monitor could be configured with a target at the remote end of port3, | 1586 | monitor could be configured with a target at the remote end of port3, |
1587 | thus detecting that failure without switch support. | 1587 | thus detecting that failure without switch support. |
1588 | 1588 | ||
1589 | In general, however, in a multiple switch topology, the ARP | 1589 | In general, however, in a multiple switch topology, the ARP |
1590 | monitor can provide a higher level of reliability in detecting | 1590 | monitor can provide a higher level of reliability in detecting |
1591 | end-to-end connectivity failures (which may be caused by the failure of any | 1591 | end-to-end connectivity failures (which may be caused by the failure of any |
1592 | individual component to pass traffic for any reason). Additionally, | 1592 | individual component to pass traffic for any reason). Additionally, |
1593 | the ARP monitor should be configured with multiple targets (at least | 1593 | the ARP monitor should be configured with multiple targets (at least |
1594 | one for each switch in the network). This will ensure that, | 1594 | one for each switch in the network). This will ensure that, |
1595 | regardless of which switch is active, the ARP monitor has a suitable | 1595 | regardless of which switch is active, the ARP monitor has a suitable |
1596 | target to query. | 1596 | target to query. |
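The "at least one target per switch" guideline can be checked mechanically before writing a configuration line. The helper below is a hypothetical sketch: the switch names and IP addresses are placeholders, and the arp_interval and arp_ip_target option names are taken from the bonding options section, which is outside this excerpt.

```python
def arp_monitor_options(targets_by_switch, interval_ms=1000, bond="bond0"):
    """Build an options line with at least one ARP target per switch.

    targets_by_switch maps a switch label to a list of target IP addresses.
    """
    uncovered = [sw for sw, ips in targets_by_switch.items() if not ips]
    if uncovered:
        raise ValueError("no ARP target for: " + ", ".join(uncovered))
    # Flatten the per-switch lists, keeping the order they were given in.
    targets = [ip for ips in targets_by_switch.values() for ip in ips]
    return (f"options {bond} mode=active-backup "
            f"arp_interval={interval_ms} arp_ip_target={','.join(targets)}")


line = arp_monitor_options(
    {"switchA": ["10.0.0.1"], "switchB": ["10.0.0.2"]},
    interval_ms=500,
)
```

Refusing to build the line when a switch has no target makes the coverage rule explicit, rather than relying on the administrator to notice the gap later.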
1597 | 1597 | ||
1598 | 1598 | ||
1599 | 12. Configuring Bonding for Maximum Throughput | 1599 | 12. Configuring Bonding for Maximum Throughput |
1600 | ============================================== | 1600 | ============================================== |
1601 | 1601 | ||
1602 | 12.1 Maximizing Throughput in a Single Switch Topology | 1602 | 12.1 Maximizing Throughput in a Single Switch Topology |
1603 | ------------------------------------------------------ | 1603 | ------------------------------------------------------ |
1604 | 1604 | ||
1605 | In a single switch configuration, the best method to maximize | 1605 | In a single switch configuration, the best method to maximize |
1606 | throughput depends upon the application and network environment. The | 1606 | throughput depends upon the application and network environment. The |
1607 | various load balancing modes each have strengths and weaknesses in | 1607 | various load balancing modes each have strengths and weaknesses in |
1608 | different environments, as detailed below. | 1608 | different environments, as detailed below. |
1609 | 1609 | ||
1610 | For this discussion, we will break down the topologies into | 1610 | For this discussion, we will break down the topologies into |
1611 | two categories. Depending upon the destination of most traffic, we | 1611 | two categories. Depending upon the destination of most traffic, we |
1612 | categorize them into either "gatewayed" or "local" configurations. | 1612 | categorize them into either "gatewayed" or "local" configurations. |
1613 | 1613 | ||
1614 | In a gatewayed configuration, the "switch" is acting primarily | 1614 | In a gatewayed configuration, the "switch" is acting primarily |
1615 | as a router, and the majority of traffic passes through this router to | 1615 | as a router, and the majority of traffic passes through this router to |
1616 | other networks. An example would be the following: | 1616 | other networks. An example would be the following: |
1617 | 1617 | ||
1618 | 1618 | ||
1619 | +----------+ +----------+ | 1619 | +----------+ +----------+ |
1620 | | |eth0 port1| | to other networks | 1620 | | |eth0 port1| | to other networks |
1621 | | Host A +---------------------+ router +-------------------> | 1621 | | Host A +---------------------+ router +-------------------> |
1622 | | +---------------------+ | Hosts B and C are out | 1622 | | +---------------------+ | Hosts B and C are out |
1623 | | |eth1 port2| | here somewhere | 1623 | | |eth1 port2| | here somewhere |
1624 | +----------+ +----------+ | 1624 | +----------+ +----------+ |
1625 | 1625 | ||
1626 | The router may be a dedicated router device, or another host | 1626 | The router may be a dedicated router device, or another host |
1627 | acting as a gateway. For our discussion, the important point is that | 1627 | acting as a gateway. For our discussion, the important point is that |
1628 | the majority of traffic from Host A will pass through the router to | 1628 | the majority of traffic from Host A will pass through the router to |
1629 | some other network before reaching its final destination. | 1629 | some other network before reaching its final destination. |
1630 | 1630 | ||
1631 | In a gatewayed network configuration, although Host A may | 1631 | In a gatewayed network configuration, although Host A may |
1632 | communicate with many other systems, all of its traffic will be sent | 1632 | communicate with many other systems, all of its traffic will be sent |
1633 | and received via one other peer on the local network, the router. | 1633 | and received via one other peer on the local network, the router. |
1634 | 1634 | ||
1635 | Note that the case of two systems connected directly via | 1635 | Note that the case of two systems connected directly via |
1636 | multiple physical links is, for purposes of configuring bonding, the | 1636 | multiple physical links is, for purposes of configuring bonding, the |
1637 | same as a gatewayed configuration. In that case, it happens that all | 1637 | same as a gatewayed configuration. In that case, it happens that all |
1638 | traffic is destined for the "gateway" itself, not some other network | 1638 | traffic is destined for the "gateway" itself, not some other network |
1639 | beyond the gateway. | 1639 | beyond the gateway. |
1640 | 1640 | ||
1641 | In a local configuration, the "switch" is acting primarily as | 1641 | In a local configuration, the "switch" is acting primarily as |
1642 | a switch, and the majority of traffic passes through this switch to | 1642 | a switch, and the majority of traffic passes through this switch to |
1643 | reach other stations on the same network. An example would be the | 1643 | reach other stations on the same network. An example would be the |
1644 | following: | 1644 | following: |
1645 | 1645 | ||
1646 | +----------+ +----------+ +--------+ | 1646 | +----------+ +----------+ +--------+ |
1647 | | |eth0 port1| +-------+ Host B | | 1647 | | |eth0 port1| +-------+ Host B | |
1648 | | Host A +------------+ switch |port3 +--------+ | 1648 | | Host A +------------+ switch |port3 +--------+ |
1649 | | +------------+ | +--------+ | 1649 | | +------------+ | +--------+ |
1650 | | |eth1 port2| +------------------+ Host C | | 1650 | | |eth1 port2| +------------------+ Host C | |
1651 | +----------+ +----------+port4 +--------+ | 1651 | +----------+ +----------+port4 +--------+ |
1652 | 1652 | ||
1653 | 1653 | ||
1654 | Again, the switch may be a dedicated switch device, or another | 1654 | Again, the switch may be a dedicated switch device, or another |
1655 | host acting as a gateway. For our discussion, the important point is | 1655 | host acting as a gateway. For our discussion, the important point is |
1656 | that the majority of traffic from Host A is destined for other hosts | 1656 | that the majority of traffic from Host A is destined for other hosts |
1657 | on the same local network (Hosts B and C in the above example). | 1657 | on the same local network (Hosts B and C in the above example). |
1658 | 1658 | ||
1659 | In summary, in a gatewayed configuration, traffic to and from | 1659 | In summary, in a gatewayed configuration, traffic to and from |
1660 | the bonded device will be to the same MAC level peer on the network | 1660 | the bonded device will be to the same MAC level peer on the network |
1661 | (the gateway itself, i.e., the router), regardless of its final | 1661 | (the gateway itself, i.e., the router), regardless of its final |
1662 | destination. In a local configuration, traffic flows directly to and | 1662 | destination. In a local configuration, traffic flows directly to and |
1663 | from the final destinations; thus, each destination (Host B, Host C) | 1663 | from the final destinations; thus, each destination (Host B, Host C) |
1664 | will be addressed directly by its individual MAC address. | 1664 | will be addressed directly by its individual MAC address. |
1665 | 1665 | ||
1666 | This distinction between a gatewayed and a local network | 1666 | This distinction between a gatewayed and a local network |
1667 | configuration is important because many of the load balancing modes | 1667 | configuration is important because many of the load balancing modes |
1668 | available use the MAC addresses of the local network source and | 1668 | available use the MAC addresses of the local network source and |
1669 | destination to make load balancing decisions. The behavior of each | 1669 | destination to make load balancing decisions. The behavior of each |
1670 | mode is described below. | 1670 | mode is described below. |
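The effect of MAC-based balancing on the two configurations can be illustrated with a minimal sketch. It assumes a simple policy that XORs the last octet of the source and destination MAC addresses and takes the result modulo the number of slaves; the MAC addresses and the function name `xor_slave_index` are hypothetical and chosen only for illustration, and the real driver's hash may differ.

```python
def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its six raw octets."""
    return bytes(int(part, 16) for part in mac.split(":"))


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index from the two MAC addresses.

    Assumed policy (illustrative only): XOR the final octet of each
    address, then reduce modulo the number of slaves in the bond.
    """
    src = mac_to_bytes(src_mac)
    dst = mac_to_bytes(dst_mac)
    return (src[-1] ^ dst[-1]) % slave_count


# Hypothetical addresses for the hosts in the diagrams above.
HOST_A = "00:11:22:33:44:0a"
ROUTER = "00:11:22:33:44:01"
HOST_B = "00:11:22:33:44:0b"
HOST_C = "00:11:22:33:44:0c"

# Gatewayed: every frame is addressed to the router, whatever the final
# destination is, so every frame hashes to the same slave.
gatewayed_slaves = {xor_slave_index(HOST_A, ROUTER, 2) for _ in ("B", "C")}

# Local: frames are addressed to each peer directly, so different peers
# can hash to different slaves.
local_slaves = {xor_slave_index(HOST_A, peer, 2) for peer in (HOST_B, HOST_C)}

print("gatewayed uses slaves:", sorted(gatewayed_slaves))
print("local uses slaves:", sorted(local_slaves))
```

With these particular addresses the gatewayed case uses one slave and the local case uses both. Other addresses could collide in the local case as well, since the outcome depends entirely on the address values.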
1671 | 1671 | ||
1672 | 1672 | ||
1673 | 12.1.1 MT Bonding Mode Selection for Single Switch Topology | 1673 | 12.1.1 MT Bonding Mode Selection for Single Switch Topology |
1674 | ----------------------------------------------------------- | 1674 | ----------------------------------------------------------- |
1675 | 1675 | ||
1676 | This configuration is the easiest to set up and to understand, | 1676 | This configuration is the easiest to set up and to understand, |
1677 | although you will have to decide which bonding mode best suits your | 1677 | although you will have to decide which bonding mode best suits your |
1678 | needs. The trade offs for each mode are detailed below: | 1678 | needs. The trade offs for each mode are detailed below: |
1679 | 1679 | ||
1680 | balance-rr: This mode is the only mode that will permit a single | 1680 | balance-rr: This mode is the only mode that will permit a single |
1681 | TCP/IP connection to stripe traffic across multiple | 1681 | TCP/IP connection to stripe traffic across multiple |
1682 | interfaces. It is therefore the only mode that will allow a | 1682 | interfaces. It is therefore the only mode that will allow a |
1683 | single TCP/IP stream to utilize more than one interface's | 1683 | single TCP/IP stream to utilize more than one interface's |
1684 | worth of throughput. This comes at a cost, however: the | 1684 | worth of throughput. This comes at a cost, however: the |
1685 | striping often results in peer systems receiving packets out | 1685 | striping often results in peer systems receiving packets out |
1686 | of order, causing TCP/IP's congestion control system to kick | 1686 | of order, causing TCP/IP's congestion control system to kick |
1687 | in, often by retransmitting segments. | 1687 | in, often by retransmitting segments. |
1688 | 1688 | ||
1689 | It is possible to adjust TCP/IP's congestion limits by | 1689 | It is possible to adjust TCP/IP's congestion limits by |
1690 | altering the net.ipv4.tcp_reordering sysctl parameter. The | 1690 | altering the net.ipv4.tcp_reordering sysctl parameter. The |
1691 | usual default value is 3, and the maximum useful value is 127. | 1691 | usual default value is 3, and the maximum useful value is 127. |
1692 | For a four interface balance-rr bond, expect that a single | 1692 | For a four interface balance-rr bond, expect that a single |
1693 | TCP/IP stream will utilize no more than approximately 2.3 | 1693 | TCP/IP stream will utilize no more than approximately 2.3 |
1694 | interface's worth of throughput, even after adjusting | 1694 | interface's worth of throughput, even after adjusting |
1695 | tcp_reordering. | 1695 | tcp_reordering. |
1696 | 1696 | ||
1697 | Note that this out of order delivery occurs when both the | 1697 | Note that this out of order delivery occurs when both the |
1698 | sending and receiving systems are utilizing a multiple | 1698 | sending and receiving systems are utilizing a multiple |
1699 | interface bond. Consider a configuration in which a | 1699 | interface bond. Consider a configuration in which a |
1700 | balance-rr bond feeds into a single higher capacity network | 1700 | balance-rr bond feeds into a single higher capacity network |
1701 | channel (e.g., multiple 100Mb/sec ethernets feeding a single | 1701 | channel (e.g., multiple 100Mb/sec ethernets feeding a single |
1702 | gigabit ethernet via an etherchannel capable switch). In this | 1702 | gigabit ethernet via an etherchannel capable switch). In this |
1703 | configuration, traffic sent from the multiple 100Mb devices to | 1703 | configuration, traffic sent from the multiple 100Mb devices to |
1704 | a destination connected to the gigabit device will not see | 1704 | a destination connected to the gigabit device will not see |
1705 | packets out of order. However, traffic sent from the gigabit | 1705 | packets out of order. However, traffic sent from the gigabit |
1706 | device to the multiple 100Mb devices may or may not see | 1706 | device to the multiple 100Mb devices may or may not see |
1707 | traffic out of order, depending upon the balance policy of the | 1707 | traffic out of order, depending upon the balance policy of the |
1708 | switch. Many switches do not support any modes that stripe | 1708 | switch. Many switches do not support any modes that stripe |
1709 | traffic (instead choosing a port based upon IP or MAC level | 1709 | traffic (instead choosing a port based upon IP or MAC level |
1710 | addresses); for those devices, traffic flowing from the | 1710 | addresses); for those devices, traffic flowing from the |
1711 | gigabit device to the many 100Mb devices will only utilize one | 1711 | gigabit device to the many 100Mb devices will only utilize one |
1712 | interface. | 1712 | interface. |
1713 | 1713 | ||
1714 | If you are utilizing protocols other than TCP/IP, UDP for | 1714 | If you are utilizing protocols other than TCP/IP, UDP for |
1715 | example, and your application can tolerate out of order | 1715 | example, and your application can tolerate out of order |
1716 | delivery, then this mode can allow for single stream datagram | 1716 | delivery, then this mode can allow for single stream datagram |
1717 | performance that scales near linearly as interfaces are added | 1717 | performance that scales near linearly as interfaces are added |
1718 | to the bond. | 1718 | to the bond. |
1719 | 1719 | ||
1720 | This mode requires the switch to have the appropriate ports | 1720 | This mode requires the switch to have the appropriate ports |
1721 | configured for "etherchannel" or "trunking." | 1721 | configured for "etherchannel" or "trunking." |
1722 | 1722 | ||
1723 | active-backup: There is not much advantage in this network topology to | 1723 | active-backup: There is not much advantage in this network topology to |
1724 | the active-backup mode, as the inactive backup devices are all | 1724 | the active-backup mode, as the inactive backup devices are all |
1725 | connected to the same peer as the primary. In this case, a | 1725 | connected to the same peer as the primary. In this case, a |
1726 | load balancing mode (with link monitoring) will provide the | 1726 | load balancing mode (with link monitoring) will provide the |
1727 | same level of network availability, but with increased | 1727 | same level of network availability, but with increased |
1728 | available bandwidth. On the plus side, active-backup mode | 1728 | available bandwidth. On the plus side, active-backup mode |
1729 | does not require any configuration of the switch, so it may | 1729 | does not require any configuration of the switch, so it may |
1730 | have value if the hardware available does not support any of | 1730 | have value if the hardware available does not support any of |
1731 | the load balance modes. | 1731 | the load balance modes. |
1732 | 1732 | ||
1733 | balance-xor: This mode will limit traffic such that packets destined | 1733 | balance-xor: This mode will limit traffic such that packets destined |
1734 | for specific peers will always be sent over the same | 1734 | for specific peers will always be sent over the same |
1735 | interface. Since the destination is determined by the MAC | 1735 | interface. Since the destination is determined by the MAC |
1736 | addresses involved, this mode works best in a "local" network | 1736 | addresses involved, this mode works best in a "local" network |
1737 | configuration (as described above), with destinations all on | 1737 | configuration (as described above), with destinations all on |
1738 | the same local network. This mode is likely to be suboptimal | 1738 | the same local network. This mode is likely to be suboptimal |
1739 | if all your traffic is passed through a single router (i.e., a | 1739 | if all your traffic is passed through a single router (i.e., a |
1740 | "gatewayed" network configuration, as described above). | 1740 | "gatewayed" network configuration, as described above). |
1741 | 1741 | ||
1742 | As with balance-rr, the switch ports need to be configured for | 1742 | As with balance-rr, the switch ports need to be configured for |
1743 | "etherchannel" or "trunking." | 1743 | "etherchannel" or "trunking." |
1744 | 1744 | ||
1745 | broadcast: Like active-backup, there is not much advantage to this | 1745 | broadcast: Like active-backup, there is not much advantage to this |
1746 | mode in this type of network topology. | 1746 | mode in this type of network topology. |
1747 | 1747 | ||
1748 | 802.3ad: This mode can be a good choice for this type of network | 1748 | 802.3ad: This mode can be a good choice for this type of network |
1749 | topology. The 802.3ad mode is an IEEE standard, so all peers | 1749 | topology. The 802.3ad mode is an IEEE standard, so all peers |
1750 | that implement 802.3ad should interoperate well. The 802.3ad | 1750 | that implement 802.3ad should interoperate well. The 802.3ad |
1751 | protocol includes automatic configuration of the aggregates, | 1751 | protocol includes automatic configuration of the aggregates, |
1752 | so minimal manual configuration of the switch is needed | 1752 | so minimal manual configuration of the switch is needed |
1753 | (typically only to designate that some set of devices is | 1753 | (typically only to designate that some set of devices is |
1754 | available for 802.3ad). The 802.3ad standard also mandates | 1754 | available for 802.3ad). The 802.3ad standard also mandates |
1755 | that frames be delivered in order (within certain limits), so | 1755 | that frames be delivered in order (within certain limits), so |
1756 | in general single connections will not see misordering of | 1756 | in general single connections will not see misordering of |
1757 | packets. The 802.3ad mode does have some drawbacks: the | 1757 | packets. The 802.3ad mode does have some drawbacks: the |
1758 | standard mandates that all devices in the aggregate operate at | 1758 | standard mandates that all devices in the aggregate operate at |
1759 | the same speed and duplex. Also, as with all bonding load | 1759 | the same speed and duplex. Also, as with all bonding load |
1760 | balance modes other than balance-rr, no single connection will | 1760 | balance modes other than balance-rr, no single connection will |
1761 | be able to utilize more than a single interface's worth of | 1761 | be able to utilize more than a single interface's worth of |
1762 | bandwidth. | 1762 | bandwidth. |
1763 | 1763 | ||
1764 | Additionally, the linux bonding 802.3ad implementation | 1764 | Additionally, the linux bonding 802.3ad implementation |
1765 | distributes traffic by peer (using an XOR of MAC addresses), | 1765 | distributes traffic by peer (using an XOR of MAC addresses), |
1766 | so in a "gatewayed" configuration, all outgoing traffic will | 1766 | so in a "gatewayed" configuration, all outgoing traffic will |
1767 | generally use the same device. Incoming traffic may also end | 1767 | generally use the same device. Incoming traffic may also end |
1768 | up on a single device, but that is dependent upon the | 1768 | up on a single device, but that is dependent upon the |
1769 | balancing policy of the peer's 802.3ad implementation. In a | 1769 | balancing policy of the peer's 802.3ad implementation. In a |
1770 | "local" configuration, traffic will be distributed across the | 1770 | "local" configuration, traffic will be distributed across the |
1771 | devices in the bond. | 1771 | devices in the bond. |
1772 | 1772 | ||
1773 | Finally, the 802.3ad mode mandates the use of the MII monitor; | 1773 | Finally, the 802.3ad mode mandates the use of the MII monitor; |
1774 | therefore, the ARP monitor is not available in this mode. | 1774 | therefore, the ARP monitor is not available in this mode. |
1775 | 1775 | ||
1776 | balance-tlb: The balance-tlb mode balances outgoing traffic by peer. | 1776 | balance-tlb: The balance-tlb mode balances outgoing traffic by peer. |
1777 | Since the balancing is done according to MAC address, in a | 1777 | Since the balancing is done according to MAC address, in a |
1778 | "gatewayed" configuration (as described above), this mode will | 1778 | "gatewayed" configuration (as described above), this mode will |
1779 | send all traffic across a single device. However, in a | 1779 | send all traffic across a single device. However, in a |
1780 | "local" network configuration, this mode balances multiple | 1780 | "local" network configuration, this mode balances multiple |
1781 | local network peers across devices in a vaguely intelligent | 1781 | local network peers across devices in a vaguely intelligent |
1782 | manner (not a simple XOR as in balance-xor or 802.3ad mode), | 1782 | manner (not a simple XOR as in balance-xor or 802.3ad mode), |
1783 | so that mathematically unlucky MAC addresses (i.e., ones that | 1783 | so that mathematically unlucky MAC addresses (i.e., ones that |
1784 | XOR to the same value) will not all "bunch up" on a single | 1784 | XOR to the same value) will not all "bunch up" on a single |
1785 | interface. | 1785 | interface. |
1786 | 1786 | ||
1787 | Unlike 802.3ad, interfaces may be of differing speeds, and no | 1787 | Unlike 802.3ad, interfaces may be of differing speeds, and no |
1788 | special switch configuration is required. On the down side, | 1788 | special switch configuration is required. On the down side, |
1789 | in this mode all incoming traffic arrives over a single | 1789 | in this mode all incoming traffic arrives over a single |
1790 | interface, this mode requires certain ethtool support in the | 1790 | interface, this mode requires certain ethtool support in the |
1791 | network device driver of the slave interfaces, and the ARP | 1791 | network device driver of the slave interfaces, and the ARP |
1792 | monitor is not available. | 1792 | monitor is not available. |
1793 | 1793 | ||
1794 | balance-alb: This mode is everything that balance-tlb is, and more. | 1794 | balance-alb: This mode is everything that balance-tlb is, and more. |
1795 | It has all of the features (and restrictions) of balance-tlb, | 1795 | It has all of the features (and restrictions) of balance-tlb, |
1796 | and will also balance incoming traffic from local network | 1796 | and will also balance incoming traffic from local network |
1797 | peers (as described in the Bonding Module Options section, | 1797 | peers (as described in the Bonding Module Options section, |
1798 | above). | 1798 | above). |
1799 | 1799 | ||
1800 | The only additional down side to this mode is that the network | 1800 | The only additional down side to this mode is that the network |
1801 | device driver must support changing the hardware address while | 1801 | device driver must support changing the hardware address while |
1802 | the device is open. | 1802 | the device is open. |
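The striping behavior that sets balance-rr apart from the other modes listed above can be sketched as follows. This is only an illustration of round-robin assignment, not the driver's code; the function name `stripe_round_robin` and the interface names are assumptions.

```python
from itertools import cycle


def stripe_round_robin(packets, slaves):
    """Assign each packet of a single stream to the next slave in turn."""
    assignment = {slave: [] for slave in slaves}
    for packet, slave in zip(packets, cycle(slaves)):
        assignment[slave].append(packet)
    return assignment


# Eight sequence numbers from one connection, striped over two slaves.
result = stripe_round_robin(list(range(8)), ["eth0", "eth1"])
for slave, seqs in result.items():
    print(slave, seqs)
```

Because consecutive sequence numbers leave on different interfaces, a single stream can exceed one interface's throughput, and the receiver may see packets out of order if the paths differ in latency.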
1803 | 1803 | ||
1804 | 12.1.2 MT Link Monitoring for Single Switch Topology | 1804 | 12.1.2 MT Link Monitoring for Single Switch Topology |
1805 | ---------------------------------------------------- | 1805 | ---------------------------------------------------- |
1806 | 1806 | ||
1807 | The choice of link monitoring may largely depend upon which | 1807 | The choice of link monitoring may largely depend upon which |
1808 | mode you choose to use. The more advanced load balancing modes do not | 1808 | mode you choose to use. The more advanced load balancing modes do not |
1809 | support the use of the ARP monitor, and are thus restricted to using | 1809 | support the use of the ARP monitor, and are thus restricted to using |
1810 | the MII monitor (which does not provide as high a level of end to end | 1810 | the MII monitor (which does not provide as high a level of end to end |
1811 | assurance as the ARP monitor). | 1811 | assurance as the ARP monitor). |
1812 | 1812 | ||
1813 | 12.2 Maximum Throughput in a Multiple Switch Topology | 1813 | 12.2 Maximum Throughput in a Multiple Switch Topology |
1814 | ----------------------------------------------------- | 1814 | ----------------------------------------------------- |
1815 | 1815 | ||
1816 | Multiple switches may be utilized to optimize for throughput | 1816 | Multiple switches may be utilized to optimize for throughput |
1817 | when they are configured in parallel as part of an isolated network | 1817 | when they are configured in parallel as part of an isolated network |
1818 | between two or more systems, for example: | 1818 | between two or more systems, for example: |
1819 | 1819 | ||
1820 | +-----------+ | 1820 | +-----------+ |
1821 | | Host A | | 1821 | | Host A | |
1822 | +-+---+---+-+ | 1822 | +-+---+---+-+ |
1823 | | | | | 1823 | | | | |
1824 | +--------+ | +---------+ | 1824 | +--------+ | +---------+ |
1825 | | | | | 1825 | | | | |
1826 | +------+---+ +-----+----+ +-----+----+ | 1826 | +------+---+ +-----+----+ +-----+----+ |
1827 | | Switch A | | Switch B | | Switch C | | 1827 | | Switch A | | Switch B | | Switch C | |
1828 | +------+---+ +-----+----+ +-----+----+ | 1828 | +------+---+ +-----+----+ +-----+----+ |
1829 | | | | | 1829 | | | | |
1830 | +--------+ | +---------+ | 1830 | +--------+ | +---------+ |
1831 | | | | | 1831 | | | | |
1832 | +-+---+---+-+ | 1832 | +-+---+---+-+ |
1833 | | Host B | | 1833 | | Host B | |
1834 | +-----------+ | 1834 | +-----------+ |
1835 | 1835 | ||
1836 | In this configuration, the switches are isolated from one | 1836 | In this configuration, the switches are isolated from one |
1837 | another. One reason to employ a topology such as this is that, in an | 1837 | another. One reason to employ a topology such as this is that, in an |
1838 | isolated network with many hosts (a cluster configured for high | 1838 | isolated network with many hosts (a cluster configured for high |
1839 | performance, for example), using multiple smaller switches can be more | 1839 | performance, for example), using multiple smaller switches can be more |
1840 | cost effective than a single larger switch. For example, on a network | 1840 | cost effective than a single larger switch. For example, on a network |
1841 | with 24 hosts, three 24 port switches can be significantly less | 1841 | with 24 hosts, three 24 port switches can be significantly less |
1842 | expensive than a single 72 port switch. | 1842 | expensive than a single 72 port switch. |
1843 | 1843 | ||
1844 | If access beyond the network is required, an individual host | 1844 | If access beyond the network is required, an individual host |
1845 | can be equipped with an additional network device connected to an | 1845 | can be equipped with an additional network device connected to an |
1846 | external network; this host then additionally acts as a gateway. | 1846 | external network; this host then additionally acts as a gateway. |
1847 | 1847 | ||
1848 | 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology | 1848 | 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology |
1849 | ------------------------------------------------------------- | 1849 | ------------------------------------------------------------- |
1850 | 1850 | ||
1851 | In actual practice, the bonding mode typically employed in | 1851 | In actual practice, the bonding mode typically employed in |
1852 | configurations of this type is balance-rr. Historically, in this | 1852 | configurations of this type is balance-rr. Historically, in this |
1853 | network configuration, the usual caveats about out of order packet | 1853 | network configuration, the usual caveats about out of order packet |
1854 | delivery are mitigated by the use of network adapters that do not do | 1854 | delivery are mitigated by the use of network adapters that do not do |
1855 | any kind of packet coalescing (via the use of NAPI, or because the | 1855 | any kind of packet coalescing (via the use of NAPI, or because the |
1856 | device itself does not generate interrupts until some number of | 1856 | device itself does not generate interrupts until some number of |
1857 | packets has arrived). When employed in this fashion, the balance-rr | 1857 | packets has arrived). When employed in this fashion, the balance-rr |
1858 | mode allows individual connections between two hosts to effectively | 1858 | mode allows individual connections between two hosts to effectively |
1859 | utilize greater than one interface's bandwidth. | 1859 | utilize greater than one interface's bandwidth. |
1860 | 1860 | ||
1861 | 12.2.2 MT Link Monitoring for Multiple Switch Topology | 1861 | 12.2.2 MT Link Monitoring for Multiple Switch Topology |
1862 | ------------------------------------------------------ | 1862 | ------------------------------------------------------ |
1863 | 1863 | ||
1864 | Again, in actual practice, the MII monitor is most often used | 1864 | Again, in actual practice, the MII monitor is most often used |
1865 | in this configuration, as performance is given preference over | 1865 | in this configuration, as performance is given preference over |
1866 | availability. The ARP monitor will function in this topology, but its | 1866 | availability. The ARP monitor will function in this topology, but its |
1867 | advantages over the MII monitor are mitigated by the volume of probes | 1867 | advantages over the MII monitor are mitigated by the volume of probes |
1868 | needed as the number of systems involved grows (remember that each | 1868 | needed as the number of systems involved grows (remember that each |
1869 | host in the network is configured with bonding). | 1869 | host in the network is configured with bonding). |
1870 | 1870 | ||
1871 | 13. Switch Behavior Issues | 1871 | 13. Switch Behavior Issues |
1872 | ========================== | 1872 | ========================== |
1873 | 1873 | ||
1874 | 13.1 Link Establishment and Failover Delays | 1874 | 13.1 Link Establishment and Failover Delays |
1875 | ------------------------------------------- | 1875 | ------------------------------------------- |
1876 | 1876 | ||
1877 | Some switches exhibit undesirable behavior with regard to the | 1877 | Some switches exhibit undesirable behavior with regard to the |
1878 | timing of link up and down reporting by the switch. | 1878 | timing of link up and down reporting by the switch. |
1879 | 1879 | ||
1880 | First, when a link comes up, some switches may indicate that | 1880 | First, when a link comes up, some switches may indicate that |
1881 | the link is up (carrier available), but not pass traffic over the | 1881 | the link is up (carrier available), but not pass traffic over the |
1882 | interface for some period of time. This delay is typically due to | 1882 | interface for some period of time. This delay is typically due to |
1883 | some type of autonegotiation or routing protocol, but may also occur | 1883 | some type of autonegotiation or routing protocol, but may also occur |
1884 | during switch initialization (e.g., during recovery after a switch | 1884 | during switch initialization (e.g., during recovery after a switch |
1885 | failure). If you find this to be a problem, specify an appropriate | 1885 | failure). If you find this to be a problem, specify an appropriate |
1886 | value to the updelay bonding module option to delay the use of the | 1886 | value to the updelay bonding module option to delay the use of the |
1887 | relevant interface(s). | 1887 | relevant interface(s). |
1888 | 1888 | ||
1889 | Second, some switches may "bounce" the link state one or more | 1889 | Second, some switches may "bounce" the link state one or more |
1890 | times while a link is changing state. This occurs most commonly while | 1890 | times while a link is changing state. This occurs most commonly while |
1891 | the switch is initializing. Again, an appropriate updelay value may | 1891 | the switch is initializing. Again, an appropriate updelay value may |
1892 | help. | 1892 | help. |
1893 | 1893 | ||
1894 | Note that when a bonding interface has no active links, the | 1894 | Note that when a bonding interface has no active links, the |
1895 | driver will immediately reuse the first link that goes up, even if the | 1895 | driver will immediately reuse the first link that goes up, even if the |
1896 | updelay parameter has been specified (the updelay is ignored in this | 1896 | updelay parameter has been specified (the updelay is ignored in this |
1897 | case). If there are slave interfaces waiting for the updelay timeout | 1897 | case). If there are slave interfaces waiting for the updelay timeout |
1898 | to expire, the interface that first went into that state will be | 1898 | to expire, the interface that first went into that state will be |
1899 | immediately reused. This reduces down time of the network if the | 1899 | immediately reused. This reduces down time of the network if the |
1900 | value of updelay has been overestimated, and since this occurs only in | 1900 | value of updelay has been overestimated, and since this occurs only in |
1901 | cases with no connectivity, there is no additional penalty for | 1901 | cases with no connectivity, there is no additional penalty for |
1902 | ignoring the updelay. | 1902 | ignoring the updelay. |
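The updelay exception described above can be modeled with a small sketch. It is a simplified model of the stated behavior, not the driver's implementation; the function names and the millisecond values are hypothetical.

```python
def delay_before_use(active_links: int, updelay_ms: int) -> int:
    """Return how long (in ms) to wait before using a link that just came up.

    Model of the rule in the text: with no active links the updelay is
    ignored; otherwise the configured updelay applies.
    """
    if active_links == 0:
        return 0
    return updelay_ms


def choose_link_when_none_active(waiting):
    """Pick the link to reuse immediately when the bond has no active links.

    `waiting` is a list of (name, time_link_came_up) pairs for slaves
    still inside their updelay window; the one that entered that state
    first is chosen. Returns None if no slave is waiting.
    """
    if not waiting:
        return None
    return min(waiting, key=lambda entry: entry[1])[0]


print(delay_before_use(active_links=0, updelay_ms=5000))
print(delay_before_use(active_links=1, updelay_ms=5000))
print(choose_link_when_none_active([("eth1", 120), ("eth0", 95)]))
```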
1903 | 1903 | ||
1904 | In addition to the concerns about switch timings, if your | 1904 | In addition to the concerns about switch timings, if your |
1905 | switches take a long time to go into backup mode, it may be desirable | 1905 | switches take a long time to go into backup mode, it may be desirable |
1906 | to not activate a backup interface immediately after a link goes down. | 1906 | to not activate a backup interface immediately after a link goes down. |
1907 | Failover may be delayed via the downdelay bonding module option. | 1907 | Failover may be delayed via the downdelay bonding module option. |
1908 | 1908 | ||
1909 | 13.2 Duplicated Incoming Packets | 1909 | 13.2 Duplicated Incoming Packets |
1910 | -------------------------------- | 1910 | -------------------------------- |
1911 | 1911 | ||
1912 | It is not uncommon to observe a short burst of duplicated | 1912 | It is not uncommon to observe a short burst of duplicated |
1913 | traffic when the bonding device is first used, or after it has been | 1913 | traffic when the bonding device is first used, or after it has been |
1914 | idle for some period of time. This is most easily observed by issuing | 1914 | idle for some period of time. This is most easily observed by issuing |
1915 | a "ping" to some other host on the network, and noticing that the | 1915 | a "ping" to some other host on the network, and noticing that the |
1916 | output from ping flags duplicates (typically one per slave). | 1916 | output from ping flags duplicates (typically one per slave). |
1917 | 1917 | ||
1918 | For example, on a bond in active-backup mode with five slaves | 1918 | For example, on a bond in active-backup mode with five slaves |
1919 | all connected to one switch, the output may appear as follows: | 1919 | all connected to one switch, the output may appear as follows: |
1920 | 1920 | ||
1921 | # ping -n 10.0.4.2 | 1921 | # ping -n 10.0.4.2 |
1922 | PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. | 1922 | PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. |
1923 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms | 1923 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms |
1924 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | 1924 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) |
1925 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | 1925 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) |
1926 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | 1926 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) |
1927 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | 1927 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) |
1928 | 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms | 1928 | 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms |
1929 | 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms | 1929 | 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms |
1930 | 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms | 1930 | 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms |
1931 | 1931 | ||
1932 | This is not due to an error in the bonding driver; rather, it | 1932 | This is not due to an error in the bonding driver; rather, it |
1933 | is a side effect of how many switches update their MAC forwarding | 1933 | is a side effect of how many switches update their MAC forwarding |
1934 | tables. Initially, the switch does not associate the MAC address in | 1934 | tables. Initially, the switch does not associate the MAC address in |
1935 | the packet with a particular switch port, and so it may send the | 1935 | the packet with a particular switch port, and so it may send the |
1936 | traffic to all ports until its MAC forwarding table is updated. Since | 1936 | traffic to all ports until its MAC forwarding table is updated. Since |
1937 | the interfaces attached to the bond may occupy multiple ports on a | 1937 | the interfaces attached to the bond may occupy multiple ports on a |
1938 | single switch, when the switch (temporarily) floods the traffic to all | 1938 | single switch, when the switch (temporarily) floods the traffic to all |
1939 | ports, the bond device receives multiple copies of the same packet | 1939 | ports, the bond device receives multiple copies of the same packet |
1940 | (one per slave device). | 1940 | (one per slave device). |
1941 | 1941 | ||
1942 | The duplicated packet behavior is switch dependent; some | 1942 | The duplicated packet behavior is switch dependent; some |
1943 | switches exhibit this, and some do not. On switches that display this | 1943 | switches exhibit this, and some do not. On switches that display this |
1944 | behavior, it can be induced by clearing the MAC forwarding table (on | 1944 | behavior, it can be induced by clearing the MAC forwarding table (on |
1945 | most Cisco switches, the privileged command "clear mac address-table | 1945 | most Cisco switches, the privileged command "clear mac address-table |
1946 | dynamic" will accomplish this). | 1946 | dynamic" will accomplish this). |
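The flooding behavior described in section 13.2 can be illustrated with a toy model of a MAC-learning switch. This is only a sketch: the class name, port numbers, and MAC addresses are made up for illustration, and it counts the copies the switch delivers to the bond's ports, not what the bonding driver does with them. Real switches and the driver's handling of inactive slaves are more involved.

```python
class LearningSwitch:
    """Toy model of a MAC-learning Ethernet switch (illustrative only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def forward(self, src_mac, dst_mac, in_port):
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) which port the source address lives on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [out_port]


# Hypothetical setup: five slaves of one bond on ports 1-5, a peer on port 6.
BOND_MAC = "00:11:22:33:44:55"
PEER_MAC = "66:77:88:99:aa:bb"
slave_ports = [1, 2, 3, 4, 5]
peer_port = 6

switch = LearningSwitch(slave_ports + [peer_port])

# A frame for the bond arrives while the table has no entry for BOND_MAC,
# so the switch floods it and every slave port receives a copy.
flooded = switch.forward(PEER_MAC, BOND_MAC, peer_port)
copies_before_learning = len([p for p in flooded if p in slave_ports])

# Once the switch has seen a frame from the bond on port 1, it forwards
# later frames for BOND_MAC to that port only.
switch.forward(BOND_MAC, PEER_MAC, 1)
learned = switch.forward(PEER_MAC, BOND_MAC, peer_port)
copies_after_learning = len([p for p in learned if p in slave_ports])

print(copies_before_learning, copies_after_learning)  # 5 1
```

With five slaves, the flooded frame shows up five times (one original plus four duplicates), which matches the ping output shown in this section.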
1947 | 1947 | ||
1948 | 14. Hardware Specific Considerations | 1948 | 14. Hardware Specific Considerations |
1949 | ==================================== | 1949 | ==================================== |
1950 | 1950 | ||
1951 | This section contains additional information for configuring | 1951 | This section contains additional information for configuring |
1952 | bonding on specific hardware platforms, or for interfacing bonding | 1952 | bonding on specific hardware platforms, or for interfacing bonding |
1953 | with particular switches or other devices. | 1953 | with particular switches or other devices. |
1954 | 1954 | ||
1955 | 14.1 IBM BladeCenter | 1955 | 14.1 IBM BladeCenter |
1956 | -------------------- | 1956 | -------------------- |
1957 | 1957 | ||
1958 | This applies to the JS20 and similar systems. | 1958 | This applies to the JS20 and similar systems. |
1959 | 1959 | ||
1960 | On the JS20 blades, the bonding driver supports only | 1960 | On the JS20 blades, the bonding driver supports only |
1961 | balance-rr, active-backup, balance-tlb and balance-alb modes. This is | 1961 | balance-rr, active-backup, balance-tlb and balance-alb modes. This is |
1962 | largely due to the network topology inside the BladeCenter, detailed | 1962 | largely due to the network topology inside the BladeCenter, detailed |
1963 | below. | 1963 | below. |
1964 | 1964 | ||
1965 | JS20 network adapter information | 1965 | JS20 network adapter information |
1966 | -------------------------------- | 1966 | -------------------------------- |
1967 | 1967 | ||
1968 | All JS20s come with two Broadcom Gigabit Ethernet ports | 1968 | All JS20s come with two Broadcom Gigabit Ethernet ports |
1969 | integrated on the planar (that's "motherboard" in IBM-speak). In the | 1969 | integrated on the planar (that's "motherboard" in IBM-speak). In the |
1970 | BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to | 1970 | BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to |
1971 | I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. | 1971 | I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. |
1972 | An add-on Broadcom daughter card can be installed on a JS20 to provide | 1972 | An add-on Broadcom daughter card can be installed on a JS20 to provide |
1973 | two more Gigabit Ethernet ports. These ports, eth2 and eth3, are | 1973 | two more Gigabit Ethernet ports. These ports, eth2 and eth3, are |
1974 | wired to I/O Modules 3 and 4, respectively. | 1974 | wired to I/O Modules 3 and 4, respectively. |
1975 | 1975 | ||
1976 | Each I/O Module may contain either a switch or a passthrough | 1976 | Each I/O Module may contain either a switch or a passthrough |
1977 | module (which allows ports to be directly connected to an external | 1977 | module (which allows ports to be directly connected to an external |
1978 | switch). Some bonding modes require a specific BladeCenter internal | 1978 | switch). Some bonding modes require a specific BladeCenter internal |
1979 | network topology in order to function; these are detailed below. | 1979 | network topology in order to function; these are detailed below. |
1980 | 1980 | ||
1981 | Additional BladeCenter-specific networking information can be | 1981 | Additional BladeCenter-specific networking information can be |
1982 | found in two IBM Redbooks (www.ibm.com/redbooks): | 1982 | found in two IBM Redbooks (www.ibm.com/redbooks): |
1983 | 1983 | ||
1984 | "IBM eServer BladeCenter Networking Options" | 1984 | "IBM eServer BladeCenter Networking Options" |
1985 | "IBM eServer BladeCenter Layer 2-7 Network Switching" | 1985 | "IBM eServer BladeCenter Layer 2-7 Network Switching" |
1986 | 1986 | ||
1987 | BladeCenter networking configuration | 1987 | BladeCenter networking configuration |
1988 | ------------------------------------ | 1988 | ------------------------------------ |
1989 | 1989 | ||
1990 | Because a BladeCenter can be configured in a very large number | 1990 | Because a BladeCenter can be configured in a very large number |
1991 | of ways, this discussion will be confined to describing basic | 1991 | of ways, this discussion will be confined to describing basic |
1992 | configurations. | 1992 | configurations. |
1993 | 1993 | ||
1994 | Normally, Ethernet Switch Modules (ESMs) are used in I/O | 1994 | Normally, Ethernet Switch Modules (ESMs) are used in I/O |
1995 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a | 1995 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a |
1996 | JS20 will be connected to different internal switches (in the | 1996 | JS20 will be connected to different internal switches (in the |
1997 | respective I/O modules). | 1997 | respective I/O modules). |
1998 | 1998 | ||
1999 | A passthrough module (OPM or CPM, optical or copper, | 1999 | A passthrough module (OPM or CPM, optical or copper, |
2000 | passthrough module) connects the I/O module directly to an external | 2000 | passthrough module) connects the I/O module directly to an external |
2001 | switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 | 2001 | switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 |
2002 | interfaces of a JS20 can be redirected to the outside world and | 2002 | interfaces of a JS20 can be redirected to the outside world and |
2003 | connected to a common external switch. | 2003 | connected to a common external switch. |
2004 | 2004 | ||
2005 | Depending upon the mix of ESMs and PMs, the network will | 2005 | Depending upon the mix of ESMs and PMs, the network will |
2006 | appear to bonding as either a single switch topology (all PMs) or as a | 2006 | appear to bonding as either a single switch topology (all PMs) or as a |
2007 | multiple switch topology (one or more ESMs, zero or more PMs). It is | 2007 | multiple switch topology (one or more ESMs, zero or more PMs). It is |
2008 | also possible to connect ESMs together, resulting in a configuration | 2008 | also possible to connect ESMs together, resulting in a configuration |
2009 | much like the example in "High Availability in a Multiple Switch | 2009 | much like the example in "High Availability in a Multiple Switch |
2010 | Topology," above. | 2010 | Topology," above. |
2011 | 2011 | ||
2012 | Requirements for specific modes | 2012 | Requirements for specific modes |
2013 | ------------------------------- | 2013 | ------------------------------- |
2014 | 2014 | ||
2015 | The balance-rr mode requires the use of passthrough modules | 2015 | The balance-rr mode requires the use of passthrough modules |
2016 | for devices in the bond, all connected to a common external switch. | 2016 | for devices in the bond, all connected to a common external switch. |
2017 | That switch must be configured for "etherchannel" or "trunking" on the | 2017 | That switch must be configured for "etherchannel" or "trunking" on the |
2018 | appropriate ports, as is usual for balance-rr. | 2018 | appropriate ports, as is usual for balance-rr. |
2019 | 2019 | ||
2020 | The balance-alb and balance-tlb modes will function with | 2020 | The balance-alb and balance-tlb modes will function with |
2021 | either switch modules or passthrough modules (or a mix). The only | 2021 | either switch modules or passthrough modules (or a mix). The only |
2022 | specific requirement for these modes is that all network interfaces | 2022 | specific requirement for these modes is that all network interfaces |
2023 | must be able to reach all destinations for traffic sent over the | 2023 | must be able to reach all destinations for traffic sent over the |
2024 | bonding device (i.e., the network must converge at some point outside | 2024 | bonding device (i.e., the network must converge at some point outside |
2025 | the BladeCenter). | 2025 | the BladeCenter). |
2026 | 2026 | ||
2027 | The active-backup mode has no additional requirements. | 2027 | The active-backup mode has no additional requirements. |
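The topology and mode rules above can be summarized in a short sketch. The function names and the "ESM"/"PM" strings are hypothetical, chosen only for illustration. The code checks the I/O module mix only; it cannot verify that the external switch is configured for etherchannel (needed for balance-rr) or that every interface can reach every destination (needed for balance-tlb and balance-alb).

```python
def classify_topology(io_modules):
    """Classify how the network appears to bonding.

    io_modules: non-empty list of "ESM" (Ethernet Switch Module) or
    "PM" (passthrough module) entries, one per I/O module in use.
    """
    if not io_modules:
        raise ValueError("at least one I/O module is required")
    for kind in io_modules:
        if kind not in ("ESM", "PM"):
            raise ValueError("unknown module type: %r" % (kind,))
    # All passthrough modules: the bond sees one common external switch.
    if all(kind == "PM" for kind in io_modules):
        return "single-switch"
    # One or more ESMs (and zero or more PMs): multiple switches.
    return "multiple-switch"


def candidate_modes(io_modules):
    """Return JS20 bonding modes whose module requirement is met."""
    # These modes work with switch modules, passthrough modules, or a mix.
    modes = ["active-backup", "balance-tlb", "balance-alb"]
    # balance-rr needs passthrough modules for every device in the bond.
    if classify_topology(io_modules) == "single-switch":
        modes.insert(0, "balance-rr")
    return modes


print(classify_topology(["PM", "PM"]), candidate_modes(["PM", "PM"]))
print(classify_topology(["ESM", "PM"]), candidate_modes(["ESM", "PM"]))
```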
2028 | 2028 | ||
2029 | Link monitoring issues | 2029 | Link monitoring issues |
2030 | ---------------------- | 2030 | ---------------------- |
2031 | 2031 | ||
2032 | When an Ethernet Switch Module is in place, only the ARP | 2032 | When an Ethernet Switch Module is in place, only the ARP |
2033 | monitor will reliably detect link loss to an external switch. This is | 2033 | monitor will reliably detect link loss to an external switch. This is |
2034 | nothing unusual, but examination of the BladeCenter cabinet would | 2034 | nothing unusual, but examination of the BladeCenter cabinet would |
2035 | suggest that the "external" network ports are the ethernet ports for | 2035 | suggest that the "external" network ports are the ethernet ports for |
2036 | the system, when in fact there is a switch between these "external" | 2036 | the system, when in fact there is a switch between these "external" |
2037 | ports and the devices on the JS20 system itself. The MII monitor is | 2037 | ports and the devices on the JS20 system itself. The MII monitor is |
2038 | only able to detect link failures between the ESM and the JS20 system. | 2038 | only able to detect link failures between the ESM and the JS20 system. |
2039 | 2039 | ||
2040 | When a passthrough module is in place, the MII monitor does | 2040 | When a passthrough module is in place, the MII monitor does |
2041 | detect failures to the "external" port, which is then directly | 2041 | detect failures to the "external" port, which is then directly |
2042 | connected to the JS20 system. | 2042 | connected to the JS20 system. |
2043 | 2043 | ||
2044 | Other concerns | 2044 | Other concerns |
2045 | -------------- | 2045 | -------------- |
2046 | 2046 | ||
2047 | The Serial Over LAN (SoL) link is established over the primary | 2047 | The Serial Over LAN (SoL) link is established over the primary |
2048 | ethernet (eth0) only; therefore, any loss of link to eth0 will result | 2048 | ethernet (eth0) only; therefore, any loss of link to eth0 will result |
2049 | in losing your SoL connection. It will not fail over with other | 2049 | in losing your SoL connection. It will not fail over with other |
2050 | network traffic, as the SoL system is beyond the control of the | 2050 | network traffic, as the SoL system is beyond the control of the |
2051 | bonding driver. | 2051 | bonding driver. |
2052 | 2052 | ||
2053 | It may be desirable to disable spanning tree on the switch | 2053 | It may be desirable to disable spanning tree on the switch |
2054 | (either the internal Ethernet Switch Module, or an external switch) to | 2054 | (either the internal Ethernet Switch Module, or an external switch) to |
2055 | avoid fail-over delay issues when using bonding. | 2055 | avoid fail-over delay issues when using bonding. |
2056 | 2056 | ||
2057 | 2057 | ||
2058 | 15. Frequently Asked Questions | 2058 | 15. Frequently Asked Questions |
2059 | ============================== | 2059 | ============================== |
2060 | 2060 | ||
2061 | 1. Is it SMP safe? | 2061 | 1. Is it SMP safe? |
2062 | 2062 | ||
2063 | Yes. The old 2.0.xx channel bonding patch was not SMP safe. | 2063 | Yes. The old 2.0.xx channel bonding patch was not SMP safe. |
2064 | The new driver was designed to be SMP safe from the start. | 2064 | The new driver was designed to be SMP safe from the start. |
2065 | 2065 | ||
2066 | 2. What type of cards will work with it? | 2066 | 2. What type of cards will work with it? |
2067 | 2067 | ||
2068 | Any Ethernet type cards (you can even mix cards - an Intel | 2068 | Any Ethernet type cards (you can even mix cards - an Intel |
2069 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, | 2069 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, |
2070 | devices need not be of the same speed. | 2070 | devices need not be of the same speed. |
2071 | 2071 | ||
2072 | 3. How many bonding devices can I have? | 2072 | 3. How many bonding devices can I have? |
2073 | 2073 | ||
2074 | There is no limit. | 2074 | There is no limit. |
2075 | 2075 | ||
2076 | 4. How many slaves can a bonding device have? | 2076 | 4. How many slaves can a bonding device have? |
2077 | 2077 | ||
2078 | This is limited only by the number of network interfaces Linux | 2078 | This is limited only by the number of network interfaces Linux |
2079 | supports and/or the number of network cards you can place in your | 2079 | supports and/or the number of network cards you can place in your |
2080 | system. | 2080 | system. |
2081 | 2081 | ||
2082 | 5. What happens when a slave link dies? | 2082 | 5. What happens when a slave link dies? |
2083 | 2083 | ||
2084 | If link monitoring is enabled, then the failing device will be | 2084 | If link monitoring is enabled, then the failing device will be |
2085 | disabled. The active-backup mode will fail over to a backup link, and | 2085 | disabled. The active-backup mode will fail over to a backup link, and |
2086 | other modes will ignore the failed link. The link will continue to be | 2086 | other modes will ignore the failed link. The link will continue to be |
2087 | monitored, and should it recover, it will rejoin the bond (in whatever | 2087 | monitored, and should it recover, it will rejoin the bond (in whatever |
2088 | manner is appropriate for the mode). See the sections on High | 2088 | manner is appropriate for the mode). See the sections on High |
2089 | Availability and the documentation for each mode for additional | 2089 | Availability and the documentation for each mode for additional |
2090 | information. | 2090 | information. |
2091 | 2091 | ||
2092 | Link monitoring can be enabled via either the miimon or | 2092 | Link monitoring can be enabled via either the miimon or |
2093 | arp_interval parameters (described in the module parameters section, | 2093 | arp_interval parameters (described in the module parameters section, |
2094 | above). In general, miimon monitors the carrier state as sensed by | 2094 | above). In general, miimon monitors the carrier state as sensed by |
2095 | the underlying network device, and the arp monitor (arp_interval) | 2095 | the underlying network device, and the arp monitor (arp_interval) |
2096 | monitors connectivity to another host on the local network. | 2096 | monitors connectivity to another host on the local network. |
2097 | 2097 | ||
2098 | If no link monitoring is configured, the bonding driver will | 2098 | If no link monitoring is configured, the bonding driver will |
2099 | be unable to detect link failures, and will assume that all links are | 2099 | be unable to detect link failures, and will assume that all links are |
2100 | always available. This will likely result in lost packets, and a | 2100 | always available. This will likely result in lost packets, and a |
2101 | resulting degradation of performance. The precise performance loss | 2101 | resulting degradation of performance. The precise performance loss |
2102 | depends upon the bonding mode and network configuration. | 2102 | depends upon the bonding mode and network configuration. |
2103 | 2103 | ||
2104 | 6. Can bonding be used for High Availability? | 2104 | 6. Can bonding be used for High Availability? |
2105 | 2105 | ||
2106 | Yes. See the section on High Availability for details. | 2106 | Yes. See the section on High Availability for details. |
2107 | 2107 | ||
2108 | 7. Which switches/systems does it work with? | 2108 | 7. Which switches/systems does it work with? |
2109 | 2109 | ||
2110 | The full answer to this depends upon the desired mode. | 2110 | The full answer to this depends upon the desired mode. |
2111 | 2111 | ||
2112 | In the basic balance modes (balance-rr and balance-xor), it | 2112 | In the basic balance modes (balance-rr and balance-xor), it |
2113 | works with any system that supports etherchannel (also called | 2113 | works with any system that supports etherchannel (also called |
2114 | trunking). Most managed switches currently available have such | 2114 | trunking). Most managed switches currently available have such |
2115 | support, and many unmanaged switches as well. | 2115 | support, and many unmanaged switches as well. |
2116 | 2116 | ||
2117 | The advanced balance modes (balance-tlb and balance-alb) do | 2117 | The advanced balance modes (balance-tlb and balance-alb) do |
2118 | not have special switch requirements, but do need device drivers that | 2118 | not have special switch requirements, but do need device drivers that |
2119 | support specific features (described in the appropriate section under | 2119 | support specific features (described in the appropriate section under |
2120 | module parameters, above). | 2120 | module parameters, above). |
2121 | 2121 | ||
2122 | In 802.3ad mode, it works with systems that support IEEE | 2122 | In 802.3ad mode, it works with systems that support IEEE |
2123 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged | 2123 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged |
2124 | switches currently available support 802.3ad. | 2124 | switches currently available support 802.3ad. |
2125 | 2125 | ||
2126 | The active-backup mode should work with any Layer-II switch. | 2126 | The active-backup mode should work with any Layer-II switch. |
2127 | 2127 | ||
2128 | 8. Where does a bonding device get its MAC address from? | 2128 | 8. Where does a bonding device get its MAC address from? |
2129 | 2129 | ||
2130 | If not explicitly configured (with ifconfig or ip link), the | 2130 | If not explicitly configured (with ifconfig or ip link), the |
2131 | MAC address of the bonding device is taken from its first slave | 2131 | MAC address of the bonding device is taken from its first slave |
2132 | device. This MAC address is then passed to all following slaves and | 2132 | device. This MAC address is then passed to all following slaves and |
2133 | remains persistent (even if the first slave is removed) until the | 2133 | remains persistent (even if the first slave is removed) until the |
2134 | bonding device is brought down or reconfigured. | 2134 | bonding device is brought down or reconfigured. |
2135 | 2135 | ||
2136 | If you wish to change the MAC address, you can set it with | 2136 | If you wish to change the MAC address, you can set it with |
2137 | ifconfig or ip link: | 2137 | ifconfig or ip link: |
2138 | 2138 | ||
2139 | # ifconfig bond0 hw ether 00:11:22:33:44:55 | 2139 | # ifconfig bond0 hw ether 00:11:22:33:44:55 |
2140 | 2140 | ||
2141 | # ip link set bond0 address 66:77:88:99:aa:bb | 2141 | # ip link set bond0 address 66:77:88:99:aa:bb |
2142 | 2142 | ||
2143 | The MAC address can also be changed by bringing down/up the | 2143 | The MAC address can also be changed by bringing down/up the |
2144 | device and then changing its slaves (or their order): | 2144 | device and then changing its slaves (or their order): |
2145 | 2145 | ||
2146 | # ifconfig bond0 down ; modprobe -r bonding | 2146 | # ifconfig bond0 down ; modprobe -r bonding |
2147 | # ifconfig bond0 .... up | 2147 | # ifconfig bond0 .... up |
2148 | # ifenslave bond0 eth... | 2148 | # ifenslave bond0 eth... |
2149 | 2149 | ||
2150 | This method will automatically take the address from the next | 2150 | This method will automatically take the address from the next |
2151 | slave that is added. | 2151 | slave that is added. |
2152 | 2152 | ||
2153 | To restore your slaves' MAC addresses, you need to detach them | 2153 | To restore your slaves' MAC addresses, you need to detach them |
2154 | from the bond (`ifenslave -d bond0 eth0'). The bonding driver will | 2154 | from the bond (`ifenslave -d bond0 eth0'). The bonding driver will |
2155 | then restore the MAC addresses that the slaves had before they were | 2155 | then restore the MAC addresses that the slaves had before they were |
2156 | enslaved. | 2156 | enslaved. |
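The MAC address rules in the answer to question 8 can be modeled with a small sketch. The ToyBond class and the addresses are hypothetical and only mirror the behavior described above (first slave supplies the bond's address, later slaves inherit it, original addresses are restored on detach). It is not the driver's implementation and ignores explicit configuration with ifconfig or ip link.

```python
class ToyBond:
    """Toy model of how a bond picks and propagates its MAC address."""

    def __init__(self):
        self.mac = None      # bond MAC; unset until the first slave is added
        self.original = {}   # slave name -> MAC it had before being enslaved
        self.current = {}    # slave name -> MAC it carries while enslaved

    def enslave(self, name, mac):
        self.original[name] = mac
        if self.mac is None:
            # The first slave supplies the bond's MAC address.
            self.mac = mac
        # Every slave is given the bond's MAC address.
        self.current[name] = self.mac

    def release(self, name):
        """Detach a slave and return the MAC address restored to it."""
        del self.current[name]
        # The bond keeps its MAC even if the first slave is removed.
        return self.original.pop(name)


bond = ToyBond()
bond.enslave("eth0", "00:11:22:33:44:55")
bond.enslave("eth1", "66:77:88:99:aa:bb")

restored_eth0 = bond.release("eth0")
print(bond.mac)              # 00:11:22:33:44:55
print(bond.current["eth1"])  # 00:11:22:33:44:55
print(restored_eth0)         # 00:11:22:33:44:55
print(bond.release("eth1"))  # 66:77:88:99:aa:bb
```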
2157 | 2157 | ||
2158 | 16. Resources and Links | 2158 | 16. Resources and Links |
2159 | ======================= | 2159 | ======================= |
2160 | 2160 | ||
2161 | The latest version of the bonding driver can be found in the latest | 2161 | The latest version of the bonding driver can be found in the latest |
2162 | version of the linux kernel, found on http://kernel.org | 2162 | version of the linux kernel, found on http://kernel.org |
2163 | 2163 | ||
2164 | The latest version of this document can be found in either the latest | 2164 | The latest version of this document can be found in either the latest |
2165 | kernel source (named Documentation/networking/bonding.txt), or on the | 2165 | kernel source (named Documentation/networking/bonding.txt), or on the |
2166 | bonding sourceforge site: | 2166 | bonding sourceforge site: |
2167 | 2167 | ||
2168 | http://www.sourceforge.net/projects/bonding | 2168 | http://www.sourceforge.net/projects/bonding |
2169 | 2169 | ||
2170 | Discussions regarding the bonding driver take place primarily on the | 2170 | Discussions regarding the bonding driver take place primarily on the |
2171 | bonding-devel mailing list, hosted at sourceforge.net. If you have | 2171 | bonding-devel mailing list, hosted at sourceforge.net. If you have |
2172 | questions or problems, post them to the list. The list address is: | 2172 | questions or problems, post them to the list. The list address is: |
2173 | 2173 | ||
2174 | bonding-devel@lists.sourceforge.net | 2174 | bonding-devel@lists.sourceforge.net |
2175 | 2175 | ||
2176 | The administrative interface (to subscribe or unsubscribe) can | 2176 | The administrative interface (to subscribe or unsubscribe) can |
2177 | be found at: | 2177 | be found at: |
2178 | 2178 | ||
2179 | https://lists.sourceforge.net/lists/listinfo/bonding-devel | 2179 | https://lists.sourceforge.net/lists/listinfo/bonding-devel |
2180 | 2180 | ||
2181 | Donald Becker's Ethernet Drivers and diag programs may be found at: | 2181 | Donald Becker's Ethernet Drivers and diag programs may be found at: |
2182 | - http://www.scyld.com/network/ | 2182 | - http://www.scyld.com/network/ |
2183 | 2183 | ||
2184 | You will also find a lot of information regarding Ethernet, NWay, MII, | 2184 | You will also find a lot of information regarding Ethernet, NWay, MII, |
2185 | etc. at www.scyld.com. | 2185 | etc. at www.scyld.com. |
2186 | 2186 | ||
2187 | -- END -- | 2187 | -- END -- |
2188 | 2188 |
Documentation/networking/cs89x0.txt
1 | 1 | ||
2 | NOTE | 2 | NOTE |
3 | ---- | 3 | ---- |
4 | 4 | ||
5 | This document was contributed by Cirrus Logic for kernel 2.2.5. This version | 5 | This document was contributed by Cirrus Logic for kernel 2.2.5. This version |
6 | has been updated for 2.3.48 by Andrew Morton <andrewm@uow.edu.au> | 6 | has been updated for 2.3.48 by Andrew Morton <andrewm@uow.edu.au> |
7 | 7 | ||
8 | Cirrus make a copy of this driver available at their website, as | 8 | Cirrus make a copy of this driver available at their website, as |
9 | described below. In general, you should use the driver version which | 9 | described below. In general, you should use the driver version which |
10 | comes with your Linux distribution. | 10 | comes with your Linux distribution. |
11 | 11 | ||
12 | 12 | ||
13 | 13 | ||
14 | CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS | 14 | CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS |
15 | Linux Network Interface Driver ver. 2.00 <kernel 2.3.48> | 15 | Linux Network Interface Driver ver. 2.00 <kernel 2.3.48> |
16 | =============================================================================== | 16 | =============================================================================== |
17 | 17 | ||
18 | 18 | ||
19 | TABLE OF CONTENTS | 19 | TABLE OF CONTENTS |
20 | 20 | ||
21 | 1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS | 21 | 1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS |
22 | 1.1 Product Overview | 22 | 1.1 Product Overview |
23 | 1.2 Driver Description | 23 | 1.2 Driver Description |
24 | 1.2.1 Driver Name | 24 | 1.2.1 Driver Name |
25 | 1.2.2 File in the Driver Package | 25 | 1.2.2 File in the Driver Package |
26 | 1.3 System Requirements | 26 | 1.3 System Requirements |
27 | 1.4 Licensing Information | 27 | 1.4 Licensing Information |
28 | 28 | ||
29 | 2.0 ADAPTER INSTALLATION and CONFIGURATION | 29 | 2.0 ADAPTER INSTALLATION and CONFIGURATION |
30 | 2.1 CS8900-based Adapter Configuration | 30 | 2.1 CS8900-based Adapter Configuration |
31 | 2.2 CS8920-based Adapter Configuration | 31 | 2.2 CS8920-based Adapter Configuration |
32 | 32 | ||
33 | 3.0 LOADING THE DRIVER AS A MODULE | 33 | 3.0 LOADING THE DRIVER AS A MODULE |
34 | 34 | ||
35 | 4.0 COMPILING THE DRIVER | 35 | 4.0 COMPILING THE DRIVER |
36 | 4.1 Compiling the Driver as a Loadable Module | 36 | 4.1 Compiling the Driver as a Loadable Module |
37 | 4.2 Compiling the driver to support memory mode | 37 | 4.2 Compiling the driver to support memory mode |
38 | 4.3 Compiling the driver to support Rx DMA | 38 | 4.3 Compiling the driver to support Rx DMA |
39 | 4.4 Compiling the Driver into the Kernel | 39 | 4.4 Compiling the Driver into the Kernel |
40 | 40 | ||
41 | 5.0 TESTING AND TROUBLESHOOTING | 41 | 5.0 TESTING AND TROUBLESHOOTING |
42 | 5.1 Known Defects and Limitations | 42 | 5.1 Known Defects and Limitations |
43 | 5.2 Testing the Adapter | 43 | 5.2 Testing the Adapter |
44 | 5.2.1 Diagnostic Self-Test | 44 | 5.2.1 Diagnostic Self-Test |
45 | 5.2.2 Diagnostic Network Test | 45 | 5.2.2 Diagnostic Network Test |
46 | 5.3 Using the Adapter's LEDs | 46 | 5.3 Using the Adapter's LEDs |
47 | 5.4 Resolving I/O Conflicts | 47 | 5.4 Resolving I/O Conflicts |
48 | 48 | ||
49 | 6.0 TECHNICAL SUPPORT | 49 | 6.0 TECHNICAL SUPPORT |
50 | 6.1 Contacting Cirrus Logic's Technical Support | 50 | 6.1 Contacting Cirrus Logic's Technical Support |
51 | 6.2 Information Required Before Contacting Technical Support | 51 | 6.2 Information Required Before Contacting Technical Support |
52 | 6.3 Obtaining the Latest Driver Version | 52 | 6.3 Obtaining the Latest Driver Version |
53 | 6.4 Current maintainer | 53 | 6.4 Current maintainer |
54 | 6.5 Kernel boot parameters | 54 | 6.5 Kernel boot parameters |
55 | 55 | ||
56 | 56 | ||
57 | 1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS | 57 | 1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS |
58 | =============================================================================== | 58 | =============================================================================== |
59 | 59 | ||
60 | 60 | ||
61 | 1.1 PRODUCT OVERVIEW | 61 | 1.1 PRODUCT OVERVIEW |
62 | 62 | ||
63 | The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow | 63 | The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow |
64 | IEEE 802.3 standards and support half or full-duplex operation in ISA bus | 64 | IEEE 802.3 standards and support half or full-duplex operation in ISA bus |
65 | computers on 10 Mbps Ethernet networks. The adapters are designed for operation | 65 | computers on 10 Mbps Ethernet networks. The adapters are designed for operation |
66 | in 16-bit ISA or EISA bus expansion slots and are available in | 66 | in 16-bit ISA or EISA bus expansion slots and are available in |
67 | 10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5 | 67 | 10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5 |
68 | or fiber networks). | 68 | or fiber networks). |
69 | 69 | ||
70 | CS8920-based adapters are similar to the CS8900-based adapter with additional | 70 | CS8920-based adapters are similar to the CS8900-based adapter with additional |
71 | features for Plug and Play (PnP) support and Wakeup Frame recognition. As | 71 | features for Plug and Play (PnP) support and Wakeup Frame recognition. As |
72 | such, the configuration procedures differ somewhat between the two types of | 72 | such, the configuration procedures differ somewhat between the two types of |
73 | adapters. Refer to the "Adapter Configuration" section for details on | 73 | adapters. Refer to the "Adapter Configuration" section for details on |
74 | configuring both types of adapters. | 74 | configuring both types of adapters. |
75 | 75 | ||
76 | 76 | ||
77 | 1.2 DRIVER DESCRIPTION | 77 | 1.2 DRIVER DESCRIPTION |
78 | 78 | ||
79 | The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux | 79 | The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux |
80 | v2.3.48 or greater kernel. It can be compiled directly into the kernel | 80 | v2.3.48 or greater kernel. It can be compiled directly into the kernel |
81 | or loaded at run-time as a device driver module. | 81 | or loaded at run-time as a device driver module. |
82 | 82 | ||
83 | 1.2.1 Driver Name: cs89x0 | 83 | 1.2.1 Driver Name: cs89x0 |
84 | 84 | ||
85 | 1.2.2 Files in the Driver Archive: | 85 | 1.2.2 Files in the Driver Archive: |
86 | 86 | ||
87 | The files in the driver at Cirrus' website include: | 87 | The files in the driver at Cirrus' website include: |
88 | 88 | ||
89 | readme.txt - this file | 89 | readme.txt - this file |
90 | build - batch file to compile cs89x0.c. | 90 | build - batch file to compile cs89x0.c. |
91 | cs89x0.c - driver C code | 91 | cs89x0.c - driver C code |
92 | cs89x0.h - driver header file | 92 | cs89x0.h - driver header file |
93 | cs89x0.o - pre-compiled module (for v2.2.5 kernel) | 93 | cs89x0.o - pre-compiled module (for v2.2.5 kernel) |
94 | config/Config.in - sample file to include cs89x0 driver in the kernel. | 94 | config/Config.in - sample file to include cs89x0 driver in the kernel. |
95 | config/Makefile - sample file to include cs89x0 driver in the kernel. | 95 | config/Makefile - sample file to include cs89x0 driver in the kernel. |
96 | config/Space.c - sample file to include cs89x0 driver in the kernel. | 96 | config/Space.c - sample file to include cs89x0 driver in the kernel. |
97 | 97 | ||
98 | 98 | ||
99 | 99 | ||
100 | 1.3 SYSTEM REQUIREMENTS | 100 | 1.3 SYSTEM REQUIREMENTS |
101 | 101 | ||
102 | The following hardware is required: | 102 | The following hardware is required: |
103 | 103 | ||
104 | * Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter | 104 | * Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter |
105 | 105 | ||
106 | * IBM or IBM-compatible PC with: | 106 | * IBM or IBM-compatible PC with: |
107 | * An 80386 or higher processor | 107 | * An 80386 or higher processor |
108 | * 16 bytes of contiguous IO space available between 210h - 370h | 108 | * 16 bytes of contiguous IO space available between 210h - 370h |
109 | * One available IRQ (5,10,11,or 12 for the CS8900, 3-7,9-15 for CS8920). | 109 | * One available IRQ (5,10,11,or 12 for the CS8900, 3-7,9-15 for CS8920). |
110 | 110 | ||
111 | * Appropriate cable (and connector for AUI, 10BASE-2) for your network | 111 | * Appropriate cable (and connector for AUI, 10BASE-2) for your network |
112 | topology. | 112 | topology. |
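
The IRQ constraints in the hardware list above can be written down as a small check. This is only an illustrative sketch: the function name and chip labels are hypothetical, and the IRQ sets are copied from the list above rather than read from the driver.

```python
# Illustrative only: IRQ sets copied from the hardware requirements list.
CS8900_IRQS = frozenset({5, 10, 11, 12})
CS8920_IRQS = frozenset(range(3, 8)) | frozenset(range(9, 16))  # 3-7 and 9-15

# Hypothetical chip labels used as lookup keys in this sketch.
ALLOWED_IRQS = {"CS8900": CS8900_IRQS, "CS8920": CS8920_IRQS}


def irq_is_valid(chip, irq):
    """Return True if `irq` is listed as usable for the given chip."""
    if chip not in ALLOWED_IRQS:
        raise ValueError("unknown chip: %r" % (chip,))
    return irq in ALLOWED_IRQS[chip]
```

For instance, IRQ 9 passes for a CS8920 but not for a CS8900 under these sets.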
113 | 113 | ||
114 | The following software is required: | 114 | The following software is required: |
115 | 115 | ||
116 | * LINUX kernel version 2.3.48 or higher | 116 | * LINUX kernel version 2.3.48 or higher |
117 | 117 | ||
118 | * CS8900/20 Setup Utility (DOS-based) | 118 | * CS8900/20 Setup Utility (DOS-based) |
119 | 119 | ||
120 | * LINUX kernel sources for your kernel (if compiling into kernel) | 120 | * LINUX kernel sources for your kernel (if compiling into kernel) |
121 | 121 | ||
122 | * GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel | 122 | * GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel |
123 | or a module) | 123 | or a module) |
124 | 124 | ||
125 | 125 | ||
126 | 126 | ||
127 | 1.4 LICENSING INFORMATION | 127 | 1.4 LICENSING INFORMATION |
128 | 128 | ||
129 | This program is free software; you can redistribute it and/or modify it under | 129 | This program is free software; you can redistribute it and/or modify it under |
130 | the terms of the GNU General Public License as published by the Free Software | 130 | the terms of the GNU General Public License as published by the Free Software |
131 | Foundation, version 1. | 131 | Foundation, version 1. |
132 | 132 | ||
133 | This program is distributed in the hope that it will be useful, but WITHOUT | 133 | This program is distributed in the hope that it will be useful, but WITHOUT |
134 | ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | 134 | ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
135 | FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | 135 | FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
136 | more details. | 136 | more details. |
137 | 137 | ||
138 | For a full copy of the GNU General Public License, write to the Free Software | 138 | For a full copy of the GNU General Public License, write to the Free Software |
139 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | 139 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |
140 | 140 | ||
141 | 141 | ||
142 | 142 | ||
143 | 2.0 ADAPTER INSTALLATION and CONFIGURATION | 143 | 2.0 ADAPTER INSTALLATION and CONFIGURATION |
144 | =============================================================================== | 144 | =============================================================================== |
145 | 145 | ||
146 | Both the CS8900 and CS8920-based adapters can be configured using parameters | 146 | Both the CS8900 and CS8920-based adapters can be configured using parameters |
147 | stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup | 147 | stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup |
148 | Utility if you want to change the adapter's configuration in EEPROM. | 148 | Utility if you want to change the adapter's configuration in EEPROM. |
149 | 149 | ||
150 | When loading the driver as a module, you can specify many of the adapter's | 150 | When loading the driver as a module, you can specify many of the adapter's |
151 | configuration parameters on the command-line to override the EEPROM's settings | 151 | configuration parameters on the command-line to override the EEPROM's settings |
152 | or for interface configuration when an EEPROM is not used. (CS8920-based | 152 | or for interface configuration when an EEPROM is not used. (CS8920-based |
153 | adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE. | 153 | adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE. |
154 | 154 | ||
155 | Since the CS8900/20 Setup Utility is a DOS-based application, you must install | 155 | Since the CS8900/20 Setup Utility is a DOS-based application, you must install |
156 | and configure the adapter in a DOS-based system using the CS8900/20 Setup | 156 | and configure the adapter in a DOS-based system using the CS8900/20 Setup |
157 | Utility before installation in the target LINUX system. (Not required if | 157 | Utility before installation in the target LINUX system. (Not required if |
158 | installing a CS8900-based adapter and the default configuration is acceptable.) | 158 | installing a CS8900-based adapter and the default configuration is acceptable.) |
159 | 159 | ||
160 | 160 | ||
161 | 2.1 CS8900-BASED ADAPTER CONFIGURATION | 161 | 2.1 CS8900-BASED ADAPTER CONFIGURATION |
162 | 162 | ||
163 | CS8900-based adapters shipped from Cirrus Logic have been configured | 163 | CS8900-based adapters shipped from Cirrus Logic have been configured |
164 | with the following "default" settings: | 164 | with the following "default" settings: |
165 | 165 | ||
166 | Operation Mode: Memory Mode | 166 | Operation Mode: Memory Mode |
167 | IRQ: 10 | 167 | IRQ: 10 |
168 | Base I/O Address: 300 | 168 | Base I/O Address: 300 |
169 | Memory Base Address: D0000 | 169 | Memory Base Address: D0000 |
170 | Optimization: DOS Client | 170 | Optimization: DOS Client |
171 | Transmission Mode: Half-duplex | 171 | Transmission Mode: Half-duplex |
172 | BootProm: None | 172 | BootProm: None |
173 | Media Type: Autodetect (3-media cards) or | 173 | Media Type: Autodetect (3-media cards) or |
174 | 10BASE-T (10BASE-T only adapter) | 174 | 10BASE-T (10BASE-T only adapter) |
175 | 175 | ||
176 | You should only change the default configuration settings if a conflict with | 176 | You should only change the default configuration settings if a conflict with |
177 | another adapter exists. To change the adapter's configuration, run the | 177 | another adapter exists. To change the adapter's configuration, run the |
178 | CS8900/20 Setup Utility. | 178 | CS8900/20 Setup Utility. |
179 | 179 | ||
180 | 180 | ||
181 | 2.2 CS8920-BASED ADAPTER CONFIGURATION | 181 | 2.2 CS8920-BASED ADAPTER CONFIGURATION |
182 | 182 | ||
183 | CS8920-based adapters are shipped from Cirrus Logic configured as Plug | 183 | CS8920-based adapters are shipped from Cirrus Logic configured as Plug |
184 | and Play (PnP) enabled. However, since the cs89x0 driver does NOT | 184 | and Play (PnP) enabled. However, since the cs89x0 driver does NOT |
185 | support PnP, you must install the CS8920 adapter in a DOS-based PC and | 185 | support PnP, you must install the CS8920 adapter in a DOS-based PC and |
186 | run the CS8900/20 Setup Utility to disable PnP and configure the | 186 | run the CS8900/20 Setup Utility to disable PnP and configure the |
187 | adapter before installation in the target Linux system. Failure to do | 187 | adapter before installation in the target Linux system. Failure to do |
188 | this will leave the adapter inactive and the driver will be unable to | 188 | this will leave the adapter inactive and the driver will be unable to |
189 | communicate with the adapter. | 189 | communicate with the adapter. |
190 | 190 | ||
191 | 191 | ||
192 | **************************************************************** | 192 | **************************************************************** |
193 | * CS8920-BASED ADAPTERS: * | 193 | * CS8920-BASED ADAPTERS: * |
194 | * * | 194 | * * |
195 | * CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. * | 195 | * CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. * |
196 | * THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST * | 196 | * THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST * |
197 | * RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND * | 197 | * RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND * |
198 | * TO ACTIVATE THE ADAPTER. * | 198 | * TO ACTIVATE THE ADAPTER. * |
199 | **************************************************************** | 199 | **************************************************************** |
200 | 200 | ||
201 | 201 | ||
202 | 202 | ||
203 | 203 | ||
204 | 3.0 LOADING THE DRIVER AS A MODULE | 204 | 3.0 LOADING THE DRIVER AS A MODULE |
205 | =============================================================================== | 205 | =============================================================================== |
206 | 206 | ||
207 | If the driver is compiled as a loadable module, you can load the driver module | 207 | If the driver is compiled as a loadable module, you can load the driver module |
208 | with the 'modprobe' command. Many of the adapter's configuration parameters can | 208 | with the 'modprobe' command. Many of the adapter's configuration parameters can |
209 | be specified as command-line arguments to the load command. This facility | 209 | be specified as command-line arguments to the load command. This facility |
210 | provides a means to override the EEPROM's settings or for interface | 210 | provides a means to override the EEPROM's settings or for interface |
211 | configuration when an EEPROM is not used. | 211 | configuration when an EEPROM is not used. |
212 | 212 | ||
213 | Example: | 213 | Example: |
214 | 214 | ||
215 | insmod cs89x0.o io=0x200 irq=0xA media=aui | 215 | insmod cs89x0.o io=0x200 irq=0xA media=aui |
216 | 216 | ||
217 | This example loads the module and configures the adapter to use an IO port base | 217 | This example loads the module and configures the adapter to use an IO port base |
218 | address of 200h, interrupt 10, and use the AUI media connection. The following | 218 | address of 200h, interrupt 10, and use the AUI media connection. The following |
219 | configuration options are available on the command line: | 219 | configuration options are available on the command line: |
220 | 220 | ||
221 | * io=### - specify IO address (200h-360h) | 221 | * io=### - specify IO address (200h-360h) |
222 | * irq=## - specify interrupt level | 222 | * irq=## - specify interrupt level |
223 | * use_dma=1 - Enable DMA | 223 | * use_dma=1 - Enable DMA |
224 | * dma=# - specify dma channel (Driver is compiled to support | 224 | * dma=# - specify dma channel (Driver is compiled to support |
225 | Rx DMA only) | 225 | Rx DMA only) |
226 | * dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16. | 226 | * dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16. |
227 | * media=rj45 - specify media type | 227 | * media=rj45 - specify media type |
228 | or media=bnc | 228 | or media=bnc |
229 | or media=aui | 229 | or media=aui |
230 | or media=auto | 230 | or media=auto |
231 | * duplex=full - specify forced half/full/autonegotiate duplex | 231 | * duplex=full - specify forced half/full/autonegotiate duplex |
232 | or duplex=half | 232 | or duplex=half |
233 | or duplex=auto | 233 | or duplex=auto |
234 | * debug=# - debug level (only available if the driver was compiled | 234 | * debug=# - debug level (only available if the driver was compiled |
235 | for debugging) | 235 | for debugging) |
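
As a rough illustration of how the options listed above combine, the following sketch assembles an option string for the module load command. The helper name is hypothetical. It emits irq in decimal (the example above writes 0xA) and assumes either spelling is accepted. It also adds use_dma=1 whenever a DMA channel is given, which is a choice made for this sketch and not a documented requirement.

```python
# Accepted values copied from the option list above.
MEDIA = ("rj45", "bnc", "aui", "auto")
DUPLEX = ("full", "half", "auto")
DMASIZE = (16, 64)


def cs89x0_options(io, irq=None, dma=None, dmasize=None,
                   media=None, duplex=None):
    """Build a module option string such as 'io=0x200 irq=10 media=aui'."""
    if media is not None and media not in MEDIA:
        raise ValueError("media must be one of %s" % (MEDIA,))
    if duplex is not None and duplex not in DUPLEX:
        raise ValueError("duplex must be one of %s" % (DUPLEX,))
    if dmasize is not None and dmasize not in DMASIZE:
        raise ValueError("dmasize must be 16 or 64")

    parts = ["io=%#x" % io]          # io is always emitted (see note b)
    if irq is not None:
        parts.append("irq=%d" % irq)
    if dma is not None:
        parts.append("use_dma=1")    # sketch assumption: enable DMA with dma=
        parts.append("dma=%d" % dma)
    if dmasize is not None:
        parts.append("dmasize=%d" % dmasize)
    if media is not None:
        parts.append("media=%s" % media)
    if duplex is not None:
        parts.append("duplex=%s" % duplex)
    return " ".join(parts)
```

Calling cs89x0_options(0x200, irq=10, media="aui") yields the same settings as the insmod example above, with the IRQ in decimal.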
236 | 236 | ||
237 | NOTES: | 237 | NOTES: |
238 | 238 | ||
239 | a) If an EEPROM is present, any specified command-line parameter | 239 | a) If an EEPROM is present, any specified command-line parameter |
240 | will override the corresponding configuration value stored in | 240 | will override the corresponding configuration value stored in |
241 | EEPROM. | 241 | EEPROM. |
242 | 242 | ||
243 | b) The "io" parameter must be specified on the command-line. | 243 | b) The "io" parameter must be specified on the command-line. |
244 | 244 | ||
245 | c) The driver's hardware probe routine is designed to avoid | 245 | c) The driver's hardware probe routine is designed to avoid |
246 | writing to I/O space until it knows that there is a cs89x0 | 246 | writing to I/O space until it knows that there is a cs89x0 |
247 | card at the written addresses. This could cause problems | 247 | card at the written addresses. This could cause problems |
248 | with device probing. To avoid this behaviour, add one | 248 | with device probing. To avoid this behaviour, add one |
249 | to the `io=' module parameter. This doesn't actually change | 249 | to the `io=' module parameter. This doesn't actually change |
250 | the I/O address, but it is a flag to tell the driver | 250 | the I/O address, but it is a flag to tell the driver |
251 | to partially initialise the hardware before trying to | 251 | to partially initialise the hardware before trying to |
252 | identify the card. This could be dangerous if you are | 252 | identify the card. This could be dangerous if you are |
253 | not sure that there is a cs89x0 card at the provided address. | 253 | not sure that there is a cs89x0 card at the provided address. |
254 | 254 | ||
255 | For example, to scan for an adapter located at IO base 0x300, | 255 | For example, to scan for an adapter located at IO base 0x300, |
256 | specify an IO address of 0x301. | 256 | specify an IO address of 0x301. |
257 | 257 | ||
258 | d) The "duplex=auto" parameter is only supported for the CS8920. | 258 | d) The "duplex=auto" parameter is only supported for the CS8920. |
259 | 259 | ||
260 | e) The minimum command-line configuration required if an EEPROM is | 260 | e) The minimum command-line configuration required if an EEPROM is |
261 | not present is: | 261 | not present is: |
262 | 262 | ||
263 | io | 263 | io |
264 | irq | 264 | irq |
265 | media type (no autodetect) | 265 | media type (no autodetect) |
266 | 266 | ||
267 | f) The following additional parameters are CS89XX defaults (values | 267 | f) The following additional parameters are CS89XX defaults (values |
268 | used with no EEPROM or command-line argument). | 268 | used with no EEPROM or command-line argument). |
269 | 269 | ||
270 | * DMA Burst = enabled | 270 | * DMA Burst = enabled |
271 | * IOCHRDY Enabled = enabled | 271 | * IOCHRDY Enabled = enabled |
272 | * UseSA = enabled | 272 | * UseSA = enabled |
273 | * CS8900 defaults to half-duplex if not specified on command-line | 273 | * CS8900 defaults to half-duplex if not specified on command-line |
274 | * CS8920 defaults to autoneg if not specified on command-line | 274 | * CS8920 defaults to autoneg if not specified on command-line |
275 | * Use reset defaults for other config parameters | 275 | * Use reset defaults for other config parameters |
276 | * dma_mode = 0 | 276 | * dma_mode = 0 |
277 | 277 | ||
278 | g) You can use ifconfig to set the adapter's Ethernet address. | 278 | g) You can use ifconfig to set the adapter's Ethernet address. |
279 | 279 | ||
280 | h) Many Linux distributions use the 'modprobe' command to load | 280 | h) Many Linux distributions use the 'modprobe' command to load |
281 | modules. This program uses the '/etc/conf.modules' file to | 281 | modules. This program uses the '/etc/conf.modules' file to |
282 | determine configuration information which is passed to a driver | 282 | determine configuration information which is passed to a driver |
283 | module when it is loaded. All the configuration options which are | 283 | module when it is loaded. All the configuration options which are |
284 | described above may be placed within /etc/conf.modules. | 284 | described above may be placed within /etc/conf.modules. |
285 | 285 | ||
286 | For example: | 286 | For example: |
287 | 287 | ||
288 | > cat /etc/conf.modules | 288 | > cat /etc/conf.modules |
289 | ... | 289 | ... |
290 | alias eth0 cs89x0 | 290 | alias eth0 cs89x0 |
291 | options cs89x0 io=0x0200 dma=5 use_dma=1 | 291 | options cs89x0 io=0x0200 dma=5 use_dma=1 |
292 | ... | 292 | ... |
293 | 293 | ||
294 | In this example we are telling the module system that the | 294 | In this example we are telling the module system that the |
295 | ethernet driver for this machine should use the cs89x0 driver. We | 295 | ethernet driver for this machine should use the cs89x0 driver. We |
296 | are asking 'modprobe' to pass the 'io', 'dma' and 'use_dma' | 296 | are asking 'modprobe' to pass the 'io', 'dma' and 'use_dma' |
297 | arguments to the driver when it is loaded. | 297 | arguments to the driver when it is loaded. |
298 | 298 | ||
299 | i) Cirrus recommend that the cs89x0 use the ISA DMA channels 5, 6 or | 299 | i) Cirrus recommend that the cs89x0 use the ISA DMA channels 5, 6 or |
300 | 7. You will probably find that other DMA channels will not work. | 300 | 7. You will probably find that other DMA channels will not work. |
301 | 301 | ||
302 | j) The cs89x0 supports DMA for receiving only. DMA mode is | 302 | j) The cs89x0 supports DMA for receiving only. DMA mode is |
303 | significantly more efficient. Flooding a 400 MHz Celeron machine | 303 | significantly more efficient. Flooding a 400 MHz Celeron machine |
304 | with large ping packets consumes 82% of its CPU capacity in non-DMA | 304 | with large ping packets consumes 82% of its CPU capacity in non-DMA |
305 | mode. With DMA this is reduced to 45%. | 305 | mode. With DMA this is reduced to 45%. |
306 | 306 | ||
307 | k) If your Linux kernel was compiled with inbuilt plug-and-play | 307 | k) If your Linux kernel was compiled with inbuilt plug-and-play |
308 | support you will be able to find information about the cs89x0 card | 308 | support you will be able to find information about the cs89x0 card |
309 | with the command | 309 | with the command |
310 | 310 | ||
311 | cat /proc/isapnp | 311 | cat /proc/isapnp |
312 | 312 | ||
313 | l) If during DMA operation you find erratic behavior or network data | 313 | l) If during DMA operation you find erratic behavior or network data |
314 | corruption you should use your PC's BIOS to slow the EISA bus clock. | 314 | corruption you should use your PC's BIOS to slow the EISA bus clock. |
315 | 315 | ||
316 | m) If the cs89x0 driver is compiled directly into the kernel | 316 | m) If the cs89x0 driver is compiled directly into the kernel |
317 | (non-modular) then its I/O address is automatically determined by | 317 | (non-modular) then its I/O address is automatically determined by |
318 | ISA bus probing. The IRQ number, media options, etc are determined | 318 | ISA bus probing. The IRQ number, media options, etc are determined |
319 | from the card's EEPROM. | 319 | from the card's EEPROM. |
320 | 320 | ||
321 | n) If the cs89x0 driver is compiled directly into the kernel, DMA | 321 | n) If the cs89x0 driver is compiled directly into the kernel, DMA |
322 | mode may be selected by providing the kernel with a boot option | 322 | mode may be selected by providing the kernel with a boot option |
323 | 'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7). | 323 | 'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7). |
324 | 324 | ||
325 | Kernel boot options may be provided on the LILO command line: | 325 | Kernel boot options may be provided on the LILO command line: |
326 | 326 | ||
327 | LILO boot: linux cs89x0_dma=5 | 327 | LILO boot: linux cs89x0_dma=5 |
328 | 328 | ||
329 | or they may be placed in /etc/lilo.conf: | 329 | or they may be placed in /etc/lilo.conf: |
330 | 330 | ||
331 | image=/boot/bzImage-2.3.48 | 331 | image=/boot/bzImage-2.3.48 |
332 | append="cs89x0_dma=5" | 332 | append="cs89x0_dma=5" |
333 | label=linux | 333 | label=linux |
334 | root=/dev/hda5 | 334 | root=/dev/hda5 |
335 | read-only | 335 | read-only |
336 | 336 | ||
337 | The DMA Rx buffer size is hardwired to 16 kbytes in this mode. | 337 | The DMA Rx buffer size is hardwired to 16 kbytes in this mode. |
338 | (64k mode is not available). | 338 | (64k mode is not available). |
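
Note c) above describes the low bit of the `io=' value as a flag and not as part of the address. A minimal sketch of that reading follows. The function name is hypothetical, and the driver's actual masking may differ from what is shown here.

```python
def split_io_param(io):
    """Split an io= value into (base_address, early_init_flag).

    Models note c) only: an odd value requests partial hardware
    initialisation before identification; the base address is the
    value with that low bit cleared.
    """
    early_init = bool(io & 1)
    base = io & ~1
    return base, early_init
```

With this reading, io=0x301 probes the adapter at base 0x300 with the flag set, while io=0x300 probes the same base without it.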
339 | 339 | ||
340 | 340 | ||
341 | 4.0 COMPILING THE DRIVER | 341 | 4.0 COMPILING THE DRIVER |
342 | =============================================================================== | 342 | =============================================================================== |
343 | 343 | ||
344 | The cs89x0 driver can be compiled directly into the kernel or compiled into | 344 | The cs89x0 driver can be compiled directly into the kernel or compiled into |
345 | a loadable device driver module. | 345 | a loadable device driver module. |
346 | 346 | ||
347 | 347 | ||
348 | 4.1 COMPILING THE DRIVER AS A LOADABLE MODULE | 348 | 4.1 COMPILING THE DRIVER AS A LOADABLE MODULE |
349 | 349 | ||
350 | To compile the driver into a loadable module, use the following command | 350 | To compile the driver into a loadable module, use the following command |
351 | (single command line, without quotes): | 351 | (single command line, without quotes): |
352 | 352 | ||
353 | "gcc -D__KERNEL__ -I/usr/src/linux/include -I/usr/src/linux/net/inet -Wall | 353 | "gcc -D__KERNEL__ -I/usr/src/linux/include -I/usr/src/linux/net/inet -Wall |
354 | -Wstrict-prototypes -O2 -fomit-frame-pointer -DMODULE -DCONFIG_MODVERSIONS | 354 | -Wstrict-prototypes -O2 -fomit-frame-pointer -DMODULE -DCONFIG_MODVERSIONS |
355 | -c cs89x0.c" | 355 | -c cs89x0.c" |
356 | 356 | ||
357 | 4.2 COMPILING THE DRIVER TO SUPPORT MEMORY MODE | 357 | 4.2 COMPILING THE DRIVER TO SUPPORT MEMORY MODE |
358 | 358 | ||
359 | Support for memory mode was not carried over into the 2.3 series kernels. | 359 | Support for memory mode was not carried over into the 2.3 series kernels. |
360 | 360 | ||
361 | 4.3 COMPILING THE DRIVER TO SUPPORT Rx DMA | 361 | 4.3 COMPILING THE DRIVER TO SUPPORT Rx DMA |
362 | 362 | ||
363 | The compile-time optionality for DMA was removed in the 2.3 kernel | 363 | The compile-time optionality for DMA was removed in the 2.3 kernel |
364 | series. DMA support is now unconditionally part of the driver. It is | 364 | series. DMA support is now unconditionally part of the driver. It is |
365 | enabled by the 'use_dma=1' module option. | 365 | enabled by the 'use_dma=1' module option. |
366 | 366 | ||
367 | 4.4 COMPILING THE DRIVER INTO THE KERNEL | 367 | 4.4 COMPILING THE DRIVER INTO THE KERNEL |
368 | 368 | ||
369 | If your Linux distribution already has support for the cs89x0 driver | 369 | If your Linux distribution already has support for the cs89x0 driver |
370 | then simply copy the source files to the /usr/src/linux/drivers/net | 370 | then simply copy the source files to the /usr/src/linux/drivers/net |
371 | directory to replace the original ones and run the make utility to | 371 | directory to replace the original ones and run the make utility to |
372 | rebuild the kernel. See Step 3 for rebuilding the kernel. | 372 | rebuild the kernel. See Step 3 for rebuilding the kernel. |
373 | 373 | ||
374 | If your Linux does not include the cs89x0 driver, you need to edit three | 374 | If your Linux does not include the cs89x0 driver, you need to edit three |
375 | configuration files, copy the source file to the /usr/src/linux/drivers/net | 375 | configuration files, copy the source file to the /usr/src/linux/drivers/net |
376 | directory, and then run the make utility to rebuild the kernel. | 376 | directory, and then run the make utility to rebuild the kernel. |
377 | 377 | ||
378 | 1. Edit the following configuration files by adding the statements as | 378 | 1. Edit the following configuration files by adding the statements as |
379 | indicated. (When possible, try to locate the added text to the section of the | 379 | indicated. (When possible, try to locate the added text to the section of the |
380 | file containing similar statements). | 380 | file containing similar statements). |
381 | 381 | ||
382 | 382 | ||
383 | a.) In /usr/src/linux/drivers/net/Config.in, add: | 383 | a.) In /usr/src/linux/drivers/net/Config.in, add: |
384 | 384 | ||
385 | tristate 'CS89x0 support' CONFIG_CS89x0 | 385 | tristate 'CS89x0 support' CONFIG_CS89x0 |
386 | 386 | ||
387 | Example: | 387 | Example: |
388 | 388 | ||
389 | if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then | 389 | if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then |
390 | tristate 'ICL EtherTeam 16i/32 support' CONFIG_ETH16I | 390 | tristate 'ICL EtherTeam 16i/32 support' CONFIG_ETH16I |
391 | fi | 391 | fi |
392 | 392 | ||
393 | tristate 'CS89x0 support' CONFIG_CS89x0 | 393 | tristate 'CS89x0 support' CONFIG_CS89x0 |
394 | 394 | ||
395 | tristate 'NE2000/NE1000 support' CONFIG_NE2000 | 395 | tristate 'NE2000/NE1000 support' CONFIG_NE2000 |
396 | if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then | 396 | if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then |
397 | tristate 'NI5210 support' CONFIG_NI52 | 397 | tristate 'NI5210 support' CONFIG_NI52 |
398 | 398 | ||
399 | 399 | ||
400 | b.) In /usr/src/linux/drivers/net/Makefile, add the following lines: | 400 | b.) In /usr/src/linux/drivers/net/Makefile, add the following lines: |
401 | 401 | ||
402 | ifeq ($(CONFIG_CS89x0),y) | 402 | ifeq ($(CONFIG_CS89x0),y) |
403 | L_OBJS += cs89x0.o | 403 | L_OBJS += cs89x0.o |
404 | else | 404 | else |
405 | ifeq ($(CONFIG_CS89x0),m) | 405 | ifeq ($(CONFIG_CS89x0),m) |
406 | M_OBJS += cs89x0.o | 406 | M_OBJS += cs89x0.o |
407 | endif | 407 | endif |
408 | endif | 408 | endif |
409 | 409 | ||
410 | 410 | ||
411 | c.) In /linux/drivers/net/Space.c file, add the line: | 411 | c.) In /linux/drivers/net/Space.c file, add the line: |
412 | 412 | ||
413 | extern int cs89x0_probe(struct device *dev); | 413 | extern int cs89x0_probe(struct device *dev); |
414 | 414 | ||
415 | 415 | ||
416 | Example: | 416 | Example: |
417 | 417 | ||
418 | extern int ultra_probe(struct device *dev); | 418 | extern int ultra_probe(struct device *dev); |
419 | extern int wd_probe(struct device *dev); | 419 | extern int wd_probe(struct device *dev); |
420 | extern int el2_probe(struct device *dev); | 420 | extern int el2_probe(struct device *dev); |
421 | 421 | ||
422 | extern int cs89x0_probe(struct device *dev); | 422 | extern int cs89x0_probe(struct device *dev); |
423 | 423 | ||
424 | extern int ne_probe(struct device *dev); | 424 | extern int ne_probe(struct device *dev); |
425 | extern int hp_probe(struct device *dev); | 425 | extern int hp_probe(struct device *dev); |
426 | extern int hp_plus_probe(struct device *dev); | 426 | extern int hp_plus_probe(struct device *dev); |
427 | 427 | ||
428 | 428 | ||
429 | Also add: | 429 | Also add: |
430 | 430 | ||
431 | #ifdef CONFIG_CS89x0 | 431 | #ifdef CONFIG_CS89x0 |
432 | { cs89x0_probe,0 }, | 432 | { cs89x0_probe,0 }, |
433 | #endif | 433 | #endif |
434 | 434 | ||
435 | 435 | ||
436 | 2.) Copy the driver source files (cs89x0.c and cs89x0.h) | 436 | 2.) Copy the driver source files (cs89x0.c and cs89x0.h) |
437 | into the /usr/src/linux/drivers/net directory. | 437 | into the /usr/src/linux/drivers/net directory. |
438 | 438 | ||
439 | 439 | ||
440 | 3.) Go to /usr/src/linux directory and run 'make config' followed by 'make' | 440 | 3.) Go to /usr/src/linux directory and run 'make config' followed by 'make' |
441 | (or make bzImage) to rebuild the kernel. | 441 | (or make bzImage) to rebuild the kernel. |
442 | 442 | ||
443 | 4.) Use the DOS 'setup' utility to disable plug and play on the NIC. | 443 | 4.) Use the DOS 'setup' utility to disable plug and play on the NIC. |
444 | 444 | ||
445 | 445 | ||
446 | 5.0 TESTING AND TROUBLESHOOTING | 446 | 5.0 TESTING AND TROUBLESHOOTING |
447 | =============================================================================== | 447 | =============================================================================== |
448 | 448 | ||
449 | 5.1 KNOWN DEFECTS and LIMITATIONS | 449 | 5.1 KNOWN DEFECTS and LIMITATIONS |
450 | 450 | ||
451 | Refer to the RELEASE.TXT file distributed as part of this archive for a list of | 451 | Refer to the RELEASE.TXT file distributed as part of this archive for a list of |
452 | known defects, driver limitations, and workarounds. | 452 | known defects, driver limitations, and workarounds. |
453 | 453 | ||
454 | 454 | ||
455 | 5.2 TESTING THE ADAPTER | 455 | 5.2 TESTING THE ADAPTER |
456 | 456 | ||
457 | Once the adapter has been installed and configured, the diagnostic option of | 457 | Once the adapter has been installed and configured, the diagnostic option of |
458 | the CS8900/20 Setup Utility can be used to test the functionality of the | 458 | the CS8900/20 Setup Utility can be used to test the functionality of the |
459 | adapter and its network connection. Use the diagnostics 'Self Test' option to | 459 | adapter and its network connection. Use the diagnostics 'Self Test' option to |
460 | test the functionality of the adapter with the hardware configuration you have | 460 | test the functionality of the adapter with the hardware configuration you have |
461 | assigned. You can use the diagnostics 'Network Test' to test the ability of the | 461 | assigned. You can use the diagnostics 'Network Test' to test the ability of the |
462 | adapter to communicate across the Ethernet with another PC equipped with a | 462 | adapter to communicate across the Ethernet with another PC equipped with a |
463 | CS8900/20-based adapter card (it must also be running the CS8900/20 Setup | 463 | CS8900/20-based adapter card (it must also be running the CS8900/20 Setup |
464 | Utility). | 464 | Utility). |
465 | 465 | ||
466 | NOTE: The Setup Utility's diagnostics are designed to run in a | 466 | NOTE: The Setup Utility's diagnostics are designed to run in a |
467 | DOS-only operating system environment. DO NOT run the diagnostics | 467 | DOS-only operating system environment. DO NOT run the diagnostics |
468 | from a DOS or command prompt session under Windows 95, Windows NT, | 468 | from a DOS or command prompt session under Windows 95, Windows NT, |
469 | OS/2, or other operating system. | 469 | OS/2, or other operating system. |
470 | 470 | ||
471 | To run the diagnostics tests on the CS8900/20 adapter: | 471 | To run the diagnostics tests on the CS8900/20 adapter: |
472 | 472 | ||
473 | 1.) Boot DOS on the PC and start the CS8900/20 Setup Utility. | 473 | 1.) Boot DOS on the PC and start the CS8900/20 Setup Utility. |
474 | 474 | ||
475 | 2.) The adapter's current configuration is displayed. Hit the ENTER key to | 475 | 2.) The adapter's current configuration is displayed. Hit the ENTER key to |
476 | get to the main menu. | 476 | get to the main menu. |
477 | 477 | ||
478 | 3.) Select 'Diagnostics' (ALT-G) from the main menu. | 478 | 3.) Select 'Diagnostics' (ALT-G) from the main menu. |
479 | * Select 'Self-Test' to test the adapter's basic functionality. | 479 | * Select 'Self-Test' to test the adapter's basic functionality. |
480 | * Select 'Network Test' to test the network connection and cabling. | 480 | * Select 'Network Test' to test the network connection and cabling. |
481 | 481 | ||
482 | 482 | ||
483 | 5.2.1 DIAGNOSTIC SELF-TEST | 483 | 5.2.1 DIAGNOSTIC SELF-TEST |
484 | 484 | ||
485 | The diagnostic self-test checks the adapter's basic functionality as well as | 485 | The diagnostic self-test checks the adapter's basic functionality as well as |
486 | its ability to communicate across the ISA bus based on the system resources | 486 | its ability to communicate across the ISA bus based on the system resources |
487 | assigned during hardware configuration. The following tests are performed: | 487 | assigned during hardware configuration. The following tests are performed: |
488 | 488 | ||
489 | * IO Register Read/Write Test | 489 | * IO Register Read/Write Test |
490 | The IO Register Read/Write test ensures that the CS8900/20 can be | 490 | The IO Register Read/Write test ensures that the CS8900/20 can be |
491 | accessed in IO mode, and that the IO base address is correct. | 491 | accessed in IO mode, and that the IO base address is correct. |
492 | 492 | ||
493 | * Shared Memory Test | 493 | * Shared Memory Test |
494 | The Shared Memory test ensures the CS8900/20 can be accessed in memory | 494 | The Shared Memory test ensures the CS8900/20 can be accessed in memory |
495 | mode and that the range of memory addresses assigned does not conflict | 495 | mode and that the range of memory addresses assigned does not conflict |
496 | with other devices in the system. | 496 | with other devices in the system. |
497 | 497 | ||
498 | * Interrupt Test | 498 | * Interrupt Test |
499 | The Interrupt test ensures there are no conflicts with the assigned IRQ | 499 | The Interrupt test ensures there are no conflicts with the assigned IRQ |
500 | signal. | 500 | signal. |
501 | 501 | ||
502 | * EEPROM Test | 502 | * EEPROM Test |
503 | The EEPROM test ensures the EEPROM can be read. | 503 | The EEPROM test ensures the EEPROM can be read. |
504 | 504 | ||
505 | * Chip RAM Test | 505 | * Chip RAM Test |
506 | The Chip RAM test ensures the 4K of memory internal to the CS8900/20 is | 506 | The Chip RAM test ensures the 4K of memory internal to the CS8900/20 is |
507 | working properly. | 507 | working properly. |
508 | 508 | ||
509 | * Internal Loop-back Test | 509 | * Internal Loop-back Test |
510 | The Internal Loop-back test ensures the adapter's transmitter and | 510 | The Internal Loop-back test ensures the adapter's transmitter and |
511 | receiver are operating properly. If this test fails, make sure the | 511 | receiver are operating properly. If this test fails, make sure the |
512 | adapter's cable is connected to the network (check for LED activity for | 512 | adapter's cable is connected to the network (check for LED activity for |
513 | example). | 513 | example). |
514 | 514 | ||
515 | * Boot PROM Test | 515 | * Boot PROM Test |
516 | The Boot PROM test ensures the Boot PROM is present and can be read. | 516 | The Boot PROM test ensures the Boot PROM is present and can be read. |
517 | Failure indicates the Boot PROM was not successfully read due to a | 517 | Failure indicates the Boot PROM was not successfully read due to a |
518 | hardware problem or due to a conflict on the Boot PROM address | 518 | hardware problem or due to a conflict on the Boot PROM address |
519 | assignment. (Test only applies if the adapter is configured to use the | 519 | assignment. (Test only applies if the adapter is configured to use the |
520 | Boot PROM option.) | 520 | Boot PROM option.) |
521 | 521 | ||
522 | Failure of a test item indicates a possible system resource conflict with | 522 | Failure of a test item indicates a possible system resource conflict with |
523 | another device on the ISA bus. In this case, you should use the Manual Setup | 523 | another device on the ISA bus. In this case, you should use the Manual Setup |
524 | option to reconfigure the adapter by selecting a different value for the system | 524 | option to reconfigure the adapter by selecting a different value for the system |
525 | resource that failed. | 525 | resource that failed. |
526 | 526 | ||
527 | 527 | ||
528 | 5.2.2 DIAGNOSTIC NETWORK TEST | 528 | 5.2.2 DIAGNOSTIC NETWORK TEST |
529 | 529 | ||
530 | The Diagnostic Network Test verifies a working network connection by | 530 | The Diagnostic Network Test verifies a working network connection by |
531 | transferring data between two CS8900/20 adapters installed in different PCs | 531 | transferring data between two CS8900/20 adapters installed in different PCs |
532 | on the same network. (Note: the diagnostic network test should not be run | 532 | on the same network. (Note: the diagnostic network test should not be run |
533 | between two nodes across a router.) | 533 | between two nodes across a router.) |
534 | 534 | ||
535 | This test requires that each of the two PCs have a CS8900/20-based adapter | 535 | This test requires that each of the two PCs have a CS8900/20-based adapter |
536 | installed and have the CS8900/20 Setup Utility running. The first PC is | 536 | installed and have the CS8900/20 Setup Utility running. The first PC is |
537 | configured as a Responder and the other PC is configured as an Initiator. | 537 | configured as a Responder and the other PC is configured as an Initiator. |
538 | Once the Initiator is started, it sends data frames to the Responder which | 538 | Once the Initiator is started, it sends data frames to the Responder which |
539 | returns the frames to the Initiator. | 539 | returns the frames to the Initiator. |
540 | 540 | ||
541 | The total number of frames received and transmitted is displayed on the | 541 | The total number of frames received and transmitted is displayed on the |
542 | Initiator's display, along with a count of the number of frames received and | 542 | Initiator's display, along with a count of the number of frames received and |
543 | transmitted OK or in error. The test can be terminated anytime by the user at | 543 | transmitted OK or in error. The test can be terminated anytime by the user at |
544 | either PC. | 544 | either PC. |
545 | 545 | ||
546 | To set up the Diagnostic Network Test: | 546 | To set up the Diagnostic Network Test: |
547 | 547 | ||
548 | 1.) Select a PC with a CS8900/20-based adapter and a known working network | 548 | 1.) Select a PC with a CS8900/20-based adapter and a known working network |
549 | connection to act as the Responder. Run the CS8900/20 Setup Utility | 549 | connection to act as the Responder. Run the CS8900/20 Setup Utility |
550 | and select 'Diagnostics -> Network Test -> Responder' from the main | 550 | and select 'Diagnostics -> Network Test -> Responder' from the main |
551 | menu. Hit ENTER to start the Responder. | 551 | menu. Hit ENTER to start the Responder. |
552 | 552 | ||
553 | 2.) Return to the PC with the CS8900/20-based adapter you want to test and | 553 | 2.) Return to the PC with the CS8900/20-based adapter you want to test and |
554 | start the CS8900/20 Setup Utility. | 554 | start the CS8900/20 Setup Utility. |
555 | 555 | ||
556 | 3.) From the main menu, select 'Diagnostics -> Network Test -> Initiator'. | 556 | 3.) From the main menu, select 'Diagnostics -> Network Test -> Initiator'. |
557 | Hit ENTER to start the test. | 557 | Hit ENTER to start the test. |
558 | 558 | ||
559 | You may stop the test on the Initiator at any time while allowing the Responder | 559 | You may stop the test on the Initiator at any time while allowing the Responder |
560 | to continue running. In this manner, you can move to additional PCs and test | 560 | to continue running. In this manner, you can move to additional PCs and test |
561 | them by starting the Initiator on another PC without having to stop/start the | 561 | them by starting the Initiator on another PC without having to stop/start the |
562 | Responder. | 562 | Responder. |
563 | 563 | ||
564 | 564 | ||
565 | 565 | ||
566 | 5.3 USING THE ADAPTER'S LEDs | 566 | 5.3 USING THE ADAPTER'S LEDs |
567 | 567 | ||
568 | The 2 and 3-media adapters have two LEDs visible on the back end of the board | 568 | The 2 and 3-media adapters have two LEDs visible on the back end of the board |
569 | located near the 10Base-T connector. | 569 | located near the 10Base-T connector. |
570 | 570 | ||
571 | Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T | 571 | Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T |
572 | connection. (Only applies to 10Base-T. The green LED has no significance for | 572 | connection. (Only applies to 10Base-T. The green LED has no significance for |
573 | a 10Base-2 or AUI connection.) | 573 | a 10Base-2 or AUI connection.) |
574 | 574 | ||
575 | TX/RX LED: The yellow LED lights briefly each time the adapter transmits or | 575 | TX/RX LED: The yellow LED lights briefly each time the adapter transmits or |
576 | receives data. (The yellow LED will appear to "flicker" on a typical network.) | 576 | receives data. (The yellow LED will appear to "flicker" on a typical network.) |
577 | 577 | ||
578 | 578 | ||
579 | 5.4 RESOLVING I/O CONFLICTS | 579 | 5.4 RESOLVING I/O CONFLICTS |
580 | 580 | ||
581 | An IO conflict occurs when two or more adapters use the same ISA resource (IO | 581 | An IO conflict occurs when two or more adapters use the same ISA resource (IO |
582 | address, memory address or IRQ). You can usually detect an IO conflict in one | 582 | address, memory address or IRQ). You can usually detect an IO conflict in one |
583 | of four ways after installing and/or configuring the CS8900/20-based adapter: | 583 | of four ways after installing and/or configuring the CS8900/20-based adapter: |
584 | 584 | ||
585 | 1.) The system does not boot properly (or at all). | 585 | 1.) The system does not boot properly (or at all). |
586 | 586 | ||
587 | 2.) The driver cannot communicate with the adapter, reporting an "Adapter | 587 | 2.) The driver cannot communicate with the adapter, reporting an "Adapter |
588 | not found" error message. | 588 | not found" error message. |
589 | 589 | ||
590 | 3.) You cannot connect to the network or the driver will not load. | 590 | 3.) You cannot connect to the network or the driver will not load. |
591 | 591 | ||
592 | 4.) If you have configured the adapter to run in memory mode but the driver | 592 | 4.) If you have configured the adapter to run in memory mode but the driver |
593 | reports it is using IO mode when loading, this is an indication of a | 593 | reports it is using IO mode when loading, this is an indication of a |
594 | memory address conflict. | 594 | memory address conflict. |
595 | 595 | ||
596 | If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a | 596 | If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a |
597 | diagnostic self-test. Normally, the ISA resource in conflict will fail the | 597 | diagnostic self-test. Normally, the ISA resource in conflict will fail the |
598 | self-test. If so, reconfigure the adapter selecting another choice for the | 598 | self-test. If so, reconfigure the adapter selecting another choice for the |
599 | resource in conflict. Run the diagnostics again to check for further IO | 599 | resource in conflict. Run the diagnostics again to check for further IO |
600 | conflicts. | 600 | conflicts. |
601 | 601 | ||
602 | In some cases, such as when the PC will not boot, it may be necessary to remove | 602 | In some cases, such as when the PC will not boot, it may be necessary to remove |
603 | the adapter and reconfigure it by installing it in another PC to run the | 603 | the adapter and reconfigure it by installing it in another PC to run the |
604 | CS8900/20 Setup Utility. Once reinstalled in the target system, run the | 604 | CS8900/20 Setup Utility. Once reinstalled in the target system, run the |
605 | diagnostics self-test to ensure the new configuration is free of conflicts | 605 | diagnostics self-test to ensure the new configuration is free of conflicts |
606 | before loading the driver again. | 606 | before loading the driver again. |
607 | 607 | ||
608 | When manually configuring the adapter, keep in mind the typical ISA system | 608 | When manually configuring the adapter, keep in mind the typical ISA system |
609 | resource usage as indicated in the tables below. | 609 | resource usage as indicated in the tables below. |
610 | 610 | ||
611 | I/O Address     Device                      IRQ   Device                  | 611 | I/O Address     Device                      IRQ   Device                  |
612 | -----------     --------                    ---   --------                | 612 | -----------     --------                    ---   --------                |
613 | 200-20F         Game I/O adapter              3   COM2, Bus Mouse         | 613 | 200-20F         Game I/O adapter              3   COM2, Bus Mouse         |
614 | 230-23F         Bus Mouse                     4   COM1                    | 614 | 230-23F         Bus Mouse                     4   COM1                    |
615 | 270-27F         LPT3: third parallel port     5   LPT2                    | 615 | 270-27F         LPT3: third parallel port     5   LPT2                    |
616 | 2F0-2FF         COM2: second serial port      6   Floppy Disk controller  | 616 | 2F0-2FF         COM2: second serial port      6   Floppy Disk controller  |
617 | 320-32F         Fixed disk controller         7   LPT1                    | 617 | 320-32F         Fixed disk controller         7   LPT1                    |
618 |                                               8   Real-time Clock         | 618 |                                               8   Real-time Clock         |
619 |                                               9   EGA/VGA display adapter | 619 |                                               9   EGA/VGA display adapter |
620 |                                              12   Mouse (PS/2)            | 620 |                                              12   Mouse (PS/2)            |
621 | Memory Address  Device                       13   Math Coprocessor        | 621 | Memory Address  Device                       13   Math Coprocessor        |
622 | --------------  ---------------------        14   Hard Disk controller    | 622 | --------------  ---------------------        14   Hard Disk controller    |
623 | A000-BFFF       EGA Graphics Adapter                                      | 623 | A000-BFFF       EGA Graphics Adapter                                      |
624 | A000-C7FF       VGA Graphics Adapter                                      | 624 | A000-C7FF       VGA Graphics Adapter                                      |
625 | B000-BFFF       Mono Graphics Adapter                                     | 625 | B000-BFFF       Mono Graphics Adapter                                     |
626 | B800-BFFF       Color Graphics Adapter                                    | 626 | B800-BFFF       Color Graphics Adapter                                    |
627 | E000-FFFF       AT BIOS                                                   | 627 | E000-FFFF       AT BIOS                                                   |
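
As a rough illustration of how the I/O address table above might be used when choosing a base address by hand, the following Python sketch checks a candidate base against the listed ranges. The 16-port span and the function name are assumptions made for this example, not values stated in this document. The table only lists typical usage, so an empty result does not prove the system is free of conflicts.

```python
# Typical ISA I/O port usage, copied from the table above (hex, inclusive).
TYPICAL_IO_RANGES = [
    (0x200, 0x20F, "Game I/O adapter"),
    (0x230, 0x23F, "Bus Mouse"),
    (0x270, 0x27F, "LPT3: third parallel port"),
    (0x2F0, 0x2FF, "COM2: second serial port"),
    (0x320, 0x32F, "Fixed disk controller"),
]

# Assumption for illustration: the adapter occupies 16 consecutive I/O ports.
ADAPTER_IO_SPAN = 16


def io_conflicts(base, span=ADAPTER_IO_SPAN):
    """Return the names of typical devices whose range overlaps [base, base+span-1]."""
    last = base + span - 1
    return [
        name
        for first_port, last_port, name in TYPICAL_IO_RANGES
        if base <= last_port and first_port <= last
    ]


if __name__ == "__main__":
    for candidate in (0x300, 0x2F8, 0x320):
        hits = io_conflicts(candidate)
        status = ", ".join(hits) if hits else "no overlap with the table"
        print(f"base {candidate:#05x}: {status}")
```

A candidate that overlaps an entry is only a hint; the diagnostic self-test described in section 5.2.1 remains the actual check.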
628 | 628 | ||
629 | 629 | ||
630 | 630 | ||
631 | 631 | ||
632 | 6.0 TECHNICAL SUPPORT | 632 | 6.0 TECHNICAL SUPPORT |
633 | =============================================================================== | 633 | =============================================================================== |
634 | 634 | ||
635 | 6.1 CONTACTING CIRRUS LOGIC'S TECHNICAL SUPPORT | 635 | 6.1 CONTACTING CIRRUS LOGIC'S TECHNICAL SUPPORT |
636 | 636 | ||
637 | Cirrus Logic's CS89XX Technical Application Support can be reached at: | 637 | Cirrus Logic's CS89XX Technical Application Support can be reached at: |
638 | 638 | ||
639 | Telephone :(800) 888-5016 (from inside U.S. and Canada) | 639 | Telephone :(800) 888-5016 (from inside U.S. and Canada) |
640 | :(512) 442-7555 (from outside the U.S. and Canada) | 640 | :(512) 442-7555 (from outside the U.S. and Canada) |
641 | Fax :(512) 912-3871 | 641 | Fax :(512) 912-3871 |
642 | Email :ethernet@crystal.cirrus.com | 642 | Email :ethernet@crystal.cirrus.com |
643 | WWW :http://www.cirrus.com | 643 | WWW :http://www.cirrus.com |
644 | 644 | ||
645 | 645 | ||
646 | 6.2 INFORMATION REQUIRED BEFORE CONTACTING TECHNICAL SUPPORT | 646 | 6.2 INFORMATION REQUIRED BEFORE CONTACTING TECHNICAL SUPPORT |
647 | 647 | ||
648 | Before contacting Cirrus Logic for technical support, be prepared to provide as | 648 | Before contacting Cirrus Logic for technical support, be prepared to provide as |
649 | much of the following information as possible. | 649 | much of the following information as possible. |
650 | 650 | ||
651 | 1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.) | 651 | 1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.) |
652 | 652 | ||
653 | 2.) Adapter configuration | 653 | 2.) Adapter configuration |
654 | 654 | ||
655 | * IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel | 655 | * IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel |
656 | * Plug and Play enabled/disabled (CS8920-based adapters only) | 656 | * Plug and Play enabled/disabled (CS8920-based adapters only) |
657 | * Configured for media auto-detect or specific media type (which type). | 657 | * Configured for media auto-detect or specific media type (which type). |
658 | 658 | ||
659 | 3.) PC System's Configuration | 659 | 3.) PC System's Configuration |
660 | 660 | ||
661 | * Plug and Play system (yes/no) | 661 | * Plug and Play system (yes/no) |
662 | * BIOS (make and version) | 662 | * BIOS (make and version) |
663 | * System make and model | 663 | * System make and model |
664 | * CPU (type and speed) | 664 | * CPU (type and speed) |
665 | * System RAM | 665 | * System RAM |
666 | * SCSI Adapter | 666 | * SCSI Adapter |
667 | 667 | ||
668 | 4.) Software | 668 | 4.) Software |
669 | 669 | ||
670 | * CS89XX driver and version | 670 | * CS89XX driver and version |
671 | * Your network operating system and version | 671 | * Your network operating system and version |
672 | * Your system's OS version | 672 | * Your system's OS version |
673 | * Version of all protocol support files | 673 | * Version of all protocol support files |
674 | 674 | ||
675 | 5.) Any Error Message displayed. | 675 | 5.) Any Error Message displayed. |
676 | 676 | ||
677 | 677 | ||
678 | 678 | ||
679 | 6.3 OBTAINING THE LATEST DRIVER VERSION | 679 | 6.3 OBTAINING THE LATEST DRIVER VERSION |
680 | 680 | ||
681 | You can obtain the latest CS89XX drivers and support software from Cirrus Logic's | 681 | You can obtain the latest CS89XX drivers and support software from Cirrus Logic's |
682 | Web site. You can also contact Cirrus Logic's Technical Support (email: | 682 | Web site. You can also contact Cirrus Logic's Technical Support (email: |
683 | ethernet@crystal.cirrus.com) and request that you be registered for automatic | 683 | ethernet@crystal.cirrus.com) and request that you be registered for automatic |
684 | software-update notification. | 684 | software-update notification. |
685 | 685 | ||
686 | Cirrus Logic maintains a web page at http://www.cirrus.com with the | 686 | Cirrus Logic maintains a web page at http://www.cirrus.com with the |
687 | the latest drivers and technical publications. | 687 | latest drivers and technical publications. |
688 | 688 | ||
689 | 689 | ||
690 | 6.4 Current maintainer | 690 | 6.4 Current maintainer |
691 | 691 | ||
692 | In February 2000 the maintenance of this driver was assumed by Andrew | 692 | In February 2000 the maintenance of this driver was assumed by Andrew |
693 | Morton <akpm@zip.com.au> | 693 | Morton <akpm@zip.com.au> |
694 | 694 | ||
695 | 6.5 Kernel module parameters | 695 | 6.5 Kernel module parameters |
696 | 696 | ||
697 | For use in embedded environments with no cs89x0 EEPROM, the kernel boot | 697 | For use in embedded environments with no cs89x0 EEPROM, the kernel boot |
698 | parameter `cs89x0_media=' has been implemented. Usage is: | 698 | parameter `cs89x0_media=' has been implemented. Usage is: |
699 | 699 | ||
700 | cs89x0_media=rj45 or | 700 | cs89x0_media=rj45 or |
701 | cs89x0_media=aui or | 701 | cs89x0_media=aui or |
702 | cs89x0_media=bnc | 702 | cs89x0_media=bnc |
703 | 703 | ||
704 | 704 |
Documentation/networking/decnet.txt
1 | Linux DECnet Networking Layer Information | 1 | Linux DECnet Networking Layer Information |
2 | =========================================== | 2 | =========================================== |
3 | 3 | ||
4 | 1) Other documentation.... | 4 | 1) Other documentation.... |
5 | 5 | ||
6 | o Project Home Pages | 6 | o Project Home Pages |
7 | http://www.chygwyn.com/DECnet/ - Kernel info | 7 | http://www.chygwyn.com/DECnet/ - Kernel info |
8 | http://linux-decnet.sourceforge.net/ - Userland tools | 8 | http://linux-decnet.sourceforge.net/ - Userland tools |
9 | http://www.sourceforge.net/projects/linux-decnet/ - Status page | 9 | http://www.sourceforge.net/projects/linux-decnet/ - Status page |
10 | 10 | ||
11 | 2) Configuring the kernel | 11 | 2) Configuring the kernel |
12 | 12 | ||
13 | Be sure to turn on the following options: | 13 | Be sure to turn on the following options: |
14 | 14 | ||
15 | CONFIG_DECNET (obviously) | 15 | CONFIG_DECNET (obviously) |
16 | CONFIG_PROC_FS (to see what's going on) | 16 | CONFIG_PROC_FS (to see what's going on) |
17 | CONFIG_SYSCTL (for easy configuration) | 17 | CONFIG_SYSCTL (for easy configuration) |
18 | 18 | ||
19 | if you want to try out router support (not properly debugged yet) | 19 | if you want to try out router support (not properly debugged yet) |
20 | you'll need the following options as well... | 20 | you'll need the following options as well... |
21 | 21 | ||
22 | CONFIG_DECNET_ROUTER (to be able to add/delete routes) | 22 | CONFIG_DECNET_ROUTER (to be able to add/delete routes) |
23 | CONFIG_NETFILTER (will be required for the DECnet routing daemon) | 23 | CONFIG_NETFILTER (will be required for the DECnet routing daemon) |
24 | 24 | ||
25 | CONFIG_DECNET_ROUTE_FWMARK is optional | 25 | CONFIG_DECNET_ROUTE_FWMARK is optional |
26 | 26 | ||
27 | Don't turn on SIOCGIFCONF support for DECnet unless you are really sure | 27 | Don't turn on SIOCGIFCONF support for DECnet unless you are really sure |
28 | that you need it; in general you won't, and it can cause ifconfig to | 28 | that you need it; in general you won't, and it can cause ifconfig to |
29 | malfunction. | 29 | malfunction. |
30 | 30 | ||
31 | Run time configuration has changed slightly from the 2.4 system. If you | 31 | Run time configuration has changed slightly from the 2.4 system. If you |
32 | want to configure an endnode, then the simplified procedure is as follows: | 32 | want to configure an endnode, then the simplified procedure is as follows: |
33 | 33 | ||
34 | o Set the MAC address on your ethernet card before starting _any_ other | 34 | o Set the MAC address on your ethernet card before starting _any_ other |
35 | network protocols. | 35 | network protocols. |
36 | 36 | ||
37 | As soon as your network card is brought into the UP state, DECnet should | 37 | As soon as your network card is brought into the UP state, DECnet should |
38 | start working. If you need something more complicated or are unsure how | 38 | start working. If you need something more complicated or are unsure how |
39 | to set the MAC address, see the next section. Also all configurations which | 39 | to set the MAC address, see the next section. Also all configurations which |
40 | worked with 2.4 will work under 2.5 with no change. | 40 | worked with 2.4 will work under 2.5 with no change. |
41 | 41 | ||
42 | 3) Command line options | 42 | 3) Command line options |
43 | 43 | ||
44 | You can set a DECnet address on the kernel command line for compatibility | 44 | You can set a DECnet address on the kernel command line for compatibility |
45 | with the 2.4 configuration procedure, but in general it's not needed any more. | 45 | with the 2.4 configuration procedure, but in general it's not needed any more. |
46 | If you do set a DECnet address on the command line, it has only one purpose, | 46 | If you do set a DECnet address on the command line, it has only one purpose, |
47 | which is that it's added to the addresses on the loopback device. | 47 | which is that it's added to the addresses on the loopback device. |
48 | 48 | ||
49 | With 2.4 kernels, DECnet would only recognise addresses as local if they | 49 | With 2.4 kernels, DECnet would only recognise addresses as local if they |
50 | were added to the loopback device. In 2.5, any local interface address | 50 | were added to the loopback device. In 2.5, any local interface address |
51 | can be used to loop back to the local machine. Of course this does not | 51 | can be used to loop back to the local machine. Of course this does not |
52 | prevent you adding further addresses to the loopback device if you | 52 | prevent you adding further addresses to the loopback device if you |
53 | want to. | 53 | want to. |
54 | 54 | ||
55 | N.B. Since the address list of an interface determines the addresses for | 55 | N.B. Since the address list of an interface determines the addresses for |
56 | which "hello" messages are sent, if you don't set an address on the loopback | 56 | which "hello" messages are sent, if you don't set an address on the loopback |
57 | interface then you won't see any entries in /proc/net/neigh for the local | 57 | interface then you won't see any entries in /proc/net/neigh for the local |
58 | host until such time as you start a connection. This doesn't affect the | 58 | host until such time as you start a connection. This doesn't affect the |
59 | operation of the local communications in any other way though. | 59 | operation of the local communications in any other way though. |
60 | 60 | ||
61 | The kernel command line takes options looking like the following: | 61 | The kernel command line takes options looking like the following: |
62 | 62 | ||
63 | decnet=1,2 | 63 | decnet=1,2 |
64 | 64 | ||
65 | The two numbers are the node address (1,2 = 1.2). For 2.2.xx kernels | 65 | The two numbers are the node address (1,2 = 1.2). For 2.2.xx kernels |
66 | and early 2.3.xx kernels, you must use a comma when specifying the | 66 | and early 2.3.xx kernels, you must use a comma when specifying the |
67 | DECnet address like this. For more recent 2.3.xx kernels, you may | 67 | DECnet address like this. For more recent 2.3.xx kernels, you may |
68 | use almost any character except space, although a `.` would be the most | 68 | use almost any character except space, although a `.` would be the most |
69 | obvious choice :-) | 69 | obvious choice :-) |
70 | 70 | ||
71 | There used to be a third number specifying the node type. This option | 71 | There used to be a third number specifying the node type. This option |
72 | has gone away in favour of a per interface node type. This is now set | 72 | has gone away in favour of a per interface node type. This is now set |
73 | using /proc/sys/net/decnet/conf/<dev>/forwarding. This file can be | 73 | using /proc/sys/net/decnet/conf/<dev>/forwarding. This file can be |
74 | set with a single digit, 0=EndNode, 1=L1 Router and 2=L2 Router. | 74 | set with a single digit, 0=EndNode, 1=L1 Router and 2=L2 Router. |
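
The per-interface node type values above can be decoded as in this small Python sketch. The helper names are hypothetical; only the path layout and the meanings of 0, 1 and 2 come from the text.

```python
# Meaning of the single digit stored in the per-interface forwarding file.
NODE_TYPES = {0: "EndNode", 1: "L1 Router", 2: "L2 Router"}


def forwarding_path(dev):
    """Build the sysctl path for a device name such as 'eth0'."""
    return f"/proc/sys/net/decnet/conf/{dev}/forwarding"


def describe_forwarding(text):
    """Translate the file contents (e.g. '1\\n') into a node type name."""
    value = int(text.strip())
    if value not in NODE_TYPES:
        raise ValueError(f"unexpected forwarding value: {value}")
    return NODE_TYPES[value]


if __name__ == "__main__":
    # On a machine with DECnet loaded, the file could be read like this;
    # here a literal string stands in for the file contents.
    print(forwarding_path("eth0"), "->", describe_forwarding("0\n"))
```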
75 | 75 | ||
76 | There are also equivalent options for modules. The node address can | 76 | There are also equivalent options for modules. The node address can |
77 | also be set through the /proc/sys/net/decnet/ files, as can other system | 77 | also be set through the /proc/sys/net/decnet/ files, as can other system |
78 | parameters. | 78 | parameters. |
79 | 79 | ||
80 | Currently the only supported devices are ethernet and ip_gre. The | 80 | Currently the only supported devices are ethernet and ip_gre. The |
81 | ethernet address of your ethernet card has to be set according to the DECnet | 81 | ethernet address of your ethernet card has to be set according to the DECnet |
82 | address of the node in order for it to be autoconfigured (and then appear in | 82 | address of the node in order for it to be autoconfigured (and then appear in |
83 | /proc/net/decnet_dev). There is a utility available at the above | 83 | /proc/net/decnet_dev). There is a utility available at the above |
84 | FTP sites called dn2ethaddr which can compute the correct ethernet | 84 | FTP sites called dn2ethaddr which can compute the correct ethernet |
85 | address to use. The address can be set by ifconfig either before at | 85 | address to use. The address can be set by ifconfig either before or |
86 | at the time the device is brought up. If you are using RedHat you can | 86 | at the time the device is brought up. If you are using RedHat you can |
87 | add the line: | 87 | add the line: |
88 | 88 | ||
89 | MACADDR=AA:00:04:00:03:04 | 89 | MACADDR=AA:00:04:00:03:04 |
90 | 90 | ||
91 | or something similar, to /etc/sysconfig/network-scripts/ifcfg-eth0 or | 91 | or something similar, to /etc/sysconfig/network-scripts/ifcfg-eth0 or |
92 | wherever your network card's configuration lives. Setting the MAC address | 92 | wherever your network card's configuration lives. Setting the MAC address |
93 | of your ethernet card to an address starting with "hi-ord" will cause a | 93 | of your ethernet card to an address starting with "hi-ord" will cause a |
94 | DECnet address which matches to be added to the interface (which you can | 94 | DECnet address which matches to be added to the interface (which you can |
95 | verify with iproute2). | 95 | verify with iproute2). |
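
The dn2ethaddr utility itself is not reproduced here, but the Python sketch below shows the mapping it is generally understood to compute, assuming the conventional DECnet Phase IV scheme: the area goes in the top 6 bits and the node in the low 10 bits of a 16-bit value, which is stored low byte first after the fixed AA:00:04:00 prefix. The function names are illustrative only.

```python
# Fixed leading bytes of a DECnet Phase IV Ethernet address (the "hi-ord" part).
HIORD_PREFIX = "AA:00:04:00"


def parse_decnet_address(text):
    """Parse 'area.node' (or 'area,node') into an (area, node) tuple."""
    area_text, node_text = text.replace(",", ".").split(".")
    area, node = int(area_text), int(node_text)
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError(f"DECnet address out of range: {text}")
    return area, node


def decnet_to_mac(text):
    """Compute the Ethernet address matching a DECnet node address."""
    area, node = parse_decnet_address(text)
    addr16 = (area << 10) | node
    # The 16-bit address is appended low byte first.
    return "%s:%02X:%02X" % (HIORD_PREFIX, addr16 & 0xFF, addr16 >> 8)


if __name__ == "__main__":
    print(decnet_to_mac("1.3"))
```

Under this assumption, DECnet address 1.3 corresponds to the MACADDR value used in the example above.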
96 | 96 | ||
97 | The default device for routing can be set through the /proc filesystem | 97 | The default device for routing can be set through the /proc filesystem |
98 | by setting /proc/sys/net/decnet/default_device to the | 98 | by setting /proc/sys/net/decnet/default_device to the |
99 | device you want DECnet to route packets out of when no specific route | 99 | device you want DECnet to route packets out of when no specific route |
100 | is available. Usually this will be eth0, for example: | 100 | is available. Usually this will be eth0, for example: |
101 | 101 | ||
102 | echo -n "eth0" >/proc/sys/net/decnet/default_device | 102 | echo -n "eth0" >/proc/sys/net/decnet/default_device |
103 | 103 | ||
104 | If you don't set the default device, then it will default to the first | 104 | If you don't set the default device, then it will default to the first |
105 | ethernet card which has been autoconfigured as described above. You can | 105 | ethernet card which has been autoconfigured as described above. You can |
106 | confirm that by looking in the default_device file of course. | 106 | confirm that by looking in the default_device file of course. |
107 | 107 | ||
108 | There is a list of what the other files under /proc/sys/net/decnet/ do | 108 | There is a list of what the other files under /proc/sys/net/decnet/ do |
109 | on the kernel patch web site (shown above). | 109 | on the kernel patch web site (shown above). |
110 | 110 | ||
111 | 4) Run time kernel configuration | 111 | 4) Run time kernel configuration |
112 | 112 | ||
113 | This is either done through the sysctl/proc interface (see the kernel web | 113 | This is either done through the sysctl/proc interface (see the kernel web |
114 | pages for details on what the various options do) or through the iproute2 | 114 | pages for details on what the various options do) or through the iproute2 |
115 | package in the same way as IPv4/6 configuration is performed. | 115 | package in the same way as IPv4/6 configuration is performed. |
116 | 116 | ||
117 | Documentation for iproute2 is included with the package. Although there is | 117 | Documentation for iproute2 is included with the package. Although there is |
118 | as yet no specific section on DECnet, most of the features apply to both | 118 | as yet no specific section on DECnet, most of the features apply to both |
119 | IP and DECnet, albeit with DECnet addresses instead of IP addresses and | 119 | IP and DECnet, albeit with DECnet addresses instead of IP addresses and |
120 | a reduced functionality. | 120 | a reduced functionality. |
121 | 121 | ||
122 | If you want to configure a DECnet router you'll need the iproute2 package | 122 | If you want to configure a DECnet router you'll need the iproute2 package |
123 | since it's the _only_ way to add and delete routes currently. Eventually | 123 | since it's the _only_ way to add and delete routes currently. Eventually |
124 | there will be a routing daemon to send and receive routing messages for | 124 | there will be a routing daemon to send and receive routing messages for |
125 | each interface and update the kernel routing tables accordingly. The | 125 | each interface and update the kernel routing tables accordingly. The |
126 | routing daemon will use netfilter to listen to routing packets, and | 126 | routing daemon will use netfilter to listen to routing packets, and |
127 | rtnetlink to update the kernel's routing tables. | 127 | rtnetlink to update the kernel's routing tables. |
128 | 128 | ||
129 | The DECnet raw socket layer has been removed since it was there purely | 129 | The DECnet raw socket layer has been removed since it was there purely |
130 | for use by the routing daemon which will now use netfilter (a much cleaner | 130 | for use by the routing daemon which will now use netfilter (a much cleaner |
131 | and more generic solution) instead. | 131 | and more generic solution) instead. |
132 | 132 | ||
133 | 5) How can I tell if it's working? | 133 | 5) How can I tell if it's working? |
134 | 134 | ||
135 | Here is a quick guide of what to look for in order to know if your DECnet | 135 | Here is a quick guide of what to look for in order to know if your DECnet |
136 | kernel subsystem is working. | 136 | kernel subsystem is working. |
137 | 137 | ||
138 | - Is the node address set (see /proc/sys/net/decnet/node_address) | 138 | - Is the node address set (see /proc/sys/net/decnet/node_address) |
139 | - Is the node of the correct type | 139 | - Is the node of the correct type |
140 | (see /proc/sys/net/decnet/conf/<dev>/forwarding) | 140 | (see /proc/sys/net/decnet/conf/<dev>/forwarding) |
141 | - Is the Ethernet MAC address of each Ethernet card set to match | 141 | - Is the Ethernet MAC address of each Ethernet card set to match |
142 | the DECnet address. If in doubt use the dn2ethaddr utility available | 142 | the DECnet address. If in doubt use the dn2ethaddr utility available |
143 | at the ftp archive. | 143 | at the ftp archive. |
144 | - If the previous two steps are satisfied, and the Ethernet card is up, | 144 | - If the previous two steps are satisfied, and the Ethernet card is up, |
145 | you should find that it is listed in /proc/net/decnet_dev and also | 145 | you should find that it is listed in /proc/net/decnet_dev and also |
146 | that it appears as a directory in /proc/sys/net/decnet/conf/. The | 146 | that it appears as a directory in /proc/sys/net/decnet/conf/. The |
147 | loopback device (lo) should also appear and is required to communicate | 147 | loopback device (lo) should also appear and is required to communicate |
148 | within a node. | 148 | within a node. |
149 | - If you have any DECnet routers on your network, they should appear | 149 | - If you have any DECnet routers on your network, they should appear |
150 | in /proc/net/decnet_neigh, otherwise this file will only contain the | 150 | in /proc/net/decnet_neigh, otherwise this file will only contain the |
151 | entry for the node itself (if it doesn't, check to see if lo is up). | 151 | entry for the node itself (if it doesn't, check to see if lo is up). |
152 | - If you want to send to any node which is not listed in the | 152 | - If you want to send to any node which is not listed in the |
153 | /proc/net/decnet_neigh file, you'll need to set the default device | 153 | /proc/net/decnet_neigh file, you'll need to set the default device |
154 | to point to an Ethernet card with connection to a router. This is | 154 | to point to an Ethernet card with connection to a router. This is |
155 | again done with the /proc/sys/net/decnet/default_device file. | 155 | again done with the /proc/sys/net/decnet/default_device file. |
156 | - Try starting a simple server and client, like the dnping/dnmirror | 156 | - Try starting a simple server and client, like the dnping/dnmirror |
157 | over the loopback interface. With luck they should communicate. | 157 | over the loopback interface. With luck they should communicate. |
158 | For this step and those after, you'll need the DECnet library | 158 | For this step and those after, you'll need the DECnet library |
159 | which can be obtained from the above ftp sites as well as the | 159 | which can be obtained from the above ftp sites as well as the |
160 | actual utilities themselves. | 160 | actual utilities themselves. |
161 | - If this seems to work, then try talking to a node on your local | 161 | - If this seems to work, then try talking to a node on your local |
162 | network, and see if you can obtain the same results. | 162 | network, and see if you can obtain the same results. |
163 | - At this point you are on your own... :-) | 163 | - At this point you are on your own... :-) |
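The first few checks in the list above all come down to reading files under /proc. A minimal sketch in Python that gathers them in one place, assuming the paths named above exist on the running kernel. The function name `decnet_status` and its `root` parameter (there only so the function can be pointed at a scratch directory) are illustrative and not part of any DECnet tool.

```python
import os

# /proc entries named in the checklist above, relative to the filesystem root.
CHECK_FILES = {
    "node_address": "proc/sys/net/decnet/node_address",
    "default_device": "proc/sys/net/decnet/default_device",
    "devices": "proc/net/decnet_dev",
    "neighbours": "proc/net/decnet_neigh",
}


def decnet_status(root="/"):
    """Return {name: file contents}, with None for missing or unreadable files."""
    status = {}
    for name, rel_path in CHECK_FILES.items():
        path = os.path.join(root, rel_path)
        try:
            with open(path) as handle:
                status[name] = handle.read().strip()
        except OSError:
            status[name] = None
    return status
```

Calling `decnet_status()` on a machine without DECnet support simply returns None for every entry, which is itself a useful first answer to "is it working?".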
164 | 164 | ||
165 | 6) How to send a bug report | 165 | 6) How to send a bug report |
166 | 166 | ||
167 | If you've found a bug and want to report it, then there are several things | 167 | If you've found a bug and want to report it, then there are several things |
168 | you can do to help me work out exactly what it is that is wrong. Useful | 168 | you can do to help me work out exactly what it is that is wrong. Useful |
169 | information (_most_ of which _is_ _essential_) includes: | 169 | information (_most_ of which _is_ _essential_) includes: |
170 | 170 | ||
171 | - What kernel version are you running ? | 171 | - What kernel version are you running ? |
172 | - What version of the patch are you running ? | 172 | - What version of the patch are you running ? |
173 | - How far through the above set of tests can you get ? | 173 | - How far through the above set of tests can you get ? |
174 | - What is in the /proc/decnet* files and /proc/sys/net/decnet/* files ? | 174 | - What is in the /proc/decnet* files and /proc/sys/net/decnet/* files ? |
175 | - Which services are you running ? | 175 | - Which services are you running ? |
176 | - Which client caused the problem ? | 176 | - Which client caused the problem ? |
177 | - How much data was being transferred ? | 177 | - How much data was being transferred ? |
178 | - Was the network congested ? | 178 | - Was the network congested ? |
179 | - How can the problem be reproduced ? | 179 | - How can the problem be reproduced ? |
180 | - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of | 180 | - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of |
181 | tcpdump don't understand how to dump DECnet properly, so including | 181 | tcpdump don't understand how to dump DECnet properly, so including |
182 | the hex listing of the packet contents is _essential_, usually the -x flag. | 182 | the hex listing of the packet contents is _essential_, usually the -x flag. |
183 | You may also need to increase the length grabbed with the -s flag. The | 183 | You may also need to increase the length grabbed with the -s flag. The |
184 | -e flag also provides very useful information (ethernet MAC addresses)) | 184 | -e flag also provides very useful information (ethernet MAC addresses)) |
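The tcpdump flags mentioned in the last item can be combined into one invocation. A small sketch in Python that builds the argument list; the helper name `tcpdump_command` and the default capture length of 1500 bytes are assumptions chosen for illustration, while -i, -e, -x and -s are standard tcpdump options.

```python
def tcpdump_command(interface, snaplen=1500):
    """Build a tcpdump argv that keeps the details asked for above."""
    return [
        "tcpdump",
        "-i", interface,      # capture on this interface
        "-e",                 # print link-level (Ethernet MAC) headers
        "-x",                 # hex dump of the packet contents
        "-s", str(snaplen),   # number of bytes grabbed per packet
    ]
```

Joining the list with spaces for "eth0" gives `tcpdump -i eth0 -e -x -s 1500`, which can be run as root with the output attached to the bug report.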
185 | 185 | ||
186 | 7) MAC FAQ | 186 | 7) MAC FAQ |
187 | 187 | ||
188 | A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet | 188 | A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet |
189 | interact and how to get the best performance from your hardware. | 189 | interact and how to get the best performance from your hardware. |
190 | 190 | ||
191 | Ethernet cards are designed to normally only pass received network frames | 191 | Ethernet cards are designed to normally only pass received network frames |
192 | to a host computer when they are addressed to it, or to the broadcast address. | 192 | to a host computer when they are addressed to it, or to the broadcast address. |
193 | 193 | ||
194 | Linux has an interface which allows the setting of extra addresses for | 194 | Linux has an interface which allows the setting of extra addresses for |
195 | an ethernet card to listen to. If the ethernet card supports it, the | 195 | an ethernet card to listen to. If the ethernet card supports it, the |
196 | filtering operation will be done in hardware, if not the extra unwanted packets | 196 | filtering operation will be done in hardware, if not the extra unwanted packets |
197 | received will be discarded by the host computer. In the latter case, | 197 | received will be discarded by the host computer. In the latter case, |
198 | significant processor time and bus bandwidth can be used up on a busy | 198 | significant processor time and bus bandwidth can be used up on a busy |
199 | network (see the NAPI documentation for a longer explanation of these | 199 | network (see the NAPI documentation for a longer explanation of these |
200 | effects). | 200 | effects). |
201 | 201 | ||
202 | DECnet makes use of this interface to allow running DECnet on an ethernet | 202 | DECnet makes use of this interface to allow running DECnet on an ethernet |
203 | card which has already been configured using TCP/IP (presumably using the | 203 | card which has already been configured using TCP/IP (presumably using the |
204 | built in MAC address of the card, as usual) and/or to allow multiple DECnet | 204 | built in MAC address of the card, as usual) and/or to allow multiple DECnet |
205 | addresses on each physical interface. If you do this, be aware that if your | 205 | addresses on each physical interface. If you do this, be aware that if your |
206 | ethernet card doesn't support perfect hashing in its MAC address filter | 206 | ethernet card doesn't support perfect hashing in its MAC address filter |
207 | then your computer will be doing more work than required. Some cards | 207 | then your computer will be doing more work than required. Some cards |
208 | will simply set themselves into promiscuous mode in order to receive | 208 | will simply set themselves into promiscuous mode in order to receive |
209 | packets from the DECnet specified addresses. So if you have one of these | 209 | packets from the DECnet specified addresses. So if you have one of these |
210 | cards it's better to set the MAC address of the card as described above | 210 | cards it's better to set the MAC address of the card as described above |
211 | to gain the best efficiency. Better still is to use a card which supports | 211 | to gain the best efficiency. Better still is to use a card which supports |
212 | NAPI as well. | 212 | NAPI as well. |
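For reference, the MAC address that matches a DECnet address can be computed by hand. The sketch below assumes the usual DECnet Phase IV convention, which this document does not spell out: the fixed prefix AA:00:04:00 followed by the 16-bit address (6-bit area, 10-bit node), low byte first. The function name `decnet_to_mac` is illustrative and is not the dn2ethaddr utility itself.

```python
def decnet_to_mac(address):
    """Convert a DECnet 'area.node' address to the matching Ethernet MAC."""
    area_text, node_text = address.split(".")
    area, node = int(area_text), int(node_text)
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError(f"invalid DECnet address: {address}")
    value = (area << 10) | node            # 6-bit area, 10-bit node
    low, high = value & 0xFF, value >> 8   # stored low byte first
    return f"AA:00:04:00:{low:02X}:{high:02X}"
```

For example, node 1.1 is 1 * 1024 + 1 = 0x0401, so `decnet_to_mac("1.1")` gives AA:00:04:00:01:04.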
213 | 213 | ||
214 | 214 | ||
215 | 8) Mailing list | 215 | 8) Mailing list |
216 | 216 | ||
217 | If you are keen to get involved in development, or want to ask questions | 217 | If you are keen to get involved in development, or want to ask questions |
218 | about configuration, or even just report bugs, then there is a mailing | 218 | about configuration, or even just report bugs, then there is a mailing |
219 | list that you can join, details are at: | 219 | list that you can join, details are at: |
220 | 220 | ||
221 | http://sourceforge.net/mail/?group_id=4993 | 221 | http://sourceforge.net/mail/?group_id=4993 |
222 | 222 | ||
223 | 9) Legal Info | 223 | 9) Legal Info |
224 | 224 | ||
225 | The Linux DECnet project team have placed their code under the GPL. The | 225 | The Linux DECnet project team have placed their code under the GPL. The |
226 | software is provided "as is" and without warranty express or implied. | 226 | software is provided "as is" and without warranty express or implied. |
227 | DECnet is a trademark of Compaq. This software is not a product of | 227 | DECnet is a trademark of Compaq. This software is not a product of |
228 | Compaq. We acknowledge the help of people at Compaq in providing extra | 228 | Compaq. We acknowledge the help of people at Compaq in providing extra |
229 | documentation above and beyond what was previously publicly available. | 229 | documentation above and beyond what was previously publicly available. |
230 | 230 | ||
231 | Steve Whitehouse <SteveW@ACM.org> | 231 | Steve Whitehouse <SteveW@ACM.org> |
232 | 232 | ||
233 | 233 |
Documentation/networking/e1000.txt
1 | Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters | 1 | Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters |
2 | =============================================================== | 2 | =============================================================== |
3 | 3 | ||
4 | November 15, 2005 | 4 | November 15, 2005 |
5 | 5 | ||
6 | 6 | ||
7 | Contents | 7 | Contents |
8 | ======== | 8 | ======== |
9 | 9 | ||
10 | - In This Release | 10 | - In This Release |
11 | - Identifying Your Adapter | 11 | - Identifying Your Adapter |
12 | - Command Line Parameters | 12 | - Command Line Parameters |
13 | - Speed and Duplex Configuration | 13 | - Speed and Duplex Configuration |
14 | - Additional Configurations | 14 | - Additional Configurations |
15 | - Known Issues | 15 | - Known Issues |
16 | - Support | 16 | - Support |
17 | 17 | ||
18 | 18 | ||
19 | In This Release | 19 | In This Release |
20 | =============== | 20 | =============== |
21 | 21 | ||
22 | This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family | 22 | This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family |
23 | of Adapters. This driver includes support for Itanium(R)2-based systems. | 23 | of Adapters. This driver includes support for Itanium(R)2-based systems. |
24 | 24 | ||
25 | For questions related to hardware requirements, refer to the documentation | 25 | For questions related to hardware requirements, refer to the documentation |
26 | supplied with your Intel PRO/1000 adapter. All hardware requirements listed | 26 | supplied with your Intel PRO/1000 adapter. All hardware requirements listed |
27 | apply to use with Linux. | 27 | apply to use with Linux. |
28 | 28 | ||
29 | The following features are now available in supported kernels: | 29 | The following features are now available in supported kernels: |
30 | - Native VLANs | 30 | - Native VLANs |
31 | - Channel Bonding (teaming) | 31 | - Channel Bonding (teaming) |
32 | - SNMP | 32 | - SNMP |
33 | 33 | ||
34 | Channel Bonding documentation can be found in the Linux kernel source: | 34 | Channel Bonding documentation can be found in the Linux kernel source: |
35 | /Documentation/networking/bonding.txt | 35 | /Documentation/networking/bonding.txt |
36 | 36 | ||
37 | The driver information previously displayed in the /proc filesystem is not | 37 | The driver information previously displayed in the /proc filesystem is not |
38 | supported in this release. Alternatively, you can use ethtool (version 1.6 | 38 | supported in this release. Alternatively, you can use ethtool (version 1.6 |
39 | or later), lspci, and ifconfig to obtain the same information. | 39 | or later), lspci, and ifconfig to obtain the same information. |
40 | 40 | ||
41 | Instructions on updating ethtool can be found in the section "Additional | 41 | Instructions on updating ethtool can be found in the section "Additional |
42 | Configurations" later in this document. | 42 | Configurations" later in this document. |
43 | 43 | ||
44 | 44 | ||
45 | Identifying Your Adapter | 45 | Identifying Your Adapter |
46 | ======================== | 46 | ======================== |
47 | 47 | ||
48 | For more information on how to identify your adapter, go to the Adapter & | 48 | For more information on how to identify your adapter, go to the Adapter & |
49 | Driver ID Guide at: | 49 | Driver ID Guide at: |
50 | 50 | ||
51 | http://support.intel.com/support/network/adapter/pro100/21397.htm | 51 | http://support.intel.com/support/network/adapter/pro100/21397.htm |
52 | 52 | ||
53 | For the latest Intel network drivers for Linux, refer to the following | 53 | For the latest Intel network drivers for Linux, refer to the following |
54 | website. In the search field, enter your adapter name or type, or use the | 54 | website. In the search field, enter your adapter name or type, or use the |
55 | networking link on the left to search for your adapter: | 55 | networking link on the left to search for your adapter: |
56 | 56 | ||
57 | http://downloadfinder.intel.com/scripts-df/support_intel.asp | 57 | http://downloadfinder.intel.com/scripts-df/support_intel.asp |
58 | 58 | ||
59 | 59 | ||
60 | Command Line Parameters ======================= | 60 | Command Line Parameters ======================= |
61 | 61 | ||
62 | If the driver is built as a module, the following optional parameters | 62 | If the driver is built as a module, the following optional parameters |
63 | are used by entering them on the command line with the modprobe or insmod | 63 | are used by entering them on the command line with the modprobe or insmod |
64 | command using this syntax: | 64 | command using this syntax: |
65 | 65 | ||
66 | modprobe e1000 [<option>=<VAL1>,<VAL2>,...] | 66 | modprobe e1000 [<option>=<VAL1>,<VAL2>,...] |
67 | 67 | ||
68 | insmod e1000 [<option>=<VAL1>,<VAL2>,...] | 68 | insmod e1000 [<option>=<VAL1>,<VAL2>,...] |
69 | 69 | ||
70 | For example, with two PRO/1000 PCI adapters, entering: | 70 | For example, with two PRO/1000 PCI adapters, entering: |
71 | 71 | ||
72 | insmod e1000 TxDescriptors=80,128 | 72 | insmod e1000 TxDescriptors=80,128 |
73 | 73 | ||
74 | loads the e1000 driver with 80 TX descriptors for the first adapter and 128 | 74 | loads the e1000 driver with 80 TX descriptors for the first adapter and 128 |
75 | TX descriptors for the second adapter. | 75 | TX descriptors for the second adapter. |
76 | 76 | ||
77 | The default value for each parameter is generally the recommended setting, | 77 | The default value for each parameter is generally the recommended setting, |
78 | unless otherwise noted. | 78 | unless otherwise noted. |
79 | 79 | ||
80 | NOTES: For more information about the AutoNeg, Duplex, and Speed | 80 | NOTES: For more information about the AutoNeg, Duplex, and Speed |
81 | parameters, see the "Speed and Duplex Configuration" section in | 81 | parameters, see the "Speed and Duplex Configuration" section in |
82 | this document. | 82 | this document. |
83 | 83 | ||
84 | For more information about the InterruptThrottleRate, | 84 | For more information about the InterruptThrottleRate, |
85 | RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay | 85 | RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay |
86 | parameters, see the application note at: | 86 | parameters, see the application note at: |
87 | http://www.intel.com/design/network/applnots/ap450.htm | 87 | http://www.intel.com/design/network/applnots/ap450.htm |
88 | 88 | ||
89 | A descriptor describes a data buffer and attributes related to | 89 | A descriptor describes a data buffer and attributes related to |
90 | the data buffer. This information is accessed by the hardware. | 90 | the data buffer. This information is accessed by the hardware. |
91 | 91 | ||
92 | 92 | ||
93 | AutoNeg | 93 | AutoNeg |
94 | ------- | 94 | ------- |
95 | (Supported only on adapters with copper connections) | 95 | (Supported only on adapters with copper connections) |
96 | Valid Range: 0x01-0x0F, 0x20-0x2F | 96 | Valid Range: 0x01-0x0F, 0x20-0x2F |
97 | Default Value: 0x2F | 97 | Default Value: 0x2F |
98 | 98 | ||
99 | This parameter is a bit mask that specifies which speed and duplex | 99 | This parameter is a bit mask that specifies which speed and duplex |
100 | settings the board advertises. When this parameter is used, the Speed | 100 | settings the board advertises. When this parameter is used, the Speed |
101 | and Duplex parameters must not be specified. | 101 | and Duplex parameters must not be specified. |
102 | 102 | ||
103 | NOTE: Refer to the Speed and Duplex section of this readme for more | 103 | NOTE: Refer to the Speed and Duplex section of this readme for more |
104 | information on the AutoNeg parameter. | 104 | information on the AutoNeg parameter. |
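As an illustration of how such a bit mask is put together, here is a short Python sketch. The bit assignments are an assumption taken from the driver's "Speed and Duplex Configuration" section rather than from the text above (1000 Mbps half duplex has no bit because it is not supported); the names `ADVERTISE`, `autoneg_mask` and `is_valid_autoneg` are made up for this example.

```python
# Assumed bit assignments (see the "Speed and Duplex Configuration" section).
ADVERTISE = {
    "10/half": 0x01,
    "10/full": 0x02,
    "100/half": 0x04,
    "100/full": 0x08,
    "1000/full": 0x20,
}


def autoneg_mask(*modes):
    """OR together the bits for the speed/duplex modes to advertise."""
    mask = 0
    for mode in modes:
        mask |= ADVERTISE[mode]
    return mask


def is_valid_autoneg(mask):
    """Check a mask against the valid range quoted above."""
    return 0x01 <= mask <= 0x0F or 0x20 <= mask <= 0x2F
```

Advertising every mode yields 0x2F, the default; advertising only 100 Mbps full and 1000 Mbps full yields 0x28, which would be passed as AutoNeg=0x28.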
105 | 105 | ||
106 | 106 | ||
107 | Duplex | 107 | Duplex |
108 | ------ | 108 | ------ |
109 | (Supported only on adapters with copper connections) | 109 | (Supported only on adapters with copper connections) |
110 | Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full) | 110 | Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full) |
111 | Default Value: 0 | 111 | Default Value: 0 |
112 | 112 | ||
113 | Defines the direction in which data is allowed to flow. Can be either | 113 | Defines the direction in which data is allowed to flow. Can be either |
114 | one or two-directional. If both Duplex and the link partner are set to | 114 | one or two-directional. If both Duplex and the link partner are set to |
115 | auto-negotiate, the board auto-detects the correct duplex. If the link | 115 | auto-negotiate, the board auto-detects the correct duplex. If the link |
116 | partner is forced (either full or half), Duplex defaults to half-duplex. | 116 | partner is forced (either full or half), Duplex defaults to half-duplex. |
117 | 117 | ||
118 | 118 | ||
119 | FlowControl | 119 | FlowControl |
120 | ----------- | 120 | ----------- |
121 | Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) | 121 | Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) |
122 | Default Value: Reads flow control settings from the EEPROM | 122 | Default Value: Reads flow control settings from the EEPROM |
123 | 123 | ||
124 | This parameter controls the automatic generation(Tx) and response(Rx) | 124 | This parameter controls the automatic generation(Tx) and response(Rx) |
125 | to Ethernet PAUSE frames. | 125 | to Ethernet PAUSE frames. |
126 | 126 | ||
127 | 127 | ||
128 | InterruptThrottleRate | 128 | InterruptThrottleRate |
129 | --------------------- | 129 | --------------------- |
130 | (not supported on Intel 82542, 82543 or 82544-based adapters) | 130 | (not supported on Intel 82542, 82543 or 82544-based adapters) |
131 | Valid Range: 100-100000 (0=off, 1=dynamic) | 131 | Valid Range: 100-100000 (0=off, 1=dynamic) |
132 | Default Value: 8000 | 132 | Default Value: 8000 |
133 | 133 | ||
134 | This value represents the maximum number of interrupts per second the | 134 | This value represents the maximum number of interrupts per second the |
135 | controller generates. InterruptThrottleRate is another setting used in | 135 | controller generates. InterruptThrottleRate is another setting used in |
136 | interrupt moderation. Dynamic mode uses a heuristic algorithm to adjust | 136 | interrupt moderation. Dynamic mode uses a heuristic algorithm to adjust |
137 | InterruptThrottleRate based on the current traffic load. | 137 | InterruptThrottleRate based on the current traffic load. |
138 | 138 | ||
139 | NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and | 139 | NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and |
140 | RxAbsIntDelay parameters. In other words, minimizing the receive | 140 | RxAbsIntDelay parameters. In other words, minimizing the receive |
141 | and/or transmit absolute delays does not force the controller to | 141 | and/or transmit absolute delays does not force the controller to |
142 | generate more interrupts than what the Interrupt Throttle Rate | 142 | generate more interrupts than what the Interrupt Throttle Rate |
143 | allows. | 143 | allows. |
144 | 144 | ||
145 | CAUTION: If you are using the Intel PRO/1000 CT Network Connection | 145 | CAUTION: If you are using the Intel PRO/1000 CT Network Connection |
146 | (controller 82547), setting InterruptThrottleRate to a value | 146 | (controller 82547), setting InterruptThrottleRate to a value |
147 | greater than 75,000 may hang (stop transmitting) adapters | 147 | greater than 75,000 may hang (stop transmitting) adapters |
148 | under certain network conditions. If this occurs a NETDEV | 148 | under certain network conditions. If this occurs a NETDEV |
149 | WATCHDOG message is logged in the system event log. In | 149 | WATCHDOG message is logged in the system event log. In |
150 | addition, the controller is automatically reset, restoring | 150 | addition, the controller is automatically reset, restoring |
151 | the network connection. To eliminate the potential for the | 151 | the network connection. To eliminate the potential for the |
152 | hang, ensure that InterruptThrottleRate is set no greater | 152 | hang, ensure that InterruptThrottleRate is set no greater |
153 | than 75,000 and is not set to 0. | 153 | than 75,000 and is not set to 0. |
154 | 154 | ||
155 | NOTE: When e1000 is loaded with default settings and multiple adapters | 155 | NOTE: When e1000 is loaded with default settings and multiple adapters |
156 | are in use simultaneously, the CPU utilization may increase non- | 156 | are in use simultaneously, the CPU utilization may increase non- |
157 | linearly. In order to limit the CPU utilization without impacting | 157 | linearly. In order to limit the CPU utilization without impacting |
158 | the overall throughput, we recommend that you load the driver as | 158 | the overall throughput, we recommend that you load the driver as |
159 | follows: | 159 | follows: |
160 | 160 | ||
161 | insmod e1000.o InterruptThrottleRate=3000,3000,3000 | 161 | insmod e1000.o InterruptThrottleRate=3000,3000,3000 |
162 | 162 | ||
163 | This sets the InterruptThrottleRate to 3000 interrupts/sec for | 163 | This sets the InterruptThrottleRate to 3000 interrupts/sec for |
164 | the first, second, and third instances of the driver. The range | 164 | the first, second, and third instances of the driver. The range |
165 | of 2000 to 3000 interrupts per second works on a majority of | 165 | of 2000 to 3000 interrupts per second works on a majority of |
166 | systems and is a good starting point, but the optimal value will | 166 | systems and is a good starting point, but the optimal value will |
167 | be platform-specific. If CPU utilization is not a concern, use | 167 | be platform-specific. If CPU utilization is not a concern, use |
168 | RX_POLLING (NAPI) and default driver settings. | 168 | RX_POLLING (NAPI) and default driver settings. |
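To make the numbers concrete: a throttle rate is a cap on interrupts per second, so its reciprocal is the shortest gap between two interrupts. The Python sketch below does that arithmetic and builds the per-adapter option string shown above; the helper names are illustrative and the range check simply mirrors the valid range quoted in this section.

```python
def min_interrupt_interval_us(rate):
    """Shortest gap between interrupts, in microseconds, at `rate` per second."""
    if not 100 <= rate <= 100000:
        raise ValueError("rate must be 100-100000 (0 and 1 are modes, not rates)")
    return 1_000_000 / rate


def throttle_option(rate, adapters):
    """Build the module option giving every adapter the same throttle rate."""
    return "InterruptThrottleRate=" + ",".join([str(rate)] * adapters)
```

The default of 8000 allows one interrupt every 125 microseconds, while the recommended 3000 spaces them roughly 333 microseconds apart. `throttle_option(3000, 3)` reproduces the option string used in the insmod example.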
169 | 169 | ||
170 | 170 | ||
171 | RxDescriptors | 171 | RxDescriptors |
172 | ------------- | 172 | ------------- |
173 | Valid Range: 80-256 for 82542 and 82543-based adapters | 173 | Valid Range: 80-256 for 82542 and 82543-based adapters |
174 | 80-4096 for all other supported adapters | 174 | 80-4096 for all other supported adapters |
175 | Default Value: 256 | 175 | Default Value: 256 |
176 | 176 | ||
177 | This value specifies the number of receive descriptors allocated by the | 177 | This value specifies the number of receive descriptors allocated by the |
178 | driver. Increasing this value allows the driver to buffer more incoming | 178 | driver. Increasing this value allows the driver to buffer more incoming |
179 | packets. Each descriptor is 16 bytes. A receive buffer is also | 179 | packets. Each descriptor is 16 bytes. A receive buffer is also |
180 | allocated for each descriptor and is 2048 bytes. | 180 | allocated for each descriptor and is 2048 bytes. |
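The memory cost of a larger receive ring follows directly from the sizes given above. A back-of-the-envelope sketch in Python, assuming each descriptor costs exactly 16 bytes plus one 2048-byte buffer and ignoring any alignment or allocator overhead; the names are illustrative.

```python
DESCRIPTOR_BYTES = 16     # size of one descriptor, as stated above
RX_BUFFER_BYTES = 2048    # receive buffer allocated per descriptor


def rx_ring_bytes(descriptors):
    """Approximate memory used by the receive ring of one adapter."""
    return descriptors * (DESCRIPTOR_BYTES + RX_BUFFER_BYTES)
```

The default of 256 descriptors comes to 528384 bytes (about 516 KiB) per adapter, and the maximum of 4096 comes to 8454144 bytes (about 8 MiB).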
181 | 181 | ||
182 | 182 | ||
183 | RxIntDelay | 183 | RxIntDelay |
184 | ---------- | 184 | ---------- |
185 | Valid Range: 0-65535 (0=off) | 185 | Valid Range: 0-65535 (0=off) |
186 | Default Value: 0 | 186 | Default Value: 0 |
187 | 187 | ||
188 | This value delays the generation of receive interrupts in units of 1.024 | 188 | This value delays the generation of receive interrupts in units of 1.024 |
189 | microseconds. Receive interrupt reduction can improve CPU efficiency if | 189 | microseconds. Receive interrupt reduction can improve CPU efficiency if |
190 | properly tuned for specific network traffic. Increasing this value adds | 190 | properly tuned for specific network traffic. Increasing this value adds |
191 | extra latency to frame reception and can end up decreasing the throughput | 191 | extra latency to frame reception and can end up decreasing the throughput |
192 | of TCP traffic. If the system is reporting dropped receives, this value | 192 | of TCP traffic. If the system is reporting dropped receives, this value |
193 | may be set too high, causing the driver to run out of available receive | 193 | may be set too high, causing the driver to run out of available receive |
194 | descriptors. | 194 | descriptors. |
195 | 195 | ||
196 | CAUTION: When setting RxIntDelay to a value other than 0, adapters may | 196 | CAUTION: When setting RxIntDelay to a value other than 0, adapters may |
197 | hang (stop transmitting) under certain network conditions. If | 197 | hang (stop transmitting) under certain network conditions. If |
198 | this occurs a NETDEV WATCHDOG message is logged in the system | 198 | this occurs a NETDEV WATCHDOG message is logged in the system |
199 | event log. In addition, the controller is automatically reset, | 199 | event log. In addition, the controller is automatically reset, |
200 | restoring the network connection. To eliminate the potential | 200 | restoring the network connection. To eliminate the potential |
201 | for the hang ensure that RxIntDelay is set to 0. | 201 | for the hang ensure that RxIntDelay is set to 0. |
202 | 202 | ||
203 | 203 | ||
204 | RxAbsIntDelay | 204 | RxAbsIntDelay |
205 | ------------- | 205 | ------------- |
206 | (This parameter is supported only on 82540, 82545 and later adapters.) | 206 | (This parameter is supported only on 82540, 82545 and later adapters.) |
207 | Valid Range: 0-65535 (0=off) | 207 | Valid Range: 0-65535 (0=off) |
208 | Default Value: 128 | 208 | Default Value: 128 |
209 | 209 | ||
210 | This value, in units of 1.024 microseconds, limits the delay in which a | 210 | This value, in units of 1.024 microseconds, limits the delay in which a |
211 | receive interrupt is generated. Useful only if RxIntDelay is non-zero, | 211 | receive interrupt is generated. Useful only if RxIntDelay is non-zero, |
212 | this value ensures that an interrupt is generated after the initial | 212 | this value ensures that an interrupt is generated after the initial |
213 | packet is received within the set amount of time. Proper tuning, | 213 | packet is received within the set amount of time. Proper tuning, |
214 | along with RxIntDelay, may improve traffic throughput in specific network | 214 | along with RxIntDelay, may improve traffic throughput in specific network |
215 | conditions. | 215 | conditions. |
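Because the delay parameters count in units of 1.024 microseconds, it can help to convert a setting into real time before tuning it. A minimal Python sketch of that conversion; `delay_us` is an illustrative name and the range check mirrors the valid range of 0-65535 given above.

```python
TICK_US = 1.024  # one delay unit, in microseconds


def delay_us(ticks):
    """Convert an RxIntDelay/RxAbsIntDelay style value to microseconds."""
    if not 0 <= ticks <= 65535:
        raise ValueError("delay value must be in the range 0-65535")
    return ticks * TICK_US
```

The RxAbsIntDelay default of 128 therefore bounds the wait at about 131 microseconds, and the largest setting, 65535, corresponds to roughly 67 milliseconds.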
216 | 216 | ||
217 | 217 | ||
218 | Speed | 218 | Speed |
219 | ----- | 219 | ----- |
220 | (This parameter is supported only on adapters with copper connections.) | 220 | (This parameter is supported only on adapters with copper connections.) |
221 | Valid Settings: 0, 10, 100, 1000 | 221 | Valid Settings: 0, 10, 100, 1000 |
222 | Default Value: 0 (auto-negotiate at all supported speeds) | 222 | Default Value: 0 (auto-negotiate at all supported speeds) |
223 | 223 | ||
224 | Speed forces the line speed to the specified value in megabits per second | 224 | Speed forces the line speed to the specified value in megabits per second |
225 | (Mbps). If this parameter is not specified or is set to 0 and the link | 225 | (Mbps). If this parameter is not specified or is set to 0 and the link |
226 | partner is set to auto-negotiate, the board will auto-detect the correct | 226 | partner is set to auto-negotiate, the board will auto-detect the correct |
227 | speed. Duplex should also be set when Speed is set to either 10 or 100. | 227 | speed. Duplex should also be set when Speed is set to either 10 or 100. |
228 | 228 | ||
229 | 229 | ||
230 | TxDescriptors | 230 | TxDescriptors |
231 | ------------- | 231 | ------------- |
232 | Valid Range: 80-256 for 82542 and 82543-based adapters | 232 | Valid Range: 80-256 for 82542 and 82543-based adapters |
233 | 80-4096 for all other supported adapters | 233 | 80-4096 for all other supported adapters |
234 | Default Value: 256 | 234 | Default Value: 256 |
235 | 235 | ||
236 | This value is the number of transmit descriptors allocated by the driver. | 236 | This value is the number of transmit descriptors allocated by the driver. |
237 | Increasing this value allows the driver to queue more transmits. Each | 237 | Increasing this value allows the driver to queue more transmits. Each |
238 | descriptor is 16 bytes. | 238 | descriptor is 16 bytes. |
239 | 239 | ||
240 | NOTE: Depending on the available system resources, the request for a | 240 | NOTE: Depending on the available system resources, the request for a |
241 | higher number of transmit descriptors may be denied. In this case, | 241 | higher number of transmit descriptors may be denied. In this case, |
242 | use a lower number. | 242 | use a lower number. |
243 | 243 | ||
244 | 244 | ||
245 | TxIntDelay | 245 | TxIntDelay |
246 | ---------- | 246 | ---------- |
247 | Valid Range: 0-65535 (0=off) | 247 | Valid Range: 0-65535 (0=off) |
248 | Default Value: 64 | 248 | Default Value: 64 |
249 | 249 | ||
250 | This value delays the generation of transmit interrupts in units of | 250 | This value delays the generation of transmit interrupts in units of |
251 | 1.024 microseconds. Transmit interrupt reduction can improve CPU | 251 | 1.024 microseconds. Transmit interrupt reduction can improve CPU |
252 | efficiency if properly tuned for specific network traffic. If the | 252 | efficiency if properly tuned for specific network traffic. If the |
253 | system is reporting dropped transmits, this value may be set too high | 253 | system is reporting dropped transmits, this value may be set too high |
254 | causing the driver to run out of available transmit descriptors. | 254 | causing the driver to run out of available transmit descriptors. |
255 | 255 | ||
256 | 256 | ||
257 | TxAbsIntDelay | 257 | TxAbsIntDelay |
258 | ------------- | 258 | ------------- |
259 | (This parameter is supported only on 82540, 82545 and later adapters.) | 259 | (This parameter is supported only on 82540, 82545 and later adapters.) |
260 | Valid Range: 0-65535 (0=off) | 260 | Valid Range: 0-65535 (0=off) |
261 | Default Value: 64 | 261 | Default Value: 64 |
262 | 262 | ||
263 | This value, in units of 1.024 microseconds, limits the delay in which a | 263 | This value, in units of 1.024 microseconds, limits the delay in which a |
264 | transmit interrupt is generated. Useful only if TxIntDelay is non-zero, | 264 | transmit interrupt is generated. Useful only if TxIntDelay is non-zero, |
265 | this value ensures that an interrupt is generated after the initial | 265 | this value ensures that an interrupt is generated after the initial |
266 | packet is sent on the wire within the set amount of time. Proper tuning, | 266 | packet is sent on the wire within the set amount of time. Proper tuning, |
267 | along with TxIntDelay, may improve traffic throughput in specific | 267 | along with TxIntDelay, may improve traffic throughput in specific |
268 | network conditions. | 268 | network conditions. |
269 | 269 | ||
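Because TxIntDelay and TxAbsIntDelay are both expressed in units of 1.024 microseconds, it can help to convert between parameter values and wall-clock time when tuning. The helper names below are hypothetical and are not part of the driver; this is a minimal sketch of the arithmetic only.

```python
# Sketch: convert between interrupt-delay parameter values and microseconds.
# UNIT_US and the 0-65535 range are taken from the text above.
UNIT_US = 1.024
MAX_UNITS = 65535


def units_to_us(units: int) -> float:
    """Delay in microseconds for a TxIntDelay/TxAbsIntDelay value."""
    if not 0 <= units <= MAX_UNITS:
        raise ValueError("value must be in the range 0-65535")
    return units * UNIT_US


def us_to_units(microseconds: float) -> int:
    """Nearest parameter value for a desired delay, clamped to the valid range."""
    units = round(microseconds / UNIT_US)
    return max(0, min(MAX_UNITS, units))


if __name__ == "__main__":
    print(units_to_us(64))     # delay for the default value of 64
    print(us_to_units(100.0))  # parameter value closest to 100 microseconds
```

Under this conversion the default value of 64 corresponds to roughly 65.5 microseconds, and the maximum of 65535 to about 67 milliseconds.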
270 | XsumRX | 270 | XsumRX |
271 | ------ | 271 | ------ |
272 | (This parameter is NOT supported on the 82542-based adapter.) | 272 | (This parameter is NOT supported on the 82542-based adapter.) |
273 | Valid Range: 0-1 | 273 | Valid Range: 0-1 |
274 | Default Value: 1 | 274 | Default Value: 1 |
275 | 275 | ||
276 | A value of '1' indicates that the driver should enable IP checksum | 276 | A value of '1' indicates that the driver should enable IP checksum |
277 | offload for received packets (both UDP and TCP) to the adapter hardware. | 277 | offload for received packets (both UDP and TCP) to the adapter hardware. |
278 | 278 | ||
279 | 279 | ||
280 | Speed and Duplex Configuration | 280 | Speed and Duplex Configuration |
281 | ============================== | 281 | ============================== |
282 | 282 | ||
283 | Three keywords are used to control the speed and duplex configuration. | 283 | Three keywords are used to control the speed and duplex configuration. |
284 | These keywords are Speed, Duplex, and AutoNeg. | 284 | These keywords are Speed, Duplex, and AutoNeg. |
285 | 285 | ||
286 | If the board uses a fiber interface, these keywords are ignored, and the | 286 | If the board uses a fiber interface, these keywords are ignored, and the |
287 | fiber interface board only links at 1000 Mbps full-duplex. | 287 | fiber interface board only links at 1000 Mbps full-duplex. |
288 | 288 | ||
289 | For copper-based boards, the keywords interact as follows: | 289 | For copper-based boards, the keywords interact as follows: |
290 | 290 | ||
291 | The default operation is auto-negotiate. The board advertises all | 291 | The default operation is auto-negotiate. The board advertises all |
292 | supported speed and duplex combinations, and it links at the highest | 292 | supported speed and duplex combinations, and it links at the highest |
293 | common speed and duplex mode IF the link partner is set to auto-negotiate. | 293 | common speed and duplex mode IF the link partner is set to auto-negotiate. |
294 | 294 | ||
295 | If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps | 295 | If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps |
296 | is advertised (The 1000BaseT spec requires auto-negotiation.) | 296 | is advertised (The 1000BaseT spec requires auto-negotiation.) |
297 | 297 | ||
298 | If Speed = 10 or 100, then both Speed and Duplex should be set. Auto- | 298 | If Speed = 10 or 100, then both Speed and Duplex should be set. Auto- |
299 | negotiation is disabled, and the AutoNeg parameter is ignored. Partner | 299 | negotiation is disabled, and the AutoNeg parameter is ignored. Partner |
300 | SHOULD also be forced. | 300 | SHOULD also be forced. |
301 | 301 | ||
302 | The AutoNeg parameter is used when more control is required over the | 302 | The AutoNeg parameter is used when more control is required over the |
303 | auto-negotiation process. It should be used when you wish to control which | 303 | auto-negotiation process. It should be used when you wish to control which |
304 | speed and duplex combinations are advertised during the auto-negotiation | 304 | speed and duplex combinations are advertised during the auto-negotiation |
305 | process. | 305 | process. |
306 | 306 | ||
307 | The parameter may be specified as either a decimal or hexadecimal value as | 307 | The parameter may be specified as either a decimal or hexadecimal value as |
308 | determined by the bitmap below. | 308 | determined by the bitmap below. |
309 | 309 | ||
310 | Bit position 7 6 5 4 3 2 1 0 | 310 | Bit position 7 6 5 4 3 2 1 0 |
311 | Decimal Value 128 64 32 16 8 4 2 1 | 311 | Decimal Value 128 64 32 16 8 4 2 1 |
312 | Hex value 80 40 20 10 8 4 2 1 | 312 | Hex value 80 40 20 10 8 4 2 1 |
313 | Speed (Mbps) N/A N/A 1000 N/A 100 100 10 10 | 313 | Speed (Mbps) N/A N/A 1000 N/A 100 100 10 10 |
314 | Duplex Full Full Half Full Half | 314 | Duplex Full Full Half Full Half |
315 | 315 | ||
316 | Some examples of using AutoNeg: | 316 | Some examples of using AutoNeg: |
317 | 317 | ||
318 | modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half) | 318 | modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half) |
319 | modprobe e1000 AutoNeg=1 (Same as above) | 319 | modprobe e1000 AutoNeg=1 (Same as above) |
320 | modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full) | 320 | modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full) |
321 | modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full) | 321 | modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full) |
322 | modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half) | 322 | modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half) |
323 | modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100 | 323 | modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100 |
324 | Half) | 324 | Half) |
325 | modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full) | 325 | modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full) |
326 | modprobe e1000 AutoNeg=32 (Same as above) | 326 | modprobe e1000 AutoNeg=32 (Same as above) |
327 | 327 | ||
328 | Note that when this parameter is used, Speed and Duplex must not be specified. | 328 | Note that when this parameter is used, Speed and Duplex must not be specified. |
329 | 329 | ||
330 | If the link partner is forced to a specific speed and duplex, then this | 330 | If the link partner is forced to a specific speed and duplex, then this |
331 | parameter should not be used. Instead, use the Speed and Duplex parameters | 331 | parameter should not be used. Instead, use the Speed and Duplex parameters |
332 | previously mentioned to force the adapter to the same speed and duplex. | 332 | previously mentioned to force the adapter to the same speed and duplex. |
333 | 333 | ||
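The AutoNeg bitmap described above can be composed by OR-ing the bits for each speed and duplex combination to be advertised. The following is a sketch under that reading of the table; the names `ADVERTISE`, `autoneg_mask` and `describe` are illustrative only and do not exist in the driver.

```python
# Sketch: build and decode an AutoNeg value from the bitmap in the text.
# Bits 4, 6 and 7 are listed as N/A in the table and are left out here.
ADVERTISE = {
    (10, "half"): 0x01,
    (10, "full"): 0x02,
    (100, "half"): 0x04,
    (100, "full"): 0x08,
    (1000, "full"): 0x20,
}


def autoneg_mask(*modes):
    """OR together the bits for each (speed, duplex) pair to advertise."""
    mask = 0
    for mode in modes:
        mask |= ADVERTISE[mode]
    return mask


def describe(mask):
    """List the speed/duplex combinations selected by an AutoNeg value."""
    return [
        f"{speed} {duplex}"
        for (speed, duplex), bit in ADVERTISE.items()
        if mask & bit
    ]


if __name__ == "__main__":
    mask = autoneg_mask((10, "half"), (100, "half"))
    print(f"modprobe e1000 AutoNeg={mask:#04x}")
    print(describe(0x03))
```

The value produced for 10 Half plus 100 Half matches the AutoNeg=0x05 example listed above, and the value for 1000 Full matches AutoNeg=32.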
334 | 334 | ||
335 | Additional Configurations | 335 | Additional Configurations |
336 | ========================= | 336 | ========================= |
337 | 337 | ||
338 | Configuring the Driver on Different Distributions | 338 | Configuring the Driver on Different Distributions |
339 | ------------------------------------------------- | 339 | ------------------------------------------------- |
340 | 340 | ||
341 | Configuring a network driver to load properly when the system is started | 341 | Configuring a network driver to load properly when the system is started |
342 | is distribution dependent. Typically, the configuration process involves | 342 | is distribution dependent. Typically, the configuration process involves |
343 | adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well | 343 | adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well |
344 | as editing other system startup scripts and/or configuration files. Many | 344 | as editing other system startup scripts and/or configuration files. Many |
345 | popular Linux distributions ship with tools to make these changes for you. | 345 | popular Linux distributions ship with tools to make these changes for you. |
346 | To learn the proper way to configure a network device for your system, | 346 | To learn the proper way to configure a network device for your system, |
347 | refer to your distribution documentation. If during this process you are | 347 | refer to your distribution documentation. If during this process you are |
348 | asked for the driver or module name, the name for the Linux Base Driver | 348 | asked for the driver or module name, the name for the Linux Base Driver |
349 | for the Intel PRO/1000 Family of Adapters is e1000. | 349 | for the Intel PRO/1000 Family of Adapters is e1000. |
350 | 350 | ||
351 | As an example, if you install the e1000 driver for two PRO/1000 adapters | 351 | As an example, if you install the e1000 driver for two PRO/1000 adapters |
352 | (eth0 and eth1) and set the speed and duplex to 10full and 100half, add | 352 | (eth0 and eth1) and set the speed and duplex to 10full and 100half, add |
353 | the following to modules.conf or or modprobe.conf: | 353 | the following to modules.conf or modprobe.conf: |
354 | 354 | ||
355 | alias eth0 e1000 | 355 | alias eth0 e1000 |
356 | alias eth1 e1000 | 356 | alias eth1 e1000 |
357 | options e1000 Speed=10,100 Duplex=2,1 | 357 | options e1000 Speed=10,100 Duplex=2,1 |
358 | 358 | ||
359 | Viewing Link Messages | 359 | Viewing Link Messages |
360 | --------------------- | 360 | --------------------- |
361 | 361 | ||
362 | Link messages will not be displayed to the console if the distribution is | 362 | Link messages will not be displayed to the console if the distribution is |
363 | restricting system messages. In order to see network driver link messages | 363 | restricting system messages. In order to see network driver link messages |
364 | on your console, set dmesg to eight by entering the following: | 364 | on your console, set dmesg to eight by entering the following: |
365 | 365 | ||
366 | dmesg -n 8 | 366 | dmesg -n 8 |
367 | 367 | ||
368 | NOTE: This setting is not saved across reboots. | 368 | NOTE: This setting is not saved across reboots. |
369 | 369 | ||
370 | Jumbo Frames | 370 | Jumbo Frames |
371 | ------------ | 371 | ------------ |
372 | 372 | ||
373 | The driver supports Jumbo Frames for all adapters except 82542 and | 373 | The driver supports Jumbo Frames for all adapters except 82542 and |
374 | 82573-based adapters. Jumbo Frames support is enabled by changing the | 374 | 82573-based adapters. Jumbo Frames support is enabled by changing the |
375 | MTU to a value larger than the default of 1500. Use the ifconfig command | 375 | MTU to a value larger than the default of 1500. Use the ifconfig command |
376 | to increase the MTU size. For example: | 376 | to increase the MTU size. For example: |
377 | 377 | ||
378 | ifconfig eth<x> mtu 9000 up | 378 | ifconfig eth<x> mtu 9000 up |
379 | 379 | ||
380 | This setting is not saved across reboots. It can be made permanent if | 380 | This setting is not saved across reboots. It can be made permanent if |
381 | you add: | 381 | you add: |
382 | 382 | ||
383 | MTU=9000 | 383 | MTU=9000 |
384 | 384 | ||
385 | to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example | 385 | to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example |
386 | applies to the Red Hat distributions; other distributions may store this | 386 | applies to the Red Hat distributions; other distributions may store this |
387 | setting in a different location. | 387 | setting in a different location. |
388 | 388 | ||
389 | Notes: | 389 | Notes: |
390 | 390 | ||
391 | - To enable Jumbo Frames, increase the MTU size on the interface beyond | 391 | - To enable Jumbo Frames, increase the MTU size on the interface beyond |
392 | 1500. | 392 | 1500. |
393 | - The maximum MTU setting for Jumbo Frames is 16110. This value coincides | 393 | - The maximum MTU setting for Jumbo Frames is 16110. This value coincides |
394 | with the maximum Jumbo Frames size of 16128. | 394 | with the maximum Jumbo Frames size of 16128. |
395 | - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or | 395 | - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or |
396 | loss of link. | 396 | loss of link. |
397 | - Some Intel gigabit adapters that support Jumbo Frames have a frame size | 397 | - Some Intel gigabit adapters that support Jumbo Frames have a frame size |
398 | limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes. | 398 | limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes. |
399 | The adapters with this limitation are based on the Intel 82571EB and | 399 | The adapters with this limitation are based on the Intel 82571EB and |
400 | 82572EI controllers, which correspond to these product names: | 400 | 82572EI controllers, which correspond to these product names: |
401 | Intel® PRO/1000 PT Dual Port Server Adapter | 401 | Intel® PRO/1000 PT Dual Port Server Adapter |
402 | Intel® PRO/1000 PF Dual Port Server Adapter | 402 | Intel® PRO/1000 PF Dual Port Server Adapter |
403 | Intel® PRO/1000 PT Server Adapter | 403 | Intel® PRO/1000 PT Server Adapter |
404 | Intel® PRO/1000 PT Desktop Adapter | 404 | Intel® PRO/1000 PT Desktop Adapter |
405 | Intel® PRO/1000 PF Server Adapter | 405 | Intel® PRO/1000 PF Server Adapter |
406 | 406 | ||
407 | - The Intel PRO/1000 PM Network Connection does not support jumbo frames. | 407 | - The Intel PRO/1000 PM Network Connection does not support jumbo frames. |
408 | 408 | ||
409 | 409 | ||
410 | Ethtool | 410 | Ethtool |
411 | ------- | 411 | ------- |
412 | 412 | ||
413 | The driver utilizes the ethtool interface for driver configuration and | 413 | The driver utilizes the ethtool interface for driver configuration and |
414 | diagnostics, as well as displaying statistical information. Ethtool | 414 | diagnostics, as well as displaying statistical information. Ethtool |
415 | version 1.6 or later is required for this functionality. | 415 | version 1.6 or later is required for this functionality. |
416 | 416 | ||
417 | The latest release of ethtool can be found from | 417 | The latest release of ethtool can be found from |
418 | http://sourceforge.net/projects/gkernel. | 418 | http://sourceforge.net/projects/gkernel. |
419 | 419 | ||
420 | NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support | 420 | NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support |
421 | for a more complete ethtool feature set can be enabled by upgrading | 421 | for a more complete ethtool feature set can be enabled by upgrading |
422 | ethtool to ethtool-1.8.1. | 422 | ethtool to ethtool-1.8.1. |
423 | 423 | ||
424 | Enabling Wake on LAN* (WoL) | 424 | Enabling Wake on LAN* (WoL) |
425 | --------------------------- | 425 | --------------------------- |
426 | 426 | ||
427 | WoL is configured through the Ethtool* utility. Ethtool is included with | 427 | WoL is configured through the Ethtool* utility. Ethtool is included with |
428 | all versions of Red Hat after Red Hat 7.2. For other Linux distributions, | 428 | all versions of Red Hat after Red Hat 7.2. For other Linux distributions, |
429 | download and install Ethtool from the following website: | 429 | download and install Ethtool from the following website: |
430 | http://sourceforge.net/projects/gkernel. | 430 | http://sourceforge.net/projects/gkernel. |
431 | 431 | ||
432 | For instructions on enabling WoL with Ethtool, refer to the website listed | 432 | For instructions on enabling WoL with Ethtool, refer to the website listed |
433 | above. | 433 | above. |
434 | 434 | ||
435 | WoL will be enabled on the system during the next shut down or reboot. | 435 | WoL will be enabled on the system during the next shut down or reboot. |
436 | For this driver version, in order to enable WoL, the e1000 driver must be | 436 | For this driver version, in order to enable WoL, the e1000 driver must be |
437 | loaded when shutting down or rebooting the system. | 437 | loaded when shutting down or rebooting the system. |
438 | 438 | ||
439 | NAPI | 439 | NAPI |
440 | ---- | 440 | ---- |
441 | 441 | ||
442 | NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled | 442 | NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled |
443 | or disabled based on the configuration of the kernel. To override | 443 | or disabled based on the configuration of the kernel. To override |
444 | the default, use the following compile-time flags. | 444 | the default, use the following compile-time flags. |
445 | 445 | ||
446 | To enable NAPI, compile the driver module, passing in a configuration option: | 446 | To enable NAPI, compile the driver module, passing in a configuration option: |
447 | 447 | ||
448 | make CFLAGS_EXTRA=-DE1000_NAPI install | 448 | make CFLAGS_EXTRA=-DE1000_NAPI install |
449 | 449 | ||
450 | To disable NAPI, compile the driver module, passing in a configuration option: | 450 | To disable NAPI, compile the driver module, passing in a configuration option: |
451 | 451 | ||
452 | make CFLAGS_EXTRA=-DE1000_NO_NAPI install | 452 | make CFLAGS_EXTRA=-DE1000_NO_NAPI install |
453 | 453 | ||
454 | See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. | 454 | See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. |
455 | 455 | ||
456 | 456 | ||
457 | Known Issues | 457 | Known Issues |
458 | ============ | 458 | ============ |
459 | 459 | ||
460 | Jumbo Frames System Requirement | 460 | Jumbo Frames System Requirement |
461 | ------------------------------- | 461 | ------------------------------- |
462 | 462 | ||
463 | Memory allocation failures have been observed on Linux systems with 64 MB | 463 | Memory allocation failures have been observed on Linux systems with 64 MB |
464 | of RAM or less that are running Jumbo Frames. If you are using Jumbo | 464 | of RAM or less that are running Jumbo Frames. If you are using Jumbo |
465 | Frames, your system may require more than the advertised minimum | 465 | Frames, your system may require more than the advertised minimum |
466 | requirement of 64 MB of system memory. | 466 | requirement of 64 MB of system memory. |
467 | 467 | ||
468 | Performance Degradation with Jumbo Frames | 468 | Performance Degradation with Jumbo Frames |
469 | ----------------------------------------- | 469 | ----------------------------------------- |
470 | 470 | ||
471 | Degradation in throughput performance may be observed in some Jumbo frames | 471 | Degradation in throughput performance may be observed in some Jumbo frames |
472 | environments. If this is observed, increasing the application's socket | 472 | environments. If this is observed, increasing the application's socket |
473 | buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values | 473 | buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values |
474 | may help. See the specific application manual and | 474 | may help. See the specific application manual and |
475 | /usr/src/linux*/Documentation/ | 475 | /usr/src/linux*/Documentation/ |
476 | networking/ip-sysctl.txt for more details. | 476 | networking/ip-sysctl.txt for more details. |
477 | 477 | ||
478 | Jumbo frames on Foundry BigIron 8000 switch | 478 | Jumbo frames on Foundry BigIron 8000 switch |
479 | ------------------------------------------- | 479 | ------------------------------------------- |
480 | There is a known issue using Jumbo frames when connected to a Foundry | 480 | There is a known issue using Jumbo frames when connected to a Foundry |
481 | BigIron 8000 switch. This is a 3rd party limitation. If you experience | 481 | BigIron 8000 switch. This is a 3rd party limitation. If you experience |
482 | loss of packets, lower the MTU size. | 482 | loss of packets, lower the MTU size. |
483 | 483 | ||
484 | Multiple Interfaces on Same Ethernet Broadcast Network | 484 | Multiple Interfaces on Same Ethernet Broadcast Network |
485 | ------------------------------------------------------ | 485 | ------------------------------------------------------ |
486 | 486 | ||
487 | Due to the default ARP behavior on Linux, it is not possible to have | 487 | Due to the default ARP behavior on Linux, it is not possible to have |
488 | one system on two IP networks in the same Ethernet broadcast domain | 488 | one system on two IP networks in the same Ethernet broadcast domain |
489 | (non-partitioned switch) behave as expected. All Ethernet interfaces | 489 | (non-partitioned switch) behave as expected. All Ethernet interfaces |
490 | will respond to IP traffic for any IP address assigned to the system. | 490 | will respond to IP traffic for any IP address assigned to the system. |
491 | This results in unbalanced receive traffic. | 491 | This results in unbalanced receive traffic. |
492 | 492 | ||
493 | If you have multiple interfaces in a server, either turn on ARP | 493 | If you have multiple interfaces in a server, either turn on ARP |
494 | filtering by entering: | 494 | filtering by entering: |
495 | 495 | ||
496 | echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter | 496 | echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter |
497 | (this only works if your kernel's version is higher than 2.4.5), | 497 | (this only works if your kernel's version is higher than 2.4.5), |
498 | 498 | ||
499 | NOTE: This setting is not saved across reboots. The configuration | 499 | NOTE: This setting is not saved across reboots. The configuration |
500 | change can be made permanent by adding the line: | 500 | change can be made permanent by adding the line: |
501 | net.ipv4.conf.all.arp_filter = 1 | 501 | net.ipv4.conf.all.arp_filter = 1 |
502 | to the file /etc/sysctl.conf | 502 | to the file /etc/sysctl.conf |
503 | 503 | ||
504 | or, | 504 | or, |
505 | 505 | ||
506 | install the interfaces in separate broadcast domains (either in | 506 | install the interfaces in separate broadcast domains (either in |
507 | different switches or in a switch partitioned to VLANs). | 507 | different switches or in a switch partitioned to VLANs). |
508 | 508 | ||
509 | 82541/82547 can't link or are slow to link with some link partners | 509 | 82541/82547 can't link or are slow to link with some link partners |
510 | ----------------------------------------------------------------- | 510 | ----------------------------------------------------------------- |
511 | 511 | ||
512 | There is a known compatibility issue with 82541/82547 and some | 512 | There is a known compatibility issue with 82541/82547 and some |
513 | low-end switches where the link will not be established, or will | 513 | low-end switches where the link will not be established, or will |
514 | be slow to establish. In particular, these switches are known to | 514 | be slow to establish. In particular, these switches are known to |
515 | be incompatible with 82541/82547: | 515 | be incompatible with 82541/82547: |
516 | 516 | ||
517 | Planex FXG-08TE | 517 | Planex FXG-08TE |
518 | I-O Data ETG-SH8 | 518 | I-O Data ETG-SH8 |
519 | 519 | ||
520 | To workaround this issue, the driver can be compiled with an override | 520 | To workaround this issue, the driver can be compiled with an override |
521 | of the PHY's master/slave setting. Forcing master or forcing slave | 521 | of the PHY's master/slave setting. Forcing master or forcing slave |
522 | mode will improve time-to-link. | 522 | mode will improve time-to-link. |
523 | 523 | ||
524 | # make EXTRA_CFLAGS=-DE1000_MASTER_SLAVE=<n> | 524 | # make EXTRA_CFLAGS=-DE1000_MASTER_SLAVE=<n> |
525 | 525 | ||
526 | Where <n> is: | 526 | Where <n> is: |
527 | 527 | ||
528 | 0 = Hardware default | 528 | 0 = Hardware default |
529 | 1 = Master mode | 529 | 1 = Master mode |
530 | 2 = Slave mode | 530 | 2 = Slave mode |
531 | 3 = Auto master/slave | 531 | 3 = Auto master/slave |
532 | 532 | ||
533 | Disable rx flow control with ethtool | 533 | Disable rx flow control with ethtool |
534 | ------------------------------------ | 534 | ------------------------------------ |
535 | 535 | ||
536 | In order to disable receive flow control using ethtool, you must turn | 536 | In order to disable receive flow control using ethtool, you must turn |
537 | off auto-negotiation on the same command line. | 537 | off auto-negotiation on the same command line. |
538 | 538 | ||
539 | For example: | 539 | For example: |
540 | 540 | ||
541 | ethtool -A eth? autoneg off rx off | 541 | ethtool -A eth? autoneg off rx off |
542 | 542 | ||
543 | 543 | ||
544 | Support | 544 | Support |
545 | ======= | 545 | ======= |
546 | 546 | ||
547 | For general information, go to the Intel support website at: | 547 | For general information, go to the Intel support website at: |
548 | 548 | ||
549 | http://support.intel.com | 549 | http://support.intel.com |
550 | 550 | ||
551 | or the Intel Wired Networking project hosted by Sourceforge at: | 551 | or the Intel Wired Networking project hosted by Sourceforge at: |
552 | 552 | ||
553 | http://sourceforge.net/projects/e1000 | 553 | http://sourceforge.net/projects/e1000 |
554 | 554 | ||
555 | If an issue is identified with the released source code on the supported | 555 | If an issue is identified with the released source code on the supported |
556 | kernel with a supported adapter, email the specific information related | 556 | kernel with a supported adapter, email the specific information related |
557 | to the issue to e1000-devel@lists.sourceforge.net | 557 | to the issue to e1000-devel@lists.sourceforge.net |
558 | 558 | ||
559 | 559 | ||
560 | License | 560 | License |
561 | ======= | 561 | ======= |
562 | 562 | ||
563 | This software program is released under the terms of a license agreement | 563 | This software program is released under the terms of a license agreement |
564 | between you ('Licensee') and Intel. Do not use or load this software or any | 564 | between you ('Licensee') and Intel. Do not use or load this software or any |
565 | associated materials (collectively, the 'Software') until you have carefully | 565 | associated materials (collectively, the 'Software') until you have carefully |
566 | read the full terms and conditions of the file COPYING located in this software | 566 | read the full terms and conditions of the file COPYING located in this software |
567 | package. By loading or using the Software, you agree to the terms of this | 567 | package. By loading or using the Software, you agree to the terms of this |
568 | Agreement. If you do not agree with the terms of this Agreement, do not | 568 | Agreement. If you do not agree with the terms of this Agreement, do not |
569 | install or use the Software. | 569 | install or use the Software. |
570 | 570 | ||
571 | * Other names and brands may be claimed as the property of others. | 571 | * Other names and brands may be claimed as the property of others. |
572 | 572 |
Documentation/networking/s2io.txt
1 | Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver. | 1 | Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver. |
2 | 2 | ||
3 | Contents | 3 | Contents |
4 | ======= | 4 | ======= |
5 | - 1. Introduction | 5 | - 1. Introduction |
6 | - 2. Identifying the adapter/interface | 6 | - 2. Identifying the adapter/interface |
7 | - 3. Features supported | 7 | - 3. Features supported |
8 | - 4. Command line parameters | 8 | - 4. Command line parameters |
9 | - 5. Performance suggestions | 9 | - 5. Performance suggestions |
10 | - 6. Available Downloads | 10 | - 6. Available Downloads |
11 | 11 | ||
12 | 12 | ||
13 | 1. Introduction: | 13 | 1. Introduction: |
14 | This Linux driver supports Neterion's Xframe I PCI-X 1.0 and | 14 | This Linux driver supports Neterion's Xframe I PCI-X 1.0 and |
15 | Xframe II PCI-X 2.0 adapters. It supports several features | 15 | Xframe II PCI-X 2.0 adapters. It supports several features |
16 | such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on. | 16 | such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on. |
17 | See below for complete list of features. | 17 | See below for complete list of features. |
18 | All features are supported for both IPv4 and IPv6. | 18 | All features are supported for both IPv4 and IPv6. |
19 | 19 | ||
20 | 2. Identifying the adapter/interface: | 20 | 2. Identifying the adapter/interface: |
21 | a. Insert the adapter(s) in your system. | 21 | a. Insert the adapter(s) in your system. |
22 | b. Build and load driver | 22 | b. Build and load driver |
23 | # insmod s2io.ko | 23 | # insmod s2io.ko |
24 | c. View log messages | 24 | c. View log messages |
25 | # dmesg | tail -40 | 25 | # dmesg | tail -40 |
26 | You will see messages similar to: | 26 | You will see messages similar to: |
27 | eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA | 27 | eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA |
28 | eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA | 28 | eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA |
29 | eth4: Device is on 64 bit 133MHz PCIX(M1) bus | 29 | eth4: Device is on 64 bit 133MHz PCIX(M1) bus |
30 | 30 | ||
31 | The above messages identify the adapter type(Xframe I/II), adapter revision, | 31 | The above messages identify the adapter type(Xframe I/II), adapter revision, |
32 | driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X). | 32 | driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X). |
33 | In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed | 33 | In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed |
34 | as well. | 34 | as well. |
35 | 35 | ||
36 | To associate an interface with a physical adapter use "ethtool -p <ethX>". | 36 | To associate an interface with a physical adapter use "ethtool -p <ethX>". |
37 | The corresponding adapter's LED will blink multiple times. | 37 | The corresponding adapter's LED will blink multiple times. |
38 | 38 | ||
39 | 3. Features supported: | 39 | 3. Features supported: |
40 | a. Jumbo frames. Xframe I/II supports MTU upto 9600 bytes, | 40 | a. Jumbo frames. Xframe I/II supports MTU upto 9600 bytes, |
41 | modifiable using ifconfig command. | 41 | modifiable using ifconfig command. |
42 | 42 | ||
43 | b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit | 43 | b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit |
44 | and receive, TSO. | 44 | and receive, TSO. |
45 | 45 | ||
46 | c. Multi-buffer receive mode. Scattering of packet across multiple | 46 | c. Multi-buffer receive mode. Scattering of packet across multiple |
47 | buffers. Currently driver supports 2-buffer mode which yields | 47 | buffers. Currently driver supports 2-buffer mode which yields |
48 | significant performance improvement on certain platforms(SGI Altix, | 48 | significant performance improvement on certain platforms(SGI Altix, |
49 | IBM xSeries). | 49 | IBM xSeries). |
50 | 50 | ||
51 | d. MSI/MSI-X. Can be enabled on platforms which support this feature | 51 | d. MSI/MSI-X. Can be enabled on platforms which support this feature |
52 | (IA64, Xeon) resulting in noticeable performance improvement(upto 7% | 52 | (IA64, Xeon) resulting in noticeable performance improvement(upto 7% |
53 | on certain platforms). | 53 | on certain platforms). |
54 | 54 | ||
55 | e. NAPI. Compile-time option(CONFIG_S2IO_NAPI) for better Rx interrupt | 55 | e. NAPI. Compile-time option(CONFIG_S2IO_NAPI) for better Rx interrupt |
56 | moderation. | 56 | moderation. |
57 | 57 | ||
58 | f. Statistics. Comprehensive MAC-level and software statistics displayed | 58 | f. Statistics. Comprehensive MAC-level and software statistics displayed |
59 | using "ethtool -S" option. | 59 | using "ethtool -S" option. |
60 | 60 | ||
61 | g. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, | 61 | g. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, |
62 | with multiple steering options. | 62 | with multiple steering options. |
63 | 63 | ||
64 | 4. Command line parameters | 64 | 4. Command line parameters |
65 | a. tx_fifo_num | 65 | a. tx_fifo_num |
66 | Number of transmit queues | 66 | Number of transmit queues |
67 | Valid range: 1-8 | 67 | Valid range: 1-8 |
68 | Default: 1 | 68 | Default: 1 |
69 | 69 | ||
70 | b. rx_ring_num | 70 | b. rx_ring_num |
71 | Number of receive rings | 71 | Number of receive rings |
72 | Valid range: 1-8 | 72 | Valid range: 1-8 |
73 | Default: 1 | 73 | Default: 1 |
74 | 74 | ||
75 | c. tx_fifo_len | 75 | c. tx_fifo_len |
76 | Size of each transmit queue | 76 | Size of each transmit queue |
77 | Valid range: Total length of all queues should not exceed 8192 | 77 | Valid range: Total length of all queues should not exceed 8192 |
78 | Default: 4096 | 78 | Default: 4096 |
79 | 79 | ||
80 | d. rx_ring_sz | 80 | d. rx_ring_sz |
81 | Size of each receive ring(in 4K blocks) | 81 | Size of each receive ring(in 4K blocks) |
82 | Valid range: Limited by memory on system | 82 | Valid range: Limited by memory on system |
83 | Default: 30 | 83 | Default: 30 |
84 | 84 | ||
85 | e. intr_type | 85 | e. intr_type |
86 | Specifies interrupt type. Possible values 1(INTA), 2(MSI), 3(MSI-X) | 86 | Specifies interrupt type. Possible values 1(INTA), 2(MSI), 3(MSI-X) |
87 | Valid range: 1-3 | 87 | Valid range: 1-3 |
88 | Default: 1 | 88 | Default: 1 |
89 | 89 | ||
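The valid ranges listed in section 4 above can be checked before the module is loaded. The following is a minimal Python sketch, not part of the driver or of this commit. The function name check_s2io_params is hypothetical, and treating tx_fifo_len as one entry per transmit queue is an assumption made for illustration; only the ranges themselves come from the text.

```python
def check_s2io_params(tx_fifo_num=1, rx_ring_num=1, tx_fifo_len=(4096,), intr_type=1):
    """Return a list of problems; an empty list means every value is in range.

    tx_fifo_len is taken as one length per transmit queue (an assumption).
    """
    problems = []
    if not 1 <= tx_fifo_num <= 8:
        problems.append("tx_fifo_num must be 1-8")
    if not 1 <= rx_ring_num <= 8:
        problems.append("rx_ring_num must be 1-8")
    if len(tx_fifo_len) != tx_fifo_num:
        problems.append("need one tx_fifo_len entry per transmit queue")
    # The text limits the total length of all queues, not each queue.
    if sum(tx_fifo_len) > 8192:
        problems.append("total tx_fifo_len must not exceed 8192")
    if intr_type not in (1, 2, 3):
        problems.append("intr_type must be 1 (INTA), 2 (MSI) or 3 (MSI-X)")
    return problems


if __name__ == "__main__":
    # Two queues of 4096 entries each reach the 8192 limit exactly.
    print(check_s2io_params(tx_fifo_num=2, tx_fifo_len=(4096, 4096)))
```

With the defaults (one queue of 4096, one ring, INTA) the function reports no problems.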
90 | 5. Performance suggestions | 90 | 5. Performance suggestions |
91 | General: | 91 | General: |
92 | a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration) | 92 | a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration) |
93 | b. Set TCP window size to optimal value. | 93 | b. Set TCP window size to optimal value. |
94 | For instance, for MTU=1500 a value of 210K has been observed to result in | 94 | For instance, for MTU=1500 a value of 210K has been observed to result in |
95 | good performance. | 95 | good performance. |
96 | # sysctl -w net.ipv4.tcp_rmem="210000 210000 210000" | 96 | # sysctl -w net.ipv4.tcp_rmem="210000 210000 210000" |
97 | # sysctl -w net.ipv4.tcp_wmem="210000 210000 210000" | 97 | # sysctl -w net.ipv4.tcp_wmem="210000 210000 210000" |
98 | For MTU=9000, TCP window size of 10 MB is recommended. | 98 | For MTU=9000, TCP window size of 10 MB is recommended. |
99 | # sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000" | 99 | # sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000" |
100 | # sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000" | 100 | # sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000" |
101 | 101 | ||
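The sysctl lines in the General list above all follow one pattern: the same byte count repeated three times for both tcp_rmem and tcp_wmem. As an illustration only, the Python sketch below generates those command strings from a window size. The helper name tcp_window_sysctls and the MTU lookup table are hypothetical; the two window sizes are the ones quoted in the text.

```python
# Window sizes quoted in the text, keyed by MTU.
SUGGESTED_WINDOW = {1500: 210000, 9000: 10000000}


def tcp_window_sysctls(window_bytes):
    """Return the two sysctl command lines for a given TCP window size."""
    triple = " ".join([str(window_bytes)] * 3)  # min, default, max all equal
    return [
        f'sysctl -w net.ipv4.tcp_rmem="{triple}"',
        f'sysctl -w net.ipv4.tcp_wmem="{triple}"',
    ]


if __name__ == "__main__":
    for line in tcp_window_sysctls(SUGGESTED_WINDOW[9000]):
        print(line)
```

The sketch only prints the commands; running them still requires root, as in the text.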
102 | Transmit performance: | 102 | Transmit performance: |
103 | a. By default, the driver respects BIOS settings for PCI bus parameters. | 103 | a. By default, the driver respects BIOS settings for PCI bus parameters. |
104 | However, you may want to experiment with PCI bus parameters | 104 | However, you may want to experiment with PCI bus parameters |
105 | max-split-transactions(MOST) and MMRBC (use setpci command). | 105 | max-split-transactions(MOST) and MMRBC (use setpci command). |
106 | A MOST value of 2 has been found optimal for Opterons and 3 for Itanium. | 106 | A MOST value of 2 has been found optimal for Opterons and 3 for Itanium. |
107 | It could be different for your hardware. | 107 | It could be different for your hardware. |
108 | Set MMRBC to 4K**. | 108 | Set MMRBC to 4K**. |
109 | 109 | ||
110 | For example you can set | 110 | For example you can set |
111 | For opteron | 111 | For opteron |
112 | #setpci -d 17d5:* 62=1d | 112 | #setpci -d 17d5:* 62=1d |
113 | For Itanium | 113 | For Itanium |
114 | #setpci -d 17d5:* 62=3d | 114 | #setpci -d 17d5:* 62=3d |
115 | 115 | ||
116 | For detailed description of the PCI registers, please see Xframe User Guide. | 116 | For detailed description of the PCI registers, please see Xframe User Guide. |
117 | 117 | ||
118 | b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this | 118 | b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this |
119 | parameter. | 119 | parameter. |
120 | c. Turn on TSO(using "ethtool -K") | 120 | c. Turn on TSO(using "ethtool -K") |
121 | # ethtool -K <ethX> tso on | 121 | # ethtool -K <ethX> tso on |
122 | 122 | ||
123 | Receive performance: | 123 | Receive performance: |
124 | a. By default, the driver respects BIOS settings for PCI bus parameters. | 124 | a. By default, the driver respects BIOS settings for PCI bus parameters. |
125 | However, you may want to set PCI latency timer to 248. | 125 | However, you may want to set PCI latency timer to 248. |
126 | #setpci -d 17d5:* LATENCY_TIMER=f8 | 126 | #setpci -d 17d5:* LATENCY_TIMER=f8 |
127 | For detailed description of the PCI registers, please see Xframe User Guide. | 127 | For detailed description of the PCI registers, please see Xframe User Guide. |
128 | b. Use 2-buffer mode. This results in large performance boost on | 128 | b. Use 2-buffer mode. This results in large performance boost on |
129 | on certain platforms(eg. SGI Altix, IBM xSeries). | 129 | certain platforms(eg. SGI Altix, IBM xSeries). |
130 | c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to | 130 | c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to |
131 | set/verify this option. | 131 | set/verify this option. |
132 | d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network | 132 | d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network |
133 | device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to | 133 | device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to |
134 | bring down CPU utilization. | 134 | bring down CPU utilization. |
135 | 135 | ||
136 | ** For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are | 136 | ** For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are |
137 | recommended as safe parameters. | 137 | recommended as safe parameters. |
138 | For more information, please review the AMD8131 errata at | 138 | For more information, please review the AMD8131 errata at |
139 | http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26310.pdf | 139 | http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26310.pdf |
140 | 140 | ||
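The setpci values in section 5 above are easier to read once split into fields. The Python sketch below rests on an assumption that is not stated in the text: that configuration offset 0x62 on this adapter is the PCI-X command register with the standard layout (MMRBC in bits 3:2, MOST in bits 6:4). The Xframe User Guide remains the authority on the actual register map, and both function names are hypothetical.

```python
def decode_pcix_command(value):
    """Split a PCI-X command register value into its raw fields.

    Assumes the standard PCI-X layout: bits 3:2 = MMRBC, bits 6:4 = MOST.
    """
    mmrbc_field = (value >> 2) & 0x3
    return {
        "mmrbc_field": mmrbc_field,
        "mmrbc_bytes": 512 << mmrbc_field,  # field 0..3 -> 512..4096 bytes
        "most_field": (value >> 4) & 0x7,   # raw encoded field, not a count
    }


def latency_timer_hex(value):
    """Format a PCI latency timer value (0-255) the way setpci expects it."""
    if not 0 <= value <= 255:
        raise ValueError("latency timer is a single byte")
    return format(value, "02x")


if __name__ == "__main__":
    print(decode_pcix_command(0x1D))  # value used in the Opteron example
    print(decode_pcix_command(0x3D))  # value used in the Itanium example
    print(latency_timer_hex(248))
```

Under that assumption, both example values decode to an MMRBC of 4096 bytes, which agrees with "Set MMRBC to 4K", and the decimal latency timer 248 corresponds to the f8 used in the setpci command.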
141 | 6. Available Downloads | 141 | 6. Available Downloads |
142 | Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up | 142 | Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up |
143 | to date, also the latest "s2io" code (including support for 2.4 kernels) is | 143 | to date, also the latest "s2io" code (including support for 2.4 kernels) is |
144 | available via "Support" link on the Neterion site: http://www.neterion.com. | 144 | available via "Support" link on the Neterion site: http://www.neterion.com. |
145 | 145 | ||
146 | For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com, | 146 | For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com, |
147 | user: linuxdocs password: HALdocs | 147 | user: linuxdocs password: HALdocs |
148 | 148 | ||
149 | 7. Support | 149 | 7. Support |
150 | For further support please contact either your 10GbE Xframe NIC vendor (IBM, | 150 | For further support please contact either your 10GbE Xframe NIC vendor (IBM, |
151 | HP, SGI etc.) or click on the "Support" link on the Neterion site: | 151 | HP, SGI etc.) or click on the "Support" link on the Neterion site: |
152 | http://www.neterion.com. | 152 | http://www.neterion.com. |
153 | 153 | ||
154 | 154 |
Documentation/networking/sk98lin.txt
1 | (C)Copyright 1999-2004 Marvell(R). | 1 | (C)Copyright 1999-2004 Marvell(R). |
2 | All rights reserved | 2 | All rights reserved |
3 | =========================================================================== | 3 | =========================================================================== |
4 | 4 | ||
5 | sk98lin.txt created 13-Feb-2004 | 5 | sk98lin.txt created 13-Feb-2004 |
6 | 6 | ||
7 | Readme File for sk98lin v6.23 | 7 | Readme File for sk98lin v6.23 |
8 | Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX | 8 | Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX |
9 | 9 | ||
10 | This file contains | 10 | This file contains |
11 | 1 Overview | 11 | 1 Overview |
12 | 2 Required Files | 12 | 2 Required Files |
13 | 3 Installation | 13 | 3 Installation |
14 | 3.1 Driver Installation | 14 | 3.1 Driver Installation |
15 | 3.2 Inclusion of adapter at system start | 15 | 3.2 Inclusion of adapter at system start |
16 | 4 Driver Parameters | 16 | 4 Driver Parameters |
17 | 4.1 Per-Port Parameters | 17 | 4.1 Per-Port Parameters |
18 | 4.2 Adapter Parameters | 18 | 4.2 Adapter Parameters |
19 | 5 Large Frame Support | 19 | 5 Large Frame Support |
20 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | 20 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) |
21 | 7 Troubleshooting | 21 | 7 Troubleshooting |
22 | 22 | ||
23 | =========================================================================== | 23 | =========================================================================== |
24 | 24 | ||
25 | 25 | ||
26 | 1 Overview | 26 | 1 Overview |
27 | =========== | 27 | =========== |
28 | 28 | ||
29 | The sk98lin driver supports the Marvell Yukon and SysKonnect | 29 | The sk98lin driver supports the Marvell Yukon and SysKonnect |
30 | SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has | 30 | SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has |
31 | been tested with Linux on Intel/x86 machines. | 31 | been tested with Linux on Intel/x86 machines. |
32 | *** | 32 | *** |
33 | 33 | ||
34 | 34 | ||
35 | 2 Required Files | 35 | 2 Required Files |
36 | ================= | 36 | ================= |
37 | 37 | ||
38 | The linux kernel source. | 38 | The linux kernel source. |
39 | No additional files required. | 39 | No additional files required. |
40 | *** | 40 | *** |
41 | 41 | ||
42 | 42 | ||
43 | 3 Installation | 43 | 3 Installation |
44 | =============== | 44 | =============== |
45 | 45 | ||
46 | It is recommended to download the latest version of the driver from the | 46 | It is recommended to download the latest version of the driver from the |
47 | SysKonnect web site www.syskonnect.com. If you have downloaded the latest | 47 | SysKonnect web site www.syskonnect.com. If you have downloaded the latest |
48 | driver, the Linux kernel has to be patched before the driver can be | 48 | driver, the Linux kernel has to be patched before the driver can be |
49 | installed. For details on how to patch a Linux kernel, refer to the | 49 | installed. For details on how to patch a Linux kernel, refer to the |
50 | patch.txt file. | 50 | patch.txt file. |
51 | 51 | ||
52 | 3.1 Driver Installation | 52 | 3.1 Driver Installation |
53 | ------------------------ | 53 | ------------------------ |
54 | 54 | ||
55 | The following steps describe the actions that are required to install | 55 | The following steps describe the actions that are required to install |
56 | the driver and to start it manually. These steps should be carried | 56 | the driver and to start it manually. These steps should be carried |
57 | out for the initial driver setup. Once confirmed to be ok, they can | 57 | out for the initial driver setup. Once confirmed to be ok, they can |
58 | be included in the system start. | 58 | be included in the system start. |
59 | 59 | ||
60 | NOTE 1: To perform the following tasks you need 'root' access. | 60 | NOTE 1: To perform the following tasks you need 'root' access. |
61 | 61 | ||
62 | NOTE 2: In case of problems, please read the section "Troubleshooting" | 62 | NOTE 2: In case of problems, please read the section "Troubleshooting" |
63 | below. | 63 | below. |
64 | 64 | ||
65 | The driver can either be integrated into the kernel or it can be compiled | 65 | The driver can either be integrated into the kernel or it can be compiled |
66 | as a module. Select the appropriate option during the kernel | 66 | as a module. Select the appropriate option during the kernel |
67 | configuration. | 67 | configuration. |
68 | 68 | ||
69 | Compile/use the driver as a module | 69 | Compile/use the driver as a module |
70 | ---------------------------------- | 70 | ---------------------------------- |
71 | To compile the driver, go to the directory /usr/src/linux and | 71 | To compile the driver, go to the directory /usr/src/linux and |
72 | execute the command "make menuconfig" or "make xconfig" and proceed as | 72 | execute the command "make menuconfig" or "make xconfig" and proceed as |
73 | follows: | 73 | follows: |
74 | 74 | ||
75 | To integrate the driver permanently into the kernel, proceed as follows: | 75 | To integrate the driver permanently into the kernel, proceed as follows: |
76 | 76 | ||
77 | 1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | 77 | 1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" |
78 | 2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | 78 | 2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" |
79 | with (*) | 79 | with (*) |
80 | 3. Build a new kernel when the configuration of the above options is | 80 | 3. Build a new kernel when the configuration of the above options is |
81 | finished. | 81 | finished. |
82 | 4. Install the new kernel. | 82 | 4. Install the new kernel. |
83 | 5. Reboot your system. | 83 | 5. Reboot your system. |
84 | 84 | ||
85 | To use the driver as a module, proceed as follows: | 85 | To use the driver as a module, proceed as follows: |
86 | 86 | ||
87 | 1. Enable 'loadable module support' in the kernel. | 87 | 1. Enable 'loadable module support' in the kernel. |
88 | 2. For automatic driver start, enable the 'Kernel module loader'. | 88 | 2. For automatic driver start, enable the 'Kernel module loader'. |
89 | 3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | 89 | 3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" |
90 | 4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | 90 | 4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" |
91 | with (M) | 91 | with (M) |
92 | 5. Execute the command "make modules". | 92 | 5. Execute the command "make modules". |
93 | 6. Execute the command "make modules_install". | 93 | 6. Execute the command "make modules_install". |
94 | The appropriate modules will be installed. | 94 | The appropriate modules will be installed. |
95 | 7. Reboot your system. | 95 | 7. Reboot your system. |
96 | 96 | ||
97 | 97 | ||
98 | Load the module manually | 98 | Load the module manually |
99 | ------------------------ | 99 | ------------------------ |
100 | To load the module manually, proceed as follows: | 100 | To load the module manually, proceed as follows: |
101 | 101 | ||
102 | 1. Enter "modprobe sk98lin". | 102 | 1. Enter "modprobe sk98lin". |
103 | 2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in | 103 | 2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in |
104 | your computer and you have a /proc file system, execute the command: | 104 | your computer and you have a /proc file system, execute the command: |
105 | "ls /proc/net/sk98lin/" | 105 | "ls /proc/net/sk98lin/" |
106 | This should produce an output containing a line with the following | 106 | This should produce an output containing a line with the following |
107 | format: | 107 | format: |
108 | eth0 eth1 ... | 108 | eth0 eth1 ... |
109 | which indicates that your adapter has been found and initialized. | 109 | which indicates that your adapter has been found and initialized. |
110 | 110 | ||
111 | NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx | 111 | NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx |
112 | adapter installed, the adapters will be listed as 'eth0', | 112 | adapter installed, the adapters will be listed as 'eth0', |
113 | 'eth1', 'eth2', etc. | 113 | 'eth1', 'eth2', etc. |
114 | For each adapter, repeat steps 3 and 4 below. | 114 | For each adapter, repeat steps 3 and 4 below. |
115 | 115 | ||
116 | NOTE 2: If you have other Ethernet adapters installed, your Marvell | 116 | NOTE 2: If you have other Ethernet adapters installed, your Marvell |
117 | Yukon or SysKonnect SK-98xx adapter will be mapped to the | 117 | Yukon or SysKonnect SK-98xx adapter will be mapped to the |
118 | next available number, e.g. 'eth1'. The mapping is executed | 118 | next available number, e.g. 'eth1'. The mapping is executed |
119 | automatically. | 119 | automatically. |
120 | The module installation message (displayed either in a system | 120 | The module installation message (displayed either in a system |
121 | log file or on the console) prints a line for each adapter | 121 | log file or on the console) prints a line for each adapter |
122 | found containing the corresponding 'ethX'. | 122 | found containing the corresponding 'ethX'. |
123 | 123 | ||
124 | 3. Select an IP address and assign it to the respective adapter by | 124 | 3. Select an IP address and assign it to the respective adapter by |
125 | entering: | 125 | entering: |
126 | ifconfig eth0 <ip-address> | 126 | ifconfig eth0 <ip-address> |
127 | With this command, the adapter is connected to the Ethernet. | 127 | With this command, the adapter is connected to the Ethernet. |
128 | 128 | ||
129 | SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter | 129 | SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter |
130 | is now active, the link status LED of the primary port is active and | 130 | is now active, the link status LED of the primary port is active and |
131 | the link status LED of the secondary port (on dual port adapters) is | 131 | the link status LED of the secondary port (on dual port adapters) is |
132 | blinking (if the ports are connected to a switch or hub). | 132 | blinking (if the ports are connected to a switch or hub). |
133 | SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. | 133 | SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. |
134 | In addition, you will receive a status message on the console stating | 134 | In addition, you will receive a status message on the console stating |
135 | "ethX: network connection up using port Y" and showing the selected | 135 | "ethX: network connection up using port Y" and showing the selected |
136 | connection parameters (x stands for the ethernet device number | 136 | connection parameters (x stands for the ethernet device number |
137 | (0,1,2, etc), y stands for the port name (A or B)). | 137 | (0,1,2, etc), y stands for the port name (A or B)). |
138 | 138 | ||
139 | NOTE: If you are in doubt about IP addresses, ask your network | 139 | NOTE: If you are in doubt about IP addresses, ask your network |
140 | administrator for assistance. | 140 | administrator for assistance. |
141 | 141 | ||
142 | 4. Your adapter should now be fully operational. | 142 | 4. Your adapter should now be fully operational. |
143 | Use 'ping <otherstation>' to verify the connection to other computers | 143 | Use 'ping <otherstation>' to verify the connection to other computers |
144 | on your network. | 144 | on your network. |
145 | 5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. | 145 | 5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. |
146 | For example by executing: | 146 | For example by executing: |
147 | "cat /proc/net/sk98lin/eth0" | 147 | "cat /proc/net/sk98lin/eth0" |
148 | 148 | ||
149 | Unload the module | 149 | Unload the module |
150 | ----------------- | 150 | ----------------- |
151 | To stop and unload the driver modules, proceed as follows: | 151 | To stop and unload the driver modules, proceed as follows: |
152 | 152 | ||
153 | 1. Execute the command "ifconfig eth0 down". | 153 | 1. Execute the command "ifconfig eth0 down". |
154 | 2. Execute the command "rmmod sk98lin". | 154 | 2. Execute the command "rmmod sk98lin". |
155 | 155 | ||
156 | 3.2 Inclusion of adapter at system start | 156 | 3.2 Inclusion of adapter at system start |
157 | ----------------------------------------- | 157 | ----------------------------------------- |
158 | 158 | ||
159 | Since a large number of different Linux distributions are | 159 | Since a large number of different Linux distributions are |
160 | available, we are unable to describe a general installation procedure | 160 | available, we are unable to describe a general installation procedure |
161 | for the driver module. | 161 | for the driver module. |
162 | Because the driver is now integrated in the kernel, installation should | 162 | Because the driver is now integrated in the kernel, installation should |
163 | be easy, using the standard mechanism of your distribution. | 163 | be easy, using the standard mechanism of your distribution. |
164 | Refer to the distribution's manual for installation of ethernet adapters. | 164 | Refer to the distribution's manual for installation of ethernet adapters. |
165 | 165 | ||
166 | *** | 166 | *** |
167 | 167 | ||
168 | 4 Driver Parameters | 168 | 4 Driver Parameters |
169 | ==================== | 169 | ==================== |
170 | 170 | ||
171 | Parameters can be set at the command line after the module has been | 171 | Parameters can be set at the command line after the module has been |
172 | loaded with the command 'modprobe'. | 172 | loaded with the command 'modprobe'. |
173 | In some distributions, the configuration tools are able to pass parameters | 173 | In some distributions, the configuration tools are able to pass parameters |
174 | to the driver module. | 174 | to the driver module. |
175 | 175 | ||
176 | If you use the kernel module loader, you can set driver parameters | 176 | If you use the kernel module loader, you can set driver parameters |
177 | in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). | 177 | in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). |
178 | To set the driver parameters in this file, proceed as follows: | 178 | To set the driver parameters in this file, proceed as follows: |
179 | 179 | ||
180 | 1. Insert a line of the form : | 180 | 1. Insert a line of the form : |
181 | options sk98lin ... | 181 | options sk98lin ... |
182 | For "...", the same syntax is required as described for the command | 182 | For "...", the same syntax is required as described for the command |
183 | line parameters of modprobe below. | 183 | line parameters of modprobe below. |
184 | 2. To activate the new parameters, either reboot your computer | 184 | 2. To activate the new parameters, either reboot your computer |
185 | or | 185 | or |
186 | unload and reload the driver. | 186 | unload and reload the driver. |
187 | The syntax of the driver parameters is: | 187 | The syntax of the driver parameters is: |
188 | 188 | ||
189 | modprobe sk98lin parameter=value1[,value2[,value3...]] | 189 | modprobe sk98lin parameter=value1[,value2[,value3...]] |
190 | 190 | ||
191 | where value1 refers to the first adapter, value2 to the second etc. | 191 | where value1 refers to the first adapter, value2 to the second etc. |
192 | 192 | ||
193 | NOTE: All parameters are case sensitive. Write them exactly as shown | 193 | NOTE: All parameters are case sensitive. Write them exactly as shown |
194 | below. | 194 | below. |
195 | 195 | ||
196 | Example: | 196 | Example: |
197 | Suppose you have two adapters. You want to set auto-negotiation | 197 | Suppose you have two adapters. You want to set auto-negotiation |
198 | on the first adapter to ON and on the second adapter to OFF. | 198 | on the first adapter to ON and on the second adapter to OFF. |
199 | You also want to set DuplexCapabilities on the first adapter | 199 | You also want to set DuplexCapabilities on the first adapter |
200 | to FULL, and on the second adapter to HALF. | 200 | to FULL, and on the second adapter to HALF. |
201 | Then, you must enter: | 201 | Then, you must enter: |
202 | 202 | ||
203 | modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half | 203 | modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half |
204 | 204 | ||
205 | NOTE: The number of adapters that can be configured this way is | 205 | NOTE: The number of adapters that can be configured this way is |
206 | limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). | 206 | limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). |
207 | The current limit is 16. If you happen to install | 207 | The current limit is 16. If you happen to install |
208 | more adapters, adjust this and recompile. | 208 | more adapters, adjust this and recompile. |
209 | 209 | ||
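The parameter syntax described above (one comma-separated value per adapter, in adapter order) can be sketched in Python as follows. This is an illustration, not part of the driver: the function names are hypothetical, and the limit of 16 is the SK_MAX_CARD_PARAM value quoted in the text.

```python
SK_MAX_CARD_PARAM = 16  # current limit quoted in the text


def sk98lin_arguments(params):
    """Build 'name=value1,value2,...' arguments.

    params maps a parameter name to a list holding one value per adapter.
    """
    args = []
    for name, values in params.items():
        if len(values) > SK_MAX_CARD_PARAM:
            raise ValueError(f"{name}: at most {SK_MAX_CARD_PARAM} adapters")
        args.append(name + "=" + ",".join(str(v) for v in values))
    return " ".join(args)


def sk98lin_modprobe(params):
    """Command line for loading the module by hand."""
    return "modprobe sk98lin " + sk98lin_arguments(params)


def sk98lin_options_line(params):
    """Equivalent line for /etc/modprobe.conf."""
    return "options sk98lin " + sk98lin_arguments(params)


if __name__ == "__main__":
    example = {"AutoNeg_A": ["On", "Off"], "DupCap_A": ["Full", "Half"]}
    print(sk98lin_modprobe(example))
    print(sk98lin_options_line(example))
```

The example dictionary reproduces the two-adapter case from the text. Parameter names are case sensitive, so they are passed through unchanged.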
210 | 210 | ||
211 | 4.1 Per-Port Parameters | 211 | 4.1 Per-Port Parameters |
212 | ------------------------ | 212 | ------------------------ |
213 | 213 | ||
214 | These settings are available for each port on the adapter. | 214 | These settings are available for each port on the adapter. |
215 | In the following description, '?' stands for the port for | 215 | In the following description, '?' stands for the port for |
216 | which you set the parameter (A or B). | 216 | which you set the parameter (A or B). |
217 | 217 | ||
218 | Speed | 218 | Speed |
219 | ----- | 219 | ----- |
220 | Parameter: Speed_? | 220 | Parameter: Speed_? |
221 | Values: 10, 100, 1000, Auto | 221 | Values: 10, 100, 1000, Auto |
222 | Default: Auto | 222 | Default: Auto |
223 | 223 | ||
224 | This parameter is used to set the speed capabilities. It is only valid | 224 | This parameter is used to set the speed capabilities. It is only valid |
225 | for the SK-98xx V2.0 copper adapters. | 225 | for the SK-98xx V2.0 copper adapters. |
226 | Usually, the speed is negotiated between the two ports during link | 226 | Usually, the speed is negotiated between the two ports during link |
227 | establishment. If this fails, a port can be forced to a specific setting | 227 | establishment. If this fails, a port can be forced to a specific setting |
228 | with this parameter. | 228 | with this parameter. |
229 | 229 | ||
230 | Auto-Negotiation | 230 | Auto-Negotiation |
231 | ---------------- | 231 | ---------------- |
232 | Parameter: AutoNeg_? | 232 | Parameter: AutoNeg_? |
233 | Values: On, Off, Sense | 233 | Values: On, Off, Sense |
234 | Default: On | 234 | Default: On |
235 | 235 | ||
236 | The "Sense"-mode automatically detects whether the link partner supports | 236 | The "Sense"-mode automatically detects whether the link partner supports |
237 | auto-negotiation or not. | 237 | auto-negotiation or not. |
238 | 238 | ||
239 | Duplex Capabilities | 239 | Duplex Capabilities |
240 | ------------------- | 240 | ------------------- |
241 | Parameter: DupCap_? | 241 | Parameter: DupCap_? |
242 | Values: Half, Full, Both | 242 | Values: Half, Full, Both |
243 | Default: Both | 243 | Default: Both |
244 | 244 | ||
245 | This parameter is only relevant if auto-negotiation for this port is | 245 | This parameter is only relevant if auto-negotiation for this port is |
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | 246 | not set to "Sense". If auto-negotiation is set to "On", all three values |
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | 247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. |
248 | This parameter is useful if your link partner does not support all | 248 | This parameter is useful if your link partner does not support all |
249 | possible combinations. | 249 | possible combinations. |
250 | 250 | ||
251 | Flow Control | 251 | Flow Control |
252 | ------------ | 252 | ------------ |
253 | Parameter: FlowCtrl_? | 253 | Parameter: FlowCtrl_? |
254 | Values: Sym, SymOrRem, LocSend, None | 254 | Values: Sym, SymOrRem, LocSend, None |
255 | Default: SymOrRem | 255 | Default: SymOrRem |
256 | 256 | ||
257 | This parameter can be used to set the flow control capabilities the | 257 | This parameter can be used to set the flow control capabilities the |
258 | port reports during auto-negotiation. It can be set for each port | 258 | port reports during auto-negotiation. It can be set for each port |
259 | individually. | 259 | individually. |
260 | Possible modes: | 260 | Possible modes: |
261 | -- Sym = Symmetric: both link partners are allowed to send | 261 | -- Sym = Symmetric: both link partners are allowed to send |
262 | PAUSE frames | 262 | PAUSE frames |
263 | -- SymOrRem = SymmetricOrRemote: both or only remote partner | 263 | -- SymOrRem = SymmetricOrRemote: both or only remote partner |
264 | are allowed to send PAUSE frames | 264 | are allowed to send PAUSE frames |
265 | -- LocSend = LocalSend: only local link partner is allowed | 265 | -- LocSend = LocalSend: only local link partner is allowed |
266 | to send PAUSE frames | 266 | to send PAUSE frames |
267 | -- None = no link partner is allowed to send PAUSE frames | 267 | -- None = no link partner is allowed to send PAUSE frames |
268 | 268 | ||
269 | NOTE: This parameter is ignored if auto-negotiation is set to "Off". | 269 | NOTE: This parameter is ignored if auto-negotiation is set to "Off". |
270 | 270 | ||
271 | Role in Master-Slave-Negotiation (1000Base-T only) | 271 | Role in Master-Slave-Negotiation (1000Base-T only) |
272 | -------------------------------------------------- | 272 | -------------------------------------------------- |
273 | Parameter: Role_? | 273 | Parameter: Role_? |
274 | Values: Auto, Master, Slave | 274 | Values: Auto, Master, Slave |
275 | Default: Auto | 275 | Default: Auto |
276 | 276 | ||
277 | This parameter is only valid for the SK-9821 and SK-9822 adapters. | 277 | This parameter is only valid for the SK-9821 and SK-9822 adapters. |
278 | For two 1000Base-T ports to communicate, one must take the role of the | 278 | For two 1000Base-T ports to communicate, one must take the role of the |
279 | master (providing timing information), while the other must be the | 279 | master (providing timing information), while the other must be the |
280 | slave. Usually, this is negotiated between the two ports during link | 280 | slave. Usually, this is negotiated between the two ports during link |
281 | establishment. If this fails, a port can be forced to a specific setting | 281 | establishment. If this fails, a port can be forced to a specific setting |
282 | with this parameter. | 282 | with this parameter. |
283 | 283 | ||
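The per-port rules in section 4.1 above interact: DupCap depends on AutoNeg, and FlowCtrl is ignored when auto-negotiation is off. The Python sketch below collects the allowed values and those interactions in one place. It is a hedged illustration: the function check_port is hypothetical, parameter names are written without the _A/_B port suffix, and the driver's own handling may differ in detail.

```python
ALLOWED = {
    "Speed": {"10", "100", "1000", "Auto"},
    "AutoNeg": {"On", "Off", "Sense"},
    "DupCap": {"Half", "Full", "Both"},
    "FlowCtrl": {"Sym", "SymOrRem", "LocSend", "None"},
    "Role": {"Auto", "Master", "Slave"},
}


def check_port(settings):
    """Return notes about one port's explicitly given settings."""
    notes = []
    for name, value in settings.items():
        if name not in ALLOWED:
            notes.append(f"unknown parameter {name}")
        elif value not in ALLOWED[name]:
            notes.append(f"{name}: invalid value {value}")
    autoneg = settings.get("AutoNeg", "On")  # "On" is the documented default
    if autoneg == "Off" and settings.get("DupCap") == "Both":
        notes.append('DupCap: only "Full" and "Half" are allowed when AutoNeg is Off')
    if autoneg == "Off" and "FlowCtrl" in settings:
        notes.append("FlowCtrl is ignored when AutoNeg is Off")
    if autoneg == "Sense" and "DupCap" in settings:
        notes.append("DupCap is not relevant when AutoNeg is Sense")
    return notes


if __name__ == "__main__":
    print(check_port({"AutoNeg": "Off", "DupCap": "Both", "FlowCtrl": "Sym"}))
```

Only settings that are given explicitly are checked, so an empty dictionary (all defaults) produces no notes.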
284 | 284 | ||
285 | 4.2 Adapter Parameters | 285 | 4.2 Adapter Parameters |
286 | ----------------------- | 286 | ----------------------- |
287 | 287 | ||
288 | Connection Type (SK-98xx V2.0 copper adapters only) | 288 | Connection Type (SK-98xx V2.0 copper adapters only) |
289 | --------------- | 289 | --------------- |
290 | Parameter: ConType | 290 | Parameter: ConType |
291 | Values: Auto, 100FD, 100HD, 10FD, 10HD | 291 | Values: Auto, 100FD, 100HD, 10FD, 10HD |
292 | Default: Auto | 292 | Default: Auto |
293 | 293 | ||
294 | The parameter 'ConType' is a combination of all five per-port parameters | 294 | The parameter 'ConType' is a combination of all five per-port parameters |
295 | within one single parameter. This simplifies the configuration of both ports | 295 | within one single parameter. This simplifies the configuration of both ports |
296 | of an adapter card! The different values of this variable reflect the most | 296 | of an adapter card! The different values of this variable reflect the most |
297 | meaningful combinations of port parameters. | 297 | meaningful combinations of port parameters. |
298 | 298 | ||
299 | The following table shows the values of 'ConType' and the corresponding | 299 | The following table shows the values of 'ConType' and the corresponding |
300 | combinations of the per-port parameters: | 300 | combinations of the per-port parameters: |
301 | 301 | ||
302 | ConType | DupCap AutoNeg FlowCtrl Role Speed | 302 | ConType | DupCap AutoNeg FlowCtrl Role Speed |
303 | ----------+------------------------------------------------------ | 303 | ----------+------------------------------------------------------ |
304 | Auto | Both On SymOrRem Auto Auto | 304 | Auto | Both On SymOrRem Auto Auto |
305 | 100FD | Full Off None Auto (ignored) 100 | 305 | 100FD | Full Off None Auto (ignored) 100 |
306 | 100HD | Half Off None Auto (ignored) 100 | 306 | 100HD | Half Off None Auto (ignored) 100 |
307 | 10FD | Full Off None Auto (ignored) 10 | 307 | 10FD | Full Off None Auto (ignored) 10 |
308 | 10HD | Half Off None Auto (ignored) 10 | 308 | 10HD | Half Off None Auto (ignored) 10 |
309 | 309 | ||
310 | Stating any other port parameter together with this 'ConType' variable | 310 | Stating any other port parameter together with this 'ConType' variable |
311 | will result in a merged configuration of those settings. This is due to | 311 | will result in a merged configuration of those settings. This is due to |
312 | the fact that the per-port parameters (e.g. Speed_?) have a higher | 312 | the fact that the per-port parameters (e.g. Speed_?) have a higher |
313 | priority than the combined variable 'ConType'. | 313 | priority than the combined variable 'ConType'. |
314 | 314 | ||
315 | NOTE: This parameter is always used on both ports of the adapter card. | 315 | NOTE: This parameter is always used on both ports of the adapter card. |
316 | 316 | ||
317 | Interrupt Moderation | 317 | Interrupt Moderation |
318 | -------------------- | 318 | -------------------- |
319 | Parameter: Moderation | 319 | Parameter: Moderation |
320 | Values: None, Static, Dynamic | 320 | Values: None, Static, Dynamic |
321 | Default: None | 321 | Default: None |
322 | 322 | ||
323 | Interrupt moderation is employed to limit the maximum number of interrupts | 323 | Interrupt moderation is employed to limit the maximum number of interrupts |
324 | the driver has to serve. That is, one or more interrupts (which indicate any | 324 | the driver has to serve. That is, one or more interrupts (which indicate any |
325 | transmit or receive packet to be processed) are queued until the driver | 325 | transmit or receive packet to be processed) are queued until the driver |
326 | processes them. When queued interrupts are served is determined by the | 326 | processes them. When queued interrupts are served is determined by the |
327 | 'IntsPerSec' parameter, which is explained below. | 327 | 'IntsPerSec' parameter, which is explained below. |
328 | 328 | ||
329 | Possible modes: | 329 | Possible modes: |
330 | 330 | ||
331 | -- None - No interrupt moderation is applied on the adapter card. | 331 | -- None - No interrupt moderation is applied on the adapter card. |
332 | Therefore, each transmit or receive interrupt is served immediately | 332 | Therefore, each transmit or receive interrupt is served immediately |
333 | as soon as it appears on the interrupt line of the adapter card. | 333 | as soon as it appears on the interrupt line of the adapter card. |
334 | 334 | ||
335 | -- Static - Interrupt moderation is applied on the adapter card. | 335 | -- Static - Interrupt moderation is applied on the adapter card. |
336 | All transmit and receive interrupts are queued until a complete | 336 | All transmit and receive interrupts are queued until a complete |
337 | moderation interval ends. If such a moderation interval ends, all | 337 | moderation interval ends. If such a moderation interval ends, all |
338 | queued interrupts are processed in one big bunch without any delay. | 338 | queued interrupts are processed in one big bunch without any delay. |
339 | The term 'static' reflects the fact that interrupt moderation is | 339 | The term 'static' reflects the fact that interrupt moderation is |
340 | always enabled, regardless of how much network load is currently | 340 | always enabled, regardless of how much network load is currently |
341 | passing via a particular interface. In addition, the duration of | 341 | passing via a particular interface. In addition, the duration of |
342 | the moderation interval has a fixed length that never changes while | 342 | the moderation interval has a fixed length that never changes while |
343 | the driver is operational. | 343 | the driver is operational. |
344 | 344 | ||
345 | -- Dynamic - Interrupt moderation might be applied on the adapter card, | 345 | -- Dynamic - Interrupt moderation might be applied on the adapter card, |
346 | depending on the load of the system. If the driver detects that the | 346 | depending on the load of the system. If the driver detects that the |
347 | system load is too high, the driver tries to shield the system against | 347 | system load is too high, the driver tries to shield the system against |
348 | too much network load by enabling interrupt moderation. If - at a later | 348 | too much network load by enabling interrupt moderation. If - at a later |
349 | time - the CPU utilization decreases again (or if the network load is | 349 | time - the CPU utilization decreases again (or if the network load is |
350 | negligible) the interrupt moderation will automatically be disabled. | 350 | negligible) the interrupt moderation will automatically be disabled. |
351 | 351 | ||
352 | Interrupt moderation should be used when the driver has to handle one or more | 352 | Interrupt moderation should be used when the driver has to handle one or more |
353 | interfaces with a high network load, which - as a consequence - leads also to a | 353 | interfaces with a high network load, which - as a consequence - leads also to a |
354 | high CPU utilization. When moderation is applied in such high network load | 354 | high CPU utilization. When moderation is applied in such high network load |
355 | situations, CPU load might be reduced by 20-30%. | 355 | situations, CPU load might be reduced by 20-30%. |
356 | 356 | ||
357 | NOTE: The drawback of using interrupt moderation is an increase of the round- | 357 | NOTE: The drawback of using interrupt moderation is an increase of the round- |
358 | trip-time (RTT), due to the queueing and serving of interrupts at dedicated | 358 | trip-time (RTT), due to the queueing and serving of interrupts at dedicated |
359 | moderation times. | 359 | moderation times. |
360 | 360 | ||
361 | Interrupts per second | 361 | Interrupts per second |
362 | --------------------- | 362 | --------------------- |
363 | Parameter: IntsPerSec | 363 | Parameter: IntsPerSec |
364 | Values: 30...40000 (interrupts per second) | 364 | Values: 30...40000 (interrupts per second) |
365 | Default: 2000 | 365 | Default: 2000 |
366 | 366 | ||
367 | This parameter is only used if either static or dynamic interrupt moderation | 367 | This parameter is only used if either static or dynamic interrupt moderation |
368 | is used on a network adapter card. Using this parameter if no moderation is | 368 | is used on a network adapter card. Using this parameter if no moderation is |
369 | applied will lead to no action performed. | 369 | applied will lead to no action performed. |
370 | 370 | ||
371 | This parameter determines the length of any interrupt moderation interval. | 371 | This parameter determines the length of any interrupt moderation interval. |
372 | Assuming that static interrupt moderation is to be used, an 'IntsPerSec' | 372 | Assuming that static interrupt moderation is to be used, an 'IntsPerSec' |
373 | parameter value of 2000 will lead to an interrupt moderation interval of | 373 | parameter value of 2000 will lead to an interrupt moderation interval of |
374 | 500 microseconds. | 374 | 500 microseconds. |
375 | 375 | ||
376 | NOTE: The duration of the moderation interval is to be chosen with care. | 376 | NOTE: The duration of the moderation interval is to be chosen with care. |
377 | At first glance, selecting a very long duration (e.g. only 100 interrupts per | 377 | At first glance, selecting a very long duration (e.g. only 100 interrupts per |
378 | second) seems to be meaningful, but the increase of packet-processing delay | 378 | second) seems to be meaningful, but the increase of packet-processing delay |
379 | is tremendous. On the other hand, selecting a very short moderation time might | 379 | is tremendous. On the other hand, selecting a very short moderation time might |
380 | negate the benefit of any moderation being applied. | 380 | negate the benefit of any moderation being applied. |
381 | 381 | ||
382 | 382 | ||
383 | Preferred Port | 383 | Preferred Port |
384 | -------------- | 384 | -------------- |
385 | Parameter: PrefPort | 385 | Parameter: PrefPort |
386 | Values: A, B | 386 | Values: A, B |
387 | Default: A | 387 | Default: A |
388 | 388 | ||
389 | This is used to force the preferred port to A or B (on dual-port network | 389 | This is used to force the preferred port to A or B (on dual-port network |
390 | adapters). The preferred port is the one that is used if both are detected | 390 | adapters). The preferred port is the one that is used if both are detected |
391 | as fully functional. | 391 | as fully functional. |
392 | 392 | ||
393 | RLMT Mode (Redundant Link Management Technology) | 393 | RLMT Mode (Redundant Link Management Technology) |
394 | ------------------------------------------------ | 394 | ------------------------------------------------ |
395 | Parameter: RlmtMode | 395 | Parameter: RlmtMode |
396 | Values: CheckLinkState, CheckLocalPort, CheckSeg, DualNet | 396 | Values: CheckLinkState, CheckLocalPort, CheckSeg, DualNet |
397 | Default: CheckLinkState | 397 | Default: CheckLinkState |
398 | 398 | ||
399 | RLMT monitors the status of the port. If the link of the active port | 399 | RLMT monitors the status of the port. If the link of the active port |
400 | fails, RLMT switches immediately to the standby link. The virtual link is | 400 | fails, RLMT switches immediately to the standby link. The virtual link is |
401 | maintained as long as at least one 'physical' link is up. | 401 | maintained as long as at least one 'physical' link is up. |
402 | 402 | ||
403 | Possible modes: | 403 | Possible modes: |
404 | 404 | ||
405 | -- CheckLinkState - Check link state only: RLMT uses the link state | 405 | -- CheckLinkState - Check link state only: RLMT uses the link state |
406 | reported by the adapter hardware for each individual port to | 406 | reported by the adapter hardware for each individual port to |
407 | determine whether a port can be used for all network traffic or | 407 | determine whether a port can be used for all network traffic or |
408 | not. | 408 | not. |
409 | 409 | ||
410 | -- CheckLocalPort - In this mode, RLMT monitors the network path | 410 | -- CheckLocalPort - In this mode, RLMT monitors the network path |
411 | between the two ports of an adapter by regularly exchanging packets | 411 | between the two ports of an adapter by regularly exchanging packets |
412 | between them. This mode requires a network configuration in which | 412 | between them. This mode requires a network configuration in which |
413 | the two ports are able to "see" each other (i.e. there must not be | 413 | the two ports are able to "see" each other (i.e. there must not be |
414 | any router between the ports). | 414 | any router between the ports). |
415 | 415 | ||
416 | -- CheckSeg - Check local port and segmentation: This mode supports the | 416 | -- CheckSeg - Check local port and segmentation: This mode supports the |
417 | same functions as the CheckLocalPort mode and additionally checks | 417 | same functions as the CheckLocalPort mode and additionally checks |
418 | network segmentation between the ports. Therefore, this mode is only | 418 | network segmentation between the ports. Therefore, this mode is only |
419 | to be used if Gigabit Ethernet switches are installed on the network | 419 | to be used if Gigabit Ethernet switches are installed on the network |
420 | that have been configured to use the Spanning Tree protocol. | 420 | that have been configured to use the Spanning Tree protocol. |
421 | 421 | ||
422 | -- DualNet - In this mode, ports A and B are used as separate devices. | 422 | -- DualNet - In this mode, ports A and B are used as separate devices. |
423 | If you have a dual port adapter, port A will be configured as eth0 | 423 | If you have a dual port adapter, port A will be configured as eth0 |
424 | and port B as eth1. Both ports can be used independently with | 424 | and port B as eth1. Both ports can be used independently with |
425 | distinct IP addresses. The preferred port setting is not used. | 425 | distinct IP addresses. The preferred port setting is not used. |
426 | RLMT is turned off. | 426 | RLMT is turned off. |
427 | 427 | ||
428 | NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations | 428 | NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations |
429 | where a network path between the ports on one adapter exists. | 429 | where a network path between the ports on one adapter exists. |
430 | Moreover, they are not designed to work where adapters are connected | 430 | Moreover, they are not designed to work where adapters are connected |
431 | back-to-back. | 431 | back-to-back. |
432 | *** | 432 | *** |
433 | 433 | ||
434 | 434 | ||
435 | 5 Large Frame Support | 435 | 5 Large Frame Support |
436 | ====================== | 436 | ====================== |
437 | 437 | ||
438 | The driver supports large frames (also called jumbo frames). Using large | 438 | The driver supports large frames (also called jumbo frames). Using large |
439 | frames can result in an improved throughput if transferring large amounts | 439 | frames can result in an improved throughput if transferring large amounts |
440 | of data. | 440 | of data. |
441 | To enable large frames, set the MTU (maximum transfer unit) of the | 441 | To enable large frames, set the MTU (maximum transfer unit) of the |
442 | interface to the desired value (up to 9000) by executing the following | 442 | interface to the desired value (up to 9000) by executing the following |
443 | command: | 443 | command: |
444 | ifconfig eth0 mtu 9000 | 444 | ifconfig eth0 mtu 9000 |
445 | This will only work if you have two adapters connected back-to-back | 445 | This will only work if you have two adapters connected back-to-back |
446 | or if you use a switch that supports large frames. When using a switch, | 446 | or if you use a switch that supports large frames. When using a switch, |
447 | it should be configured to allow large frames and auto-negotiation should | 447 | it should be configured to allow large frames and auto-negotiation should |
448 | be set to OFF. The setting must be configured on all adapters that can be | 448 | be set to OFF. The setting must be configured on all adapters that can be |
449 | reached by the large frames. If one adapter is not set to receive large | 449 | reached by the large frames. If one adapter is not set to receive large |
450 | frames, it will simply drop them. | 450 | frames, it will simply drop them. |
451 | 451 | ||
452 | You can switch back to the standard ethernet frame size by executing the | 452 | You can switch back to the standard ethernet frame size by executing the |
453 | following command: | 453 | following command: |
454 | ifconfig eth0 mtu 1500 | 454 | ifconfig eth0 mtu 1500 |
455 | 455 | ||
456 | To permanently configure this setting, add a script with the 'ifconfig' | 456 | To permanently configure this setting, add a script with the 'ifconfig' |
457 | line to the system startup sequence (named something like "S99sk98lin" | 457 | line to the system startup sequence (named something like "S99sk98lin" |
458 | in /etc/rc.d/rc2.d). | 458 | in /etc/rc.d/rc2.d). |
459 | *** | 459 | *** |
460 | 460 | ||
461 | 461 | ||
462 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | 462 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) |
463 | ================================================================== | 463 | ================================================================== |
464 | 464 | ||
465 | The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and | 465 | The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and |
466 | Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. | 466 | Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. |
467 | These features are only available after installation of open source | 467 | These features are only available after installation of open source |
468 | modules available on the Internet: | 468 | modules available on the Internet: |
469 | For VLAN go to: http://www.candelatech.com/~greear/vlan.html | 469 | For VLAN go to: http://www.candelatech.com/~greear/vlan.html |
470 | For Link Aggregation go to: http://www.st.rim.or.jp/~yumo | 470 | For Link Aggregation go to: http://www.st.rim.or.jp/~yumo |
471 | 471 | ||
472 | NOTE: SysKonnect GmbH does not offer any support for these open source | 472 | NOTE: SysKonnect GmbH does not offer any support for these open source |
473 | modules and does not take the responsibility for any kind of | 473 | modules and does not take the responsibility for any kind of |
474 | failures or problems arising in connection with these modules. | 474 | failures or problems arising in connection with these modules. |
475 | 475 | ||
476 | NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may | 476 | NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may |
477 | cause problems when unloading the driver. | 477 | cause problems when unloading the driver. |
478 | 478 | ||
479 | 479 | ||
480 | 7 Troubleshooting | 480 | 7 Troubleshooting |
481 | ================== | 481 | ================== |
482 | 482 | ||
483 | If any problems occur during the installation process, check the | 483 | If any problems occur during the installation process, check the |
484 | following list: | 484 | following list: |
485 | 485 | ||
486 | 486 | ||
487 | Problem: The SK-98xx adapter cannot be found by the driver. | 487 | Problem: The SK-98xx adapter cannot be found by the driver. |
488 | Solution: In /proc/pci search for the following entry: | 488 | Solution: In /proc/pci search for the following entry: |
489 | 'Ethernet controller: SysKonnect SK-98xx ...' | 489 | 'Ethernet controller: SysKonnect SK-98xx ...' |
490 | If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has | 490 | If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has |
491 | been found by the system and should be operational. | 491 | been found by the system and should be operational. |
492 | If this entry does not exist or if the file '/proc/pci' is not | 492 | If this entry does not exist or if the file '/proc/pci' is not |
493 | found, there may be a hardware problem or the PCI support may | 493 | found, there may be a hardware problem or the PCI support may |
494 | not be enabled in your kernel. | 494 | not be enabled in your kernel. |
495 | The adapter can be checked using the diagnostics program which | 495 | The adapter can be checked using the diagnostics program which |
496 | is available on the SysKonnect web site: | 496 | is available on the SysKonnect web site: |
497 | www.syskonnect.com | 497 | www.syskonnect.com |
498 | 498 | ||
499 | Some COMPAQ machines have problems dealing with PCI under Linux. | 499 | Some COMPAQ machines have problems dealing with PCI under Linux. |
500 | Linux. This problem is described in the 'PCI howto' document | 500 | This problem is described in the 'PCI howto' document |
501 | (included in some distributions or available from the | 501 | (included in some distributions or available from the |
502 | web, e.g. at 'www.linux.org'). | 502 | web, e.g. at 'www.linux.org'). |
503 | 503 | ||
504 | 504 | ||
505 | Problem: Programs such as 'ifconfig' or 'route' cannot be found or the | 505 | Problem: Programs such as 'ifconfig' or 'route' cannot be found or the |
506 | error message 'Operation not permitted' is displayed. | 506 | error message 'Operation not permitted' is displayed. |
507 | Reason: You are not logged in as user 'root'. | 507 | Reason: You are not logged in as user 'root'. |
508 | Solution: Log out and log in as 'root' or change to 'root' via 'su'. | 508 | Solution: Log out and log in as 'root' or change to 'root' via 'su'. |
509 | 509 | ||
510 | 510 | ||
511 | Problem: Upon use of the command 'ping <address>' the message | 511 | Problem: Upon use of the command 'ping <address>' the message |
512 | "ping: sendto: Network is unreachable" is displayed. | 512 | "ping: sendto: Network is unreachable" is displayed. |
513 | Reason: Your route is not set correctly. | 513 | Reason: Your route is not set correctly. |
514 | Solution: If you are using RedHat, you probably forgot to set up the | 514 | Solution: If you are using RedHat, you probably forgot to set up the |
515 | route in the 'network configuration'. | 515 | route in the 'network configuration'. |
516 | Check the existing routes with the 'route' command and check | 516 | Check the existing routes with the 'route' command and check |
517 | if an entry for 'eth0' exists, and if so, if it is set correctly. | 517 | if an entry for 'eth0' exists, and if so, if it is set correctly. |
518 | 518 | ||
519 | 519 | ||
520 | Problem: The driver can be started, the adapter is connected to the | 520 | Problem: The driver can be started, the adapter is connected to the |
521 | network, but you cannot receive or transmit any packets; | 521 | network, but you cannot receive or transmit any packets; |
522 | e.g. 'ping' does not work. | 522 | e.g. 'ping' does not work. |
523 | Reason: There is an incorrect route in your routing table. | 523 | Reason: There is an incorrect route in your routing table. |
524 | Solution: Check the routing table with the command 'route' and read the | 524 | Solution: Check the routing table with the command 'route' and read the |
525 | manual help pages dealing with routes (enter 'man route'). | 525 | manual help pages dealing with routes (enter 'man route'). |
526 | 526 | ||
527 | NOTE: Although the 2.2.x kernel versions generate the routing entry | 527 | NOTE: Although the 2.2.x kernel versions generate the routing entry |
528 | automatically, problems of this kind may occur here as well. We've | 528 | automatically, problems of this kind may occur here as well. We've |
529 | come across a situation in which the driver started correctly at | 529 | come across a situation in which the driver started correctly at |
530 | system start, but after the driver has been removed and reloaded, | 530 | system start, but after the driver has been removed and reloaded, |
531 | the route of the adapter's network pointed to the 'dummy0' device | 531 | the route of the adapter's network pointed to the 'dummy0' device |
532 | and had to be corrected manually. | 532 | and had to be corrected manually. |
533 | 533 | ||
534 | 534 | ||
535 | Problem: Your computer should act as a router between multiple | 535 | Problem: Your computer should act as a router between multiple |
536 | IP subnetworks (using multiple adapters), but computers in | 536 | IP subnetworks (using multiple adapters), but computers in |
537 | other subnetworks cannot be reached. | 537 | other subnetworks cannot be reached. |
538 | Reason: Either the router's kernel is not configured for IP forwarding | 538 | Reason: Either the router's kernel is not configured for IP forwarding |
539 | or the routing table and gateway configuration of at least one | 539 | or the routing table and gateway configuration of at least one |
540 | computer is not working. | 540 | computer is not working. |
541 | 541 | ||
542 | Problem: Upon driver start, the following error message is displayed: | 542 | Problem: Upon driver start, the following error message is displayed: |
543 | "eth0: -- ERROR -- | 543 | "eth0: -- ERROR -- |
544 | Class: internal Software error | 544 | Class: internal Software error |
545 | Nr: 0xcc | 545 | Nr: 0xcc |
546 | Msg: SkGeInitPort() cannot init running ports" | 546 | Msg: SkGeInitPort() cannot init running ports" |
547 | Reason: You are using a driver compiled for single processor machines | 547 | Reason: You are using a driver compiled for single processor machines |
548 | on a multiprocessor machine with SMP (Symmetric MultiProcessor) | 548 | on a multiprocessor machine with SMP (Symmetric MultiProcessor) |
549 | kernel. | 549 | kernel. |
550 | Solution: Configure your kernel appropriately and recompile the kernel or | 550 | Solution: Configure your kernel appropriately and recompile the kernel or |
551 | the modules. | 551 | the modules. |
552 | 552 | ||
553 | 553 | ||
554 | 554 | ||
555 | If your problem is not listed here, please contact SysKonnect's technical | 555 | If your problem is not listed here, please contact SysKonnect's technical |
556 | support for help (linux@syskonnect.de). | 556 | support for help (linux@syskonnect.de). |
557 | When contacting our technical support, please ensure that the following | 557 | When contacting our technical support, please ensure that the following |
558 | information is available: | 558 | information is available: |
559 | - System Manufacturer and HW Information (CPU, Memory... ) | 559 | - System Manufacturer and HW Information (CPU, Memory... ) |
560 | - PCI-Boards in your system | 560 | - PCI-Boards in your system |
561 | - Distribution | 561 | - Distribution |
562 | - Kernel version | 562 | - Kernel version |
563 | - Driver version | 563 | - Driver version |
564 | *** | 564 | *** |
565 | 565 | ||
566 | 566 | ||
567 | 567 | ||
568 | ***End of Readme File*** | 568 | ***End of Readme File*** |
569 | 569 |
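Note on the 'IntsPerSec' text in sk98lin.txt above: the readme states that a value of 2000 gives a moderation interval of 500 microseconds, which implies that the interval is simply the reciprocal of the interrupt rate. The sketch below only illustrates that arithmetic. The function name is hypothetical and is not part of the driver, and the range check mirrors the documented values 30...40000.

```python
def moderation_interval_us(ints_per_sec: int) -> float:
    """Return the moderation interval in microseconds for a given rate.

    Hypothetical helper: assumes interval = 1 second / IntsPerSec,
    as implied by the readme's 2000 -> 500 microseconds example.
    """
    if not 30 <= ints_per_sec <= 40000:
        raise ValueError("IntsPerSec must be in the range 30...40000")
    return 1_000_000 / ints_per_sec


# Show how the interval shrinks as the allowed interrupt rate grows.
for rate in (30, 100, 2000, 40000):
    print(f"IntsPerSec={rate:>5} -> {moderation_interval_us(rate):.1f} us")
```

Under this assumption, the readme's example of 100 interrupts per second corresponds to a 10 millisecond interval, which is consistent with its warning about a large increase in packet-processing delay.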
Documentation/pci-error-recovery.txt
1 | 1 | ||
2 | PCI Error Recovery | 2 | PCI Error Recovery |
3 | ------------------ | 3 | ------------------ |
4 | February 2, 2006 | 4 | February 2, 2006 |
5 | 5 | ||
6 | Current document maintainer: | 6 | Current document maintainer: |
7 | Linas Vepstas <linas@austin.ibm.com> | 7 | Linas Vepstas <linas@austin.ibm.com> |
8 | 8 | ||
9 | 9 | ||
10 | Many PCI bus controllers are able to detect a variety of hardware | 10 | Many PCI bus controllers are able to detect a variety of hardware |
11 | PCI errors on the bus, such as parity errors on the data and address | 11 | PCI errors on the bus, such as parity errors on the data and address |
12 | busses, as well as SERR and PERR errors. Some of the more advanced | 12 | busses, as well as SERR and PERR errors. Some of the more advanced |
13 | chipsets are able to deal with these errors; these include PCI-E chipsets, | 13 | chipsets are able to deal with these errors; these include PCI-E chipsets, |
14 | and the PCI-host bridges found on IBM Power4 and Power5-based pSeries | 14 | and the PCI-host bridges found on IBM Power4 and Power5-based pSeries |
15 | boxes. A typical action taken is to disconnect the affected device, | 15 | boxes. A typical action taken is to disconnect the affected device, |
16 | halting all I/O to it. The goal of a disconnection is to avoid system | 16 | halting all I/O to it. The goal of a disconnection is to avoid system |
17 | corruption; for example, to halt system memory corruption due to DMA's | 17 | corruption; for example, to halt system memory corruption due to DMA's |
18 | to "wild" addresses. Typically, a reconnection mechanism is also | 18 | to "wild" addresses. Typically, a reconnection mechanism is also |
19 | offered, so that the affected PCI device(s) are reset and put back | 19 | offered, so that the affected PCI device(s) are reset and put back |
20 | into working condition. The reset phase requires coordination | 20 | into working condition. The reset phase requires coordination |
21 | between the affected device drivers and the PCI controller chip. | 21 | between the affected device drivers and the PCI controller chip. |
22 | This document describes a generic API for notifying device drivers | 22 | This document describes a generic API for notifying device drivers |
23 | of a bus disconnection, and then performing error recovery. | 23 | of a bus disconnection, and then performing error recovery. |
24 | This API is currently implemented in the 2.6.16 and later kernels. | 24 | This API is currently implemented in the 2.6.16 and later kernels. |
25 | 25 | ||
26 | Reporting and recovery is performed in several steps. First, when | 26 | Reporting and recovery is performed in several steps. First, when |
27 | a PCI hardware error has resulted in a bus disconnect, that event | 27 | a PCI hardware error has resulted in a bus disconnect, that event |
28 | is reported as soon as possible to all affected device drivers, | 28 | is reported as soon as possible to all affected device drivers, |
29 | including multiple instances of a device driver on multi-function | 29 | including multiple instances of a device driver on multi-function |
30 | cards. This allows device drivers to avoid deadlocking in spinloops, | 30 | cards. This allows device drivers to avoid deadlocking in spinloops, |
31 | waiting for some i/o-space register to change, when it never will. | 31 | waiting for some i/o-space register to change, when it never will. |
32 | It also gives the drivers a chance to defer incoming I/O as | 32 | It also gives the drivers a chance to defer incoming I/O as |
33 | needed. | 33 | needed. |
34 | 34 | ||
35 | Next, recovery is performed in several stages. Most of the complexity | 35 | Next, recovery is performed in several stages. Most of the complexity |
36 | is forced by the need to handle multi-function devices, that is, | 36 | is forced by the need to handle multi-function devices, that is, |
37 | devices that have multiple device drivers associated with them. | 37 | devices that have multiple device drivers associated with them. |
38 | In the first stage, each driver is allowed to indicate what type | 38 | In the first stage, each driver is allowed to indicate what type |
39 | of reset it desires, the choices being a simple re-enabling of I/O | 39 | of reset it desires, the choices being a simple re-enabling of I/O |
40 | or requesting a hard reset (a full electrical #RST of the PCI card). | 40 | or requesting a hard reset (a full electrical #RST of the PCI card). |
41 | If any driver requests a full reset, that is what will be done. | 41 | If any driver requests a full reset, that is what will be done. |
42 | 42 | ||
43 | After a full reset and/or a re-enabling of I/O, all drivers are | 43 | After a full reset and/or a re-enabling of I/O, all drivers are |
44 | again notified, so that they may then perform any device setup/config | 44 | again notified, so that they may then perform any device setup/config |
45 | that may be required. After these have all completed, a final | 45 | that may be required. After these have all completed, a final |
46 | "resume normal operations" event is sent out. | 46 | "resume normal operations" event is sent out. |
47 | 47 | ||
48 | The biggest reason for choosing a kernel-based implementation rather | 48 | The biggest reason for choosing a kernel-based implementation rather |
49 | than a user-space implementation was the need to deal with bus | 49 | than a user-space implementation was the need to deal with bus |
50 | disconnects of PCI devices attached to storage media, and, in particular, | 50 | disconnects of PCI devices attached to storage media, and, in particular, |
51 | disconnects from devices holding the root file system. If the root | 51 | disconnects from devices holding the root file system. If the root |
52 | file system is disconnected, a user-space mechanism would have to go | 52 | file system is disconnected, a user-space mechanism would have to go |
53 | through a large number of contortions to complete recovery. Almost all | 53 | through a large number of contortions to complete recovery. Almost all |
54 | of the current Linux file systems are not tolerant of disconnection | 54 | of the current Linux file systems are not tolerant of disconnection |
55 | from/reconnection to their underlying block device. By contrast, | 55 | from/reconnection to their underlying block device. By contrast, |
56 | bus errors are easy to manage in the device driver. Indeed, most | 56 | bus errors are easy to manage in the device driver. Indeed, most |
57 | device drivers already handle very similar recovery procedures; | 57 | device drivers already handle very similar recovery procedures; |
58 | for example, the SCSI-generic layer already provides significant | 58 | for example, the SCSI-generic layer already provides significant |
59 | mechanisms for dealing with SCSI bus errors and SCSI bus resets. | 59 | mechanisms for dealing with SCSI bus errors and SCSI bus resets. |
60 | 60 | ||
61 | 61 | ||
62 | Detailed Design | 62 | Detailed Design |
63 | --------------- | 63 | --------------- |
64 | Design and implementation details below, based on a chain of | 64 | Design and implementation details below, based on a chain of |
65 | public email discussions with Ben Herrenschmidt, circa 5 April 2005. | 65 | public email discussions with Ben Herrenschmidt, circa 5 April 2005. |
66 | 66 | ||
67 | The error recovery API support is exposed to the driver in the form of | 67 | The error recovery API support is exposed to the driver in the form of |
68 | a structure of function pointers pointed to by a new field in struct | 68 | a structure of function pointers pointed to by a new field in struct |
69 | pci_driver. A driver that fails to provide the structure is "non-aware", | 69 | pci_driver. A driver that fails to provide the structure is "non-aware", |
70 | and the actual recovery steps taken are platform dependent. The | 70 | and the actual recovery steps taken are platform dependent. The |
71 | arch/powerpc implementation will simulate a PCI hotplug remove/add. | 71 | arch/powerpc implementation will simulate a PCI hotplug remove/add. |
72 | 72 | ||
73 | This structure has the form: | 73 | This structure has the form: |
74 | struct pci_error_handlers | 74 | struct pci_error_handlers |
75 | { | 75 | { |
76 | int (*error_detected)(struct pci_dev *dev, enum pci_channel_state); | 76 | int (*error_detected)(struct pci_dev *dev, enum pci_channel_state); |
77 | int (*mmio_enabled)(struct pci_dev *dev); | 77 | int (*mmio_enabled)(struct pci_dev *dev); |
78 | int (*link_reset)(struct pci_dev *dev); | 78 | int (*link_reset)(struct pci_dev *dev); |
79 | int (*slot_reset)(struct pci_dev *dev); | 79 | int (*slot_reset)(struct pci_dev *dev); |
80 | void (*resume)(struct pci_dev *dev); | 80 | void (*resume)(struct pci_dev *dev); |
81 | }; | 81 | }; |
82 | 82 | ||
83 | The possible channel states are: | 83 | The possible channel states are: |
84 | enum pci_channel_state { | 84 | enum pci_channel_state { |
85 | pci_channel_io_normal, /* I/O channel is in normal state */ | 85 | pci_channel_io_normal, /* I/O channel is in normal state */ |
86 | pci_channel_io_frozen, /* I/O to channel is blocked */ | 86 | pci_channel_io_frozen, /* I/O to channel is blocked */ |
87 | pci_channel_io_perm_failure, /* PCI card is dead */ | 87 | pci_channel_io_perm_failure, /* PCI card is dead */ |
88 | }; | 88 | }; |
89 | 89 | ||
90 | Possible return values are: | 90 | Possible return values are: |
91 | enum pci_ers_result { | 91 | enum pci_ers_result { |
92 | PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */ | 92 | PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */ |
93 | PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */ | 93 | PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */ |
94 | PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ | 94 | PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ |
95 | PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ | 95 | PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ |
96 | PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ | 96 | PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ |
97 | }; | 97 | }; |
98 | 98 | ||
99 | A driver does not have to implement all of these callbacks; however, | 99 | A driver does not have to implement all of these callbacks; however, |
100 | if it implements any, it must implement error_detected(). If a callback | 100 | if it implements any, it must implement error_detected(). If a callback |
101 | is not implemented, the corresponding feature is considered unsupported. | 101 | is not implemented, the corresponding feature is considered unsupported. |
102 | For example, if mmio_enabled() and resume() aren't there, then it | 102 | For example, if mmio_enabled() and resume() aren't there, then it |
103 | is assumed that the driver is not doing any direct recovery and requires | 103 | is assumed that the driver is not doing any direct recovery and requires |
104 | a reset. If link_reset() is not implemented, the card is assumed to | 104 | a reset. If link_reset() is not implemented, the card is assumed to |
105 | not care about link resets. Typically a driver will want to know about | 105 | not care about link resets. Typically a driver will want to know about |
106 | a slot_reset(). | 106 | a slot_reset(). |
107 | 107 | ||
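The rule above (implement error_detected(), leave unsupported callbacks
unset) might look roughly like the following in a driver. This is only
a sketch under stated assumptions: struct pci_dev is a one-field
userspace stand-in so the example compiles outside the kernel, and the
demo_* names are hypothetical. Returning PCI_ERS_RESULT_DISCONNECT on
a permanent failure is an assumed convention, not something the text
above mandates.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in; the real struct pci_dev is far larger. */
struct pci_dev { int io_stopped; };

/* Local copies of the definitions quoted in the document. */
enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Quiesce point: stop issuing new I/O, then vote on how to recover. */
static int demo_error_detected(struct pci_dev *dev,
			       enum pci_channel_state state)
{
	dev->io_stopped = 1;
	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;	/* assumed convention */
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Hardware re-initialization would go here; normal I/O stays off. */
static int demo_slot_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCI_ERS_RESULT_RECOVERED;
}

/* Only now may normal I/O processing restart. */
static void demo_resume(struct pci_dev *dev)
{
	dev->io_stopped = 0;
}

/* mmio_enabled and link_reset are left NULL, i.e. unsupported. */
static struct pci_error_handlers demo_err_handlers = {
	.error_detected	= demo_error_detected,
	.slot_reset	= demo_slot_reset,
	.resume		= demo_resume,
};
```

Because mmio_enabled() is absent, a platform following the text above
would assume this driver does no direct recovery and needs a reset.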
108 | The actual steps taken by a platform to recover from a PCI error | 108 | The actual steps taken by a platform to recover from a PCI error |
109 | event will be platform-dependent, but will follow the general | 109 | event will be platform-dependent, but will follow the general |
110 | sequence described below. | 110 | sequence described below. |
111 | 111 | ||
112 | STEP 0: Error Event | 112 | STEP 0: Error Event |
113 | ------------------- | 113 | ------------------- |
114 | A PCI bus error is detected by the PCI hardware. On powerpc, the slot | 114 | A PCI bus error is detected by the PCI hardware. On powerpc, the slot |
115 | is isolated, in that all I/O is blocked: all reads return 0xffffffff, | 115 | is isolated, in that all I/O is blocked: all reads return 0xffffffff, |
116 | all writes are ignored. | 116 | all writes are ignored. |
117 | 117 | ||
118 | 118 | ||
119 | STEP 1: Notification | 119 | STEP 1: Notification |
120 | -------------------- | 120 | -------------------- |
121 | Platform calls the error_detected() callback on every instance of | 121 | Platform calls the error_detected() callback on every instance of |
122 | every driver affected by the error. | 122 | every driver affected by the error. |
123 | 123 | ||
124 | At this point, the device might not be accessible anymore, depending on | 124 | At this point, the device might not be accessible anymore, depending on |
125 | the platform (the slot will be isolated on powerpc). The driver may | 125 | the platform (the slot will be isolated on powerpc). The driver may |
126 | already have "noticed" the error because of a failing I/O, but this | 126 | already have "noticed" the error because of a failing I/O, but this |
127 | is the proper "synchronization point", that is, it gives the driver | 127 | is the proper "synchronization point", that is, it gives the driver |
128 | a chance to clean up, waiting for pending stuff (timers, whatever, etc...) | 128 | a chance to clean up, waiting for pending stuff (timers, whatever, etc...) |
129 | to complete; it can take semaphores, schedule, etc... everything but | 129 | to complete; it can take semaphores, schedule, etc... everything but |
130 | touch the device. Within this function and after it returns, the driver | 130 | touch the device. Within this function and after it returns, the driver |
131 | shouldn't do any new IOs. Called in task context. This is sort of a | 131 | shouldn't do any new IOs. Called in task context. This is sort of a |
132 | "quiesce" point. See note about interrupts at the end of this doc. | 132 | "quiesce" point. See note about interrupts at the end of this doc. |
133 | 133 | ||
134 | All drivers participating in this system must implement this call. | 134 | All drivers participating in this system must implement this call. |
135 | The driver must return one of the following result codes: | 135 | The driver must return one of the following result codes: |
136 | - PCI_ERS_RESULT_CAN_RECOVER: | 136 | - PCI_ERS_RESULT_CAN_RECOVER: |
137 | Driver returns this if it thinks it might be able to recover | 137 | Driver returns this if it thinks it might be able to recover |
138 | the HW by just banging IOs or if it wants to be given | 138 | the HW by just banging IOs or if it wants to be given |
139 | a chance to extract some diagnostic information (see | 139 | a chance to extract some diagnostic information (see |
140 | mmio_enabled, below). | 140 | mmio_enabled, below). |
141 | - PCI_ERS_RESULT_NEED_RESET: | 141 | - PCI_ERS_RESULT_NEED_RESET: |
142 | Driver returns this if it can't recover without a hard | 142 | Driver returns this if it can't recover without a hard |
143 | slot reset. | 143 | slot reset. |
144 | - PCI_ERS_RESULT_DISCONNECT: | 144 | - PCI_ERS_RESULT_DISCONNECT: |
145 | Driver returns this if it doesn't want to recover at all. | 145 | Driver returns this if it doesn't want to recover at all. |
146 | 146 | ||
147 | The next step taken will depend on the result codes returned by the | 147 | The next step taken will depend on the result codes returned by the |
148 | drivers. | 148 | drivers. |
149 | 149 | ||
150 | If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER, | 150 | If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER, |
151 | then the platform should re-enable IOs on the slot (or do nothing in | 151 | then the platform should re-enable IOs on the slot (or do nothing in |
152 | particular, if the platform doesn't isolate slots), and recovery | 152 | particular, if the platform doesn't isolate slots), and recovery |
153 | proceeds to STEP 2 (MMIO Enable). | 153 | proceeds to STEP 2 (MMIO Enable). |
154 | 154 | ||
155 | If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), | 155 | If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), |
156 | then recovery proceeds to STEP 4 (Slot Reset). | 156 | then recovery proceeds to STEP 4 (Slot Reset). |
157 | 157 | ||
158 | If the platform is unable to recover the slot, the next step | 158 | If the platform is unable to recover the slot, the next step |
159 | is STEP 6 (Permanent Failure). | 159 | is STEP 6 (Permanent Failure). |
160 | 160 | ||
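The decision described above, combining the error_detected() results
of all drivers on a slot into one next step, can be sketched as
follows. The enum recovery_step and next_step_after_notify() are
hypothetical names used only for illustration. Two points are
assumptions of this sketch, since the text does not settle them: any
PCI_ERS_RESULT_DISCONNECT vote is treated as a permanent failure, and
any vote other than PCI_ERS_RESULT_CAN_RECOVER forces a slot reset.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the result codes quoted in the document. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the steps named in this document. */
enum recovery_step {
	STEP_MMIO_ENABLED = 2,
	STEP_SLOT_RESET   = 4,
	STEP_PERM_FAILURE = 6,
};

/*
 * Pick the step that follows STEP 1 from the per-driver votes.
 * Only a unanimous CAN_RECOVER leads to STEP 2 (MMIO Enabled).
 */
static enum recovery_step
next_step_after_notify(const enum pci_ers_result *votes, size_t n)
{
	int need_reset = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (votes[i] == PCI_ERS_RESULT_DISCONNECT)
			return STEP_PERM_FAILURE;	/* assumption */
		if (votes[i] != PCI_ERS_RESULT_CAN_RECOVER)
			need_reset = 1;			/* assumption for NONE */
	}
	return need_reset ? STEP_SLOT_RESET : STEP_MMIO_ENABLED;
}
```

A single dissenting driver on a multi-function card is enough to
reset the whole slot, which matches "if any driver requests a full
reset, that is what will be done".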
161 | >>> The current powerpc implementation assumes that a device driver will | 161 | >>> The current powerpc implementation assumes that a device driver will |
162 | >>> *not* schedule or semaphore in this routine; the current powerpc | 162 | >>> *not* schedule or semaphore in this routine; the current powerpc |
163 | >>> implementation uses one kernel thread to notify all devices; | 163 | >>> implementation uses one kernel thread to notify all devices; |
164 | >>> thus, if one device sleeps/schedules, all devices are affected. | 164 | >>> thus, if one device sleeps/schedules, all devices are affected. |
165 | >>> Doing better requires complex multi-threaded logic in the error | 165 | >>> Doing better requires complex multi-threaded logic in the error |
166 | >>> recovery implementation (e.g. waiting for all notification threads | 166 | >>> recovery implementation (e.g. waiting for all notification threads |
167 | >>> to "join" before proceeding with recovery.) This seems excessively | 167 | >>> to "join" before proceeding with recovery.) This seems excessively |
168 | >>> complex and not worth implementing. | 168 | >>> complex and not worth implementing. |
169 | 169 | ||
170 | >>> The current powerpc implementation doesn't much care if the device | 170 | >>> The current powerpc implementation doesn't much care if the device |
171 | >>> attempts I/O at this point, or not. I/O's will fail, returning | 171 | >>> attempts I/O at this point, or not. I/O's will fail, returning |
172 | >>> a value of 0xff on read, and writes will be dropped. If the device | 172 | >>> a value of 0xff on read, and writes will be dropped. If the device |
173 | >>> driver attempts more than 10K I/O's to a frozen adapter, it will | 173 | >>> driver attempts more than 10K I/O's to a frozen adapter, it will |
174 | >>> assume that the device driver has gone into an infinite loop, and | 174 | >>> assume that the device driver has gone into an infinite loop, and |
175 | >>> it will panic the the kernel. There doesn't seem to be any other | 175 | >>> it will panic the kernel. There doesn't seem to be any other |
176 | >>> way of stopping a device driver that insists on spinning on I/O. | 176 | >>> way of stopping a device driver that insists on spinning on I/O. |
177 | 177 | ||
178 | STEP 2: MMIO Enabled | 178 | STEP 2: MMIO Enabled |
179 | ------------------- | 179 | ------------------- |
180 | The platform re-enables MMIO to the device (but typically not the | 180 | The platform re-enables MMIO to the device (but typically not the |
181 | DMA), and then calls the mmio_enabled() callback on all affected | 181 | DMA), and then calls the mmio_enabled() callback on all affected |
182 | device drivers. | 182 | device drivers. |
183 | 183 | ||
184 | This is the "early recovery" call. IOs are allowed again, but DMA is | 184 | This is the "early recovery" call. IOs are allowed again, but DMA is |
185 | not (hrm... to be discussed, I prefer not), with some restrictions. This | 185 | not (hrm... to be discussed, I prefer not), with some restrictions. This |
186 | is NOT a callback for the driver to start operations again, only to | 186 | is NOT a callback for the driver to start operations again, only to |
187 | peek/poke at the device, extract diagnostic information, if any, and | 187 | peek/poke at the device, extract diagnostic information, if any, and |
188 | eventually do things like trigger a device local reset or some such, | 188 | eventually do things like trigger a device local reset or some such, |
189 | but not restart operations. This callback is made if all drivers on | 189 | but not restart operations. This callback is made if all drivers on |
190 | a segment agree that they can try to recover and if no automatic link reset | 190 | a segment agree that they can try to recover and if no automatic link reset |
191 | was performed by the HW. If the platform can't just re-enable IOs without | 191 | was performed by the HW. If the platform can't just re-enable IOs without |
192 | a slot reset or a link reset, it won't call this callback, and instead | 192 | a slot reset or a link reset, it won't call this callback, and instead |
193 | will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset). | 193 | will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset). |
194 | 194 | ||
195 | >>> The following is proposed; no platform implements this yet: | 195 | >>> The following is proposed; no platform implements this yet: |
196 | >>> Proposal: All I/O's should be done _synchronously_ from within | 196 | >>> Proposal: All I/O's should be done _synchronously_ from within |
197 | >>> this callback, errors triggered by them will be returned via | 197 | >>> this callback, errors triggered by them will be returned via |
198 | >>> the normal pci_check_whatever() API, no new error_detected() | 198 | >>> the normal pci_check_whatever() API, no new error_detected() |
199 | >>> callback will be issued due to an error happening here. However, | 199 | >>> callback will be issued due to an error happening here. However, |
200 | >>> such an error might cause IOs to be re-blocked for the whole | 200 | >>> such an error might cause IOs to be re-blocked for the whole |
201 | >>> segment, and thus invalidate the recovery that other devices | 201 | >>> segment, and thus invalidate the recovery that other devices |
202 | >>> on the same segment might have done, forcing the whole segment | 202 | >>> on the same segment might have done, forcing the whole segment |
203 | >>> into one of the next states, that is, link reset or slot reset. | 203 | >>> into one of the next states, that is, link reset or slot reset. |
204 | 204 | ||
205 | The driver should return one of the following result codes: | 205 | The driver should return one of the following result codes: |
206 | - PCI_ERS_RESULT_RECOVERED | 206 | - PCI_ERS_RESULT_RECOVERED |
207 | Driver returns this if it thinks the device is fully | 207 | Driver returns this if it thinks the device is fully |
208 | functional and thinks it is ready to start | 208 | functional and thinks it is ready to start |
209 | normal driver operations again. There is no | 209 | normal driver operations again. There is no |
210 | guarantee that the driver will actually be | 210 | guarantee that the driver will actually be |
211 | allowed to proceed, as another driver on the | 211 | allowed to proceed, as another driver on the |
212 | same segment might have failed and thus triggered a | 212 | same segment might have failed and thus triggered a |
213 | slot reset on platforms that support it. | 213 | slot reset on platforms that support it. |
214 | 214 | ||
215 | - PCI_ERS_RESULT_NEED_RESET | 215 | - PCI_ERS_RESULT_NEED_RESET |
216 | Driver returns this if it thinks the device is not | 216 | Driver returns this if it thinks the device is not |
217 | recoverable in its current state and it needs a slot | 217 | recoverable in its current state and it needs a slot |
218 | reset to proceed. | 218 | reset to proceed. |
219 | 219 | ||
220 | - PCI_ERS_RESULT_DISCONNECT | 220 | - PCI_ERS_RESULT_DISCONNECT |
221 | Same as above. Total failure, no recovery even after | 221 | Same as above. Total failure, no recovery even after |
222 | reset; driver dead. (To be defined more precisely.) | 222 | reset; driver dead. (To be defined more precisely.) |
223 | 223 | ||
224 | The next step taken depends on the results returned by the drivers. | 224 | The next step taken depends on the results returned by the drivers. |
225 | If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform | 225 | If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform |
226 | proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations). | 226 | proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations). |
227 | 227 | ||
228 | If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform | 228 | If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform |
229 | proceeds to STEP 4 (Slot Reset). | 229 | proceeds to STEP 4 (Slot Reset). |
230 | 230 | ||
231 | >>> The current powerpc implementation does not implement this callback. | 231 | >>> The current powerpc implementation does not implement this callback. |
232 | 232 | ||
233 | 233 | ||
234 | STEP 3: Link Reset | 234 | STEP 3: Link Reset |
235 | ------------------ | 235 | ------------------ |
236 | The platform resets the link, and then calls the link_reset() callback | 236 | The platform resets the link, and then calls the link_reset() callback |
237 | on all affected device drivers. This is a PCI-Express specific state | 237 | on all affected device drivers. This is a PCI-Express specific state |
238 | and is done whenever a non-fatal error has been detected that can be | 238 | and is done whenever a non-fatal error has been detected that can be |
239 | "solved" by resetting the link. This call informs the driver of the | 239 | "solved" by resetting the link. This call informs the driver of the |
240 | reset and the driver should check to see if the device appears to be | 240 | reset and the driver should check to see if the device appears to be |
241 | in working condition. | 241 | in working condition. |
242 | 242 | ||
243 | The driver is not supposed to restart normal driver I/O operations | 243 | The driver is not supposed to restart normal driver I/O operations |
244 | at this point. It should limit itself to "probing" the device to | 244 | at this point. It should limit itself to "probing" the device to |
245 | check its recoverability status. If all is right, then the platform | 245 | check its recoverability status. If all is right, then the platform |
246 | will call resume() once all drivers have ack'd link_reset(). | 246 | will call resume() once all drivers have ack'd link_reset(). |
247 | 247 | ||
248 | Result codes: | 248 | Result codes: |
249 | (identical to STEP 2 (MMIO Enabled)) | 249 | (identical to STEP 2 (MMIO Enabled)) |
250 | 250 | ||
251 | The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5 | 251 | The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5 |
252 | (Resume Operations). | 252 | (Resume Operations). |
253 | 253 | ||
254 | >>> The current powerpc implementation does not implement this callback. | 254 | >>> The current powerpc implementation does not implement this callback. |
255 | 255 | ||
256 | 256 | ||
257 | STEP 4: Slot Reset | 257 | STEP 4: Slot Reset |
258 | ------------------ | 258 | ------------------ |
259 | The platform performs a soft or hard reset of the device, and then | 259 | The platform performs a soft or hard reset of the device, and then |
260 | calls the slot_reset() callback. | 260 | calls the slot_reset() callback. |
261 | 261 | ||
262 | A soft reset consists of asserting the adapter #RST line and then | 262 | A soft reset consists of asserting the adapter #RST line and then |
263 | restoring the PCI BAR's and PCI configuration header to a state | 263 | restoring the PCI BAR's and PCI configuration header to a state |
264 | that is equivalent to what it would be after a fresh system | 264 | that is equivalent to what it would be after a fresh system |
265 | power-on followed by power-on BIOS/system firmware initialization. | 265 | power-on followed by power-on BIOS/system firmware initialization. |
266 | If the platform supports PCI hotplug, then the reset might be | 266 | If the platform supports PCI hotplug, then the reset might be |
267 | performed by toggling the slot electrical power off/on. | 267 | performed by toggling the slot electrical power off/on. |
268 | 268 | ||
269 | It is important for the platform to restore the PCI config space | 269 | It is important for the platform to restore the PCI config space |
270 | to the "fresh poweron" state, rather than the "last state". After | 270 | to the "fresh poweron" state, rather than the "last state". After |
271 | a slot reset, the device driver will almost always use its standard | 271 | a slot reset, the device driver will almost always use its standard |
272 | device initialization routines, and an unusual config space setup | 272 | device initialization routines, and an unusual config space setup |
273 | may result in hung devices, kernel panics, or silent data corruption. | 273 | may result in hung devices, kernel panics, or silent data corruption. |
274 | 274 | ||
275 | This call gives drivers the chance to re-initialize the hardware | 275 | This call gives drivers the chance to re-initialize the hardware |
276 | (re-download firmware, etc.). At this point, the driver may assume | 276 | (re-download firmware, etc.). At this point, the driver may assume |
277 | that the card is in a fresh state and is fully functional. In | 277 | that the card is in a fresh state and is fully functional. In |
278 | particular, interrupt generation should work normally. | 278 | particular, interrupt generation should work normally. |
279 | 279 | ||
280 | Drivers should not yet restart normal I/O processing operations | 280 | Drivers should not yet restart normal I/O processing operations |
281 | at this point. If all device drivers report success on this | 281 | at this point. If all device drivers report success on this |
282 | callback, the platform will call resume() to complete the sequence, | 282 | callback, the platform will call resume() to complete the sequence, |
283 | and let the driver restart normal I/O processing. | 283 | and let the driver restart normal I/O processing. |
284 | 284 | ||
285 | A driver can still return a critical failure for this function if | 285 | A driver can still return a critical failure for this function if |
286 | it can't get the device operational after reset. If the platform | 286 | it can't get the device operational after reset. If the platform |
287 | previously tried a soft reset, it might now try a hard reset (power | 287 | previously tried a soft reset, it might now try a hard reset (power |
288 | cycle) and then call slot_reset() again. If the device still can't | 288 | cycle) and then call slot_reset() again. If the device still can't |
289 | be recovered, there is nothing more that can be done; the platform | 289 | be recovered, there is nothing more that can be done; the platform |
290 | will typically report a "permanent failure" in such a case. The | 290 | will typically report a "permanent failure" in such a case. The |
291 | device will be considered "dead" in this case. | 291 | device will be considered "dead" in this case. |
292 | 292 | ||
293 | Drivers for multi-function cards will need to coordinate among | 293 | Drivers for multi-function cards will need to coordinate among |
294 | themselves as to which driver instance will perform any "one-shot" | 294 | themselves as to which driver instance will perform any "one-shot" |
295 | or global device initialization. For example, the Symbios sym53cxx2 | 295 | or global device initialization. For example, the Symbios sym53cxx2 |
296 | driver performs device init only from PCI function 0: | 296 | driver performs device init only from PCI function 0: |
297 | 297 | ||
298 | + if (PCI_FUNC(pdev->devfn) == 0) | 298 | + if (PCI_FUNC(pdev->devfn) == 0) |
299 | + sym_reset_scsi_bus(np, 0); | 299 | + sym_reset_scsi_bus(np, 0); |
300 | 300 | ||
301 | Result codes: | 301 | Result codes: |
302 | - PCI_ERS_RESULT_DISCONNECT | 302 | - PCI_ERS_RESULT_DISCONNECT |
303 | Same as above. | 303 | Same as above. |
304 | 304 | ||
305 | Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent | 305 | Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent |
306 | Failure). | 306 | Failure). |
307 | 307 | ||
308 | >>> The current powerpc implementation does not try a | 308 | >>> The current powerpc implementation does not try a |
309 | >>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. | 309 | >>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. |
310 | >>> However, it probably should. | 310 | >>> However, it probably should. |
311 | 311 | ||
312 | 312 | ||
313 | STEP 5: Resume Operations | 313 | STEP 5: Resume Operations |
314 | ------------------------- | 314 | ------------------------- |
315 | The platform will call the resume() callback on all affected device | 315 | The platform will call the resume() callback on all affected device |
316 | drivers if all drivers on the segment have returned | 316 | drivers if all drivers on the segment have returned |
317 | PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks. | 317 | PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks. |
318 | The goal of this callback is to tell the driver to restart activity, | 318 | The goal of this callback is to tell the driver to restart activity, |
319 | that everything is back and running. This callback does not return | 319 | that everything is back and running. This callback does not return |
320 | a result code. | 320 | a result code. |
321 | 321 | ||
322 | At this point, if a new error happens, the platform will restart | 322 | At this point, if a new error happens, the platform will restart |
323 | a new error recovery sequence. | 323 | a new error recovery sequence. |
324 | 324 | ||
325 | STEP 6: Permanent Failure | 325 | STEP 6: Permanent Failure |
326 | ------------------------- | 326 | ------------------------- |
327 | A "permanent failure" has occurred, and the platform cannot recover | 327 | A "permanent failure" has occurred, and the platform cannot recover |
328 | the device. The platform will call error_detected() with a | 328 | the device. The platform will call error_detected() with a |
329 | pci_channel_state value of pci_channel_io_perm_failure. | 329 | pci_channel_state value of pci_channel_io_perm_failure. |
330 | 330 | ||
331 | The device driver should, at this point, assume the worst. It should | 331 | The device driver should, at this point, assume the worst. It should |
332 | cancel all pending I/O, refuse all new I/O, returning -EIO to | 332 | cancel all pending I/O, refuse all new I/O, returning -EIO to |
333 | higher layers. The device driver should then clean up all of its | 333 | higher layers. The device driver should then clean up all of its |
334 | memory and remove itself from kernel operations, much as it would | 334 | memory and remove itself from kernel operations, much as it would |
335 | during system shutdown. | 335 | during system shutdown. |
336 | 336 | ||
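The "assume the worst" behaviour described above could be sketched as
below: once the driver is told of a permanent failure, it drops its
pending work and answers every new request with -EIO. The struct
demo_adapter and both demo_* functions are hypothetical, and the
pending counter merely stands in for a real request queue.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-device driver state. */
struct demo_adapter {
	int dead;	/* set once a permanent failure is reported */
	int pending;	/* number of requests currently queued */
};

/* Called when error_detected() reports pci_channel_io_perm_failure. */
static void demo_mark_dead(struct demo_adapter *a)
{
	a->dead = 1;
	a->pending = 0;	/* stands in for cancelling all pending I/O */
}

/* Entry point for new I/O from higher layers. */
static int demo_submit_io(struct demo_adapter *a)
{
	if (a->dead)
		return -EIO;	/* refuse all new I/O */
	a->pending++;
	return 0;
}
```

Checking the flag at the single submission entry point keeps the
refusal in one place, so no later code path can touch the dead device.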
337 | The platform will typically notify the system operator of the | 337 | The platform will typically notify the system operator of the |
338 | permanent failure in some way. If the device is hotplug-capable, | 338 | permanent failure in some way. If the device is hotplug-capable, |
339 | the operator will probably want to remove and replace the device. | 339 | the operator will probably want to remove and replace the device. |
340 | Note, however, not all failures are truly "permanent". Some are | 340 | Note, however, not all failures are truly "permanent". Some are |
341 | caused by over-heating, some by a poorly seated card. Many | 341 | caused by over-heating, some by a poorly seated card. Many |
342 | PCI error events are caused by software bugs, e.g. DMA's to | 342 | PCI error events are caused by software bugs, e.g. DMA's to |
343 | wild addresses or bogus split transactions due to programming | 343 | wild addresses or bogus split transactions due to programming |
344 | errors. See the discussion in powerpc/eeh-pci-error-recovery.txt | 344 | errors. See the discussion in powerpc/eeh-pci-error-recovery.txt |
345 | for additional detail on real-life experience of the causes of | 345 | for additional detail on real-life experience of the causes of |
346 | software errors. | 346 | software errors. |
347 | 347 | ||
348 | 348 | ||
349 | Conclusion; General Remarks | 349 | Conclusion; General Remarks |
350 | --------------------------- | 350 | --------------------------- |
351 | The way those callbacks are called is platform policy. A platform with | 351 | The way those callbacks are called is platform policy. A platform with |
352 | no slot reset capability may want to just "ignore" drivers that can't | 352 | no slot reset capability may want to just "ignore" drivers that can't |
353 | recover (disconnect them) and try to let other cards on the same segment | 353 | recover (disconnect them) and try to let other cards on the same segment |
354 | recover. Keep in mind that in most real life cases, though, there will | 354 | recover. Keep in mind that in most real life cases, though, there will |
355 | be only one driver per segment. | 355 | be only one driver per segment. |
356 | 356 | ||
357 | Now, a note about interrupts. If you get an interrupt and your | 357 | Now, a note about interrupts. If you get an interrupt and your |
358 | device is dead or has been isolated, there is a problem :) | 358 | device is dead or has been isolated, there is a problem :) |
359 | The current policy is to turn this into a platform policy. | 359 | The current policy is to turn this into a platform policy. |
360 | That is, the recovery API only requires that: | 360 | That is, the recovery API only requires that: |
361 | 361 | ||
362 | - There is no guarantee that interrupt delivery can proceed from any | 362 | - There is no guarantee that interrupt delivery can proceed from any |
363 | device on the segment starting from the error detection and until the | 363 | device on the segment starting from the error detection and until the |
364 | resume callback is sent, at which point interrupts are expected to be | 364 | resume callback is sent, at which point interrupts are expected to be |
365 | fully operational. | 365 | fully operational. |
366 | 366 | ||
367 | - There is no guarantee that interrupt delivery is stopped, that is, | 367 | - There is no guarantee that interrupt delivery is stopped, that is, |
368 | a driver that gets an interrupt after detecting an error, or that detects | 368 | a driver that gets an interrupt after detecting an error, or that detects |
369 | an error within the interrupt handler such that it prevents proper | 369 | an error within the interrupt handler such that it prevents proper |
370 | ack'ing of the interrupt (and thus removal of the source) should just | 370 | ack'ing of the interrupt (and thus removal of the source) should just |
371 | return IRQ_NOTHANDLED. It's up to the platform to deal with that | 371 | return IRQ_NOTHANDLED. It's up to the platform to deal with that |
372 | condition, typically by masking the IRQ source during the duration of | 372 | condition, typically by masking the IRQ source during the duration of |
373 | the error handling. It is expected that the platform "knows" which | 373 | the error handling. It is expected that the platform "knows" which |
374 | interrupts are routed to error-management capable slots and can deal | 374 | interrupts are routed to error-management capable slots and can deal |
375 | with temporarily disabling that IRQ number during error processing (this | 375 | with temporarily disabling that IRQ number during error processing (this |
376 | isn't terribly complex). That means some IRQ latency for other devices | 376 | isn't terribly complex). That means some IRQ latency for other devices |
377 | sharing the interrupt, but there is simply no other way. High end | 377 | sharing the interrupt, but there is simply no other way. High end |
378 | platforms aren't supposed to share interrupts between many devices | 378 | platforms aren't supposed to share interrupts between many devices |
379 | anyway :) | 379 | anyway :) |
380 | 380 | ||
381 | >>> Implementation details for the powerpc platform are discussed in | 381 | >>> Implementation details for the powerpc platform are discussed in |
382 | >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt | 382 | >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt |
383 | 383 | ||
384 | >>> As of this writing, there are six device drivers with patches | 384 | >>> As of this writing, there are six device drivers with patches |
385 | >>> implementing error recovery. Not all of these patches are in | 385 | >>> implementing error recovery. Not all of these patches are in |
386 | >>> mainline yet. These may be used as "examples": | 386 | >>> mainline yet. These may be used as "examples": |
387 | >>> | 387 | >>> |
388 | >>> drivers/scsi/ipr.c | 388 | >>> drivers/scsi/ipr.c |
389 | >>> drivers/scsi/sym53cxx_2 | 389 | >>> drivers/scsi/sym53cxx_2 |
390 | >>> drivers/net/e100.c | 390 | >>> drivers/net/e100.c |
391 | >>> drivers/net/e1000 | 391 | >>> drivers/net/e1000 |
392 | >>> drivers/net/ixgb | 392 | >>> drivers/net/ixgb |
393 | >>> drivers/net/s2io.c | 393 | >>> drivers/net/s2io.c |
394 | 394 | ||
395 | The End | 395 | The End |
396 | ------- | 396 | ------- |
397 | 397 |
Documentation/power/swsusp.txt
1 | Some warnings, first. | 1 | Some warnings, first. |
2 | 2 | ||
3 | * BIG FAT WARNING ********************************************************* | 3 | * BIG FAT WARNING ********************************************************* |
4 | * | 4 | * |
5 | * If you touch anything on disk between suspend and resume... | 5 | * If you touch anything on disk between suspend and resume... |
6 | * ...kiss your data goodbye. | 6 | * ...kiss your data goodbye. |
7 | * | 7 | * |
8 | * If you do resume from initrd after your filesystems are mounted... | 8 | * If you do resume from initrd after your filesystems are mounted... |
9 | * ...bye bye root partition. | 9 | * ...bye bye root partition. |
10 | * [this is actually same case as above] | 10 | * [this is actually same case as above] |
11 | * | 11 | * |
12 | * If you have unsupported (*) devices using DMA, you may have some | 12 | * If you have unsupported (*) devices using DMA, you may have some |
13 | * problems. If your disk driver does not support suspend... (IDE does), | 13 | * problems. If your disk driver does not support suspend... (IDE does), |
14 | * it may cause some problems, too. If you change kernel command line | 14 | * it may cause some problems, too. If you change kernel command line |
15 | * between suspend and resume, it may do something wrong. If you change | 15 | * between suspend and resume, it may do something wrong. If you change |
16 | * your hardware while system is suspended... well, it was not good idea; | 16 | * your hardware while system is suspended... well, it was not good idea; |
17 | * but it will probably only crash. | 17 | * but it will probably only crash. |
18 | * | 18 | * |
19 | * (*) suspend/resume support is needed to make it safe. | 19 | * (*) suspend/resume support is needed to make it safe. |
20 | * | 20 | * |
21 | * If you have any filesystems on USB devices mounted before software suspend, | 21 | * If you have any filesystems on USB devices mounted before software suspend, |
22 | * they won't be accessible after resume and you may lose data, as though | 22 | * they won't be accessible after resume and you may lose data, as though |
23 | * you have unplugged the USB devices with mounted filesystems on them; | 23 | * you have unplugged the USB devices with mounted filesystems on them; |
24 | * see the FAQ below for details. (This is not true for more traditional | 24 | * see the FAQ below for details. (This is not true for more traditional |
25 | * power states like "standby", which normally don't turn USB off.) | 25 | * power states like "standby", which normally don't turn USB off.) |
26 | 26 | ||
27 | You need to append resume=/dev/your_swap_partition to kernel command | 27 | You need to append resume=/dev/your_swap_partition to kernel command |
28 | line. Then you suspend by | 28 | line. Then you suspend by |
29 | 29 | ||
30 | echo shutdown > /sys/power/disk; echo disk > /sys/power/state | 30 | echo shutdown > /sys/power/disk; echo disk > /sys/power/state |
31 | 31 | ||
32 | . If you feel ACPI works pretty well on your system, you might try | 32 | . If you feel ACPI works pretty well on your system, you might try |
33 | 33 | ||
34 | echo platform > /sys/power/disk; echo disk > /sys/power/state | 34 | echo platform > /sys/power/disk; echo disk > /sys/power/state |
35 | 35 | ||
36 | . If you have SATA disks, you'll need recent kernels with SATA suspend | 36 | . If you have SATA disks, you'll need recent kernels with SATA suspend |
37 | support. For suspend and resume to work, make sure your disk drivers | 37 | support. For suspend and resume to work, make sure your disk drivers |
38 | are built into kernel -- not modules. [There's way to make | 38 | are built into kernel -- not modules. [There's way to make |
39 | suspend/resume with modular disk drivers, see FAQ, but you probably | 39 | suspend/resume with modular disk drivers, see FAQ, but you probably |
40 | should not do that.] | 40 | should not do that.] |
41 | 41 | ||
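The two suspend commands quoted above can be wrapped in a small shell function. This is only a sketch: the function name `hibernate` and the `SYSFS_POWER` override (there so the function can be tried against a scratch directory instead of the real /sys/power) are assumptions made for illustration and do not come from swsusp.txt.

```shell
#!/bin/sh
# Sketch only: hibernate() and SYSFS_POWER are illustrative names,
# not something defined by the kernel documentation.

# Usage: hibernate shutdown|platform
hibernate() {
    mode=$1
    power_dir=${SYSFS_POWER:-/sys/power}

    case "$mode" in
        shutdown|platform) ;;
        *)
            echo "usage: hibernate shutdown|platform" >&2
            return 2
            ;;
    esac

    # Bail out before touching anything if either control file is missing.
    if [ ! -w "$power_dir/disk" ] || [ ! -w "$power_dir/state" ]; then
        echo "hibernate: $power_dir/disk or $power_dir/state not writable" >&2
        return 1
    fi

    # Same two writes as in the text: choose the power-down method,
    # then ask for suspend-to-disk.
    echo "$mode" > "$power_dir/disk" && echo disk > "$power_dir/state"
}
```

Calling `hibernate shutdown` as root corresponds to the first command line in the text, and `hibernate platform` to the ACPI variant.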
42 | If you want to limit the suspend image size to N bytes, do | 42 | If you want to limit the suspend image size to N bytes, do |
43 | 43 | ||
44 | echo N > /sys/power/image_size | 44 | echo N > /sys/power/image_size |
45 | 45 | ||
46 | before suspend (it is limited to 500 MB by default). | 46 | before suspend (it is limited to 500 MB by default). |
47 | 47 | ||
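Since image_size is given in bytes, a helper that accepts megabytes may be easier to use. The following is a sketch under stated assumptions: `set_image_size_mb` and the `SYSFS_POWER` override are hypothetical names, and 1 MB is taken to mean 1024*1024 bytes.

```shell
#!/bin/sh
# Sketch only: set_image_size_mb() and SYSFS_POWER are illustrative names.

# Usage: set_image_size_mb <megabytes>
set_image_size_mb() {
    mb=$1
    power_dir=${SYSFS_POWER:-/sys/power}

    # Accept only a plain run of decimal digits.
    case "$mb" in
        ''|*[!0-9]*)
            echo "usage: set_image_size_mb <megabytes>" >&2
            return 2
            ;;
    esac

    # The file expects bytes, so convert before writing.
    echo $((mb * 1024 * 1024)) > "$power_dir/image_size"
}
```

For example, `set_image_size_mb 200` writes 209715200 to the image_size file.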
48 | 48 | ||
49 | Article about goals and implementation of Software Suspend for Linux | 49 | Article about goals and implementation of Software Suspend for Linux |
50 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 50 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
51 | Author: Gábor Kuti | 51 | Author: Gábor Kuti |
52 | Last revised: 2003-10-20 by Pavel Machek | 52 | Last revised: 2003-10-20 by Pavel Machek |
53 | 53 | ||
54 | Idea and goals to achieve | 54 | Idea and goals to achieve |
55 | 55 | ||
56 | Nowadays it is common in several laptops that they have a suspend button. It | 56 | Nowadays it is common in several laptops that they have a suspend button. It |
57 | saves the state of the machine to a filesystem or to a partition and switches | 57 | saves the state of the machine to a filesystem or to a partition and switches |
58 | to standby mode. Later resuming the machine the saved state is loaded back to | 58 | to standby mode. Later resuming the machine the saved state is loaded back to |
59 | ram and the machine can continue its work. It has two real benefits. First we | 59 | ram and the machine can continue its work. It has two real benefits. First we |
60 | save ourselves the time machine goes down and later boots up, energy costs | 60 | save ourselves the time machine goes down and later boots up, energy costs |
61 | are real high when running from batteries. The other gain is that we don't have to | 61 | are real high when running from batteries. The other gain is that we don't have to |
62 | interrupt our programs so processes that are calculating something for a long | 62 | interrupt our programs so processes that are calculating something for a long |
63 | time shouldn't need to be written interruptible. | 63 | time shouldn't need to be written interruptible. |
64 | 64 | ||
65 | swsusp saves the state of the machine into active swaps and then reboots or | 65 | swsusp saves the state of the machine into active swaps and then reboots or |
66 | powerdowns. You must explicitly specify the swap partition to resume from with | 66 | powerdowns. You must explicitly specify the swap partition to resume from with |
67 | ``resume='' kernel option. If signature is found it loads and restores saved | 67 | ``resume='' kernel option. If signature is found it loads and restores saved |
68 | state. If the option ``noresume'' is specified as a boot parameter, it skips | 68 | state. If the option ``noresume'' is specified as a boot parameter, it skips |
69 | the resuming. | 69 | the resuming. |
70 | 70 | ||
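Before suspending, it can be worth checking which resume device the running kernel was booted with. The sketch below scans a kernel command line for the ``resume='' and ``noresume'' options described above; the function name `resume_target` is an assumption made for illustration, and the device names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch only: resume_target() is an illustrative helper, not a kernel tool.

# Prints the resume= device from a kernel command line, or "noresume"
# if that option is present. Returns 1 if neither option is found.
# With no argument, the command line of the running kernel is used.
resume_target() {
    cmdline=${1:-$(cat /proc/cmdline)}
    target=

    for word in $cmdline; do
        case "$word" in
            noresume)
                # noresume makes the kernel skip resuming altogether.
                echo "noresume"
                return 0
                ;;
            resume=*)
                target=${word#resume=}
                ;;
        esac
    done

    if [ -n "$target" ]; then
        echo "$target"
    else
        return 1
    fi
}
```

For instance, `resume_target 'root=/dev/sda1 ro resume=/dev/sda2'` prints `/dev/sda2`, while a command line with neither option makes the function return 1.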
71 | In the meantime while the system is suspended you should not add/remove any | 71 | In the meantime while the system is suspended you should not add/remove any |
72 | of the hardware, write to the filesystems, etc. | 72 | of the hardware, write to the filesystems, etc. |
73 | 73 | ||
74 | Sleep states summary | 74 | Sleep states summary |
75 | ==================== | 75 | ==================== |
76 | 76 | ||
77 | There are three different interfaces you can use, /proc/acpi should | 77 | There are three different interfaces you can use, /proc/acpi should |
78 | work like this: | 78 | work like this: |
79 | 79 | ||
80 | In a really perfect world: | 80 | In a really perfect world: |
81 | echo 1 > /proc/acpi/sleep # for standby | 81 | echo 1 > /proc/acpi/sleep # for standby |
82 | echo 2 > /proc/acpi/sleep # for suspend to ram | 82 | echo 2 > /proc/acpi/sleep # for suspend to ram |
83 | echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative | 83 | echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative |
84 | echo 4 > /proc/acpi/sleep # for suspend to disk | 84 | echo 4 > /proc/acpi/sleep # for suspend to disk |
85 | echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system | 85 | echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system |
86 | 86 | ||
87 | and perhaps | 87 | and perhaps |
88 | echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios | 88 | echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios |
89 | 89 | ||
90 | Frequently Asked Questions | 90 | Frequently Asked Questions |
91 | ========================== | 91 | ========================== |
92 | 92 | ||
93 | Q: well, suspending a server is IMHO a really stupid thing, | 93 | Q: well, suspending a server is IMHO a really stupid thing, |
94 | but... (Diego Zuccato): | 94 | but... (Diego Zuccato): |
95 | 95 | ||
96 | A: You bought new UPS for your server. How do you install it without | 96 | A: You bought new UPS for your server. How do you install it without |
97 | bringing machine down? Suspend to disk, rearrange power cables, | 97 | bringing machine down? Suspend to disk, rearrange power cables, |
98 | resume. | 98 | resume. |
99 | 99 | ||
100 | You have your server on UPS. Power died, and UPS is indicating 30 | 100 | You have your server on UPS. Power died, and UPS is indicating 30 |
101 | seconds to failure. What do you do? Suspend to disk. | 101 | seconds to failure. What do you do? Suspend to disk. |
102 | 102 | ||
103 | 103 | ||
104 | Q: Maybe I'm missing something, but why don't the regular I/O paths work? | 104 | Q: Maybe I'm missing something, but why don't the regular I/O paths work? |
105 | 105 | ||
106 | A: We do use the regular I/O paths. However we cannot restore the data | 106 | A: We do use the regular I/O paths. However we cannot restore the data |
107 | to its original location as we load it. That would create an | 107 | to its original location as we load it. That would create an |
108 | inconsistent kernel state which would certainly result in an oops. | 108 | inconsistent kernel state which would certainly result in an oops. |
109 | Instead, we load the image into unused memory and then atomically copy | 109 | Instead, we load the image into unused memory and then atomically copy |
110 | it back to its original location. This implies, of course, a maximum | 110 | it back to its original location. This implies, of course, a maximum |
111 | image size of half the amount of memory. | 111 | image size of half the amount of memory. |
112 | 112 | ||
113 | There are two solutions to this: | 113 | There are two solutions to this: |
114 | 114 | ||
115 | * require half of memory to be free during suspend. That way you can | 115 | * require half of memory to be free during suspend. That way you can |
116 | read "new" data onto free spots, then cli and copy | 116 | read "new" data onto free spots, then cli and copy |
117 | 117 | ||
118 | * assume we had special "polling" ide driver that only uses memory | 118 | * assume we had special "polling" ide driver that only uses memory |
119 | between 0-640KB. That way, I'd have to make sure that 0-640KB is free | 119 | between 0-640KB. That way, I'd have to make sure that 0-640KB is free |
120 | during suspending, but otherwise it would work... | 120 | during suspending, but otherwise it would work... |
121 | 121 | ||
122 | suspend2 shares this fundamental limitation, but does not include user | 122 | suspend2 shares this fundamental limitation, but does not include user |
123 | data and disk caches into "used memory" by saving them in | 123 | data and disk caches into "used memory" by saving them in |
124 | advance. That means that the limitation goes away in practice. | 124 | advance. That means that the limitation goes away in practice. |
125 | 125 | ||
126 | Q: Does linux support ACPI S4? | 126 | Q: Does linux support ACPI S4? |
127 | 127 | ||
128 | A: Yes. That's what echo platform > /sys/power/disk does. | 128 | A: Yes. That's what echo platform > /sys/power/disk does. |
129 | 129 | ||
130 | Q: What is 'suspend2'? | 130 | Q: What is 'suspend2'? |
131 | 131 | ||
132 | A: suspend2 is 'Software Suspend 2', a forked implementation of | 132 | A: suspend2 is 'Software Suspend 2', a forked implementation of |
133 | suspend-to-disk which is available as separate patches for 2.4 and 2.6 | 133 | suspend-to-disk which is available as separate patches for 2.4 and 2.6 |
134 | kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB | 134 | kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB |
135 | highmem and preemption. It also has an extensible architecture that | 135 | highmem and preemption. It also has an extensible architecture that |
136 | allows for arbitrary transformations on the image (compression, | 136 | allows for arbitrary transformations on the image (compression, |
137 | encryption) and arbitrary backends for writing the image (eg to swap | 137 | encryption) and arbitrary backends for writing the image (eg to swap |
138 | or an NFS share[Work In Progress]). Questions regarding suspend2 | 138 | or an NFS share[Work In Progress]). Questions regarding suspend2 |
139 | should be sent to the mailing list available through the suspend2 | 139 | should be sent to the mailing list available through the suspend2 |
140 | website, and not to the Linux Kernel Mailing List. We are working | 140 | website, and not to the Linux Kernel Mailing List. We are working |
141 | toward merging suspend2 into the mainline kernel. | 141 | toward merging suspend2 into the mainline kernel. |
142 | 142 | ||
143 | Q: A kernel thread must voluntarily freeze itself (call 'refrigerator'). | 143 | Q: A kernel thread must voluntarily freeze itself (call 'refrigerator'). |
144 | I found some kernel threads that don't do it, and they don't freeze | 144 | I found some kernel threads that don't do it, and they don't freeze |
145 | so the system can't sleep. Is this a known behavior? | 145 | so the system can't sleep. Is this a known behavior? |
146 | 146 | ||
147 | A: All such kernel threads need to be fixed, one by one. Select the | 147 | A: All such kernel threads need to be fixed, one by one. Select the |
148 | place where the thread is safe to be frozen (no kernel semaphores | 148 | place where the thread is safe to be frozen (no kernel semaphores |
149 | should be held at that point and it must be safe to sleep there), and | 149 | should be held at that point and it must be safe to sleep there), and |
150 | add: | 150 | add: |
151 | 151 | ||
152 | try_to_freeze(); | 152 | try_to_freeze(); |
153 | 153 | ||
154 | If the thread is needed for writing the image to storage, you should | 154 | If the thread is needed for writing the image to storage, you should |
155 | instead set the PF_NOFREEZE process flag when creating the thread (and | 155 | instead set the PF_NOFREEZE process flag when creating the thread (and |
156 | be very careful). | 156 | be very careful). |
157 | 157 | ||
158 | 158 | ||
159 | Q: What is the difference between between "platform", "shutdown" and | 159 | Q: What is the difference between "platform", "shutdown" and |
160 | "firmware" in /sys/power/disk? | 160 | "firmware" in /sys/power/disk? |
161 | 161 | ||
162 | A: | 162 | A: |
163 | 163 | ||
164 | shutdown: save state in linux, then tell bios to powerdown | 164 | shutdown: save state in linux, then tell bios to powerdown |
165 | 165 | ||
166 | platform: save state in linux, then tell bios to powerdown and blink | 166 | platform: save state in linux, then tell bios to powerdown and blink |
167 | "suspended led" | 167 | "suspended led" |
168 | 168 | ||
169 | firmware: tell bios to save state itself [needs BIOS-specific suspend | 169 | firmware: tell bios to save state itself [needs BIOS-specific suspend |
170 | partition, and has very little to do with swsusp] | 170 | partition, and has very little to do with swsusp] |
171 | 171 | ||
172 | "platform" is actually right thing to do, but "shutdown" is most | 172 | "platform" is actually right thing to do, but "shutdown" is most |
173 | reliable. | 173 | reliable. |
174 | 174 | ||
175 | Q: I do not understand why you have such strong objections to idea of | 175 | Q: I do not understand why you have such strong objections to idea of |
176 | selective suspend. | 176 | selective suspend. |
177 | 177 | ||
178 | A: Do selective suspend during runtime power management, that's okay. But | 178 | A: Do selective suspend during runtime power management, that's okay. But |
179 | it's useless for suspend-to-disk. (And I do not see how you could use | 179 | it's useless for suspend-to-disk. (And I do not see how you could use |
180 | it for suspend-to-ram, I hope you do not want that). | 180 | it for suspend-to-ram, I hope you do not want that). |
181 | 181 | ||
182 | Lets see, so you suggest to | 182 | Lets see, so you suggest to |
183 | 183 | ||
184 | * SUSPEND all but swap device and parents | 184 | * SUSPEND all but swap device and parents |
185 | * Snapshot | 185 | * Snapshot |
186 | * Write image to disk | 186 | * Write image to disk |
187 | * SUSPEND swap device and parents | 187 | * SUSPEND swap device and parents |
188 | * Powerdown | 188 | * Powerdown |
189 | 189 | ||
190 | Oh no, that does not work, if swap device or its parents uses DMA, | 190 | Oh no, that does not work, if swap device or its parents uses DMA, |
191 | you've corrupted data. You'd have to do | 191 | you've corrupted data. You'd have to do |
192 | 192 | ||
193 | * SUSPEND all but swap device and parents | 193 | * SUSPEND all but swap device and parents |
194 | * FREEZE swap device and parents | 194 | * FREEZE swap device and parents |
195 | * Snapshot | 195 | * Snapshot |
196 | * UNFREEZE swap device and parents | 196 | * UNFREEZE swap device and parents |
197 | * Write | 197 | * Write |
198 | * SUSPEND swap device and parents | 198 | * SUSPEND swap device and parents |
199 | 199 | ||
200 | Which means that you still need that FREEZE state, and you get more | 200 | Which means that you still need that FREEZE state, and you get more |
201 | complicated code. (And I have not yet introduced details like system | 201 | complicated code. (And I have not yet introduced details like system |
202 | devices). | 202 | devices). |
203 | 203 | ||
204 | Q: There don't seem to be any generally useful behavioral | 204 | Q: There don't seem to be any generally useful behavioral |
205 | distinctions between SUSPEND and FREEZE. | 205 | distinctions between SUSPEND and FREEZE. |
206 | 206 | ||
207 | A: Doing SUSPEND when you are asked to do FREEZE is always correct, | 207 | A: Doing SUSPEND when you are asked to do FREEZE is always correct, |
208 | but it may be unnecessarily slow. If you want your driver to stay simple, | 208 | but it may be unnecessarily slow. If you want your driver to stay simple, |
209 | slowness may not matter to you. It can always be fixed later. | 209 | slowness may not matter to you. It can always be fixed later. |
210 | 210 | ||
211 | For devices like disk it does matter, you do not want to spindown for | 211 | For devices like disk it does matter, you do not want to spindown for |
212 | FREEZE. | 212 | FREEZE. |
213 | 213 | ||
214 | Q: After resuming, system is paging heavily, leading to very bad interactivity. | 214 | Q: After resuming, system is paging heavily, leading to very bad interactivity. |
215 | 215 | ||
216 | A: Try running | 216 | A: Try running |
217 | 217 | ||
218 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null | 218 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null |
219 | 219 | ||
220 | after resume. swapoff -a; swapon -a may also be useful. | 220 | after resume. swapoff -a; swapon -a may also be useful. |
221 | 221 | ||
222 | Q: What happens to devices during swsusp? They seem to be resumed | 222 | Q: What happens to devices during swsusp? They seem to be resumed |
223 | during system suspend? | 223 | during system suspend? |
224 | 224 | ||
225 | A: That's correct. We need to resume them if we want to write image to | 225 | A: That's correct. We need to resume them if we want to write image to |
226 | disk. Whole sequence goes like | 226 | disk. Whole sequence goes like |
227 | 227 | ||
228 | Suspend part | 228 | Suspend part |
229 | ~~~~~~~~~~~~ | 229 | ~~~~~~~~~~~~ |
230 | running system, user asks for suspend-to-disk | 230 | running system, user asks for suspend-to-disk |
231 | 231 | ||
232 | user processes are stopped | 232 | user processes are stopped |
233 | 233 | ||
234 | suspend(PMSG_FREEZE): devices are frozen so that they don't interfere | 234 | suspend(PMSG_FREEZE): devices are frozen so that they don't interfere |
235 | with state snapshot | 235 | with state snapshot |
236 | 236 | ||
237 | state snapshot: copy of whole used memory is taken with interrupts disabled | 237 | state snapshot: copy of whole used memory is taken with interrupts disabled |
238 | 238 | ||
239 | resume(): devices are woken up so that we can write image to swap | 239 | resume(): devices are woken up so that we can write image to swap |
240 | 240 | ||
241 | write image to swap | 241 | write image to swap |
242 | 242 | ||
243 | suspend(PMSG_SUSPEND): suspend devices so that we can power off | 243 | suspend(PMSG_SUSPEND): suspend devices so that we can power off |
244 | 244 | ||
245 | turn the power off | 245 | turn the power off |
246 | 246 | ||
247 | Resume part | 247 | Resume part |
248 | ~~~~~~~~~~~ | 248 | ~~~~~~~~~~~ |
249 | (is actually pretty similar) | 249 | (is actually pretty similar) |
250 | 250 | ||
251 | running system, user asks for suspend-to-disk | 251 | running system, user asks for suspend-to-disk |
252 | 252 | ||
253 | user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows) | 253 | user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows) |
254 | 254 | ||
255 | read image from disk | 255 | read image from disk |
256 | 256 | ||
257 | suspend(PMSG_FREEZE): devices are frozen so that they don't interfere | 257 | suspend(PMSG_FREEZE): devices are frozen so that they don't interfere |
258 | with image restoration | 258 | with image restoration |
259 | 259 | ||
260 | image restoration: rewrite memory with image | 260 | image restoration: rewrite memory with image |
261 | 261 | ||
262 | resume(): devices are woken up so that system can continue | 262 | resume(): devices are woken up so that system can continue |
263 | 263 | ||
264 | thaw all user processes | 264 | thaw all user processes |
265 | 265 | ||
266 | Q: What is this 'Encrypt suspend image' for? | 266 | Q: What is this 'Encrypt suspend image' for? |
267 | 267 | ||
268 | A: First of all: it is not a replacement for dm-crypt encrypted swap. | 268 | A: First of all: it is not a replacement for dm-crypt encrypted swap. |
269 | It cannot protect your computer while it is suspended. Instead it does | 269 | It cannot protect your computer while it is suspended. Instead it does |
270 | protect from leaking sensitive data after resume from suspend. | 270 | protect from leaking sensitive data after resume from suspend. |
271 | 271 | ||
272 | Think of the following: you suspend while an application is running | 272 | Think of the following: you suspend while an application is running |
273 | that keeps sensitive data in memory. The application itself prevents | 273 | that keeps sensitive data in memory. The application itself prevents |
274 | the data from being swapped out. Suspend, however, must write these | 274 | the data from being swapped out. Suspend, however, must write these |
275 | data to swap to be able to resume later on. Without suspend encryption | 275 | data to swap to be able to resume later on. Without suspend encryption |
276 | your sensitive data are then stored in plaintext on disk. This means | 276 | your sensitive data are then stored in plaintext on disk. This means |
277 | that after resume your sensitive data are accessible to all | 277 | that after resume your sensitive data are accessible to all |
278 | applications having direct access to the swap device which was used | 278 | applications having direct access to the swap device which was used |
279 | for suspend. If you don't need swap after resume these data can remain | 279 | for suspend. If you don't need swap after resume these data can remain |
280 | on disk virtually forever. Thus it can happen that your system gets | 280 | on disk virtually forever. Thus it can happen that your system gets |
281 | broken in weeks later and sensitive data which you thought were | 281 | broken in weeks later and sensitive data which you thought were |
282 | encrypted and protected are retrieved and stolen from the swap device. | 282 | encrypted and protected are retrieved and stolen from the swap device. |
283 | To prevent this situation you should use 'Encrypt suspend image'. | 283 | To prevent this situation you should use 'Encrypt suspend image'. |
284 | 284 | ||
285 | During suspend a temporary key is created and this key is used to | 285 | During suspend a temporary key is created and this key is used to |
286 | encrypt the data written to disk. When, during resume, the data was | 286 | encrypt the data written to disk. When, during resume, the data was |
287 | read back into memory the temporary key is destroyed which simply | 287 | read back into memory the temporary key is destroyed which simply |
288 | means that all data written to disk during suspend are then | 288 | means that all data written to disk during suspend are then |
289 | inaccessible so they can't be stolen later on. The only thing that | 289 | inaccessible so they can't be stolen later on. The only thing that |
290 | you must then take care of is that you call 'mkswap' for the swap | 290 | you must then take care of is that you call 'mkswap' for the swap |
291 | partition used for suspend as early as possible during regular | 291 | partition used for suspend as early as possible during regular |
292 | boot. This asserts that any temporary key from an oopsed suspend or | 292 | boot. This asserts that any temporary key from an oopsed suspend or |
293 | from a failed or aborted resume is erased from the swap device. | 293 | from a failed or aborted resume is erased from the swap device. |
294 | 294 | ||
295 | As a rule of thumb use encrypted swap to protect your data while your | 295 | As a rule of thumb use encrypted swap to protect your data while your |
296 | system is shut down or suspended. Additionally use the encrypted | 296 | system is shut down or suspended. Additionally use the encrypted |
297 | suspend image to prevent sensitive data from being stolen after | 297 | suspend image to prevent sensitive data from being stolen after |
298 | resume. | 298 | resume. |
299 | 299 | ||
300 | Q: Why can't we suspend to a swap file? | 300 | Q: Why can't we suspend to a swap file? |
301 | 301 | ||
302 | A: Because accessing swap file needs the filesystem mounted, and | 302 | A: Because accessing swap file needs the filesystem mounted, and |
303 | filesystem might do something wrong (like replaying the journal) | 303 | filesystem might do something wrong (like replaying the journal) |
304 | during mount. | 304 | during mount. |
305 | 305 | ||
306 | There are few ways to get that fixed: | 306 | There are few ways to get that fixed: |
307 | 307 | ||
308 | 1) Probably could be solved by modifying every filesystem to support | 308 | 1) Probably could be solved by modifying every filesystem to support |
309 | some kind of "really read-only!" option. Patches welcome. | 309 | some kind of "really read-only!" option. Patches welcome. |
310 | 310 | ||
311 | 2) suspend2 gets around that by storing absolute positions in on-disk | 311 | 2) suspend2 gets around that by storing absolute positions in on-disk |
312 | image (and blocksize), with resume parameter pointing directly to | 312 | image (and blocksize), with resume parameter pointing directly to |
313 | suspend header. | 313 | suspend header. |
314 | 314 | ||
315 | Q: Is there a maximum system RAM size that is supported by swsusp? | 315 | Q: Is there a maximum system RAM size that is supported by swsusp? |
316 | 316 | ||
317 | A: It should work okay with highmem. | 317 | A: It should work okay with highmem. |
318 | 318 | ||
319 | Q: Does swsusp (to disk) use only one swap partition or can it use | 319 | Q: Does swsusp (to disk) use only one swap partition or can it use |
320 | multiple swap partitions (aggregate them into one logical space)? | 320 | multiple swap partitions (aggregate them into one logical space)? |
321 | 321 | ||
322 | A: Only one swap partition, sorry. | 322 | A: Only one swap partition, sorry. |
323 | 323 | ||
324 | Q: If my application(s) causes lots of memory & swap space to be used | 324 | Q: If my application(s) causes lots of memory & swap space to be used |
325 | (over half of the total system RAM), is it correct that it is likely | 325 | (over half of the total system RAM), is it correct that it is likely |
326 | to be useless to try to suspend to disk while that app is running? | 326 | to be useless to try to suspend to disk while that app is running? |
327 | 327 | ||
328 | A: No, it should work okay, as long as your app does not mlock() | 328 | A: No, it should work okay, as long as your app does not mlock() |
329 | it. Just prepare a big enough swap partition. | 329 | it. Just prepare a big enough swap partition. |
330 | 330 | ||
331 | Q: What information is useful for debugging suspend-to-disk problems? | 331 | Q: What information is useful for debugging suspend-to-disk problems? |
332 | 332 | ||
333 | A: Well, last messages on the screen are always useful. If something | 333 | A: Well, last messages on the screen are always useful. If something |
334 | is broken, it is usually some kernel driver, therefore trying with as | 334 | is broken, it is usually some kernel driver, therefore trying with as |
335 | few modules loaded as possible helps a lot. I also prefer people to | 335 | few modules loaded as possible helps a lot. I also prefer people to |
336 | suspend from console, preferably without X running. Booting with | 336 | suspend from console, preferably without X running. Booting with |
337 | init=/bin/bash, then swapon and starting suspend sequence manually | 337 | init=/bin/bash, then swapon and starting suspend sequence manually |
338 | usually does the trick. Then it is a good idea to try with the latest | 338 | usually does the trick. Then it is a good idea to try with the latest |
339 | vanilla kernel. | 339 | vanilla kernel. |
340 | 340 | ||
341 | Q: How can distributions ship a swsusp-supporting kernel with modular | 341 | Q: How can distributions ship a swsusp-supporting kernel with modular |
342 | disk drivers (especially SATA)? | 342 | disk drivers (especially SATA)? |
343 | 343 | ||
344 | A: Well, it can be done, load the drivers, then do echo into | 344 | A: Well, it can be done, load the drivers, then do echo into |
345 | /sys/power/disk/resume file from initrd. Be sure not to mount | 345 | /sys/power/disk/resume file from initrd. Be sure not to mount |
346 | anything, not even read-only mount, or you are going to lose your | 346 | anything, not even read-only mount, or you are going to lose your |
347 | data. | 347 | data. |
348 | 348 | ||
349 | Q: How do I make suspend more verbose? | 349 | Q: How do I make suspend more verbose? |
350 | 350 | ||
351 | A: If you want to see any non-error kernel messages on the virtual | 351 | A: If you want to see any non-error kernel messages on the virtual |
352 | terminal the kernel switches to during suspend, you have to set the | 352 | terminal the kernel switches to during suspend, you have to set the |
353 | kernel console loglevel to at least 4 (KERN_WARNING), for example by | 353 | kernel console loglevel to at least 4 (KERN_WARNING), for example by |
354 | doing | 354 | doing |
355 | 355 | ||
356 | # save the old loglevel | 356 | # save the old loglevel |
357 | read LOGLEVEL DUMMY < /proc/sys/kernel/printk | 357 | read LOGLEVEL DUMMY < /proc/sys/kernel/printk |
358 | # set the loglevel so we see the progress bar. | 358 | # set the loglevel so we see the progress bar. |
359 | # if the level is higher than needed, we leave it alone. | 359 | # if the level is higher than needed, we leave it alone. |
360 | if [ $LOGLEVEL -lt 5 ]; then | 360 | if [ $LOGLEVEL -lt 5 ]; then |
361 | echo 5 > /proc/sys/kernel/printk | 361 | echo 5 > /proc/sys/kernel/printk |
362 | fi | 362 | fi |
363 | 363 | ||
364 | IMG_SZ=0 | 364 | IMG_SZ=0 |
365 | read IMG_SZ < /sys/power/image_size | 365 | read IMG_SZ < /sys/power/image_size |
366 | echo -n disk > /sys/power/state | 366 | echo -n disk > /sys/power/state |
367 | RET=$? | 367 | RET=$? |
368 | # | 368 | # |
369 | # the logic here is: | 369 | # the logic here is: |
370 | # if image_size > 0 (without kernel support, IMG_SZ will be zero), | 370 | # if image_size > 0 (without kernel support, IMG_SZ will be zero), |
371 | # then try again with image_size set to zero. | 371 | # then try again with image_size set to zero. |
372 | if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size | 372 | if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size |
373 | echo 0 > /sys/power/image_size | 373 | echo 0 > /sys/power/image_size |
374 | echo -n disk > /sys/power/state | 374 | echo -n disk > /sys/power/state |
375 | RET=$? | 375 | RET=$? |
376 | fi | 376 | fi |
377 | 377 | ||
378 | # restore previous loglevel | 378 | # restore previous loglevel |
379 | echo $LOGLEVEL > /proc/sys/kernel/printk | 379 | echo $LOGLEVEL > /proc/sys/kernel/printk |
380 | exit $RET | 380 | exit $RET |
381 | 381 | ||
382 | Q: Is this true that if I have a mounted filesystem on a USB device and | 382 | Q: Is this true that if I have a mounted filesystem on a USB device and |
383 | I suspend to disk, I can lose data unless the filesystem has been mounted | 383 | I suspend to disk, I can lose data unless the filesystem has been mounted |
384 | with "sync"? | 384 | with "sync"? |
385 | 385 | ||
386 | A: That's right ... if you disconnect that device, you may lose data. | 386 | A: That's right ... if you disconnect that device, you may lose data. |
387 | In fact, even with "-o sync" you can lose data if your programs have | 387 | In fact, even with "-o sync" you can lose data if your programs have |
388 | information in buffers they haven't written out to a disk you disconnect, | 388 | information in buffers they haven't written out to a disk you disconnect, |
389 | or if you disconnect before the device finished saving data you wrote. | 389 | or if you disconnect before the device finished saving data you wrote. |
390 | 390 | ||
391 | Software suspend normally powers down USB controllers, which is equivalent | 391 | Software suspend normally powers down USB controllers, which is equivalent |
392 | to disconnecting all USB devices attached to your system. | 392 | to disconnecting all USB devices attached to your system. |
393 | 393 | ||
394 | Your system might well support low-power modes for its USB controllers | 394 | Your system might well support low-power modes for its USB controllers |
395 | while the system is asleep, maintaining the connection, using true sleep | 395 | while the system is asleep, maintaining the connection, using true sleep |
396 | modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the | 396 | modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the |
397 | /sys/power/state file; write "standby" or "mem".) We've not seen any | 397 | /sys/power/state file; write "standby" or "mem".) We've not seen any |
398 | hardware that can use these modes through software suspend, although in | 398 | hardware that can use these modes through software suspend, although in |
399 | theory some systems might support "platform" or "firmware" modes that | 399 | theory some systems might support "platform" or "firmware" modes that |
400 | won't break the USB connections. | 400 | won't break the USB connections. |
401 | 401 | ||
402 | Remember that it's always a bad idea to unplug a disk drive containing a | 402 | Remember that it's always a bad idea to unplug a disk drive containing a |
403 | mounted filesystem. That's true even when your system is asleep! The | 403 | mounted filesystem. That's true even when your system is asleep! The |
404 | safest thing is to unmount all filesystems on removable media (such as USB, | 404 | safest thing is to unmount all filesystems on removable media (such as USB, |
405 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) | 405 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) |
406 | before suspending; then remount them after resuming. | 406 | before suspending; then remount them after resuming. |
407 | 407 | ||
408 | Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were | 408 | Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were |
409 | compiled with similar configuration files. Anyway, I found that | 409 | compiled with similar configuration files. Anyway, I found that |
410 | suspend to disk (and resume) is much slower on 2.6.16 compared to | 410 | suspend to disk (and resume) is much slower on 2.6.16 compared to |
411 | 2.6.15. Any idea for why that might happen or how can I speed it up? | 411 | 2.6.15. Any idea for why that might happen or how can I speed it up? |
412 | 412 | ||
413 | A: This is because the size of the suspend image is now greater than | 413 | A: This is because the size of the suspend image is now greater than |
414 | for 2.6.15 (by saving more data we can get more responsive system | 414 | for 2.6.15 (by saving more data we can get more responsive system |
415 | after resume). | 415 | after resume). |
416 | 416 | ||
417 | There's the /sys/power/image_size knob that controls the size of the | 417 | There's the /sys/power/image_size knob that controls the size of the |
418 | image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as | 418 | image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as |
419 | root), the 2.6.15 behavior should be restored. If it is still too | 419 | root), the 2.6.15 behavior should be restored. If it is still too |
420 | slow, take a look at suspend.sf.net -- userland suspend is faster and | 420 | slow, take a look at suspend.sf.net -- userland suspend is faster and |
421 | supports LZF compression to speed it up further. | 421 | supports LZF compression to speed it up further. |
422 | 422 |
Documentation/prio_tree.txt
1 | The prio_tree.c code indexes vmas using 3 different indexes: | 1 | The prio_tree.c code indexes vmas using 3 different indexes: |
2 | * heap_index = vm_pgoff + vm_size_in_pages : end_vm_pgoff | 2 | * heap_index = vm_pgoff + vm_size_in_pages : end_vm_pgoff |
3 | * radix_index = vm_pgoff : start_vm_pgoff | 3 | * radix_index = vm_pgoff : start_vm_pgoff |
4 | * size_index = vm_size_in_pages | 4 | * size_index = vm_size_in_pages |
5 | 5 | ||
6 | A regular radix-priority-search-tree indexes vmas using only heap_index and | 6 | A regular radix-priority-search-tree indexes vmas using only heap_index and |
7 | radix_index. The conditions for indexing are: | 7 | radix_index. The conditions for indexing are: |
8 | * ->heap_index >= ->left->heap_index && | 8 | * ->heap_index >= ->left->heap_index && |
9 | ->heap_index >= ->right->heap_index | 9 | ->heap_index >= ->right->heap_index |
10 | * if (->heap_index == ->left->heap_index) | 10 | * if (->heap_index == ->left->heap_index) |
11 | then ->radix_index < ->left->radix_index; | 11 | then ->radix_index < ->left->radix_index; |
12 | * if (->heap_index == ->right->heap_index) | 12 | * if (->heap_index == ->right->heap_index) |
13 | then ->radix_index < ->right->radix_index; | 13 | then ->radix_index < ->right->radix_index; |
14 | * nodes are hashed to left or right subtree using radix_index | 14 | * nodes are hashed to left or right subtree using radix_index |
15 | similar to a pure binary radix tree. | 15 | similar to a pure binary radix tree. |
16 | 16 | ||
17 | A regular radix-priority-search-tree helps to store and query | 17 | A regular radix-priority-search-tree helps to store and query |
18 | intervals (vmas). However, a regular radix-priority-search-tree is only | 18 | intervals (vmas). However, a regular radix-priority-search-tree is only |
19 | suitable for storing vmas with different radix indices (vm_pgoff). | 19 | suitable for storing vmas with different radix indices (vm_pgoff). |
20 | 20 | ||
21 | Therefore, the prio_tree.c extends the regular radix-priority-search-tree | 21 | Therefore, the prio_tree.c extends the regular radix-priority-search-tree |
22 | to handle many vmas with the same vm_pgoff. Such vmas are handled in | 22 | to handle many vmas with the same vm_pgoff. Such vmas are handled in |
23 | 2 different ways: 1) All vmas with the same radix _and_ heap indices are | 23 | 2 different ways: 1) All vmas with the same radix _and_ heap indices are |
24 | linked using vm_set.list, 2) if there are many vmas with the same radix | 24 | linked using vm_set.list, 2) if there are many vmas with the same radix |
25 | index, but different heap indices and if the regular radix-priority-search | 25 | index, but different heap indices and if the regular radix-priority-search |
26 | tree cannot index them all, we build an overflow-sub-tree that indexes such | 26 | tree cannot index them all, we build an overflow-sub-tree that indexes such |
27 | vmas using heap and size indices instead of heap and radix indices. For | 27 | vmas using heap and size indices instead of heap and radix indices. For |
28 | example, in the figure below some vmas with vm_pgoff = 0 (zero) are | 28 | example, in the figure below some vmas with vm_pgoff = 0 (zero) are |
29 | indexed by regular radix-priority-search-tree whereas others are pushed | 29 | indexed by regular radix-priority-search-tree whereas others are pushed |
30 | into an overflow-subtree. Note that all vmas in an overflow-sub-tree have | 30 | into an overflow-subtree. Note that all vmas in an overflow-sub-tree have |
31 | the same vm_pgoff (radix_index) and if necessary we build different | 31 | the same vm_pgoff (radix_index) and if necessary we build different |
32 | overflow-sub-trees to handle each possible radix_index. For example, | 32 | overflow-sub-trees to handle each possible radix_index. For example, |
33 | in the figure we have 3 overflow-sub-trees corresponding to radix indices | 33 | in the figure we have 3 overflow-sub-trees corresponding to radix indices |
34 | 0, 2, and 4. | 34 | 0, 2, and 4. |
35 | 35 | ||
36 | In the final tree the first few (prio_tree_root->index_bits) levels | 36 | In the final tree the first few (prio_tree_root->index_bits) levels |
37 | are indexed using heap and radix indices whereas the overflow-sub-trees below | 37 | are indexed using heap and radix indices whereas the overflow-sub-trees below |
38 | those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are | 38 | those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are |
39 | indexed using heap and size indices. In overflow-sub-trees the size_index | 39 | indexed using heap and size indices. In overflow-sub-trees the size_index |
40 | is used for hashing the nodes to appropriate places. | 40 | is used for hashing the nodes to appropriate places. |
41 | 41 | ||
42 | Now, an example prio_tree: | 42 | Now, an example prio_tree: |
43 | 43 | ||
44 | vmas are represented [radix_index, size_index, heap_index] | 44 | vmas are represented [radix_index, size_index, heap_index] |
45 | i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff] | 45 | i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff] |
46 | 46 | ||
47 | level prio_tree_root->index_bits = 3 | 47 | level prio_tree_root->index_bits = 3 |
48 | ----- | 48 | ----- |
49 | _ | 49 | _ |
50 | 0 [0,7,7] | | 50 | 0 [0,7,7] | |
51 | / \ | | 51 | / \ | |
52 | ------------------ ------------ | Regular | 52 | ------------------ ------------ | Regular |
53 | / \ | radix priority | 53 | / \ | radix priority |
54 | 1 [1,6,7] [4,3,7] | search tree | 54 | 1 [1,6,7] [4,3,7] | search tree |
55 | / \ / \ | | 55 | / \ / \ | |
56 | ------- ----- ------ ----- | heap-and-radix | 56 | ------- ----- ------ ----- | heap-and-radix |
57 | / \ / \ | indexed | 57 | / \ / \ | indexed |
58 | 2 [0,6,6] [2,5,7] [5,2,7] [6,1,7] | | 58 | 2 [0,6,6] [2,5,7] [5,2,7] [6,1,7] | |
59 | / \ / \ / \ / \ | | 59 | / \ / \ / \ / \ | |
60 | 3 [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7] | | 60 | 3 [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7] | |
61 | / / / _ | 61 | / / / _ |
62 | / / / _ | 62 | / / / _ |
63 | 4 [0,4,4] [2,3,5] [4,1,5] | | 63 | 4 [0,4,4] [2,3,5] [4,1,5] | |
64 | / / / | | 64 | / / / | |
65 | 5 [0,3,3] [2,2,4] [4,0,4] | Overflow-sub-trees | 65 | 5 [0,3,3] [2,2,4] [4,0,4] | Overflow-sub-trees |
66 | / / | | 66 | / / | |
67 | 6 [0,2,2] [2,1,3] | heap-and-size | 67 | 6 [0,2,2] [2,1,3] | heap-and-size |
68 | / / | indexed | 68 | / / | indexed |
69 | 7 [0,1,1] [2,0,2] | | 69 | 7 [0,1,1] [2,0,2] | |
70 | / | | 70 | / | |
71 | 8 [0,0,0] | | 71 | 8 [0,0,0] | |
72 | _ | 72 | _ |
73 | 73 | ||
74 | Note that we use prio_tree_root->index_bits to optimize the height | 74 | Note that we use prio_tree_root->index_bits to optimize the height |
75 | of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is | 75 | of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is |
76 | set according to the maximum end_vm_pgoff mapped, we are sure that all | 76 | set according to the maximum end_vm_pgoff mapped, we are sure that all |
77 | bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore, | 77 | bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore, |
78 | we only use the first prio_tree_root->index_bits as radix_index. | 78 | we only use the first prio_tree_root->index_bits as radix_index. |
79 | Whenever index_bits is increased in prio_tree_expand, we shuffle the tree | 79 | Whenever index_bits is increased in prio_tree_expand, we shuffle the tree |
80 | to make sure that the first prio_tree_root->index_bits levels of the tree | 80 | to make sure that the first prio_tree_root->index_bits levels of the tree |
81 | are indexed properly using heap and radix indices. | 81 | are indexed properly using heap and radix indices. |
82 | 82 | ||
83 | We do not optimize the height of overflow-sub-trees using index_bits. | 83 | We do not optimize the height of overflow-sub-trees using index_bits. |
84 | The reason is: there can be many such overflow-sub-trees and all of | 84 | The reason is: there can be many such overflow-sub-trees and all of |
85 | them have to be shuffled whenever the index_bits increases. This may involve | 85 | them have to be shuffled whenever the index_bits increases. This may involve |
86 | walking the whole prio_tree in prio_tree_insert->prio_tree_expand code | 86 | walking the whole prio_tree in prio_tree_insert->prio_tree_expand code |
87 | path which is not desirable. Hence, we do not optimize the height of the | 87 | path which is not desirable. Hence, we do not optimize the height of the |
88 | heap-and-size indexed overflow-sub-trees using prio_tree->index_bits. | 88 | heap-and-size indexed overflow-sub-trees using prio_tree->index_bits. |
89 | Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits | 89 | Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits |
90 | of size_index. This may lead to skewed sub-trees because most of the | 90 | of size_index. This may lead to skewed sub-trees because most of the |
91 | higher significant bits of the size_index are likely to be be 0 (zero). In | 91 | higher significant bits of the size_index are likely to be 0 (zero). In |
92 | the example above, all 3 overflow-sub-trees are skewed. This may marginally | 92 | the example above, all 3 overflow-sub-trees are skewed. This may marginally |
93 | affect the performance. However, processes rarely map many vmas with the | 93 | affect the performance. However, processes rarely map many vmas with the |
94 | same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally | 94 | same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally |
95 | do not require overflow-sub-trees to index all vmas. | 95 | do not require overflow-sub-trees to index all vmas. |
96 | 96 | ||
97 | From the above discussion it is clear that the maximum height of | 97 | From the above discussion it is clear that the maximum height of |
98 | a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG. | 98 | a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG. |
99 | However, in most of the common cases we do not need overflow-sub-trees, | 99 | However, in most of the common cases we do not need overflow-sub-trees, |
100 | so the tree height in the common cases will be prio_tree_root->index_bits. | 100 | so the tree height in the common cases will be prio_tree_root->index_bits. |
101 | 101 | ||
102 | It is fair to mention here that the prio_tree_root->index_bits | 102 | It is fair to mention here that the prio_tree_root->index_bits |
103 | is increased on demand, however, the index_bits is not decreased when | 103 | is increased on demand, however, the index_bits is not decreased when |
104 | vmas are removed from the prio_tree. That's tricky to do. Hence, it's | 104 | vmas are removed from the prio_tree. That's tricky to do. Hence, it's |
105 | left as a homework problem. | 105 | left as a homework problem. |
106 | 106 | ||
107 | 107 | ||
108 | 108 |
Documentation/rpc-cache.txt
1 | This document gives a brief introduction to the caching | 1 | This document gives a brief introduction to the caching |
2 | mechanisms in the sunrpc layer that is used, in particular, | 2 | mechanisms in the sunrpc layer that is used, in particular, |
3 | for NFS authentication. | 3 | for NFS authentication. |
4 | 4 | ||
5 | CACHES | 5 | CACHES |
6 | ====== | 6 | ====== |
7 | The caching replaces the old exports table and allows for | 7 | The caching replaces the old exports table and allows for |
8 | a wide variety of values to be cached. | 8 | a wide variety of values to be cached. |
9 | 9 | ||
10 | There are a number of caches that are similar in structure though | 10 | There are a number of caches that are similar in structure though |
11 | quite possibly very different in content and use. There is a corpus | 11 | quite possibly very different in content and use. There is a corpus |
12 | of common code for managing these caches. | 12 | of common code for managing these caches. |
13 | 13 | ||
14 | Examples of caches that are likely to be needed are: | 14 | Examples of caches that are likely to be needed are: |
15 | - mapping from IP address to client name | 15 | - mapping from IP address to client name |
16 | - mapping from client name and filesystem to export options | 16 | - mapping from client name and filesystem to export options |
17 | - mapping from UID to list of GIDs, to work around NFS's limitation | 17 | - mapping from UID to list of GIDs, to work around NFS's limitation |
18 | of 16 gids. | 18 | of 16 gids. |
19 | - mappings between local UID/GID and remote UID/GID for sites that | 19 | - mappings between local UID/GID and remote UID/GID for sites that |
20 | do not have uniform uid assignment | 20 | do not have uniform uid assignment |
21 | - mapping from network identity to public key for crypto authentication. | 21 | - mapping from network identity to public key for crypto authentication. |
22 | 22 | ||
23 | The common code handles such things as: | 23 | The common code handles such things as: |
24 | - general cache lookup with correct locking | 24 | - general cache lookup with correct locking |
25 | - supporting 'NEGATIVE' as well as positive entries | 25 | - supporting 'NEGATIVE' as well as positive entries |
26 | - allowing an EXPIRED time on cache items, and removing | 26 | - allowing an EXPIRED time on cache items, and removing |
27 | items after they expire, and are no longer in-use. | 27 | items after they expire, and are no longer in-use. |
28 | - making requests to user-space to fill in cache entries | 28 | - making requests to user-space to fill in cache entries |
29 | - allowing user-space to directly set entries in the cache | 29 | - allowing user-space to directly set entries in the cache |
30 | - delaying RPC requests that depend on as-yet incomplete | 30 | - delaying RPC requests that depend on as-yet incomplete |
31 | cache entries, and replaying those requests when the cache entry | 31 | cache entries, and replaying those requests when the cache entry |
32 | is complete. | 32 | is complete. |
33 | - clean out old entries as they expire. | 33 | - clean out old entries as they expire. |
34 | 34 | ||
35 | Creating a Cache | 35 | Creating a Cache |
36 | ---------------- | 36 | ---------------- |
37 | 37 | ||
38 | 1/ A cache needs a datum to store. This is in the form of a | 38 | 1/ A cache needs a datum to store. This is in the form of a |
39 | structure definition that must contain a | 39 | structure definition that must contain a |
40 | struct cache_head | 40 | struct cache_head |
41 | as an element, usually the first. | 41 | as an element, usually the first. |
42 | It will also contain a key and some content. | 42 | It will also contain a key and some content. |
43 | Each cache element is reference counted and contains | 43 | Each cache element is reference counted and contains |
44 | expiry and update times for use in cache management. | 44 | expiry and update times for use in cache management. |
45 | 2/ A cache needs a "cache_detail" structure that | 45 | 2/ A cache needs a "cache_detail" structure that |
46 | describes the cache. This stores the hash table, some | 46 | describes the cache. This stores the hash table, some |
47 | parameters for cache management, and some operations detailing how | 47 | parameters for cache management, and some operations detailing how |
48 | to work with particular cache items. | 48 | to work with particular cache items. |
49 | The operations required are: | 49 | The operations required are: |
50 | struct cache_head *alloc(void) | 50 | struct cache_head *alloc(void) |
51 | This simply allocates appropriate memory and returns | 51 | This simply allocates appropriate memory and returns |
52 | a pointer to the cache_head embedded within the | 52 | a pointer to the cache_head embedded within the |
53 | structure | 53 | structure |
54 | void cache_put(struct kref *) | 54 | void cache_put(struct kref *) |
55 | This is called when the last reference to an item is | 55 | This is called when the last reference to an item is |
56 | is dropped. The pointer passed is to the 'ref' field | 56 | dropped. The pointer passed is to the 'ref' field |
57 | in the cache_head. cache_put should release any | 57 | in the cache_head. cache_put should release any |
58 | references created by 'cache_init' and, if CACHE_VALID | 58 | references created by 'cache_init' and, if CACHE_VALID |
59 | is set, any references created by cache_update. | 59 | is set, any references created by cache_update. |
60 | It should then release the memory allocated by | 60 | It should then release the memory allocated by |
61 | 'alloc'. | 61 | 'alloc'. |
62 | int match(struct cache_head *orig, struct cache_head *new) | 62 | int match(struct cache_head *orig, struct cache_head *new) |
63 | test if the keys in the two structures match. Return | 63 | test if the keys in the two structures match. Return |
64 | 1 if they do, 0 if they don't. | 64 | 1 if they do, 0 if they don't. |
65 | void init(struct cache_head *orig, struct cache_head *new) | 65 | void init(struct cache_head *orig, struct cache_head *new) |
66 | Set the 'key' fields in 'new' from 'orig'. This may | 66 | Set the 'key' fields in 'new' from 'orig'. This may |
67 | include taking references to shared objects. | 67 | include taking references to shared objects. |
68 | void update(struct cache_head *orig, struct cache_head *new) | 68 | void update(struct cache_head *orig, struct cache_head *new) |
69 | Set the 'content' fields in 'new' from 'orig'. | 69 | Set the 'content' fields in 'new' from 'orig'. |
70 | int cache_show(struct seq_file *m, struct cache_detail *cd, | 70 | int cache_show(struct seq_file *m, struct cache_detail *cd, |
71 | struct cache_head *h) | 71 | struct cache_head *h) |
72 | Optional. Used to provide a /proc file that lists the | 72 | Optional. Used to provide a /proc file that lists the |
73 | contents of a cache. This should show one item, | 73 | contents of a cache. This should show one item, |
74 | usually on just one line. | 74 | usually on just one line. |
75 | int cache_request(struct cache_detail *cd, struct cache_head *h, | 75 | int cache_request(struct cache_detail *cd, struct cache_head *h, |
76 | char **bpp, int *blen) | 76 | char **bpp, int *blen) |
77 | Format a request to be sent to user-space for an item | 77 | Format a request to be sent to user-space for an item |
78 | to be instantiated. *bpp is a buffer of size *blen. | 78 | to be instantiated. *bpp is a buffer of size *blen. |
79 | bpp should be moved forward over the encoded message, | 79 | bpp should be moved forward over the encoded message, |
80 | and *blen should be reduced to show how much free | 80 | and *blen should be reduced to show how much free |
81 | space remains. Return 0 on success or <0 if not | 81 | space remains. Return 0 on success or <0 if not |
82 | enough room or other problem. | 82 | enough room or other problem. |
83 | int cache_parse(struct cache_detail *cd, char *buf, int len) | 83 | int cache_parse(struct cache_detail *cd, char *buf, int len) |
84 | A message from user space has arrived to fill out a | 84 | A message from user space has arrived to fill out a |
85 | cache entry. It is in 'buf' of length 'len'. | 85 | cache entry. It is in 'buf' of length 'len'. |
86 | cache_parse should parse this, find the item in the | 86 | cache_parse should parse this, find the item in the |
87 | cache with sunrpc_cache_lookup, and update the item | 87 | cache with sunrpc_cache_lookup, and update the item |
88 | with sunrpc_cache_update. | 88 | with sunrpc_cache_update. |
89 | 89 | ||
90 | 90 | ||
91 | 3/ A cache needs to be registered using cache_register(). This | 91 | 3/ A cache needs to be registered using cache_register(). This |
92 | includes it on a list of caches that will be regularly | 92 | includes it on a list of caches that will be regularly |
93 | cleaned to discard old data. | 93 | cleaned to discard old data. |
94 | 94 | ||
95 | Using a cache | 95 | Using a cache |
96 | ------------- | 96 | ------------- |
97 | 97 | ||
98 | To find a value in a cache, call sunrpc_cache_lookup passing a pointer | 98 | To find a value in a cache, call sunrpc_cache_lookup passing a pointer |
99 | to the cache_head in a sample item with the 'key' fields filled in. | 99 | to the cache_head in a sample item with the 'key' fields filled in. |
100 | This will be passed to ->match to identify the target entry. If no | 100 | This will be passed to ->match to identify the target entry. If no |
101 | entry is found, a new entry will be created, added to the cache, and | 101 | entry is found, a new entry will be created, added to the cache, and |
102 | marked as not containing valid data. | 102 | marked as not containing valid data. |
103 | 103 | ||
104 | The item returned is typically passed to cache_check which will check | 104 | The item returned is typically passed to cache_check which will check |
105 | if the data is valid, and may initiate an up-call to get fresh data. | 105 | if the data is valid, and may initiate an up-call to get fresh data. |
106 | cache_check will return -ENOENT if the entry is negative or if an up | 106 | cache_check will return -ENOENT if the entry is negative or if an up |
107 | call is needed but not possible, -EAGAIN if an upcall is pending, | 107 | call is needed but not possible, -EAGAIN if an upcall is pending, |
108 | or 0 if the data is valid. | 108 | or 0 if the data is valid. |
109 | 109 | ||
110 | cache_check can be passed a "struct cache_req *". This structure is | 110 | cache_check can be passed a "struct cache_req *". This structure is |
111 | typically embedded in the actual request and can be used to create a | 111 | typically embedded in the actual request and can be used to create a |
112 | deferred copy of the request (struct cache_deferred_req). This is | 112 | deferred copy of the request (struct cache_deferred_req). This is |
113 | done when the found cache item is not uptodate, but there is reason to | 113 | done when the found cache item is not uptodate, but there is reason to |
114 | believe that userspace might provide information soon. When the cache | 114 | believe that userspace might provide information soon. When the cache |
115 | item does become valid, the deferred copy of the request will be | 115 | item does become valid, the deferred copy of the request will be |
116 | revisited (->revisit). It is expected that this method will | 116 | revisited (->revisit). It is expected that this method will |
117 | reschedule the request for processing. | 117 | reschedule the request for processing. |
118 | 118 | ||
119 | The value returned by sunrpc_cache_lookup can also be passed to | 119 | The value returned by sunrpc_cache_lookup can also be passed to |
120 | sunrpc_cache_update to set the content for the item. A second item is | 120 | sunrpc_cache_update to set the content for the item. A second item is |
121 | passed which should hold the content. If the item found by _lookup | 121 | passed which should hold the content. If the item found by _lookup |
122 | has valid data, then it is discarded and a new item is created. This | 122 | has valid data, then it is discarded and a new item is created. This |
123 | saves any user of an item from worrying about content changing while | 123 | saves any user of an item from worrying about content changing while |
124 | it is being inspected. If the item found by _lookup does not contain | 124 | it is being inspected. If the item found by _lookup does not contain |
125 | valid data, then the content is copied across and CACHE_VALID is set. | 125 | valid data, then the content is copied across and CACHE_VALID is set. |
126 | 126 | ||
127 | Populating a cache | 127 | Populating a cache |
128 | ------------------ | 128 | ------------------ |
129 | 129 | ||
130 | Each cache has a name, and when the cache is registered, a directory | 130 | Each cache has a name, and when the cache is registered, a directory |
131 | with that name is created in /proc/net/rpc | 131 | with that name is created in /proc/net/rpc |
132 | 132 | ||
133 | This directory contains a file called 'channel' which is a channel | 133 | This directory contains a file called 'channel' which is a channel |
134 | for communicating between kernel and user for populating the cache. | 134 | for communicating between kernel and user for populating the cache. |
135 | This directory may later contain other files for interacting | 135 | This directory may later contain other files for interacting |
136 | with the cache. | 136 | with the cache. |
137 | 137 | ||
138 | The 'channel' works a bit like a datagram socket. Each 'write' is | 138 | The 'channel' works a bit like a datagram socket. Each 'write' is |
139 | passed as a whole to the cache for parsing and interpretation. | 139 | passed as a whole to the cache for parsing and interpretation. |
140 | Each cache can treat the write requests differently, but it is | 140 | Each cache can treat the write requests differently, but it is |
141 | expected that a message written will contain: | 141 | expected that a message written will contain: |
142 | - a key | 142 | - a key |
143 | - an expiry time | 143 | - an expiry time |
144 | - a content. | 144 | - a content. |
145 | with the intention that an item in the cache with the given key | 145 | with the intention that an item in the cache with the given key |
146 | should be created or updated to have the given content, and the | 146 | should be created or updated to have the given content, and the |
147 | expiry time should be set on that item. | 147 | expiry time should be set on that item. |
148 | 148 | ||
149 | Reading from a channel is a bit more interesting. When a cache | 149 | Reading from a channel is a bit more interesting. When a cache |
150 | lookup fails, or when it succeeds but finds an entry that may soon | 150 | lookup fails, or when it succeeds but finds an entry that may soon |
151 | expire, a request is lodged for that cache item to be updated by | 151 | expire, a request is lodged for that cache item to be updated by |
152 | user-space. These requests appear in the channel file. | 152 | user-space. These requests appear in the channel file. |
153 | 153 | ||
154 | Successive reads will return successive requests. | 154 | Successive reads will return successive requests. |
155 | If there are no more requests to return, read will return EOF, but a | 155 | If there are no more requests to return, read will return EOF, but a |
156 | select or poll for read will block waiting for another request to be | 156 | select or poll for read will block waiting for another request to be |
157 | added. | 157 | added. |
158 | 158 | ||
159 | Thus a user-space helper is likely to: | 159 | Thus a user-space helper is likely to: |
160 | open the channel. | 160 | open the channel. |
161 | select for readable | 161 | select for readable |
162 | read a request | 162 | read a request |
163 | write a response | 163 | write a response |
164 | loop. | 164 | loop. |
165 | 165 | ||
166 | If it dies and needs to be restarted, any requests that have not been | 166 | If it dies and needs to be restarted, any requests that have not been |
167 | answered will still appear in the file and will be read by the new | 167 | answered will still appear in the file and will be read by the new |
168 | instance of the helper. | 168 | instance of the helper. |
169 | 169 | ||
170 | Each cache should define a "cache_parse" method which takes a message | 170 | Each cache should define a "cache_parse" method which takes a message |
171 | written from user-space and processes it. It should return an error | 171 | written from user-space and processes it. It should return an error |
172 | (which propagates back to the write syscall) or 0. | 172 | (which propagates back to the write syscall) or 0. |
173 | 173 | ||
174 | Each cache should also define a "cache_request" method which | 174 | Each cache should also define a "cache_request" method which |
175 | takes a cache item and encodes a request into the buffer | 175 | takes a cache item and encodes a request into the buffer |
176 | provided. | 176 | provided. |
177 | 177 | ||
178 | Note: If a cache has no active readers on the channel, and has had no | 178 | Note: If a cache has no active readers on the channel, and has had no |
179 | active readers for more than 60 seconds, further requests will not be | 179 | active readers for more than 60 seconds, further requests will not be |
180 | added to the channel but instead all lookups that do not find a valid | 180 | added to the channel but instead all lookups that do not find a valid |
181 | entry will fail. This is partly for backward compatibility: The | 181 | entry will fail. This is partly for backward compatibility: The |
182 | previous nfs exports table was deemed to be authoritative and a | 182 | previous nfs exports table was deemed to be authoritative and a |
183 | failed lookup meant a definite 'no'. | 183 | failed lookup meant a definite 'no'. |
184 | 184 | ||
185 | request/response format | 185 | request/response format |
186 | ----------------------- | 186 | ----------------------- |
187 | 187 | ||
188 | While each cache is free to use its own format for requests | 188 | While each cache is free to use its own format for requests |
189 | and responses over channel, the following is recommended as | 189 | and responses over channel, the following is recommended as |
190 | appropriate and support routines are available to help: | 190 | appropriate and support routines are available to help: |
191 | Each request or response record should be printable ASCII | 191 | Each request or response record should be printable ASCII |
192 | with precisely one newline character which should be at the end. | 192 | with precisely one newline character which should be at the end. |
193 | Fields within the record should be separated by spaces, normally one. | 193 | Fields within the record should be separated by spaces, normally one. |
194 | If spaces, newlines, or nul characters are needed in a field they | 194 | If spaces, newlines, or nul characters are needed in a field they |
195 | must be quoted. Two mechanisms are available: | 195 | must be quoted. Two mechanisms are available: |
196 | 1/ If a field begins '\x' then it must contain an even number of | 196 | 1/ If a field begins '\x' then it must contain an even number of |
197 | hex digits, and pairs of these digits provide the bytes in the | 197 | hex digits, and pairs of these digits provide the bytes in the |
198 | field. | 198 | field. |
199 | 2/ otherwise a \ in the field must be followed by 3 octal digits | 199 | 2/ otherwise a \ in the field must be followed by 3 octal digits |
200 | which give the code for a byte. Other characters are treated | 200 | which give the code for a byte. Other characters are treated |
201 | as themselves. At the very least, space, newline, nul, and | 201 | as themselves. At the very least, space, newline, nul, and |
202 | '\' must be quoted in this way. | 202 | '\' must be quoted in this way. |
203 | 203 |
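The two quoting mechanisms described above for rpc-cache.txt can be sketched
in user-space C as follows. This is only an illustration under stated
assumptions: the function names quote_octal and quote_hex are hypothetical
and are not the kernel's own support routines, and the error convention
(returning -1 when the buffer is too small) is chosen here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel routine): quote 'len' bytes of 'src'
 * using mechanism 2/, i.e. a backslash followed by 3 octal digits for
 * space, newline, nul and backslash; all other bytes are copied as-is.
 * Returns the number of characters written (excluding the terminating
 * nul), or -1 if 'dst' is too small.
 */
static int quote_octal(const unsigned char *src, size_t len,
                       char *dst, size_t dstlen)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c == ' ' || c == '\n' || c == '\0' || c == '\\') {
            /* need 4 characters plus room for the final nul */
            if (out + 4 >= dstlen)
                return -1;
            snprintf(dst + out, 5, "\\%03o", (unsigned int)c);
            out += 4;
        } else {
            if (out + 1 >= dstlen)
                return -1;
            dst[out++] = (char)c;
        }
    }
    if (out >= dstlen)
        return -1;
    dst[out] = '\0';
    return (int)out;
}

/*
 * Hypothetical helper (not a kernel routine): quote 'len' bytes of 'src'
 * using mechanism 1/, i.e. the field starts with "\x" and every byte is
 * written as a pair of hex digits, so the digit count is always even.
 */
static int quote_hex(const unsigned char *src, size_t len,
                     char *dst, size_t dstlen)
{
    static const char digits[] = "0123456789abcdef";
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen < 2 + 2 * len + 1)
        return -1;
    dst[out++] = '\\';
    dst[out++] = 'x';
    for (size_t i = 0; i < len; i++) {
        dst[out++] = digits[src[i] >> 4];
        dst[out++] = digits[src[i] & 0x0f];
    }
    dst[out] = '\0';
    return (int)out;
}
```

A helper would quote each field this way, join the fields with single
spaces, and end the record with exactly one newline before writing it to
the channel in a single write.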
Documentation/s390/Debugging390.txt
1 | 1 | ||
2 | Debugging on Linux for s/390 & z/Architecture | 2 | Debugging on Linux for s/390 & z/Architecture |
3 | by | 3 | by |
4 | Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) | 4 | Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) |
5 | Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation | 5 | Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation |
6 | Best viewed with fixed width fonts | 6 | Best viewed with fixed width fonts |
7 | 7 | ||
8 | Overview of Document: | 8 | Overview of Document: |
9 | ===================== | 9 | ===================== |
10 | This document is intended to give a good overview of how to debug | 10 | This document is intended to give a good overview of how to debug |
11 | Linux for s/390 & z/Architecture. It isn't intended as a complete reference & not a | 11 | Linux for s/390 & z/Architecture. It isn't intended as a complete reference & not a |
12 | tutorial on the fundamentals of C & assembly. It doesn't go into | 12 | tutorial on the fundamentals of C & assembly. It doesn't go into |
13 | 390 IO in any detail. It is intended to complement the documents in the | 13 | 390 IO in any detail. It is intended to complement the documents in the |
14 | reference section below & any other worthwhile references you get. | 14 | reference section below & any other worthwhile references you get. |
15 | 15 | ||
16 | It is intended like the Enterprise Systems Architecture/390 Reference Summary | 16 | It is intended like the Enterprise Systems Architecture/390 Reference Summary |
17 | to be printed out & used as a quick cheat sheet self help style reference when | 17 | to be printed out & used as a quick cheat sheet self help style reference when |
18 | problems occur. | 18 | problems occur. |
19 | 19 | ||
20 | Contents | 20 | Contents |
21 | ======== | 21 | ======== |
22 | Register Set | 22 | Register Set |
23 | Address Spaces on Intel Linux | 23 | Address Spaces on Intel Linux |
24 | Address Spaces on Linux for s/390 & z/Architecture | 24 | Address Spaces on Linux for s/390 & z/Architecture |
25 | The Linux for s/390 & z/Architecture Kernel Task Structure | 25 | The Linux for s/390 & z/Architecture Kernel Task Structure |
26 | Register Usage & Stackframes on Linux for s/390 & z/Architecture | 26 | Register Usage & Stackframes on Linux for s/390 & z/Architecture |
27 | A sample program with comments | 27 | A sample program with comments |
28 | Compiling programs for debugging on Linux for s/390 & z/Architecture | 28 | Compiling programs for debugging on Linux for s/390 & z/Architecture |
29 | Figuring out gcc compile errors | 29 | Figuring out gcc compile errors |
30 | Debugging Tools | 30 | Debugging Tools |
31 | objdump | 31 | objdump |
32 | strace | 32 | strace |
33 | Performance Debugging | 33 | Performance Debugging |
34 | Debugging under VM | 34 | Debugging under VM |
35 | s/390 & z/Architecture IO Overview | 35 | s/390 & z/Architecture IO Overview |
36 | Debugging IO on s/390 & z/Architecture under VM | 36 | Debugging IO on s/390 & z/Architecture under VM |
37 | GDB on s/390 & z/Architecture | 37 | GDB on s/390 & z/Architecture |
38 | Stack chaining in gdb by hand | 38 | Stack chaining in gdb by hand |
39 | Examining core dumps | 39 | Examining core dumps |
40 | ldd | 40 | ldd |
41 | Debugging modules | 41 | Debugging modules |
42 | The proc file system | 42 | The proc file system |
43 | Starting points for debugging scripting languages etc. | 43 | Starting points for debugging scripting languages etc. |
44 | Dumptool & Lcrash | 44 | Dumptool & Lcrash |
45 | SysRq | 45 | SysRq |
46 | References | 46 | References |
47 | Special Thanks | 47 | Special Thanks |
48 | 48 | ||
49 | Register Set | 49 | Register Set |
50 | ============ | 50 | ============ |
51 | The current architectures have the following registers. | 51 | The current architectures have the following registers. |
52 | 52 | ||
53 | 16 General purpose registers, 32 bit on s/390 64 bit on z/Architecture, r0-r15 or gpr0-gpr15 used for arithmetic & addressing. | 53 | 16 General purpose registers, 32 bit on s/390 64 bit on z/Architecture, r0-r15 or gpr0-gpr15 used for arithmetic & addressing. |
54 | 54 | ||
55 | 16 Control registers, 32 bit on s/390 64 bit on z/Architecture, ( cr0-cr15 kernel usage only ) used for memory management, | 55 | 16 Control registers, 32 bit on s/390 64 bit on z/Architecture, ( cr0-cr15 kernel usage only ) used for memory management, |
56 | interrupt control,debugging control etc. | 56 | interrupt control,debugging control etc. |
57 | 57 | ||
58 | 16 Access registers ( ar0-ar15 ) 32 bit on s/390 & z/Architecture | 58 | 16 Access registers ( ar0-ar15 ) 32 bit on s/390 & z/Architecture |
59 | not used by normal programs but potentially could | 59 | not used by normal programs but potentially could |
60 | be used as temporary storage. Their main purpose is their 1 to 1 | 60 | be used as temporary storage. Their main purpose is their 1 to 1 |
61 | association with general purpose registers and are used in | 61 | association with general purpose registers and are used in |
62 | the kernel for copying data between kernel & user address spaces. | 62 | the kernel for copying data between kernel & user address spaces. |
63 | Access register 0 ( & access register 1 on z/Architecture ( needs 64 bit | 63 | Access register 0 ( & access register 1 on z/Architecture ( needs 64 bit |
64 | pointer ) ) is currently used by the pthread library as a pointer to | 64 | pointer ) ) is currently used by the pthread library as a pointer to |
65 | the current running thread's private area. | 65 | the current running thread's private area. |
66 | 66 | ||
67 | 16 64 bit floating point registers (fp0-fp15 ) IEEE & HFP floating | 67 | 16 64 bit floating point registers (fp0-fp15 ) IEEE & HFP floating |
68 | point format compliant on G5 upwards & a Floating point control reg (FPC) | 68 | point format compliant on G5 upwards & a Floating point control reg (FPC) |
69 | 4 64 bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines. | 69 | 4 64 bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines. |
70 | Note: | 70 | Note: |
71 | Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines, | 71 | Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines, |
72 | ( provided the kernel is configured for this ). | 72 | ( provided the kernel is configured for this ). |
73 | 73 | ||
74 | 74 | ||
75 | The PSW is the most important register on the machine it | 75 | The PSW is the most important register on the machine it |
76 | is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of | 76 | is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of |
77 | a program counter (pc), condition code register,memory space designator. | 77 | a program counter (pc), condition code register,memory space designator. |
78 | In IBM standard notation I am counting bit 0 as the MSB. | 78 | In IBM standard notation I am counting bit 0 as the MSB. |
79 | It has several advantages over a normal program counter | 79 | It has several advantages over a normal program counter |
80 | in that you can change address translation & program counter | 80 | in that you can change address translation & program counter |
81 | in a single instruction. To change address translation, | 81 | in a single instruction. To change address translation, |
82 | e.g. switching address translation off requires that you | 82 | e.g. switching address translation off requires that you |
83 | have a logical=physical mapping for the address you are | 83 | have a logical=physical mapping for the address you are |
84 | currently running at. | 84 | currently running at. |
85 | 85 | ||
86 | Bit Value | 86 | Bit Value |
87 | s/390 z/Architecture | 87 | s/390 z/Architecture |
88 | 0 0 Reserved ( must be 0 ) otherwise specification exception occurs. | 88 | 0 0 Reserved ( must be 0 ) otherwise specification exception occurs. |
89 | 89 | ||
90 | 1 1 Program Event Recording 1 PER enabled, | 90 | 1 1 Program Event Recording 1 PER enabled, |
91 | PER is used to facilitate debugging e.g. single stepping. | 91 | PER is used to facilitate debugging e.g. single stepping. |
92 | 92 | ||
93 | 2-4 2-4 Reserved ( must be 0 ). | 93 | 2-4 2-4 Reserved ( must be 0 ). |
94 | 94 | ||
95 | 5 5 Dynamic address translation 1=DAT on. | 95 | 5 5 Dynamic address translation 1=DAT on. |
96 | 96 | ||
97 | 6 6 Input/Output interrupt Mask | 97 | 6 6 Input/Output interrupt Mask |
98 | 98 | ||
99 | 7 7 External interrupt Mask used primarily for interprocessor signalling & | 99 | 7 7 External interrupt Mask used primarily for interprocessor signalling & |
100 | clock interrupts. | 100 | clock interrupts. |
101 | 101 | ||
102 | 8-11 8-11 PSW Key used for complex memory protection mechanism not used under linux | 102 | 8-11 8-11 PSW Key used for complex memory protection mechanism not used under linux |
103 | 103 | ||
104 | 12 12 1 on s/390 0 on z/Architecture | 104 | 12 12 1 on s/390 0 on z/Architecture |
105 | 105 | ||
106 | 13 13 Machine Check Mask 1=enable machine check interrupts | 106 | 13 13 Machine Check Mask 1=enable machine check interrupts |
107 | 107 | ||
108 | 14 14 Wait State set this to 1 to stop the processor except for interrupts & give | 108 | 14 14 Wait State set this to 1 to stop the processor except for interrupts & give |
109 | time to other LPARS used in CPU idle in the kernel to increase overall | 109 | time to other LPARS used in CPU idle in the kernel to increase overall |
110 | usage of processor resources. | 110 | usage of processor resources. |
111 | 111 | ||
112 | 15 15 Problem state ( if set to 1 certain instructions are disabled ) | 112 | 15 15 Problem state ( if set to 1 certain instructions are disabled ) |
113 | all linux user programs run with this bit 1 | 113 | all linux user programs run with this bit 1 |
114 | ( useful info for debugging under VM ). | 114 | ( useful info for debugging under VM ). |
115 | 115 | ||
116 | 16-17 16-17 Address Space Control | 116 | 16-17 16-17 Address Space Control |
117 | 117 | ||
118 | 00 Primary Space Mode when DAT on | 118 | 00 Primary Space Mode when DAT on |
119 | The linux kernel currently runs in this mode, CR1 is affiliated with | 119 | The linux kernel currently runs in this mode, CR1 is affiliated with |
120 | this mode & points to the primary segment table origin etc. | 120 | this mode & points to the primary segment table origin etc. |
121 | 121 | ||
122 | 01 Access register mode this mode is used in functions to | 122 | 01 Access register mode this mode is used in functions to |
123 | copy data between kernel & user space. | 123 | copy data between kernel & user space. |
124 | 124 | ||
125 | 10 Secondary space mode not used in linux however CR7 the | 125 | 10 Secondary space mode not used in linux however CR7 the |
126 | register affiliated with this mode is & this & normally | 126 | register affiliated with this mode is & this & normally |
127 | CR13=CR7 to allow us to copy data between kernel & user space. | 127 | CR13=CR7 to allow us to copy data between kernel & user space. |
128 | We do this as follows: | 128 | We do this as follows: |
129 | We set ar2 to 0 to designate its | 129 | We set ar2 to 0 to designate its |
130 | affiliated gpr ( gpr2 )to point to primary=kernel space. | 130 | affiliated gpr ( gpr2 )to point to primary=kernel space. |
131 | We set ar4 to 1 to designate its | 131 | We set ar4 to 1 to designate its |
132 | affiliated gpr ( gpr4 ) to point to secondary=home=user space | 132 | affiliated gpr ( gpr4 ) to point to secondary=home=user space |
133 | & then essentially do a memcopy(gpr2,gpr4,size) to | 133 | & then essentially do a memcopy(gpr2,gpr4,size) to |
134 | copy data between the address spaces, the reason we use home space for the | 134 | copy data between the address spaces, the reason we use home space for the |
135 | kernel & don't keep secondary space free is that code will not run in | 135 | kernel & don't keep secondary space free is that code will not run in |
136 | secondary space. | 136 | secondary space. |
137 | 137 | ||
138 | 11 Home Space Mode all user programs run in this mode. | 138 | 11 Home Space Mode all user programs run in this mode. |
139 | it is affiliated with CR13. | 139 | it is affiliated with CR13. |
140 | 140 | ||
141 | 18-19 18-19 Condition codes (CC) | 141 | 18-19 18-19 Condition codes (CC) |
142 | 142 | ||
143 | 20 20 Fixed point overflow mask if 1=FPU exceptions for this event | 143 | 20 20 Fixed point overflow mask if 1=FPU exceptions for this event |
144 | occur ( normally 0 ) | 144 | occur ( normally 0 ) |
145 | 145 | ||
146 | 21 21 Decimal overflow mask if 1=FPU exceptions for this event occur | 146 | 21 21 Decimal overflow mask if 1=FPU exceptions for this event occur |
147 | ( normally 0 ) | 147 | ( normally 0 ) |
148 | 148 | ||
149 | 22 22 Exponent underflow mask if 1=FPU exceptions for this event occur | 149 | 22 22 Exponent underflow mask if 1=FPU exceptions for this event occur |
150 | ( normally 0 ) | 150 | ( normally 0 ) |
151 | 151 | ||
152 | 23 23 Significance Mask if 1=FPU exceptions for this event occur | 152 | 23 23 Significance Mask if 1=FPU exceptions for this event occur |
153 | ( normally 0 ) | 153 | ( normally 0 ) |
154 | 154 | ||
155 | 24-31 24-30 Reserved Must be 0. | 155 | 24-31 24-30 Reserved Must be 0. |
156 | 156 | ||
157 | 31 Extended Addressing Mode | 157 | 31 Extended Addressing Mode |
158 | 32 Basic Addressing Mode | 158 | 32 Basic Addressing Mode |
159 | Used to set addressing mode | 159 | Used to set addressing mode |
160 | PSW 31 PSW 32 | 160 | PSW 31 PSW 32 |
161 | 0 0 24 bit | 161 | 0 0 24 bit |
162 | 0 1 31 bit | 162 | 0 1 31 bit |
163 | 1 1 64 bit | 163 | 1 1 64 bit |
164 | 164 | ||
165 | 32 1=31 bit addressing mode 0=24 bit addressing mode (for backward | 165 | 32 1=31 bit addressing mode 0=24 bit addressing mode (for backward |
166 | compatibility), linux always runs with this bit set to 1 | 166 | compatibility), linux always runs with this bit set to 1 |
167 | 167 | ||
168 | 33-64 Instruction address. | 168 | 33-64 Instruction address. |
169 | 33-63 Reserved must be 0 | 169 | 33-63 Reserved must be 0 |
170 | 64-127 Address | 170 | 64-127 Address |
171 | In 24 bits mode bits 64-103=0 bits 104-127 Address | 171 | In 24 bits mode bits 64-103=0 bits 104-127 Address |
172 | In 31 bits mode bits 64-96=0 bits 97-127 Address | 172 | In 31 bits mode bits 64-96=0 bits 97-127 Address |
173 | Note: unlike 31 bit mode on s/390 bit 96 must be zero | 173 | Note: unlike 31 bit mode on s/390 bit 96 must be zero |
174 | when loading the address with LPSWE otherwise a | 174 | when loading the address with LPSWE otherwise a |
175 | specification exception occurs, LPSW is fully backward | 175 | specification exception occurs, LPSW is fully backward |
176 | compatible. | 176 | compatible. |
177 | 177 | ||
178 | 178 | ||
179 | Prefix Page(s) | 179 | Prefix Page(s) |
180 | -------------- | 180 | -------------- |
181 | This per cpu memory area is too intimately tied to the processor not to mention. | 181 | This per cpu memory area is too intimately tied to the processor not to mention. |
182 | It exists between the real addresses 0-4096 on s/390 & 0-8192 z/Architecture & is exchanged | 182 | It exists between the real addresses 0-4096 on s/390 & 0-8192 z/Architecture & is exchanged |
183 | with 1 page on s/390 or 2 pages on z/Architecture in absolute storage by the set | 183 | with 1 page on s/390 or 2 pages on z/Architecture in absolute storage by the set |
184 | prefix instruction in linux's startup. | 184 | prefix instruction in linux's startup. |
185 | This page is mapped to a different prefix for each processor in an SMP configuration | 185 | This page is mapped to a different prefix for each processor in an SMP configuration |
186 | ( assuming the os designer is sane of course :-) ). | 186 | ( assuming the os designer is sane of course :-) ). |
187 | Bytes 0-512 ( 200 hex ) on s/390 & 0-512,4096-4544,4604-5119 currently on z/Architecture | 187 | Bytes 0-512 ( 200 hex ) on s/390 & 0-512,4096-4544,4604-5119 currently on z/Architecture |
188 | are used by the processor itself for holding such information as exception indications & | 188 | are used by the processor itself for holding such information as exception indications & |
189 | entry points for exceptions. | 189 | entry points for exceptions. |
190 | Bytes after 0xc00 hex are used by linux for per processor globals on s/390 & z/Architecture | 190 | Bytes after 0xc00 hex are used by linux for per processor globals on s/390 & z/Architecture |
191 | ( there is a gap on z/Architecture too currently between 0xc00 & 1000 which linux uses ). | 191 | ( there is a gap on z/Architecture too currently between 0xc00 & 1000 which linux uses ). |
192 | The closest thing to this on traditional architectures is the interrupt | 192 | The closest thing to this on traditional architectures is the interrupt |
193 | vector table. This is a good thing & does simplify some of the kernel coding | 193 | vector table. This is a good thing & does simplify some of the kernel coding |
194 | however it means that we now cannot catch stray NULL pointers in the | 194 | however it means that we now cannot catch stray NULL pointers in the |
195 | kernel without hard coded checks. | 195 | kernel without hard coded checks. |
196 | 196 | ||
197 | 197 | ||
198 | 198 | ||
199 | Address Spaces on Intel Linux | 199 | Address Spaces on Intel Linux |
200 | ============================= | 200 | ============================= |
201 | 201 | ||
202 | The traditional Intel Linux is approximately mapped as follows forgive | 202 | The traditional Intel Linux is approximately mapped as follows forgive |
203 | the ascii art. | 203 | the ascii art. |
204 | 0xFFFFFFFF 4GB Himem ***************** | 204 | 0xFFFFFFFF 4GB Himem ***************** |
205 | * * | 205 | * * |
206 | * Kernel Space * | 206 | * Kernel Space * |
207 | * * | 207 | * * |
208 | ***************** **************** | 208 | ***************** **************** |
209 | User Space Himem (typically 0xC0000000 3GB )* User Stack * * * | 209 | User Space Himem (typically 0xC0000000 3GB )* User Stack * * * |
210 | ***************** * * | 210 | ***************** * * |
211 | * Shared Libs * * Next Process * | 211 | * Shared Libs * * Next Process * |
212 | ***************** * to * | 212 | ***************** * to * |
213 | * * <== * Run * <== | 213 | * * <== * Run * <== |
214 | * User Program * * * | 214 | * User Program * * * |
215 | * Data BSS * * * | 215 | * Data BSS * * * |
216 | * Text * * * | 216 | * Text * * * |
217 | * Sections * * * | 217 | * Sections * * * |
218 | 0x00000000 ***************** **************** | 218 | 0x00000000 ***************** **************** |
219 | 219 | ||
220 | Now it is easy to see that on Intel it is quite easy to recognise a kernel address | 220 | Now it is easy to see that on Intel it is quite easy to recognise a kernel address |
221 | as being one greater than user space himem ( in this case 0xC0000000). | 221 | as being one greater than user space himem ( in this case 0xC0000000). |
222 | & addresses of less than this are the ones in the current running program on this | 222 | & addresses of less than this are the ones in the current running program on this |
223 | processor ( if an smp box ). | 223 | processor ( if an smp box ). |
224 | If using the virtual machine ( VM ) as a debugger it is quite difficult to | 224 | If using the virtual machine ( VM ) as a debugger it is quite difficult to |
225 | know which user process is running as the address space you are looking at | 225 | know which user process is running as the address space you are looking at |
226 | could be from any process in the run queue. | 226 | could be from any process in the run queue. |
227 | 227 | ||
228 | The limitation of Intel's addressing technique is that the linux | 228 | The limitation of Intel's addressing technique is that the linux |
229 | kernel uses a very simple real address to virtual addressing technique | 229 | kernel uses a very simple real address to virtual addressing technique |
230 | of Real Address=Virtual Address-User Space Himem. | 230 | of Real Address=Virtual Address-User Space Himem. |
231 | This means that on Intel the linux kernel can typically only address | 231 | This means that on Intel the linux kernel can typically only address |
232 | Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines | 232 | Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines |
233 | can typically use. | 233 | can typically use. |
234 | They can lower User Himem to 2GB or lower & thus be | 234 | They can lower User Himem to 2GB or lower & thus be |
235 | able to use 2GB of RAM however this shrinks the maximum size | 235 | able to use 2GB of RAM however this shrinks the maximum size |
236 | of User Space from 3GB to 2GB they have a no win limit of 4GB unless | 236 | of User Space from 3GB to 2GB they have a no win limit of 4GB unless |
237 | they go to 64 Bit. | 237 | they go to 64 Bit. |
238 | 238 | ||
239 | 239 | ||
240 | On 390 our limitations & strengths make us slightly different. | 240 | On 390 our limitations & strengths make us slightly different. |
241 | For backward compatibility we are only allowed use 31 bits (2GB) | 241 | For backward compatibility we are only allowed use 31 bits (2GB) |
242 | of our 32 bit addresses, however, we use entirely separate address | 242 | of our 32 bit addresses, however, we use entirely separate address |
243 | spaces for the user & kernel. | 243 | spaces for the user & kernel. |
244 | 244 | ||
245 | This means we can support 2GB of non Extended RAM on s/390, & more | 245 | This means we can support 2GB of non Extended RAM on s/390, & more |
246 | with the Extended memory management swap device & | 246 | with the Extended memory management swap device & |
247 | currently 4TB of physical memory on z/Architecture. | 247 | currently 4TB of physical memory on z/Architecture. |
248 | 248 | ||
249 | 249 | ||
250 | Address Spaces on Linux for s/390 & z/Architecture | 250 | Address Spaces on Linux for s/390 & z/Architecture |
251 | ================================================== | 251 | ================================================== |
252 | 252 | ||
253 | Our addressing scheme is as follows | 253 | Our addressing scheme is as follows |
254 | 254 | ||
255 | 255 | ||
256 | Himem 0x7fffffff 2GB on s/390 ***************** **************** | 256 | Himem 0x7fffffff 2GB on s/390 ***************** **************** |
257 | currently 0x3ffffffffff (2^42)-1 * User Stack * * * | 257 | currently 0x3ffffffffff (2^42)-1 * User Stack * * * |
258 | on z/Architecture. ***************** * * | 258 | on z/Architecture. ***************** * * |
259 | * Shared Libs * * * | 259 | * Shared Libs * * * |
260 | ***************** * * | 260 | ***************** * * |
261 | * * * Kernel * | 261 | * * * Kernel * |
262 | * User Program * * * | 262 | * User Program * * * |
263 | * Data BSS * * * | 263 | * Data BSS * * * |
264 | * Text * * * | 264 | * Text * * * |
265 | * Sections * * * | 265 | * Sections * * * |
266 | 0x00000000 ***************** **************** | 266 | 0x00000000 ***************** **************** |
267 | 267 | ||
268 | This also means that we need to look at the PSW problem state bit | 268 | This also means that we need to look at the PSW problem state bit |
269 | or the addressing mode to decide whether we are looking at | 269 | or the addressing mode to decide whether we are looking at |
270 | user or kernel space. | 270 | user or kernel space. |
271 | 271 | ||
272 | Virtual Addresses on s/390 & z/Architecture | 272 | Virtual Addresses on s/390 & z/Architecture |
273 | =========================================== | 273 | =========================================== |
274 | 274 | ||
275 | A virtual address on s/390 is made up of 3 parts | 275 | A virtual address on s/390 is made up of 3 parts |
276 | The SX ( segment index, roughly corresponding to the PGD & PMD in linux terminology ) | 276 | The SX ( segment index, roughly corresponding to the PGD & PMD in linux terminology ) |
277 | being bits 1-11. | 277 | being bits 1-11. |
278 | The PX ( page index, corresponding to the page table entry (pte) in linux terminology ) | 278 | The PX ( page index, corresponding to the page table entry (pte) in linux terminology ) |
279 | being bits 12-19. | 279 | being bits 12-19. |
280 | The remaining bits BX ( the byte index ) are the offset in the page | 280 | The remaining bits BX ( the byte index ) are the offset in the page |
281 | i.e. bits 20 to 31. | 281 | i.e. bits 20 to 31. |
282 | 282 | ||
283 | On z/Architecture in linux we currently make up an address from 4 parts. | 283 | On z/Architecture in linux we currently make up an address from 4 parts. |
284 | The region index (RX) being bits 0-32, of which we currently use bits 22-32 | 284 | The region index (RX) being bits 0-32, of which we currently use bits 22-32 |
285 | The segment index (SX) being bits 33-43 | 285 | The segment index (SX) being bits 33-43 |
286 | The page index (PX) being bits 44-51 | 286 | The page index (PX) being bits 44-51 |
287 | The byte index (BX) being bits 52-63 | 287 | The byte index (BX) being bits 52-63 |
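
The bit ranges above use IBM bit numbering ( bit 0 is the most significant bit ), so turning them into shifts & masks takes a little care. The following is a minimal sketch of splitting an address into its parts; the struct & function names are invented for this example & are not taken from pgtable.h.

```c
#include <stdint.h>

/* Illustrative containers for the address parts described above. */
struct s390_vaddr  { uint32_t sx, px, bx; };
struct zarch_vaddr { uint32_t rx, sx, px, bx; };

/* s/390: SX = bits 1-11, PX = bits 12-19, BX = bits 20-31 of 32 */
static inline struct s390_vaddr split_s390(uint32_t va)
{
	struct s390_vaddr v;

	v.sx = (va >> 20) & 0x7ff;	/* 11 bits */
	v.px = (va >> 12) & 0xff;	/* 8 bits  */
	v.bx = va & 0xfff;		/* 12 bits */
	return v;
}

/* z/Architecture: RX ( the part in use ) = bits 22-32, SX = bits 33-43,
 * PX = bits 44-51, BX = bits 52-63 of 64 */
static inline struct zarch_vaddr split_zarch(uint64_t va)
{
	struct zarch_vaddr v;

	v.rx = (uint32_t)((va >> 31) & 0x7ff);	/* 11 bits */
	v.sx = (uint32_t)((va >> 20) & 0x7ff);	/* 11 bits */
	v.px = (uint32_t)((va >> 12) & 0xff);	/* 8 bits  */
	v.bx = (uint32_t)(va & 0xfff);		/* 12 bits */
	return v;
}
```

For instance split_s390(0x00123456) gives SX=0x001, PX=0x23 & BX=0x456.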
288 | 288 | ||
289 | Notes: | 289 | Notes: |
290 | 1) s/390 has no PMD so the PMD is really the PGD also. | 290 | 1) s/390 has no PMD so the PMD is really the PGD also. |
291 | A lot of this stuff is defined in pgtable.h. | 291 | A lot of this stuff is defined in pgtable.h. |
292 | 292 | ||
293 | 2) Also seeing as s/390's page indexes are only 1k in size | 293 | 2) Also seeing as s/390's page indexes are only 1k in size |
294 | (bits 12-19 x 4 bytes per pte ) we use 1 page ( 4k ) | 294 | (bits 12-19 x 4 bytes per pte ) we use 1 page ( 4k ) |
295 | to make the best use of memory by updating 4 segment index | 295 | to make the best use of memory by updating 4 segment index |
296 | entries each time we mess with a PMD & use offsets | 296 | entries each time we mess with a PMD & use offsets |
297 | 0,1024,2048 & 3072 in this page for our segment indexes. | 297 | 0,1024,2048 & 3072 in this page for our segment indexes. |
298 | On z/Architecture our page indexes are now 2k in size | 298 | On z/Architecture our page indexes are now 2k in size |
299 | ( bits 44-51 x 8 bytes per pte ) we do a similar trick | 299 | ( bits 44-51 x 8 bytes per pte ) we do a similar trick |
300 | but only mess with 2 segment indices each time we mess with | 300 | but only mess with 2 segment indices each time we mess with |
301 | a PMD. | 301 | a PMD. |
302 | 302 | ||
303 | 3) Although z/Architecture supports up to a massive 5-level page table lookup, we | 303 | 3) Although z/Architecture supports up to a massive 5-level page table lookup, we |
304 | can only use 3 levels currently on Linux ( as this is all the generic kernel | 304 | can only use 3 levels currently on Linux ( as this is all the generic kernel |
305 | currently supports ); however this may change in future. | 305 | currently supports ); however this may change in future. |
306 | This allows us to access ( according to my sums ) | 306 | This allows us to access ( according to my sums ) |
307 | 4TB of virtual storage per process i.e. | 307 | 4TB of virtual storage per process i.e. |
308 | 4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes, | 308 | 4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes, |
309 | enough for another 2 or 3 years I think :-). | 309 | enough for another 2 or 3 years I think :-). |
310 | To do this we use a region-third-table designation type in | 310 | To do this we use a region-third-table designation type in |
311 | our address space control registers. | 311 | our address space control registers. |
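
The sum in note 3 can be checked with a few lines of C. This is only a sketch of the arithmetic; the function name is invented for this example & the four factors are taken straight from the note.

```c
#include <stdint.h>

/* Bytes of virtual storage reachable per process with the 3 levels
 * described in note 3: page size * PTEs * PMDs * PGD entries. */
static inline uint64_t zarch_virtual_bytes(void)
{
	const uint64_t page_size = 4096;	/* 2^12 */
	const uint64_t ptes = 512;		/* 2^9  */
	const uint64_t pmds = 1024;		/* 2^10 */
	const uint64_t pgds = 2048;		/* 2^11 */

	return page_size * ptes * pmds * pgds;	/* 2^42 */
}
```

The exponents add up to 42, which matches the (2^42)-1 Himem quoted for z/Architecture in the address space picture.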
312 | 312 | ||
313 | 313 | ||
314 | The Linux for s/390 & z/Architecture Kernel Task Structure | 314 | The Linux for s/390 & z/Architecture Kernel Task Structure |
315 | ========================================================== | 315 | ========================================================== |
316 | Each process/thread under Linux for S390 has its own kernel task_struct | 316 | Each process/thread under Linux for S390 has its own kernel task_struct |
317 | defined in linux/include/linux/sched.h | 317 | defined in linux/include/linux/sched.h |
318 | The S390 on initialisation & resuming of a process on a cpu sets | 318 | The S390 on initialisation & resuming of a process on a cpu sets |
319 | the __LC_KERNEL_STACK variable in the spare prefix area for this cpu | 319 | the __LC_KERNEL_STACK variable in the spare prefix area for this cpu |
320 | (which we use for per-processor globals). | 320 | (which we use for per-processor globals). |
321 | 321 | ||
322 | The kernel stack pointer is intimately tied with the task structure for | 322 | The kernel stack pointer is intimately tied with the task structure for |
323 | each processor as follows. | 323 | each processor as follows. |
324 | 324 | ||
325 | s/390 | 325 | s/390 |
326 | ************************ | 326 | ************************ |
327 | * 1 page kernel stack * | 327 | * 1 page kernel stack * |
328 | * ( 4K ) * | 328 | * ( 4K ) * |
329 | ************************ | 329 | ************************ |
330 | * 1 page task_struct * | 330 | * 1 page task_struct * |
331 | * ( 4K ) * | 331 | * ( 4K ) * |
332 | 8K aligned ************************ | 332 | 8K aligned ************************ |
333 | 333 | ||
334 | z/Architecture | 334 | z/Architecture |
335 | ************************ | 335 | ************************ |
336 | * 2 page kernel stack * | 336 | * 2 page kernel stack * |
337 | * ( 8K ) * | 337 | * ( 8K ) * |
338 | ************************ | 338 | ************************ |
339 | * 2 page task_struct * | 339 | * 2 page task_struct * |
340 | * ( 8K ) * | 340 | * ( 8K ) * |
341 | 16K aligned ************************ | 341 | 16K aligned ************************ |
342 | 342 | ||
343 | What this means is that we don't need to dedicate any register or global variable | 343 | What this means is that we don't need to dedicate any register or global variable |
344 | to point to the current running process & can retrieve it with the following | 344 | to point to the current running process & can retrieve it with the following |
345 | very simple construct for s/390 & one very similar for z/Architecture. | 345 | very simple construct for s/390 & one very similar for z/Architecture. |
346 | 346 | ||
347 | static inline struct task_struct * get_current(void) | 347 | static inline struct task_struct * get_current(void) |
348 | { | 348 | { |
349 | struct task_struct *current; | 349 | struct task_struct *current; |
350 | __asm__("lhi %0,-8192\n\t" | 350 | __asm__("lhi %0,-8192\n\t" |
351 | "nr %0,15" | 351 | "nr %0,15" |
352 | : "=r" (current) ); | 352 | : "=r" (current) ); |
353 | return current; | 353 | return current; |
354 | } | 354 | } |
355 | 355 | ||
356 | i.e. just anding the current kernel stack pointer with the mask -8192. | 356 | i.e. just anding the current kernel stack pointer with the mask -8192. |
357 | Thankfully because Linux doesn't have support for nested IO interrupts | 357 | Thankfully because Linux doesn't have support for nested IO interrupts |
358 | & our devices have large buffers & can survive interrupts being shut off for | 358 | & our devices have large buffers & can survive interrupts being shut off for |
359 | short amounts of time, we don't need a separate stack for interrupts. | 359 | short amounts of time, we don't need a separate stack for interrupts. |
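
For readers who don't speak s/390 assembler, the masking trick can be sketched in portable C. The macro & function names here are made up for illustration; the only thing taken from the text is that the task_struct & kernel stack together form an 8K aligned block on s/390.

```c
#include <stdint.h>

/* Assumption from the diagram above: 4K task_struct + 4K kernel stack,
 * 8K aligned ( this would be 16K on z/Architecture ). */
#define S390_TASK_ALIGN 8192u

/* Clear the low 13 bits of the stack pointer, i.e. sp & -8192,
 * to land on the start of the task_struct. */
static inline uintptr_t task_base_from_sp(uintptr_t sp)
{
	return sp & ~(uintptr_t)(S390_TASK_ALIGN - 1);
}
```

Any stack pointer value inside the same 8K block gives the same result, which is why no dedicated register or global is needed.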
360 | 360 | ||
361 | 361 | ||
362 | 362 | ||
363 | 363 | ||
364 | Register Usage & Stackframes on Linux for s/390 & z/Architecture | 364 | Register Usage & Stackframes on Linux for s/390 & z/Architecture |
365 | ================================================================= | 365 | ================================================================= |
366 | Overview: | 366 | Overview: |
367 | --------- | 367 | --------- |
368 | This is the code that gcc produces at the top & the bottom of | 368 | This is the code that gcc produces at the top & the bottom of |
369 | each function. It usually is fairly consistent & similar from | 369 | each function. It usually is fairly consistent & similar from |
370 | function to function & if you know its layout you can probably | 370 | function to function & if you know its layout you can probably |
371 | make some headway in finding the ultimate cause of a problem | 371 | make some headway in finding the ultimate cause of a problem |
372 | after a crash without a source level debugger. | 372 | after a crash without a source level debugger. |
373 | 373 | ||
374 | Note: To follow stackframes requires a knowledge of C or Pascal & | 374 | Note: To follow stackframes requires a knowledge of C or Pascal & |
375 | limited knowledge of one assembly language. | 375 | limited knowledge of one assembly language. |
376 | 376 | ||
377 | It should be noted that there are some differences between the | 377 | It should be noted that there are some differences between the |
378 | s/390 & z/Architecture stack layouts as the z/Architecture stack layout didn't have | 378 | s/390 & z/Architecture stack layouts as the z/Architecture stack layout didn't have |
379 | to maintain compatibility with older linkage formats. | 379 | to maintain compatibility with older linkage formats. |
380 | 380 | ||
381 | Glossary: | 381 | Glossary: |
382 | --------- | 382 | --------- |
383 | alloca: | 383 | alloca: |
384 | This is a built in compiler function for runtime allocation | 384 | This is a built in compiler function for runtime allocation |
385 | of extra space on the caller's stack which is obviously freed | 385 | of extra space on the caller's stack which is obviously freed |
386 | up on function exit ( e.g. the caller may choose to allocate nothing | 386 | up on function exit ( e.g. the caller may choose to allocate nothing |
387 | or a buffer of 4k if required for temporary purposes ). It generates | 387 | or a buffer of 4k if required for temporary purposes ). It generates |
388 | very efficient code ( a few cycles ) when compared to alternatives | 388 | very efficient code ( a few cycles ) when compared to alternatives |
389 | like malloc. | 389 | like malloc. |
390 | 390 | ||
391 | automatics: These are local variables on the stack, | 391 | automatics: These are local variables on the stack, |
392 | i.e. they aren't in registers & they aren't static. | 392 | i.e. they aren't in registers & they aren't static. |
393 | 393 | ||
394 | back-chain: | 394 | back-chain: |
395 | This is a pointer to the stack pointer before entering a | 395 | This is a pointer to the stack pointer before entering a |
396 | framed function's ( see frameless function ) prologue, got by | 396 | framed function's ( see frameless function ) prologue, got by |
397 | dereferencing the address of the current stack pointer, | 397 | dereferencing the address of the current stack pointer, |
398 | i.e. got by accessing the 32 bit value at the stack pointer's | 398 | i.e. got by accessing the 32 bit value at the stack pointer's |
399 | current location. | 399 | current location. |
400 | 400 | ||
401 | base-pointer: | 401 | base-pointer: |
402 | This is a pointer to the back of the literal pool which | 402 | This is a pointer to the back of the literal pool which |
403 | is an area just behind each procedure used to store constants | 403 | is an area just behind each procedure used to store constants |
404 | in each function. | 404 | in each function. |
405 | 405 | ||
406 | call-clobbered: The caller probably needs to save these registers if there | 406 | call-clobbered: The caller probably needs to save these registers if there |
407 | is something of value in them, on the stack or elsewhere before making a | 407 | is something of value in them, on the stack or elsewhere before making a |
408 | call to another procedure so that it can restore it later. | 408 | call to another procedure so that it can restore it later. |
409 | 409 | ||
410 | epilogue: | 410 | epilogue: |
411 | The code generated by the compiler to return to the caller. | 411 | The code generated by the compiler to return to the caller. |
412 | 412 | ||
413 | frameless-function | 413 | frameless-function |
414 | A frameless function in Linux for s390 & z/Architecture is one which doesn't | 414 | A frameless function in Linux for s390 & z/Architecture is one which doesn't |
415 | need more than the register save area ( 96 bytes on s/390, 160 on z/Architecture ) | 415 | need more than the register save area ( 96 bytes on s/390, 160 on z/Architecture ) |
416 | given to it by the caller. | 416 | given to it by the caller. |
417 | A frameless function never: | 417 | A frameless function never: |
418 | 1) Sets up a back chain. | 418 | 1) Sets up a back chain. |
419 | 2) Calls alloca. | 419 | 2) Calls alloca. |
420 | 3) Calls other normal functions | 420 | 3) Calls other normal functions |
421 | 4) Has automatics. | 421 | 4) Has automatics. |
422 | 422 | ||
423 | GOT-pointer: | 423 | GOT-pointer: |
424 | This is a pointer to the global-offset-table in ELF | 424 | This is a pointer to the global-offset-table in ELF |
425 | ( Executable and Linkable Format, Linux's most common executable format ), | 425 | ( Executable and Linkable Format, Linux's most common executable format ), |
426 | all globals & shared library objects are found using this pointer. | 426 | all globals & shared library objects are found using this pointer. |
427 | 427 | ||
428 | lazy-binding | 428 | lazy-binding |
429 | Routines in ELF shared libraries are typically only bound ( resolved ) when | 429 | Routines in ELF shared libraries are typically only bound ( resolved ) when |
430 | they are actually first called at runtime. This is lazy binding. | 430 | they are actually first called at runtime. This is lazy binding. |
431 | 431 | ||
432 | procedure-linkage-table | 432 | procedure-linkage-table |
433 | This is a table found from the GOT which contains pointers to routines | 433 | This is a table found from the GOT which contains pointers to routines |
434 | in other shared libraries which can't be called by easier means. | 434 | in other shared libraries which can't be called by easier means. |
435 | 435 | ||
436 | prologue: | 436 | prologue: |
437 | The code generated by the compiler to set up the stack frame. | 437 | The code generated by the compiler to set up the stack frame. |
438 | 438 | ||
439 | outgoing-args: | 439 | outgoing-args: |
440 | This is an extra area allocated on the stack of the calling function if the | 440 | This is an extra area allocated on the stack of the calling function if the |
441 | parameters for the callee cannot all be put in registers, the same | 441 | parameters for the callee cannot all be put in registers, the same |
442 | area can be reused by each function the caller calls. | 442 | area can be reused by each function the caller calls. |
443 | 443 | ||
444 | routine-descriptor: | 444 | routine-descriptor: |
445 | A COFF executable format based concept of a procedure reference | 445 | A COFF executable format based concept of a procedure reference |
446 | actually being 8 bytes or more as opposed to a simple pointer to the routine. | 446 | actually being 8 bytes or more as opposed to a simple pointer to the routine. |
447 | This is typically defined as follows | 447 | This is typically defined as follows |
448 | Routine Descriptor offset 0=Pointer to Function | 448 | Routine Descriptor offset 0=Pointer to Function |
449 | Routine Descriptor offset 4=Pointer to Table of Contents | 449 | Routine Descriptor offset 4=Pointer to Table of Contents |
450 | The table of contents/TOC is roughly equivalent to a GOT pointer. | 450 | The table of contents/TOC is roughly equivalent to a GOT pointer. |
451 | & it means that shared libraries etc. can be shared between several | 451 | & it means that shared libraries etc. can be shared between several |
452 | environments each with their own TOC. | 452 | environments each with their own TOC. |
453 | 453 | ||
454 | 454 | ||
455 | static-chain: This is used in nested functions, a concept adopted from Pascal | 455 | static-chain: This is used in nested functions, a concept adopted from Pascal |
456 | by gcc & not used in ANSI C or C++ ( although quite useful ). Basically it | 456 | by gcc & not used in ANSI C or C++ ( although quite useful ). Basically it |
457 | is a pointer used to reference local variables of enclosing functions. | 457 | is a pointer used to reference local variables of enclosing functions. |
458 | You might come across this stuff once or twice in your lifetime. | 458 | You might come across this stuff once or twice in your lifetime. |
459 | 459 | ||
460 | e.g. | 460 | e.g. |
461 | The function below should return 11 though gcc may get upset & toss warnings | 461 | The function below should return 11 though gcc may get upset & toss warnings |
462 | about unused variables. | 462 | about unused variables. |
463 | int FunctionA(int a) | 463 | int FunctionA(int a) |
464 | { | 464 | { |
465 | int b; | 465 | int b; |
466 | void FunctionC(int c) | 466 | void FunctionC(int c) |
467 | { | 467 | { |
468 | b=c+1; | 468 | b=c+1; |
469 | } | 469 | } |
470 | FunctionC(10); | 470 | FunctionC(10); |
471 | return(b); | 471 | return(b); |
472 | } | 472 | } |
473 | 473 | ||
474 | 474 | ||
475 | s/390 & z/Architecture Register usage | 475 | s/390 & z/Architecture Register usage |
476 | ===================================== | 476 | ===================================== |
477 | r0 used by syscalls/assembly call-clobbered | 477 | r0 used by syscalls/assembly call-clobbered |
478 | r1 used by syscalls/assembly call-clobbered | 478 | r1 used by syscalls/assembly call-clobbered |
479 | r2 argument 0 / return value 0 call-clobbered | 479 | r2 argument 0 / return value 0 call-clobbered |
480 | r3 argument 1 / return value 1 (if long long) call-clobbered | 480 | r3 argument 1 / return value 1 (if long long) call-clobbered |
481 | r4 argument 2 call-clobbered | 481 | r4 argument 2 call-clobbered |
482 | r5 argument 3 call-clobbered | 482 | r5 argument 3 call-clobbered |
483 | r6 argument 4 saved | 483 | r6 argument 4 saved |
484 | r7 pointer-to arguments 5 to ... saved | 484 | r7 pointer-to arguments 5 to ... saved |
485 | r8 this & that saved | 485 | r8 this & that saved |
486 | r9 this & that saved | 486 | r9 this & that saved |
487 | r10 static-chain ( if nested function ) saved | 487 | r10 static-chain ( if nested function ) saved |
488 | r11 frame-pointer ( if function used alloca ) saved | 488 | r11 frame-pointer ( if function used alloca ) saved |
489 | r12 got-pointer saved | 489 | r12 got-pointer saved |
490 | r13 base-pointer saved | 490 | r13 base-pointer saved |
491 | r14 return-address saved | 491 | r14 return-address saved |
492 | r15 stack-pointer saved | 492 | r15 stack-pointer saved |
493 | 493 | ||
494 | f0 argument 0 / return value ( float/double ) call-clobbered | 494 | f0 argument 0 / return value ( float/double ) call-clobbered |
495 | f2 argument 1 call-clobbered | 495 | f2 argument 1 call-clobbered |
496 | f4 z/Architecture argument 2 saved | 496 | f4 z/Architecture argument 2 saved |
497 | f6 z/Architecture argument 3 saved | 497 | f6 z/Architecture argument 3 saved |
498 | The remaining floating point registers | 498 | The remaining floating point registers |
499 | f1,f3,f5,f7-f15 are call-clobbered. | 499 | f1,f3,f5,f7-f15 are call-clobbered. |
500 | 500 | ||
501 | Notes: | 501 | Notes: |
502 | ------ | 502 | ------ |
503 | 1) The only requirement is that registers which are used | 503 | 1) The only requirement is that registers which are used |
504 | by the callee are saved, e.g. the compiler is perfectly | 504 | by the callee are saved, e.g. the compiler is perfectly |
505 | capable of using r11 for purposes other than a | 505 | capable of using r11 for purposes other than a |
506 | frame pointer if a frame pointer is not needed. | 506 | frame pointer if a frame pointer is not needed. |
507 | 2) In functions with variable arguments e.g. printf the calling procedure | 507 | 2) In functions with variable arguments e.g. printf the calling procedure |
508 | is identical to one without variable arguments & the same number of | 508 | is identical to one without variable arguments & the same number of |
509 | parameters. However, the prologue of this function is somewhat more | 509 | parameters. However, the prologue of this function is somewhat more |
510 | hairy owing to it having to move these parameters to the stack to | 510 | hairy owing to it having to move these parameters to the stack to |
511 | get va_start, va_arg & va_end to work. | 511 | get va_start, va_arg & va_end to work. |
512 | 3) Access registers are currently unused by gcc but are used in | 512 | 3) Access registers are currently unused by gcc but are used in |
513 | the kernel. Possibilities exist to use them at the moment for | 513 | the kernel. Possibilities exist to use them at the moment for |
514 | temporary storage but it isn't recommended. | 514 | temporary storage but it isn't recommended. |
515 | 4) Only 4 of the floating point registers are used for | 515 | 4) Only 4 of the floating point registers are used for |
516 | parameter passing as older machines such as G3 only have 4 | 516 | parameter passing as older machines such as G3 only have 4 |
517 | & it keeps the stack frame compatible with other compilers. | 517 | & it keeps the stack frame compatible with other compilers. |
518 | However with IEEE floating point emulation under linux on the | 518 | However with IEEE floating point emulation under linux on the |
519 | older machines you are free to use the other 12. | 519 | older machines you are free to use the other 12. |
520 | 5) A long long or double parameter cannot have the | 520 | 5) A long long or double parameter cannot have the |
521 | first 4 bytes in a register & the second four bytes in the | 521 | first 4 bytes in a register & the second four bytes in the |
522 | outgoing args area. It must be purely in the outgoing args | 522 | outgoing args area. It must be purely in the outgoing args |
523 | area if crossing this boundary. | 523 | area if crossing this boundary. |
524 | 6) Floating point parameters are mixed with outgoing args | 524 | 6) Floating point parameters are mixed with outgoing args |
525 | on the outgoing args area in the order they are passed in as parameters. | 525 | on the outgoing args area in the order they are passed in as parameters. |
526 | 7) Floating point arguments 2 & 3 are saved in the outgoing args area for | 526 | 7) Floating point arguments 2 & 3 are saved in the outgoing args area for |
527 | z/Architecture | 527 | z/Architecture |
528 | 528 | ||
529 | 529 | ||
530 | Stack Frame Layout | 530 | Stack Frame Layout |
531 | ------------------ | 531 | ------------------ |
532 | s/390 z/Architecture | 532 | s/390 z/Architecture |
533 | 0 0 back chain ( a 0 here signifies end of back chain ) | 533 | 0 0 back chain ( a 0 here signifies end of back chain ) |
534 | 4 8 eos ( end of stack, not used on Linux for S390 used in other linkage formats ) | 534 | 4 8 eos ( end of stack, not used on Linux for S390 used in other linkage formats ) |
535 | 8 16 glue used in other s/390 linkage formats for saved routine descriptors etc. | 535 | 8 16 glue used in other s/390 linkage formats for saved routine descriptors etc. |
536 | 12 24 glue used in other s/390 linkage formats for saved routine descriptors etc. | 536 | 12 24 glue used in other s/390 linkage formats for saved routine descriptors etc. |
537 | 16 32 scratch area | 537 | 16 32 scratch area |
538 | 20 40 scratch area | 538 | 20 40 scratch area |
539 | 24 48 saved r6 of caller function | 539 | 24 48 saved r6 of caller function |
540 | 28 56 saved r7 of caller function | 540 | 28 56 saved r7 of caller function |
541 | 32 64 saved r8 of caller function | 541 | 32 64 saved r8 of caller function |
542 | 36 72 saved r9 of caller function | 542 | 36 72 saved r9 of caller function |
543 | 40 80 saved r10 of caller function | 543 | 40 80 saved r10 of caller function |
544 | 44 88 saved r11 of caller function | 544 | 44 88 saved r11 of caller function |
545 | 48 96 saved r12 of caller function | 545 | 48 96 saved r12 of caller function |
546 | 52 104 saved r13 of caller function | 546 | 52 104 saved r13 of caller function |
547 | 56 112 saved r14 of caller function | 547 | 56 112 saved r14 of caller function |
548 | 60 120 saved r15 of caller function | 548 | 60 120 saved r15 of caller function |
549 | 64 128 saved f4 of caller function | 549 | 64 128 saved f4 of caller function |
550 | 72 136 saved f6 of caller function | 550 | 72 136 saved f6 of caller function |
551 | 80 undefined | 551 | 80 undefined |
552 | 96 160 outgoing args passed from caller to callee | 552 | 96 160 outgoing args passed from caller to callee |
553 | 96+x 160+x possible stack alignment ( 8 bytes desirable ) | 553 | 96+x 160+x possible stack alignment ( 8 bytes desirable ) |
554 | 96+x+y 160+x+y alloca space of caller ( if used ) | 554 | 96+x+y 160+x+y alloca space of caller ( if used ) |
555 | 96+x+y+z 160+x+y+z automatics of caller ( if used ) | 555 | 96+x+y+z 160+x+y+z automatics of caller ( if used ) |
556 | 0 back-chain | 556 | 0 back-chain |
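
To show how the back chain at offset 0 of each frame is followed, here is a small sketch that walks a made-up s/390 stack image held in an array. Everything in it ( the function name, the array representation, the use of byte offsets rather than real addresses ) is an assumption for illustration only; a real debugger reads the words from the crashed machine's storage instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the frames on a back chain in a simulated s/390 stack image.
 * mem   - the stack image as an array of 32 bit words
 * words - number of words in mem
 * sp    - byte offset of the current stack pointer within the image,
 *         assumed to be 4 byte aligned & non zero
 * The back chain lives at offset 0 of each frame & a 0 there
 * signifies the end of the back chain. */
static inline int count_frames(const uint32_t *mem, size_t words, uint32_t sp)
{
	int frames = 0;

	/* words also bounds the loop in case a corrupt chain loops */
	while (sp / 4 < words && (size_t)frames < words) {
		frames++;
		sp = mem[sp / 4];	/* follow the back chain */
		if (sp == 0)
			break;		/* end of back chain */
	}
	return frames;
}
```

With frames at offsets 16, 112 & 208 chained in that order & a 0 back chain in the last one, count_frames() started at offset 16 reports 3 frames.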
557 | 557 | ||
558 | A sample program with comments. | 558 | A sample program with comments. |
559 | =============================== | 559 | =============================== |
560 | 560 | ||
561 | Comments on the function test | 561 | Comments on the function test |
562 | ----------------------------- | 562 | ----------------------------- |
563 | 1) It didn't need to set up a pointer to the constant pool gpr13 as it isn't used | 563 | 1) It didn't need to set up a pointer to the constant pool gpr13 as it isn't used |
564 | ( :-( ). | 564 | ( :-( ). |
565 | 2) This is a frameless function & no stack is bought. | 565 | 2) This is a frameless function & no stack is bought. |
566 | 3) The compiler was clever enough to recognise that it could return the | 566 | 3) The compiler was clever enough to recognise that it could return the |
567 | value in r2 as well as use it for the passed in parameter ( :-) ). | 567 | value in r2 as well as use it for the passed in parameter ( :-) ). |
568 | 4) The basr ( branch & save, register form ) trick works as follows: the instruction | 568 | 4) The basr ( branch & save, register form ) trick works as follows: the instruction |
569 | has a special case where r0 ( which with some instruction operands is understood as | 569 | has a special case where r0 ( which with some instruction operands is understood as |
570 | the literal value 0, some risc architectures also do this ) means no branch is taken. So now | 570 | the literal value 0, some risc architectures also do this ) means no branch is taken. So now |
571 | we just fall through to the next address & the address of the next instruction is | 571 | we just fall through to the next address & the address of the next instruction is |
572 | in r13, so now we subtract the size of the function prologue we have executed | 572 | in r13, so now we subtract the size of the function prologue we have executed |
573 | + the size of the literal pool to get to the top of the literal pool. | 573 | + the size of the literal pool to get to the top of the literal pool. |
574 | 0040037c int test(int b) | 574 | 0040037c int test(int b) |
575 | { # Function prologue below | 575 | { # Function prologue below |
576 | 40037c: 90 de f0 34 stm %r13,%r14,52(%r15) # Save registers r13 & r14 | 576 | 40037c: 90 de f0 34 stm %r13,%r14,52(%r15) # Save registers r13 & r14 |
577 | 400380: 0d d0 basr %r13,%r0 # Set up pointer to constant pool using | 577 | 400380: 0d d0 basr %r13,%r0 # Set up pointer to constant pool using |
578 | 400382: a7 da ff fa ahi %r13,-6 # basr trick | 578 | 400382: a7 da ff fa ahi %r13,-6 # basr trick |
579 | return(5+b); | 579 | return(5+b); |
580 | # Huge main program | 580 | # Huge main program |
581 | 400386: a7 2a 00 05 ahi %r2,5 # add 5 to r2 | 581 | 400386: a7 2a 00 05 ahi %r2,5 # add 5 to r2 |
582 | 582 | ||
583 | # Function epilogue below | 583 | # Function epilogue below |
584 | 40038a: 98 de f0 34 lm %r13,%r14,52(%r15) # restore registers r13 & 14 | 584 | 40038a: 98 de f0 34 lm %r13,%r14,52(%r15) # restore registers r13 & 14 |
585 | 40038e: 07 fe br %r14 # return | 585 | 40038e: 07 fe br %r14 # return |
586 | } | 586 | } |
587 | 587 | ||
588 | Comments on the function main | 588 | Comments on the function main |
589 | ----------------------------- | 589 | ----------------------------- |
590 | 1) The compiler did this function optimally ( 8-) ) | 590 | 1) The compiler did this function optimally ( 8-) ) |
591 | 591 | ||
592 | Literal pool for main. | 592 | Literal pool for main. |
593 | 400390: ff ff ff ec .long 0xffffffec | 593 | 400390: ff ff ff ec .long 0xffffffec |
594 | main(int argc,char *argv[]) | 594 | main(int argc,char *argv[]) |
595 | { # Function prologue below | 595 | { # Function prologue below |
596 | 400394: 90 bf f0 2c stm %r11,%r15,44(%r15) # Save necessary registers | 596 | 400394: 90 bf f0 2c stm %r11,%r15,44(%r15) # Save necessary registers |
597 | 400398: 18 0f lr %r0,%r15 # copy stack pointer to r0 | 597 | 400398: 18 0f lr %r0,%r15 # copy stack pointer to r0 |
598 | 40039a: a7 fa ff a0 ahi %r15,-96 # Make area for callee saving | 598 | 40039a: a7 fa ff a0 ahi %r15,-96 # Make area for callee saving |
599 | 40039e: 0d d0 basr %r13,%r0 # Set up r13 to point to | 599 | 40039e: 0d d0 basr %r13,%r0 # Set up r13 to point to |
600 | 4003a0: a7 da ff f0 ahi %r13,-16 # literal pool | 600 | 4003a0: a7 da ff f0 ahi %r13,-16 # literal pool |
601 | 4003a4: 50 00 f0 00 st %r0,0(%r15) # Save backchain | 601 | 4003a4: 50 00 f0 00 st %r0,0(%r15) # Save backchain |
602 | 602 | ||
603 | return(test(5)); # Main Program Below | 603 | return(test(5)); # Main Program Below |
604 | 4003a8: 58 e0 d0 00 l %r14,0(%r13) # load relative address of test from | 604 | 4003a8: 58 e0 d0 00 l %r14,0(%r13) # load relative address of test from |
605 | # literal pool | 605 | # literal pool |
606 | 4003ac: a7 28 00 05 lhi %r2,5 # Set first parameter to 5 | 606 | 4003ac: a7 28 00 05 lhi %r2,5 # Set first parameter to 5 |
607 | 4003b0: 4d ee d0 00 bas %r14,0(%r14,%r13) # jump to test setting r14 as return | 607 | 4003b0: 4d ee d0 00 bas %r14,0(%r14,%r13) # jump to test setting r14 as return |
608 | # address using branch & save instruction. | 608 | # address using branch & save instruction. |
609 | 609 | ||
610 | # Function Epilogue below | 610 | # Function Epilogue below |
611 | 4003b4: 98 bf f0 8c lm %r11,%r15,140(%r15)# Restore necessary registers. | 611 | 4003b4: 98 bf f0 8c lm %r11,%r15,140(%r15)# Restore necessary registers. |
612 | 4003b8: 07 fe br %r14 # return to do program exit | 612 | 4003b8: 07 fe br %r14 # return to do program exit |
613 | } | 613 | } |
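The literal pool entry for main ( 0xffffffec ) is the address of test relative to the pool, which is why the bas adds %r13 ( the pool base ) to the value it loaded. A minimal sketch of that arithmetic in C, assuming 31 bit addressing & using the addresses from the listing above; the helper name is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Resolve a pool-relative literal the way "bas %r14,0(%r14,%r13)" does:
 * base (r13) + index (r14) + displacement (0), truncated to 31 bits.
 * The function name is hypothetical, it is not part of any toolchain.
 */
static uint32_t resolve_pool_relative(uint32_t pool_base, uint32_t literal)
{
	assert((pool_base & 0x80000000u) == 0);	/* expect a 31 bit address */
	return (pool_base + literal) & 0x7fffffffu;
}
```

With the pool at 0x400390, resolve_pool_relative(0x400390, 0xffffffec) gives 0x40037c, i.e. 20 bytes before the pool, which is consistent with test starting just before its basr at 400380.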
614 | 614 | ||
615 | 615 | ||
616 | Compiler updates | 616 | Compiler updates |
617 | ---------------- | 617 | ---------------- |
618 | 618 | ||
619 | main(int argc,char *argv[]) | 619 | main(int argc,char *argv[]) |
620 | { | 620 | { |
621 | 4004fc: 90 7f f0 1c stm %r7,%r15,28(%r15) | 621 | 4004fc: 90 7f f0 1c stm %r7,%r15,28(%r15) |
622 | 400500: a7 d5 00 04 bras %r13,400508 <main+0xc> | 622 | 400500: a7 d5 00 04 bras %r13,400508 <main+0xc> |
623 | 400504: 00 40 04 f4 .long 0x004004f4 | 623 | 400504: 00 40 04 f4 .long 0x004004f4 |
624 | # compiler now puts the constant pool in the code, so it saves an instruction | 624 | # compiler now puts the constant pool in the code, so it saves an instruction |
625 | 400508: 18 0f lr %r0,%r15 | 625 | 400508: 18 0f lr %r0,%r15 |
626 | 40050a: a7 fa ff a0 ahi %r15,-96 | 626 | 40050a: a7 fa ff a0 ahi %r15,-96 |
627 | 40050e: 50 00 f0 00 st %r0,0(%r15) | 627 | 40050e: 50 00 f0 00 st %r0,0(%r15) |
628 | return(test(5)); | 628 | return(test(5)); |
629 | 400512: 58 10 d0 00 l %r1,0(%r13) | 629 | 400512: 58 10 d0 00 l %r1,0(%r13) |
630 | 400516: a7 28 00 05 lhi %r2,5 | 630 | 400516: a7 28 00 05 lhi %r2,5 |
631 | 40051a: 0d e1 basr %r14,%r1 | 631 | 40051a: 0d e1 basr %r14,%r1 |
632 | # compiler adds 1 extra instruction to the epilogue; this is done to | 632 | # compiler adds 1 extra instruction to the epilogue; this is done to |
633 | # avoid processor pipeline stalls owing to data dependencies on g5 & | 633 | # avoid processor pipeline stalls owing to data dependencies on g5 & |
634 | # above, as register 14 in the old code was needed directly after being loaded | 634 | # above, as register 14 in the old code was needed directly after being loaded |
635 | # by the lm %r11,%r15,140(%r15) for the br %r14. | 635 | # by the lm %r11,%r15,140(%r15) for the br %r14. |
636 | 40051c: 58 40 f0 98 l %r4,152(%r15) | 636 | 40051c: 58 40 f0 98 l %r4,152(%r15) |
637 | 400520: 98 7f f0 7c lm %r7,%r15,124(%r15) | 637 | 400520: 98 7f f0 7c lm %r7,%r15,124(%r15) |
638 | 400524: 07 f4 br %r4 | 638 | 400524: 07 f4 br %r4 |
639 | } | 639 | } |
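The offsets in the new epilogue follow from the prologue: stm %r7,%r15,28(%r15) stores r7 to r15 in consecutive 4 byte slots starting at displacement 28, & the later ahi %r15,-96 moves every slot 96 bytes further from the new stack pointer. A small sketch of that calculation, assuming this reading of the listing; the function & parameter names are made up.

```c
#include <assert.h>

/*
 * Displacement, relative to the current stack pointer, of the slot where
 * a store-multiple put register `reg`.
 *   first_reg  - first register named in the store-multiple
 *   base_disp  - displacement used in the store-multiple
 *   reg_size   - bytes per register slot (4 for stm in 31 bit code)
 *   frame_size - bytes the prologue subtracted from r15 afterwards
 */
static int saved_reg_disp(int reg, int first_reg, int base_disp,
			  int reg_size, int frame_size)
{
	assert(reg >= first_reg && reg <= 15);
	return base_disp + (reg - first_reg) * reg_size + frame_size;
}
```

For the listing above, saved_reg_disp(14, 7, 28, 4, 96) is 152, matching l %r4,152(%r15), & saved_reg_disp(7, 7, 28, 4, 96) is 124, matching lm %r7,%r15,124(%r15).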
640 | 640 | ||
641 | 641 | ||
642 | Hartmut ( our compiler developer ) also has been threatening to take out the | 642 | Hartmut ( our compiler developer ) also has been threatening to take out the |
643 | stack backchain in optimised code as this also causes pipeline stalls, you | 643 | stack backchain in optimised code as this also causes pipeline stalls, you |
644 | have been warned. | 644 | have been warned. |
645 | 645 | ||
646 | 64 bit z/Architecture code disassembly | 646 | 64 bit z/Architecture code disassembly |
647 | -------------------------------------- | 647 | -------------------------------------- |
648 | 648 | ||
649 | If you understand the stuff above you'll understand the stuff | 649 | If you understand the stuff above you'll understand the stuff |
650 | below too so I'll avoid repeating myself & just say that | 650 | below too so I'll avoid repeating myself & just say that |
651 | some of the instructions have g's on the end of them to indicate | 651 | some of the instructions have g's on the end of them to indicate |
652 | they are 64 bit & the stack offsets are bigger; | 652 | they are 64 bit & the stack offsets are bigger; |
653 | the only other difference you'll find between 32 & 64 bit is that | 653 | the only other difference you'll find between 32 & 64 bit is that |
654 | we now use f4 & f6 for floating point arguments on 64 bit. | 654 | we now use f4 & f6 for floating point arguments on 64 bit. |
655 | 00000000800005b0 <test>: | 655 | 00000000800005b0 <test>: |
656 | int test(int b) | 656 | int test(int b) |
657 | { | 657 | { |
658 | return(5+b); | 658 | return(5+b); |
659 | 800005b0: a7 2a 00 05 ahi %r2,5 | 659 | 800005b0: a7 2a 00 05 ahi %r2,5 |
660 | 800005b4: b9 14 00 22 lgfr %r2,%r2 # sign-extend the 32 bit result to 64 bits | 660 | 800005b4: b9 14 00 22 lgfr %r2,%r2 # sign-extend the 32 bit result to 64 bits |
661 | 800005b8: 07 fe br %r14 | 661 | 800005b8: 07 fe br %r14 |
662 | 800005ba: 07 07 bcr 0,%r7 | 662 | 800005ba: 07 07 bcr 0,%r7 |
663 | 663 | ||
664 | 664 | ||
665 | } | 665 | } |
666 | 666 | ||
667 | 00000000800005bc <main>: | 667 | 00000000800005bc <main>: |
668 | main(int argc,char *argv[]) | 668 | main(int argc,char *argv[]) |
669 | { | 669 | { |
670 | 800005bc: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15) | 670 | 800005bc: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15) |
671 | 800005c2: b9 04 00 1f lgr %r1,%r15 | 671 | 800005c2: b9 04 00 1f lgr %r1,%r15 |
672 | 800005c6: a7 fb ff 60 aghi %r15,-160 | 672 | 800005c6: a7 fb ff 60 aghi %r15,-160 |
673 | 800005ca: e3 10 f0 00 00 24 stg %r1,0(%r15) | 673 | 800005ca: e3 10 f0 00 00 24 stg %r1,0(%r15) |
674 | return(test(5)); | 674 | return(test(5)); |
675 | 800005d0: a7 29 00 05 lghi %r2,5 | 675 | 800005d0: a7 29 00 05 lghi %r2,5 |
676 | # brasl allows jumps > 64k & is overkill here; bras would do fine | 676 | # brasl allows jumps > 64k & is overkill here; bras would do fine |
677 | 800005d4: c0 e5 ff ff ff ee brasl %r14,800005b0 <test> | 677 | 800005d4: c0 e5 ff ff ff ee brasl %r14,800005b0 <test> |
678 | 800005da: e3 40 f1 10 00 04 lg %r4,272(%r15) | 678 | 800005da: e3 40 f1 10 00 04 lg %r4,272(%r15) |
679 | 800005e0: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15) | 679 | 800005e0: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15) |
680 | 800005e6: 07 f4 br %r4 | 680 | 800005e6: 07 f4 br %r4 |
681 | } | 681 | } |
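The lgfr after the ahi in the 64 bit version of test is there because ahi only works on the low 32 bits of the register, while the int result has to be valid as a full 64 bit value. A rough C model of the two instructions, as a sketch only; the function names are made up, & the narrowing cast relies on the usual two's complement behaviour of mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "ahi %r2,5" followed by "lgfr %r2,%r2" on a 64 bit register:
 * the add only touches the low 32 bits, then lgfr sign-extends them
 * into the whole register.
 */
static int64_t add5_then_lgfr(uint64_t r2)
{
	uint32_t low = (uint32_t)r2 + 5u;	/* 32 bit add, wraps around */

	return (int64_t)(int32_t)low;		/* sign-extend, as lgfr does */
}

/* Usage example: junk in the upper half of the register is discarded. */
void lgfr_example(void)
{
	assert(add5_then_lgfr(5) == 10);
	assert(add5_then_lgfr(0xdeadbeeffffffff0ull) == -11);
}
```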
682 | 682 | ||
683 | 683 | ||
684 | 684 | ||
685 | Compiling programs for debugging on Linux for s/390 & z/Architecture | 685 | Compiling programs for debugging on Linux for s/390 & z/Architecture |
686 | ==================================================================== | 686 | ==================================================================== |
687 | -gdwarf-2 now works & should be considered the default debugging | 687 | -gdwarf-2 now works & should be considered the default debugging |
688 | format for s/390 & z/Architecture as it is more reliable for debugging | 688 | format for s/390 & z/Architecture as it is more reliable for debugging |
689 | shared libraries. Normal -g debugging works much better now, | 689 | shared libraries. Normal -g debugging works much better now, |
690 | thanks to bug reports from the IBM Java compiler developers. | 690 | thanks to bug reports from the IBM Java compiler developers. |
691 | 691 | ||
692 | This is typically done by adding/appending the flags -g or -gdwarf-2 to the | 692 | This is typically done by adding/appending the flags -g or -gdwarf-2 to the |
693 | CFLAGS & LDFLAGS variables in the Makefile of the program concerned. | 693 | CFLAGS & LDFLAGS variables in the Makefile of the program concerned. |
694 | 694 | ||
695 | If using gdb & you would like accurate displays of registers & | 695 | If using gdb & you would like accurate displays of registers & |
696 | stack traces, compile without optimisation, i.e. make sure | 696 | stack traces, compile without optimisation, i.e. make sure |
697 | that there is no -O2 or similar on the CFLAGS line of the Makefile & | 697 | that there is no -O2 or similar on the CFLAGS line of the Makefile & |
698 | the emitted gcc commands. Obviously this will produce worse code | 698 | the emitted gcc commands. Obviously this will produce worse code |
699 | ( not advisable for shipment ) but it is an aid to the debugging process. | 699 | ( not advisable for shipment ) but it is an aid to the debugging process. |
700 | 700 | ||
701 | This aids debugging because the compiler will copy parameters passed | 701 | This aids debugging because the compiler will copy parameters passed |
702 | in registers onto the stack so backtracing & looking at passed in | 702 | in registers onto the stack so backtracing & looking at passed in |
703 | parameters will work. However, some larger programs which use inline functions | 703 | parameters will work. However, some larger programs which use inline functions |
704 | will not compile without optimisation. | 704 | will not compile without optimisation. |
705 | 705 | ||
706 | Debugging with optimisation has much improved since some bugs | 706 | Debugging with optimisation has much improved since some bugs |
707 | were fixed; please make sure you are using gdb-5.0 or later, developed | 707 | were fixed; please make sure you are using gdb-5.0 or later, developed |
708 | after Nov'2000. | 708 | after Nov'2000. |
709 | 709 | ||
710 | Figuring out gcc compile errors | 710 | Figuring out gcc compile errors |
711 | =============================== | 711 | =============================== |
712 | If you are getting a lot of syntax errors compiling a program & the problem | 712 | If you are getting a lot of syntax errors compiling a program & the problem |
713 | isn't blatantly obvious from the source, | 713 | isn't blatantly obvious from the source, |
714 | it often helps to just preprocess the file; this is done with the -E | 714 | it often helps to just preprocess the file; this is done with the -E |
715 | option in gcc. | 715 | option in gcc. |
716 | This runs only the very first phase of compilation | 716 | This runs only the very first phase of compilation |
717 | ( compilation in gcc is done in several stages & gcc calls many programs to | 717 | ( compilation in gcc is done in several stages & gcc calls many programs to |
718 | achieve its end result ); with the -E option gcc just calls the gcc preprocessor (cpp). | 718 | achieve its end result ); with the -E option gcc just calls the gcc preprocessor (cpp). |
719 | The C preprocessor does the following: it joins all the #included files together | 719 | The C preprocessor does the following: it joins all the #included files together |
720 | recursively ( #include files can #include other files ) with the C file you wish to compile, | 720 | recursively ( #include files can #include other files ) with the C file you wish to compile, |
721 | it puts a fully qualified path of the #included files in a comment & it | 721 | it puts a fully qualified path of the #included files in a comment & it |
722 | does macro expansion. | 722 | does macro expansion. |
723 | This is useful for debugging because: | 723 | This is useful for debugging because: |
724 | 1) You can double check whether the files you expect to be included are the ones | 724 | 1) You can double check whether the files you expect to be included are the ones |
725 | that are being included ( e.g. double check that you aren't going to the i386 asm directory ). | 725 | that are being included ( e.g. double check that you aren't going to the i386 asm directory ). |
726 | 2) You can check that macro definitions aren't clashing with typedefs. | 726 | 2) You can check that macro definitions aren't clashing with typedefs. |
727 | 3) You can check that definitions aren't being used before they are included. | 727 | 3) You can check that definitions aren't being used before they are included. |
728 | 4) It helps put the line emitting the error under the microscope if it contains macros. | 728 | 4) It helps put the line emitting the error under the microscope if it contains macros. |
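When rerunning the build with -E is inconvenient, the expansion of a single macro can also be inspected from inside a program with the two-level stringification idiom. This is only a sketch of that trick, not a replacement for reading the .i file; SQUARE, STR, XSTR & the helper functions are names made up for the example.

```c
#include <assert.h>
#include <string.h>

#define STR(x)  #x		/* stringify the argument exactly as written */
#define XSTR(x) STR(x)		/* expand the argument first, then stringify */

#define SQUARE(x) ((x) * (x))	/* example macro to put under the microscope */

/* Text of SQUARE(n + 1) after macro expansion, roughly what -E would show. */
static const char *expanded_text(void)
{
	return XSTR(SQUARE(n + 1));
}

/* Text of SQUARE(n + 1) before expansion, for comparison. */
static const char *unexpanded_text(void)
{
	return STR(SQUARE(n + 1));
}

/* Returns 1 if the macro name survives in the text, i.e. it was not expanded. */
static int mentions_macro(const char *text)
{
	assert(text != NULL);
	return strstr(text, "SQUARE") != NULL;
}
```

Printing expanded_text() with puts() shows what the preprocessor turned the macro call into; if the macro name is still visible there, the definition you expected was probably not included.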
729 | 729 | ||
730 | For convenience the Linux kernel's Makefile will do the preprocessing automatically for you | 730 | For convenience the Linux kernel's Makefile will do the preprocessing automatically for you |
731 | if you suffix the file you want built with .i ( instead of .o ) | 731 | if you suffix the file you want built with .i ( instead of .o ) |
732 | 732 | ||
733 | e.g. | 733 | e.g. |
734 | from the linux directory type | 734 | from the linux directory type |
735 | make arch/s390/kernel/signal.i | 735 | make arch/s390/kernel/signal.i |
736 | this will build | 736 | this will build |
737 | 737 | ||
738 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer | 738 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer |
739 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -E arch/s390/kernel/signal.c | 739 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -E arch/s390/kernel/signal.c |
740 | > arch/s390/kernel/signal.i | 740 | > arch/s390/kernel/signal.i |
741 | 741 | ||
742 | Now look at signal.i; you should see something like: | 742 | Now look at signal.i; you should see something like: |
743 | 743 | ||
744 | 744 | ||
745 | # 1 "/home1/barrow/linux/include/asm/types.h" 1 | 745 | # 1 "/home1/barrow/linux/include/asm/types.h" 1 |
746 | typedef unsigned short umode_t; | 746 | typedef unsigned short umode_t; |
747 | typedef __signed__ char __s8; | 747 | typedef __signed__ char __s8; |
748 | typedef unsigned char __u8; | 748 | typedef unsigned char __u8; |
749 | typedef __signed__ short __s16; | 749 | typedef __signed__ short __s16; |
750 | typedef unsigned short __u16; | 750 | typedef unsigned short __u16; |
751 | 751 | ||
752 | If instead you are getting errors further down, e.g. | 752 | If instead you are getting errors further down, e.g. |
753 | unknown instruction:2515 "move.l" or better still unknown instruction:2515 | 753 | unknown instruction:2515 "move.l" or better still unknown instruction:2515 |
754 | "Fixme not implemented yet, call Martin", you are probably attempting to compile some code | 754 | "Fixme not implemented yet, call Martin", you are probably attempting to compile some code |
755 | meant for another architecture or code that is simply not implemented, with a fixme statement | 755 | meant for another architecture or code that is simply not implemented, with a fixme statement |
756 | stuck into the inline assembly code so that the author of the file now knows he has work to do. | 756 | stuck into the inline assembly code so that the author of the file now knows he has work to do. |
757 | To look at the assembly emitted by gcc just before it is about to call gas ( the gnu assembler ) | 757 | To look at the assembly emitted by gcc just before it is about to call gas ( the gnu assembler ) |
758 | use the -S option. | 758 | use the -S option. |
759 | Again for your convenience the Linux kernel's Makefile will hold your hand & | 759 | Again for your convenience the Linux kernel's Makefile will hold your hand & |
760 | do all this donkey work for you if you build the file with the .s suffix. | 760 | do all this donkey work for you if you build the file with the .s suffix. |
761 | e.g. | 761 | e.g. |
762 | from the Linux directory type | 762 | from the Linux directory type |
763 | make arch/s390/kernel/signal.s | 763 | make arch/s390/kernel/signal.s |
764 | 764 | ||
765 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer | 765 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer |
766 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -S arch/s390/kernel/signal.c | 766 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -S arch/s390/kernel/signal.c |
767 | -o arch/s390/kernel/signal.s | 767 | -o arch/s390/kernel/signal.s |
768 | 768 | ||
769 | 769 | ||
770 | This will output something like, ( please note the constant pool & the useful comments | 770 | This will output something like, ( please note the constant pool & the useful comments |
771 | in the prologue to give you a hand at interpreting it ). | 771 | in the prologue to give you a hand at interpreting it ). |
772 | 772 | ||
773 | .LC54: | 773 | .LC54: |
774 | .string "misaligned (__u16 *) in __xchg\n" | 774 | .string "misaligned (__u16 *) in __xchg\n" |
775 | .LC57: | 775 | .LC57: |
776 | .string "misaligned (__u32 *) in __xchg\n" | 776 | .string "misaligned (__u32 *) in __xchg\n" |
777 | .L$PG1: # Pool sys_sigsuspend | 777 | .L$PG1: # Pool sys_sigsuspend |
778 | .LC192: | 778 | .LC192: |
779 | .long -262401 | 779 | .long -262401 |
780 | .LC193: | 780 | .LC193: |
781 | .long -1 | 781 | .long -1 |
782 | .LC194: | 782 | .LC194: |
783 | .long schedule-.L$PG1 | 783 | .long schedule-.L$PG1 |
784 | .LC195: | 784 | .LC195: |
785 | .long do_signal-.L$PG1 | 785 | .long do_signal-.L$PG1 |
786 | .align 4 | 786 | .align 4 |
787 | .globl sys_sigsuspend | 787 | .globl sys_sigsuspend |
788 | .type sys_sigsuspend,@function | 788 | .type sys_sigsuspend,@function |
789 | sys_sigsuspend: | 789 | sys_sigsuspend: |
790 | # leaf function 0 | 790 | # leaf function 0 |
791 | # automatics 16 | 791 | # automatics 16 |
792 | # outgoing args 0 | 792 | # outgoing args 0 |
793 | # need frame pointer 0 | 793 | # need frame pointer 0 |
794 | # call alloca 0 | 794 | # call alloca 0 |
795 | # has varargs 0 | 795 | # has varargs 0 |
796 | # incoming args (stack) 0 | 796 | # incoming args (stack) 0 |
797 | # function length 168 | 797 | # function length 168 |
798 | STM 8,15,32(15) | 798 | STM 8,15,32(15) |
799 | LR 0,15 | 799 | LR 0,15 |
800 | AHI 15,-112 | 800 | AHI 15,-112 |
801 | BASR 13,0 | 801 | BASR 13,0 |
802 | .L$CO1: AHI 13,.L$PG1-.L$CO1 | 802 | .L$CO1: AHI 13,.L$PG1-.L$CO1 |
803 | ST 0,0(15) | 803 | ST 0,0(15) |
804 | LR 8,2 | 804 | LR 8,2 |
805 | N 5,.LC192-.L$PG1(13) | 805 | N 5,.LC192-.L$PG1(13) |
806 | 806 | ||
807 | Adding -g to the above command makes the output even more useful, | 807 | Adding -g to the above command makes the output even more useful, |
808 | e.g. typing | 808 | e.g. typing |
809 | make CC:="s390-gcc -g" kernel/sched.s | 809 | make CC:="s390-gcc -g" kernel/sched.s |
810 | 810 | ||
811 | which runs | 811 | which runs |
812 | s390-gcc -g -D__KERNEL__ -I/home/barrow/linux-2.3/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -fno-strength-reduce -S kernel/sched.c -o kernel/sched.s | 812 | s390-gcc -g -D__KERNEL__ -I/home/barrow/linux-2.3/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -fno-strength-reduce -S kernel/sched.c -o kernel/sched.s |
813 | 813 | ||
814 | also outputs stabs ( debugger ) info, from this info you can find out the | 814 | also outputs stabs ( debugger ) info, from this info you can find out the |
815 | offsets & sizes of various elements in structures. | 815 | offsets & sizes of various elements in structures. |
816 | e.g. the stab for the structure | 816 | e.g. the stab for the structure |
817 | struct rlimit { | 817 | struct rlimit { |
818 | unsigned long rlim_cur; | 818 | unsigned long rlim_cur; |
819 | unsigned long rlim_max; | 819 | unsigned long rlim_max; |
820 | }; | 820 | }; |
821 | is | 821 | is |
822 | .stabs "rlimit:T(151,2)=s8rlim_cur:(0,5),0,32;rlim_max:(0,5),32,32;;",128,0,0,0 | 822 | .stabs "rlimit:T(151,2)=s8rlim_cur:(0,5),0,32;rlim_max:(0,5),32,32;;",128,0,0,0 |
823 | from this stab you can see that | 823 | from this stab you can see that |
824 | rlim_cur starts at bit offset 0 & is 32 bits in size | 824 | rlim_cur starts at bit offset 0 & is 32 bits in size |
825 | rlim_max starts at bit offset 32 & is 32 bits in size. | 825 | rlim_max starts at bit offset 32 & is 32 bits in size. |
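The same offsets & sizes can be cross-checked from C with offsetof & sizeof. The sketch below uses a local copy of the structure ( renamed rlimit_example so it doesn't clash with the real one ) & two made-up helper macros; note that on a machine where unsigned long is 64 bits the numbers will be 64 rather than the 32 shown in the stab above.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the structure from the text, renamed on purpose. */
struct rlimit_example {
	unsigned long rlim_cur;
	unsigned long rlim_max;
};

#define BITS_PER_BYTE 8

/* Bit offset of a member, in the same units the stab uses. */
#define MEMBER_BIT_OFFSET(type, member) \
	(offsetof(type, member) * BITS_PER_BYTE)

/* Bit size of a member; the null pointer is never dereferenced. */
#define MEMBER_BIT_SIZE(type, member) \
	(sizeof(((type *)0)->member) * BITS_PER_BYTE)

/* Usage example: the second member starts where the first one ends. */
void rlimit_layout_example(void)
{
	assert(MEMBER_BIT_OFFSET(struct rlimit_example, rlim_cur) == 0);
	assert(MEMBER_BIT_OFFSET(struct rlimit_example, rlim_max) ==
	       MEMBER_BIT_SIZE(struct rlimit_example, rlim_cur));
}
```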
826 | 826 | ||
827 | 827 | ||
828 | Debugging Tools: | 828 | Debugging Tools: |
829 | ================ | 829 | ================ |
830 | 830 | ||
831 | objdump | 831 | objdump |
832 | ======= | 832 | ======= |
833 | This is a tool with many options, the most useful being ( if compiled with -g ): | 833 | This is a tool with many options, the most useful being ( if compiled with -g ): |
834 | objdump --source <victim program or object file> > <victims debug listing > | 834 | objdump --source <victim program or object file> > <victims debug listing > |
835 | 835 | ||
836 | 836 | ||
837 | The whole kernel can be compiled like this ( doing this will make a 17MB kernel | 837 | The whole kernel can be compiled like this ( doing this will make a 17MB kernel |
838 | & a 200 MB listing ); however, you have to strip it with the strip command | 838 | & a 200 MB listing ); however, you have to strip it with the strip command |
839 | before building the image, to make it a more reasonable size to boot. | 839 | before building the image, to make it a more reasonable size to boot. |
840 | 840 | ||
841 | A source/assembly mixed dump of the kernel can be done with the line | 841 | A source/assembly mixed dump of the kernel can be done with the line |
842 | objdump --source vmlinux > vmlinux.lst | 842 | objdump --source vmlinux > vmlinux.lst |
843 | Also, if the file isn't compiled -g, this will output as much debugging information | 843 | Also, if the file isn't compiled -g, this will output as much debugging information |
844 | as it can (e.g. function names). This is very slow as it spends lots | 844 | as it can (e.g. function names). This is very slow as it spends lots |
845 | of time searching for debugging info. The following self explanatory line should be used | 845 | of time searching for debugging info. The following self explanatory line should be used |
846 | instead if the code isn't compiled -g, as it is much faster: | 846 | instead if the code isn't compiled -g, as it is much faster: |
847 | objdump --disassemble-all --syms vmlinux > vmlinux.lst | 847 | objdump --disassemble-all --syms vmlinux > vmlinux.lst |
848 | 848 | ||
849 | As hard drive space is valuable most of us use the following approach. | 849 | As hard drive space is valuable most of us use the following approach. |
850 | 1) Look at the emitted psw on the console to find the crash address in the kernel. | 850 | 1) Look at the emitted psw on the console to find the crash address in the kernel. |
851 | 2) Look at the file System.map ( in the linux directory ) produced when building | 851 | 2) Look at the file System.map ( in the linux directory ) produced when building |
852 | the kernel to find the closest address less than the current PSW to find the | 852 | the kernel to find the closest address less than the current PSW to find the |
853 | offending function. | 853 | offending function. |
854 | 3) Use grep or similar to search the source tree looking for the source file | 854 | 3) Use grep or similar to search the source tree looking for the source file |
855 | with this function if you don't know where it is. | 855 | with this function if you don't know where it is. |
856 | 4) Rebuild this object file with -g on; as an example suppose the file was | 856 | 4) Rebuild this object file with -g on; as an example suppose the file was |
857 | ( arch/s390/kernel/signal.o ) | 857 | ( arch/s390/kernel/signal.o ) |
858 | 5) Assuming the file with the erroneous function is signal.c, move to the base of the | 858 | 5) Assuming the file with the erroneous function is signal.c, move to the base of the |
859 | Linux source tree. | 859 | Linux source tree. |
860 | 6) rm arch/s390/kernel/signal.o | 860 | 6) rm arch/s390/kernel/signal.o |
861 | 7) make arch/s390/kernel/signal.o | 861 | 7) make arch/s390/kernel/signal.o |
862 | 8) Watch the gcc command line emitted. | 862 | 8) Watch the gcc command line emitted. |
863 | 9) Type it in again or alternatively cut & paste it on the console, adding the -g option. | 863 | 9) Type it in again or alternatively cut & paste it on the console, adding the -g option. |
864 | 10) objdump --source arch/s390/kernel/signal.o > signal.lst | 864 | 10) objdump --source arch/s390/kernel/signal.o > signal.lst |
865 | This will output the source & the assembly intermixed, as the snippet below shows. | 865 | This will output the source & the assembly intermixed, as the snippet below shows. |
866 | This will unfortunately output addresses which aren't the same | 866 | This will unfortunately output addresses which aren't the same |
867 | as the kernel ones; you should be able to get around the mental arithmetic | 867 | as the kernel ones; you should be able to get around the mental arithmetic |
868 | by playing with the --adjust-vma parameter to objdump. | 868 | by playing with the --adjust-vma parameter to objdump. |
869 | 869 | ||
870 | 870 | ||
871 | 871 | ||
872 | 872 | ||
873 | static inline void spin_lock(spinlock_t *lp) | 873 | static inline void spin_lock(spinlock_t *lp) |
874 | { | 874 | { |
875 | a0: 18 34 lr %r3,%r4 | 875 | a0: 18 34 lr %r3,%r4 |
876 | a2: a7 3a 03 bc ahi %r3,956 | 876 | a2: a7 3a 03 bc ahi %r3,956 |
877 | __asm__ __volatile(" lhi 1,-1\n" | 877 | __asm__ __volatile(" lhi 1,-1\n" |
878 | a6: a7 18 ff ff lhi %r1,-1 | 878 | a6: a7 18 ff ff lhi %r1,-1 |
879 | aa: 1f 00 slr %r0,%r0 | 879 | aa: 1f 00 slr %r0,%r0 |
880 | ac: ba 01 30 00 cs %r0,%r1,0(%r3) | 880 | ac: ba 01 30 00 cs %r0,%r1,0(%r3) |
881 | b0: a7 44 ff fd jm aa <sys_sigsuspend+0x2e> | 881 | b0: a7 44 ff fd jm aa <sys_sigsuspend+0x2e> |
882 | saveset = current->blocked; | 882 | saveset = current->blocked; |
883 | b4: d2 07 f0 68 mvc 104(8,%r15),972(%r4) | 883 | b4: d2 07 f0 68 mvc 104(8,%r15),972(%r4) |
884 | b8: 43 cc | 884 | b8: 43 cc |
885 | return (set->sig[0] & mask) != 0; | 885 | return (set->sig[0] & mask) != 0; |
886 | } | 886 | } |
887 | 887 | ||
888 | 11) If debugging under VM go down to that section in the document for more info. | 888 | 11) If debugging under VM go down to that section in the document for more info. |
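Step 2 of the list above ( finding the closest System.map address that is not greater than the PSW address ) is easy to get wrong by eye on a large map. A minimal sketch of the lookup in C, assuming the entries have already been read into a table sorted by ascending address; the struct, the function name & the addresses in example_map are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct map_entry {
	unsigned long addr;	/* address column from System.map */
	const char *name;	/* symbol name column */
};

/*
 * Return the entry with the highest address that is <= psw_addr, or NULL
 * if psw_addr lies below the first entry. The table must be sorted by
 * ascending address.
 */
static const struct map_entry *
closest_symbol(const struct map_entry *map, size_t count,
	       unsigned long psw_addr)
{
	size_t lo = 0, hi = count;	/* search window is [lo, hi) */

	assert(map != NULL || count == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (map[mid].addr <= psw_addr)
			lo = mid + 1;	/* candidate, look further right */
		else
			hi = mid;	/* too high, look left */
	}
	return lo ? &map[lo - 1] : NULL;
}

/* Made-up addresses, for illustration only. */
static const struct map_entry example_map[] = {
	{ 0x00010000UL, "do_signal" },
	{ 0x00010400UL, "sys_sigsuspend" },
	{ 0x00010600UL, "sys_sigreturn" },
};
```

With this table a PSW address of 0x1042e resolves to sys_sigsuspend, & the difference between the two addresses ( 0x2e ) is the offset to look for in the listing.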
889 | 889 | ||
890 | 890 | ||
891 | I now have a tool which takes the pain out of --adjust-vma | 891 | I now have a tool which takes the pain out of --adjust-vma |
892 | & you are able to do something like | 892 | & you are able to do something like |
893 | make arch/s390/kernel/traps.lst | 893 | make arch/s390/kernel/traps.lst |
894 | & it automatically generates the correctly relocated entries for | 894 | & it automatically generates the correctly relocated entries for |
895 | the text segment in traps.lst. | 895 | the text segment in traps.lst. |
896 | This tool is now standard in Linux distros in scripts/makelst | 896 | This tool is now standard in Linux distros in scripts/makelst |
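For those doing it by hand, the arithmetic behind --adjust-vma is just one subtraction: objdump adds the given offset to every section address, so the offset wanted is the kernel address of a symbol minus its address in the object file. A sketch under the assumption that the object's text ends up contiguous in the kernel; the function names & the kernel address used below are made up.

```c
#include <assert.h>

/*
 * Offset to pass to objdump --adjust-vma: where a symbol lives in the
 * kernel ( from System.map ) minus where it sits in the object file.
 */
static unsigned long adjust_vma_offset(unsigned long kernel_addr,
				       unsigned long object_addr)
{
	assert(kernel_addr >= object_addr);
	return kernel_addr - object_addr;
}

/* Kernel address corresponding to an address in the unadjusted listing. */
static unsigned long to_kernel_addr(unsigned long listing_addr,
				    unsigned long offset)
{
	return listing_addr + offset;
}
```

In the snippet above the line aa is sys_sigsuspend+0x2e, so sys_sigsuspend sits at 0x7c in the object; if System.map put it at, say, 0x10400, the offset would be 0x10384 & the instruction listed at b0 would be at 0x10434 in the kernel.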
897 | 897 | ||
898 | strace: | 898 | strace: |
899 | ------- | 899 | ------- |
900 | Q. What is it ? | 900 | Q. What is it ? |
901 | A. It is a tool for intercepting calls to the kernel & logging them | 901 | A. It is a tool for intercepting calls to the kernel & logging them |
902 | to a file & on the screen. | 902 | to a file & on the screen. |
903 | 903 | ||
904 | Q. What use is it ? | 904 | Q. What use is it ? |
905 | A. You can use it to find out what files a particular program opens. | 905 | A. You can use it to find out what files a particular program opens. |
906 | 906 | ||
907 | 907 | ||
908 | 908 | ||
909 | Example 1 | 909 | Example 1 |
910 | --------- | 910 | --------- |
911 | If you wanted to know how ping works but didn't have the source, run | 911 | If you wanted to know how ping works but didn't have the source, run |
912 | strace ping -c 1 127.0.0.1 | 912 | strace ping -c 1 127.0.0.1 |
913 | & then look at the man pages for each of the syscalls below | 913 | & then look at the man pages for each of the syscalls below |
914 | ( in fact this is sometimes easier than looking at some spaghetti | 914 | ( in fact this is sometimes easier than looking at some spaghetti |
915 | source which conditionally compiles for several architectures ). | 915 | source which conditionally compiles for several architectures ). |
916 | Not everything that it throws out needs to make sense immediately. | 916 | Not everything that it throws out needs to make sense immediately. |
917 | 917 | ||
918 | Just looking quickly you can see that it is making up a RAW socket | 918 | Just looking quickly you can see that it is making up a RAW socket |
919 | for the ICMP protocol, | 919 | for the ICMP protocol, |
920 | doing an alarm(10) for a 10 second timeout | 920 | doing an alarm(10) for a 10 second timeout |
921 | & doing a gettimeofday call before & after each read to see | 921 | & doing a gettimeofday call before & after each read to see |
922 | how long the replies took, & writing some text to stdout so the user | 922 | how long the replies took, & writing some text to stdout so the user |
923 | has an idea what is going on. | 923 | has an idea what is going on. |
924 | 924 | ||
925 | socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 | 925 | socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 |
926 | getuid() = 0 | 926 | getuid() = 0 |
927 | setuid(0) = 0 | 927 | setuid(0) = 0 |
928 | stat("/usr/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) | 928 | stat("/usr/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) |
929 | stat("/usr/share/locale/libc/C", 0xbffff134) = -1 ENOENT (No such file or directory) | 929 | stat("/usr/share/locale/libc/C", 0xbffff134) = -1 ENOENT (No such file or directory) |
930 | stat("/usr/local/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) | 930 | stat("/usr/local/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) |
931 | getpid() = 353 | 931 | getpid() = 353 |
932 | setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 | 932 | setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 |
933 | setsockopt(3, SOL_SOCKET, SO_RCVBUF, [49152], 4) = 0 | 933 | setsockopt(3, SOL_SOCKET, SO_RCVBUF, [49152], 4) = 0 |
934 | fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(3, 1), ...}) = 0 | 934 | fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(3, 1), ...}) = 0 |
935 | mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40008000 | 935 | mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40008000 |
936 | ioctl(1, TCGETS, {B9600 opost isig icanon echo ...}) = 0 | 936 | ioctl(1, TCGETS, {B9600 opost isig icanon echo ...}) = 0 |
937 | write(1, "PING 127.0.0.1 (127.0.0.1): 56 d"..., 42PING 127.0.0.1 (127.0.0.1): 56 data bytes | 937 | write(1, "PING 127.0.0.1 (127.0.0.1): 56 d"..., 42PING 127.0.0.1 (127.0.0.1): 56 data bytes |
938 | ) = 42 | 938 | ) = 42 |
939 | sigaction(SIGINT, {0x8049ba0, [], SA_RESTART}, {SIG_DFL}) = 0 | 939 | sigaction(SIGINT, {0x8049ba0, [], SA_RESTART}, {SIG_DFL}) = 0 |
940 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {SIG_DFL}) = 0 | 940 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {SIG_DFL}) = 0 |
941 | gettimeofday({948904719, 138951}, NULL) = 0 | 941 | gettimeofday({948904719, 138951}, NULL) = 0 |
942 | sendto(3, "\10\0D\201a\1\0\0\17#\2178\307\36"..., 64, 0, {sin_family=AF_INET, | 942 | sendto(3, "\10\0D\201a\1\0\0\17#\2178\307\36"..., 64, 0, {sin_family=AF_INET, |
943 | sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64 | 943 | sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64 |
944 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 | 944 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 |
945 | sigaction(SIGALRM, {0x8049ba0, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 | 945 | sigaction(SIGALRM, {0x8049ba0, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 |
946 | alarm(10) = 0 | 946 | alarm(10) = 0 |
947 | recvfrom(3, "E\0\0T\0005\0\0@\1|r\177\0\0\1\177"..., 192, 0, | 947 | recvfrom(3, "E\0\0T\0005\0\0@\1|r\177\0\0\1\177"..., 192, 0, |
948 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 | 948 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 |
949 | gettimeofday({948904719, 160224}, NULL) = 0 | 949 | gettimeofday({948904719, 160224}, NULL) = 0 |
950 | recvfrom(3, "E\0\0T\0006\0\0\377\1\275p\177\0"..., 192, 0, | 950 | recvfrom(3, "E\0\0T\0006\0\0\377\1\275p\177\0"..., 192, 0, |
951 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 | 951 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 |
952 | gettimeofday({948904719, 166952}, NULL) = 0 | 952 | gettimeofday({948904719, 166952}, NULL) = 0 |
953 | write(1, "64 bytes from 127.0.0.1: icmp_se"..., | 953 | write(1, "64 bytes from 127.0.0.1: icmp_se"..., |
954 | 5764 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=28.0 ms | 954 | 5764 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=28.0 ms |
955 | 955 | ||
956 | Example 2 | 956 | Example 2 |
957 | --------- | 957 | --------- |
958 | strace passwd 2>&1 | grep open | 958 | strace passwd 2>&1 | grep open |
959 | produces the following output | 959 | produces the following output |
960 | open("/etc/ld.so.cache", O_RDONLY) = 3 | 960 | open("/etc/ld.so.cache", O_RDONLY) = 3 |
961 | open("/opt/kde/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file or directory) | 961 | open("/opt/kde/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file or directory) |
962 | open("/lib/libc.so.5", O_RDONLY) = 3 | 962 | open("/lib/libc.so.5", O_RDONLY) = 3 |
963 | open("/dev", O_RDONLY) = 3 | 963 | open("/dev", O_RDONLY) = 3 |
964 | open("/var/run/utmp", O_RDONLY) = 3 | 964 | open("/var/run/utmp", O_RDONLY) = 3 |
965 | open("/etc/passwd", O_RDONLY) = 3 | 965 | open("/etc/passwd", O_RDONLY) = 3 |
966 | open("/etc/shadow", O_RDONLY) = 3 | 966 | open("/etc/shadow", O_RDONLY) = 3 |
967 | open("/etc/login.defs", O_RDONLY) = 4 | 967 | open("/etc/login.defs", O_RDONLY) = 4 |
968 | open("/dev/tty", O_RDONLY) = 4 | 968 | open("/dev/tty", O_RDONLY) = 4 |
969 | 969 | ||
970 | The 2>&1 is done to redirect stderr to stdout & grep is then filtering this input | 970 | The 2>&1 is done to redirect stderr to stdout & grep is then filtering this input |
971 | through the pipe for each line containing the string open. | 971 | through the pipe for each line containing the string open. |
972 | 972 | ||
973 | 973 | ||
974 | Example 3 | 974 | Example 3 |
975 | --------- | 975 | --------- |
976 | Getting sophisticated | 976 | Getting sophisticated |
977 | telnetd crashes & I don't know why | 977 | telnetd crashes & I don't know why |
978 | 978 | ||
979 | Steps | 979 | Steps |
980 | ----- | 980 | ----- |
981 | 1) Replace the following line in /etc/inetd.conf | 981 | 1) Replace the following line in /etc/inetd.conf |
982 | telnet stream tcp nowait root /usr/sbin/in.telnetd -h | 982 | telnet stream tcp nowait root /usr/sbin/in.telnetd -h |
983 | with | 983 | with |
984 | telnet stream tcp nowait root /blah | 984 | telnet stream tcp nowait root /blah |
985 | 985 | ||
986 | 2) Create the file /blah with the following contents to start tracing telnetd | 986 | 2) Create the file /blah with the following contents to start tracing telnetd |
987 | #!/bin/bash | 987 | #!/bin/bash |
988 | /usr/bin/strace -o/t1 -f /usr/sbin/in.telnetd -h | 988 | /usr/bin/strace -o/t1 -f /usr/sbin/in.telnetd -h |
989 | 3) chmod 700 /blah to make it executable only to root | 989 | 3) chmod 700 /blah to make it executable only to root |
990 | 4) | 990 | 4) |
991 | killall -HUP inetd | 991 | killall -HUP inetd |
992 | or ps aux | grep inetd | 992 | or ps aux | grep inetd |
993 | get inetd's process id | 993 | get inetd's process id |
994 | & kill -HUP <pid> to make it reread /etc/inetd.conf. | 994 | & kill -HUP <pid> to make it reread /etc/inetd.conf. |
995 | 995 | ||
996 | Important options | 996 | Important options |
997 | ----------------- | 997 | ----------------- |
998 | -o is used to tell strace to output to a file in our case t1 in the root directory | 998 | -o is used to tell strace to output to a file in our case t1 in the root directory |
999 | -f is to follow children, | 999 | -f is to follow children, |
1000 | e.g. in our case above telnetd will start the login process & subsequently a shell like bash. | 1000 | e.g. in our case above telnetd will start the login process & subsequently a shell like bash. |
1001 | You will be able to tell which is which from the process ID's listed on the left hand side | 1001 | You will be able to tell which is which from the process ID's listed on the left hand side |
1002 | of the strace output. | 1002 | of the strace output. |
1003 | -p<pid> will tell strace to attach to a running process, yup this can be done provided | 1003 | -p<pid> will tell strace to attach to a running process, yup this can be done provided |
1004 | it isn't being traced or debugged already & you have enough privileges, | 1004 | it isn't being traced or debugged already & you have enough privileges, |
1005 | the reason 2 processes cannot trace or debug the same program is that strace | 1005 | the reason 2 processes cannot trace or debug the same program is that strace |
1006 | becomes the parent process of the one being debugged & processes ( unlike people ) | 1006 | becomes the parent process of the one being debugged & processes ( unlike people ) |
1007 | can have only one parent. | 1007 | can have only one parent. |
1008 | 1008 | ||
1009 | 1009 | ||
1010 | However the file /t1 will get big quite quickly | 1010 | However the file /t1 will get big quite quickly |
1011 | to test it telnet 127.0.0.1 | 1011 | to test it telnet 127.0.0.1 |
1012 | 1012 | ||
1013 | now look at what files in.telnetd execve'd | 1013 | now look at what files in.telnetd execve'd |
1014 | 413 execve("/usr/sbin/in.telnetd", ["/usr/sbin/in.telnetd", "-h"], [/* 17 vars */]) = 0 | 1014 | 413 execve("/usr/sbin/in.telnetd", ["/usr/sbin/in.telnetd", "-h"], [/* 17 vars */]) = 0 |
1015 | 414 execve("/bin/login", ["/bin/login", "-h", "localhost", "-p"], [/* 2 vars */]) = 0 | 1015 | 414 execve("/bin/login", ["/bin/login", "-h", "localhost", "-p"], [/* 2 vars */]) = 0 |
1016 | 1016 | ||
1017 | Whey it worked!. | 1017 | Whey it worked!. |
1018 | 1018 | ||
1019 | 1019 | ||
1020 | Other hints: | 1020 | Other hints: |
1021 | ------------ | 1021 | ------------ |
1022 | If the program is not very interactive ( i.e. not much keyboard input ) | 1022 | If the program is not very interactive ( i.e. not much keyboard input ) |
1023 | & is crashing in one architecture but not in another you can do | 1023 | & is crashing in one architecture but not in another you can do |
1024 | an strace of both programs under as identical a scenario as you can | 1024 | an strace of both programs under as identical a scenario as you can |
1025 | on both architectures outputting to a file, then | 1025 | on both architectures outputting to a file, then |
1026 | do a diff of the two traces using the diff program | 1026 | do a diff of the two traces using the diff program |
1027 | i.e. | 1027 | i.e. |
1028 | diff output1 output2 | 1028 | diff output1 output2 |
1029 | & maybe you'll be able to see where the call paths differed, this | 1029 | & maybe you'll be able to see where the call paths differed, this |
1030 | is possibly near the cause of the crash. | 1030 | is possibly near the cause of the crash. |
1031 | 1031 | ||
1032 | More info | 1032 | More info |
1033 | --------- | 1033 | --------- |
1034 | Look at man pages for strace & the various syscalls | 1034 | Look at man pages for strace & the various syscalls |
1035 | e.g. man strace, man alarm, man socket. | 1035 | e.g. man strace, man alarm, man socket. |
1036 | 1036 | ||
1037 | 1037 | ||
1038 | Performance Debugging | 1038 | Performance Debugging |
1039 | ===================== | 1039 | ===================== |
1040 | gcc is capable of compiling in profiling code, just add the -pg option | 1040 | gcc is capable of compiling in profiling code, just add the -pg option |
1041 | to the CFLAGS, this obviously affects program size & performance. | 1041 | to the CFLAGS, this obviously affects program size & performance. |
1042 | This can be used by the gprof gnu profiling tool; gcov, the gnu code | 1042 | This can be used by the gprof gnu profiling tool; gcov, the gnu code |
1043 | coverage tool, needs -fprofile-arcs -ftest-coverage instead ( code coverage is a means of testing | 1043 | coverage tool, needs -fprofile-arcs -ftest-coverage instead ( code coverage is a means of testing |
1044 | code quality by checking if all the code in an executable is exercised by | 1044 | code quality by checking if all the code in an executable is exercised by |
1045 | a tester ). | 1045 | a tester ). |
1046 | 1046 | ||
1047 | 1047 | ||
1048 | Using top to find out where processes are sleeping in the kernel | 1048 | Using top to find out where processes are sleeping in the kernel |
1049 | ---------------------------------------------------------------- | 1049 | ---------------------------------------------------------------- |
1050 | To do this copy the System.map from the root directory where | 1050 | To do this copy the System.map from the root directory where |
1051 | the linux kernel was built to the /boot directory on your | 1051 | the linux kernel was built to the /boot directory on your |
1052 | linux machine. | 1052 | linux machine. |
1053 | Start top | 1053 | Start top |
1054 | Now type fU<return> | 1054 | Now type fU<return> |
1055 | You should see a new field called WCHAN which | 1055 | You should see a new field called WCHAN which |
1056 | tells you where each process is sleeping, here is a typical output. | 1056 | tells you where each process is sleeping, here is a typical output. |
1057 | 1057 | ||
1058 | 6:59pm up 41 min, 1 user, load average: 0.00, 0.00, 0.00 | 1058 | 6:59pm up 41 min, 1 user, load average: 0.00, 0.00, 0.00 |
1059 | 28 processes: 27 sleeping, 1 running, 0 zombie, 0 stopped | 1059 | 28 processes: 27 sleeping, 1 running, 0 zombie, 0 stopped |
1060 | CPU states: 0.0% user, 0.1% system, 0.0% nice, 99.8% idle | 1060 | CPU states: 0.0% user, 0.1% system, 0.0% nice, 99.8% idle |
1061 | Mem: 254900K av, 45976K used, 208924K free, 0K shrd, 28636K buff | 1061 | Mem: 254900K av, 45976K used, 208924K free, 0K shrd, 28636K buff |
1062 | Swap: 0K av, 0K used, 0K free 8620K cached | 1062 | Swap: 0K av, 0K used, 0K free 8620K cached |
1063 | 1063 | ||
1064 | PID USER PRI NI SIZE RSS SHARE WCHAN STAT LIB %CPU %MEM TIME COMMAND | 1064 | PID USER PRI NI SIZE RSS SHARE WCHAN STAT LIB %CPU %MEM TIME COMMAND |
1065 | 750 root 12 0 848 848 700 do_select S 0 0.1 0.3 0:00 in.telnetd | 1065 | 750 root 12 0 848 848 700 do_select S 0 0.1 0.3 0:00 in.telnetd |
1066 | 767 root 16 0 1140 1140 964 R 0 0.1 0.4 0:00 top | 1066 | 767 root 16 0 1140 1140 964 R 0 0.1 0.4 0:00 top |
1067 | 1 root 8 0 212 212 180 do_select S 0 0.0 0.0 0:00 init | 1067 | 1 root 8 0 212 212 180 do_select S 0 0.0 0.0 0:00 init |
1068 | 2 root 9 0 0 0 0 down_inte SW 0 0.0 0.0 0:00 kmcheck | 1068 | 2 root 9 0 0 0 0 down_inte SW 0 0.0 0.0 0:00 kmcheck |
1069 | 1069 | ||
1070 | The time command | 1070 | The time command |
1071 | ---------------- | 1071 | ---------------- |
1072 | Another related command is the time command which gives you an indication | 1072 | Another related command is the time command which gives you an indication |
1073 | of where a process is spending the majority of its time. | 1073 | of where a process is spending the majority of its time. |
1074 | e.g. | 1074 | e.g. |
1075 | time ping -c 5 nc | 1075 | time ping -c 5 nc |
1076 | outputs | 1076 | outputs |
1077 | real 0m4.054s | 1077 | real 0m4.054s |
1078 | user 0m0.010s | 1078 | user 0m0.010s |
1079 | sys 0m0.010s | 1079 | sys 0m0.010s |
1080 | 1080 | ||
1081 | Debugging under VM | 1081 | Debugging under VM |
1082 | ================== | 1082 | ================== |
1083 | 1083 | ||
1084 | Notes | 1084 | Notes |
1085 | ----- | 1085 | ----- |
1086 | Addresses & values in the VM debugger are always hex never decimal | 1086 | Addresses & values in the VM debugger are always hex never decimal |
1087 | Address ranges are of the format <HexValue1>-<HexValue2> or <HexValue1>.<HexValue2> | 1087 | Address ranges are of the format <HexValue1>-<HexValue2> or <HexValue1>.<HexValue2> |
1088 | e.g. The address range 0x2000 to 0x3000 can be described described as | 1088 | e.g. The address range 0x2000 to 0x3000 can be described as 2000-3000 or 2000.1000 |
1089 | 2000-3000 or 2000.1000 | ||
1090 | 1089 | ||
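The two range notations in the note above can be illustrated with a small sketch.
It is not part of VM, the helper name parse_vm_range is made up for illustration,
& it simply follows the example in the text ( 2000-3000 and 2000.1000 both
describing 0x2000 to 0x3000 ), i.e. it assumes <start>.<length> ends at start + length.

```python
def parse_vm_range(text):
    """Return (start, end) for a VM debugger style hex address range.

    Hypothetical helper: accepts "<start>-<end>" or "<start>.<length>",
    both in hex without a 0x prefix, as described in the notes above.
    """
    if "-" in text:
        start, end = text.split("-", 1)
        return int(start, 16), int(end, 16)
    if "." in text:
        start, length = text.split(".", 1)
        base = int(start, 16)
        # Assumption taken from the example in the text: end = start + length
        return base, base + int(length, 16)
    raise ValueError("expected <hex>-<hex> or <hex>.<hex>: %r" % text)


if __name__ == "__main__":
    for spec in ("2000-3000", "2000.1000"):
        start, end = parse_vm_range(spec)
        print("%s -> 0x%x to 0x%x" % (spec, start, end))
```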
1091 | The VM Debugger is case insensitive. | 1090 | The VM Debugger is case insensitive. |
1092 | 1091 | ||
1093 | VM's strengths are usually other debuggers' weaknesses: you can get at any resource | 1092 | VM's strengths are usually other debuggers' weaknesses: you can get at any resource |
1094 | no matter how sensitive e.g. memory management resources, or change address translation | 1093 | no matter how sensitive e.g. memory management resources, or change address translation |
1095 | in the PSW. For kernel hacking you will reap dividends if you get good at it. | 1094 | in the PSW. For kernel hacking you will reap dividends if you get good at it. |
1096 | 1095 | ||
1097 | The VM Debugger displays operators but not operands, probably because some | 1096 | The VM Debugger displays operators but not operands, probably because some |
1098 | of it was written when memory was expensive & the programmer was probably proud that | 1097 | of it was written when memory was expensive & the programmer was probably proud that |
1099 | it fitted into 2k of memory & the programmers didn't want to shock hardcore VM'ers by | 1098 | it fitted into 2k of memory & the programmers didn't want to shock hardcore VM'ers by |
1100 | changing the interface :-), also the debugger displays useful information on the same line & | 1099 | changing the interface :-), also the debugger displays useful information on the same line & |
1101 | the author of the code probably felt that it was a good idea not to go over | 1100 | the author of the code probably felt that it was a good idea not to go over |
1102 | the 80 columns on the screen. | 1101 | the 80 columns on the screen. |
1103 | 1102 | ||
1104 | As some of you are probably in a panic now this isn't as unintuitive as it may seem | 1103 | As some of you are probably in a panic now this isn't as unintuitive as it may seem |
1105 | as the 390 instructions are easy to decode mentally & you can make a good guess at a lot | 1104 | as the 390 instructions are easy to decode mentally & you can make a good guess at a lot |
1106 | of them as all the operands are nibble ( half byte ) aligned & if you have an objdump listing | 1105 | of them as all the operands are nibble ( half byte ) aligned & if you have an objdump listing |
1107 | also it is quite easy to follow, if you don't have an objdump listing keep a copy of | 1106 | also it is quite easy to follow, if you don't have an objdump listing keep a copy of |
1108 | the s/390 Reference Summary & look between pages 2 & 7 or alternatively the | 1107 | the s/390 Reference Summary & look between pages 2 & 7 or alternatively the |
1109 | s/390 principles of operation. | 1108 | s/390 principles of operation. |
1110 | e.g. even I can guess that | 1109 | e.g. even I can guess that |
1111 | 0001AFF8' LR 180F CC 0 | 1110 | 0001AFF8' LR 180F CC 0 |
1112 | is a ( load register ) lr r0,r15 | 1111 | is a ( load register ) lr r0,r15 |
1113 | 1112 | ||
1114 | Also it is very easy to tell the length of a 390 instruction from the 2 most significant | 1113 | Also it is very easy to tell the length of a 390 instruction from the 2 most significant |
1115 | bits in the instruction ( not that this info is really useful except if you are trying to | 1114 | bits in the instruction ( not that this info is really useful except if you are trying to |
1116 | make sense of a hexdump of code ). | 1115 | make sense of a hexdump of code ). |
1117 | Here is a table | 1116 | Here is a table |
1118 | Bits Instruction Length | 1117 | Bits Instruction Length |
1119 | ------------------------------------------ | 1118 | ------------------------------------------ |
1120 | 00 2 Bytes | 1119 | 00 2 Bytes |
1121 | 01 4 Bytes | 1120 | 01 4 Bytes |
1122 | 10 4 Bytes | 1121 | 10 4 Bytes |
1123 | 11 6 Bytes | 1122 | 11 6 Bytes |
1124 | 1123 | ||
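The table above can be sketched in a few lines of code. This is only an
illustration, the function name s390_insn_length is an assumption & it takes the
first ( opcode ) byte of the instruction, whose 2 most significant bits select
the length.

```python
def s390_insn_length(first_byte):
    """Return the length in bytes of a 390 instruction.

    first_byte is the first byte of the instruction as seen in a hexdump;
    its top two bits index the table from the text: 00 -> 2, 01 -> 4,
    10 -> 4, 11 -> 6.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")
    return (2, 4, 4, 6)[first_byte >> 6]


if __name__ == "__main__":
    # Opcode bytes taken from the trace lines quoted in this document
    for name, opcode in (("LR", 0x18), ("BAS", 0x4D), ("AHI", 0xA7), ("STM", 0x90)):
        print("%-4s %02X -> %d bytes" % (name, opcode, s390_insn_length(opcode)))
```

This agrees with the examples in the text, e.g. LR 180F is 2 bytes long &
AHI A7DAFF0E is 4 bytes long.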
1125 | 1124 | ||
1126 | 1125 | ||
1127 | 1126 | ||
1128 | The debugger also displays other useful info on the same line such as the | 1127 | The debugger also displays other useful info on the same line such as the |
1129 | addresses being operated on, destination addresses of branches & condition codes. | 1128 | addresses being operated on, destination addresses of branches & condition codes. |
1130 | e.g. | 1129 | e.g. |
1131 | 00019736' AHI A7DAFF0E CC 1 | 1130 | 00019736' AHI A7DAFF0E CC 1 |
1132 | 000198BA' BRC A7840004 -> 000198C2' CC 0 | 1131 | 000198BA' BRC A7840004 -> 000198C2' CC 0 |
1133 | 000198CE' STM 900EF068 >> 0FA95E78 CC 2 | 1132 | 000198CE' STM 900EF068 >> 0FA95E78 CC 2 |
1134 | 1133 | ||
1135 | 1134 | ||
1136 | 1135 | ||
1137 | Useful VM debugger commands | 1136 | Useful VM debugger commands |
1138 | --------------------------- | 1137 | --------------------------- |
1139 | 1138 | ||
1140 | I suppose I'd better mention this before I start | 1139 | I suppose I'd better mention this before I start |
1141 | to list the current active traces do | 1140 | to list the current active traces do |
1142 | Q TR | 1141 | Q TR |
1143 | there can be a maximum of 255 of these per set | 1142 | there can be a maximum of 255 of these per set |
1144 | ( more about trace sets later ). | 1143 | ( more about trace sets later ). |
1145 | To stop traces issue a | 1144 | To stop traces issue a |
1146 | TR END. | 1145 | TR END. |
1147 | To delete a particular breakpoint issue | 1146 | To delete a particular breakpoint issue |
1148 | TR DEL <breakpoint number> | 1147 | TR DEL <breakpoint number> |
1149 | 1148 | ||
1150 | The PA1 key drops to CP mode so you can issue debugger commands, | 1149 | The PA1 key drops to CP mode so you can issue debugger commands, |
1151 | Doing alt c (on my 3270 console at least ) clears the screen. | 1150 | Doing alt c (on my 3270 console at least ) clears the screen. |
1152 | hitting b <enter> comes back to the running operating system | 1151 | hitting b <enter> comes back to the running operating system |
1153 | from cp mode ( in our case linux ). | 1152 | from cp mode ( in our case linux ). |
1154 | It is typically useful to add shortcuts to your profile.exec file | 1153 | It is typically useful to add shortcuts to your profile.exec file |
1155 | if you have one ( this is roughly equivalent to autoexec.bat in DOS ). | 1154 | if you have one ( this is roughly equivalent to autoexec.bat in DOS ). |
1156 | Here are a few from mine. | 1155 | Here are a few from mine. |
1157 | /* this gives me command history on issuing f12 */ | 1156 | /* this gives me command history on issuing f12 */ |
1158 | set pf12 retrieve | 1157 | set pf12 retrieve |
1159 | /* this continues */ | 1158 | /* this continues */ |
1160 | set pf8 imm b | 1159 | set pf8 imm b |
1161 | /* goes to trace set a */ | 1160 | /* goes to trace set a */ |
1162 | set pf1 imm tr goto a | 1161 | set pf1 imm tr goto a |
1163 | /* goes to trace set b */ | 1162 | /* goes to trace set b */ |
1164 | set pf2 imm tr goto b | 1163 | set pf2 imm tr goto b |
1165 | /* goes to trace set c */ | 1164 | /* goes to trace set c */ |
1166 | set pf3 imm tr goto c | 1165 | set pf3 imm tr goto c |
1167 | 1166 | ||
1168 | 1167 | ||
1169 | 1168 | ||
1170 | Instruction Tracing | 1169 | Instruction Tracing |
1171 | ------------------- | 1170 | ------------------- |
1172 | Setting a simple breakpoint | 1171 | Setting a simple breakpoint |
1173 | TR I PSWA <address> | 1172 | TR I PSWA <address> |
1174 | To debug a particular function try | 1173 | To debug a particular function try |
1175 | TR I R <function address range> | 1174 | TR I R <function address range> |
1176 | TR I on its own will single step. | 1175 | TR I on its own will single step. |
1177 | TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics | 1176 | TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics |
1178 | e.g. | 1177 | e.g. |
1179 | TR I DATA 4D R 0197BC.4000 | 1178 | TR I DATA 4D R 0197BC.4000 |
1180 | will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000 | 1179 | will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000 |
1181 | if you were inclined you could add traces for all branch instructions & | 1180 | if you were inclined you could add traces for all branch instructions & |
1182 | suffix them with RUN so you would have a backtrace on screen | 1181 | suffix them with RUN so you would have a backtrace on screen |
1183 | when a program crashes. | 1182 | when a program crashes. |
1184 | TR BR <INTO OR FROM> will trace branches into or out of an address. | 1183 | TR BR <INTO OR FROM> will trace branches into or out of an address. |
1185 | e.g. | 1184 | e.g. |
1186 | TR BR INTO 0 is often quite useful if a program is getting awkward & deciding | 1185 | TR BR INTO 0 is often quite useful if a program is getting awkward & deciding |
1187 | to branch to 0 & crashing as this will stop at the address before it jumps to 0. | 1186 | to branch to 0 & crashing as this will stop at the address before it jumps to 0. |
1188 | TR I R <address range> RUN cmd d g | 1187 | TR I R <address range> RUN cmd d g |
1189 | single steps a range of addresses but stays running & | 1188 | single steps a range of addresses but stays running & |
1190 | displays the gprs on each step. | 1189 | displays the gprs on each step. |
1191 | 1190 | ||
1192 | 1191 | ||
1193 | 1192 | ||
1194 | Displaying & modifying Registers | 1193 | Displaying & modifying Registers |
1195 | -------------------------------- | 1194 | -------------------------------- |
1196 | D G will display all the gprs | 1195 | D G will display all the gprs |
1197 | Adding an extra G to all the commands is necessary to access the full 64 bit | 1196 | Adding an extra G to all the commands is necessary to access the full 64 bit |
1198 | content in VM on z/Architecture, obviously this isn't required for access registers | 1197 | content in VM on z/Architecture, obviously this isn't required for access registers |
1199 | as these are still 32 bit. | 1198 | as these are still 32 bit. |
1200 | e.g. DGG instead of DG | 1199 | e.g. DGG instead of DG |
1201 | D X will display all the control registers | 1200 | D X will display all the control registers |
1202 | D AR will display all the access registers | 1201 | D AR will display all the access registers |
1203 | D AR4-7 will display access registers 4 to 7 | 1202 | D AR4-7 will display access registers 4 to 7 |
1204 | CPU ALL D G will display the GPRs of all CPUs in the configuration | 1203 | CPU ALL D G will display the GPRs of all CPUs in the configuration |
1205 | D PSW will display the current PSW | 1204 | D PSW will display the current PSW |
1206 | st PSW 2000 will put the value 2000 into the PSW & | 1205 | st PSW 2000 will put the value 2000 into the PSW & |
1207 | crash your machine. | 1206 | crash your machine. |
1208 | D PREFIX displays the prefix offset | 1207 | D PREFIX displays the prefix offset |
1209 | 1208 | ||
1210 | 1209 | ||
1211 | Displaying Memory | 1210 | Displaying Memory |
1212 | ----------------- | 1211 | ----------------- |
1213 | To display memory mapped using the current PSW's mapping try | 1212 | To display memory mapped using the current PSW's mapping try |
1214 | D <range> | 1213 | D <range> |
1215 | To make VM display a message each time it hits a particular address & continue try | 1214 | To make VM display a message each time it hits a particular address & continue try |
1216 | D I<range> will disassemble/display a range of instructions. | 1215 | D I<range> will disassemble/display a range of instructions. |
1217 | ST <addr> <32 bit word> will store a 32 bit word at a 32 bit aligned address | 1216 | ST <addr> <32 bit word> will store a 32 bit word at a 32 bit aligned address |
1218 | D T<range> will display the EBCDIC in an address ( if you are that way inclined ) | 1217 | D T<range> will display the EBCDIC in an address ( if you are that way inclined ) |
1219 | D R<range> will display real addresses ( without DAT ) but with prefixing. | 1218 | D R<range> will display real addresses ( without DAT ) but with prefixing. |
1220 | There are other complex options to display if you need to get at say home space | 1219 | There are other complex options to display if you need to get at say home space |
1221 | but are in primary space the easiest thing to do is to temporarily | 1220 | but are in primary space the easiest thing to do is to temporarily |
1222 | modify the PSW to the other addressing mode, display the stuff & then | 1221 | modify the PSW to the other addressing mode, display the stuff & then |
1223 | restore it. | 1222 | restore it. |
1224 | 1223 | ||
1225 | 1224 | ||
1226 | 1225 | ||
1227 | Hints | 1226 | Hints |
1228 | ----- | 1227 | ----- |
1229 | If you want to issue a debugger command without halting your virtual machine with the | 1228 | If you want to issue a debugger command without halting your virtual machine with the |
1230 | PA1 key try prefixing the command with #CP e.g. | 1229 | PA1 key try prefixing the command with #CP e.g. |
1231 | #cp tr i pswa 2000 | 1230 | #cp tr i pswa 2000 |
1232 | also suffixing most debugger commands with RUN will cause them not | 1231 | also suffixing most debugger commands with RUN will cause them not |
1233 | to stop just display the mnemonic at the current instruction on the console. | 1232 | to stop just display the mnemonic at the current instruction on the console. |
1234 | If you have several breakpoints you want to put into your program & | 1233 | If you have several breakpoints you want to put into your program & |
1235 | you get fed up of cross referencing with System.map | 1234 | you get fed up of cross referencing with System.map |
1236 | you can do the following trick for several symbols. | 1235 | you can do the following trick for several symbols. |
1237 | grep do_signal System.map | 1236 | grep do_signal System.map |
1238 | which emits the following among other things | 1237 | which emits the following among other things |
1239 | 0001f4e0 T do_signal | 1238 | 0001f4e0 T do_signal |
1240 | now you can do | 1239 | now you can do |
1241 | 1240 | ||
1242 | TR I PSWA 0001f4e0 cmd msg * do_signal | 1241 | TR I PSWA 0001f4e0 cmd msg * do_signal |
1243 | This sends a message to your own console each time do_signal is entered. | 1242 | This sends a message to your own console each time do_signal is entered. |
1244 | ( As an aside I wrote a perl script once which automatically generated a REXX | 1243 | ( As an aside I wrote a perl script once which automatically generated a REXX |
1245 | script with breakpoints on every kernel procedure, this isn't a good idea | 1244 | script with breakpoints on every kernel procedure, this isn't a good idea |
1246 | because there are thousands of these routines & VM can only set 255 breakpoints | 1245 | because there are thousands of these routines & VM can only set 255 breakpoints |
1247 | at a time so you nearly had to spend as long pruning the file down as you would | 1246 | at a time so you nearly had to spend as long pruning the file down as you would |
1248 | entering the msg's by hand ),however, the trick might be useful for a single object file. | 1247 | entering the msg's by hand ),however, the trick might be useful for a single object file. |
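The System.map trick above can be sketched as follows. This is a rough
illustration rather than the author's perl script: the function name
trace_commands & the way symbols are chosen are assumptions, the command text
is the TR I PSWA ... cmd msg * form shown above, & the 255 limit is the figure
quoted in this document.

```python
MAX_TRACES = 255  # limit on breakpoints quoted in the text above


def trace_commands(system_map_lines, wanted):
    """Build one 'TR I PSWA <addr> cmd msg * <symbol>' line per wanted symbol.

    system_map_lines: iterable of System.map lines ("<addr> <type> <symbol>").
    wanted: set of symbol names to put message breakpoints on.
    """
    commands = []
    for line in system_map_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a plain symbol line
        addr, _symtype, name = fields
        if name in wanted:
            commands.append("TR I PSWA %s cmd msg * %s" % (addr, name))
    if len(commands) > MAX_TRACES:
        raise ValueError("more than %d breakpoints requested" % MAX_TRACES)
    return commands


if __name__ == "__main__":
    sample = ["0001f4e0 T do_signal"]
    for cmd in trace_commands(sample, {"do_signal"}):
        print(cmd)
```

Keeping the list of wanted symbols small, e.g. the functions of a single
object file, avoids running into the limit.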
1249 | On linux's 3270 emulator x3270 there is a very useful option under the File menu, | 1248 | On linux's 3270 emulator x3270 there is a very useful option under the File menu, |
1250 | Save Screens In File, this is very good for keeping a copy of traces. | 1249 | Save Screens In File, this is very good for keeping a copy of traces. |
1251 | 1250 | ||
1252 | From CMS help <command name> will give you online help on a particular command. | 1251 | From CMS help <command name> will give you online help on a particular command. |
1253 | e.g. | 1252 | e.g. |
1254 | HELP DISPLAY | 1253 | HELP DISPLAY |
1255 | 1254 | ||
1256 | Also CP has a file called profile.exec which automatically gets called | 1255 | Also CP has a file called profile.exec which automatically gets called |
1257 | on startup of CMS ( like autoexec.bat ), keeping on a DOS analogy session | 1256 | on startup of CMS ( like autoexec.bat ), keeping on a DOS analogy session |
1258 | CP has a feature similar to doskey, it may be useful for you to | 1257 | CP has a feature similar to doskey, it may be useful for you to |
1259 | use profile.exec to define some keystrokes. | 1258 | use profile.exec to define some keystrokes. |
1260 | e.g. | 1259 | e.g. |
1261 | SET PF9 IMM B | 1260 | SET PF9 IMM B |
1262 | This does a single step in VM on pressing F9. | 1261 | This does a single step in VM on pressing F9. |
1263 | SET PF10 ^ | 1262 | SET PF10 ^ |
1264 | This sets up the ^ key. | 1263 | This sets up the ^ key. |
1265 | which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed directly into some 3270 consoles. | 1264 | which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed directly into some 3270 consoles. |
1266 | SET PF11 ^- | 1265 | SET PF11 ^- |
1267 | This types the starting keystrokes for a sysrq see SysRq below. | 1266 | This types the starting keystrokes for a sysrq see SysRq below. |
1268 | SET PF12 RETRIEVE | 1267 | SET PF12 RETRIEVE |
1269 | This retrieves command history on pressing F12. | 1268 | This retrieves command history on pressing F12. |
1270 | 1269 | ||
1271 | 1270 | ||
1272 | Sometimes in VM the display is set up to scroll automatically this | 1271 | Sometimes in VM the display is set up to scroll automatically this |
1273 | can be very annoying if there are messages you wish to look at | 1272 | can be very annoying if there are messages you wish to look at |
1274 | to stop this do | 1273 | to stop this do |
1275 | TERM MORE 255 255 | 1274 | TERM MORE 255 255 |
1276 | This will nearly stop automatic screen updates, however it will | 1275 | This will nearly stop automatic screen updates, however it will |
1277 | cause a denial of service if lots of messages go to the 3270 console, | 1276 | cause a denial of service if lots of messages go to the 3270 console, |
1278 | so it would be foolish to use this as the default on a production machine. | 1277 | so it would be foolish to use this as the default on a production machine. |
1279 | 1278 | ||
1280 | 1279 | ||
1281 | Tracing particular processes | 1280 | Tracing particular processes |
1282 | ---------------------------- | 1281 | ---------------------------- |
1283 | The kernel's text segment is intentionally at an address in memory that it will | 1282 | The kernel's text segment is intentionally at an address in memory that it will |
1284 | very seldom collide with text segments of user programs ( thanks Martin ), | 1283 | very seldom collide with text segments of user programs ( thanks Martin ), |
1285 | this simplifies debugging the kernel. | 1284 | this simplifies debugging the kernel. |
1286 | However it is quite common for user processes to have addresses which collide | 1285 | However it is quite common for user processes to have addresses which collide |
1287 | this can make debugging a particular process under VM painful under normal | 1286 | this can make debugging a particular process under VM painful under normal |
1288 | circumstances as the process may change when doing a | 1287 | circumstances as the process may change when doing a |
1289 | TR I R <address range>. | 1288 | TR I R <address range>. |
1290 | Thankfully after reading VM's online help I figured out how to debug | 1289 | Thankfully after reading VM's online help I figured out how to debug |
1291 | a particular process. | 1290 | a particular process. |
1292 | 1291 | ||
1293 | Your first problem is to find the STD ( segment table designation ) | 1292 | Your first problem is to find the STD ( segment table designation ) |
1294 | of the program you wish to debug. | 1293 | of the program you wish to debug. |
1295 | There are several ways you can do this; here are a few: | 1294 | There are several ways you can do this; here are a few: |
1296 | 1) objdump --syms <program to be debugged> | grep main | 1295 | 1) objdump --syms <program to be debugged> | grep main |
1297 | To get the address of main in the program. | 1296 | To get the address of main in the program. |
1298 | tr i pswa <address of main> | 1297 | tr i pswa <address of main> |
1299 | Start the program, if VM drops to CP on what looks like the entry | 1298 | Start the program, if VM drops to CP on what looks like the entry |
1300 | point of the main function this is most likely the process you wish to debug. | 1299 | point of the main function this is most likely the process you wish to debug. |
1301 | Now do a D X13 or D XG13 on z/Architecture. | 1300 | Now do a D X13 or D XG13 on z/Architecture. |
1302 | On 31 bit the STD is bits 1-19 ( the STO segment table origin ) | 1301 | On 31 bit the STD is bits 1-19 ( the STO segment table origin ) |
1303 | & 25-31 ( the STL segment table length ) of CR13. | 1302 | & 25-31 ( the STL segment table length ) of CR13. |
1304 | now type | 1303 | now type |
1305 | TR I R STD <CR13's value> 0.7fffffff | 1304 | TR I R STD <CR13's value> 0.7fffffff |
1306 | e.g. | 1305 | e.g. |
1307 | TR I R STD 8F32E1FF 0.7fffffff | 1306 | TR I R STD 8F32E1FF 0.7fffffff |
1308 | Another very useful variation is | 1307 | Another very useful variation is |
1309 | TR STORE INTO STD <CR13's value> <address range> | 1308 | TR STORE INTO STD <CR13's value> <address range> |
1310 | for finding out when a particular variable changes. | 1309 | for finding out when a particular variable changes. |
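
The 31 bit CR13 layout described above can be sketched in C as follows. This is only an illustration: the function and macro names are made up for this example, and it assumes the s/390 convention that bit 0 is the most significant bit of the register.

```c
#include <stdint.h>

/* s/390 numbers bits from the most significant end, so bit 0 is 0x80000000. */
#define CR13_STO_MASK 0x7FFFF000u /* bits 1-19: segment table origin (STO) */
#define CR13_STL_MASK 0x0000007Fu /* bits 25-31: segment table length (STL) */

/* Hypothetical helper: extract the segment table origin from a 31 bit CR13. */
uint32_t cr13_sto(uint32_t cr13)
{
    return cr13 & CR13_STO_MASK;
}

/* Hypothetical helper: extract the segment table length from a 31 bit CR13. */
uint32_t cr13_stl(uint32_t cr13)
{
    return cr13 & CR13_STL_MASK;
}
```

For the example value 8F32E1FF this gives an STO of 0F32E000 and an STL of 7F. Note that the TR I R STD command above is still given the whole CR13 value.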
1311 | 1310 | ||
1312 | An alternative way of finding the STD of a currently running process | 1311 | An alternative way of finding the STD of a currently running process |
1313 | is to do the following, ( this method is more complex but | 1312 | is to do the following, ( this method is more complex but |
1314 | could be quite convenient if you aren't updating the kernel much & | 1313 | could be quite convenient if you aren't updating the kernel much & |
1315 | so your kernel structures will stay constant for a reasonable period of | 1314 | so your kernel structures will stay constant for a reasonable period of |
1316 | time ). | 1315 | time ). |
1317 | 1316 | ||
1318 | grep task /proc/<pid>/status | 1317 | grep task /proc/<pid>/status |
1319 | from this you should see something like | 1318 | from this you should see something like |
1320 | task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68 | 1319 | task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68 |
1321 | This now gives you a pointer to the task structure. | 1320 | This now gives you a pointer to the task structure. |
1322 | Now make CC:="s390-gcc -g" kernel/sched.s | 1321 | Now make CC:="s390-gcc -g" kernel/sched.s |
1323 | To get the task_struct stabinfo. | 1322 | To get the task_struct stabinfo. |
1324 | ( task_struct is defined in include/linux/sched.h ). | 1323 | ( task_struct is defined in include/linux/sched.h ). |
1325 | Now we want to look at | 1324 | Now we want to look at |
1326 | task->active_mm->pgd | 1325 | task->active_mm->pgd |
1327 | on my machine the active_mm in the task structure stab is | 1326 | on my machine the active_mm in the task structure stab is |
1328 | active_mm:(4,12),672,32 | 1327 | active_mm:(4,12),672,32 |
1329 | its offset is 672/8=84=0x54 | 1328 | its offset is 672/8=84=0x54 |
1330 | the pgd member in the mm_struct stab is | 1329 | the pgd member in the mm_struct stab is |
1331 | pgd:(4,6)=*(29,5),96,32 | 1330 | pgd:(4,6)=*(29,5),96,32 |
1332 | so its offset is 96/8=12=0xc | 1331 | so its offset is 96/8=12=0xc |
1333 | 1332 | ||
1334 | so we'll | 1333 | so we'll |
1335 | hexdump -s 0xf160054 /dev/mem | more | 1334 | hexdump -s 0xf160054 /dev/mem | more |
1336 | i.e. task_struct+active_mm offset | 1335 | i.e. task_struct+active_mm offset |
1337 | to look at the active_mm member | 1336 | to look at the active_mm member |
1338 | f160054 0fee cc60 0019 e334 0000 0000 0000 0011 | 1337 | f160054 0fee cc60 0019 e334 0000 0000 0000 0011 |
1339 | hexdump -s 0x0feecc6c /dev/mem | more | 1338 | hexdump -s 0x0feecc6c /dev/mem | more |
1340 | i.e. active_mm+pgd offset | 1339 | i.e. active_mm+pgd offset |
1341 | feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010 | 1340 | feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010 |
1342 | we get something like | 1341 | we get something like |
1343 | now do | 1342 | now do |
1344 | TR I R STD <pgd|0x7f> 0.7fffffff | 1343 | TR I R STD <pgd|0x7f> 0.7fffffff |
1345 | i.e. the 0x7f is added because the pgd only | 1344 | i.e. the 0x7f is added because the pgd only |
1346 | gives the page table origin & we need to set the low bits | 1345 | gives the page table origin & we need to set the low bits |
1347 | to the maximum possible segment table length. | 1346 | to the maximum possible segment table length. |
1348 | TR I R STD 0f2c007f 0.7fffffff | 1347 | TR I R STD 0f2c007f 0.7fffffff |
1349 | on z/Architecture you'll probably need to do | 1348 | on z/Architecture you'll probably need to do |
1350 | TR I R STD <pgd|0x7> 0.ffffffffffffffff | 1349 | TR I R STD <pgd|0x7> 0.ffffffffffffffff |
1351 | to set the TableType to 0x1 & the Table length to 3. | 1350 | to set the TableType to 0x1 & the Table length to 3. |
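
The arithmetic in the steps above can be written down as a small C sketch. The function names are invented for this illustration; the numbers are the ones from the example ( stab bit offsets 672 & 96, pgd 0f2c0000 ).

```c
#include <stdint.h>

/* A stab entry such as "active_mm:(4,12),672,32" gives the member position
 * in bits, so the byte offset is the bit offset divided by 8. */
unsigned stab_byte_offset(unsigned bit_offset)
{
    return bit_offset / 8;
}

/* 31 bit: the pgd only holds the table origin, so OR in 0x7f to request
 * the maximum possible segment table length. */
uint32_t std_from_pgd_31(uint32_t pgd)
{
    return pgd | 0x7Fu;
}

/* z/Architecture: OR in 0x7, which per the text above sets the TableType
 * to 0x1 & the Table length to 3. */
uint64_t std_from_pgd_64(uint64_t pgd)
{
    return pgd | 0x7u;
}
```

With the example values, stab_byte_offset(672) is 0x54, stab_byte_offset(96) is 0xc, & std_from_pgd_31(0x0f2c0000) is 0f2c007f, the value passed to TR I R STD.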
1352 | 1351 | ||
1353 | 1352 | ||
1354 | 1353 | ||
1355 | Tracing Program Exceptions | 1354 | Tracing Program Exceptions |
1356 | -------------------------- | 1355 | -------------------------- |
1357 | If you get a crash which says something like | 1356 | If you get a crash which says something like |
1358 | illegal operation or specification exception followed by a register dump | 1357 | illegal operation or specification exception followed by a register dump |
1359 | You can restart linux & trace these using the tr prog <range or value> trace option. | 1358 | You can restart linux & trace these using the tr prog <range or value> trace option. |
1360 | 1359 | ||
1361 | 1360 | ||
1362 | 1361 | ||
1363 | The most common ones you will normally be tracing for are | 1362 | The most common ones you will normally be tracing for are |
1364 | 1=operation exception | 1363 | 1=operation exception |
1365 | 2=privileged operation exception | 1364 | 2=privileged operation exception |
1366 | 4=protection exception | 1365 | 4=protection exception |
1367 | 5=addressing exception | 1366 | 5=addressing exception |
1368 | 6=specification exception | 1367 | 6=specification exception |
1369 | 10=segment translation exception | 1368 | 10=segment translation exception |
1370 | 11=page translation exception | 1369 | 11=page translation exception |
1371 | 1370 | ||
1372 | The full list of these is on page 22 of the current s/390 Reference Summary. | 1371 | The full list of these is on page 22 of the current s/390 Reference Summary. |
1373 | e.g. | 1372 | e.g. |
1374 | tr prog 10 will trace segment translation exceptions. | 1373 | tr prog 10 will trace segment translation exceptions. |
1375 | tr prog on its own will trace all program interruption codes. | 1374 | tr prog on its own will trace all program interruption codes. |
1376 | 1375 | ||
1377 | Trace Sets | 1376 | Trace Sets |
1378 | ---------- | 1377 | ---------- |
1379 | On starting VM you are initially in the INITIAL trace set. | 1378 | On starting VM you are initially in the INITIAL trace set. |
1380 | You can do a Q TR to verify this. | 1379 | You can do a Q TR to verify this. |
1381 | If you have a complex tracing situation where you wish to wait for instance | 1380 | If you have a complex tracing situation where you wish to wait for instance |
1382 | till a driver is open before you start tracing IO, but know in your | 1381 | till a driver is open before you start tracing IO, but know in your |
1383 | heart that you are going to have to make several runs through the code till you | 1382 | heart that you are going to have to make several runs through the code till you |
1384 | have a clue what's going on. | 1383 | have a clue what's going on. |
1385 | 1384 | ||
1386 | What you can do is | 1385 | What you can do is |
1387 | TR I PSWA <Driver open address> | 1386 | TR I PSWA <Driver open address> |
1388 | hit b to continue till breakpoint | 1387 | hit b to continue till breakpoint |
1389 | reach the breakpoint | 1388 | reach the breakpoint |
1390 | now do your | 1389 | now do your |
1391 | TR GOTO B | 1390 | TR GOTO B |
1392 | TR IO 7c08-7c09 inst int run | 1391 | TR IO 7c08-7c09 inst int run |
1393 | or whatever the IO channels you wish to trace are & hit b | 1392 | or whatever the IO channels you wish to trace are & hit b |
1394 | 1393 | ||
1395 | To go back to the initial trace set do | 1394 | To go back to the initial trace set do |
1396 | TR GOTO INITIAL | 1395 | TR GOTO INITIAL |
1397 | & the TR I PSWA <Driver open address> will be the only active breakpoint again. | 1396 | & the TR I PSWA <Driver open address> will be the only active breakpoint again. |
1398 | 1397 | ||
1399 | 1398 | ||
1400 | Tracing linux syscalls under VM | 1399 | Tracing linux syscalls under VM |
1401 | ------------------------------- | 1400 | ------------------------------- |
1402 | Syscalls are implemented on Linux for S390 by the Supervisor call instruction (SVC); there are 256 | 1401 | Syscalls are implemented on Linux for S390 by the Supervisor call instruction (SVC); there are 256 |
1403 | possibilities of these as the instruction is made up of a 0xA opcode & the second byte being | 1402 | possibilities of these as the instruction is made up of a 0xA opcode & the second byte being |
1404 | the syscall number. They are traced using the simple command. | 1403 | the syscall number. They are traced using the simple command. |
1405 | TR SVC <Optional value or range> | 1404 | TR SVC <Optional value or range> |
1406 | the syscalls are defined in linux/include/asm-s390/unistd.h | 1405 | the syscalls are defined in linux/include/asm-s390/unistd.h |
1407 | e.g. to trace all file opens just do | 1406 | e.g. to trace all file opens just do |
1408 | TR SVC 5 ( as this is the syscall number of open ) | 1407 | TR SVC 5 ( as this is the syscall number of open ) |
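
As a rough sketch of the encoding just described ( a 0x0A opcode byte followed by the syscall number ), the hypothetical helper below decodes a 2 byte SVC instruction. The name svc_number is made up for this example.

```c
#include <stdint.h>

/* Returns the syscall number encoded in a 2 byte SVC instruction,
 * or -1 if the first byte is not the SVC opcode 0x0A. */
int svc_number(uint16_t insn)
{
    if ((insn >> 8) != 0x0A)
        return -1;
    return insn & 0xFF;
}
```

For the instruction 0A05 this returns 5, the syscall number of open used in the TR SVC 5 example.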
1409 | 1408 | ||
1410 | 1409 | ||
1411 | SMP Specific commands | 1410 | SMP Specific commands |
1412 | --------------------- | 1411 | --------------------- |
1413 | To find out how many cpus you have | 1412 | To find out how many cpus you have |
1414 | Q CPUS displays all the CPU's available to your virtual machine | 1413 | Q CPUS displays all the CPU's available to your virtual machine |
1415 | To find the cpu that the current cpu VM debugger commands are being directed at do | 1414 | To find the cpu that the current cpu VM debugger commands are being directed at do |
1416 | Q CPU to change the current cpu cpu VM debugger commands are being directed at do | 1415 | Q CPU to change the current cpu VM debugger commands are being directed at do |
1417 | CPU <desired cpu no> | 1416 | CPU <desired cpu no> |
1418 | 1417 | ||
1419 | On an SMP guest, to issue a command to all CPUs try prefixing the command with cpu all. | 1418 | On an SMP guest, to issue a command to all CPUs try prefixing the command with cpu all. |
1420 | To issue a command to a particular cpu try cpu <cpu number> e.g. | 1419 | To issue a command to a particular cpu try cpu <cpu number> e.g. |
1421 | CPU 01 TR I R 2000.3000 | 1420 | CPU 01 TR I R 2000.3000 |
1422 | If you are running on a guest with several cpus & you have a IO related problem | 1421 | If you are running on a guest with several cpus & you have a IO related problem |
1423 | & cannot follow the flow of code but you know it isn't smp related. | 1422 | & cannot follow the flow of code but you know it isn't smp related. |
1424 | from the bash prompt issue | 1423 | from the bash prompt issue |
1425 | shutdown -h now or halt. | 1424 | shutdown -h now or halt. |
1426 | do a Q CPUS to find out how many cpus you have | 1425 | do a Q CPUS to find out how many cpus you have |
1427 | detach each one of them from cp except cpu 0 | 1426 | detach each one of them from cp except cpu 0 |
1428 | by issuing a | 1427 | by issuing a |
1429 | DETACH CPU 01-(number of cpus in configuration) | 1428 | DETACH CPU 01-(number of cpus in configuration) |
1430 | & boot linux again. | 1429 | & boot linux again. |
1431 | TR SIGP will trace inter processor signal processor instructions. | 1430 | TR SIGP will trace inter processor signal processor instructions. |
1432 | DEFINE CPU 01-(number in configuration) | 1431 | DEFINE CPU 01-(number in configuration) |
1433 | will get your guests cpus back. | 1432 | will get your guests cpus back. |
1434 | 1433 | ||
1435 | 1434 | ||
1436 | Help for displaying ascii textstrings | 1435 | Help for displaying ascii textstrings |
1437 | ------------------------------------- | 1436 | ------------------------------------- |
1438 | On the very latest VM Nucleus'es VM can now display ascii | 1437 | On the very latest VM Nucleus'es VM can now display ascii |
1439 | ( thanks Neale for the hint ) by doing | 1438 | ( thanks Neale for the hint ) by doing |
1440 | D TX<lowaddr>.<len> | 1439 | D TX<lowaddr>.<len> |
1441 | e.g. | 1440 | e.g. |
1442 | D TX0.100 | 1441 | D TX0.100 |
1443 | 1442 | ||
1444 | Alternatively | 1443 | Alternatively |
1445 | ============= | 1444 | ============= |
1446 | Under older VM debuggers ( I love EBCDIC too ) you can use this little program I wrote which | 1445 | Under older VM debuggers ( I love EBCDIC too ) you can use this little program I wrote which |
1447 | will convert a command line of hex digits to ascii text which can be compiled under linux & | 1446 | will convert a command line of hex digits to ascii text which can be compiled under linux & |
1448 | you can copy the hex digits from your x3270 terminal to your xterm if you are debugging | 1447 | you can copy the hex digits from your x3270 terminal to your xterm if you are debugging |
1449 | from a linuxbox. | 1448 | from a linuxbox. |
1450 | 1449 | ||
1451 | This is quite useful when looking at a parameter passed in as a text string | 1450 | This is quite useful when looking at a parameter passed in as a text string |
1452 | under VM ( unless you are good at decoding ASCII in your head ). | 1451 | under VM ( unless you are good at decoding ASCII in your head ). |
1453 | 1452 | ||
1454 | e.g. consider tracing an open syscall | 1453 | e.g. consider tracing an open syscall |
1455 | TR SVC 5 | 1454 | TR SVC 5 |
1456 | We have stopped at a breakpoint | 1455 | We have stopped at a breakpoint |
1457 | 000151B0' SVC 0A05 -> 0001909A' CC 0 | 1456 | 000151B0' SVC 0A05 -> 0001909A' CC 0 |
1458 | 1457 | ||
1459 | D 20.8 to check the SVC old psw in the prefix area & see was it from userspace | 1458 | D 20.8 to check the SVC old psw in the prefix area & see was it from userspace |
1460 | ( for the layout of the prefix area consult P18 of the s/390 Reference Summary | 1459 | ( for the layout of the prefix area consult P18 of the s/390 Reference Summary |
1461 | if you have it available ). | 1460 | if you have it available ). |
1462 | V00000020 070C2000 800151B2 | 1461 | V00000020 070C2000 800151B2 |
1463 | The problem state bit wasn't set & it's also too early in the boot sequence | 1462 | The problem state bit wasn't set & it's also too early in the boot sequence |
1464 | for it to be a userspace SVC if it was we would have to temporarily switch the | 1463 | for it to be a userspace SVC if it was we would have to temporarily switch the |
1465 | psw to user space addressing so we could get at the first parameter of the open in | 1464 | psw to user space addressing so we could get at the first parameter of the open in |
1466 | gpr2. | 1465 | gpr2. |
1467 | Next do a | 1466 | Next do a |
1468 | D G2 | 1467 | D G2 |
1469 | GPR 2 = 00014CB4 | 1468 | GPR 2 = 00014CB4 |
1470 | Now display what gpr2 is pointing to | 1469 | Now display what gpr2 is pointing to |
1471 | D 00014CB4.20 | 1470 | D 00014CB4.20 |
1472 | V00014CB4 2F646576 2F636F6E 736F6C65 00001BF5 | 1471 | V00014CB4 2F646576 2F636F6E 736F6C65 00001BF5 |
1473 | V00014CC4 FC00014C B4001001 E0001000 B8070707 | 1472 | V00014CC4 FC00014C B4001001 E0001000 B8070707 |
1474 | Now copy the text till the first 00 hex ( which is the end of the string ) | 1473 | Now copy the text till the first 00 hex ( which is the end of the string ) |
1475 | to an xterm & do hex2ascii on it. | 1474 | to an xterm & do hex2ascii on it. |
1476 | hex2ascii 2F646576 2F636F6E 736F6C65 00 | 1475 | hex2ascii 2F646576 2F636F6E 736F6C65 00 |
1477 | outputs | 1476 | outputs |
1478 | Decoded Hex:=/ d e v / c o n s o l e 0x00 | 1477 | Decoded Hex:=/ d e v / c o n s o l e 0x00 |
1479 | We were opening the console device, | 1478 | We were opening the console device, |
1480 | 1479 | ||
1481 | You can compile the code below yourself for practice :-), | 1480 | You can compile the code below yourself for practice :-), |
1482 | /* | 1481 | /* |
1483 | * hex2ascii.c | 1482 | * hex2ascii.c |
1484 | * a useful little tool for converting a hexadecimal command line to ascii | 1483 | * a useful little tool for converting a hexadecimal command line to ascii |
1485 | * | 1484 | * |
1486 | * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) | 1485 | * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) |
1487 | * (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation. | 1486 | * (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation. |
1488 | */ | 1487 | */ |
1489 | #include <stdio.h> | 1488 | #include <stdio.h> |
1490 | 1489 | ||
1491 | int main(int argc,char *argv[]) | 1490 | int main(int argc,char *argv[]) |
1492 | { | 1491 | { |
1493 | int cnt1,cnt2,len,toggle=0; | 1492 | int cnt1,cnt2,len,toggle=0; |
1494 | int startcnt=1; | 1493 | int startcnt=1; |
1495 | unsigned char c,hex; | 1494 | unsigned char c,hex; |
1496 | 1495 | ||
1497 | if(argc>1&&(strcmp(argv[1],"-a")==0)) | 1496 | if(argc>1&&(strcmp(argv[1],"-a")==0)) |
1498 | startcnt=2; | 1497 | startcnt=2; |
1499 | printf("Decoded Hex:="); | 1498 | printf("Decoded Hex:="); |
1500 | for(cnt1=startcnt;cnt1<argc;cnt1++) | 1499 | for(cnt1=startcnt;cnt1<argc;cnt1++) |
1501 | { | 1500 | { |
1502 | len=strlen(argv[cnt1]); | 1501 | len=strlen(argv[cnt1]); |
1503 | for(cnt2=0;cnt2<len;cnt2++) | 1502 | for(cnt2=0;cnt2<len;cnt2++) |
1504 | { | 1503 | { |
1505 | c=argv[cnt1][cnt2]; | 1504 | c=argv[cnt1][cnt2]; |
1506 | if(c>='0'&&c<='9') | 1505 | if(c>='0'&&c<='9') |
1507 | c=c-'0'; | 1506 | c=c-'0'; |
1508 | if(c>='A'&&c<='F') | 1507 | if(c>='A'&&c<='F') |
1509 | c=c-'A'+10; | 1508 | c=c-'A'+10; |
1510 | if(c>='a'&&c<='f') | 1509 | if(c>='a'&&c<='f') |
1511 | c=c-'a'+10; | 1510 | c=c-'a'+10; |
1512 | switch(toggle) | 1511 | switch(toggle) |
1513 | { | 1512 | { |
1514 | case 0: | 1513 | case 0: |
1515 | hex=c<<4; | 1514 | hex=c<<4; |
1516 | toggle=1; | 1515 | toggle=1; |
1517 | break; | 1516 | break; |
1518 | case 1: | 1517 | case 1: |
1519 | hex+=c; | 1518 | hex+=c; |
1520 | if(hex<32||hex>127) | 1519 | if(hex<32||hex>127) |
1521 | { | 1520 | { |
1522 | if(startcnt==1) | 1521 | if(startcnt==1) |
1523 | printf("0x%02X ",(int)hex); | 1522 | printf("0x%02X ",(int)hex); |
1524 | else | 1523 | else |
1525 | printf("."); | 1524 | printf("."); |
1526 | } | 1525 | } |
1527 | else | 1526 | else |
1528 | { | 1527 | { |
1529 | printf("%c",hex); | 1528 | printf("%c",hex); |
1530 | if(startcnt==1) | 1529 | if(startcnt==1) |
1531 | printf(" "); | 1530 | printf(" "); |
1532 | } | 1531 | } |
1533 | toggle=0; | 1532 | toggle=0; |
1534 | break; | 1533 | break; |
1535 | } | 1534 | } |
1536 | } | 1535 | } |
1537 | } | 1536 | } |
1538 | printf("\n"); | 1537 | printf("\n"); |
1539 | } | 1538 | } |
1540 | 1539 | ||
1541 | 1540 | ||
1542 | 1541 | ||
1543 | 1542 | ||
1544 | Stack tracing under VM | 1543 | Stack tracing under VM |
1545 | ---------------------- | 1544 | ---------------------- |
1546 | A basic backtrace | 1545 | A basic backtrace |
1547 | ----------------- | 1546 | ----------------- |
1548 | 1547 | ||
1549 | Here are the tricks I use 9 out of 10 times it works pretty well, | 1548 | Here are the tricks I use 9 out of 10 times it works pretty well, |
1550 | 1549 | ||
1551 | When your backchain reaches a dead end | 1550 | When your backchain reaches a dead end |
1552 | -------------------------------------- | 1551 | -------------------------------------- |
1553 | This can happen when an exception happens in the kernel & the kernel is entered twice | 1552 | This can happen when an exception happens in the kernel & the kernel is entered twice |
1554 | if you reach the NULL pointer at the end of the back chain you should be | 1553 | if you reach the NULL pointer at the end of the back chain you should be |
1555 | able to sniff further back if you follow the following tricks. | 1554 | able to sniff further back if you follow the following tricks. |
1556 | 1) A kernel address should be easy to recognise since it is in | 1555 | 1) A kernel address should be easy to recognise since it is in |
1557 | primary space & the problem state bit isn't set & also | 1556 | primary space & the problem state bit isn't set & also |
1558 | The Hi bit of the address is set. | 1557 | The Hi bit of the address is set. |
1559 | 2) Another backchain should also be easy to recognise since it is an | 1558 | 2) Another backchain should also be easy to recognise since it is an |
1560 | address pointing to another address approximately 100 bytes or 0x70 hex | 1559 | address pointing to another address approximately 100 bytes or 0x70 hex |
1561 | behind the current stackpointer. | 1560 | behind the current stackpointer. |
1562 | 1561 | ||
1563 | 1562 | ||
1564 | Here is some practice. | 1563 | Here is some practice. |
1565 | boot the kernel & hit PA1 at some random time | 1564 | boot the kernel & hit PA1 at some random time |
1566 | d g to display the gprs, this should display something like | 1565 | d g to display the gprs, this should display something like |
1567 | GPR 0 = 00000001 00156018 0014359C 00000000 | 1566 | GPR 0 = 00000001 00156018 0014359C 00000000 |
1568 | GPR 4 = 00000001 001B8888 000003E0 00000000 | 1567 | GPR 4 = 00000001 001B8888 000003E0 00000000 |
1569 | GPR 8 = 00100080 00100084 00000000 000FE000 | 1568 | GPR 8 = 00100080 00100084 00000000 000FE000 |
1570 | GPR 12 = 00010400 8001B2DC 8001B36A 000FFED8 | 1569 | GPR 12 = 00010400 8001B2DC 8001B36A 000FFED8 |
1571 | Note that GPR14 is a return address but as we are real men we are going to | 1570 | Note that GPR14 is a return address but as we are real men we are going to |
1572 | trace the stack. | 1571 | trace the stack. |
1573 | display 0x40 bytes after the stack pointer. | 1572 | display 0x40 bytes after the stack pointer. |
1574 | 1573 | ||
1575 | V000FFED8 000FFF38 8001B838 80014C8E 000FFF38 | 1574 | V000FFED8 000FFF38 8001B838 80014C8E 000FFF38 |
1576 | V000FFEE8 00000000 00000000 000003E0 00000000 | 1575 | V000FFEE8 00000000 00000000 000003E0 00000000 |
1577 | V000FFEF8 00100080 00100084 00000000 000FE000 | 1576 | V000FFEF8 00100080 00100084 00000000 000FE000 |
1578 | V000FFF08 00010400 8001B2DC 8001B36A 000FFED8 | 1577 | V000FFF08 00010400 8001B2DC 8001B36A 000FFED8 |
1579 | 1578 | ||
1580 | 1579 | ||
1581 | Ah now look at what's in sp+56 (sp+0x38) this is 8001B36A our saved r14 if | 1580 | Ah now look at what's in sp+56 (sp+0x38) this is 8001B36A our saved r14 if |
1582 | you look above at our stackframe & also agrees with GPR14. | 1581 | you look above at our stackframe & also agrees with GPR14. |
1583 | 1582 | ||
1584 | now backchain | 1583 | now backchain |
1585 | d 000FFF38.40 | 1584 | d 000FFF38.40 |
1586 | we now are taking the contents of SP to get our first backchain. | 1585 | we now are taking the contents of SP to get our first backchain. |
1587 | 1586 | ||
1588 | V000FFF38 000FFFA0 00000000 00014995 00147094 | 1587 | V000FFF38 000FFFA0 00000000 00014995 00147094 |
1589 | V000FFF48 00147090 001470A0 000003E0 00000000 | 1588 | V000FFF48 00147090 001470A0 000003E0 00000000 |
1590 | V000FFF58 00100080 00100084 00000000 001BF1D0 | 1589 | V000FFF58 00100080 00100084 00000000 001BF1D0 |
1591 | V000FFF68 00010400 800149BA 80014CA6 000FFF38 | 1590 | V000FFF68 00010400 800149BA 80014CA6 000FFF38 |
1592 | 1591 | ||
1593 | This displays a 2nd return address of 80014CA6 | 1592 | This displays a 2nd return address of 80014CA6 |
1594 | 1593 | ||
1595 | now do d 000FFFA0.40 for our 3rd backchain | 1594 | now do d 000FFFA0.40 for our 3rd backchain |
1596 | 1595 | ||
1597 | V000FFFA0 04B52002 0001107F 00000000 00000000 | 1596 | V000FFFA0 04B52002 0001107F 00000000 00000000 |
1598 | V000FFFB0 00000000 00000000 FF000000 0001107F | 1597 | V000FFFB0 00000000 00000000 FF000000 0001107F |
1599 | V000FFFC0 00000000 00000000 00000000 00000000 | 1598 | V000FFFC0 00000000 00000000 00000000 00000000 |
1600 | V000FFFD0 00010400 80010802 8001085A 000FFFA0 | 1599 | V000FFFD0 00010400 80010802 8001085A 000FFFA0 |
1601 | 1600 | ||
1602 | 1601 | ||
1603 | our 3rd return address is 8001085A | 1602 | our 3rd return address is 8001085A |
1604 | 1603 | ||
1605 | as the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines | 1604 | as the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines |
1606 | for the sake of optimisation don't set up a backchain. | 1605 | for the sake of optimisation don't set up a backchain. |
1607 | 1606 | ||
1608 | now look at System.map to see if the addresses make any sense. | 1607 | now look at System.map to see if the addresses make any sense. |
1609 | 1608 | ||
1610 | grep -i 0001b3 System.map | 1609 | grep -i 0001b3 System.map |
1611 | outputs among other things | 1610 | outputs among other things |
1612 | 0001b304 T cpu_idle | 1611 | 0001b304 T cpu_idle |
1613 | so 8001B36A | 1612 | so 8001B36A |
1614 | is cpu_idle+0x66 ( quiet the cpu is asleep, don't wake it ) | 1613 | is cpu_idle+0x66 ( quiet the cpu is asleep, don't wake it ) |
1615 | 1614 | ||
1616 | 1615 | ||
1617 | grep -i 00014 System.map | 1616 | grep -i 00014 System.map |
1618 | produces among other things | 1617 | produces among other things |
1619 | 00014a78 T start_kernel | 1618 | 00014a78 T start_kernel |
1620 | so 0014CA6 is start_kernel+some hex number I can't add in my head. | 1619 | so 0014CA6 is start_kernel+some hex number I can't add in my head. |
1621 | 1620 | ||
1622 | grep -i 00108 System.map | 1621 | grep -i 00108 System.map |
1623 | this produces | 1622 | this produces |
1624 | 00010800 T _stext | 1623 | 00010800 T _stext |
1625 | so 8001085A is _stext+0x5a | 1624 | so 8001085A is _stext+0x5a |
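
The symbol+offset arithmetic above can be sketched in C. The function name is invented for this illustration, & it assumes that the high bit of a saved 31 bit return address is not part of the address itself, so it is masked off before subtracting the symbol address found in System.map.

```c
#include <stdint.h>

/* Offset of a saved return address (r14) from the start of a symbol.
 * The top bit of the saved value is masked off first (assumed here to be
 * the 31 bit addressing mode bit rather than part of the address). */
uint32_t ret_addr_offset(uint32_t saved_r14, uint32_t symbol_addr)
{
    return (saved_r14 & 0x7FFFFFFFu) - symbol_addr;
}
```

With the values from this walkthrough, ret_addr_offset(0x8001B36A, 0x0001b304) is 0x66 ( cpu_idle+0x66 ), ret_addr_offset(0x8001085A, 0x00010800) is 0x5a ( _stext+0x5a ), & ret_addr_offset(0x80014CA6, 0x00014a78) works out to 0x22e for start_kernel.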
1626 | 1625 | ||
1627 | Congrats you've done your first backchain. | 1626 | Congrats you've done your first backchain. |
1628 | 1627 | ||
1629 | 1628 | ||
1630 | 1629 | ||
1631 | s/390 & z/Architecture IO Overview | 1630 | s/390 & z/Architecture IO Overview |
1632 | ================================== | 1631 | ================================== |
1633 | 1632 | ||
1634 | I am not going to give a course in 390 IO architecture as this would take me quite a | 1633 | I am not going to give a course in 390 IO architecture as this would take me quite a |
1635 | while & I'm no expert. Instead I'll give a 390 IO architecture summary for Dummies if you have | 1634 | while & I'm no expert. Instead I'll give a 390 IO architecture summary for Dummies if you have |
1636 | the s/390 principles of operation available read this instead. If nothing else you may find a few | 1635 | the s/390 principles of operation available read this instead. If nothing else you may find a few |
1637 | useful keywords in here & be able to use them on a web search engine like altavista to find | 1636 | useful keywords in here & be able to use them on a web search engine like altavista to find |
1638 | more useful information. | 1637 | more useful information. |
1639 | 1638 | ||
1640 | Unlike other bus architectures modern 390 systems do their IO using mostly | 1639 | Unlike other bus architectures modern 390 systems do their IO using mostly |
1641 | fibre optics & devices such as tapes & disks can be shared between several mainframes, | 1640 | fibre optics & devices such as tapes & disks can be shared between several mainframes, |
1642 | also S390 can support up to 65536 devices while a high end PC based system might be choking | 1641 | also S390 can support up to 65536 devices while a high end PC based system might be choking |
1643 | with around 64. Here is some of the common IO terminology | 1642 | with around 64. Here is some of the common IO terminology |
1644 | 1643 | ||
1645 | Subchannel: | 1644 | Subchannel: |
1646 | This is the logical number most IO commands use to talk to an IO device; there can be up to | 1645 | This is the logical number most IO commands use to talk to an IO device; there can be up to |
1647 | 0x10000 (65536) of these in a configuration, typically there are a few hundred. Under VM | 1646 | 0x10000 (65536) of these in a configuration, typically there are a few hundred. Under VM |
1648 | for simplicity they are allocated contiguously, however on the native hardware they are not | 1647 | for simplicity they are allocated contiguously, however on the native hardware they are not |
1649 | they typically stay consistent between boots provided no new hardware is inserted or removed. | 1648 | they typically stay consistent between boots provided no new hardware is inserted or removed. |
1650 | Under Linux for 390 we use these as IRQ's & also when issuing an IO command (CLEAR SUBCHANNEL, | 1649 | Under Linux for 390 we use these as IRQ's & also when issuing an IO command (CLEAR SUBCHANNEL, |
1651 | HALT SUBCHANNEL,MODIFY SUBCHANNEL,RESUME SUBCHANNEL,START SUBCHANNEL,STORE SUBCHANNEL & | 1650 | HALT SUBCHANNEL,MODIFY SUBCHANNEL,RESUME SUBCHANNEL,START SUBCHANNEL,STORE SUBCHANNEL & |
1652 | TEST SUBCHANNEL ) we use this as the ID of the device we wish to talk to, the most | 1651 | TEST SUBCHANNEL ) we use this as the ID of the device we wish to talk to, the most |
1653 | important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check | 1652 | important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check |
1654 | whether the IO completed successfully ), & HALT SUBCHANNEL ( to kill IO ), a subchannel | 1653 | whether the IO completed successfully ), & HALT SUBCHANNEL ( to kill IO ), a subchannel |
1655 | can have up to 8 channel paths to a device; this offers redundancy if one is not available. | 1654 | can have up to 8 channel paths to a device; this offers redundancy if one is not available. |
1656 | 1655 | ||
1657 | 1656 | ||
1658 | Device Number: | 1657 | Device Number: |
1659 | This number remains static & Is closely tied to the hardware, there are 65536 of these | 1658 | This number remains static & Is closely tied to the hardware, there are 65536 of these |
1660 | also they are made up of a CHPID ( Channel Path ID, the most significant 8 bits ) | 1659 | also they are made up of a CHPID ( Channel Path ID, the most significant 8 bits ) |
1661 | & another lsb 8 bits. These remain static even if more devices are inserted or removed | 1660 | & another lsb 8 bits. These remain static even if more devices are inserted or removed |
1662 | from the hardware, there is a 1 to 1 mapping between Subchannels & Device Numbers provided | 1661 | from the hardware, there is a 1 to 1 mapping between Subchannels & Device Numbers provided |
1663 | devices aren't inserted or removed. | 1662 | devices aren't inserted or removed. |
1664 | 1663 | ||
1665 | Channel Control Words: | 1664 | Channel Control Words: |
1666 | CCWs are linked lists of instructions initially pointed to by an operation request block (ORB), | 1665 | CCWs are linked lists of instructions initially pointed to by an operation request block (ORB), |
1667 | which is initially given to the Start Subchannel (SSCH) command along with the subchannel number | 1666 | which is initially given to the Start Subchannel (SSCH) command along with the subchannel number |
1668 | for the IO subsystem to process while the CPU continues executing normal code. | 1667 | for the IO subsystem to process while the CPU continues executing normal code. |
1669 | These come in two flavours, Format 0 ( 24 bit, for backward | 1668 | These come in two flavours, Format 0 ( 24 bit, for backward |
1670 | compatibility ) & Format 1 ( 31 bit ). These are typically used to issue read & write | 1669 | compatibility ) & Format 1 ( 31 bit ). These are typically used to issue read & write |
1671 | ( & many other instructions ); they consist of a length field & an absolute address field. | 1670 | ( & many other instructions ); they consist of a length field & an absolute address field. |
1672 | For each IO you typically get 1 or 2 interrupts, one for channel end ( primary status ) when the | 1671 | For each IO you typically get 1 or 2 interrupts, one for channel end ( primary status ) when the |
1673 | channel is idle & the second for device end ( secondary status ); sometimes you get both | 1672 | channel is idle & the second for device end ( secondary status ); sometimes you get both |
1674 | concurrently. You check how the IO went by issuing a TEST SUBCHANNEL at each interrupt, | 1673 | concurrently. You check how the IO went by issuing a TEST SUBCHANNEL at each interrupt, |
1675 | from which you receive an Interruption response block (IRB). If you get channel & device end | 1674 | from which you receive an Interruption response block (IRB). If you get channel & device end |
1676 | status in the IRB without channel checks etc. your IO probably went okay. If you didn't you | 1675 | status in the IRB without channel checks etc. your IO probably went okay. If you didn't you |
1677 | probably need a doctor to examine the IRB & extended status word etc. | 1676 | probably need a doctor to examine the IRB & extended status word etc. |
1678 | If an error occurs, more sophisticated control units have a facility known as | 1677 | If an error occurs, more sophisticated control units have a facility known as |
1679 | concurrent sense; this means that if an error occurs Extended sense information will | 1678 | concurrent sense; this means that if an error occurs Extended sense information will |
1680 | be presented in the Extended status word in the IRB; if not you have to issue a | 1679 | be presented in the Extended status word in the IRB; if not you have to issue a |
1681 | subsequent SENSE CCW command after the test subchannel. | 1680 | subsequent SENSE CCW command after the test subchannel. |
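To make the CCW fields a little more concrete, here is a minimal Python sketch that decodes one 8 byte Format 1 CCW. The field layout ( command code, flags, 16 bit count, 31 bit data address ) & the flag bit values are assumptions taken from the architecture documentation rather than from this text, & the function name decode_format1_ccw & the sample bytes are purely illustrative.

```python
import struct

# Flag bits of a CCW (assumed, not taken from this document).
CCW_FLAGS = {
    0x80: "CD (chain data)",
    0x40: "CC (chain command)",
    0x20: "SLI (suppress length indication)",
    0x10: "SKIP",
    0x08: "PCI",
    0x04: "IDA (indirect data addressing)",
    0x02: "SUSPEND",
}

def decode_format1_ccw(raw):
    """Decode an 8 byte Format 1 CCW into its fields."""
    if len(raw) != 8:
        raise ValueError("a CCW is exactly 8 bytes long")
    # Big endian: command code, flags, 16 bit count, 32 bit address word.
    cmd, flags, count, addr = struct.unpack(">BBHI", raw)
    return {
        "cmd": cmd,
        "flags": [name for bit, name in CCW_FLAGS.items() if flags & bit],
        "count": count,
        "addr": addr & 0x7FFFFFFF,   # Format 1 data addresses are 31 bit
    }

# Sample CCW bytes, for illustration only.
ccw = decode_format1_ccw(bytes.fromhex("E420010000487FE8"))
print(f"cmd {ccw['cmd']:02X} count {ccw['count']} addr {ccw['addr']:08X}")
print(ccw["flags"])
```

A Format 0 CCW orders these fields differently, so this sketch should not be applied to 24 bit channel programs.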
1682 | 1681 | ||
1683 | 1682 | ||
1684 | TPI( Test pending interrupt) can also be used for polled IO but in multitasking multiprocessor | 1683 | TPI( Test pending interrupt) can also be used for polled IO but in multitasking multiprocessor |
1685 | systems it isn't recommended except for checking special cases ( i.e. non looping checks for | 1684 | systems it isn't recommended except for checking special cases ( i.e. non looping checks for |
1686 | pending IO etc. ). | 1685 | pending IO etc. ). |
1687 | 1686 | ||
1688 | Store Subchannel & Modify Subchannel can be used to examine & modify operating characteristics | 1687 | Store Subchannel & Modify Subchannel can be used to examine & modify operating characteristics |
1689 | of a subchannel ( e.g. channel paths ). | 1688 | of a subchannel ( e.g. channel paths ). |
1690 | 1689 | ||
1691 | Other IO related Terms: | 1690 | Other IO related Terms: |
1692 | Sysplex: S390's Clustering Technology | 1691 | Sysplex: S390's Clustering Technology |
1693 | QDIO: S390's new high speed IO architecture to support devices such as gigabit ethernet, | 1692 | QDIO: S390's new high speed IO architecture to support devices such as gigabit ethernet, |
1694 | this architecture is also designed to be forward compatible with up & coming 64 bit machines. | 1693 | this architecture is also designed to be forward compatible with up & coming 64 bit machines. |
1695 | 1694 | ||
1696 | 1695 | ||
1697 | General Concepts | 1696 | General Concepts |
1698 | 1697 | ||
1699 | Input Output Processors (IOP's) are responsible for communicating between | 1698 | Input Output Processors (IOP's) are responsible for communicating between |
1700 | the mainframe CPU's & the channel & relieve the mainframe CPU's from the | 1699 | the mainframe CPU's & the channel & relieve the mainframe CPU's from the |
1701 | burden of communicating with IO devices directly, this allows the CPU's to | 1700 | burden of communicating with IO devices directly, this allows the CPU's to |
1702 | concentrate on data processing. | 1701 | concentrate on data processing. |
1703 | 1702 | ||
1704 | IOP's can use one or more links ( known as channel paths ) to talk to each | 1703 | IOP's can use one or more links ( known as channel paths ) to talk to each |
1705 | IO device. It first checks for path availability & chooses an available one, | 1704 | IO device. It first checks for path availability & chooses an available one, |
1706 | then starts ( & sometimes terminates IO ). | 1705 | then starts ( & sometimes terminates IO ). |
1707 | There are two types of channel path: ESCON & the Parallel IO interface. | 1706 | There are two types of channel path: ESCON & the Parallel IO interface. |
1708 | 1707 | ||
1709 | IO devices are attached to control units, control units provide the | 1708 | IO devices are attached to control units, control units provide the |
1710 | logic to interface the channel paths & channel path IO protocols to | 1709 | logic to interface the channel paths & channel path IO protocols to |
1711 | the IO devices, they can be integrated with the devices or housed separately | 1710 | the IO devices, they can be integrated with the devices or housed separately |
1712 | & often talk to several similar devices ( typical examples would be raid | 1711 | & often talk to several similar devices ( typical examples would be raid |
1713 | controllers or a control unit which connects to 1000 3270 terminals ). | 1712 | controllers or a control unit which connects to 1000 3270 terminals ). |
1714 | 1713 | ||
1715 | 1714 | ||
1716 | +---------------------------------------------------------------+ | 1715 | +---------------------------------------------------------------+ |
1717 | | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | | 1716 | | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | |
1718 | | | CPU | | CPU | | CPU | | CPU | | Main | | Expanded | | | 1717 | | | CPU | | CPU | | CPU | | CPU | | Main | | Expanded | | |
1719 | | | | | | | | | | | Memory | | Storage | | | 1718 | | | | | | | | | | | Memory | | Storage | | |
1720 | | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | | 1719 | | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | |
1721 | |---------------------------------------------------------------+ | 1720 | |---------------------------------------------------------------+ |
1722 | | IOP | IOP | IOP | | 1721 | | IOP | IOP | IOP | |
1723 | |--------------------------------------------------------------- | 1722 | |--------------------------------------------------------------- |
1724 | | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | | 1723 | | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | |
1725 | ---------------------------------------------------------------- | 1724 | ---------------------------------------------------------------- |
1726 | || || | 1725 | || || |
1727 | || Bus & Tag Channel Path || ESCON | 1726 | || Bus & Tag Channel Path || ESCON |
1728 | || ====================== || Channel | 1727 | || ====================== || Channel |
1729 | || || || || Path | 1728 | || || || || Path |
1730 | +----------+ +----------+ +----------+ | 1729 | +----------+ +----------+ +----------+ |
1731 | | | | | | | | 1730 | | | | | | | |
1732 | | CU | | CU | | CU | | 1731 | | CU | | CU | | CU | |
1733 | | | | | | | | 1732 | | | | | | | |
1734 | +----------+ +----------+ +----------+ | 1733 | +----------+ +----------+ +----------+ |
1735 | | | | | | | 1734 | | | | | | |
1736 | +----------+ +----------+ +----------+ +----------+ +----------+ | 1735 | +----------+ +----------+ +----------+ +----------+ +----------+ |
1737 | |I/O Device| |I/O Device| |I/O Device| |I/O Device| |I/O Device| | 1736 | |I/O Device| |I/O Device| |I/O Device| |I/O Device| |I/O Device| |
1738 | +----------+ +----------+ +----------+ +----------+ +----------+ | 1737 | +----------+ +----------+ +----------+ +----------+ +----------+ |
1739 | CPU = Central Processing Unit | 1738 | CPU = Central Processing Unit |
1740 | C = Channel | 1739 | C = Channel |
1741 | IOP = IO Processor | 1740 | IOP = IO Processor |
1742 | CU = Control Unit | 1741 | CU = Control Unit |
1743 | 1742 | ||
1744 | The 390 IO systems come in 2 flavours; the current 390 machines support both | 1743 | The 390 IO systems come in 2 flavours; the current 390 machines support both |
1745 | 1744 | ||
1746 | The Older 360 & 370 Interface, sometimes called the Parallel I/O interface, | 1745 | The Older 360 & 370 Interface, sometimes called the Parallel I/O interface, |
1747 | sometimes called Bus-and-Tag & sometimes Original Equipment Manufacturers | 1746 | sometimes called Bus-and-Tag & sometimes Original Equipment Manufacturers |
1748 | Interface (OEMI). | 1747 | Interface (OEMI). |
1749 | 1748 | ||
1750 | This byte wide Parallel channel path/bus has parity & data on the "Bus" cable | 1749 | This byte wide Parallel channel path/bus has parity & data on the "Bus" cable |
1751 | & control lines on the "Tag" cable. These can operate in byte multiplex mode for | 1750 | & control lines on the "Tag" cable. These can operate in byte multiplex mode for |
1752 | sharing between several slow devices or burst mode & monopolize the channel for the | 1751 | sharing between several slow devices or burst mode & monopolize the channel for the |
1753 | whole burst. Up to 256 devices can be addressed on one of these cables. These cables are | 1752 | whole burst. Up to 256 devices can be addressed on one of these cables. These cables are |
1754 | about one inch in diameter. The maximum unextended length supported by these cables is | 1753 | about one inch in diameter. The maximum unextended length supported by these cables is |
1755 | 125 Meters but this can be extended up to 2km with a fibre optic channel extender | 1754 | 125 Meters but this can be extended up to 2km with a fibre optic channel extender |
1756 | such as a 3044. The maximum burst speed supported is 4.5 megabytes per second; however | 1755 | such as a 3044. The maximum burst speed supported is 4.5 megabytes per second; however |
1757 | some really old processors support only transfer rates of 3.0, 2.0 & 1.0 MB/sec. | 1756 | some really old processors support only transfer rates of 3.0, 2.0 & 1.0 MB/sec. |
1758 | One of these paths can be daisy chained to up to 8 control units. | 1757 | One of these paths can be daisy chained to up to 8 control units. |
1759 | 1758 | ||
1760 | 1759 | ||
1761 | ESCON if fibre optic it is also called FICON | 1760 | ESCON if fibre optic it is also called FICON |
1762 | Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either leds or lasers | 1761 | Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either leds or lasers |
1763 | for communication at a signaling rate of up to 200 megabits/sec. As 10 bits are transferred | 1762 | for communication at a signaling rate of up to 200 megabits/sec. As 10 bits are transferred |
1764 | for every 8 bits of info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once | 1763 | for every 8 bits of info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once |
1765 | control info & CRC are added. ESCON only operates in burst mode. | 1764 | control info & CRC are added. ESCON only operates in burst mode. |
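The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the function name payload_mbit is made up, & the overhead share is simply derived from the 18.6 Megabytes/sec figure quoted above rather than from any ESCON specification.

```python
def payload_mbit(signalling_mbit, data_bits=8, line_bits=10):
    """Payload rate left after line coding (8 data bits per 10 line bits)."""
    return signalling_mbit * data_bits / line_bits

payload = payload_mbit(200)          # 160 megabits/sec
raw_mbyte = payload / 8              # 20 megabytes/sec before protocol overhead
overhead = 1 - 18.6 / raw_mbyte      # share lost to control info & CRC
print(f"{payload:.0f} Mbit/s, {raw_mbyte:.0f} MB/s, overhead {overhead:.1%}")
```

In other words the quoted 18.6 Megabytes/sec corresponds to roughly 7% of the 20 Megabytes/sec payload being spent on control info & CRC.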
1766 | 1765 | ||
1767 | ESCON's typical max cable length is 3km for the led version & 20km for the laser version | 1766 | ESCON's typical max cable length is 3km for the led version & 20km for the laser version |
1768 | known as XDF ( extended distance facility ). This can be further extended by using an | 1767 | known as XDF ( extended distance facility ). This can be further extended by using an |
1769 | ESCON director which triples the above mentioned ranges. Unlike Bus & Tag, ESCON is | 1768 | ESCON director which triples the above mentioned ranges. Unlike Bus & Tag, ESCON is |
1770 | serial & uses a packet switching architecture; the standard Bus & Tag control protocol | 1769 | serial & uses a packet switching architecture; the standard Bus & Tag control protocol |
1771 | is however present within the packets. Up to 256 devices can be attached to each control | 1770 | is however present within the packets. Up to 256 devices can be attached to each control |
1772 | unit that uses one of these interfaces. | 1771 | unit that uses one of these interfaces. |
1773 | 1772 | ||
1774 | Common 390 Devices include: | 1773 | Common 390 Devices include: |
1775 | Network adapters typically OSA2,3172's,2116's & OSA-E gigabit ethernet adapters, | 1774 | Network adapters typically OSA2,3172's,2116's & OSA-E gigabit ethernet adapters, |
1776 | Consoles 3270 & 3215 ( a teletype emulated under linux for a line mode console ). | 1775 | Consoles 3270 & 3215 ( a teletype emulated under linux for a line mode console ). |
1777 | DASD's direct access storage devices ( otherwise known as hard disks ). | 1776 | DASD's direct access storage devices ( otherwise known as hard disks ). |
1778 | Tape Drives. | 1777 | Tape Drives. |
1779 | CTC ( Channel to Channel Adapters ), | 1778 | CTC ( Channel to Channel Adapters ), |
1780 | ESCON or Parallel Cables used as a very high speed serial link | 1779 | ESCON or Parallel Cables used as a very high speed serial link |
1781 | between 2 machines. We use 2 cables under linux to do a bi-directional serial link. | 1780 | between 2 machines. We use 2 cables under linux to do a bi-directional serial link. |
1782 | 1781 | ||
1783 | 1782 | ||
1784 | Debugging IO on s/390 & z/Architecture under VM | 1783 | Debugging IO on s/390 & z/Architecture under VM |
1785 | =============================================== | 1784 | =============================================== |
1786 | 1785 | ||
1787 | Now we are ready to go on with IO tracing commands under VM | 1786 | Now we are ready to go on with IO tracing commands under VM |
1788 | 1787 | ||
1789 | A few self explanatory queries: | 1788 | A few self explanatory queries: |
1790 | Q OSA | 1789 | Q OSA |
1791 | Q CTC | 1790 | Q CTC |
1792 | Q DISK ( This command is CMS specific ) | 1791 | Q DISK ( This command is CMS specific ) |
1793 | Q DASD | 1792 | Q DASD |
1794 | 1793 | ||
1795 | 1794 | ||
1796 | 1795 | ||
1797 | 1796 | ||
1798 | 1797 | ||
1799 | 1798 | ||
1800 | Q OSA on my machine returns | 1799 | Q OSA on my machine returns |
1801 | OSA 7C08 ON OSA 7C08 SUBCHANNEL = 0000 | 1800 | OSA 7C08 ON OSA 7C08 SUBCHANNEL = 0000 |
1802 | OSA 7C09 ON OSA 7C09 SUBCHANNEL = 0001 | 1801 | OSA 7C09 ON OSA 7C09 SUBCHANNEL = 0001 |
1803 | OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002 | 1802 | OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002 |
1804 | OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003 | 1803 | OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003 |
1805 | 1804 | ||
1806 | If you have a guest with certain privileges you may be able to see devices | 1805 | If you have a guest with certain privileges you may be able to see devices |
1807 | which don't belong to you. To avoid this, add the option V. | 1806 | which don't belong to you. To avoid this, add the option V. |
1808 | e.g. | 1807 | e.g. |
1809 | Q V OSA | 1808 | Q V OSA |
1810 | 1809 | ||
1811 | Now using the device numbers returned by this command we will | 1810 | Now using the device numbers returned by this command we will |
1812 | Trace the io starting up on the first device 7c08 & 7c09 | 1811 | Trace the io starting up on the first device 7c08 & 7c09 |
1813 | In our simplest case we can trace the | 1812 | In our simplest case we can trace the |
1814 | start subchannels | 1813 | start subchannels |
1815 | like TR SSCH 7C08-7C09 | 1814 | like TR SSCH 7C08-7C09 |
1816 | or the halt subchannels | 1815 | or the halt subchannels |
1817 | or TR HSCH 7C08-7C09 | 1816 | or TR HSCH 7C08-7C09 |
1818 | MSCH's ,STSCH's I think you can guess the rest | 1817 | MSCH's ,STSCH's I think you can guess the rest |
1819 | 1818 | ||
1820 | Ingo's favourite trick is tracing all the IO's & CCWs & spooling them into the reader of another | 1819 | Ingo's favourite trick is tracing all the IO's & CCWs & spooling them into the reader of another |
1821 | VM guest so he can ftp the logfile back to his own machine. I'll do a small bit of this & give you | 1820 | VM guest so he can ftp the logfile back to his own machine. I'll do a small bit of this & give you |
1822 | a look at the output. | 1821 | a look at the output. |
1823 | 1822 | ||
1824 | 1) Spool stdout to VM reader | 1823 | 1) Spool stdout to VM reader |
1825 | SP PRT TO (another vm guest ) or * for the local vm guest | 1824 | SP PRT TO (another vm guest ) or * for the local vm guest |
1826 | 2) Fill the reader with the trace | 1825 | 2) Fill the reader with the trace |
1827 | TR IO 7c08-7c09 INST INT CCW PRT RUN | 1826 | TR IO 7c08-7c09 INST INT CCW PRT RUN |
1828 | 3) Start up linux | 1827 | 3) Start up linux |
1829 | i 00c | 1828 | i 00c |
1830 | 4) Finish the trace | 1829 | 4) Finish the trace |
1831 | TR END | 1830 | TR END |
1832 | 5) close the reader | 1831 | 5) close the reader |
1833 | C PRT | 1832 | C PRT |
1834 | 6) list reader contents | 1833 | 6) list reader contents |
1835 | RDRLIST | 1834 | RDRLIST |
1836 | 7) copy it to linux4's minidisk | 1835 | 7) copy it to linux4's minidisk |
1837 | RECEIVE / LOG TXT A1 ( replace | 1836 | RECEIVE / LOG TXT A1 ( replace |
1838 | 8) | 1837 | 8) |
1839 | filel & press F11 to look at it | 1838 | filel & press F11 to look at it |
1840 | You should see something like: | 1839 | You should see something like: |
1841 | 1840 | ||
1842 | 00020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08 | 1841 | 00020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08 |
1843 | CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80 | 1842 | CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80 |
1844 | CCW 000FFDF0 E4200100 00487FE8 0000 E4240100 ........ | 1843 | CCW 000FFDF0 E4200100 00487FE8 0000 E4240100 ........ |
1845 | IDAL 43D8AFE8 | 1844 | IDAL 43D8AFE8 |
1846 | IDAL 0FB76000 | 1845 | IDAL 0FB76000 |
1847 | 00020B0A' I/O DEV 7C08 -> 000197BC' SCH 0000 PARM 00E2C9C4 | 1846 | 00020B0A' I/O DEV 7C08 -> 000197BC' SCH 0000 PARM 00E2C9C4 |
1848 | 00021628' TSCH B2354000 >> 00488164 CC 0 SCH 0000 DEV 7C08 | 1847 | 00021628' TSCH B2354000 >> 00488164 CC 0 SCH 0000 DEV 7C08 |
1849 | CCWA 000FFDF8 DEV STS 0C SCH STS 00 CNT 00EC | 1848 | CCWA 000FFDF8 DEV STS 0C SCH STS 00 CNT 00EC |
1850 | KEY 0 FPI C0 CC 0 CTLS 4007 | 1849 | KEY 0 FPI C0 CC 0 CTLS 4007 |
1851 | 00022238' STSCH B2344000 >> 00488108 CC 0 SCH 0000 DEV 7C08 | 1850 | 00022238' STSCH B2344000 >> 00488108 CC 0 SCH 0000 DEV 7C08 |
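The DEV STS & SCH STS bytes in the TSCH line above can be read with a small Python sketch like the one below. The bit assignments are assumptions taken from the architecture documentation rather than from this text, & the helper names decode_dev_status & io_looks_ok are hypothetical. The ok check is deliberately simplistic: it only recognises the case where channel end & device end arrive together.

```python
# Device status bits as presented in an IRB (assumed, not from this document).
DEV_STATUS = {
    0x80: "attention",
    0x40: "status modifier",
    0x20: "control unit end",
    0x10: "busy",
    0x08: "channel end",
    0x04: "device end",
    0x02: "unit check",
    0x01: "unit exception",
}

def decode_dev_status(dstat):
    """Return the names of all bits set in the device status byte."""
    return [name for bit, name in DEV_STATUS.items() if dstat & bit]

def io_looks_ok(dstat, cstat):
    """True if exactly channel end & device end are set in the device
    status & the subchannel status byte is clean."""
    return dstat == 0x0C and cstat == 0x00

# Values from the TSCH line: DEV STS 0C, SCH STS 00.
print(decode_dev_status(0x0C), io_looks_ok(0x0C, 0x00))
```

A unit check bit ( 0x02 ) in the device status is the usual hint that sense data needs to be looked at.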
1852 | 1851 | ||
1853 | If you don't like messing up your reader ( because you possibly booted from it ) | 1852 | If you don't like messing up your reader ( because you possibly booted from it ) |
1854 | you can alternatively spool it to another guest's reader. | 1853 | you can alternatively spool it to another guest's reader. |
1855 | 1854 | ||
1856 | 1855 | ||
1857 | Other common VM device related commands | 1856 | Other common VM device related commands |
1858 | --------------------------------------------- | 1857 | --------------------------------------------- |
1859 | These commands are listed only because they have | 1858 | These commands are listed only because they have |
1860 | been of use to me in the past & may be of use to | 1859 | been of use to me in the past & may be of use to |
1861 | you too. For more complete info on each of the commands | 1860 | you too. For more complete info on each of the commands |
1862 | type HELP <command> from CMS. | 1861 | type HELP <command> from CMS. |
1863 | detaching devices | 1862 | detaching devices |
1864 | DET <devno range> | 1863 | DET <devno range> |
1865 | ATT <devno range> <guest> | 1864 | ATT <devno range> <guest> |
1866 | attach a device to guest * for your own guest | 1865 | attach a device to guest * for your own guest |
1867 | READY <devno> cause VM to issue a fake interrupt. | 1866 | READY <devno> cause VM to issue a fake interrupt. |
1868 | 1867 | ||
1869 | The VARY command is normally only available to VM administrators. | 1868 | The VARY command is normally only available to VM administrators. |
1870 | VARY ON PATH <path> TO <devno range> | 1869 | VARY ON PATH <path> TO <devno range> |
1871 | VARY OFF PATH <PATH> FROM <devno range> | 1870 | VARY OFF PATH <PATH> FROM <devno range> |
1872 | This is used to switch on or off channel paths to devices. | 1871 | This is used to switch on or off channel paths to devices. |
1873 | 1872 | ||
1874 | Q CHPID <channel path ID> | 1873 | Q CHPID <channel path ID> |
1875 | This displays state of devices using this channel path | 1874 | This displays state of devices using this channel path |
1876 | D SCHIB <subchannel> | 1875 | D SCHIB <subchannel> |
1877 | This displays the subchannel information SCHIB block for the device. | 1876 | This displays the subchannel information SCHIB block for the device. |
1878 | This I believe is also only available to administrators. | 1877 | This I believe is also only available to administrators. |
1879 | DEFINE CTC <devno> | 1878 | DEFINE CTC <devno> |
1880 | defines a virtual CTC channel to channel connection | 1879 | defines a virtual CTC channel to channel connection |
1881 | 2 need to be defined on each guest for the CTC driver to use. | 1880 | 2 need to be defined on each guest for the CTC driver to use. |
1882 | COUPLE devno userid remote devno | 1881 | COUPLE devno userid remote devno |
1883 | Joins a local virtual device to a remote virtual device | 1882 | Joins a local virtual device to a remote virtual device |
1884 | ( commonly used for the CTC driver ). | 1883 | ( commonly used for the CTC driver ). |
1885 | 1884 | ||
1886 | Building a VM ramdisk under CMS which linux can use | 1885 | Building a VM ramdisk under CMS which linux can use |
1887 | def vfb-<blocksize> <subchannel> <number blocks> | 1886 | def vfb-<blocksize> <subchannel> <number blocks> |
1888 | blocksize is commonly 4096 for linux. | 1887 | blocksize is commonly 4096 for linux. |
1889 | Formatting it | 1888 | Formatting it |
1890 | format <subchannel> <drive letter e.g. x> (blksize <blocksize> | 1889 | format <subchannel> <drive letter e.g. x> (blksize <blocksize> |
1891 | 1890 | ||
1892 | Sharing a disk between multiple guests | 1891 | Sharing a disk between multiple guests |
1893 | LINK userid devno1 devno2 mode password | 1892 | LINK userid devno1 devno2 mode password |
1894 | 1893 | ||
1895 | 1894 | ||
1896 | 1895 | ||
1897 | GDB on S390 | 1896 | GDB on S390 |
1898 | =========== | 1897 | =========== |
1899 | N.B. if compiling for debugging gdb works better without optimisation | 1898 | N.B. if compiling for debugging gdb works better without optimisation |
1900 | ( see Compiling programs for debugging ) | 1899 | ( see Compiling programs for debugging ) |
1901 | 1900 | ||
1902 | invocation | 1901 | invocation |
1903 | ---------- | 1902 | ---------- |
1904 | gdb <victim program> <optional corefile> | 1903 | gdb <victim program> <optional corefile> |
1905 | 1904 | ||
1906 | Online help | 1905 | Online help |
1907 | ----------- | 1906 | ----------- |
1908 | help: gives help on commands | 1907 | help: gives help on commands |
1909 | e.g. | 1908 | e.g. |
1910 | help | 1909 | help |
1911 | help display | 1910 | help display |
1912 | Note gdb's online help is very good; use it. | 1911 | Note gdb's online help is very good; use it. |
1913 | 1912 | ||
1914 | 1913 | ||
1915 | Assembly | 1914 | Assembly |
1916 | -------- | 1915 | -------- |
1917 | info registers: displays registers other than floating point. | 1916 | info registers: displays registers other than floating point. |
1918 | info all-registers: displays floating points as well. | 1917 | info all-registers: displays floating points as well. |
1919 | disassemble: disassembles | 1918 | disassemble: disassembles |
1920 | e.g. | 1919 | e.g. |
1921 | disassemble without parameters will disassemble the current function | 1920 | disassemble without parameters will disassemble the current function |
1922 | disassemble $pc $pc+10 | 1921 | disassemble $pc $pc+10 |
1923 | 1922 | ||
1924 | Viewing & modifying variables | 1923 | Viewing & modifying variables |
1925 | ----------------------------- | 1924 | ----------------------------- |
1926 | print or p: displays variable or register | 1925 | print or p: displays variable or register |
1927 | e.g. p/x $sp will display the stack pointer | 1926 | e.g. p/x $sp will display the stack pointer |
1928 | 1927 | ||
1929 | display: prints variable or register each time program stops | 1928 | display: prints variable or register each time program stops |
1930 | e.g. | 1929 | e.g. |
1931 | display/x $pc will display the program counter | 1930 | display/x $pc will display the program counter |
1932 | display argc | 1931 | display argc |
1933 | 1932 | ||
1934 | undisplay: undoes displays | 1933 | undisplay: undoes displays |
1935 | 1934 | ||
1936 | info breakpoints: shows all current breakpoints | 1935 | info breakpoints: shows all current breakpoints |
1937 | 1936 | ||
1938 | info stack: shows stack back trace ( if this doesn't work too well, I'll show you the | 1937 | info stack: shows stack back trace ( if this doesn't work too well, I'll show you the |
1939 | stacktrace by hand below ). | 1938 | stacktrace by hand below ). |
1940 | 1939 | ||
1941 | info locals: displays local variables. | 1940 | info locals: displays local variables. |
1942 | 1941 | ||
1943 | info args: display current procedure arguments. | 1942 | info args: display current procedure arguments. |
1944 | 1943 | ||
1945 | set args: will set argc & argv each time the victim program is invoked. | 1944 | set args: will set argc & argv each time the victim program is invoked. |
1946 | 1945 | ||
1947 | set <variable>=value | 1946 | set <variable>=value |
1948 | set argc=100 | 1947 | set argc=100 |
1949 | set $pc=0 | 1948 | set $pc=0 |
1950 | 1949 | ||
1951 | 1950 | ||
1952 | 1951 | ||
1953 | Modifying execution | 1952 | Modifying execution |
1954 | ------------------- | 1953 | ------------------- |
1955 | step: steps n lines of sourcecode | 1954 | step: steps n lines of sourcecode |
1956 | step steps 1 line. | 1955 | step steps 1 line. |
1957 | step 100 steps 100 lines of code. | 1956 | step 100 steps 100 lines of code. |
1958 | 1957 | ||
1959 | next: like step except this will not step into subroutines | 1958 | next: like step except this will not step into subroutines |
1960 | 1959 | ||
1961 | stepi: steps a single machine code instruction. | 1960 | stepi: steps a single machine code instruction. |
1962 | e.g. stepi 100 | 1961 | e.g. stepi 100 |
1963 | 1962 | ||
1964 | nexti: steps a single machine code instruction but will not step into subroutines. | 1963 | nexti: steps a single machine code instruction but will not step into subroutines. |
1965 | 1964 | ||
1966 | finish: will run until exit of the current routine | 1965 | finish: will run until exit of the current routine |
1967 | 1966 | ||
1968 | run: (re)starts a program | 1967 | run: (re)starts a program |
1969 | 1968 | ||
1970 | cont: continues a program | 1969 | cont: continues a program |
1971 | 1970 | ||
1972 | quit: exits gdb. | 1971 | quit: exits gdb. |
1973 | 1972 | ||
1974 | 1973 | ||
1975 | breakpoints | 1974 | breakpoints |
1976 | ------------ | 1975 | ------------ |
1977 | 1976 | ||
1978 | break | 1977 | break |
1979 | sets a breakpoint | 1978 | sets a breakpoint |
1980 | e.g. | 1979 | e.g. |
1981 | 1980 | ||
1982 | break main | 1981 | break main |
1983 | 1982 | ||
1984 | break *$pc | 1983 | break *$pc |
1985 | 1984 | ||
1986 | break *0x400618 | 1985 | break *0x400618 |
1987 | 1986 | ||
1988 | here's a really useful one for large programs | 1987 | here's a really useful one for large programs |
1989 | rbr | 1988 | rbr |
1990 | Set a breakpoint for all functions matching REGEXP | 1989 | Set a breakpoint for all functions matching REGEXP |
1991 | e.g. | 1990 | e.g. |
1992 | rbr 390 | 1991 | rbr 390 |
1993 | will set a breakpoint on all functions with 390 in their name. | 1992 | will set a breakpoint on all functions with 390 in their name. |
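The matching rule is "the regexp may match anywhere in the function name", which the following Python sketch imitates. It is only an approximation: gdb uses its own regexp syntax, not Python's, & the function names in the list are hypothetical.

```python
import re

def functions_matching(regexp, names):
    """Return the names that contain a match for regexp anywhere."""
    pattern = re.compile(regexp)
    return [n for n in names if pattern.search(n)]

# Hypothetical function names, for illustration only.
names = ["s390_init", "do_IRQ", "setup_390_timer", "main"]
print(functions_matching("390", names))
```

Because the match is unanchored, a short pattern can select far more functions than intended in a large program.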
1994 | 1993 | ||
1995 | info breakpoints | 1994 | info breakpoints |
1996 | lists all breakpoints | 1995 | lists all breakpoints |
1997 | 1996 | ||
1998 | delete: delete breakpoint by number or delete them all | 1997 | delete: delete breakpoint by number or delete them all |
1999 | e.g. | 1998 | e.g. |
2000 | delete 1 will delete the first breakpoint | 1999 | delete 1 will delete the first breakpoint |
2001 | delete will delete them all | 2000 | delete will delete them all |
2002 | 2001 | ||
2003 | watch: This will set a watchpoint ( usually hardware assisted ), | 2002 | watch: This will set a watchpoint ( usually hardware assisted ), |
2004 | This will watch a variable till it changes | 2003 | This will watch a variable till it changes |
2005 | e.g. | 2004 | e.g. |
2006 | watch cnt, will watch the variable cnt till it changes. | 2005 | watch cnt, will watch the variable cnt till it changes. |
2007 | As an aside, unfortunately gdb's architecture independent watchpoint code | 2006 | As an aside, unfortunately gdb's architecture independent watchpoint code |
2008 | is inconsistent & not very good; watchpoints usually work but not always. | 2007 | is inconsistent & not very good; watchpoints usually work but not always. |
2009 | 2008 | ||
2010 | info watchpoints: Display currently active watchpoints | 2009 | info watchpoints: Display currently active watchpoints |
2011 | 2010 | ||
2012 | condition: ( another useful one ) | 2011 | condition: ( another useful one ) |
2013 | Specify breakpoint number N to break only if COND is true. | 2012 | Specify breakpoint number N to break only if COND is true. |
2014 | Usage is `condition N COND', where N is an integer and COND is an | 2013 | Usage is `condition N COND', where N is an integer and COND is an |
2015 | expression to be evaluated whenever breakpoint N is reached. | 2014 | expression to be evaluated whenever breakpoint N is reached. |
2016 | 2015 | ||
2017 | 2016 | ||
2018 | 2017 | ||
2019 | User defined functions/macros | 2018 | User defined functions/macros |
2020 | ----------------------------- | 2019 | ----------------------------- |
2021 | define: ( Note this is very very useful,simple & powerful ) | 2020 | define: ( Note this is very very useful,simple & powerful ) |
2022 | usage define <name> <list of commands> end | 2021 | usage define <name> <list of commands> end |
2023 | 2022 | ||
2024 | examples which you should consider putting into .gdbinit in your home directory | 2023 | examples which you should consider putting into .gdbinit in your home directory |
2025 | define d | 2024 | define d |
2026 | stepi | 2025 | stepi |
2027 | disassemble $pc $pc+10 | 2026 | disassemble $pc $pc+10 |
2028 | end | 2027 | end |
2029 | 2028 | ||
2030 | define e | 2029 | define e |
2031 | nexti | 2030 | nexti |
2032 | disassemble $pc $pc+10 | 2031 | disassemble $pc $pc+10 |
2033 | end | 2032 | end |
2034 | 2033 | ||
2035 | 2034 | ||
2036 | Other hard to classify stuff | 2035 | Other hard to classify stuff |
2037 | ---------------------------- | 2036 | ---------------------------- |
2038 | signal n: | 2037 | signal n: |
2039 | sends the victim program a signal. | 2038 | sends the victim program a signal. |
2040 | e.g. signal 3 will send a SIGQUIT. | 2039 | e.g. signal 3 will send a SIGQUIT. |
2041 | 2040 | ||
2042 | info signals: | 2041 | info signals: |
2043 | what gdb does when the victim receives certain signals. | 2042 | what gdb does when the victim receives certain signals. |
2044 | 2043 | ||
2045 | list: | 2044 | list: |
2046 | e.g. | 2045 | e.g. |
2047 | list lists current function source | 2046 | list lists current function source |
2048 | list 1,10 list first 10 lines of current file. | 2047 | list 1,10 list first 10 lines of current file. |
2049 | list test.c:1,10 | 2048 | list test.c:1,10 |
2050 | 2049 | ||
2051 | 2050 | ||
2052 | directory: | 2051 | directory: |
2053 | Adds directories to be searched for source if gdb cannot find the source. | 2052 | Adds directories to be searched for source if gdb cannot find the source. |
2054 | (note it is a bit sensitive about slashes) | 2053 | (note it is a bit sensitive about slashes) |
2055 | e.g. To add the root of the filesystem to the searchpath do | 2054 | e.g. To add the root of the filesystem to the searchpath do |
2056 | directory // | 2055 | directory // |
2057 | 2056 | ||
2058 | 2057 | ||
2059 | call <function> | 2058 | call <function> |
2060 | This calls a function in the victim program, this is pretty powerful | 2059 | This calls a function in the victim program, this is pretty powerful |
2061 | e.g. | 2060 | e.g. |
2062 | (gdb) call printf("hello world") | 2061 | (gdb) call printf("hello world") |
2063 | outputs: | 2062 | outputs: |
2064 | $1 = 11 | 2063 | $1 = 11 |
2065 | 2064 | ||
2066 | You might now be thinking that the line above didn't work, something extra had to be done. | 2065 | You might now be thinking that the line above didn't work, something extra had to be done. |
2067 | (gdb) call fflush(stdout) | 2066 | (gdb) call fflush(stdout) |
2068 | hello world$2 = 0 | 2067 | hello world$2 = 0 |
2069 | As an aside the debugger also calls malloc & free under the hood | 2068 | As an aside the debugger also calls malloc & free under the hood |
2070 | to make space for the "hello world" string. | 2069 | to make space for the "hello world" string. |
2071 | 2070 | ||
2072 | 2071 | ||
2073 | 2072 | ||
2074 | hints | 2073 | hints |
2075 | ----- | 2074 | ----- |
2076 | 1) command completion works just like bash | 2075 | 1) command completion works just like bash |
2077 | ( if you are a bad typist like me this really helps ) | 2076 | ( if you are a bad typist like me this really helps ) |
2078 | e.g. hit br <TAB> & cursor up & down :-). | 2077 | e.g. hit br <TAB> & cursor up & down :-). |
2079 | 2078 | ||
2080 | 2) if you have a debugging problem that takes a few steps to recreate | 2079 | 2) if you have a debugging problem that takes a few steps to recreate |
2081 | put the steps into a file called .gdbinit in your current working directory. | 2080 | put the steps into a file called .gdbinit in your current working directory. |
2082 | If you have defined a few extra useful user defined commands put these in | 2081 | If you have defined a few extra useful user defined commands put these in |
2083 | .gdbinit in your home directory & they will be read each time gdb is launched. | 2082 | .gdbinit in your home directory & they will be read each time gdb is launched. |
2084 | 2083 | ||
2085 | A typical .gdbinit file might be. | 2084 | A typical .gdbinit file might be. |
2086 | break main | 2085 | break main |
2087 | run | 2086 | run |
2088 | break runtime_exception | 2087 | break runtime_exception |
2089 | cont | 2088 | cont |
2090 | 2089 | ||
2091 | 2090 | ||
2092 | stack chaining in gdb by hand | 2091 | stack chaining in gdb by hand |
2093 | ----------------------------- | 2092 | ----------------------------- |
2094 | This is done using the same trick described for VM | 2093 | This is done using the same trick described for VM |
2095 | p/x (*($sp+56))&0x7fffffff get the first backchain. | 2094 | p/x (*($sp+56))&0x7fffffff get the first backchain. |
2096 | 2095 | ||
2097 | For z/Architecture | 2096 | For z/Architecture |
2098 | Replace 56 with 112 & ignore the &0x7fffffff | 2097 | Replace 56 with 112 & ignore the &0x7fffffff |
2099 | in the macros below & do nasty casts to longs like the following | 2098 | in the macros below & do nasty casts to longs like the following |
2100 | as gdb unfortunately deals with printed arguments as ints which | 2099 | as gdb unfortunately deals with printed arguments as ints which |
2101 | messes up everything. | 2100 | messes up everything. |
2102 | i.e. here is a 3rd backchain dereference | 2101 | i.e. here is a 3rd backchain dereference |
2103 | p/x *(long *)(***(long ***)$sp+112) | 2102 | p/x *(long *)(***(long ***)$sp+112) |
2104 | 2103 | ||
2105 | 2104 | ||
2106 | this outputs | 2105 | this outputs |
2107 | $5 = 0x528f18 | 2106 | $5 = 0x528f18 |
2108 | on my machine. | 2107 | on my machine. |
2109 | Now you can use | 2108 | Now you can use |
2110 | info symbol (*($sp+56))&0x7fffffff | 2109 | info symbol (*($sp+56))&0x7fffffff |
2111 | you might see something like. | 2110 | you might see something like. |
2112 | rl_getc + 36 in section .text telling you what is located at address 0x528f18 | 2111 | rl_getc + 36 in section .text telling you what is located at address 0x528f18 |
2113 | Now do. | 2112 | Now do. |
2114 | p/x (*(*$sp+56))&0x7fffffff | 2113 | p/x (*(*$sp+56))&0x7fffffff |
2115 | This outputs | 2114 | This outputs |
2116 | $6 = 0x528ed0 | 2115 | $6 = 0x528ed0 |
2117 | Now do. | 2116 | Now do. |
2118 | info symbol (*(*$sp+56))&0x7fffffff | 2117 | info symbol (*(*$sp+56))&0x7fffffff |
2119 | rl_read_key + 180 in section .text | 2118 | rl_read_key + 180 in section .text |
2120 | now do | 2119 | now do |
2121 | p/x (*(**$sp+56))&0x7fffffff | 2120 | p/x (*(**$sp+56))&0x7fffffff |
2122 | & so on. | 2121 | & so on. |
2123 | 2122 | ||
2124 | Disassembling instructions without debug info | 2123 | Disassembling instructions without debug info |
2125 | --------------------------------------------- | 2124 | --------------------------------------------- |
2126 | gdb typically complains if there is a lack of debugging | 2125 | gdb typically complains if there is a lack of debugging |
2127 | symbols in the disassemble command with | 2126 | symbols in the disassemble command with |
2128 | "No function contains specified address." To get around | 2127 | "No function contains specified address." To get around |
2129 | this do | 2128 | this do |
2130 | x/<number lines to disassemble>xi <address> | 2129 | x/<number lines to disassemble>xi <address> |
2131 | e.g. | 2130 | e.g. |
2132 | x/20xi 0x400730 | 2131 | x/20xi 0x400730 |
2133 | 2132 | ||
2134 | 2133 | ||
2135 | 2134 | ||
2136 | Note: Remember gdb has history just like bash, so you don't need to retype the | 2135 | Note: Remember gdb has history just like bash, so you don't need to retype the |
2137 | whole line, just use the up & down arrows. | 2136 | whole line, just use the up & down arrows. |
2138 | 2137 | ||
2139 | 2138 | ||
2140 | 2139 | ||
2141 | For more info | 2140 | For more info |
2142 | ------------- | 2141 | ------------- |
2143 | From your linuxbox do | 2142 | From your linuxbox do |
2144 | man gdb or info gdb. | 2143 | man gdb or info gdb. |
2145 | 2144 | ||
2146 | core dumps | 2145 | core dumps |
2147 | ---------- | 2146 | ---------- |
2148 | What is a core dump ? | 2147 | What is a core dump ? |
2149 | A core dump is a file generated by the kernel ( if allowed ) which contains the registers, | 2148 | A core dump is a file generated by the kernel ( if allowed ) which contains the registers, |
2150 | & all active pages of the program which has crashed. | 2149 | & all active pages of the program which has crashed. |
2151 | From this file gdb will allow you to look at the registers & stack trace & memory of the | 2150 | From this file gdb will allow you to look at the registers & stack trace & memory of the |
2152 | program as if it just crashed on your system, it is usually called core & created in the | 2151 | program as if it just crashed on your system, it is usually called core & created in the |
2153 | current working directory. | 2152 | current working directory. |
2154 | This is very useful in that a customer can mail a core dump to a technical support department | 2153 | This is very useful in that a customer can mail a core dump to a technical support department |
2155 | & the technical support department can reconstruct what happened, | 2154 | & the technical support department can reconstruct what happened, |
2156 | provided they have an identical copy of this program with debugging symbols compiled in & | 2155 | provided they have an identical copy of this program with debugging symbols compiled in & |
2157 | the source base of this build is available. | 2156 | the source base of this build is available. |
2158 | In short it is far more useful than something like a crash log could ever hope to be. | 2157 | In short it is far more useful than something like a crash log could ever hope to be. |
2159 | 2158 | ||
2160 | In theory all that is missing to restart a core dumped program is a kernel patch which | 2159 | In theory all that is missing to restart a core dumped program is a kernel patch which |
2161 | will do the following. | 2160 | will do the following. |
2162 | 1) Make a new kernel task structure | 2161 | 1) Make a new kernel task structure |
2163 | 2) Reload all the dumped pages back into the kernel's memory management structures. | 2162 | 2) Reload all the dumped pages back into the kernel's memory management structures. |
2164 | 3) Do the required clock fixups | 2163 | 3) Do the required clock fixups |
2165 | 4) Get all files & network connections for the process back into an identical state ( really difficult ). | 2164 | 4) Get all files & network connections for the process back into an identical state ( really difficult ). |
2166 | 5) A few more difficult things I haven't thought of. | 2165 | 5) A few more difficult things I haven't thought of. |
2167 | 2166 | ||
2168 | 2167 | ||
2169 | 2168 | ||
2170 | Why have I never seen one ?. | 2169 | Why have I never seen one ?. |
2171 | Probably because you haven't used the command | 2170 | Probably because you haven't used the command |
2172 | ulimit -c unlimited in bash | 2171 | ulimit -c unlimited in bash |
2173 | to allow core dumps, now do | 2172 | to allow core dumps, now do |
2174 | ulimit -a | 2173 | ulimit -a |
2175 | to verify that the limit was accepted. | 2174 | to verify that the limit was accepted. |
2176 | 2175 | ||
2177 | A sample core dump | 2176 | A sample core dump |
2178 | To create this I'm going to do | 2177 | To create this I'm going to do |
2179 | ulimit -c unlimited | 2178 | ulimit -c unlimited |
2180 | gdb | 2179 | gdb |
2181 | to launch gdb (my victim app. ) now be bad & do the following from another | 2180 | to launch gdb (my victim app. ) now be bad & do the following from another |
2182 | telnet/xterm session to the same machine | 2181 | telnet/xterm session to the same machine |
2183 | ps -aux | grep gdb | 2182 | ps -aux | grep gdb |
2184 | kill -SIGSEGV <gdb's pid> | 2183 | kill -SIGSEGV <gdb's pid> |
2185 | or alternatively use killall -SIGSEGV gdb if you have the killall command. | 2184 | or alternatively use killall -SIGSEGV gdb if you have the killall command. |
2186 | Now look at the core dump. | 2185 | Now look at the core dump. |
2187 | ./gdb ./gdb core | 2186 | ./gdb core |
2188 | Displays the following | 2187 | Displays the following |
2189 | GNU gdb 4.18 | 2188 | GNU gdb 4.18 |
2190 | Copyright 1998 Free Software Foundation, Inc. | 2189 | Copyright 1998 Free Software Foundation, Inc. |
2191 | GDB is free software, covered by the GNU General Public License, and you are | 2190 | GDB is free software, covered by the GNU General Public License, and you are |
2192 | welcome to change it and/or distribute copies of it under certain conditions. | 2191 | welcome to change it and/or distribute copies of it under certain conditions. |
2193 | Type "show copying" to see the conditions. | 2192 | Type "show copying" to see the conditions. |
2194 | There is absolutely no warranty for GDB. Type "show warranty" for details. | 2193 | There is absolutely no warranty for GDB. Type "show warranty" for details. |
2195 | This GDB was configured as "s390-ibm-linux"... | 2194 | This GDB was configured as "s390-ibm-linux"... |
2196 | Core was generated by `./gdb'. | 2195 | Core was generated by `./gdb'. |
2197 | Program terminated with signal 11, Segmentation fault. | 2196 | Program terminated with signal 11, Segmentation fault. |
2198 | Reading symbols from /usr/lib/libncurses.so.4...done. | 2197 | Reading symbols from /usr/lib/libncurses.so.4...done. |
2199 | Reading symbols from /lib/libm.so.6...done. | 2198 | Reading symbols from /lib/libm.so.6...done. |
2200 | Reading symbols from /lib/libc.so.6...done. | 2199 | Reading symbols from /lib/libc.so.6...done. |
2201 | Reading symbols from /lib/ld-linux.so.2...done. | 2200 | Reading symbols from /lib/ld-linux.so.2...done. |
2202 | #0 0x40126d1a in read () from /lib/libc.so.6 | 2201 | #0 0x40126d1a in read () from /lib/libc.so.6 |
2203 | Setting up the environment for debugging gdb. | 2202 | Setting up the environment for debugging gdb. |
2204 | Breakpoint 1 at 0x4dc6f8: file utils.c, line 471. | 2203 | Breakpoint 1 at 0x4dc6f8: file utils.c, line 471. |
2205 | Breakpoint 2 at 0x4d87a4: file top.c, line 2609. | 2204 | Breakpoint 2 at 0x4d87a4: file top.c, line 2609. |
2206 | (top-gdb) info stack | 2205 | (top-gdb) info stack |
2207 | #0 0x40126d1a in read () from /lib/libc.so.6 | 2206 | #0 0x40126d1a in read () from /lib/libc.so.6 |
2208 | #1 0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402 | 2207 | #1 0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402 |
2209 | #2 0x528ed0 in rl_read_key () at input.c:381 | 2208 | #2 0x528ed0 in rl_read_key () at input.c:381 |
2210 | #3 0x5167e6 in readline_internal_char () at readline.c:454 | 2209 | #3 0x5167e6 in readline_internal_char () at readline.c:454 |
2211 | #4 0x5168ee in readline_internal_charloop () at readline.c:507 | 2210 | #4 0x5168ee in readline_internal_charloop () at readline.c:507 |
2212 | #5 0x51692c in readline_internal () at readline.c:521 | 2211 | #5 0x51692c in readline_internal () at readline.c:521 |
2213 | #6 0x5164fe in readline (prompt=0x7ffff810 "\177ÿøx\177ÿ÷Ø\177ÿøxÀ") | 2212 | #6 0x5164fe in readline (prompt=0x7ffff810 "\177ÿøx\177ÿ÷Ø\177ÿøxÀ") |
2214 | at readline.c:349 | 2213 | at readline.c:349 |
2215 | #7 0x4d7a8a in command_line_input (prrompt=0x564420 "(gdb) ", repeat=1, | 2214 | #7 0x4d7a8a in command_line_input (prrompt=0x564420 "(gdb) ", repeat=1, |
2216 | annotation_suffix=0x4d6b44 "prompt") at top.c:2091 | 2215 | annotation_suffix=0x4d6b44 "prompt") at top.c:2091 |
2217 | #8 0x4d6cf0 in command_loop () at top.c:1345 | 2216 | #8 0x4d6cf0 in command_loop () at top.c:1345 |
2218 | #9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635 | 2217 | #9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635 |
2219 | 2218 | ||
2220 | 2219 | ||
2221 | LDD | 2220 | LDD |
2222 | === | 2221 | === |
2223 | This is a program which lists the shared libraries which a library needs, | 2222 | This is a program which lists the shared libraries which a library needs, |
2224 | Note you also get the relocations of the shared library text segments which | 2223 | Note you also get the relocations of the shared library text segments which |
2225 | help when using objdump --source. | 2224 | help when using objdump --source. |
2226 | e.g. | 2225 | e.g. |
2227 | ldd ./gdb | 2226 | ldd ./gdb |
2228 | outputs | 2227 | outputs |
2229 | libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000) | 2228 | libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000) |
2230 | libm.so.6 => /lib/libm.so.6 (0x4005e000) | 2229 | libm.so.6 => /lib/libm.so.6 (0x4005e000) |
2231 | libc.so.6 => /lib/libc.so.6 (0x40084000) | 2230 | libc.so.6 => /lib/libc.so.6 (0x40084000) |
2232 | /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) | 2231 | /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) |
2233 | 2232 | ||
2234 | 2233 | ||
2235 | Debugging shared libraries | 2234 | Debugging shared libraries |
2236 | ========================== | 2235 | ========================== |
2237 | Most programs use shared libraries, however it can be very painful | 2236 | Most programs use shared libraries, however it can be very painful |
2238 | when you single step by instruction into a function like printf for the | 2237 | when you single step by instruction into a function like printf for the |
2239 | first time & you end up in functions like _dl_runtime_resolve. This is | 2238 | first time & you end up in functions like _dl_runtime_resolve. This is |
2240 | ld.so doing lazy binding. Lazy binding is a concept in ELF where | 2239 | ld.so doing lazy binding. Lazy binding is a concept in ELF where |
2241 | shared library functions are not resolved until they are first | 2240 | shared library functions are not resolved until they are first |
2242 | called, great for saving startup time but a pain to debug. | 2241 | called, great for saving startup time but a pain to debug. |
2243 | To get around this either relink the program -static, or exit gdb, type | 2242 | To get around this either relink the program -static, or exit gdb, type |
2244 | export LD_BIND_NOW=true (this stops lazy binding) & restart gdb on | 2243 | export LD_BIND_NOW=true (this stops lazy binding) & restart gdb on |
2245 | the program in question. | 2244 | the program in question. |
2246 | 2245 | ||
2247 | 2246 | ||
2248 | 2247 | ||
2249 | Debugging modules | 2248 | Debugging modules |
2250 | ================= | 2249 | ================= |
2251 | As modules are dynamically loaded into the kernel their address can be | 2250 | As modules are dynamically loaded into the kernel their address can be |
2252 | anywhere. To get around this use the -m option with insmod to emit a load | 2251 | anywhere. To get around this use the -m option with insmod to emit a load |
2253 | map which can be piped into a file if required. | 2252 | map which can be piped into a file if required. |
2254 | 2253 | ||
2255 | The proc file system | 2254 | The proc file system |
2256 | ==================== | 2255 | ==================== |
2257 | What is it ?. | 2256 | What is it ?. |
2258 | It is a filesystem created by the kernel with files which are created on demand | 2257 | It is a filesystem created by the kernel with files which are created on demand |
2259 | by the kernel if read, or can be used to modify kernel parameters, | 2258 | by the kernel if read, or can be used to modify kernel parameters, |
2260 | it is a powerful concept. | 2259 | it is a powerful concept. |
2261 | 2260 | ||
2262 | e.g. | 2261 | e.g. |
2263 | 2262 | ||
2264 | cat /proc/sys/net/ipv4/ip_forward | 2263 | cat /proc/sys/net/ipv4/ip_forward |
2265 | On my machine outputs | 2264 | On my machine outputs |
2266 | 0 | 2265 | 0 |
2267 | telling me ip_forwarding is not on. To switch it on I can do | 2266 | telling me ip_forwarding is not on. To switch it on I can do |
2268 | echo 1 > /proc/sys/net/ipv4/ip_forward | 2267 | echo 1 > /proc/sys/net/ipv4/ip_forward |
2269 | cat it again | 2268 | cat it again |
2270 | cat /proc/sys/net/ipv4/ip_forward | 2269 | cat /proc/sys/net/ipv4/ip_forward |
2271 | On my machine now outputs | 2270 | On my machine now outputs |
2272 | 1 | 2271 | 1 |
2273 | IP forwarding is on. | 2272 | IP forwarding is on. |
2274 | There is a lot of useful info in here best found by going in & having a look around, | 2273 | There is a lot of useful info in here best found by going in & having a look around, |
2275 | so I'll take you through some entries I consider important. | 2274 | so I'll take you through some entries I consider important. |
2276 | 2275 | ||
2277 | All the processes running on the machine have their own entry defined by | 2276 | All the processes running on the machine have their own entry defined by |
2278 | /proc/<pid> | 2277 | /proc/<pid> |
2279 | So let's have a look at the init process | 2278 | So let's have a look at the init process |
2280 | cd /proc/1 | 2279 | cd /proc/1 |
2281 | 2280 | ||
2282 | cat cmdline | 2281 | cat cmdline |
2283 | emits | 2282 | emits |
2284 | init [2] | 2283 | init [2] |
2285 | 2284 | ||
2286 | cd /proc/1/fd | 2285 | cd /proc/1/fd |
2287 | This contains numerical entries of all the open files, | 2286 | This contains numerical entries of all the open files, |
2288 | some of these you can cat e.g. stdout (1) | 2287 | some of these you can cat e.g. stdout (1) |
2289 | 2288 | ||
2290 | cat /proc/29/maps | 2289 | cat /proc/29/maps |
2291 | on my machine emits | 2290 | on my machine emits |
2292 | 2291 | ||
2293 | 00400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash | 2292 | 00400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash |
2294 | 00478000-0047e000 rw-p 00077000 5f:00 4103 /bin/bash | 2293 | 00478000-0047e000 rw-p 00077000 5f:00 4103 /bin/bash |
2295 | 0047e000-00492000 rwxp 00000000 00:00 0 | 2294 | 0047e000-00492000 rwxp 00000000 00:00 0 |
2296 | 40000000-40015000 r-xp 00000000 5f:00 14382 /lib/ld-2.1.2.so | 2295 | 40000000-40015000 r-xp 00000000 5f:00 14382 /lib/ld-2.1.2.so |
2297 | 40015000-40016000 rw-p 00014000 5f:00 14382 /lib/ld-2.1.2.so | 2296 | 40015000-40016000 rw-p 00014000 5f:00 14382 /lib/ld-2.1.2.so |
2298 | 40016000-40017000 rwxp 00000000 00:00 0 | 2297 | 40016000-40017000 rwxp 00000000 00:00 0 |
2299 | 40017000-40018000 rw-p 00000000 00:00 0 | 2298 | 40017000-40018000 rw-p 00000000 00:00 0 |
2300 | 40018000-4001b000 r-xp 00000000 5f:00 14435 /lib/libtermcap.so.2.0.8 | 2299 | 40018000-4001b000 r-xp 00000000 5f:00 14435 /lib/libtermcap.so.2.0.8 |
2301 | 4001b000-4001c000 rw-p 00002000 5f:00 14435 /lib/libtermcap.so.2.0.8 | 2300 | 4001b000-4001c000 rw-p 00002000 5f:00 14435 /lib/libtermcap.so.2.0.8 |
2302 | 4001c000-4010d000 r-xp 00000000 5f:00 14387 /lib/libc-2.1.2.so | 2301 | 4001c000-4010d000 r-xp 00000000 5f:00 14387 /lib/libc-2.1.2.so |
2303 | 4010d000-40111000 rw-p 000f0000 5f:00 14387 /lib/libc-2.1.2.so | 2302 | 4010d000-40111000 rw-p 000f0000 5f:00 14387 /lib/libc-2.1.2.so |
2304 | 40111000-40114000 rw-p 00000000 00:00 0 | 2303 | 40111000-40114000 rw-p 00000000 00:00 0 |
2305 | 40114000-4011e000 r-xp 00000000 5f:00 14408 /lib/libnss_files-2.1.2.so | 2304 | 40114000-4011e000 r-xp 00000000 5f:00 14408 /lib/libnss_files-2.1.2.so |
2306 | 4011e000-4011f000 rw-p 00009000 5f:00 14408 /lib/libnss_files-2.1.2.so | 2305 | 4011e000-4011f000 rw-p 00009000 5f:00 14408 /lib/libnss_files-2.1.2.so |
2307 | 7fffd000-80000000 rwxp ffffe000 00:00 0 | 2306 | 7fffd000-80000000 rwxp ffffe000 00:00 0 |
2308 | 2307 | ||
2309 | 2308 | ||
2310 | Showing us the shared libraries the process uses, where they are in memory | 2309 | Showing us the shared libraries the process uses, where they are in memory |
2311 | & memory access permissions for each virtual memory area. | 2310 | & memory access permissions for each virtual memory area. |
2312 | 2311 | ||
2313 | /proc/1/cwd is a softlink to the current working directory. | 2312 | /proc/1/cwd is a softlink to the current working directory. |
2314 | /proc/1/root is the root of the filesystem for this process. | 2313 | /proc/1/root is the root of the filesystem for this process. |
2315 | 2314 | ||
2316 | /proc/1/mem is the currently running process's memory which you | 2315 | /proc/1/mem is the currently running process's memory which you |
2317 | can read & write to like a file. | 2316 | can read & write to like a file. |
2318 | strace uses this sometimes as it is a bit faster than the | 2317 | strace uses this sometimes as it is a bit faster than the |
2319 | rather inefficient ptrace interface for peeking at DATA. | 2318 | rather inefficient ptrace interface for peeking at DATA. |
2320 | 2319 | ||
2321 | 2320 | ||
2322 | cat status | 2321 | cat status |
2323 | 2322 | ||
2324 | Name: init | 2323 | Name: init |
2325 | State: S (sleeping) | 2324 | State: S (sleeping) |
2326 | Pid: 1 | 2325 | Pid: 1 |
2327 | PPid: 0 | 2326 | PPid: 0 |
2328 | Uid: 0 0 0 0 | 2327 | Uid: 0 0 0 0 |
2329 | Gid: 0 0 0 0 | 2328 | Gid: 0 0 0 0 |
2330 | Groups: | 2329 | Groups: |
2331 | VmSize: 408 kB | 2330 | VmSize: 408 kB |
2332 | VmLck: 0 kB | 2331 | VmLck: 0 kB |
2333 | VmRSS: 208 kB | 2332 | VmRSS: 208 kB |
2334 | VmData: 24 kB | 2333 | VmData: 24 kB |
2335 | VmStk: 8 kB | 2334 | VmStk: 8 kB |
2336 | VmExe: 368 kB | 2335 | VmExe: 368 kB |
2337 | VmLib: 0 kB | 2336 | VmLib: 0 kB |
2338 | SigPnd: 0000000000000000 | 2337 | SigPnd: 0000000000000000 |
2339 | SigBlk: 0000000000000000 | 2338 | SigBlk: 0000000000000000 |
2340 | SigIgn: 7fffffffd7f0d8fc | 2339 | SigIgn: 7fffffffd7f0d8fc |
2341 | SigCgt: 00000000280b2603 | 2340 | SigCgt: 00000000280b2603 |
2342 | CapInh: 00000000fffffeff | 2341 | CapInh: 00000000fffffeff |
2343 | CapPrm: 00000000ffffffff | 2342 | CapPrm: 00000000ffffffff |
2344 | CapEff: 00000000fffffeff | 2343 | CapEff: 00000000fffffeff |
2345 | 2344 | ||
2346 | User PSW: 070de000 80414146 | 2345 | User PSW: 070de000 80414146 |
2347 | task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68 | 2346 | task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68 |
2348 | User GPRS: | 2347 | User GPRS: |
2349 | 00000400 00000000 0000000b 7ffffa90 | 2348 | 00000400 00000000 0000000b 7ffffa90 |
2350 | 00000000 00000000 00000000 0045d9f4 | 2349 | 00000000 00000000 00000000 0045d9f4 |
2351 | 0045cafc 7ffffa90 7fffff18 0045cb08 | 2350 | 0045cafc 7ffffa90 7fffff18 0045cb08 |
2352 | 00010400 804039e8 80403af8 7ffff8b0 | 2351 | 00010400 804039e8 80403af8 7ffff8b0 |
2353 | User ACRS: | 2352 | User ACRS: |
2354 | 00000000 00000000 00000000 00000000 | 2353 | 00000000 00000000 00000000 00000000 |
2355 | 00000001 00000000 00000000 00000000 | 2354 | 00000001 00000000 00000000 00000000 |
2356 | 00000000 00000000 00000000 00000000 | 2355 | 00000000 00000000 00000000 00000000 |
2357 | 00000000 00000000 00000000 00000000 | 2356 | 00000000 00000000 00000000 00000000 |
2358 | Kernel BackChain CallChain BackChain CallChain | 2357 | Kernel BackChain CallChain BackChain CallChain |
2359 | 004b7ca8 8002bd0c 004b7d18 8002b92c | 2358 | 004b7ca8 8002bd0c 004b7d18 8002b92c |
2360 | 004b7db8 8005cd50 004b7e38 8005d12a | 2359 | 004b7db8 8005cd50 004b7e38 8005d12a |
2361 | 004b7f08 80019114 | 2360 | 004b7f08 80019114 |
2362 | Showing among other things memory usage & status of some signals & | 2361 | Showing among other things memory usage & status of some signals & |
2363 | the process's registers from the kernel task_structure | 2362 | the process's registers from the kernel task_structure |
2364 | as well as a backchain which may be useful if a process crashes | 2363 | as well as a backchain which may be useful if a process crashes |
2365 | in the kernel for some unknown reason. | 2364 | in the kernel for some unknown reason. |
2366 | 2365 | ||
2367 | Some driver debugging techniques | 2366 | Some driver debugging techniques |
2368 | ================================ | 2367 | ================================ |
2369 | debug feature | 2368 | debug feature |
2370 | ------------- | 2369 | ------------- |
2371 | Some of our drivers now support a "debug feature" in | 2370 | Some of our drivers now support a "debug feature" in |
2372 | /proc/s390dbf see s390dbf.txt in the linux/Documentation directory | 2371 | /proc/s390dbf see s390dbf.txt in the linux/Documentation directory |
2373 | for more info. | 2372 | for more info. |
2374 | e.g. | 2373 | e.g. |
2375 | to switch on the lcs "debug feature" | 2374 | to switch on the lcs "debug feature" |
2376 | echo 5 > /proc/s390dbf/lcs/level | 2375 | echo 5 > /proc/s390dbf/lcs/level |
2377 | & then after the error occurred. | 2376 | & then after the error occurred. |
2378 | cat /proc/s390dbf/lcs/sprintf >/logfile | 2377 | cat /proc/s390dbf/lcs/sprintf >/logfile |
2379 | the logfile now contains some information which may help | 2378 | the logfile now contains some information which may help |
2380 | tech support resolve a problem in the field. | 2379 | tech support resolve a problem in the field. |
2381 | 2380 | ||
2382 | 2381 | ||
2383 | 2382 | ||
2384 | high level debugging network drivers | 2383 | high level debugging network drivers |
2385 | ------------------------------------ | 2384 | ------------------------------------ |
2386 | ifconfig is a quite useful command | 2385 | ifconfig is a quite useful command |
2387 | it gives the current state of network drivers. | 2386 | it gives the current state of network drivers. |
2388 | 2387 | ||
2389 | If you suspect your network device driver is dead | 2388 | If you suspect your network device driver is dead |
2390 | one way to check is type | 2389 | one way to check is type |
2391 | ifconfig <network device> | 2390 | ifconfig <network device> |
2392 | e.g. tr0 | 2391 | e.g. tr0 |
2393 | You should see something like | 2392 | You should see something like |
2394 | tr0 Link encap:16/4 Mbps Token Ring (New) HWaddr 00:04:AC:20:8E:48 | 2393 | tr0 Link encap:16/4 Mbps Token Ring (New) HWaddr 00:04:AC:20:8E:48 |
2395 | inet addr:9.164.185.132 Bcast:9.164.191.255 Mask:255.255.224.0 | 2394 | inet addr:9.164.185.132 Bcast:9.164.191.255 Mask:255.255.224.0 |
2396 | UP BROADCAST RUNNING MULTICAST MTU:2000 Metric:1 | 2395 | UP BROADCAST RUNNING MULTICAST MTU:2000 Metric:1 |
2397 | RX packets:246134 errors:0 dropped:0 overruns:0 frame:0 | 2396 | RX packets:246134 errors:0 dropped:0 overruns:0 frame:0 |
2398 | TX packets:5 errors:0 dropped:0 overruns:0 carrier:0 | 2397 | TX packets:5 errors:0 dropped:0 overruns:0 carrier:0 |
2399 | collisions:0 txqueuelen:100 | 2398 | collisions:0 txqueuelen:100 |
2400 | 2399 | ||
2401 | if the device doesn't say up | 2400 | if the device doesn't say up |
2402 | try | 2401 | try |
2403 | /etc/rc.d/init.d/network start | 2402 | /etc/rc.d/init.d/network start |
2404 | ( this starts the network stack & hopefully calls ifconfig tr0 up ). | 2403 | ( this starts the network stack & hopefully calls ifconfig tr0 up ). |
2405 | ifconfig looks at the output of /proc/net/dev & presents it in a more presentable form | 2404 | ifconfig looks at the output of /proc/net/dev & presents it in a more presentable form |
2406 | Now ping the device from a machine in the same subnet. | 2405 | Now ping the device from a machine in the same subnet. |
2407 | if the RX packets count & TX packets counts don't increment you probably | 2406 | if the RX packets count & TX packets counts don't increment you probably |
2408 | have problems. | 2407 | have problems. |
2409 | next | 2408 | next |
2410 | cat /proc/net/arp | 2409 | cat /proc/net/arp |
2411 | Do you see any hardware addresses in the cache? If not you may have problems. | 2410 | Do you see any hardware addresses in the cache? If not you may have problems. |
2412 | Next try | 2411 | Next try |
2413 | ping -c 5 <broadcast_addr> i.e. the Bcast field above in the output of | 2412 | ping -c 5 <broadcast_addr> i.e. the Bcast field above in the output of |
2414 | ifconfig. Do you see any replies from machines other than the local machine? | 2413 | ifconfig. Do you see any replies from machines other than the local machine? |
2415 | If not you may have problems. Also if the TX packets count in ifconfig | 2414 | If not you may have problems. Also if the TX packets count in ifconfig |
2416 | hasn't incremented either, you have serious problems in your driver | 2415 | hasn't incremented either, you have serious problems in your driver |
2417 | (e.g. the txbusy field of the network device being stuck on ) | 2416 | (e.g. the txbusy field of the network device being stuck on ) |
2418 | or you may have multiple network devices connected. | 2417 | or you may have multiple network devices connected. |
2419 | 2418 | ||
2420 | 2419 | ||
2421 | chandev | 2420 | chandev |
2422 | ------- | 2421 | ------- |
2423 | There is a new device layer for channel devices, some | 2422 | There is a new device layer for channel devices, some |
2424 | drivers e.g. lcs are registered with this layer. | 2423 | drivers e.g. lcs are registered with this layer. |
2425 | If the device uses the channel device layer you'll be | 2424 | If the device uses the channel device layer you'll be |
2426 | able to find what interrupts it uses & the current state | 2425 | able to find what interrupts it uses & the current state |
2427 | of the device. | 2426 | of the device. |
2428 | See the manpage chandev.8 & type cat /proc/chandev for more info. | 2427 | See the manpage chandev.8 & type cat /proc/chandev for more info. |
2429 | 2428 | ||
2430 | 2429 | ||
2431 | 2430 | ||
2432 | Starting points for debugging scripting languages etc. | 2431 | Starting points for debugging scripting languages etc. |
2433 | ====================================================== | 2432 | ====================================================== |
2434 | 2433 | ||
2435 | bash/sh | 2434 | bash/sh |
2436 | 2435 | ||
2437 | bash -x <scriptname> | 2436 | bash -x <scriptname> |
2438 | e.g. bash -x /usr/bin/bashbug | 2437 | e.g. bash -x /usr/bin/bashbug |
2439 | displays the following lines as it executes them. | 2438 | displays the following lines as it executes them. |
2440 | + MACHINE=i586 | 2439 | + MACHINE=i586 |
2441 | + OS=linux-gnu | 2440 | + OS=linux-gnu |
2442 | + CC=gcc | 2441 | + CC=gcc |
2443 | + CFLAGS= -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./lib -O2 -pipe | 2442 | + CFLAGS= -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./lib -O2 -pipe |
2444 | + RELEASE=2.01 | 2443 | + RELEASE=2.01 |
2445 | + PATCHLEVEL=1 | 2444 | + PATCHLEVEL=1 |
2446 | + RELSTATUS=release | 2445 | + RELSTATUS=release |
2447 | + MACHTYPE=i586-pc-linux-gnu | 2446 | + MACHTYPE=i586-pc-linux-gnu |
2448 | 2447 | ||
2449 | perl -d <scriptname> runs the perlscript in a fully interactive debugger | 2448 | perl -d <scriptname> runs the perlscript in a fully interactive debugger |
2450 | <like gdb>. | 2449 | <like gdb>. |
2451 | Type 'h' in the debugger for help. | 2450 | Type 'h' in the debugger for help. |
2452 | 2451 | ||
2453 | for debugging java type | 2452 | for debugging java type |
2454 | jdb <filename> another fully interactive gdb style debugger. | 2453 | jdb <filename> another fully interactive gdb style debugger. |
2455 | & type ? in the debugger for help. | 2454 | & type ? in the debugger for help. |
2456 | 2455 | ||
2457 | 2456 | ||
2458 | 2457 | ||
2459 | Dumptool & Lcrash ( lkcd ) | 2458 | Dumptool & Lcrash ( lkcd ) |
2460 | ========================== | 2459 | ========================== |
2461 | Michael Holzheu & others here at IBM have a fairly mature port of | 2460 | Michael Holzheu & others here at IBM have a fairly mature port of |
2462 | SGI's lcrash tool which allows one to look at kernel structures in a | 2461 | SGI's lcrash tool which allows one to look at kernel structures in a |
2463 | running kernel. | 2462 | running kernel. |
2464 | 2463 | ||
2465 | It also complements a tool called dumptool which dumps all the kernel's | 2464 | It also complements a tool called dumptool which dumps all the kernel's |
2466 | memory pages & registers to either a tape or a disk. | 2465 | memory pages & registers to either a tape or a disk. |
2467 | This can be used by tech support or an ambitious end user to do | 2466 | This can be used by tech support or an ambitious end user to do |
2468 | post mortem debugging of a machine like gdb core dumps. | 2467 | post mortem debugging of a machine like gdb core dumps. |
2469 | 2468 | ||
2470 | Going into how to use this tool in detail will be explained | 2469 | Going into how to use this tool in detail will be explained |
2471 | in other documentation supplied by IBM with the patches & the | 2470 | in other documentation supplied by IBM with the patches & the |
2472 | lcrash homepage http://oss.sgi.com/projects/lkcd/ & the lcrash manpage. | 2471 | lcrash homepage http://oss.sgi.com/projects/lkcd/ & the lcrash manpage. |
2473 | 2472 | ||
2474 | How they work | 2473 | How they work |
2475 | ------------- | 2474 | ------------- |
2476 | Lcrash is a perfectly normal program; however, it requires 2 | 2475 | Lcrash is a perfectly normal program; however, it requires 2 |
2477 | additional files, Kerntypes which is built using a patch to the | 2476 | additional files, Kerntypes which is built using a patch to the |
2478 | linux kernel sources in the linux root directory & the System.map. | 2477 | linux kernel sources in the linux root directory & the System.map. |
2479 | 2478 | ||
2480 | Kerntypes is an an objectfile whose sole purpose in life | 2479 | Kerntypes is an objectfile whose sole purpose in life |
2481 | is to provide stabs debug info to lcrash, to do this | 2480 | is to provide stabs debug info to lcrash, to do this |
2482 | Kerntypes is built from kerntypes.c which just includes the most commonly | 2481 | Kerntypes is built from kerntypes.c which just includes the most commonly |
2483 | referenced header files used when debugging, lcrash can then read the | 2482 | referenced header files used when debugging, lcrash can then read the |
2484 | .stabs section of this file. | 2483 | .stabs section of this file. |
2485 | 2484 | ||
2486 | When debugging a live system, lcrash uses /dev/mem; | 2485 | When debugging a live system, lcrash uses /dev/mem; |
2487 | alternatively, for post mortem debugging it uses the data | 2486 | alternatively, for post mortem debugging it uses the data |
2488 | collected by dumptool. | 2487 | collected by dumptool. |
2489 | 2488 | ||
2490 | 2489 | ||
2491 | 2490 | ||
2492 | SysRq | 2491 | SysRq |
2493 | ===== | 2492 | ===== |
2494 | This is now supported by linux for s/390 & z/Architecture. | 2493 | This is now supported by linux for s/390 & z/Architecture. |
2495 | To enable it, compile the kernel with | 2494 | To enable it, compile the kernel with |
2496 | Kernel Hacking -> Magic SysRq Key Enabled | 2495 | Kernel Hacking -> Magic SysRq Key Enabled |
2497 | echo "1" > /proc/sys/kernel/sysrq | 2496 | echo "1" > /proc/sys/kernel/sysrq |
2498 | also type | 2497 | also type |
2499 | echo "8" >/proc/sys/kernel/printk | 2498 | echo "8" >/proc/sys/kernel/printk |
2500 | To make printk output go to console. | 2499 | To make printk output go to console. |
2501 | On 390 all commands are prefixed with | 2500 | On 390 all commands are prefixed with |
2502 | ^- | 2501 | ^- |
2503 | e.g. | 2502 | e.g. |
2504 | ^-t will show tasks. | 2503 | ^-t will show tasks. |
2505 | ^-? or some unknown command will display help. | 2504 | ^-? or some unknown command will display help. |
2506 | The sysrq key reading is very picky ( I have to type the keys in an | 2505 | The sysrq key reading is very picky ( I have to type the keys in an |
2507 | xterm session & paste them into the x3270 console ) | 2506 | xterm session & paste them into the x3270 console ) |
2508 | & it may be wise to predefine the keys as described in the VM hints above | 2507 | & it may be wise to predefine the keys as described in the VM hints above |
2509 | 2508 | ||
2510 | This is particularly useful for syncing disks unmounting & rebooting | 2509 | This is particularly useful for syncing disks unmounting & rebooting |
2511 | if the machine gets partially hung. | 2510 | if the machine gets partially hung. |
2512 | 2511 | ||
2513 | Read Documentation/sysrq.txt for more info | 2512 | Read Documentation/sysrq.txt for more info |
2514 | 2513 | ||
2515 | References: | 2514 | References: |
2516 | =========== | 2515 | =========== |
2517 | Enterprise Systems Architecture Reference Summary | 2516 | Enterprise Systems Architecture Reference Summary |
2518 | Enterprise Systems Architecture Principles of Operation | 2517 | Enterprise Systems Architecture Principles of Operation |
2519 | Hartmut Penners s390 stack frame sheet. | 2518 | Hartmut Penners s390 stack frame sheet. |
2520 | IBM Mainframe Channel Attachment a technology brief from a CISCO webpage | 2519 | IBM Mainframe Channel Attachment a technology brief from a CISCO webpage |
2521 | Various bits of man & info pages of Linux. | 2520 | Various bits of man & info pages of Linux. |
2522 | Linux & GDB source. | 2521 | Linux & GDB source. |
2523 | Various info & man pages. | 2522 | Various info & man pages. |
2524 | CMS Help on tracing commands. | 2523 | CMS Help on tracing commands. |
2525 | Linux for s/390 Elf Application Binary Interface | 2524 | Linux for s/390 Elf Application Binary Interface |
2526 | Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended ) | 2525 | Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended ) |
2527 | z/Architecture Principles of Operation SA22-7832-00 | 2526 | z/Architecture Principles of Operation SA22-7832-00 |
2528 | Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the | 2527 | Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the |
2529 | Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05 | 2528 | Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05 |
2530 | 2529 | ||
2531 | Special Thanks | 2530 | Special Thanks |
2532 | ============== | 2531 | ============== |
2533 | Special thanks to Neale Ferguson who maintains a much | 2532 | Special thanks to Neale Ferguson who maintains a much |
2534 | prettier HTML version of this page at | 2533 | prettier HTML version of this page at |
2535 | http://penguinvm.princeton.edu/notes.html#Debug390 | 2534 | http://penguinvm.princeton.edu/notes.html#Debug390 |
2536 | Bob Grainger Stefan Bader & others for reporting bugs | 2535 | Bob Grainger Stefan Bader & others for reporting bugs |
2537 | 2536 |
Documentation/s390/s390dbf.txt
1 | S390 Debug Feature | 1 | S390 Debug Feature |
2 | ================== | 2 | ================== |
3 | 3 | ||
4 | files: arch/s390/kernel/debug.c | 4 | files: arch/s390/kernel/debug.c |
5 | include/asm-s390/debug.h | 5 | include/asm-s390/debug.h |
6 | 6 | ||
7 | Description: | 7 | Description: |
8 | ------------ | 8 | ------------ |
9 | The goal of this feature is to provide a kernel debug logging API | 9 | The goal of this feature is to provide a kernel debug logging API |
10 | where log records can be stored efficiently in memory, where each component | 10 | where log records can be stored efficiently in memory, where each component |
11 | (e.g. device drivers) can have one separate debug log. | 11 | (e.g. device drivers) can have one separate debug log. |
12 | One purpose of this is to inspect the debug logs after a production system crash | 12 | One purpose of this is to inspect the debug logs after a production system crash |
13 | in order to analyze the reason for the crash. | 13 | in order to analyze the reason for the crash. |
14 | If the system still runs but only a subcomponent which uses dbf fails, | 14 | If the system still runs but only a subcomponent which uses dbf fails, |
15 | it is possible to look at the debug logs on a live system via the Linux | 15 | it is possible to look at the debug logs on a live system via the Linux |
16 | debugfs filesystem. | 16 | debugfs filesystem. |
17 | The debug feature may also be very useful for kernel and driver development. | 17 | The debug feature may also be very useful for kernel and driver development. |
18 | 18 | ||
19 | Design: | 19 | Design: |
20 | ------- | 20 | ------- |
21 | Kernel components (e.g. device drivers) can register themselves at the debug | 21 | Kernel components (e.g. device drivers) can register themselves at the debug |
22 | feature with the function call debug_register(). This function initializes a | 22 | feature with the function call debug_register(). This function initializes a |
23 | debug log for the caller. Each debug log has a number of debug areas | 23 | debug log for the caller. Each debug log has a number of debug areas |
24 | where exactly one is active at one time. Each debug area consists of contiguous | 24 | where exactly one is active at one time. Each debug area consists of contiguous |
25 | pages in memory. In the debug areas there are stored debug entries (log records) | 25 | pages in memory. In the debug areas there are stored debug entries (log records) |
26 | which are written by event- and exception-calls. | 26 | which are written by event- and exception-calls. |
27 | 27 | ||
28 | An event-call writes the specified debug entry to the active debug | 28 | An event-call writes the specified debug entry to the active debug |
29 | area and updates the log pointer for the active area. If the end | 29 | area and updates the log pointer for the active area. If the end |
30 | of the active debug area is reached, a wrap around is done (ring buffer) | 30 | of the active debug area is reached, a wrap around is done (ring buffer) |
31 | and the next debug entry will be written at the beginning of the active | 31 | and the next debug entry will be written at the beginning of the active |
32 | debug area. | 32 | debug area. |
33 | 33 | ||
34 | An exception-call writes the specified debug entry to the log and | 34 | An exception-call writes the specified debug entry to the log and |
35 | switches to the next debug area. This is done in order to be sure | 35 | switches to the next debug area. This is done in order to be sure |
36 | that the records which describe the origin of the exception are not | 36 | that the records which describe the origin of the exception are not |
37 | overwritten when a wrap around for the current area occurs. | 37 | overwritten when a wrap around for the current area occurs. |
38 | 38 | ||
39 | The debug areas themselves are also ordered in the form of a ring buffer. | 39 | The debug areas themselves are also ordered in the form of a ring buffer. |
40 | When an exception is thrown in the last debug area, the following debug | 40 | When an exception is thrown in the last debug area, the following debug |
41 | entries are then written again in the very first area. | 41 | entries are then written again in the very first area. |
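The event/exception behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code in arch/s390/kernel/debug.c: the names toy_log, toy_init, toy_event and toy_exception and the sizes NR_AREAS, ENTRIES_PER_AREA and ENTRY_LEN are hypothetical, each area keeps its own write position, and an exception whose level is filtered out is assumed not to switch areas.

```c
#include <string.h>

#define NR_AREAS         2   /* hypothetical: number of debug areas */
#define ENTRIES_PER_AREA 3   /* hypothetical: entries that fit in one area */
#define ENTRY_LEN        16  /* hypothetical: bytes of text per entry */

/* Toy model of one debug log: a ring of areas, each area a ring of entries. */
struct toy_log {
	char entry[NR_AREAS][ENTRIES_PER_AREA][ENTRY_LEN];
	int  next_slot[NR_AREAS]; /* write position inside each area */
	int  active_area;         /* exactly one area is active at a time */
	int  level;               /* actual debug level */
};

static void toy_init(struct toy_log *log)
{
	memset(log, 0, sizeof(*log));
	log->level = 3; /* the default level named in the text */
}

/* Event: write into the active area, wrapping around at its end.
 * Returns 1 if the entry was written, 0 if it was filtered by level. */
static int toy_event(struct toy_log *log, int level, const char *text)
{
	int area, slot;

	if (level > log->level)
		return 0; /* only level <= actual level is logged */

	area = log->active_area;
	slot = log->next_slot[area];
	strncpy(log->entry[area][slot], text, ENTRY_LEN - 1);
	log->entry[area][slot][ENTRY_LEN - 1] = '\0';
	log->next_slot[area] = (slot + 1) % ENTRIES_PER_AREA; /* ring buffer */
	return 1;
}

/* Exception: write like an event, then switch to the next area so that
 * the records leading up to the exception are not overwritten by a wrap.
 * Assumption: a filtered exception does not switch areas. */
static int toy_exception(struct toy_log *log, int level, const char *text)
{
	if (!toy_event(log, level, text))
		return 0;
	log->active_area = (log->active_area + 1) % NR_AREAS; /* ring of areas */
	return 1;
}
```

With three entries per area, a fourth event overwrites the oldest entry of the active area, while an exception leaves that area untouched until the ring of areas comes around to it again.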
42 | 42 | ||
43 | There are three versions for the event- and exception-calls: One for | 43 | There are three versions for the event- and exception-calls: One for |
44 | logging raw data, one for text and one for numbers. | 44 | logging raw data, one for text and one for numbers. |
45 | 45 | ||
46 | Each debug entry contains the following data: | 46 | Each debug entry contains the following data: |
47 | 47 | ||
48 | - Timestamp | 48 | - Timestamp |
49 | - Cpu-Number of calling task | 49 | - Cpu-Number of calling task |
50 | - Level of debug entry (0...6) | 50 | - Level of debug entry (0...6) |
51 | - Return Address to caller | 51 | - Return Address to caller |
52 | - Flag, if entry is an exception or not | 52 | - Flag, if entry is an exception or not |
53 | 53 | ||
54 | The debug logs can be inspected in a live system through entries in | 54 | The debug logs can be inspected in a live system through entries in |
55 | the debugfs-filesystem. Under the toplevel directory "s390dbf" there is | 55 | the debugfs-filesystem. Under the toplevel directory "s390dbf" there is |
56 | a directory for each registered component, which is named like the | 56 | a directory for each registered component, which is named like the |
57 | corresponding component. The debugfs normally should be mounted to | 57 | corresponding component. The debugfs normally should be mounted to |
58 | /sys/kernel/debug; therefore the debug feature can be accessed under | 58 | /sys/kernel/debug; therefore the debug feature can be accessed under |
59 | /sys/kernel/debug/s390dbf. | 59 | /sys/kernel/debug/s390dbf. |
60 | 60 | ||
61 | The content of the directories are files which represent different views | 61 | The content of the directories are files which represent different views |
62 | to the debug log. Each component can decide which views should be | 62 | to the debug log. Each component can decide which views should be |
63 | used through registering them with the function debug_register_view(). | 63 | used through registering them with the function debug_register_view(). |
64 | Predefined views for hex/ascii, sprintf and raw binary data are provided. | 64 | Predefined views for hex/ascii, sprintf and raw binary data are provided. |
65 | It is also possible to define other views. The content of | 65 | It is also possible to define other views. The content of |
66 | a view can be inspected simply by reading the corresponding debugfs file. | 66 | a view can be inspected simply by reading the corresponding debugfs file. |
67 | 67 | ||
68 | All debug logs have an an actual debug level (range from 0 to 6). | 68 | All debug logs have an actual debug level (range from 0 to 6). |
69 | The default level is 3. Event and Exception functions have a 'level' | 69 | The default level is 3. Event and Exception functions have a 'level' |
70 | parameter. Only debug entries with a level that is lower than or equal | 70 | parameter. Only debug entries with a level that is lower than or equal |
71 | to the actual level are written to the log. This means that, when | 71 | to the actual level are written to the log. This means that, when |
72 | writing events, high priority log entries should have a low level | 72 | writing events, high priority log entries should have a low level |
73 | value whereas low priority entries should have a high one. | 73 | value whereas low priority entries should have a high one. |
74 | The actual debug level can be changed with the help of the debugfs-filesystem | 74 | The actual debug level can be changed with the help of the debugfs-filesystem |
75 | through writing a number string "x" to the 'level' debugfs file which is | 75 | through writing a number string "x" to the 'level' debugfs file which is |
76 | provided for every debug log. Debugging can be switched off completely | 76 | provided for every debug log. Debugging can be switched off completely |
77 | by using "-" on the 'level' debugfs file. | 77 | by using "-" on the 'level' debugfs file. |
78 | 78 | ||
79 | Example: | 79 | Example: |
80 | 80 | ||
81 | > echo "-" > /sys/kernel/debug/s390dbf/dasd/level | 81 | > echo "-" > /sys/kernel/debug/s390dbf/dasd/level |
82 | 82 | ||
83 | It is also possible to deactivate the debug feature globally for every | 83 | It is also possible to deactivate the debug feature globally for every |
84 | debug log. You can change the behavior using 2 sysctl parameters in | 84 | debug log. You can change the behavior using 2 sysctl parameters in |
85 | /proc/sys/s390dbf: | 85 | /proc/sys/s390dbf: |
86 | There are currently 2 possible triggers, which stop the debug feature | 86 | There are currently 2 possible triggers, which stop the debug feature |
87 | globally. The first possibility is to use the "debug_active" sysctl. If | 87 | globally. The first possibility is to use the "debug_active" sysctl. If |
88 | set to 1 the debug feature is running. If "debug_active" is set to 0 the | 88 | set to 1 the debug feature is running. If "debug_active" is set to 0 the |
89 | debug feature is turned off. | 89 | debug feature is turned off. |
90 | The second trigger which stops the debug feature is a kernel oops. | 90 | The second trigger which stops the debug feature is a kernel oops. |
91 | That prevents the debug feature from overwriting debug information that | 91 | That prevents the debug feature from overwriting debug information that |
92 | happened before the oops. After an oops you can reactivate the debug feature | 92 | happened before the oops. After an oops you can reactivate the debug feature |
93 | by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, it's not | 93 | by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, it's not |
94 | recommended to use an oopsed kernel in a production environment. | 94 | recommended to use an oopsed kernel in a production environment. |
95 | If you want to disallow the deactivation of the debug feature, you can use | 95 | If you want to disallow the deactivation of the debug feature, you can use |
96 | the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug | 96 | the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug |
97 | feature cannot be stopped. If the debug feature is already stopped, it | 97 | feature cannot be stopped. If the debug feature is already stopped, it |
98 | will stay deactivated. | 98 | will stay deactivated. |
99 | 99 | ||
100 | Kernel Interfaces: | 100 | Kernel Interfaces: |
101 | ------------------ | 101 | ------------------ |
102 | 102 | ||
103 | ---------------------------------------------------------------------------- | 103 | ---------------------------------------------------------------------------- |
104 | debug_info_t *debug_register(char *name, int pages, int nr_areas, | 104 | debug_info_t *debug_register(char *name, int pages, int nr_areas, |
105 | int buf_size); | 105 | int buf_size); |
106 | 106 | ||
107 | Parameter: name: Name of debug log (e.g. used for debugfs entry) | 107 | Parameter: name: Name of debug log (e.g. used for debugfs entry) |
108 | pages: number of pages, which will be allocated per area | 108 | pages: number of pages, which will be allocated per area |
109 | nr_areas: number of debug areas | 109 | nr_areas: number of debug areas |
110 | buf_size: size of data area in each debug entry | 110 | buf_size: size of data area in each debug entry |
111 | 111 | ||
112 | Return Value: Handle for generated debug area | 112 | Return Value: Handle for generated debug area |
113 | NULL if register failed | 113 | NULL if register failed |
114 | 114 | ||
115 | Description: Allocates memory for a debug log | 115 | Description: Allocates memory for a debug log |
116 | Must not be called within an interrupt handler | 116 | Must not be called within an interrupt handler |
117 | 117 | ||
118 | --------------------------------------------------------------------------- | 118 | --------------------------------------------------------------------------- |
119 | void debug_unregister (debug_info_t * id); | 119 | void debug_unregister (debug_info_t * id); |
120 | 120 | ||
121 | Parameter: id: handle for debug log | 121 | Parameter: id: handle for debug log |
122 | 122 | ||
123 | Return Value: none | 123 | Return Value: none |
124 | 124 | ||
125 | Description: frees memory for a debug log | 125 | Description: frees memory for a debug log |
126 | Must not be called within an interrupt handler | 126 | Must not be called within an interrupt handler |
127 | 127 | ||
128 | --------------------------------------------------------------------------- | 128 | --------------------------------------------------------------------------- |
129 | void debug_set_level (debug_info_t * id, int new_level); | 129 | void debug_set_level (debug_info_t * id, int new_level); |
130 | 130 | ||
131 | Parameter: id: handle for debug log | 131 | Parameter: id: handle for debug log |
132 | new_level: new debug level | 132 | new_level: new debug level |
133 | 133 | ||
134 | Return Value: none | 134 | Return Value: none |
135 | 135 | ||
136 | Description: Sets new actual debug level if new_level is valid. | 136 | Description: Sets new actual debug level if new_level is valid. |
137 | 137 | ||
138 | --------------------------------------------------------------------------- | 138 | --------------------------------------------------------------------------- |
139 | void debug_stop_all(void); | 139 | void debug_stop_all(void); |
140 | 140 | ||
141 | Parameter: none | 141 | Parameter: none |
142 | 142 | ||
143 | Return Value: none | 143 | Return Value: none |
144 | 144 | ||
145 | Description: stops the debug feature if stopping is allowed. Currently | 145 | Description: stops the debug feature if stopping is allowed. Currently |
146 | used in case of a kernel oops. | 146 | used in case of a kernel oops. |
147 | 147 | ||
148 | --------------------------------------------------------------------------- | 148 | --------------------------------------------------------------------------- |
149 | debug_entry_t* debug_event (debug_info_t* id, int level, void* data, | 149 | debug_entry_t* debug_event (debug_info_t* id, int level, void* data, |
150 | int length); | 150 | int length); |
151 | 151 | ||
152 | Parameter: id: handle for debug log | 152 | Parameter: id: handle for debug log |
153 | level: debug level | 153 | level: debug level |
154 | data: pointer to data for debug entry | 154 | data: pointer to data for debug entry |
155 | length: length of data in bytes | 155 | length: length of data in bytes |
156 | 156 | ||
157 | Return Value: Address of written debug entry | 157 | Return Value: Address of written debug entry |
158 | 158 | ||
159 | Description: writes debug entry to active debug area (if level <= actual | 159 | Description: writes debug entry to active debug area (if level <= actual |
160 | debug level) | 160 | debug level) |
161 | 161 | ||
162 | --------------------------------------------------------------------------- | 162 | --------------------------------------------------------------------------- |
163 | debug_entry_t* debug_int_event (debug_info_t * id, int level, | 163 | debug_entry_t* debug_int_event (debug_info_t * id, int level, |
164 | unsigned int data); | 164 | unsigned int data); |
165 | debug_entry_t* debug_long_event(debug_info_t * id, int level, | 165 | debug_entry_t* debug_long_event(debug_info_t * id, int level, |
166 | unsigned long data); | 166 | unsigned long data); |
167 | 167 | ||
168 | Parameter: id: handle for debug log | 168 | Parameter: id: handle for debug log |
169 | level: debug level | 169 | level: debug level |
170 | data: integer value for debug entry | 170 | data: integer value for debug entry |
171 | 171 | ||
172 | Return Value: Address of written debug entry | 172 | Return Value: Address of written debug entry |
173 | 173 | ||
174 | Description: writes debug entry to active debug area (if level <= actual | 174 | Description: writes debug entry to active debug area (if level <= actual |
175 | debug level) | 175 | debug level) |
176 | 176 | ||
177 | --------------------------------------------------------------------------- | 177 | --------------------------------------------------------------------------- |
178 | debug_entry_t* debug_text_event (debug_info_t * id, int level, | 178 | debug_entry_t* debug_text_event (debug_info_t * id, int level, |
179 | const char* data); | 179 | const char* data); |
180 | 180 | ||
181 | Parameter: id: handle for debug log | 181 | Parameter: id: handle for debug log |
182 | level: debug level | 182 | level: debug level |
183 | data: string for debug entry | 183 | data: string for debug entry |
184 | 184 | ||
185 | Return Value: Address of written debug entry | 185 | Return Value: Address of written debug entry |
186 | 186 | ||
187 | Description: writes debug entry in ascii format to active debug area | 187 | Description: writes debug entry in ascii format to active debug area |
188 | (if level <= actual debug level) | 188 | (if level <= actual debug level) |
189 | 189 | ||
190 | --------------------------------------------------------------------------- | 190 | --------------------------------------------------------------------------- |
191 | debug_entry_t* debug_sprintf_event (debug_info_t * id, int level, | 191 | debug_entry_t* debug_sprintf_event (debug_info_t * id, int level, |
192 | char* string,...); | 192 | char* string,...); |
193 | 193 | ||
194 | Parameter: id: handle for debug log | 194 | Parameter: id: handle for debug log |
195 | level: debug level | 195 | level: debug level |
196 | string: format string for debug entry | 196 | string: format string for debug entry |
197 | ...: varargs used as in sprintf() | 197 | ...: varargs used as in sprintf() |
198 | 198 | ||
199 | Return Value: Address of written debug entry | 199 | Return Value: Address of written debug entry |
200 | 200 | ||
201 | Description: writes debug entry with format string and varargs (longs) to | 201 | Description: writes debug entry with format string and varargs (longs) to |
202 | active debug area (if level <= actual debug level). | 202 | active debug area (if level <= actual debug level). |
203 | floats and long long datatypes cannot be used as varargs. | 203 | floats and long long datatypes cannot be used as varargs. |
204 | 204 | ||
205 | --------------------------------------------------------------------------- | 205 | --------------------------------------------------------------------------- |
206 | 206 | ||
207 | debug_entry_t* debug_exception (debug_info_t* id, int level, void* data, | 207 | debug_entry_t* debug_exception (debug_info_t* id, int level, void* data, |
208 | int length); | 208 | int length); |
209 | 209 | ||
210 | Parameter: id: handle for debug log | 210 | Parameter: id: handle for debug log |
211 | level: debug level | 211 | level: debug level |
212 | data: pointer to data for debug entry | 212 | data: pointer to data for debug entry |
213 | length: length of data in bytes | 213 | length: length of data in bytes |
214 | 214 | ||
215 | Return Value: Address of written debug entry | 215 | Return Value: Address of written debug entry |
216 | 216 | ||
217 | Description: writes debug entry to active debug area (if level <= actual | 217 | Description: writes debug entry to active debug area (if level <= actual |
218 | debug level) and switches to next debug area | 218 | debug level) and switches to next debug area |
219 | 219 | ||
220 | --------------------------------------------------------------------------- | 220 | --------------------------------------------------------------------------- |
221 | debug_entry_t* debug_int_exception (debug_info_t * id, int level, | 221 | debug_entry_t* debug_int_exception (debug_info_t * id, int level, |
222 | unsigned int data); | 222 | unsigned int data); |
223 | debug_entry_t* debug_long_exception(debug_info_t * id, int level, | 223 | debug_entry_t* debug_long_exception(debug_info_t * id, int level, |
224 | unsigned long data); | 224 | unsigned long data); |
225 | 225 | ||
226 | Parameter: id: handle for debug log | 226 | Parameter: id: handle for debug log |
227 | level: debug level | 227 | level: debug level |
228 | data: integer value for debug entry | 228 | data: integer value for debug entry |
229 | 229 | ||
230 | Return Value: Address of written debug entry | 230 | Return Value: Address of written debug entry |
231 | 231 | ||
232 | Description: writes debug entry to active debug area (if level <= actual | 232 | Description: writes debug entry to active debug area (if level <= actual |
233 | debug level) and switches to next debug area | 233 | debug level) and switches to next debug area |
234 | 234 | ||
235 | --------------------------------------------------------------------------- | 235 | --------------------------------------------------------------------------- |
236 | debug_entry_t* debug_text_exception (debug_info_t * id, int level, | 236 | debug_entry_t* debug_text_exception (debug_info_t * id, int level, |
237 | const char* data); | 237 | const char* data); |
238 | 238 | ||
239 | Parameter: id: handle for debug log | 239 | Parameter: id: handle for debug log |
240 | level: debug level | 240 | level: debug level |
241 | data: string for debug entry | 241 | data: string for debug entry |
242 | 242 | ||
243 | Return Value: Address of written debug entry | 243 | Return Value: Address of written debug entry |
244 | 244 | ||
245 | Description: writes debug entry in ascii format to active debug area | 245 | Description: writes debug entry in ascii format to active debug area |
246 | (if level <= actual debug level) and switches to next debug | 246 | (if level <= actual debug level) and switches to next debug |
247 | area | 247 | area |
248 | 248 | ||
249 | --------------------------------------------------------------------------- | 249 | --------------------------------------------------------------------------- |
250 | debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level, | 250 | debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level, |
251 | char* string,...); | 251 | char* string,...); |
252 | 252 | ||
253 | Parameter: id: handle for debug log | 253 | Parameter: id: handle for debug log |
254 | level: debug level | 254 | level: debug level |
255 | string: format string for debug entry | 255 | string: format string for debug entry |
256 | ...: varargs used as in sprintf() | 256 | ...: varargs used as in sprintf() |
257 | 257 | ||
258 | Return Value: Address of written debug entry | 258 | Return Value: Address of written debug entry |
259 | 259 | ||
260 | Description: writes debug entry with format string and varargs (longs) to | 260 | Description: writes debug entry with format string and varargs (longs) to |
261 | active debug area (if level <= actual debug level) and | 261 | active debug area (if level <= actual debug level) and |
262 | switches to next debug area. | 262 | switches to next debug area. |
263 | floats and long long datatypes cannot be used as varargs. | 263 | floats and long long datatypes cannot be used as varargs. |
264 | 264 | ||
265 | --------------------------------------------------------------------------- | 265 | --------------------------------------------------------------------------- |
266 | 266 | ||
267 | int debug_register_view (debug_info_t * id, struct debug_view *view); | 267 | int debug_register_view (debug_info_t * id, struct debug_view *view); |
268 | 268 | ||
269 | Parameter: id: handle for debug log | 269 | Parameter: id: handle for debug log |
270 | view: pointer to debug view struct | 270 | view: pointer to debug view struct |
271 | 271 | ||
272 | Return Value: 0 : ok | 272 | Return Value: 0 : ok |
273 | < 0: Error | 273 | < 0: Error |
274 | 274 | ||
275 | Description: registers new debug view and creates debugfs dir entry | 275 | Description: registers new debug view and creates debugfs dir entry |
276 | 276 | ||
277 | --------------------------------------------------------------------------- | 277 | --------------------------------------------------------------------------- |
278 | int debug_unregister_view (debug_info_t * id, struct debug_view *view); | 278 | int debug_unregister_view (debug_info_t * id, struct debug_view *view); |
279 | 279 | ||
280 | Parameter: id: handle for debug log | 280 | Parameter: id: handle for debug log |
281 | view: pointer to debug view struct | 281 | view: pointer to debug view struct |
282 | 282 | ||
283 | Return Value: 0 : ok | 283 | Return Value: 0 : ok |
284 | < 0: Error | 284 | < 0: Error |
285 | 285 | ||
286 | Description: unregisters debug view and removes debugfs dir entry | 286 | Description: unregisters debug view and removes debugfs dir entry |
287 | 287 | ||
288 | 288 | ||
289 | 289 | ||
290 | Predefined views: | 290 | Predefined views: |
291 | ----------------- | 291 | ----------------- |
292 | 292 | ||
293 | extern struct debug_view debug_hex_ascii_view; | 293 | extern struct debug_view debug_hex_ascii_view; |
294 | extern struct debug_view debug_raw_view; | 294 | extern struct debug_view debug_raw_view; |
295 | extern struct debug_view debug_sprintf_view; | 295 | extern struct debug_view debug_sprintf_view; |
296 | 296 | ||
297 | Examples | 297 | Examples |
298 | -------- | 298 | -------- |
299 | 299 | ||
300 | /* | 300 | /* |
301 | * hex_ascii- + raw-view Example | 301 | * hex_ascii- + raw-view Example |
302 | */ | 302 | */ |
303 | 303 | ||
304 | #include <linux/init.h> | 304 | #include <linux/init.h> |
305 | #include <asm/debug.h> | 305 | #include <asm/debug.h> |
306 | 306 | ||
307 | static debug_info_t* debug_info; | 307 | static debug_info_t* debug_info; |
308 | 308 | ||
309 | static int init(void) | 309 | static int init(void) |
310 | { | 310 | { |
311 | /* register 4 debug areas with one page each and 4 byte data field */ | 311 | /* register 4 debug areas with one page each and 4 byte data field */ |
312 | 312 | ||
313 | debug_info = debug_register ("test", 1, 4, 4 ); | 313 | debug_info = debug_register ("test", 1, 4, 4 ); |
314 | debug_register_view(debug_info,&debug_hex_ascii_view); | 314 | debug_register_view(debug_info,&debug_hex_ascii_view); |
315 | debug_register_view(debug_info,&debug_raw_view); | 315 | debug_register_view(debug_info,&debug_raw_view); |
316 | 316 | ||
317 | debug_text_event(debug_info, 4 , "one "); | 317 | debug_text_event(debug_info, 4 , "one "); |
318 | debug_int_exception(debug_info, 4, 4711); | 318 | debug_int_exception(debug_info, 4, 4711); |
319 | debug_event(debug_info, 3, &debug_info, 4); | 319 | debug_event(debug_info, 3, &debug_info, 4); |
320 | 320 | ||
321 | return 0; | 321 | return 0; |
322 | } | 322 | } |
323 | 323 | ||
324 | static void cleanup(void) | 324 | static void cleanup(void) |
325 | { | 325 | { |
326 | debug_unregister (debug_info); | 326 | debug_unregister (debug_info); |
327 | } | 327 | } |
328 | 328 | ||
329 | module_init(init); | 329 | module_init(init); |
330 | module_exit(cleanup); | 330 | module_exit(cleanup); |
331 | 331 | ||
332 | --------------------------------------------------------------------------- | 332 | --------------------------------------------------------------------------- |
333 | 333 | ||
334 | /* | 334 | /* |
335 | * sprintf-view Example | 335 | * sprintf-view Example |
336 | */ | 336 | */ |
337 | 337 | ||
338 | #include <linux/init.h> | 338 | #include <linux/init.h> |
339 | #include <asm/debug.h> | 339 | #include <asm/debug.h> |
340 | 340 | ||
341 | static debug_info_t* debug_info; | 341 | static debug_info_t* debug_info; |
342 | 342 | ||
343 | static int init(void) | 343 | static int init(void) |
344 | { | 344 | { |
345 | /* register 4 debug areas with one page each and data field for */ | 345 | /* register 4 debug areas with one page each and data field for */ |
346 | /* format string pointer + 2 varargs (= 3 * sizeof(long)) */ | 346 | /* format string pointer + 2 varargs (= 3 * sizeof(long)) */ |
347 | 347 | ||
348 | debug_info = debug_register ("test", 1, 4, sizeof(long) * 3); | 348 | debug_info = debug_register ("test", 1, 4, sizeof(long) * 3); |
349 | debug_register_view(debug_info,&debug_sprintf_view); | 349 | debug_register_view(debug_info,&debug_sprintf_view); |
350 | 350 | ||
351 | debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__); | 351 | debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__); |
352 | debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info); | 352 | debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info); |
353 | 353 | ||
354 | return 0; | 354 | return 0; |
355 | } | 355 | } |
356 | 356 | ||
357 | static void cleanup(void) | 357 | static void cleanup(void) |
358 | { | 358 | { |
359 | debug_unregister (debug_info); | 359 | debug_unregister (debug_info); |
360 | } | 360 | } |
361 | 361 | ||
362 | module_init(init); | 362 | module_init(init); |
363 | module_exit(cleanup); | 363 | module_exit(cleanup); |
364 | 364 | ||
365 | 365 | ||
366 | 366 | ||
367 | Debugfs Interface | 367 | Debugfs Interface |
368 | ---------------- | 368 | ---------------- |
369 | Views to the debug logs can be investigated by reading the corresponding | 369 | Views to the debug logs can be investigated by reading the corresponding |
370 | debugfs files: | 370 | debugfs files: |
371 | 371 | ||
372 | Example: | 372 | Example: |
373 | 373 | ||
374 | > ls /sys/kernel/debug/s390dbf/dasd | 374 | > ls /sys/kernel/debug/s390dbf/dasd |
375 | flush hex_ascii level pages raw | 375 | flush hex_ascii level pages raw |
376 | > cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort +1 | 376 | > cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort +1 |
377 | 00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | .... | 377 | 00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | .... |
378 | 00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE | 378 | 00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE |
379 | 00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | .... | 379 | 00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | .... |
380 | 00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP | 380 | 00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP |
381 | 01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD | 381 | 01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD |
382 | 01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | .... | 382 | 01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | .... |
383 | 01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ... | 383 | 01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ... |
384 | 01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | .... | 384 | 01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | .... |
385 | 01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE | 385 | 01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE |
386 | 01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | .... | 386 | 01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | .... |
387 | 387 | ||
388 | See section about predefined views for explanation of the above output! | 388 | See section about predefined views for explanation of the above output! |
389 | 389 | ||
390 | Changing the debug level | 390 | Changing the debug level |
391 | ------------------------ | 391 | ------------------------ |
392 | 392 | ||
393 | Example: | 393 | Example: |
394 | 394 | ||
395 | 395 | ||
396 | > cat /sys/kernel/debug/s390dbf/dasd/level | 396 | > cat /sys/kernel/debug/s390dbf/dasd/level |
397 | 3 | 397 | 3 |
398 | > echo "5" > /sys/kernel/debug/s390dbf/dasd/level | 398 | > echo "5" > /sys/kernel/debug/s390dbf/dasd/level |
399 | > cat /sys/kernel/debug/s390dbf/dasd/level | 399 | > cat /sys/kernel/debug/s390dbf/dasd/level |
400 | 5 | 400 | 5 |
401 | 401 | ||
402 | Flushing debug areas | 402 | Flushing debug areas |
403 | -------------------- | 403 | -------------------- |
404 | Debug areas can be flushed by piping the number of the desired | 404 | Debug areas can be flushed by piping the number of the desired |
405 | area (0...n) to the debugfs file "flush". When using "-" all debug areas | 405 | area (0...n) to the debugfs file "flush". When using "-" all debug areas |
406 | are flushed. | 406 | are flushed. |
407 | 407 | ||
408 | Examples: | 408 | Examples: |
409 | 409 | ||
410 | 1. Flush debug area 0: | 410 | 1. Flush debug area 0: |
411 | > echo "0" > /sys/kernel/debug/s390dbf/dasd/flush | 411 | > echo "0" > /sys/kernel/debug/s390dbf/dasd/flush |
412 | 412 | ||
413 | 2. Flush all debug areas: | 413 | 2. Flush all debug areas: |
414 | > echo "-" > /sys/kernel/debug/s390dbf/dasd/flush | 414 | > echo "-" > /sys/kernel/debug/s390dbf/dasd/flush |
415 | 415 | ||
416 | Changing the size of debug areas | 416 | Changing the size of debug areas |
417 | -------------------------------- | 417 | -------------------------------- |
418 | It is possible to change the size of debug areas through piping | 418 | It is possible to change the size of debug areas through piping |
419 | the number of pages to the debugfs file "pages". The resize request will | 419 | the number of pages to the debugfs file "pages". The resize request will |
420 | also flush the debug areas. | 420 | also flush the debug areas. |
421 | 421 | ||
422 | Example: | 422 | Example: |
423 | 423 | ||
424 | Define 4 pages for the debug areas of debug feature "dasd": | 424 | Define 4 pages for the debug areas of debug feature "dasd": |
425 | > echo "4" > /sys/kernel/debug/s390dbf/dasd/pages | 425 | > echo "4" > /sys/kernel/debug/s390dbf/dasd/pages |
426 | 426 | ||
427 | Stopping the debug feature | 427 | Stopping the debug feature |
428 | -------------------------- | 428 | -------------------------- |
429 | Example: | 429 | Example: |
430 | 430 | ||
431 | 1. Check if stopping is allowed | 431 | 1. Check if stopping is allowed |
432 | > cat /proc/sys/s390dbf/debug_stoppable | 432 | > cat /proc/sys/s390dbf/debug_stoppable |
433 | 2. Stop debug feature | 433 | 2. Stop debug feature |
434 | > echo 0 > /proc/sys/s390dbf/debug_active | 434 | > echo 0 > /proc/sys/s390dbf/debug_active |
435 | 435 | ||
436 | lcrash Interface | 436 | lcrash Interface |
437 | ---------------- | 437 | ---------------- |
438 | It is planned that the dump analysis tool lcrash gets an additional command | 438 | It is planned that the dump analysis tool lcrash gets an additional command |
439 | 's390dbf' to display all the debug logs. With this tool it will be possible | 439 | 's390dbf' to display all the debug logs. With this tool it will be possible |
440 | to investigate the debug logs on a live system and with a memory dump after | 440 | to investigate the debug logs on a live system and with a memory dump after |
441 | a system crash. | 441 | a system crash. |
442 | 442 | ||
443 | Investigating raw memory | 443 | Investigating raw memory |
444 | ------------------------ | 444 | ------------------------ |
445 | One last possibility to investigate the debug logs at a live | 445 | One last possibility to investigate the debug logs at a live |
446 | system and after a system crash is to look at the raw memory | 446 | system and after a system crash is to look at the raw memory |
447 | under VM or at the Service Element. | 447 | under VM or at the Service Element. |
448 | It is possible to find the anchor of the debug-logs through | 448 | It is possible to find the anchor of the debug-logs through |
449 | the 'debug_area_first' symbol in the System map. Then one has | 449 | the 'debug_area_first' symbol in the System map. Then one has |
450 | to follow the correct pointers of the data-structures defined | 450 | to follow the correct pointers of the data-structures defined |
451 | in debug.h and find the debug-areas in memory. | 451 | in debug.h and find the debug-areas in memory. |
452 | Normally modules which use the debug feature will also have | 452 | Normally modules which use the debug feature will also have |
453 | a global variable with the pointer to the debug-logs. Following | 453 | a global variable with the pointer to the debug-logs. Following |
454 | this pointer it will also be possible to find the debug logs in | 454 | this pointer it will also be possible to find the debug logs in |
455 | memory. | 455 | memory. |
456 | 456 | ||
457 | For this method it is recommended to use '16 * x + 4' byte (x = 0..n) | 457 | For this method it is recommended to use '16 * x + 4' byte (x = 0..n) |
458 | for the length of the data field in debug_register() in | 458 | for the length of the data field in debug_register() in |
459 | order to see the debug entries well formatted. | 459 | order to see the debug entries well formatted. |
460 | 460 | ||
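The '16 * x + 4' recommendation above can be turned into a small rounding helper. This is only a sketch: raw_friendly_len() is a hypothetical name and is not part of asm/debug.h. It returns the smallest length of the form 16 * x + 4 that still holds the requested number of bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the debug feature API.
 * Returns the smallest value of the form 16 * x + 4 (x = 0..n)
 * that is greater than or equal to len.
 */
size_t raw_friendly_len(size_t len)
{
	if (len <= 4)
		return 4;	/* x = 0 */

	/* round (len - 4) up to the next multiple of 16, then add 4 back */
	return ((len - 4 + 15) / 16) * 16 + 4;
}
```

The result could then be passed as the data field length to debug_register(), e.g. raw_friendly_len(sizeof(long) * 3).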
461 | 461 | ||
462 | Predefined Views | 462 | Predefined Views |
463 | ---------------- | 463 | ---------------- |
464 | 464 | ||
465 | There are three predefined views: hex_ascii, raw and sprintf. | 465 | There are three predefined views: hex_ascii, raw and sprintf. |
466 | The hex_ascii view shows the data field in hex and ascii representation | 466 | The hex_ascii view shows the data field in hex and ascii representation |
467 | (e.g. '45 43 4b 44 | ECKD'). | 467 | (e.g. '45 43 4b 44 | ECKD'). |
468 | The raw view returns a bytestream as the debug areas are stored in memory. | 468 | The raw view returns a bytestream as the debug areas are stored in memory. |
469 | 469 | ||
470 | The sprintf view formats the debug entries in the same way as the sprintf | 470 | The sprintf view formats the debug entries in the same way as the sprintf |
471 | function would do. The sprintf event/exception functions write to the | 471 | function would do. The sprintf event/exception functions write to the |
472 | debug entry a pointer to the format string (size = sizeof(long)) | 472 | debug entry a pointer to the format string (size = sizeof(long)) |
473 | and for each vararg a long value. So e.g. for a debug entry with a format | 473 | and for each vararg a long value. So e.g. for a debug entry with a format |
474 | string plus two varargs one would need to allocate a (3 * sizeof(long)) | 474 | string plus two varargs one would need to allocate a (3 * sizeof(long)) |
475 | byte data area in the debug_register() function. | 475 | byte data area in the debug_register() function. |
476 | 476 | ||
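The sizing rule for sprintf entries described above (one long for the format string pointer plus one long per vararg) can be written down as a one-line calculation. The helper name sprintf_entry_size() is an assumption made for illustration; it does not exist in asm/debug.h.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the debug feature API.
 * Bytes needed in the data field for one sprintf entry:
 * one long for the format string pointer plus one long per vararg.
 */
size_t sprintf_entry_size(unsigned int nr_varargs)
{
	return (1 + (size_t)nr_varargs) * sizeof(long);
}
```

For a format string with two varargs this yields 3 * sizeof(long), which matches the value used in the sprintf-view example.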
477 | 477 | ||
478 | NOTE: If using the sprintf view do NOT use other event/exception functions | 478 | NOTE: If using the sprintf view do NOT use other event/exception functions |
479 | than the sprintf-event and -exception functions. | 479 | than the sprintf-event and -exception functions. |
480 | 480 | ||
481 | The format of the hex_ascii and sprintf view is as follows: | 481 | The format of the hex_ascii and sprintf view is as follows: |
482 | - Number of area | 482 | - Number of area |
483 | - Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated | 483 | - Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated |
484 | Universal Time (UTC), January 1, 1970) | 484 | Universal Time (UTC), January 1, 1970) |
485 | - level of debug entry | 485 | - level of debug entry |
486 | - Exception flag (* = Exception) | 486 | - Exception flag (* = Exception) |
487 | - Cpu-Number of calling task | 487 | - Cpu-Number of calling task |
488 | - Return Address to caller | 488 | - Return Address to caller |
489 | - data field | 489 | - data field |
490 | 490 | ||
491 | The format of the raw view is: | 491 | The format of the raw view is: |
492 | - Header as described in debug.h | 492 | - Header as described in debug.h |
493 | - datafield | 493 | - datafield |
494 | 494 | ||
495 | A typical line of the hex_ascii view will look like the following (first line | 495 | A typical line of the hex_ascii view will look like the following (first line |
496 | is only for explanation and will not be displayed when 'cating' the view): | 496 | is only for explanation and will not be displayed when 'cating' the view): |
497 | 497 | ||
498 | area time level exception cpu caller data (hex + ascii) | 498 | area time level exception cpu caller data (hex + ascii) |
499 | -------------------------------------------------------------------------- | 499 | -------------------------------------------------------------------------- |
500 | 00 00964419409:440690 1 - 00 88023fe | 500 | 00 00964419409:440690 1 - 00 88023fe |
501 | 501 | ||
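To illustrate the data column of the hex_ascii view (e.g. '45 43 4b 44 | ECKD'), here is a userspace sketch that renders a byte buffer the same way. It is not the kernel's implementation: format_hex_ascii() is a hypothetical name, and printing a dot for every byte that isprint() rejects is an assumption; the kernel's actual rule may differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Illustrative only, not the kernel's hex_ascii view code.
 * Writes "xx xx ... | cccc" into out and returns the number of
 * characters written (excluding the trailing NUL), or -1 if the
 * output buffer is too small.
 */
int format_hex_ascii(const unsigned char *data, size_t len,
		     char *out, size_t out_size)
{
	size_t i, n = 0;

	/* 3 chars per hex byte, 2 for "| ", 1 per ascii char, 1 for NUL */
	if (out_size < 4 * len + 3)
		return -1;

	for (i = 0; i < len; i++)
		n += sprintf(out + n, "%02x ", data[i]);
	n += sprintf(out + n, "| ");

	/* assumed rule: non-printable bytes are shown as '.' */
	for (i = 0; i < len; i++)
		out[n++] = isprint(data[i]) ? (char)data[i] : '.';
	out[n] = '\0';

	return (int)n;
}
```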
502 | 502 | ||
503 | Defining views | 503 | Defining views |
504 | -------------- | 504 | -------------- |
505 | 505 | ||
506 | Views are specified with the 'debug_view' structure. It defines | 506 | Views are specified with the 'debug_view' structure. It defines |
507 | callback functions which are used for reading and writing the debugfs files: | 507 | callback functions which are used for reading and writing the debugfs files: |
508 | 508 | ||
509 | struct debug_view { | 509 | struct debug_view { |
510 | char name[DEBUG_MAX_PROCF_LEN]; | 510 | char name[DEBUG_MAX_PROCF_LEN]; |
511 | debug_prolog_proc_t* prolog_proc; | 511 | debug_prolog_proc_t* prolog_proc; |
512 | debug_header_proc_t* header_proc; | 512 | debug_header_proc_t* header_proc; |
513 | debug_format_proc_t* format_proc; | 513 | debug_format_proc_t* format_proc; |
514 | debug_input_proc_t* input_proc; | 514 | debug_input_proc_t* input_proc; |
515 | void* private_data; | 515 | void* private_data; |
516 | }; | 516 | }; |
517 | 517 | ||
518 | where | 518 | where |
519 | 519 | ||
520 | typedef int (debug_header_proc_t) (debug_info_t* id, | 520 | typedef int (debug_header_proc_t) (debug_info_t* id, |
521 | struct debug_view* view, | 521 | struct debug_view* view, |
522 | int area, | 522 | int area, |
523 | debug_entry_t* entry, | 523 | debug_entry_t* entry, |
524 | char* out_buf); | 524 | char* out_buf); |
525 | 525 | ||
526 | typedef int (debug_format_proc_t) (debug_info_t* id, | 526 | typedef int (debug_format_proc_t) (debug_info_t* id, |
527 | struct debug_view* view, char* out_buf, | 527 | struct debug_view* view, char* out_buf, |
528 | const char* in_buf); | 528 | const char* in_buf); |
529 | typedef int (debug_prolog_proc_t) (debug_info_t* id, | 529 | typedef int (debug_prolog_proc_t) (debug_info_t* id, |
530 | struct debug_view* view, | 530 | struct debug_view* view, |
531 | char* out_buf); | 531 | char* out_buf); |
532 | typedef int (debug_input_proc_t) (debug_info_t* id, | 532 | typedef int (debug_input_proc_t) (debug_info_t* id, |
533 | struct debug_view* view, | 533 | struct debug_view* view, |
534 | struct file* file, const char* user_buf, | 534 | struct file* file, const char* user_buf, |
535 | size_t in_buf_size, loff_t* offset); | 535 | size_t in_buf_size, loff_t* offset); |
536 | 536 | ||
537 | 537 | ||
538 | The "private_data" member can be used as pointer to view specific data. | 538 | The "private_data" member can be used as pointer to view specific data. |
539 | It is not used by the debug feature itself. | 539 | It is not used by the debug feature itself. |
540 | 540 | ||
541 | The output when reading a debugfs file is structured like this: | 541 | The output when reading a debugfs file is structured like this: |
542 | 542 | ||
543 | "prolog_proc output" | 543 | "prolog_proc output" |
544 | 544 | ||
545 | "header_proc output 1" "format_proc output 1" | 545 | "header_proc output 1" "format_proc output 1" |
546 | "header_proc output 2" "format_proc output 2" | 546 | "header_proc output 2" "format_proc output 2" |
547 | "header_proc output 3" "format_proc output 3" | 547 | "header_proc output 3" "format_proc output 3" |
548 | ... | 548 | ... |
549 | 549 | ||
550 | When a view is read from the debugfs, the Debug Feature calls the | 550 | When a view is read from the debugfs, the Debug Feature calls the |
551 | 'prolog_proc' once for writing the prolog. | 551 | 'prolog_proc' once for writing the prolog. |
552 | Then 'header_proc' and 'format_proc' are called for each | 552 | Then 'header_proc' and 'format_proc' are called for each |
553 | existing debug entry. | 553 | existing debug entry. |
554 | 554 | ||
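The call order described above (prolog once, then header and format for each entry) can be sketched in plain userspace C. Everything here is a simplified stand-in: struct entry, struct view, render_view() and the demo callbacks are hypothetical and have different signatures than the real debug_view callbacks. Skipping NULL callbacks is assumed from the view example, which passes NULL for "no prolog".

```c
#include <stddef.h>
#include <stdio.h>

struct entry { int value; };	/* hypothetical stand-in for debug_entry_t */

/* simplified callback types; each returns the number of chars written */
typedef int (*prolog_fn)(char *out);
typedef int (*header_fn)(const struct entry *e, char *out);
typedef int (*format_fn)(const struct entry *e, char *out);

struct view {
	prolog_fn prolog;
	header_fn header;
	format_fn format;
};

/*
 * Calls prolog once, then header and format for every entry.
 * The caller must provide an output buffer that is large enough.
 */
int render_view(const struct view *v, const struct entry *entries,
		size_t n, char *out)
{
	int len = 0;
	size_t i;

	if (v->prolog)
		len += v->prolog(out + len);
	for (i = 0; i < n; i++) {
		if (v->header)
			len += v->header(&entries[i], out + len);
		if (v->format)
			len += v->format(&entries[i], out + len);
	}
	return len;
}

/* demo callbacks used only to exercise render_view() */
int demo_header(const struct entry *e, char *out)
{
	(void)e;
	return sprintf(out, "hdr ");
}

int demo_format(const struct entry *e, char *out)
{
	return sprintf(out, "%d\n", e->value);
}
```

With a view of { NULL, demo_header, demo_format } and two entries holding 1 and 2, the buffer ends up containing one "hdr <value>" line per entry and no prolog.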
555 | The input_proc can be used to implement functionality when the view is | 555 | The input_proc can be used to implement functionality when the view is |
556 | written to (e.g. with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level'). | 556 | written to (e.g. with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level'). |
557 | 557 | ||
558 | For header_proc there can be used the default function | 558 | For header_proc there can be used the default function |
559 | debug_dflt_header_fn() which is defined in in debug.h. | 559 | debug_dflt_header_fn() which is defined in debug.h. |
560 | and which produces the same header output as the predefined views. | 560 | and which produces the same header output as the predefined views. |
561 | E.g: | 561 | E.g: |
562 | 00 00964419409:440761 2 - 00 88023ec | 562 | 00 00964419409:440761 2 - 00 88023ec |
563 | 563 | ||
564 | In order to see how to use the callback functions check the implementation | 564 | In order to see how to use the callback functions check the implementation |
565 | of the default views! | 565 | of the default views! |
566 | 566 | ||
567 | Example | 567 | Example |
568 | 568 | ||
569 | #include <asm/debug.h> | 569 | #include <asm/debug.h> |
570 | 570 | ||
571 | #define UNKNOWNSTR "data: %08x" | 571 | #define UNKNOWNSTR "data: %08x" |
572 | 572 | ||
573 | const char* messages[] = | 573 | const char* messages[] = |
574 | {"This error...........\n", | 574 | {"This error...........\n", |
575 | "That error...........\n", | 575 | "That error...........\n", |
576 | "Problem..............\n", | 576 | "Problem..............\n", |
577 | "Something went wrong.\n", | 577 | "Something went wrong.\n", |
578 | "Everything ok........\n", | 578 | "Everything ok........\n", |
579 | NULL | 579 | NULL |
580 | }; | 580 | }; |
581 | 581 | ||
582 | static int debug_test_format_fn( | 582 | static int debug_test_format_fn( |
583 | debug_info_t * id, struct debug_view *view, | 583 | debug_info_t * id, struct debug_view *view, |
584 | char *out_buf, const char *in_buf | 584 | char *out_buf, const char *in_buf |
585 | ) | 585 | ) |
586 | { | 586 | { |
587 | int i, rc = 0; | 587 | int i, rc = 0; |
588 | 588 | ||
589 | if(id->buf_size >= 4) { | 589 | if(id->buf_size >= 4) { |
590 | int msg_nr = *((int*)in_buf); | 590 | int msg_nr = *((int*)in_buf); |
591 | if(msg_nr < sizeof(messages)/sizeof(char*) - 1) | 591 | if(msg_nr < sizeof(messages)/sizeof(char*) - 1) |
592 | rc += sprintf(out_buf, "%s", messages[msg_nr]); | 592 | rc += sprintf(out_buf, "%s", messages[msg_nr]); |
593 | else | 593 | else |
594 | rc += sprintf(out_buf, UNKNOWNSTR, msg_nr); | 594 | rc += sprintf(out_buf, UNKNOWNSTR, msg_nr); |
595 | } | 595 | } |
596 | out: | 596 | out: |
597 | return rc; | 597 | return rc; |
598 | } | 598 | } |
599 | 599 | ||
600 | struct debug_view debug_test_view = { | 600 | struct debug_view debug_test_view = { |
601 | "myview", /* name of view */ | 601 | "myview", /* name of view */ |
602 | NULL, /* no prolog */ | 602 | NULL, /* no prolog */ |
603 | &debug_dflt_header_fn, /* default header for each entry */ | 603 | &debug_dflt_header_fn, /* default header for each entry */ |
604 | &debug_test_format_fn, /* our own format function */ | 604 | &debug_test_format_fn, /* our own format function */ |
605 | NULL, /* no input function */ | 605 | NULL, /* no input function */ |
606 | NULL /* no private data */ | 606 | NULL /* no private data */ |
607 | }; | 607 | }; |
608 | 608 | ||
609 | ===== | 609 | ===== |
610 | test: | 610 | test: |
611 | ===== | 611 | ===== |
612 | debug_info_t *debug_info; | 612 | debug_info_t *debug_info; |
613 | ... | 613 | ... |
614 | debug_info = debug_register ("test", 0, 4, 4 ); | 614 | debug_info = debug_register ("test", 0, 4, 4 ); |
615 | debug_register_view(debug_info, &debug_test_view); | 615 | debug_register_view(debug_info, &debug_test_view); |
616 | for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i); | 616 | for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i); |
617 | 617 | ||
618 | > cat /sys/kernel/debug/s390dbf/test/myview | 618 | > cat /sys/kernel/debug/s390dbf/test/myview |
619 | 00 00964419734:611402 1 - 00 88042ca This error........... | 619 | 00 00964419734:611402 1 - 00 88042ca This error........... |
620 | 00 00964419734:611405 1 - 00 88042ca That error........... | 620 | 00 00964419734:611405 1 - 00 88042ca That error........... |
621 | 00 00964419734:611408 1 - 00 88042ca Problem.............. | 621 | 00 00964419734:611408 1 - 00 88042ca Problem.............. |
622 | 00 00964419734:611411 1 - 00 88042ca Something went wrong. | 622 | 00 00964419734:611411 1 - 00 88042ca Something went wrong. |
623 | 00 00964419734:611414 1 - 00 88042ca Everything ok........ | 623 | 00 00964419734:611414 1 - 00 88042ca Everything ok........ |
624 | 00 00964419734:611417 1 - 00 88042ca data: 00000005 | 624 | 00 00964419734:611417 1 - 00 88042ca data: 00000005 |
625 | 00 00964419734:611419 1 - 00 88042ca data: 00000006 | 625 | 00 00964419734:611419 1 - 00 88042ca data: 00000006 |
626 | 00 00964419734:611422 1 - 00 88042ca data: 00000007 | 626 | 00 00964419734:611422 1 - 00 88042ca data: 00000007 |
627 | 00 00964419734:611425 1 - 00 88042ca data: 00000008 | 627 | 00 00964419734:611425 1 - 00 88042ca data: 00000008 |
628 | 00 00964419734:611428 1 - 00 88042ca data: 00000009 | 628 | 00 00964419734:611428 1 - 00 88042ca data: 00000009 |
629 | 629 |
Documentation/scsi/ChangeLog.1992-1997
1 | Sat Jan 18 15:51:45 1997 Richard Henderson <rth@tamu.edu> | 1 | Sat Jan 18 15:51:45 1997 Richard Henderson <rth@tamu.edu> |
2 | 2 | ||
3 | * Don't play with usage_count directly, instead hand around | 3 | * Don't play with usage_count directly, instead hand around |
4 | the module header and use the module macros. | 4 | the module header and use the module macros. |
5 | 5 | ||
6 | Fri May 17 00:00:00 1996 Leonard N. Zubkoff <lnz@dandelion.com> | 6 | Fri May 17 00:00:00 1996 Leonard N. Zubkoff <lnz@dandelion.com> |
7 | 7 | ||
8 | * BusLogic Driver Version 2.0.3 Released. | 8 | * BusLogic Driver Version 2.0.3 Released. |
9 | 9 | ||
10 | Tue Apr 16 21:00:00 1996 Leonard N. Zubkoff <lnz@dandelion.com> | 10 | Tue Apr 16 21:00:00 1996 Leonard N. Zubkoff <lnz@dandelion.com> |
11 | 11 | ||
12 | * BusLogic Driver Version 1.3.2 Released. | 12 | * BusLogic Driver Version 1.3.2 Released. |
13 | 13 | ||
14 | Sun Dec 31 23:26:00 1995 Leonard N. Zubkoff <lnz@dandelion.com> | 14 | Sun Dec 31 23:26:00 1995 Leonard N. Zubkoff <lnz@dandelion.com> |
15 | 15 | ||
16 | * BusLogic Driver Version 1.3.1 Released. | 16 | * BusLogic Driver Version 1.3.1 Released. |
17 | 17 | ||
18 | Fri Nov 10 15:29:49 1995 Leonard N. Zubkoff <lnz@dandelion.com> | 18 | Fri Nov 10 15:29:49 1995 Leonard N. Zubkoff <lnz@dandelion.com> |
19 | 19 | ||
20 | * Released new BusLogic driver. | 20 | * Released new BusLogic driver. |
21 | 21 | ||
22 | Wed Aug 9 22:37:04 1995 Andries Brouwer <aeb@cwi.nl> | 22 | Wed Aug 9 22:37:04 1995 Andries Brouwer <aeb@cwi.nl> |
23 | 23 | ||
24 | As a preparation for new device code, separated the various | 24 | As a preparation for new device code, separated the various |
25 | functions the request->dev field had into the device proper, | 25 | functions the request->dev field had into the device proper, |
26 | request->rq_dev and a status field request->rq_status. | 26 | request->rq_dev and a status field request->rq_status. |
27 | 27 | ||
28 | The 2nd argument of bios_param is now a kdev_t. | 28 | The 2nd argument of bios_param is now a kdev_t. |
29 | 29 | ||
30 | Wed Jul 19 10:43:15 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 30 | Wed Jul 19 10:43:15 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
31 | 31 | ||
32 | * scsi.c (scsi_proc_info): /proc/scsi/scsi now also lists all | 32 | * scsi.c (scsi_proc_info): /proc/scsi/scsi now also lists all |
33 | attached devices. | 33 | attached devices. |
34 | 34 | ||
35 | * scsi_proc.c (proc_print_scsidevice): Added. Used by scsi.c and | 35 | * scsi_proc.c (proc_print_scsidevice): Added. Used by scsi.c and |
36 | eata_dma_proc.c to produce some device info for /proc/scsi. | 36 | eata_dma_proc.c to produce some device info for /proc/scsi. |
37 | 37 | ||
38 | * eata_dma.c (eata_queue)(eata_int_handler)(eata_scsi_done): | 38 | * eata_dma.c (eata_queue)(eata_int_handler)(eata_scsi_done): |
39 | Changed handling of internal SCSI commands sent to the HBA. | 39 | Changed handling of internal SCSI commands sent to the HBA. |
40 | 40 | ||
41 | 41 | ||
42 | Wed Jul 19 10:09:17 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 42 | Wed Jul 19 10:09:17 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
43 | 43 | ||
44 | * Linux 1.3.11 released. | 44 | * Linux 1.3.11 released. |
45 | 45 | ||
46 | * eata_dma.c (eata_queue)(eata_int_handler): Added code to do | 46 | * eata_dma.c (eata_queue)(eata_int_handler): Added code to do |
47 | command latency measurements if requested by root through | 47 | command latency measurements if requested by root through |
48 | /proc/scsi interface. | 48 | /proc/scsi interface. |
49 | Throughout Use HZ constant for time references. | 49 | Throughout Use HZ constant for time references. |
50 | 50 | ||
51 | * eata_pio.c: Use HZ constant for time references. | 51 | * eata_pio.c: Use HZ constant for time references. |
52 | 52 | ||
53 | * aic7xxx.c, aic7xxx.h, aic7xxx_asm.c: Changed copyright from BSD | 53 | * aic7xxx.c, aic7xxx.h, aic7xxx_asm.c: Changed copyright from BSD |
54 | to GNU style. | 54 | to GNU style. |
55 | 55 | ||
56 | * scsi.h: Added READ_12 command opcode constant | 56 | * scsi.h: Added READ_12 command opcode constant |
57 | 57 | ||
58 | Wed Jul 19 09:25:30 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 58 | Wed Jul 19 09:25:30 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
59 | 59 | ||
60 | * Linux 1.3.10 released. | 60 | * Linux 1.3.10 released. |
61 | 61 | ||
62 | * scsi_proc.c (dispatch_scsi_info): Removed unused variable. | 62 | * scsi_proc.c (dispatch_scsi_info): Removed unused variable. |
63 | 63 | ||
64 | Wed Jul 19 09:25:30 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 64 | Wed Jul 19 09:25:30 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
65 | 65 | ||
66 | * Linux 1.3.9 released. | 66 | * Linux 1.3.9 released. |
67 | 67 | ||
68 | * scsi.c Blacklist concept expanded to 'support' more device | 68 | * scsi.c Blacklist concept expanded to 'support' more device |
69 | deficiencies. blacklist[] renamed to device_list[] | 69 | deficiencies. blacklist[] renamed to device_list[] |
70 | (scan_scsis): Code cleanup. | 70 | (scan_scsis): Code cleanup. |
71 | 71 | ||
72 | * scsi_debug.c (scsi_debug_proc_info): Added support to control | 72 | * scsi_debug.c (scsi_debug_proc_info): Added support to control |
73 | device lockup simulation via /proc/scsi interface. | 73 | device lockup simulation via /proc/scsi interface. |
74 | 74 | ||
75 | 75 | ||
76 | Wed Jul 19 09:22:34 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 76 | Wed Jul 19 09:22:34 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
77 | 77 | ||
78 | * Linux 1.3.7 released. | 78 | * Linux 1.3.7 released. |
79 | 79 | ||
80 | * scsi_proc.c: Fixed a number of bugs in directory handling | 80 | * scsi_proc.c: Fixed a number of bugs in directory handling |
81 | 81 | ||
82 | Wed Jul 19 09:18:28 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 82 | Wed Jul 19 09:18:28 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
83 | 83 | ||
84 | * Linux 1.3.5 released. | 84 | * Linux 1.3.5 released. |
85 | 85 | ||
86 | * Native wide, multichannel and /proc/scsi support now in official | 86 | * Native wide, multichannel and /proc/scsi support now in official |
87 | kernel distribution. | 87 | kernel distribution. |
88 | 88 | ||
89 | * scsi.c/h, hosts.c/h et al reindented to increase readability | 89 | * scsi.c/h, hosts.c/h et al reindented to increase readability |
90 | (especially on 80 column wide terminals). | 90 | (especially on 80 column wide terminals). |
91 | 91 | ||
92 | * scsi.c, scsi_proc.c, ../../fs/proc/inode.c: Added | 92 | * scsi.c, scsi_proc.c, ../../fs/proc/inode.c: Added |
93 | /proc/scsi/scsi which allows root to scan for hotplugged devices. | 93 | /proc/scsi/scsi which allows root to scan for hotplugged devices. |
94 | 94 | ||
95 | * scsi.c (scsi_proc_info): Added, to support /proc/scsi/scsi. | 95 | * scsi.c (scsi_proc_info): Added, to support /proc/scsi/scsi. |
96 | (scan_scsis): Added some 'spaghetti' code to allow scanning for | 96 | (scan_scsis): Added some 'spaghetti' code to allow scanning for |
97 | single devices. | 97 | single devices. |
98 | 98 | ||
99 | 99 | ||
100 | Thu Jun 20 15:20:27 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 100 | Thu Jun 20 15:20:27 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
101 | 101 | ||
102 | * proc.c: Renamed to scsi_proc.c | 102 | * proc.c: Renamed to scsi_proc.c |
103 | 103 | ||
104 | Mon Jun 12 20:32:45 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 104 | Mon Jun 12 20:32:45 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
105 | 105 | ||
106 | * Linux 1.3.0 released. | 106 | * Linux 1.3.0 released. |
107 | 107 | ||
108 | Mon May 15 19:33:14 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 108 | Mon May 15 19:33:14 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
109 | 109 | ||
110 | * scsi.c: Added native multichannel and wide scsi support. | 110 | * scsi.c: Added native multichannel and wide scsi support. |
111 | 111 | ||
112 | * proc.c (dispatch_scsi_info) (build_proc_dir_hba_entries): | 112 | * proc.c (dispatch_scsi_info) (build_proc_dir_hba_entries): |
113 | Updated /proc/scsi interface. | 113 | Updated /proc/scsi interface. |
114 | 114 | ||
115 | Thu May 4 17:58:48 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> | 115 | Thu May 4 17:58:48 1995 Michael Neuffer <neuffer@goofy.zdv.uni-mainz.de> |
116 | 116 | ||
117 | * sd.c (requeue_sd_request): Zero out the scatterlist only if | 117 | * sd.c (requeue_sd_request): Zero out the scatterlist only if |
118 | scsi_malloc returned memory for it. | 118 | scsi_malloc returned memory for it. |
119 | 119 | ||
120 | * eata_dma.c (register_HBA) (eata_queue): Add support for | 120 | * eata_dma.c (register_HBA) (eata_queue): Add support for |
121 | large scatter/gather tables and set use_clustering accordingly | 121 | large scatter/gather tables and set use_clustering accordingly |
122 | 122 | ||
123 | * hosts.c: Make use_clustering changeable in the Scsi_Host structure. | 123 | * hosts.c: Make use_clustering changeable in the Scsi_Host structure. |
124 | 124 | ||
125 | Wed Apr 12 15:25:52 1995 Eric Youngdale (eric@andante) | 125 | Wed Apr 12 15:25:52 1995 Eric Youngdale (eric@andante) |
126 | 126 | ||
127 | * Linux 1.2.5 released. | 127 | * Linux 1.2.5 released. |
128 | 128 | ||
129 | * buslogic.c: Update to version 1.15 (From Leonard N. Zubkoff). | 129 | * buslogic.c: Update to version 1.15 (From Leonard N. Zubkoff). |
130 | Fixed interrupt routine to avoid races when handling multiple | 130 | Fixed interrupt routine to avoid races when handling multiple |
131 | complete commands per interrupt. Seems to come up with faster | 131 | complete commands per interrupt. Seems to come up with faster |
132 | cards. | 132 | cards. |
133 | 133 | ||
134 | * eata_dma.c: Update to 2.3.5r. Modularize. Improved error handling | 134 | * eata_dma.c: Update to 2.3.5r. Modularize. Improved error handling |
135 | throughout and fixed bug in interrupt routine which resulted in shifted | 135 | throughout and fixed bug in interrupt routine which resulted in shifted |
136 | status bytes. Added blink LED state checks for ISA and EISA HBAs. | 136 | status bytes. Added blink LED state checks for ISA and EISA HBAs. |
137 | Memory management bug seems to have disappeared ==> increasing | 137 | Memory management bug seems to have disappeared ==> increasing |
138 | C_P_L_CURRENT_MAX to 16 for now. Decreasing C_P_L_DIV to 3 for | 138 | C_P_L_CURRENT_MAX to 16 for now. Decreasing C_P_L_DIV to 3 for |
139 | performance reasons. | 139 | performance reasons. |
140 | 140 | ||
141 | * scsi.c: If we get a FMK, EOM, or ILI when attempting to scan | 141 | * scsi.c: If we get a FMK, EOM, or ILI when attempting to scan |
142 | the bus, assume that it was just noise on the bus, and ignore | 142 | the bus, assume that it was just noise on the bus, and ignore |
143 | the device. | 143 | the device. |
144 | 144 | ||
145 | * scsi.h: Update and add a bunch of missing commands which we | 145 | * scsi.h: Update and add a bunch of missing commands which we |
146 | were never using. | 146 | were never using. |
147 | 147 | ||
148 | * sd.c: Use restore_flags in do_sd_request - this may result in | 148 | * sd.c: Use restore_flags in do_sd_request - this may result in |
149 | latency conditions, but it gets rid of races and crashes. | 149 | latency conditions, but it gets rid of races and crashes. |
150 | Do not save flags again when searching for a second command to | 150 | Do not save flags again when searching for a second command to |
151 | queue. | 151 | queue. |
152 | 152 | ||
153 | * st.c: Use bytes, not STP->buffer->buffer_size when reading | 153 | * st.c: Use bytes, not STP->buffer->buffer_size when reading |
154 | from tape. | 154 | from tape. |
155 | 155 | ||
156 | 156 | ||
157 | Tue Apr 4 09:42:08 1995 Eric Youngdale (eric@andante) | 157 | Tue Apr 4 09:42:08 1995 Eric Youngdale (eric@andante) |
158 | 158 | ||
159 | * Linux 1.2.4 released. | 159 | * Linux 1.2.4 released. |
160 | 160 | ||
161 | * st.c: Fix typo - restoring wrong flags. | 161 | * st.c: Fix typo - restoring wrong flags. |
162 | 162 | ||
163 | Wed Mar 29 06:55:12 1995 Eric Youngdale (eric@andante) | 163 | Wed Mar 29 06:55:12 1995 Eric Youngdale (eric@andante) |
164 | 164 | ||
165 | * Linux 1.2.3 released. | 165 | * Linux 1.2.3 released. |
166 | 166 | ||
167 | * st.c: Perform some waiting operations with interrupts off. | 167 | * st.c: Perform some waiting operations with interrupts off. |
168 | Is this correct??? | 168 | Is this correct??? |
169 | 169 | ||
170 | Wed Mar 22 10:34:26 1995 Eric Youngdale (eric@andante) | 170 | Wed Mar 22 10:34:26 1995 Eric Youngdale (eric@andante) |
171 | 171 | ||
172 | * Linux 1.2.2 released. | 172 | * Linux 1.2.2 released. |
173 | 173 | ||
174 | * aha152x.c: Modularize. Add support for PCMCIA. | 174 | * aha152x.c: Modularize. Add support for PCMCIA. |
175 | 175 | ||
176 | * eata.c: Update to version 2.0. Fixed bug preventing media | 176 | * eata.c: Update to version 2.0. Fixed bug preventing media |
177 | detection. If scsi_register_host returns NULL, fail gracefully. | 177 | detection. If scsi_register_host returns NULL, fail gracefully. |
178 | 178 | ||
179 | * scsi.c: Detect as NEC (for photo-cd purposes) for the 84 | 179 | * scsi.c: Detect as NEC (for photo-cd purposes) for the 84 |
180 | and 25 models as "NEC_OLDCDR". | 180 | and 25 models as "NEC_OLDCDR". |
181 | 181 | ||
182 | * scsi.h: Add define for NEC_OLDCDR | 182 | * scsi.h: Add define for NEC_OLDCDR |
183 | 183 | ||
184 | * sr.c: Add handling for NEC_OLDCDR. Treat as unknown. | 184 | * sr.c: Add handling for NEC_OLDCDR. Treat as unknown. |
185 | 185 | ||
186 | * u14-34f.c: Update to version 2.0. Fixed same bug as in | 186 | * u14-34f.c: Update to version 2.0. Fixed same bug as in |
187 | eata.c. | 187 | eata.c. |
188 | 188 | ||
189 | 189 | ||
190 | Mon Mar 6 11:11:20 1995 Eric Youngdale (eric@andante) | 190 | Mon Mar 6 11:11:20 1995 Eric Youngdale (eric@andante) |
191 | 191 | ||
192 | * Linux 1.2.0 released. Yeah!!! | 192 | * Linux 1.2.0 released. Yeah!!! |
193 | 193 | ||
194 | * Minor spelling/punctuation changes throughout. Nothing | 194 | * Minor spelling/punctuation changes throughout. Nothing |
195 | substantive. | 195 | substantive. |
196 | 196 | ||
197 | Mon Feb 20 21:33:03 1995 Eric Youngdale (eric@andante) | 197 | Mon Feb 20 21:33:03 1995 Eric Youngdale (eric@andante) |
198 | 198 | ||
199 | * Linux 1.1.95 released. | 199 | * Linux 1.1.95 released. |
200 | 200 | ||
201 | * qlogic.c: Update to version 0.41. | 201 | * qlogic.c: Update to version 0.41. |
202 | 202 | ||
203 | * seagate.c: Change some messages to be more descriptive about what | 203 | * seagate.c: Change some messages to be more descriptive about what |
204 | we detected. | 204 | we detected. |
205 | 205 | ||
206 | * sr.c: spelling/whitespace changes. | 206 | * sr.c: spelling/whitespace changes. |
207 | 207 | ||
208 | Mon Feb 20 21:33:03 1995 Eric Youngdale (eric@andante) | 208 | Mon Feb 20 21:33:03 1995 Eric Youngdale (eric@andante) |
209 | 209 | ||
210 | * Linux 1.1.94 released. | 210 | * Linux 1.1.94 released. |
211 | 211 | ||
212 | Mon Feb 20 08:57:17 1995 Eric Youngdale (eric@andante) | 212 | Mon Feb 20 08:57:17 1995 Eric Youngdale (eric@andante) |
213 | 213 | ||
214 | * Linux 1.1.93 released. | 214 | * Linux 1.1.93 released. |
215 | 215 | ||
216 | * hosts.h: Change io_port to long int from short. | 216 | * hosts.h: Change io_port to long int from short. |
217 | 217 | ||
218 | * 53c7,8xx.c: crash on AEN fixed, SCSI reset is no longer a NOP, | 218 | * 53c7,8xx.c: crash on AEN fixed, SCSI reset is no longer a NOP, |
219 | NULL pointer panic on odd UDCs fixed, two bugs in diagnostic output | 219 | NULL pointer panic on odd UDCs fixed, two bugs in diagnostic output |
220 | fixed, should initialize correctly if left running, now loadable, | 220 | fixed, should initialize correctly if left running, now loadable, |
221 | new memory allocation, extraneous diagnostic output suppressed, | 221 | new memory allocation, extraneous diagnostic output suppressed, |
222 | splx() replaced with save/restore flags. [ Drew ] | 222 | splx() replaced with save/restore flags. [ Drew ] |
223 | 223 | ||
224 | * hosts.c, hosts.h, scsi_ioctl.c, sd.c, sd_ioctl.c, sg.c, sr.c, | 224 | * hosts.c, hosts.h, scsi_ioctl.c, sd.c, sd_ioctl.c, sg.c, sr.c, |
225 | sr_ioctl.c: Add special junk at end that Emacs will use for | 225 | sr_ioctl.c: Add special junk at end that Emacs will use for |
226 | formatting the file. | 226 | formatting the file. |
227 | 227 | ||
228 | * qlogic.c: Update to v0.40a. Improve parity handling. | 228 | * qlogic.c: Update to v0.40a. Improve parity handling. |
229 | 229 | ||
230 | * scsi.c: Add Hitachi DK312C to blacklist. Change "};" to "}" in | 230 | * scsi.c: Add Hitachi DK312C to blacklist. Change "};" to "}" in |
231 | many places. Use scsi_init_malloc to get command block - may | 231 | many places. Use scsi_init_malloc to get command block - may |
232 | need this to be dma compatible for some host adapters. | 232 | need this to be dma compatible for some host adapters. |
233 | Restore interrupts after unregistering a host. | 233 | Restore interrupts after unregistering a host. |
234 | 234 | ||
235 | * sd.c: Use sti instead of restore flags - causes latency problems. | 235 | * sd.c: Use sti instead of restore flags - causes latency problems. |
236 | 236 | ||
237 | * seagate.c: Use controller_type to determine string used when | 237 | * seagate.c: Use controller_type to determine string used when |
238 | registering irq. | 238 | registering irq. |
239 | 239 | ||
240 | * sr.c: More photo-cd hacks to make sure we get the xa stuff right. | 240 | * sr.c: More photo-cd hacks to make sure we get the xa stuff right. |
241 | * sr.h, sr.c: Change is_xa to xa_flags field. | 241 | * sr.h, sr.c: Change is_xa to xa_flags field. |
242 | 242 | ||
243 | * st.c: Disable retries for write operations. | 243 | * st.c: Disable retries for write operations. |
244 | 244 | ||
245 | Wed Feb 15 10:52:56 1995 Eric Youngdale (eric@andante) | 245 | Wed Feb 15 10:52:56 1995 Eric Youngdale (eric@andante) |
246 | 246 | ||
247 | * Linux 1.1.92 released. | 247 | * Linux 1.1.92 released. |
248 | 248 | ||
249 | * eata.c: Update to 1.17. | 249 | * eata.c: Update to 1.17. |
250 | 250 | ||
251 | * eata_dma.c: Update to 2.31a. Add more support for /proc/scsi. | 251 | * eata_dma.c: Update to 2.31a. Add more support for /proc/scsi. |
252 | Continuing modularization. Fewer crashes because of the bug in the | 252 | Continuing modularization. Fewer crashes because of the bug in the |
253 | memory management ==> increase C_P_L_CURRENT_MAX to 10 | 253 | memory management ==> increase C_P_L_CURRENT_MAX to 10 |
254 | and decrease C_P_L_DIV to 4. | 254 | and decrease C_P_L_DIV to 4. |
255 | 255 | ||
256 | * hosts.c: If we remove last host registered, reuse host number. | 256 | * hosts.c: If we remove last host registered, reuse host number. |
257 | When freeing memory from host being deregistered, free extra_bytes | 257 | When freeing memory from host being deregistered, free extra_bytes |
258 | too. | 258 | too. |
259 | 259 | ||
260 | * scsi.c (scan_scsis): memset(SDpnt, 0) and set SCmd.device to SDpnt. | 260 | * scsi.c (scan_scsis): memset(SDpnt, 0) and set SCmd.device to SDpnt. |
261 | Change memory allocation to work around bugs in __get_dma_pages. | 261 | Change memory allocation to work around bugs in __get_dma_pages. |
262 | Do not free host if usage count is not zero (for modules). | 262 | Do not free host if usage count is not zero (for modules). |
263 | 263 | ||
264 | * sr_ioctl.c: Increase IOCTL_TIMEOUT to 3000. | 264 | * sr_ioctl.c: Increase IOCTL_TIMEOUT to 3000. |
265 | 265 | ||
266 | * st.c: Allow for ST_EXTRA_DEVS in st data structures. | 266 | * st.c: Allow for ST_EXTRA_DEVS in st data structures. |
267 | 267 | ||
268 | * u14-34f.c: Update to 1.17. | 268 | * u14-34f.c: Update to 1.17. |
269 | 269 | ||
270 | Thu Feb 9 10:11:16 1995 Eric Youngdale (eric@andante) | 270 | Thu Feb 9 10:11:16 1995 Eric Youngdale (eric@andante) |
271 | 271 | ||
272 | * Linux 1.1.91 released. | 272 | * Linux 1.1.91 released. |
273 | 273 | ||
274 | * eata.c: Update to 1.16. Use wish_block instead of host->block. | 274 | * eata.c: Update to 1.16. Use wish_block instead of host->block. |
275 | 275 | ||
276 | * hosts.c: Initialize wish_block to 0. | 276 | * hosts.c: Initialize wish_block to 0. |
277 | 277 | ||
278 | * hosts.h: Add wish_block. | 278 | * hosts.h: Add wish_block. |
279 | 279 | ||
280 | * scsi.c: Use wish_block as indicator that the host should be added | 280 | * scsi.c: Use wish_block as indicator that the host should be added |
281 | to block list. | 281 | to block list. |
282 | 282 | ||
283 | * sg.c: Add SG_EXTRA_DEVS to number of slots. | 283 | * sg.c: Add SG_EXTRA_DEVS to number of slots. |
284 | 284 | ||
285 | * u14-34f.c: Use wish_block. | 285 | * u14-34f.c: Use wish_block. |
286 | 286 | ||
287 | Tue Feb 7 11:46:04 1995 Eric Youngdale (eric@andante) | 287 | Tue Feb 7 11:46:04 1995 Eric Youngdale (eric@andante) |
288 | 288 | ||
289 | * Linux 1.1.90 released. | 289 | * Linux 1.1.90 released. |
290 | 290 | ||
291 | * eata.c: Change naming from eata_* to eata2x_*. Now at vers 1.15. | 291 | * eata.c: Change naming from eata_* to eata2x_*. Now at vers 1.15. |
292 | Update interrupt handler to take pt_regs as arg. Allow blocking | 292 | Update interrupt handler to take pt_regs as arg. Allow blocking |
293 | even if loaded as module. Initialize target_time_out array. | 293 | even if loaded as module. Initialize target_time_out array. |
294 | Do not put sti(); in timing loop. | 294 | Do not put sti(); in timing loop. |
295 | 295 | ||
296 | * hosts.c: Do not reuse host numbers. | 296 | * hosts.c: Do not reuse host numbers. |
297 | Use scsi_make_blocked_list to generate blocking list. | 297 | Use scsi_make_blocked_list to generate blocking list. |
298 | 298 | ||
299 | * script_asm.pl: Beats me. Don't know perl. Something to do with | 299 | * script_asm.pl: Beats me. Don't know perl. Something to do with |
300 | phase index. | 300 | phase index. |
301 | 301 | ||
302 | * scsi.c (scsi_make_blocked_list): New function - code copied from | 302 | * scsi.c (scsi_make_blocked_list): New function - code copied from |
303 | hosts.c. | 303 | hosts.c. |
304 | 304 | ||
305 | * scsi.c: Update code to disable photo CD for Toshiba cdroms. | 305 | * scsi.c: Update code to disable photo CD for Toshiba cdroms. |
306 | Use just manufacturer name, not model number. | 306 | Use just manufacturer name, not model number. |
307 | 307 | ||
308 | * sr.c: Fix setting density for Toshiba drives. | 308 | * sr.c: Fix setting density for Toshiba drives. |
309 | 309 | ||
310 | * u14-34f.c: Clear target_time_out array during reset. | 310 | * u14-34f.c: Clear target_time_out array during reset. |
311 | 311 | ||
312 | Wed Feb 1 09:20:45 1995 Eric Youngdale (eric@andante) | 312 | Wed Feb 1 09:20:45 1995 Eric Youngdale (eric@andante) |
313 | 313 | ||
314 | * Linux 1.1.89 released. | 314 | * Linux 1.1.89 released. |
315 | 315 | ||
316 | * Makefile, u14-34f.c: Modularize. | 316 | * Makefile, u14-34f.c: Modularize. |
317 | 317 | ||
318 | * Makefile, eata.c: Modularize. Now version 1.14 | 318 | * Makefile, eata.c: Modularize. Now version 1.14 |
319 | 319 | ||
320 | * NCR5380.c: Update interrupt handler with new arglist. Minor | 320 | * NCR5380.c: Update interrupt handler with new arglist. Minor |
321 | cleanups. | 321 | cleanups. |
322 | 322 | ||
323 | * eata_dma.c: Begin to modularize. Add hooks for /proc/scsi. | 323 | * eata_dma.c: Begin to modularize. Add hooks for /proc/scsi. |
324 | New version 2.3.0a. Add code in interrupt handler to allow | 324 | New version 2.3.0a. Add code in interrupt handler to allow |
325 | certain CDROM drives to be detected which return a | 325 | certain CDROM drives to be detected which return a |
326 | CHECK_CONDITION during SCSI bus scan. Add opcode check to get | 326 | CHECK_CONDITION during SCSI bus scan. Add opcode check to get |
327 | all DATA IN and DATA OUT phases right. Utilize HBA_interpret flag. | 327 | all DATA IN and DATA OUT phases right. Utilize HBA_interpret flag. |
328 | Improvements in HBA identification. Various other minor stuff. | 328 | Improvements in HBA identification. Various other minor stuff. |
329 | 329 | ||
330 | * hosts.c: Initialize ->dma_channel and ->io_port when registering | 330 | * hosts.c: Initialize ->dma_channel and ->io_port when registering |
331 | a new host. | 331 | a new host. |
332 | 332 | ||
333 | * qlogic.c: Modularize and add PCMCIA support. | 333 | * qlogic.c: Modularize and add PCMCIA support. |
334 | 334 | ||
335 | * scsi.c: Add Hitachi to blacklist. | 335 | * scsi.c: Add Hitachi to blacklist. |
336 | 336 | ||
337 | * scsi.c: Change default to no lun scan (too many problem devices). | 337 | * scsi.c: Change default to no lun scan (too many problem devices). |
338 | 338 | ||
339 | * scsi.h: Define QUEUE_FULL condition. | 339 | * scsi.h: Define QUEUE_FULL condition. |
340 | 340 | ||
341 | * sd.c: Do not check for non-existent partition until after | 341 | * sd.c: Do not check for non-existent partition until after |
342 | new media check. | 342 | new media check. |
343 | 343 | ||
344 | * sg.c: Undo previous change which was wrong. | 344 | * sg.c: Undo previous change which was wrong. |
345 | 345 | ||
346 | * sr_ioctl.c: Increase IOCTL_TIMEOUT to 2000. | 346 | * sr_ioctl.c: Increase IOCTL_TIMEOUT to 2000. |
347 | 347 | ||
348 | * st.c: Patches from Kai - improve filemark handling. | 348 | * st.c: Patches from Kai - improve filemark handling. |
349 | 349 | ||
350 | Tue Jan 31 17:32:12 1995 Eric Youngdale (eric@andante) | 350 | Tue Jan 31 17:32:12 1995 Eric Youngdale (eric@andante) |
351 | 351 | ||
352 | * Linux 1.1.88 released. | 352 | * Linux 1.1.88 released. |
353 | 353 | ||
354 | * Throughout - spelling/grammar fixups. | 354 | * Throughout - spelling/grammar fixups. |
355 | 355 | ||
356 | * scsi.c: Make sure that all buffers are 16 byte aligned - some | 356 | * scsi.c: Make sure that all buffers are 16 byte aligned - some |
357 | drivers (buslogic) need this. | 357 | drivers (buslogic) need this. |
358 | 358 | ||
359 | * scsi.c (scan_scsis): Remove message printed. | 359 | * scsi.c (scan_scsis): Remove message printed. |
360 | 360 | ||
361 | * scsi.c (scsi_init): Move message here. | 361 | * scsi.c (scsi_init): Move message here. |
362 | 362 | ||
363 | Mon Jan 30 06:40:25 1995 Eric Youngdale (eric@andante) | 363 | Mon Jan 30 06:40:25 1995 Eric Youngdale (eric@andante) |
364 | 364 | ||
365 | * Linux 1.1.87 released. | 365 | * Linux 1.1.87 released. |
366 | 366 | ||
367 | * sr.c: Photo-cd related changes. (Gerd Knorr??). | 367 | * sr.c: Photo-cd related changes. (Gerd Knorr??). |
368 | 368 | ||
369 | * st.c: Changes from Kai related to EOM detection. | 369 | * st.c: Changes from Kai related to EOM detection. |
370 | 370 | ||
371 | Mon Jan 23 23:53:10 1995 Eric Youngdale (eric@andante) | 371 | Mon Jan 23 23:53:10 1995 Eric Youngdale (eric@andante) |
372 | 372 | ||
373 | * Linux 1.1.86 released. | 373 | * Linux 1.1.86 released. |
374 | 374 | ||
375 | * 53c7,8xx.h: Change SG size to 127. | 375 | * 53c7,8xx.h: Change SG size to 127. |
376 | 376 | ||
377 | * eata_dma: Update to version 2.10i. Remove bug in the registration | 377 | * eata_dma: Update to version 2.10i. Remove bug in the registration |
378 | of multiple HBAs and channels. Minor other improvements and stylistic | 378 | of multiple HBAs and channels. Minor other improvements and stylistic |
379 | changes. | 379 | changes. |
380 | 380 | ||
381 | * scsi.c: Test for Toshiba XM-3401TA and exclude from detection | 381 | * scsi.c: Test for Toshiba XM-3401TA and exclude from detection |
382 | as toshiba drive - photo cd does not work with this drive. | 382 | as toshiba drive - photo cd does not work with this drive. |
383 | 383 | ||
384 | * sr.c: Update photocd code. | 384 | * sr.c: Update photocd code. |
385 | 385 | ||
386 | Mon Jan 23 23:53:10 1995 Eric Youngdale (eric@andante) | 386 | Mon Jan 23 23:53:10 1995 Eric Youngdale (eric@andante) |
387 | 387 | ||
388 | * Linux 1.1.85 released. | 388 | * Linux 1.1.85 released. |
389 | 389 | ||
390 | * st.c, st_ioctl.c, sg.c, sd_ioctl.c, scsi_ioctl.c, hosts.c: | 390 | * st.c, st_ioctl.c, sg.c, sd_ioctl.c, scsi_ioctl.c, hosts.c: |
391 | include linux/mm.h | 391 | include linux/mm.h |
392 | 392 | ||
393 | * qlogic.c, buslogic.c, aha1542.c: Include linux/module.h. | 393 | * qlogic.c, buslogic.c, aha1542.c: Include linux/module.h. |
394 | 394 | ||
395 | Sun Jan 22 22:08:46 1995 Eric Youngdale (eric@andante) | 395 | Sun Jan 22 22:08:46 1995 Eric Youngdale (eric@andante) |
396 | 396 | ||
397 | * Linux 1.1.84 released. | 397 | * Linux 1.1.84 released. |
398 | 398 | ||
399 | * Makefile: Support for loadable QLOGIC boards. | 399 | * Makefile: Support for loadable QLOGIC boards. |
400 | 400 | ||
401 | * aha152x.c: Update to version 1.8 from Juergen. | 401 | * aha152x.c: Update to version 1.8 from Juergen. |
402 | 402 | ||
403 | * eata_dma.c: Update from Michael Neuffer. | 403 | * eata_dma.c: Update from Michael Neuffer. |
404 | Remove hard limit of 2 commands per lun and make it better | 404 | Remove hard limit of 2 commands per lun and make it better |
405 | configurable. Improvements in HBA identification. | 405 | configurable. Improvements in HBA identification. |
406 | 406 | ||
407 | * in2000.c: Fix biosparam to support large disks. | 407 | * in2000.c: Fix biosparam to support large disks. |
408 | 408 | ||
409 | * qlogic.c: Minor changes (change sti -> restore_flags). | 409 | * qlogic.c: Minor changes (change sti -> restore_flags). |
410 | 410 | ||
411 | Wed Jan 18 23:33:09 1995 Eric Youngdale (eric@andante) | 411 | Wed Jan 18 23:33:09 1995 Eric Youngdale (eric@andante) |
412 | 412 | ||
413 | * Linux 1.1.83 released. | 413 | * Linux 1.1.83 released. |
414 | 414 | ||
415 | * aha1542.c(aha1542_intr_handle): Use arguments handed down to find | 415 | * aha1542.c(aha1542_intr_handle): Use arguments handed down to find |
416 | which irq. | 416 | which irq. |
417 | 417 | ||
418 | * buslogic.c: Likewise. | 418 | * buslogic.c: Likewise. |
419 | 419 | ||
420 | * eata_dma.c: Use min of 2 cmd_per_lun for OCS_enabled boards. | 420 | * eata_dma.c: Use min of 2 cmd_per_lun for OCS_enabled boards. |
421 | 421 | ||
422 | * scsi.c: Make RECOVERED_ERROR a SUGGEST_IS_OK. | 422 | * scsi.c: Make RECOVERED_ERROR a SUGGEST_IS_OK. |
423 | 423 | ||
424 | * sd.c: Fail if we are opening a non-existent partition. | 424 | * sd.c: Fail if we are opening a non-existent partition. |
425 | 425 | ||
426 | * sr.c: Bump SR_TIMEOUT to 15000. | 426 | * sr.c: Bump SR_TIMEOUT to 15000. |
427 | Do not probe for media size at boot time (hard on changers). | 427 | Do not probe for media size at boot time (hard on changers). |
428 | Flag device as needing sector size instead. | 428 | Flag device as needing sector size instead. |
429 | 429 | ||
430 | * sr_ioctl.c: Remove CDROMMULTISESSION_SYS ioctl. | 430 | * sr_ioctl.c: Remove CDROMMULTISESSION_SYS ioctl. |
431 | 431 | ||
432 | * ultrastor.c: Fix bug in call to ultrastor_interrupt (wrong #args). | 432 | * ultrastor.c: Fix bug in call to ultrastor_interrupt (wrong #args). |
433 | 433 | ||
434 | Mon Jan 16 07:18:23 1995 Eric Youngdale (eric@andante) | 434 | Mon Jan 16 07:18:23 1995 Eric Youngdale (eric@andante) |
435 | 435 | ||
436 | * Linux 1.1.82 released. | 436 | * Linux 1.1.82 released. |
437 | 437 | ||
438 | Throughout. | 438 | Throughout. |
439 | - Change all interrupt handlers to accept new calling convention. | 439 | - Change all interrupt handlers to accept new calling convention. |
440 | In particular, we now receive the irq number as one of the arguments. | 440 | In particular, we now receive the irq number as one of the arguments. |
441 | 441 | ||
442 | * More minor spelling corrections in some of the new files. | 442 | * More minor spelling corrections in some of the new files. |
443 | 443 | ||
444 | * aha1542.c, buslogic.c: Clean up interrupt handler a little now | 444 | * aha1542.c, buslogic.c: Clean up interrupt handler a little now |
445 | that we receive the irq as an arg. | 445 | that we receive the irq as an arg. |
446 | 446 | ||
447 | * aha274x.c: s/snarf_region/request_region/ | 447 | * aha274x.c: s/snarf_region/request_region/ |
448 | 448 | ||
449 | * eata.c: Update to version 1.12. Fix some comments and display a | 449 | * eata.c: Update to version 1.12. Fix some comments and display a |
450 | message if we cannot reserve the port addresses. | 450 | message if we cannot reserve the port addresses. |
451 | 451 | ||
452 | * u14-34f.c: Update to version 1.13. Fix some comments and display a | 452 | * u14-34f.c: Update to version 1.13. Fix some comments and display a |
453 | message if we cannot reserve the port addresses. | 453 | message if we cannot reserve the port addresses. |
454 | 454 | ||
455 | * eata_dma.c: Define get_board_data function (send INQUIRY command). | 455 | * eata_dma.c: Define get_board_data function (send INQUIRY command). |
456 | Use to improve detection of variants of different DPT boards. Change | 456 | Use to improve detection of variants of different DPT boards. Change |
457 | version subnumber to "0g". | 457 | version subnumber to "0g". |
458 | 458 | ||
459 | * fdomain.c: Update to version 5.26. Improve detection of some boards | 459 | * fdomain.c: Update to version 5.26. Improve detection of some boards |
460 | repackaged by IBM. | 460 | repackaged by IBM. |
461 | 461 | ||
462 | * scsi.c (scsi_register_host): Change "name" to const char *. | 462 | * scsi.c (scsi_register_host): Change "name" to const char *. |
463 | 463 | ||
464 | * sr.c: Fix problem in set mode command for Toshiba drives. | 464 | * sr.c: Fix problem in set mode command for Toshiba drives. |
465 | 465 | ||
466 | * sr.c: Fix typo from patch 81. | 466 | * sr.c: Fix typo from patch 81. |
467 | 467 | ||
468 | Fri Jan 13 12:54:46 1995 Eric Youngdale (eric@andante) | 468 | Fri Jan 13 12:54:46 1995 Eric Youngdale (eric@andante) |
469 | 469 | ||
470 | * Linux 1.1.81 released. Codefreeze for 1.2 release announced. | 470 | * Linux 1.1.81 released. Codefreeze for 1.2 release announced. |
471 | 471 | ||
472 | Big changes here. | 472 | Big changes here. |
473 | 473 | ||
474 | * eata_dma.*: New files from Michael Neuffer. | 474 | * eata_dma.*: New files from Michael Neuffer. |
475 | (neuffer@goofy.zdv.uni-mainz.de). Should support | 475 | (neuffer@goofy.zdv.uni-mainz.de). Should support |
476 | all eata/dpt cards. | 476 | all eata/dpt cards. |
477 | 477 | ||
478 | * hosts.c, Makefile: Add eata_dma. | 478 | * hosts.c, Makefile: Add eata_dma. |
479 | 479 | ||
480 | * README.st: Document MTEOM. | 480 | * README.st: Document MTEOM. |
481 | 481 | ||
482 | Patches from me (ERY) to finish support for low-level loadable scsi. | 482 | Patches from me (ERY) to finish support for low-level loadable scsi. |
483 | It now works, and is actually useful. | 483 | It now works, and is actually useful. |
484 | 484 | ||
485 | * Throughout - add new argument to scsi_init_malloc that takes an | 485 | * Throughout - add new argument to scsi_init_malloc that takes an |
486 | additional parameter. This is used as a priority to kmalloc, | 486 | additional parameter. This is used as a priority to kmalloc, |
487 | and you can specify the GFP_DMA flag if you need DMA-able memory. | 487 | and you can specify the GFP_DMA flag if you need DMA-able memory. |
488 | 488 | ||
489 | * Makefile: For source files that are loadable, always add name | 489 | * Makefile: For source files that are loadable, always add name |
490 | to SCSI_SRCS. Fill in modules: target. | 490 | to SCSI_SRCS. Fill in modules: target. |
491 | 491 | ||
492 | * hosts.c: Change next_host to next_scsi_host, and make global. | 492 | * hosts.c: Change next_host to next_scsi_host, and make global. |
493 | Print hosts after we have identified all of them. Use info() | 493 | Print hosts after we have identified all of them. Use info() |
494 | function if present, otherwise use name field. | 494 | function if present, otherwise use name field. |
495 | 495 | ||
496 | * hosts.h: Change attach function to return int, not void. | 496 | * hosts.h: Change attach function to return int, not void. |
497 | Define number of device slots to allow for loadable devices. | 497 | Define number of device slots to allow for loadable devices. |
498 | Define tags to tell scsi module code what type of module we | 498 | Define tags to tell scsi module code what type of module we |
499 | are loading. | 499 | are loading. |
500 | 500 | ||
501 | * scsi.c: Fix scan_scsis so that it can be run by a user process. | 501 | * scsi.c: Fix scan_scsis so that it can be run by a user process. |
502 | Do not use waiting loops - use up and down mechanism as long | 502 | Do not use waiting loops - use up and down mechanism as long |
503 | as current != task[0]. | 503 | as current != task[0]. |
504 | 504 | ||
505 | * scsi.c(scan_scsis): Do not use stack variables for I/O - this | 505 | * scsi.c(scan_scsis): Do not use stack variables for I/O - this |
506 | could be > 16Mb if we are loading a module at runtime (i.e. use | 506 | could be > 16Mb if we are loading a module at runtime (i.e. use |
507 | scsi_init_malloc to get some memory we know will be safe). | 507 | scsi_init_malloc to get some memory we know will be safe). |
508 | 508 | ||
509 | * scsi.c: Change dma freelist to be a set of pages. This allows | 509 | * scsi.c: Change dma freelist to be a set of pages. This allows |
510 | us to dynamically adjust the size of the list by adding more pages | 510 | us to dynamically adjust the size of the list by adding more pages |
511 | to the pagelist. Fix scsi_malloc and scsi_free accordingly. | 511 | to the pagelist. Fix scsi_malloc and scsi_free accordingly. |
512 | 512 | ||
513 | * scsi_module.c: Fix include. | 513 | * scsi_module.c: Fix include. |
514 | 514 | ||
515 | * sd.c: Declare detach function. Increment/decrement module usage | 515 | * sd.c: Declare detach function. Increment/decrement module usage |
516 | count as required. Fix init functions to allow loaded devices. | 516 | count as required. Fix init functions to allow loaded devices. |
517 | Revalidate all new disks so we get the partition tables. Define | 517 | Revalidate all new disks so we get the partition tables. Define |
518 | detach function. | 518 | detach function. |
519 | 519 | ||
520 | * sr.c: Likewise. | 520 | * sr.c: Likewise. |
521 | 521 | ||
522 | * sg.c: Declare detach function. Allow attachment of devices on | 522 | * sg.c: Declare detach function. Allow attachment of devices on |
523 | loaded drivers. | 523 | loaded drivers. |
524 | 524 | ||
525 | * st.c: Declare detach function. Increment/decrement module usage | 525 | * st.c: Declare detach function. Increment/decrement module usage |
526 | count as required. | 526 | count as required. |
527 | 527 | ||
528 | Tue Jan 10 10:09:58 1995 Eric Youngdale (eric@andante) | 528 | Tue Jan 10 10:09:58 1995 Eric Youngdale (eric@andante) |
529 | 529 | ||
530 | * Linux 1.1.79 released. | 530 | * Linux 1.1.79 released. |
531 | 531 | ||
532 | Patch from some undetermined individual who needs to get a life :-). | 532 | Patch from some undetermined individual who needs to get a life :-). |
533 | 533 | ||
534 | * sr.c: Attacked by spelling bee... | 534 | * sr.c: Attacked by spelling bee... |
535 | 535 | ||
536 | Patches from Gerd Knorr: | 536 | Patches from Gerd Knorr: |
537 | 537 | ||
538 | * sr.c: make printk messages for photoCD a little more informative. | 538 | * sr.c: make printk messages for photoCD a little more informative. |
539 | 539 | ||
540 | * sr_ioctl.c: Fix CDROMMULTISESSION_SYS ioctl. | 540 | * sr_ioctl.c: Fix CDROMMULTISESSION_SYS ioctl. |
541 | 541 | ||
542 | Mon Jan 9 10:01:37 1995 Eric Youngdale (eric@andante) | 542 | Mon Jan 9 10:01:37 1995 Eric Youngdale (eric@andante) |
543 | 543 | ||
544 | * Linux 1.1.78 released. | 544 | * Linux 1.1.78 released. |
545 | 545 | ||
546 | * Makefile: Add empty modules: target. | 546 | * Makefile: Add empty modules: target. |
547 | 547 | ||
548 | * Wheee. Now change register_iomem to request_region. | 548 | * Wheee. Now change register_iomem to request_region. |
549 | 549 | ||
550 | * in2000.c: Bugfix - apparently this is the fix that we have | 550 | * in2000.c: Bugfix - apparently this is the fix that we have |
551 | all been waiting for. It fixes a problem whereby the driver | 551 | all been waiting for. It fixes a problem whereby the driver |
552 | is not stable under heavy load. Race condition and all that. | 552 | is not stable under heavy load. Race condition and all that. |
553 | Patch from Peter Lu. | 553 | Patch from Peter Lu. |
554 | 554 | ||
555 | Wed Jan 4 21:17:40 1995 Eric Youngdale (eric@andante) | 555 | Wed Jan 4 21:17:40 1995 Eric Youngdale (eric@andante) |
556 | 556 | ||
557 | * Linux 1.1.77 released. | 557 | * Linux 1.1.77 released. |
558 | 558 | ||
559 | * 53c7,8xx.c: Fix from Linus - emulate splx. | 559 | * 53c7,8xx.c: Fix from Linus - emulate splx. |
560 | 560 | ||
561 | Throughout: | 561 | Throughout: |
562 | 562 | ||
563 | Change "snarf_region" with "register_iomem". | 563 | Change "snarf_region" with "register_iomem". |
564 | 564 | ||
565 | * scsi_module.c: New file. Contains support for low-level loadable | 565 | * scsi_module.c: New file. Contains support for low-level loadable |
566 | scsi drivers. [ERY]. | 566 | scsi drivers. [ERY]. |
567 | 567 | ||
568 | * sd.c: More s/int/long/ changes. | 568 | * sd.c: More s/int/long/ changes. |
569 | 569 | ||
570 | * seagate.c: Explicitly include linux/config.h | 570 | * seagate.c: Explicitly include linux/config.h |
571 | 571 | ||
572 | * sg.c: Increment/decrement module usage count on open/close. | 572 | * sg.c: Increment/decrement module usage count on open/close. |
573 | 573 | ||
574 | * sg.c: Be a bit more careful about the user not supplying enough | 574 | * sg.c: Be a bit more careful about the user not supplying enough |
575 | information for a valid command. Pass correct size down to | 575 | information for a valid command. Pass correct size down to |
576 | scsi_do_cmd. | 576 | scsi_do_cmd. |
577 | 577 | ||
578 | * sr.c: More changes for Photo-CD. This apparently breaks NEC drives. | 578 | * sr.c: More changes for Photo-CD. This apparently breaks NEC drives. |
579 | 579 | ||
580 | * sr_ioctl.c: Support CDROMMULTISESSION ioctl. | 580 | * sr_ioctl.c: Support CDROMMULTISESSION ioctl. |
581 | 581 | ||
582 | 582 | ||
583 | Sun Jan 1 19:55:21 1995 Eric Youngdale (eric@andante) | 583 | Sun Jan 1 19:55:21 1995 Eric Youngdale (eric@andante) |
584 | 584 | ||
585 | * Linux 1.1.76 released. | 585 | * Linux 1.1.76 released. |
586 | 586 | ||
587 | * constants.c: Add type cast in switch statement. | 587 | * constants.c: Add type cast in switch statement. |
588 | 588 | ||
589 | * scsi.c (scsi_free): Change datatype of "offset" to long. | 589 | * scsi.c (scsi_free): Change datatype of "offset" to long. |
590 | (scsi_malloc): Change a few more variables to long. Who | 590 | (scsi_malloc): Change a few more variables to long. Who |
591 | did this and why was it important? 64 bit machines? | 591 | did this and why was it important? 64 bit machines? |
592 | 592 | ||
593 | 593 | ||
594 | Lots of changes to use save_state/restore_state instead of cli/sti. | 594 | Lots of changes to use save_state/restore_state instead of cli/sti. |
595 | Files changed include: | 595 | Files changed include: |
596 | 596 | ||
597 | * aha1542.c: | 597 | * aha1542.c: |
598 | * aha1740.c: | 598 | * aha1740.c: |
599 | * buslogic.c: | 599 | * buslogic.c: |
600 | * in2000.c: | 600 | * in2000.c: |
601 | * scsi.c: | 601 | * scsi.c: |
602 | * scsi_debug.c: | 602 | * scsi_debug.c: |
603 | * sd.c: | 603 | * sd.c: |
604 | * sr.c: | 604 | * sr.c: |
605 | * st.c: | 605 | * st.c: |
606 | 606 | ||
607 | Wed Dec 28 16:38:29 1994 Eric Youngdale (eric@andante) | 607 | Wed Dec 28 16:38:29 1994 Eric Youngdale (eric@andante) |
608 | 608 | ||
609 | * Linux 1.1.75 released. | 609 | * Linux 1.1.75 released. |
610 | 610 | ||
611 | * buslogic.c: Spelling fix. | 611 | * buslogic.c: Spelling fix. |
612 | 612 | ||
613 | * scsi.c: Add HP C1790A and C2500A scanjet to blacklist. | 613 | * scsi.c: Add HP C1790A and C2500A scanjet to blacklist. |
614 | 614 | ||
615 | * scsi.c: Spelling fixup. | 615 | * scsi.c: Spelling fixup. |
616 | 616 | ||
617 | * sd.c: Add support for sd_hardsizes (hard sector sizes). | 617 | * sd.c: Add support for sd_hardsizes (hard sector sizes). |
618 | 618 | ||
619 | * ultrastor.c: Use save_flags/restore_flags instead of cli/sti. | 619 | * ultrastor.c: Use save_flags/restore_flags instead of cli/sti. |
620 | 620 | ||
621 | Fri Dec 23 13:36:25 1994 Eric Youngdale (eric@andante) | 621 | Fri Dec 23 13:36:25 1994 Eric Youngdale (eric@andante) |
622 | 622 | ||
623 | * Linux 1.1.74 released. | 623 | * Linux 1.1.74 released. |
624 | 624 | ||
625 | * README.st: Update from Kai Makisara. | 625 | * README.st: Update from Kai Makisara. |
626 | 626 | ||
627 | * eata.c: New version from Dario - version 1.11. | 627 | * eata.c: New version from Dario - version 1.11. |
628 | use scsicam bios_param routine. Add support for 2011 | 628 | use scsicam bios_param routine. Add support for 2011 |
629 | and 2021 boards. | 629 | and 2021 boards. |
630 | 630 | ||
631 | * hosts.c: Add support for blocking. Linked list automatically | 631 | * hosts.c: Add support for blocking. Linked list automatically |
632 | generated when shpnt->block is set. | 632 | generated when shpnt->block is set. |
633 | 633 | ||
634 | * scsi.c: Add sankyo & HP scanjet to blacklist. Add support for | 634 | * scsi.c: Add sankyo & HP scanjet to blacklist. Add support for |
635 | kicking things loose when we deadlock. | 635 | kicking things loose when we deadlock. |
636 | 636 | ||
637 | * scsi.c: Recognize scanners and processors in scan_scsis. | 637 | * scsi.c: Recognize scanners and processors in scan_scsis. |
638 | 638 | ||
639 | * scsi_ioctl.h: Increase timeout to 9 seconds. | 639 | * scsi_ioctl.h: Increase timeout to 9 seconds. |
640 | 640 | ||
641 | * st.c: New version from Kai - add better support for backspace. | 641 | * st.c: New version from Kai - add better support for backspace. |
642 | 642 | ||
643 | * u14-34f.c: New version from Dario. Supports blocking. | 643 | * u14-34f.c: New version from Dario. Supports blocking. |
644 | 644 | ||
645 | Wed Dec 14 14:46:30 1994 Eric Youngdale (eric@andante) | 645 | Wed Dec 14 14:46:30 1994 Eric Youngdale (eric@andante) |
646 | 646 | ||
647 | * Linux 1.1.73 released. | 647 | * Linux 1.1.73 released. |
648 | 648 | ||
649 | * buslogic.c: Update from Dave Gentzel. Version 1.14. | 649 | * buslogic.c: Update from Dave Gentzel. Version 1.14. |
650 | Add module related stuff. More fault tolerant if out of | 650 | Add module related stuff. More fault tolerant if out of |
651 | DMA memory. | 651 | DMA memory. |
652 | 652 | ||
653 | * fdomain.c: New version from Rik Faith - version 5.22. Add support | 653 | * fdomain.c: New version from Rik Faith - version 5.22. Add support |
654 | for ISA-200S SCSI adapter. | 654 | for ISA-200S SCSI adapter. |
655 | 655 | ||
656 | * hosts.c: Spelling. | 656 | * hosts.c: Spelling. |
657 | 657 | ||
658 | * qlogic.c: Update to version 0.38a. Add more support for PCMCIA. | 658 | * qlogic.c: Update to version 0.38a. Add more support for PCMCIA. |
659 | 659 | ||
660 | * scsi.c: Mask device type with 0x1f during scan_scsis. | 660 | * scsi.c: Mask device type with 0x1f during scan_scsis. |
661 | Add support for deadlocking, err, make that getting out of | 661 | Add support for deadlocking, err, make that getting out of |
662 | deadlock situations that are created when we allow the user | 662 | deadlock situations that are created when we allow the user |
663 | to limit requests to one host adapter at a time. | 663 | to limit requests to one host adapter at a time. |
664 | 664 | ||
665 | * scsi.c: Bugfix - pass pid, not SCpnt as second arg to | 665 | * scsi.c: Bugfix - pass pid, not SCpnt as second arg to |
666 | scsi_times_out. | 666 | scsi_times_out. |
667 | 667 | ||
668 | * scsi.c: Restore interrupt state to previous value instead of using | 668 | * scsi.c: Restore interrupt state to previous value instead of using |
669 | cli/sti pairs. | 669 | cli/sti pairs. |
670 | 670 | ||
671 | * scsi.c: Add a bunch of module stuff (all commented out for now). | 671 | * scsi.c: Add a bunch of module stuff (all commented out for now). |
672 | 672 | ||
673 | * scsi.c: Clean up scsi_dump_status. | 673 | * scsi.c: Clean up scsi_dump_status. |
674 | 674 | ||
675 | Tue Dec 6 12:34:20 1994 Eric Youngdale (eric@andante) | 675 | Tue Dec 6 12:34:20 1994 Eric Youngdale (eric@andante) |
676 | 676 | ||
677 | * Linux 1.1.72 released. | 677 | * Linux 1.1.72 released. |
678 | 678 | ||
679 | * sg.c: Bugfix - always use sg_free, since we might have big buff. | 679 | * sg.c: Bugfix - always use sg_free, since we might have big buff. |
680 | 680 | ||
681 | Fri Dec 2 11:24:53 1994 Eric Youngdale (eric@andante) | 681 | Fri Dec 2 11:24:53 1994 Eric Youngdale (eric@andante) |
682 | 682 | ||
683 | * Linux 1.1.71 released. | 683 | * Linux 1.1.71 released. |
684 | 684 | ||
685 | * sg.c: Clear buff field when not in use. Only call scsi_free if | 685 | * sg.c: Clear buff field when not in use. Only call scsi_free if |
686 | non-null. | 686 | non-null. |
687 | 687 | ||
688 | * scsi.h: Call wake_up(&wait_for_request) when done with a | 688 | * scsi.h: Call wake_up(&wait_for_request) when done with a |
689 | command. | 689 | command. |
690 | 690 | ||
691 | * scsi.c (scsi_times_out): Pass pid down so that we can protect | 691 | * scsi.c (scsi_times_out): Pass pid down so that we can protect |
692 | against race conditions. | 692 | against race conditions. |
693 | 693 | ||
694 | * scsi.c (scsi_abort): Zero timeout field if we get the | 694 | * scsi.c (scsi_abort): Zero timeout field if we get the |
695 | NOT_RUNNING message back from low-level driver. | 695 | NOT_RUNNING message back from low-level driver. |
696 | 696 | ||
697 | 697 | ||
698 | * scsi.c (scsi_done): Restore cmd_len, use_sg here. | 698 | * scsi.c (scsi_done): Restore cmd_len, use_sg here. |
699 | 699 | ||
700 | * scsi.c (request_sense): Not here. | 700 | * scsi.c (request_sense): Not here. |
701 | 701 | ||
702 | * hosts.h: Add new forbidden_addr, forbidden_size fields. Who | 702 | * hosts.h: Add new forbidden_addr, forbidden_size fields. Who |
703 | added these and why???? | 703 | added these and why???? |
704 | 704 | ||
705 | * hosts.c (scsi_mem_init): Mark pages as reserved if they fall in | 705 | * hosts.c (scsi_mem_init): Mark pages as reserved if they fall in |
706 | the forbidden regions. I am not sure - I think this is so that | 706 | the forbidden regions. I am not sure - I think this is so that |
707 | we can deal with boards that do incomplete decoding of their | 707 | we can deal with boards that do incomplete decoding of their |
708 | address lines for the bios chips, but I am not entirely sure. | 708 | address lines for the bios chips, but I am not entirely sure. |
709 | 709 | ||
710 | * buslogic.c: Set forbidden_addr stuff if using a buggy board. | 710 | * buslogic.c: Set forbidden_addr stuff if using a buggy board. |
711 | 711 | ||
712 | * aha1740.c: Test for NULL pointer in SCtmp. This should not | 712 | * aha1740.c: Test for NULL pointer in SCtmp. This should not |
713 | occur, but a nice message is better than a kernel segfault. | 713 | occur, but a nice message is better than a kernel segfault. |
714 | 714 | ||
715 | * 53c7,8xx.c: Add new PCI chip ID for 815. | 715 | * 53c7,8xx.c: Add new PCI chip ID for 815. |
716 | 716 | ||
717 | Fri Dec 2 11:24:53 1994 Eric Youngdale (eric@andante) | 717 | Fri Dec 2 11:24:53 1994 Eric Youngdale (eric@andante) |
718 | 718 | ||
719 | * Linux 1.1.70 released. | 719 | * Linux 1.1.70 released. |
720 | 720 | ||
721 | * ChangeLog, st.c: Spelling. | 721 | * ChangeLog, st.c: Spelling. |
722 | 722 | ||
723 | Tue Nov 29 18:48:42 1994 Eric Youngdale (eric@andante) | 723 | Tue Nov 29 18:48:42 1994 Eric Youngdale (eric@andante) |
724 | 724 | ||
725 | * Linux 1.1.69 released. | 725 | * Linux 1.1.69 released. |
726 | 726 | ||
727 | * u14-34f.h: Non-functional change. [Dario]. | 727 | * u14-34f.h: Non-functional change. [Dario]. |
728 | 728 | ||
729 | * u14-34f.c: Use block field in Scsi_Host to prevent commands from | 729 | * u14-34f.c: Use block field in Scsi_Host to prevent commands from |
730 | being queued to more than one host at the same time (used when | 730 | being queued to more than one host at the same time (used when |
731 | motherboard does not deal with multiple bus-masters very well). | 731 | motherboard does not deal with multiple bus-masters very well). |
732 | Only when SINGLE_HOST_OPERATIONS is defined. | 732 | Only when SINGLE_HOST_OPERATIONS is defined. |
733 | Use new cmd_per_lun field. [Dario] | 733 | Use new cmd_per_lun field. [Dario] |
734 | 734 | ||
735 | * eata.c: Likewise. | 735 | * eata.c: Likewise. |
736 | 736 | ||
737 | * st.c: More changes from Kai. Add ready flag to indicate drive | 737 | * st.c: More changes from Kai. Add ready flag to indicate drive |
738 | status. | 738 | status. |
739 | 739 | ||
740 | * README.st: Document this. | 740 | * README.st: Document this. |
741 | 741 | ||
742 | * sr.c: Bugfix (do not subtract CD_BLOCK_OFFSET) for photo-cd | 742 | * sr.c: Bugfix (do not subtract CD_BLOCK_OFFSET) for photo-cd |
743 | code. | 743 | code. |
744 | 744 | ||
745 | * sg.c: Bugfix - fix problem where opcode is not correctly set up. | 745 | * sg.c: Bugfix - fix problem where opcode is not correctly set up. |
746 | 746 | ||
747 | * seagate.[c,h]: Use #defines to set driver name. | 747 | * seagate.[c,h]: Use #defines to set driver name. |
748 | 748 | ||
749 | * scsi_ioctl.c: Zero buffer before executing command. | 749 | * scsi_ioctl.c: Zero buffer before executing command. |
750 | 750 | ||
751 | * scsi.c: Use new cmd_per_lun field in Scsi_Hosts as appropriate. | 751 | * scsi.c: Use new cmd_per_lun field in Scsi_Hosts as appropriate. |
752 | Add Sony CDU55S to blacklist. | 752 | Add Sony CDU55S to blacklist. |
753 | 753 | ||
754 | * hosts.h: Add new cmd_per_lun field to Scsi_Hosts. | 754 | * hosts.h: Add new cmd_per_lun field to Scsi_Hosts. |
755 | 755 | ||
756 | * hosts.c: Initialize cmd_per_lun in Scsi_Hosts from template. | 756 | * hosts.c: Initialize cmd_per_lun in Scsi_Hosts from template. |
757 | 757 | ||
758 | * buslogic.c: Use cmd_per_lun field - initialize to different | 758 | * buslogic.c: Use cmd_per_lun field - initialize to different |
759 | values depending upon bus type (i.e. use 1 if ISA, so we do not | 759 | values depending upon bus type (i.e. use 1 if ISA, so we do not |
760 | hog memory). Use other patches which got lost from 1.1.68. | 760 | hog memory). Use other patches which got lost from 1.1.68. |
761 | 761 | ||
762 | * aha1542.c: Spelling. | 762 | * aha1542.c: Spelling. |
763 | 763 | ||
764 | Tue Nov 29 15:43:50 1994 Eric Youngdale (eric@andante.aib.com) | 764 | Tue Nov 29 15:43:50 1994 Eric Youngdale (eric@andante.aib.com) |
765 | 765 | ||
766 | * Linux 1.1.68 released. | 766 | * Linux 1.1.68 released. |
767 | 767 | ||
768 | Add support for 12 byte vendor specific commands in scsi-generics, | 768 | Add support for 12 byte vendor specific commands in scsi-generics, |
769 | more (i.e. the last mandatory) low-level changes to support | 769 | more (i.e. the last mandatory) low-level changes to support |
770 | loadable modules, plus a few other changes people have requested | 770 | loadable modules, plus a few other changes people have requested |
771 | lately. Changes by me (ERY) unless otherwise noted. Spelling | 771 | lately. Changes by me (ERY) unless otherwise noted. Spelling |
772 | changes appear from some unknown corner of the universe. | 772 | changes appear from some unknown corner of the universe. |
773 | 773 | ||
774 | * Throughout: Change COMMAND_SIZE() to use SCpnt->cmd_len. | 774 | * Throughout: Change COMMAND_SIZE() to use SCpnt->cmd_len. |
775 | 775 | ||
776 | * Throughout: Change info() low level function to take a Scsi_Host | 776 | * Throughout: Change info() low level function to take a Scsi_Host |
777 | pointer. This way the info function can return specific | 777 | pointer. This way the info function can return specific |
778 | information about the host in question, if desired. | 778 | information about the host in question, if desired. |
779 | 779 | ||
780 | * All low-level drivers: Add NULL in initializer for the | 780 | * All low-level drivers: Add NULL in initializer for the |
781 | usage_count field added to Scsi_Host_Template. | 781 | usage_count field added to Scsi_Host_Template. |
782 | 782 | ||
783 | * aha152x.[c,h]: Remove redundant info() function. | 783 | * aha152x.[c,h]: Remove redundant info() function. |
784 | 784 | ||
785 | * aha1542.[c,h]: Likewise. | 785 | * aha1542.[c,h]: Likewise. |
786 | 786 | ||
787 | * aha1740.[c,h]: Likewise. | 787 | * aha1740.[c,h]: Likewise. |
788 | 788 | ||
789 | * aha274x.[c,h]: Likewise. | 789 | * aha274x.[c,h]: Likewise. |
790 | 790 | ||
791 | * eata.[c,h]: Likewise. | 791 | * eata.[c,h]: Likewise. |
792 | 792 | ||
793 | * pas16.[c,h]: Likewise. | 793 | * pas16.[c,h]: Likewise. |
794 | 794 | ||
795 | * scsi_debug.[c,h]: Likewise. | 795 | * scsi_debug.[c,h]: Likewise. |
796 | 796 | ||
797 | * t128.[c,h]: Likewise. | 797 | * t128.[c,h]: Likewise. |
798 | 798 | ||
799 | * u14-34f.[c,h]: Likewise. | 799 | * u14-34f.[c,h]: Likewise. |
800 | 800 | ||
801 | * ultrastor.[c,h]: Likewise. | 801 | * ultrastor.[c,h]: Likewise. |
802 | 802 | ||
803 | * wd7000.[c,h]: Likewise. | 803 | * wd7000.[c,h]: Likewise. |
804 | 804 | ||
805 | * aha1542.c: Add support for command line options with lilo to set | 805 | * aha1542.c: Add support for command line options with lilo to set |
806 | DMA parameters, I/O port. From Matt Aarnio. | 806 | DMA parameters, I/O port. From Matt Aarnio. |
807 | 807 | ||
808 | * buslogic.[c,h]: New version (1.13) from Dave Gentzel. | 808 | * buslogic.[c,h]: New version (1.13) from Dave Gentzel. |
809 | 809 | ||
810 | * hosts.h: Add new field to Scsi_Hosts "block" to allow blocking | 810 | * hosts.h: Add new field to Scsi_Hosts "block" to allow blocking |
811 | all I/O to certain other cards. Helps prevent problems with some | 811 | all I/O to certain other cards. Helps prevent problems with some |
812 | ISA motherboards. | 812 | ISA motherboards. |
813 | 813 | ||
814 | * hosts.h: Add usage_count to Scsi_Host_Template. | 814 | * hosts.h: Add usage_count to Scsi_Host_Template. |
815 | 815 | ||
816 | * hosts.h: Add n_io_port to Scsi_Host (used when releasing module). | 816 | * hosts.h: Add n_io_port to Scsi_Host (used when releasing module). |
817 | 817 | ||
818 | * hosts.c: Initialize block field. | 818 | * hosts.c: Initialize block field. |
819 | 819 | ||
820 | * in2000.c: Remove "static" declarations from exported functions. | 820 | * in2000.c: Remove "static" declarations from exported functions. |
821 | 821 | ||
822 | * in2000.h: Likewise. | 822 | * in2000.h: Likewise. |
823 | 823 | ||
824 | * scsi.c: Correctly set cmd_len field as required. Save and | 824 | * scsi.c: Correctly set cmd_len field as required. Save and |
825 | change setting when doing a request_sense, restore when done. | 825 | change setting when doing a request_sense, restore when done. |
826 | Move abort timeout message. Fix panic in request_queueable to | 826 | Move abort timeout message. Fix panic in request_queueable to |
827 | print correct function name. | 827 | print correct function name. |
828 | 828 | ||
829 | * scsi.c: When incrementing usage count, walk block linked list | 829 | * scsi.c: When incrementing usage count, walk block linked list |
830 | for host, and or in SCSI_HOST_BLOCK bit. When decrementing usage | 830 | for host, and or in SCSI_HOST_BLOCK bit. When decrementing usage |
831 | count to 0, clear this bit to allow usage to continue, wake up | 831 | count to 0, clear this bit to allow usage to continue, wake up |
832 | processes waiting. | 832 | processes waiting. |
833 | 833 | ||
834 | 834 | ||
835 | * scsi_ioctl.c: If we have an info() function, call it, otherwise | 835 | * scsi_ioctl.c: If we have an info() function, call it, otherwise |
836 | if we have a "name" field, use it, else do nothing. | 836 | if we have a "name" field, use it, else do nothing. |
837 | 837 | ||
838 | * sd.c, sr.c: Clear cmd_len field prior to each command we | 838 | * sd.c, sr.c: Clear cmd_len field prior to each command we |
839 | generate. | 839 | generate. |
840 | 840 | ||
841 | * sd.h: Add "has_part_table" bit to rscsi_disks. | 841 | * sd.h: Add "has_part_table" bit to rscsi_disks. |
842 | 842 | ||
843 | * sg.[c,h]: Add support for vendor specific 12 byte commands (i.e. | 843 | * sg.[c,h]: Add support for vendor specific 12 byte commands (i.e. |
844 | override command length in COMMAND_SIZE). | 844 | override command length in COMMAND_SIZE). |
845 | 845 | ||
846 | * sr.c: Bugfix from Gerd in photocd code. | 846 | * sr.c: Bugfix from Gerd in photocd code. |
847 | 847 | ||
848 | * sr.c: Bugfix in get_sectorsize - always use scsi_malloc buffer - | 848 | * sr.c: Bugfix in get_sectorsize - always use scsi_malloc buffer - |
849 | we cannot guarantee that the stack is < 16Mb. | 849 | we cannot guarantee that the stack is < 16Mb. |
850 | 850 | ||
851 | Tue Nov 22 15:40:46 1994 Eric Youngdale (eric@andante.aib.com) | 851 | Tue Nov 22 15:40:46 1994 Eric Youngdale (eric@andante.aib.com) |
852 | 852 | ||
853 | * Linux 1.1.67 released. | 853 | * Linux 1.1.67 released. |
854 | 854 | ||
855 | * sr.c: Change spelling of manufactor to manufacturer. | 855 | * sr.c: Change spelling of manufactor to manufacturer. |
856 | 856 | ||
857 | * scsi.h: Likewise. | 857 | * scsi.h: Likewise. |
858 | 858 | ||
859 | * scsi.c: Likewise. | 859 | * scsi.c: Likewise. |
860 | 860 | ||
861 | * qlogic.c: Spelling corrections. | 861 | * qlogic.c: Spelling corrections. |
862 | 862 | ||
863 | * in2000.h: Spelling corrections. | 863 | * in2000.h: Spelling corrections. |
864 | 864 | ||
865 | * in2000.c: Update from Bill Earnest, change from | 865 | * in2000.c: Update from Bill Earnest, change from |
866 | jshiffle@netcom.com. Support new bios versions. | 866 | jshiffle@netcom.com. Support new bios versions. |
867 | 867 | ||
868 | * README.qlogic: Spelling correction. | 868 | * README.qlogic: Spelling correction. |
869 | 869 | ||
870 | Tue Nov 22 15:40:46 1994 Eric Youngdale (eric@andante.aib.com) | 870 | Tue Nov 22 15:40:46 1994 Eric Youngdale (eric@andante.aib.com) |
871 | 871 | ||
872 | * Linux 1.1.66 released. | 872 | * Linux 1.1.66 released. |
873 | 873 | ||
874 | * u14-34f.c: Spelling corrections. | 874 | * u14-34f.c: Spelling corrections. |
875 | 875 | ||
876 | * sr.[h,c]: Add support for multi-session CDs from Gerd Knorr. | 876 | * sr.[h,c]: Add support for multi-session CDs from Gerd Knorr. |
877 | 877 | ||
878 | * scsi.h: Add manufactor field for keeping track of device | 878 | * scsi.h: Add manufactor field for keeping track of device |
879 | manufacturer. | 879 | manufacturer. |
880 | 880 | ||
881 | * scsi.c: More spelling corrections. | 881 | * scsi.c: More spelling corrections. |
882 | 882 | ||
883 | * qlogic.h, qlogic.c, README.qlogic: New driver from Tom Zerucha. | 883 | * qlogic.h, qlogic.c, README.qlogic: New driver from Tom Zerucha. |
884 | 884 | ||
885 | * in2000.c, in2000.h: New driver from Brad McLean/Bill Earnest. | 885 | * in2000.c, in2000.h: New driver from Brad McLean/Bill Earnest. |
886 | 886 | ||
887 | * fdomain.c: Spelling correction. | 887 | * fdomain.c: Spelling correction. |
888 | 888 | ||
889 | * eata.c: Spelling correction. | 889 | * eata.c: Spelling correction. |
890 | 890 | ||
891 | Fri Nov 18 15:22:44 1994 Eric Youngdale (eric@andante.aib.com) | 891 | Fri Nov 18 15:22:44 1994 Eric Youngdale (eric@andante.aib.com) |
892 | 892 | ||
893 | * Linux 1.1.65 released. | 893 | * Linux 1.1.65 released. |
894 | 894 | ||
895 | * eata.h: Update version string to 1.08.00. | 895 | * eata.h: Update version string to 1.08.00. |
896 | 896 | ||
897 | * eata.c: Set sg_tablesize correctly for DPT PM2012 boards. | 897 | * eata.c: Set sg_tablesize correctly for DPT PM2012 boards. |
898 | 898 | ||
899 | * aha274x.seq: Spell checking. | 899 | * aha274x.seq: Spell checking. |
900 | 900 | ||
901 | * README.st: Likewise. | 901 | * README.st: Likewise. |
902 | 902 | ||
903 | * README.aha274x: Likewise. | 903 | * README.aha274x: Likewise. |
904 | 904 | ||
905 | * ChangeLog: Likewise. | 905 | * ChangeLog: Likewise. |
906 | 906 | ||
907 | Tue Nov 15 15:35:08 1994 Eric Youngdale (eric@andante.aib.com) | 907 | Tue Nov 15 15:35:08 1994 Eric Youngdale (eric@andante.aib.com) |
908 | 908 | ||
909 | * Linux 1.1.64 released. | 909 | * Linux 1.1.64 released. |
910 | 910 | ||
911 | * u14-34f.h: Update version number to 1.10.01. | 911 | * u14-34f.h: Update version number to 1.10.01. |
912 | 912 | ||
913 | * u14-34f.c: Use Scsi_Host can_queue variable instead of one from template. | 913 | * u14-34f.c: Use Scsi_Host can_queue variable instead of one from template. |
914 | 914 | ||
915 | * eata.[c,h]: New driver for DPT boards from Dario Ballabio. | 915 | * eata.[c,h]: New driver for DPT boards from Dario Ballabio. |
916 | 916 | ||
917 | * buslogic.c: Use can_queue field. | 917 | * buslogic.c: Use can_queue field. |
918 | 918 | ||
919 | Wed Nov 30 12:09:09 1994 Eric Youngdale (eric@andante.aib.com) | 919 | Wed Nov 30 12:09:09 1994 Eric Youngdale (eric@andante.aib.com) |
920 | 920 | ||
921 | * Linux 1.1.63 released. | 921 | * Linux 1.1.63 released. |
922 | 922 | ||
923 | * sd.c: Give I/O error if we attempt 512 byte I/O to a disk with | 923 | * sd.c: Give I/O error if we attempt 512 byte I/O to a disk with |
924 | 1024 byte sectors. | 924 | 1024 byte sectors. |
925 | 925 | ||
926 | * scsicam.c: Make sure we do read from whole disk (mask off | 926 | * scsicam.c: Make sure we do read from whole disk (mask off |
927 | partition). | 927 | partition). |
928 | 928 | ||
929 | * scsi.c: Use can_queue in Scsi_Host structure. | 929 | * scsi.c: Use can_queue in Scsi_Host structure. |
930 | Fix panic message about invalid host. | 930 | Fix panic message about invalid host. |
931 | 931 | ||
932 | * hosts.c: Initialize can_queue from template. | 932 | * hosts.c: Initialize can_queue from template. |
933 | 933 | ||
934 | * hosts.h: Add can_queue to Scsi_Host structure. | 934 | * hosts.h: Add can_queue to Scsi_Host structure. |
935 | 935 | ||
936 | * aha1740.c: Print out warning about NULL ecbptr. | 936 | * aha1740.c: Print out warning about NULL ecbptr. |
937 | 937 | ||
938 | Fri Nov 4 12:40:30 1994 Eric Youngdale (eric@andante.aib.com) | 938 | Fri Nov 4 12:40:30 1994 Eric Youngdale (eric@andante.aib.com) |
939 | 939 | ||
940 | * Linux 1.1.62 released. | 940 | * Linux 1.1.62 released. |
941 | 941 | ||
942 | * fdomain.c: Update to version 5.20. (From Rik Faith). Support | 942 | * fdomain.c: Update to version 5.20. (From Rik Faith). Support |
943 | BIOS version 3.5. | 943 | BIOS version 3.5. |
944 | 944 | ||
945 | * st.h: Add ST_EOD symbol. | 945 | * st.h: Add ST_EOD symbol. |
946 | 946 | ||
947 | * st.c: Patches from Kai Makisara - support additional densities, | 947 | * st.c: Patches from Kai Makisara - support additional densities, |
948 | add support for MTFSS, MTBSS, MTWSM commands. | 948 | add support for MTFSS, MTBSS, MTWSM commands. |
949 | 949 | ||
950 | * README.st: Update to document new commands. | 950 | * README.st: Update to document new commands. |
951 | 951 | ||
952 | * scsi.c: Add Mediavision CDR-H93MV to blacklist. | 952 | * scsi.c: Add Mediavision CDR-H93MV to blacklist. |
953 | 953 | ||
954 | Sat Oct 29 20:57:36 1994 Eric Youngdale (eric@andante.aib.com) | 954 | Sat Oct 29 20:57:36 1994 Eric Youngdale (eric@andante.aib.com) |
955 | 955 | ||
956 | * Linux 1.1.60 released. | 956 | * Linux 1.1.60 released. |
957 | 957 | ||
958 | * u14-34f.[c,h]: New driver from Dario Ballabio. | 958 | * u14-34f.[c,h]: New driver from Dario Ballabio. |
959 | 959 | ||
960 | * aic7770.c, aha274x_seq.h, aha274x.seq, aha274x.h, aha274x.c, | 960 | * aic7770.c, aha274x_seq.h, aha274x.seq, aha274x.h, aha274x.c, |
961 | README.aha274x: New files, new driver from John Aycock. | 961 | README.aha274x: New files, new driver from John Aycock. |
962 | 962 | ||
963 | 963 | ||
964 | Tue Oct 11 08:47:39 1994 Eric Youngdale (eric@andante) | 964 | Tue Oct 11 08:47:39 1994 Eric Youngdale (eric@andante) |
965 | 965 | ||
966 | * Linux 1.1.54 released. | 966 | * Linux 1.1.54 released. |
967 | 967 | ||
968 | * Add third PCI chip id. [Drew] | 968 | * Add third PCI chip id. [Drew] |
969 | 969 | ||
970 | * buslogic.c: Set BUSLOGIC_CMDLUN back to 1 [Eric]. | 970 | * buslogic.c: Set BUSLOGIC_CMDLUN back to 1 [Eric]. |
971 | 971 | ||
972 | * ultrastor.c: Fix asm directives for new GCC. | 972 | * ultrastor.c: Fix asm directives for new GCC. |
973 | 973 | ||
974 | * sr.c, sd.c: Use new end_scsi_request function. | 974 | * sr.c, sd.c: Use new end_scsi_request function. |
975 | 975 | ||
976 | * scsi.h(end_scsi_request): Return pointer to block if still | 976 | * scsi.h(end_scsi_request): Return pointer to block if still |
977 | active, else return NULL if inactive. Fixes race condition. | 977 | active, else return NULL if inactive. Fixes race condition. |
978 | 978 | ||
979 | Sun Oct 9 20:23:14 1994 Eric Youngdale (eric@andante) | 979 | Sun Oct 9 20:23:14 1994 Eric Youngdale (eric@andante) |
980 | 980 | ||
981 | * Linux 1.1.53 released. | 981 | * Linux 1.1.53 released. |
982 | 982 | ||
983 | * scsi.c: Do not allocate dma bounce buffers if we have exactly | 983 | * scsi.c: Do not allocate dma bounce buffers if we have exactly |
984 | 16Mb. | 984 | 16Mb. |
985 | 985 | ||
986 | Fri Sep 9 05:35:30 1994 Eric Youngdale (eric@andante) | 986 | Fri Sep 9 05:35:30 1994 Eric Youngdale (eric@andante) |
987 | 987 | ||
988 | * Linux 1.1.51 released. | 988 | * Linux 1.1.51 released. |
989 | 989 | ||
990 | * aha152x.c: Add support for disabling the parity check. Update | 990 | * aha152x.c: Add support for disabling the parity check. Update |
991 | to version 1.4. [Juergen]. | 991 | to version 1.4. [Juergen]. |
992 | 992 | ||
993 | * seagate.c: Tweak debugging message. | 993 | * seagate.c: Tweak debugging message. |
994 | 994 | ||
995 | Wed Aug 31 10:15:55 1994 Eric Youngdale (eric@andante) | 995 | Wed Aug 31 10:15:55 1994 Eric Youngdale (eric@andante) |
996 | 996 | ||
997 | * Linux 1.1.50 released. | 997 | * Linux 1.1.50 released. |
998 | 998 | ||
999 | * aha152x.c: Add eb800 for Vtech Platinum SMP boards. [Juergen]. | 999 | * aha152x.c: Add eb800 for Vtech Platinum SMP boards. [Juergen]. |
1000 | 1000 | ||
1001 | * scsi.c: Add Quantum PD1225S to blacklist. | 1001 | * scsi.c: Add Quantum PD1225S to blacklist. |
1002 | 1002 | ||
1003 | Fri Aug 26 09:38:45 1994 Eric Youngdale (eric@andante) | 1003 | Fri Aug 26 09:38:45 1994 Eric Youngdale (eric@andante) |
1004 | 1004 | ||
1005 | * Linux 1.1.49 released. | 1005 | * Linux 1.1.49 released. |
1006 | 1006 | ||
1007 | * sd.c: Fix bug when we were deleting the wrong entry if we | 1007 | * sd.c: Fix bug when we were deleting the wrong entry if we |
1008 | get an unsupported sector size device. | 1008 | get an unsupported sector size device. |
1009 | 1009 | ||
1010 | * sr.c: Another spelling patch. | 1010 | * sr.c: Another spelling patch. |
1011 | 1011 | ||
1012 | Thu Aug 25 09:15:27 1994 Eric Youngdale (eric@andante) | 1012 | Thu Aug 25 09:15:27 1994 Eric Youngdale (eric@andante) |
1013 | 1013 | ||
1014 | * Linux 1.1.48 released. | 1014 | * Linux 1.1.48 released. |
1015 | 1015 | ||
1016 | * Throughout: Use new semantics for request_dma, as appropriate. | 1016 | * Throughout: Use new semantics for request_dma, as appropriate. |
1017 | 1017 | ||
1018 | * sr.c: Print correct device number. | 1018 | * sr.c: Print correct device number. |
1019 | 1019 | ||
1020 | Sun Aug 21 17:49:23 1994 Eric Youngdale (eric@andante) | 1020 | Sun Aug 21 17:49:23 1994 Eric Youngdale (eric@andante) |
1021 | 1021 | ||
1022 | * Linux 1.1.47 released. | 1022 | * Linux 1.1.47 released. |
1023 | 1023 | ||
1024 | * NCR5380.c: Add support for LIMIT_TRANSFERSIZE. | 1024 | * NCR5380.c: Add support for LIMIT_TRANSFERSIZE. |
1025 | 1025 | ||
1026 | * constants.h: Add prototype for print_Scsi_Cmnd. | 1026 | * constants.h: Add prototype for print_Scsi_Cmnd. |
1027 | 1027 | ||
1028 | * pas16.c: Some more minor tweaks. Test for Mediavision board. | 1028 | * pas16.c: Some more minor tweaks. Test for Mediavision board. |
1029 | Allow for disks > 1Gb. [Drew??] | 1029 | Allow for disks > 1Gb. [Drew??] |
1030 | 1030 | ||
1031 | * sr.c: Set SCpnt->transfersize. | 1031 | * sr.c: Set SCpnt->transfersize. |
1032 | 1032 | ||
1033 | Tue Aug 16 17:29:35 1994 Eric Youngdale (eric@andante) | 1033 | Tue Aug 16 17:29:35 1994 Eric Youngdale (eric@andante) |
1034 | 1034 | ||
1035 | * Linux 1.1.46 released. | 1035 | * Linux 1.1.46 released. |
1036 | 1036 | ||
1037 | * Throughout: More spelling fixups. | 1037 | * Throughout: More spelling fixups. |
1038 | 1038 | ||
1039 | * buslogic.c: Add a few more fixups from Dave. Disk translation | 1039 | * buslogic.c: Add a few more fixups from Dave. Disk translation |
1040 | mainly. | 1040 | mainly. |
1041 | 1041 | ||
1042 | * pas16.c: Add a few patches (Drew?). | 1042 | * pas16.c: Add a few patches (Drew?). |
1043 | 1043 | ||
1044 | 1044 | ||
1045 | Thu Aug 11 20:45:15 1994 Eric Youngdale (eric@andante) | 1045 | Thu Aug 11 20:45:15 1994 Eric Youngdale (eric@andante) |
1046 | 1046 | ||
1047 | * Linux 1.1.44 released. | 1047 | * Linux 1.1.44 released. |
1048 | 1048 | ||
1049 | * hosts.c: Add type casts for scsi_init_malloc. | 1049 | * hosts.c: Add type casts for scsi_init_malloc. |
1050 | 1050 | ||
1051 | * scsicam.c: Add type cast. | 1051 | * scsicam.c: Add type cast. |
1052 | 1052 | ||
1053 | Wed Aug 10 19:23:01 1994 Eric Youngdale (eric@andante) | 1053 | Wed Aug 10 19:23:01 1994 Eric Youngdale (eric@andante) |
1054 | 1054 | ||
1055 | * Linux 1.1.43 released. | 1055 | * Linux 1.1.43 released. |
1056 | 1056 | ||
1057 | * Throughout: Spelling cleanups. [??] | 1057 | * Throughout: Spelling cleanups. [??] |
1058 | 1058 | ||
1059 | * aha152x.c, NCR53*.c, fdomain.c, g_NCR5380.c, pas16.c, seagate.c, | 1059 | * aha152x.c, NCR53*.c, fdomain.c, g_NCR5380.c, pas16.c, seagate.c, |
1060 | t128.c: Use request_irq, not irqaction. [??] | 1060 | t128.c: Use request_irq, not irqaction. [??] |
1061 | 1061 | ||
1062 | * aha1542.c: Move test for shost before we start to use shost. | 1062 | * aha1542.c: Move test for shost before we start to use shost. |
1063 | 1063 | ||
1064 | * aha1542.c, aha1740.c, ultrastor.c, wd7000.c: Use new | 1064 | * aha1542.c, aha1740.c, ultrastor.c, wd7000.c: Use new |
1065 | calling sequence for request_irq. | 1065 | calling sequence for request_irq. |
1066 | 1066 | ||
1067 | * buslogic.c: Update from Dave Gentzel. | 1067 | * buslogic.c: Update from Dave Gentzel. |
1068 | 1068 | ||
1069 | Tue Aug 9 09:32:59 1994 Eric Youngdale (eric@andante) | 1069 | Tue Aug 9 09:32:59 1994 Eric Youngdale (eric@andante) |
1070 | 1070 | ||
1071 | * Linux 1.1.42 released. | 1071 | * Linux 1.1.42 released. |
1072 | 1072 | ||
1073 | * NCR5380.c: Change NCR5380_print_status to static. | 1073 | * NCR5380.c: Change NCR5380_print_status to static. |
1074 | 1074 | ||
1075 | * seagate.c: A few more bugfixes. Only Drew knows what they are | 1075 | * seagate.c: A few more bugfixes. Only Drew knows what they are |
1076 | for. | 1076 | for. |
1077 | 1077 | ||
1078 | * ultrastor.c: Tweak some __asm__ directives so that it works | 1078 | * ultrastor.c: Tweak some __asm__ directives so that it works |
1079 | with newer compilers. [??] | 1079 | with newer compilers. [??] |
1080 | 1080 | ||
1081 | Sat Aug 6 21:29:36 1994 Eric Youngdale (eric@andante) | 1081 | Sat Aug 6 21:29:36 1994 Eric Youngdale (eric@andante) |
1082 | 1082 | ||
1083 | * Linux 1.1.40 released. | 1083 | * Linux 1.1.40 released. |
1084 | 1084 | ||
1085 | * NCR5380.c: Return SCSI_RESET_WAKEUP from reset function. | 1085 | * NCR5380.c: Return SCSI_RESET_WAKEUP from reset function. |
1086 | 1086 | ||
1087 | * aha1542.c: Reset mailbox status after a bus device reset. | 1087 | * aha1542.c: Reset mailbox status after a bus device reset. |
1088 | 1088 | ||
1089 | * constants.c: Fix typo (;;). | 1089 | * constants.c: Fix typo (;;). |
1090 | 1090 | ||
1091 | * g_NCR5380.c: | 1091 | * g_NCR5380.c: |
1092 | * pas16.c: Correct usage of NCR5380_init. | 1092 | * pas16.c: Correct usage of NCR5380_init. |
1093 | 1093 | ||
1094 | * scsi.c: Remove redundant (and unused variables). | 1094 | * scsi.c: Remove redundant (and unused variables). |
1095 | 1095 | ||
1096 | * sd.c: Use memset to clear all of rscsi_disks before we use it. | 1096 | * sd.c: Use memset to clear all of rscsi_disks before we use it. |
1097 | 1097 | ||
1098 | * sg.c: Ditto, except for scsi_generics. | 1098 | * sg.c: Ditto, except for scsi_generics. |
1099 | 1099 | ||
1100 | * sr.c: Ditto, except for scsi_CDs. | 1100 | * sr.c: Ditto, except for scsi_CDs. |
1101 | 1101 | ||
1102 | * st.c: Initialize STp->device. | 1102 | * st.c: Initialize STp->device. |
1103 | 1103 | ||
1104 | * seagate.c: Fix bug. [Drew] | 1104 | * seagate.c: Fix bug. [Drew] |
1105 | 1105 | ||
1106 | Thu Aug 4 08:47:27 1994 Eric Youngdale (eric@andante) | 1106 | Thu Aug 4 08:47:27 1994 Eric Youngdale (eric@andante) |
1107 | 1107 | ||
1108 | * Linux 1.1.39 released. | 1108 | * Linux 1.1.39 released. |
1109 | 1109 | ||
1110 | * Makefile: Fix typo in NCR53C7xx. | 1110 | * Makefile: Fix typo in NCR53C7xx. |
1111 | 1111 | ||
1112 | * st.c: Print correct number for device. | 1112 | * st.c: Print correct number for device. |
1113 | 1113 | ||
1114 | Tue Aug 2 11:29:14 1994 Eric Youngdale (eric@esp22) | 1114 | Tue Aug 2 11:29:14 1994 Eric Youngdale (eric@esp22) |
1115 | 1115 | ||
1116 | * Linux 1.1.38 released. | 1116 | * Linux 1.1.38 released. |
1117 | 1117 | ||
1118 | Lots of changes in 1.1.38. All from Drew unless otherwise noted. | 1118 | Lots of changes in 1.1.38. All from Drew unless otherwise noted. |
1119 | 1119 | ||
1120 | * 53c7,8xx.c: New file from Drew. PCI driver. | 1120 | * 53c7,8xx.c: New file from Drew. PCI driver. |
1121 | 1121 | ||
1122 | * 53c7,8xx.h: Likewise. | 1122 | * 53c7,8xx.h: Likewise. |
1123 | 1123 | ||
1124 | * 53c7,8xx.scr: Likewise. | 1124 | * 53c7,8xx.scr: Likewise. |
1125 | 1125 | ||
1126 | * 53c8xx_d.h, 53c8xx_u.h, script_asm.pl: Likewise. | 1126 | * 53c8xx_d.h, 53c8xx_u.h, script_asm.pl: Likewise. |
1127 | 1127 | ||
1128 | * scsicam.c: New file from Drew. Read block 0 on the disk and | 1128 | * scsicam.c: New file from Drew. Read block 0 on the disk and |
1129 | read the partition table. Attempt to deduce the geometry from | 1129 | read the partition table. Attempt to deduce the geometry from |
1130 | the partition table if possible. Only used by 53c[7,8]xx right | 1130 | the partition table if possible. Only used by 53c[7,8]xx right |
1131 | now, but could be used by any device for which we have no way | 1131 | now, but could be used by any device for which we have no way |
1132 | of identifying the geometry. | 1132 | of identifying the geometry. |
1133 | 1133 | ||
1134 | * sd.c: Use device letters instead of sd%d in a lot of messages. | 1134 | * sd.c: Use device letters instead of sd%d in a lot of messages. |
1135 | 1135 | ||
1136 | * seagate.c: Fix bug that resulted in lockups with some devices. | 1136 | * seagate.c: Fix bug that resulted in lockups with some devices. |
1137 | 1137 | ||
1138 | * sr.c (sr_open): Return -EROFS, not -EACCES if we attempt to open | 1138 | * sr.c (sr_open): Return -EROFS, not -EACCES if we attempt to open |
1139 | device for write. | 1139 | device for write. |
1140 | 1140 | ||
1141 | * hosts.c, Makefile: Update for new driver. | 1141 | * hosts.c, Makefile: Update for new driver. |
1142 | 1142 | ||
1143 | * NCR5380.c, NCR5380.h, g_NCR5380.h: Update from Drew to support | 1143 | * NCR5380.c, NCR5380.h, g_NCR5380.h: Update from Drew to support |
1144 | 53C400 chip. | 1144 | 53C400 chip. |
1145 | 1145 | ||
1146 | * constants.c: Define CONST_CMND and CONST_MSG. Other minor | 1146 | * constants.c: Define CONST_CMND and CONST_MSG. Other minor |
1147 | cleanups along the way. Improve handling of CONST_MSG. | 1147 | cleanups along the way. Improve handling of CONST_MSG. |
1148 | 1148 | ||
1149 | * fdomain.c, fdomain.h: New version from Rik Faith. Update to | 1149 | * fdomain.c, fdomain.h: New version from Rik Faith. Update to |
1150 | 5.18. Should now support TMC-3260 PCI card with 18C30 chip. | 1150 | 5.18. Should now support TMC-3260 PCI card with 18C30 chip. |
1151 | 1151 | ||
1152 | * pas16.c: Update with new irq initialization. | 1152 | * pas16.c: Update with new irq initialization. |
1153 | 1153 | ||
1154 | * t128.c: Update with minor cleanups. | 1154 | * t128.c: Update with minor cleanups. |
1155 | 1155 | ||
1156 | * scsi.c (scsi_pid): New variable - gives each command a unique | 1156 | * scsi.c (scsi_pid): New variable - gives each command a unique |
1157 | id. Add Quantum LPS5235S to blacklist. Change in_scan to | 1157 | id. Add Quantum LPS5235S to blacklist. Change in_scan to |
1158 | in_scan_scsis and make global. | 1158 | in_scan_scsis and make global. |
1159 | 1159 | ||
1160 | * scsi.h: Add some defines for extended message handling, | 1160 | * scsi.h: Add some defines for extended message handling, |
1161 | INITIATE/RELEASE_RECOVERY. Add a few new fields to support sync | 1161 | INITIATE/RELEASE_RECOVERY. Add a few new fields to support sync |
1162 | transfers. | 1162 | transfers. |
1163 | 1163 | ||
1164 | * scsi_ioctl.h: Add ioctl to request synchronous transfers. | 1164 | * scsi_ioctl.h: Add ioctl to request synchronous transfers. |
1165 | 1165 | ||
1166 | 1166 | ||
1167 | Tue Jul 26 21:36:58 1994 Eric Youngdale (eric@esp22) | 1167 | Tue Jul 26 21:36:58 1994 Eric Youngdale (eric@esp22) |
1168 | 1168 | ||
1169 | * Linux 1.1.37 released. | 1169 | * Linux 1.1.37 released. |
1170 | 1170 | ||
1171 | * aha1542.c: Always call aha1542_mbenable, use new udelay | 1171 | * aha1542.c: Always call aha1542_mbenable, use new udelay |
1172 | mechanism so we do not wait a long time if the board does not | 1172 | mechanism so we do not wait a long time if the board does not |
1173 | implement this command. | 1173 | implement this command. |
1174 | 1174 | ||
1175 | * g_NCR5380.c: Remove #include <linux/config.h> and #if | 1175 | * g_NCR5380.c: Remove #include <linux/config.h> and #if |
1176 | defined(CONFIG_SCSI_*). | 1176 | defined(CONFIG_SCSI_*). |
1177 | 1177 | ||
1178 | * seagate.c: Likewise. | 1178 | * seagate.c: Likewise. |
1179 | 1179 | ||
1180 | Next round of changes to support loadable modules. Getting closer | 1180 | Next round of changes to support loadable modules. Getting closer |
1181 | now, still not possible to do anything remotely usable. | 1181 | now, still not possible to do anything remotely usable. |
1182 | 1182 | ||
1183 | hosts.c: Create a linked list of detected high level devices. | 1183 | hosts.c: Create a linked list of detected high level devices. |
1184 | (scsi_register_device): New function to insert into this list. | 1184 | (scsi_register_device): New function to insert into this list. |
1185 | (scsi_init): Call scsi_register_device for each of the known high | 1185 | (scsi_init): Call scsi_register_device for each of the known high |
1186 | level drivers. | 1186 | level drivers. |
1187 | 1187 | ||
1188 | hosts.h: Add prototype for linked list header. Add structure | 1188 | hosts.h: Add prototype for linked list header. Add structure |
1189 | definition for device template structure which defines the linked | 1189 | definition for device template structure which defines the linked |
1190 | list. | 1190 | list. |
1191 | 1191 | ||
1192 | scsi.c: (scan_scsis): Use linked list instead of knowledge about | 1192 | scsi.c: (scan_scsis): Use linked list instead of knowledge about |
1193 | existing high level device drivers. | 1193 | existing high level device drivers. |
1194 | (scsi_dev_init): Use init functions for drivers on linked list | 1194 | (scsi_dev_init): Use init functions for drivers on linked list |
1195 | instead of explicit list to initialize and attach devices to high | 1195 | instead of explicit list to initialize and attach devices to high |
1196 | level drivers. | 1196 | level drivers. |
1197 | 1197 | ||
1198 | scsi.h: Add new field "attached" to scsi_device - count of number | 1198 | scsi.h: Add new field "attached" to scsi_device - count of number |
1199 | of high level devices attached. | 1199 | of high level devices attached. |
1200 | 1200 | ||
1201 | sd.c, sr.c, sg.c, st.c: Adjust init/attach functions to use new | 1201 | sd.c, sr.c, sg.c, st.c: Adjust init/attach functions to use new |
1202 | scheme. | 1202 | scheme. |
1203 | 1203 | ||
1204 | Sat Jul 23 13:03:17 1994 Eric Youngdale (eric@esp22) | 1204 | Sat Jul 23 13:03:17 1994 Eric Youngdale (eric@esp22) |
1205 | 1205 | ||
1206 | * Linux 1.1.35 released. | 1206 | * Linux 1.1.35 released. |
1207 | 1207 | ||
1208 | * ultrastor.c: Change constraint on asm() operand so that it works | 1208 | * ultrastor.c: Change constraint on asm() operand so that it works |
1209 | with gcc 2.6.0. | 1209 | with gcc 2.6.0. |
1210 | 1210 | ||
1211 | Thu Jul 21 10:37:39 1994 Eric Youngdale (eric@esp22) | 1211 | Thu Jul 21 10:37:39 1994 Eric Youngdale (eric@esp22) |
1212 | 1212 | ||
1213 | * Linux 1.1.33 released. | 1213 | * Linux 1.1.33 released. |
1214 | 1214 | ||
1215 | * sr.c(sr_open): Do not allow opens with write access. | 1215 | * sr.c(sr_open): Do not allow opens with write access. |
1216 | 1216 | ||
1217 | Mon Jul 18 09:51:22 1994 1994 Eric Youngdale (eric@esp22) | 1217 | Mon Jul 18 09:51:22 1994 Eric Youngdale (eric@esp22) |
1218 | 1218 | ||
1219 | * Linux 1.1.31 released. | 1219 | * Linux 1.1.31 released. |
1220 | 1220 | ||
1221 | * sd.c: Increase SD_TIMEOUT from 300 to 600. | 1221 | * sd.c: Increase SD_TIMEOUT from 300 to 600. |
1222 | 1222 | ||
1223 | * sr.c: Remove stray task_struct* variable that was no longer | 1223 | * sr.c: Remove stray task_struct* variable that was no longer |
1224 | used. | 1224 | used. |
1225 | 1225 | ||
1226 | * sr_ioctl.c: Fix typo in up() call. | 1226 | * sr_ioctl.c: Fix typo in up() call. |
1227 | 1227 | ||
1228 | Sun Jul 17 16:25:29 1994 Eric Youngdale (eric@esp22) | 1228 | Sun Jul 17 16:25:29 1994 Eric Youngdale (eric@esp22) |
1229 | 1229 | ||
1230 | * Linux 1.1.30 released. | 1230 | * Linux 1.1.30 released. |
1231 | 1231 | ||
1232 | * scsi.c (scan_scsis): Fix detection of some Toshiba CDROM drives | 1232 | * scsi.c (scan_scsis): Fix detection of some Toshiba CDROM drives |
1233 | that report themselves as disk drives. | 1233 | that report themselves as disk drives. |
1234 | 1234 | ||
1235 | * (Throughout): Use request.sem instead of request.waiting. | 1235 | * (Throughout): Use request.sem instead of request.waiting. |
1236 | Should fix swap problem with fdomain. | 1236 | Should fix swap problem with fdomain. |
1237 | 1237 | ||
1238 | Thu Jul 14 10:51:42 1994 Eric Youngdale (eric@esp22) | 1238 | Thu Jul 14 10:51:42 1994 Eric Youngdale (eric@esp22) |
1239 | 1239 | ||
1240 | * Linux 1.1.29 released. | 1240 | * Linux 1.1.29 released. |
1241 | 1241 | ||
1242 | * scsi.c (scan_scsis): Add new devices to end of linked list, not | 1242 | * scsi.c (scan_scsis): Add new devices to end of linked list, not |
1243 | to the beginning. | 1243 | to the beginning. |
1244 | 1244 | ||
1245 | * scsi.h (SCSI_SLEEP): Remove brain dead hack to try to save | 1245 | * scsi.h (SCSI_SLEEP): Remove brain dead hack to try to save |
1246 | the task state before sleeping. | 1246 | the task state before sleeping. |
1247 | 1247 | ||
1248 | Sat Jul 9 15:01:03 1994 Eric Youngdale (eric@esp22) | 1248 | Sat Jul 9 15:01:03 1994 Eric Youngdale (eric@esp22) |
1249 | 1249 | ||
1250 | More changes to eventually support loadable modules. Mainly | 1250 | More changes to eventually support loadable modules. Mainly |
1251 | we want to use linked lists instead of arrays because it is easier | 1251 | we want to use linked lists instead of arrays because it is easier |
1252 | to dynamically add and remove things this way. | 1252 | to dynamically add and remove things this way. |
1253 | 1253 | ||
1254 | Quite a bit more work is needed before loadable modules are | 1254 | Quite a bit more work is needed before loadable modules are |
1255 | possible (and usable) with scsi, but this is most of the grunge | 1255 | possible (and usable) with scsi, but this is most of the grunge |
1256 | work. | 1256 | work. |
1257 | 1257 | ||
1258 | * Linux 1.1.28 released. | 1258 | * Linux 1.1.28 released. |
1259 | 1259 | ||
1260 | * scsi.c, scsi.h (allocate_device, request_queueable): Change | 1260 | * scsi.c, scsi.h (allocate_device, request_queueable): Change |
1261 | argument from index into scsi_devices to a pointer to the | 1261 | argument from index into scsi_devices to a pointer to the |
1262 | Scsi_Device struct. | 1262 | Scsi_Device struct. |
1263 | 1263 | ||
1264 | * Throughout: Change all calls to allocate_device, | 1264 | * Throughout: Change all calls to allocate_device, |
1265 | request_queueable to use new calling sequence. | 1265 | request_queueable to use new calling sequence. |
1266 | 1266 | ||
1267 | * Throughout: Use SCpnt->device instead of | 1267 | * Throughout: Use SCpnt->device instead of |
1268 | scsi_devices[SCpnt->index]. Ugh - the pointer was there all along | 1268 | scsi_devices[SCpnt->index]. Ugh - the pointer was there all along |
1269 | - much cleaner this way. | 1269 | - much cleaner this way. |
1270 | 1270 | ||
1271 | * scsi.c (scsi_init_malloc, scsi_free_malloc): New functions - | 1271 | * scsi.c (scsi_init_malloc, scsi_free_malloc): New functions - |
1272 | allow us to pretend that we have a working malloc when we | 1272 | allow us to pretend that we have a working malloc when we |
1273 | initialize. Use this instead of passing memory_start, memory_end | 1273 | initialize. Use this instead of passing memory_start, memory_end |
1274 | around all over the place. | 1274 | around all over the place. |
1275 | 1275 | ||
1276 | * scsi.h, st.c, sr.c, sd.c, sg.c: Change *_init1 functions to use | 1276 | * scsi.h, st.c, sr.c, sd.c, sg.c: Change *_init1 functions to use |
1277 | scsi_init_malloc, remove all arguments, no return value. | 1277 | scsi_init_malloc, remove all arguments, no return value. |
1278 | 1278 | ||
1279 | * scsi.h: Remove index field from Scsi_Device and Scsi_Cmnd | 1279 | * scsi.h: Remove index field from Scsi_Device and Scsi_Cmnd |
1280 | structs. | 1280 | structs. |
1281 | 1281 | ||
1282 | * scsi.c (scsi_dev_init): Set up for scsi_init_malloc. | 1282 | * scsi.c (scsi_dev_init): Set up for scsi_init_malloc. |
1283 | (scan_scsis): Get SDpnt from scsi_init_malloc, and refresh | 1283 | (scan_scsis): Get SDpnt from scsi_init_malloc, and refresh |
1284 | when we discover a device. Free pointer before returning. | 1284 | when we discover a device. Free pointer before returning. |
1285 | Change scsi_devices into a linked list. | 1285 | Change scsi_devices into a linked list. |
1286 | 1286 | ||
1287 | * scsi.c (scan_scsis): Change to only scan one host. | 1287 | * scsi.c (scan_scsis): Change to only scan one host. |
1288 | (scsi_dev_init): Loop over all detected hosts, and scan them. | 1288 | (scsi_dev_init): Loop over all detected hosts, and scan them. |
1289 | 1289 | ||
1290 | * hosts.c (scsi_init_free): Change so that number of extra bytes | 1290 | * hosts.c (scsi_init_free): Change so that number of extra bytes |
1291 | is stored in struct, and we do not have to pass it each time. | 1291 | is stored in struct, and we do not have to pass it each time. |
1292 | 1292 | ||
1293 | * hosts.h: Change Scsi_Host_Template struct to include "next" and | 1293 | * hosts.h: Change Scsi_Host_Template struct to include "next" and |
1294 | "release" functions. Initialize to NULL in all low level | 1294 | "release" functions. Initialize to NULL in all low level |
1295 | adapters. | 1295 | adapters. |
1296 | 1296 | ||
1297 | * hosts.c: Rename scsi_hosts to builtin_scsi_hosts, create linked | 1297 | * hosts.c: Rename scsi_hosts to builtin_scsi_hosts, create linked |
1298 | list scsi_hosts, linked together with the new "next" field. | 1298 | list scsi_hosts, linked together with the new "next" field. |
1299 | 1299 | ||
1300 | Wed Jul 6 05:45:02 1994 Eric Youngdale (eric@esp22) | 1300 | Wed Jul 6 05:45:02 1994 Eric Youngdale (eric@esp22) |
1301 | 1301 | ||
1302 | * Linux 1.1.25 released. | 1302 | * Linux 1.1.25 released. |
1303 | 1303 | ||
1304 | * aha152x.c: Changes from Juergen - cleanups and updates. | 1304 | * aha152x.c: Changes from Juergen - cleanups and updates. |
1305 | 1305 | ||
1306 | * sd.c, sr.c: Use new check_media_change and revalidate | 1306 | * sd.c, sr.c: Use new check_media_change and revalidate |
1307 | file_operations fields. | 1307 | file_operations fields. |
1308 | 1308 | ||
1309 | * st.c, st.h: Add changes from Kai Makisara, dated Jun 22. | 1309 | * st.c, st.h: Add changes from Kai Makisara, dated Jun 22. |
1310 | 1310 | ||
1311 | * hosts.h: Change SG_ALL back to 0xff. Apparently soft error | 1311 | * hosts.h: Change SG_ALL back to 0xff. Apparently soft error |
1312 | in /dev/brain resulted in having this bumped up. | 1312 | in /dev/brain resulted in having this bumped up. |
1313 | Change first parameter in bios_param function to be Disk * instead | 1313 | Change first parameter in bios_param function to be Disk * instead |
1314 | of index into rscsi_disks. | 1314 | of index into rscsi_disks. |
1315 | 1315 | ||
1316 | * sd_ioctl.c: Pass pointer to rscsi_disks element instead of index | 1316 | * sd_ioctl.c: Pass pointer to rscsi_disks element instead of index |
1317 | to array. | 1317 | to array. |
1318 | 1318 | ||
1319 | * sd.h: Add struct name "scsi_disk" to typedef for Scsi_Disk. | 1319 | * sd.h: Add struct name "scsi_disk" to typedef for Scsi_Disk. |
1320 | 1320 | ||
1321 | * scsi.c: Remove redundant Maxtor XT8760S from blacklist. | 1321 | * scsi.c: Remove redundant Maxtor XT8760S from blacklist. |
1322 | In scsi_reset, add printk when DEBUG defined. | 1322 | In scsi_reset, add printk when DEBUG defined. |
1323 | 1323 | ||
1324 | * All low level drivers: Modify definitions of bios_param in | 1324 | * All low level drivers: Modify definitions of bios_param in |
1325 | appropriate way. | 1325 | appropriate way. |
1326 | 1326 | ||
1327 | Thu Jun 16 10:31:59 1994 Eric Youngdale (eric@esp22) | 1327 | Thu Jun 16 10:31:59 1994 Eric Youngdale (eric@esp22) |
1328 | 1328 | ||
1329 | * Linux 1.1.20 released. | 1329 | * Linux 1.1.20 released. |
1330 | 1330 | ||
1331 | * scsi_ioctl.c: Only pass down the actual number of characters | 1331 | * scsi_ioctl.c: Only pass down the actual number of characters |
1332 | required to scsi_do_cmd, not the one rounded up to a even number | 1332 | required to scsi_do_cmd, not the one rounded up to a even number |
1333 | of sectors. | 1333 | of sectors. |
1334 | 1334 | ||
1335 | * ultrastor.c: Changes from Caleb Epstein for 24f cards. Support | 1335 | * ultrastor.c: Changes from Caleb Epstein for 24f cards. Support |
1336 | larger SG lists. | 1336 | larger SG lists. |
1337 | 1337 | ||
1338 | * ultrastor.c: Changes from me - use scsi_register to register | 1338 | * ultrastor.c: Changes from me - use scsi_register to register |
1339 | host. Add some consistency checking, | 1339 | host. Add some consistency checking, |
1340 | 1340 | ||
1341 | Wed Jun 1 21:12:13 1994 Eric Youngdale (eric@esp22) | 1341 | Wed Jun 1 21:12:13 1994 Eric Youngdale (eric@esp22) |
1342 | 1342 | ||
1343 | * Linux 1.1.19 released. | 1343 | * Linux 1.1.19 released. |
1344 | 1344 | ||
1345 | * scsi.h: Add new return code for reset() function: | 1345 | * scsi.h: Add new return code for reset() function: |
1346 | SCSI_RESET_PUNT. | 1346 | SCSI_RESET_PUNT. |
1347 | 1347 | ||
1348 | * scsi.c: Make SCSI_RESET_PUNT the same as SCSI_RESET_WAKEUP for | 1348 | * scsi.c: Make SCSI_RESET_PUNT the same as SCSI_RESET_WAKEUP for |
1349 | now. | 1349 | now. |
1350 | 1350 | ||
1351 | * aha1542.c: If the command responsible for the reset is not | 1351 | * aha1542.c: If the command responsible for the reset is not |
1352 | pending, return SCSI_RESET_PUNT. | 1352 | pending, return SCSI_RESET_PUNT. |
1353 | 1353 | ||
1354 | * aha1740.c, buslogic.c, wd7000.c, ultrastor.c: Return | 1354 | * aha1740.c, buslogic.c, wd7000.c, ultrastor.c: Return |
1355 | SCSI_RESET_PUNT instead of SCSI_RESET_SNOOZE. | 1355 | SCSI_RESET_PUNT instead of SCSI_RESET_SNOOZE. |
1356 | 1356 | ||
1357 | Tue May 31 19:36:01 1994 Eric Youngdale (eric@esp22) | 1357 | Tue May 31 19:36:01 1994 Eric Youngdale (eric@esp22) |
1358 | 1358 | ||
1359 | * buslogic.c: Do not print out message about "must be Adaptec" | 1359 | * buslogic.c: Do not print out message about "must be Adaptec" |
1360 | if we have detected a buslogic card. Print out a warning message | 1360 | if we have detected a buslogic card. Print out a warning message |
1361 | if we are configuring for >16Mb, since the 445S at board level | 1361 | if we are configuring for >16Mb, since the 445S at board level |
1362 | D or earlier does not work right. The "D" level board can be made | 1362 | D or earlier does not work right. The "D" level board can be made |
1363 | to work by flipping an undocumented switch, but this is too subtle. | 1363 | to work by flipping an undocumented switch, but this is too subtle. |
1364 | 1364 | ||
1365 | Changes based upon patches in Yggdrasil distribution. | 1365 | Changes based upon patches in Yggdrasil distribution. |
1366 | 1366 | ||
1367 | * sg.c, sg.h: Return sense data to user. | 1367 | * sg.c, sg.h: Return sense data to user. |
1368 | 1368 | ||
1369 | * aha1542.c, aha1740.c, buslogic.c: Do not panic if | 1369 | * aha1542.c, aha1740.c, buslogic.c: Do not panic if |
1370 | sense buffer is wrong size. | 1370 | sense buffer is wrong size. |
1371 | 1371 | ||
1372 | * hosts.c: Test for ultrastor card before any of the others. | 1372 | * hosts.c: Test for ultrastor card before any of the others. |
1373 | 1373 | ||
1374 | * scsi.c: Allow boot-time option for max_scsi_luns=? so that | 1374 | * scsi.c: Allow boot-time option for max_scsi_luns=? so that |
1375 | buggy firmware has an easy work-around. | 1375 | buggy firmware has an easy work-around. |
1376 | 1376 | ||
1377 | Sun May 15 20:24:34 1994 Eric Youngdale (eric@esp22) | 1377 | Sun May 15 20:24:34 1994 Eric Youngdale (eric@esp22) |
1378 | 1378 | ||
1379 | * Linux 1.1.15 released. | 1379 | * Linux 1.1.15 released. |
1380 | 1380 | ||
1381 | Post-codefreeze thaw... | 1381 | Post-codefreeze thaw... |
1382 | 1382 | ||
1383 | * buslogic.[c,h]: New driver from David Gentzel. | 1383 | * buslogic.[c,h]: New driver from David Gentzel. |
1384 | 1384 | ||
1385 | * hosts.h: Add use_clustering field to explicitly say whether | 1385 | * hosts.h: Add use_clustering field to explicitly say whether |
1386 | clustering should be used for devices attached to this host | 1386 | clustering should be used for devices attached to this host |
1387 | adapter. The buslogic board apparently supports large SG lists, | 1387 | adapter. The buslogic board apparently supports large SG lists, |
1388 | but it is apparently faster if sd.c condenses this into a smaller | 1388 | but it is apparently faster if sd.c condenses this into a smaller |
1389 | list. | 1389 | list. |
1390 | 1390 | ||
1391 | * sd.c: Use this field instead of heuristic. | 1391 | * sd.c: Use this field instead of heuristic. |
1392 | 1392 | ||
1393 | * All host adapter include files: Add appropriate initializer for | 1393 | * All host adapter include files: Add appropriate initializer for |
1394 | use_clustering field. | 1394 | use_clustering field. |
1395 | 1395 | ||
1396 | * scsi.h: Add #defines for return codes for the abort and reset | 1396 | * scsi.h: Add #defines for return codes for the abort and reset |
1397 | functions. There are now a specific set of return codes to fully | 1397 | functions. There are now a specific set of return codes to fully |
1398 | specify all of the possible things that the low-level adapter | 1398 | specify all of the possible things that the low-level adapter |
1399 | could do. | 1399 | could do. |
1400 | 1400 | ||
1401 | * scsi.c: Act based upon return codes from abort/reset functions. | 1401 | * scsi.c: Act based upon return codes from abort/reset functions. |
1402 | 1402 | ||
1403 | * All host adapter abort/reset functions: Return new return code. | 1403 | * All host adapter abort/reset functions: Return new return code. |
1404 | 1404 | ||
1405 | * Add code in scsi.c to help debug timeouts. Use #define | 1405 | * Add code in scsi.c to help debug timeouts. Use #define |
1406 | DEBUG_TIMEOUT to enable this. | 1406 | DEBUG_TIMEOUT to enable this. |
1407 | 1407 | ||
1408 | * scsi.c: If the host->irq field is set, use | 1408 | * scsi.c: If the host->irq field is set, use |
1409 | disable_irq/enable_irq before calling queuecommand if we | 1409 | disable_irq/enable_irq before calling queuecommand if we |
1410 | are not already in an interrupt. Reduce races, and we | 1410 | are not already in an interrupt. Reduce races, and we |
1411 | can be sloppier about cli/sti in the interrupt routines now | 1411 | can be sloppier about cli/sti in the interrupt routines now |
1412 | (reduce interrupt latency). | 1412 | (reduce interrupt latency). |
1413 | 1413 | ||
1414 | * constants.c: Fix some things to eliminate warnings. Add some | 1414 | * constants.c: Fix some things to eliminate warnings. Add some |
1415 | sense descriptions that were omitted before. | 1415 | sense descriptions that were omitted before. |
1416 | 1416 | ||
1417 | * aha1542.c: Watch for SCRD from host adapter - if we see it, set | 1417 | * aha1542.c: Watch for SCRD from host adapter - if we see it, set |
1418 | a flag. Currently we only print out the number of pending | 1418 | a flag. Currently we only print out the number of pending |
1419 | commands that might need to be restarted. | 1419 | commands that might need to be restarted. |
1420 | 1420 | ||
1421 | * aha1542.c (aha1542_abort): Look for lost interrupts, OGMB still | 1421 | * aha1542.c (aha1542_abort): Look for lost interrupts, OGMB still |
1422 | full, and attempt to recover. Otherwise give up. | 1422 | full, and attempt to recover. Otherwise give up. |
1423 | 1423 | ||
1424 | * aha1542.c (aha1542_reset): Try BUS DEVICE RESET, and then pass | 1424 | * aha1542.c (aha1542_reset): Try BUS DEVICE RESET, and then pass |
1425 | DID_RESET back up to the upper level code for all commands running | 1425 | DID_RESET back up to the upper level code for all commands running |
1426 | on this target (even on different LUNs). | 1426 | on this target (even on different LUNs). |
1427 | 1427 | ||
1428 | Sat May 7 14:54:01 1994 | 1428 | Sat May 7 14:54:01 1994 |
1429 | 1429 | ||
1430 | * Linux 1.1.12 released. | 1430 | * Linux 1.1.12 released. |
1431 | 1431 | ||
1432 | * st.c, st.h: New version from Kai. Supports boot time | 1432 | * st.c, st.h: New version from Kai. Supports boot time |
1433 | specification of number of buffers. | 1433 | specification of number of buffers. |
1434 | 1434 | ||
1435 | * wd7000.[c,h]: Updated driver from John Boyd. Now supports | 1435 | * wd7000.[c,h]: Updated driver from John Boyd. Now supports |
1436 | more than one wd7000 board in machine at one time, among other things. | 1436 | more than one wd7000 board in machine at one time, among other things. |
1437 | 1437 | ||
1438 | Wed Apr 20 22:20:35 1994 | 1438 | Wed Apr 20 22:20:35 1994 |
1439 | 1439 | ||
1440 | * Linux 1.1.8 released. | 1440 | * Linux 1.1.8 released. |
1441 | 1441 | ||
1442 | * sd.c: Add a few type casts where scsi_malloc is called. | 1442 | * sd.c: Add a few type casts where scsi_malloc is called. |
1443 | 1443 | ||
1444 | Wed Apr 13 12:53:29 1994 | 1444 | Wed Apr 13 12:53:29 1994 |
1445 | 1445 | ||
1446 | * Linux 1.1.4 released. | 1446 | * Linux 1.1.4 released. |
1447 | 1447 | ||
1448 | * scsi.c: Clean up a few printks (use %p to print pointers). | 1448 | * scsi.c: Clean up a few printks (use %p to print pointers). |
1449 | 1449 | ||
1450 | Wed Apr 13 11:33:02 1994 | 1450 | Wed Apr 13 11:33:02 1994 |
1451 | 1451 | ||
1452 | * Linux 1.1.3 released. | 1452 | * Linux 1.1.3 released. |
1453 | 1453 | ||
1454 | * fdomain.c: Update to version 5.16 (Handle different FIFO sizes | 1454 | * fdomain.c: Update to version 5.16 (Handle different FIFO sizes |
1455 | better). | 1455 | better). |
1456 | 1456 | ||
1457 | Fri Apr 8 08:57:19 1994 | 1457 | Fri Apr 8 08:57:19 1994 |
1458 | 1458 | ||
1459 | * Linux 1.1.2 released. | 1459 | * Linux 1.1.2 released. |
1460 | 1460 | ||
1461 | * Throughout: SCSI portion of cluster diffs added. | 1461 | * Throughout: SCSI portion of cluster diffs added. |
1462 | 1462 | ||
1463 | Tue Apr 5 07:41:50 1994 | 1463 | Tue Apr 5 07:41:50 1994 |
1464 | 1464 | ||
1465 | * Linux 1.1 development tree initiated. | 1465 | * Linux 1.1 development tree initiated. |
1466 | 1466 | ||
1467 | * The linux 1.0 development tree is now effectively frozen except | 1467 | * The linux 1.0 development tree is now effectively frozen except |
1468 | for obvious bugfixes. | 1468 | for obvious bugfixes. |
1469 | 1469 | ||
1470 | ****************************************************************** | 1470 | ****************************************************************** |
1471 | ****************************************************************** | 1471 | ****************************************************************** |
1472 | ****************************************************************** | 1472 | ****************************************************************** |
1473 | ****************************************************************** | 1473 | ****************************************************************** |
1474 | 1474 | ||
1475 | Sun Apr 17 00:17:39 1994 | 1475 | Sun Apr 17 00:17:39 1994 |
1476 | 1476 | ||
1477 | * Linux 1.0, patchlevel 9 released. | 1477 | * Linux 1.0, patchlevel 9 released. |
1478 | 1478 | ||
1479 | * fdomain.c: Update to version 5.16 (Handle different FIFO sizes | 1479 | * fdomain.c: Update to version 5.16 (Handle different FIFO sizes |
1480 | better). | 1480 | better). |
1481 | 1481 | ||
1482 | Thu Apr 7 08:36:20 1994 | 1482 | Thu Apr 7 08:36:20 1994 |
1483 | 1483 | ||
1484 | * Linux 1.0, patchlevel8 released. | 1484 | * Linux 1.0, patchlevel8 released. |
1485 | 1485 | ||
1486 | * fdomain.c: Update to version 5.15 from 5.9. Handles 3.4 bios. | 1486 | * fdomain.c: Update to version 5.15 from 5.9. Handles 3.4 bios. |
1487 | 1487 | ||
1488 | Sun Apr 3 14:43:03 1994 | 1488 | Sun Apr 3 14:43:03 1994 |
1489 | 1489 | ||
1490 | * Linux 1.0, patchlevel6 released. | 1490 | * Linux 1.0, patchlevel6 released. |
1491 | 1491 | ||
1492 | * wd7000.c: Make stab at fixing race condition. | 1492 | * wd7000.c: Make stab at fixing race condition. |
1493 | 1493 | ||
1494 | Sat Mar 26 14:14:50 1994 | 1494 | Sat Mar 26 14:14:50 1994 |
1495 | 1495 | ||
1496 | * Linux 1.0, patchlevel5 released. | 1496 | * Linux 1.0, patchlevel5 released. |
1497 | 1497 | ||
1498 | * aha152x.c, Makefile: Fix a few bugs (too much data message). | 1498 | * aha152x.c, Makefile: Fix a few bugs (too much data message). |
1499 | Add a few more bios signatures. (Patches from Juergen). | 1499 | Add a few more bios signatures. (Patches from Juergen). |
1500 | 1500 | ||
1501 | * aha1542.c: Fix race condition in aha1542_out. | 1501 | * aha1542.c: Fix race condition in aha1542_out. |
1502 | 1502 | ||
1503 | Mon Mar 21 16:36:20 1994 | 1503 | Mon Mar 21 16:36:20 1994 |
1504 | 1504 | ||
1505 | * Linux 1.0, patchlevel3 released. | 1505 | * Linux 1.0, patchlevel3 released. |
1506 | 1506 | ||
1507 | * sd.c, st.c, sr.c, sg.c: Return -ENXIO, not -ENODEV if we attempt | 1507 | * sd.c, st.c, sr.c, sg.c: Return -ENXIO, not -ENODEV if we attempt |
1508 | to open a non-existent device. | 1508 | to open a non-existent device. |
1509 | 1509 | ||
1510 | * scsi.c: Add Chinon cdrom to blacklist. | 1510 | * scsi.c: Add Chinon cdrom to blacklist. |
1511 | 1511 | ||
1512 | * sr_ioctl.c: Check return status of verify_area. | 1512 | * sr_ioctl.c: Check return status of verify_area. |
1513 | 1513 | ||
1514 | Sat Mar 6 16:06:19 1994 | 1514 | Sat Mar 6 16:06:19 1994 |
1515 | 1515 | ||
1516 | * Linux 1.0 released (technically a pre-release). | 1516 | * Linux 1.0 released (technically a pre-release). |
1517 | 1517 | ||
1518 | * scsi.c: Add IMS CDD521, Maxtor XT-8760S to blacklist. | 1518 | * scsi.c: Add IMS CDD521, Maxtor XT-8760S to blacklist. |
1519 | 1519 | ||
1520 | Tue Feb 15 10:58:20 1994 | 1520 | Tue Feb 15 10:58:20 1994 |
1521 | 1521 | ||
1522 | * pl15e released. | 1522 | * pl15e released. |
1523 | 1523 | ||
1524 | * aha1542.c: For 1542C, allow dynamic device scan with >1Gb turned | 1524 | * aha1542.c: For 1542C, allow dynamic device scan with >1Gb turned |
1525 | off. | 1525 | off. |
1526 | 1526 | ||
1527 | * constants.c: Fix typo in definition of CONSTANTS. | 1527 | * constants.c: Fix typo in definition of CONSTANTS. |
1528 | 1528 | ||
1529 | * pl15d released. | 1529 | * pl15d released. |
1530 | 1530 | ||
1531 | Fri Feb 11 10:10:16 1994 | 1531 | Fri Feb 11 10:10:16 1994 |
1532 | 1532 | ||
1533 | * pl15c released. | 1533 | * pl15c released. |
1534 | 1534 | ||
1535 | * scsi.c: Add Maxtor XT-3280 and Rodime RO3000S to blacklist. | 1535 | * scsi.c: Add Maxtor XT-3280 and Rodime RO3000S to blacklist. |
1536 | 1536 | ||
1537 | * scsi.c: Allow tagged queueing for scsi 3 devices as well. | 1537 | * scsi.c: Allow tagged queueing for scsi 3 devices as well. |
1538 | Some really old devices report a version number of 0. Disallow | 1538 | Some really old devices report a version number of 0. Disallow |
1539 | LUN != 0 for these. | 1539 | LUN != 0 for these. |
1540 | 1540 | ||
1541 | Thu Feb 10 09:48:57 1994 | 1541 | Thu Feb 10 09:48:57 1994 |
1542 | 1542 | ||
1543 | * pl15b released. | 1543 | * pl15b released. |
1544 | 1544 | ||
1545 | Sun Feb 6 12:19:46 1994 | 1545 | Sun Feb 6 12:19:46 1994 |
1546 | 1546 | ||
1547 | * pl15a released. | 1547 | * pl15a released. |
1548 | 1548 | ||
1549 | Fri Feb 4 09:02:17 1994 | 1549 | Fri Feb 4 09:02:17 1994 |
1550 | 1550 | ||
1551 | * scsi.c: Add Teac cdrom to blacklist. | 1551 | * scsi.c: Add Teac cdrom to blacklist. |
1552 | 1552 | ||
1553 | Thu Feb 3 14:16:43 1994 | 1553 | Thu Feb 3 14:16:43 1994 |
1554 | 1554 | ||
1555 | * pl15 released. | 1555 | * pl15 released. |
1556 | 1556 | ||
1557 | Tue Feb 1 15:47:43 1994 | 1557 | Tue Feb 1 15:47:43 1994 |
1558 | 1558 | ||
1559 | * pl14w released. | 1559 | * pl14w released. |
1560 | 1560 | ||
1561 | * wd7000.c (wd_bases): Fix typo in last change. | 1561 | * wd7000.c (wd_bases): Fix typo in last change. |
1562 | 1562 | ||
1563 | Mon Jan 24 17:37:23 1994 | 1563 | Mon Jan 24 17:37:23 1994 |
1564 | 1564 | ||
1565 | * pl14u released. | 1565 | * pl14u released. |
1566 | 1566 | ||
1567 | * aha1542.c: Support 1542CF/extended bios. Different from 1542C. | 1567 | * aha1542.c: Support 1542CF/extended bios. Different from 1542C. |
1568 | 1568 | ||
1569 | * wd7000.c: Allow bios at 0xd8000 as well. | 1569 | * wd7000.c: Allow bios at 0xd8000 as well. |
1570 | 1570 | ||
1571 | * ultrastor.c: Do not truncate cylinders to 1024. | 1571 | * ultrastor.c: Do not truncate cylinders to 1024. |
1572 | 1572 | ||
1573 | * fdomain.c: Update to version 5.9 (add new bios signature). | 1573 | * fdomain.c: Update to version 5.9 (add new bios signature). |
1574 | 1574 | ||
1575 | * NCR5380.c: Update from Drew - should work a lot better now. | 1575 | * NCR5380.c: Update from Drew - should work a lot better now. |
1576 | 1576 | ||
1577 | Sat Jan 8 15:13:10 1994 | 1577 | Sat Jan 8 15:13:10 1994 |
1578 | 1578 | ||
1579 | * pl14o released. | 1579 | * pl14o released. |
1580 | 1580 | ||
1581 | * sr_ioctl.c: Zero reserved field before trying to set audio volume. | 1581 | * sr_ioctl.c: Zero reserved field before trying to set audio volume. |
1582 | 1582 | ||
1583 | Wed Jan 5 13:21:10 1994 | 1583 | Wed Jan 5 13:21:10 1994 |
1584 | 1584 | ||
1585 | * pl14m released. | 1585 | * pl14m released. |
1586 | 1586 | ||
1587 | * fdomain.c: Update to version 5.8. No functional difference??? | 1587 | * fdomain.c: Update to version 5.8. No functional difference??? |
1588 | 1588 | ||
1589 | Tue Jan 4 14:26:13 1994 | 1589 | Tue Jan 4 14:26:13 1994 |
1590 | 1590 | ||
1591 | * pl14l released. | 1591 | * pl14l released. |
1592 | 1592 | ||
1593 | * ultrastor.c: Remove outl, inl functions (now provided elsewhere). | 1593 | * ultrastor.c: Remove outl, inl functions (now provided elsewhere). |
1594 | 1594 | ||
1595 | Mon Jan 3 12:27:25 1994 | 1595 | Mon Jan 3 12:27:25 1994 |
1596 | 1596 | ||
1597 | * pl14k released. | 1597 | * pl14k released. |
1598 | 1598 | ||
1599 | * aha152x.c: Remove insw and outsw functions. | 1599 | * aha152x.c: Remove insw and outsw functions. |
1600 | 1600 | ||
1601 | * fdomain.c: Ditto. | 1601 | * fdomain.c: Ditto. |
1602 | 1602 | ||
1603 | Wed Dec 29 09:47:20 1993 | 1603 | Wed Dec 29 09:47:20 1993 |
1604 | 1604 | ||
1605 | * pl14i released. | 1605 | * pl14i released. |
1606 | 1606 | ||
1607 | * scsi.c: Support RECOVERED_ERROR for tape drives. | 1607 | * scsi.c: Support RECOVERED_ERROR for tape drives. |
1608 | 1608 | ||
1609 | * st.c: Update of tape driver from Kai. | 1609 | * st.c: Update of tape driver from Kai. |
1610 | 1610 | ||
1611 | Tue Dec 21 09:18:30 1993 | 1611 | Tue Dec 21 09:18:30 1993 |
1612 | 1612 | ||
1613 | * pl14g released. | 1613 | * pl14g released. |
1614 | 1614 | ||
1615 | * aha1542.[c,h]: Support extended BIOS stuff. | 1615 | * aha1542.[c,h]: Support extended BIOS stuff. |
1616 | 1616 | ||
1617 | * scsi.c: Clean up messages about disks, so they are displayed as | 1617 | * scsi.c: Clean up messages about disks, so they are displayed as |
1618 | sda, sdb, etc instead of sd0, sd1, etc. | 1618 | sda, sdb, etc instead of sd0, sd1, etc. |
1619 | 1619 | ||
1620 | * sr.c: Force reread of capacity if disk was changed. | 1620 | * sr.c: Force reread of capacity if disk was changed. |
1621 | Clear buffer before asking for capacity/sectorsize (some drives | 1621 | Clear buffer before asking for capacity/sectorsize (some drives |
1622 | do not report this properly). Set needs_sector_size flag if | 1622 | do not report this properly). Set needs_sector_size flag if |
1623 | drive did not return sensible sector size. | 1623 | drive did not return sensible sector size. |
1624 | 1624 | ||
1625 | Mon Dec 13 12:13:47 1993 | 1625 | Mon Dec 13 12:13:47 1993 |
1626 | 1626 | ||
1627 | * aha152x.c: Update to version .101 from Juergen. | 1627 | * aha152x.c: Update to version .101 from Juergen. |
1628 | 1628 | ||
1629 | Mon Nov 29 03:03:00 1993 | 1629 | Mon Nov 29 03:03:00 1993 |
1630 | 1630 | ||
1631 | * linux 0.99.14 released. | 1631 | * linux 0.99.14 released. |
1632 | 1632 | ||
1633 | * All scsi stuff moved from kernel/blk_drv/scsi to drivers/scsi. | 1633 | * All scsi stuff moved from kernel/blk_drv/scsi to drivers/scsi. |
1634 | 1634 | ||
1635 | * Throughout: Grammatical corrections to various comments. | 1635 | * Throughout: Grammatical corrections to various comments. |
1636 | 1636 | ||
1637 | * Makefile: fix so that we do not need to compile things we are | 1637 | * Makefile: fix so that we do not need to compile things we are |
1638 | not going to use. | 1638 | not going to use. |
1639 | 1639 | ||
1640 | * NCR5380.c, NCR5380.h, g_NCR5380.c, g_NCR5380.h, pas16.c, | 1640 | * NCR5380.c, NCR5380.h, g_NCR5380.c, g_NCR5380.h, pas16.c, |
1641 | pas16.h, t128.c, t128.h: New files from Drew. | 1641 | pas16.h, t128.c, t128.h: New files from Drew. |
1642 | 1642 | ||
1643 | * aha152x.c, aha152x.h: New files from Juergen Fischer. | 1643 | * aha152x.c, aha152x.h: New files from Juergen Fischer. |
1644 | 1644 | ||
1645 | * aha1542.c: Support for more than one 1542 in the machine | 1645 | * aha1542.c: Support for more than one 1542 in the machine |
1646 | at the same time. Make functions static that do not need | 1646 | at the same time. Make functions static that do not need |
1647 | visibility. | 1647 | visibility. |
1648 | 1648 | ||
1649 | * aha1740.c: Set NEEDS_JUMPSTART flag in reset function, so we | 1649 | * aha1740.c: Set NEEDS_JUMPSTART flag in reset function, so we |
1650 | know to restart the command. Change prototype of aha1740_reset | 1650 | know to restart the command. Change prototype of aha1740_reset |
1651 | to take a command pointer. | 1651 | to take a command pointer. |
1652 | 1652 | ||
1653 | * constants.c: Clean up a few things. | 1653 | * constants.c: Clean up a few things. |
1654 | 1654 | ||
1655 | * fdomain.c: Update to version 5.6. Move snarf_region. Allow | 1655 | * fdomain.c: Update to version 5.6. Move snarf_region. Allow |
1656 | board to be set at different SCSI ids. Remove support for | 1656 | board to be set at different SCSI ids. Remove support for |
1657 | reselection (did not work well). Set JUMPSTART flag in reset | 1657 | reselection (did not work well). Set JUMPSTART flag in reset |
1658 | code. | 1658 | code. |
1659 | 1659 | ||
1660 | * hosts.c: Support new low-level adapters. Allow for more than | 1660 | * hosts.c: Support new low-level adapters. Allow for more than |
1661 | one adapter of a given type. | 1661 | one adapter of a given type. |
1662 | 1662 | ||
1663 | * hosts.h: Allow for more than one adapter of a given type. | 1663 | * hosts.h: Allow for more than one adapter of a given type. |
1664 | 1664 | ||
1665 | * scsi.c: Add scsi_device_types array, if NEEDS_JUMPSTART is set | 1665 | * scsi.c: Add scsi_device_types array, if NEEDS_JUMPSTART is set |
1666 | after a low-level reset, start the command again. Sort blacklist, | 1666 | after a low-level reset, start the command again. Sort blacklist, |
1667 | and add Maxtor MXT-1240S, XT-4170S, NEC CDROM 84, Seagate ST157N. | 1667 | and add Maxtor MXT-1240S, XT-4170S, NEC CDROM 84, Seagate ST157N. |
1668 | 1668 | ||
1669 | * scsi.h: Add constants for tagged queueing. | 1669 | * scsi.h: Add constants for tagged queueing. |
1670 | 1670 | ||
1671 | * Throughout: Use constants from major.h instead of hardcoded | 1671 | * Throughout: Use constants from major.h instead of hardcoded |
1672 | numbers for major numbers. | 1672 | numbers for major numbers. |
1673 | 1673 | ||
1674 | * scsi_ioctl.c: Fix bug in buffer length in ioctl_command. Use | 1674 | * scsi_ioctl.c: Fix bug in buffer length in ioctl_command. Use |
1675 | verify_area in GET_IDLUN ioctl. Add new ioctls for | 1675 | verify_area in GET_IDLUN ioctl. Add new ioctls for |
1676 | TAGGED_QUEUE_ENABLE, DISABLE. Only allow IOCTL_SEND_COMMAND by | 1676 | TAGGED_QUEUE_ENABLE, DISABLE. Only allow IOCTL_SEND_COMMAND by |
1677 | superuser. | 1677 | superuser. |
1678 | 1678 | ||
1679 | * sd.c: Only pay attention to UNIT_ATTENTION for removable disks. | 1679 | * sd.c: Only pay attention to UNIT_ATTENTION for removable disks. |
1680 | Fix bug where sometimes portions of blocks would get lost | 1680 | Fix bug where sometimes portions of blocks would get lost |
1681 | resulting in processes hanging. Add messages when we spin up a | 1681 | resulting in processes hanging. Add messages when we spin up a |
1682 | disk, and fix a bug in the timing. Increase read-ahead for disks | 1682 | disk, and fix a bug in the timing. Increase read-ahead for disks |
1683 | that are on a scatter-gather capable host adapter. | 1683 | that are on a scatter-gather capable host adapter. |
1684 | 1684 | ||
1685 | * seagate.c: Fix so that some parameters can be set from the lilo | 1685 | * seagate.c: Fix so that some parameters can be set from the lilo |
1686 | prompt. Supply jumpstart flag if we are resetting and need the | 1686 | prompt. Supply jumpstart flag if we are resetting and need the |
1687 | command restarted. Fix so that we return 1 if we detect a card | 1687 | command restarted. Fix so that we return 1 if we detect a card |
1688 | so that multiple card detection works correctly. Add yet another | 1688 | so that multiple card detection works correctly. Add yet another |
1689 | signature for FD cards (950). Add another signature for ST0x. | 1689 | signature for FD cards (950). Add another signature for ST0x. |
1690 | 1690 | ||
1691 | * sg.c, sg.h: New files from Lawrence Foard for generic scsi | 1691 | * sg.c, sg.h: New files from Lawrence Foard for generic scsi |
1692 | access. | 1692 | access. |
1693 | 1693 | ||
1694 | * sr.c: Add type casts for (void*) so that we can do pointer | 1694 | * sr.c: Add type casts for (void*) so that we can do pointer |
1695 | arithmetic. Works with GCC without this, but it is not strictly | 1695 | arithmetic. Works with GCC without this, but it is not strictly |
1696 | correct. Same bugfix as was in sd.c. Increase read-ahead a la | 1696 | correct. Same bugfix as was in sd.c. Increase read-ahead a la |
1697 | disk driver. | 1697 | disk driver. |
1698 | 1698 | ||
1699 | * sr_ioctl.c: Use scsi_malloc buffer instead of buffer from stack | 1699 | * sr_ioctl.c: Use scsi_malloc buffer instead of buffer from stack |
1700 | since we cannot guarantee that the stack is < 16Mb. | 1700 | since we cannot guarantee that the stack is < 16Mb. |
1701 | 1701 | ||
1702 | * ultrastor.c: Update to support 24f properly (JFC's driver). | 1702 | * ultrastor.c: Update to support 24f properly (JFC's driver). |
1703 | 1703 | ||
1704 | * wd7000.c: Supply jumpstart flag for reset. Do not round up | 1704 | * wd7000.c: Supply jumpstart flag for reset. Do not round up |
1705 | number of cylinders in biosparam function. | 1705 | number of cylinders in biosparam function. |
1706 | 1706 | ||
1707 | Sat Sep 4 20:49:56 1993 | 1707 | Sat Sep 4 20:49:56 1993 |
1708 | 1708 | ||
1709 | * 0.99pl13 released. | 1709 | * 0.99pl13 released. |
1710 | 1710 | ||
1711 | * Throughout: Use check_region/snarf_region for all low-level | 1711 | * Throughout: Use check_region/snarf_region for all low-level |
1712 | drivers. | 1712 | drivers. |
1713 | 1713 | ||
1714 | * aha1542.c: Do hard reset instead of soft (some ethercard probes | 1714 | * aha1542.c: Do hard reset instead of soft (some ethercard probes |
1715 | screw us up). | 1715 | screw us up). |
1716 | 1716 | ||
1717 | * scsi.c: Add new flag ASKED_FOR_SENSE so that we can tell if we are | 1717 | * scsi.c: Add new flag ASKED_FOR_SENSE so that we can tell if we are |
1718 | in a loop whereby the device returns null sense data. | 1718 | in a loop whereby the device returns null sense data. |
1719 | 1719 | ||
1720 | * sd.c: Add code to spin up a drive if it is not already spinning. | 1720 | * sd.c: Add code to spin up a drive if it is not already spinning. |
1721 | Do this one at a time to make it easier on power supplies. | 1721 | Do this one at a time to make it easier on power supplies. |
1722 | 1722 | ||
1723 | * sd_ioctl.c: Use sync_dev instead of fsync_dev in BLKFLSBUF ioctl. | 1723 | * sd_ioctl.c: Use sync_dev instead of fsync_dev in BLKFLSBUF ioctl. |
1724 | 1724 | ||
1725 | * seagate.c: Switch around DATA/CONTROL lines. | 1725 | * seagate.c: Switch around DATA/CONTROL lines. |
1726 | 1726 | ||
1727 | * st.c: Change sense to unsigned. | 1727 | * st.c: Change sense to unsigned. |
1728 | 1728 | ||
1729 | Thu Aug 5 11:59:18 1993 | 1729 | Thu Aug 5 11:59:18 1993 |
1730 | 1730 | ||
1731 | * 0.99pl12 released. | 1731 | * 0.99pl12 released. |
1732 | 1732 | ||
1733 | * constants.c, constants.h: New files with ascii descriptions of | 1733 | * constants.c, constants.h: New files with ascii descriptions of |
1734 | various conditions. | 1734 | various conditions. |
1735 | 1735 | ||
1736 | * Makefile: Do not try to count the number of low-level drivers, | 1736 | * Makefile: Do not try to count the number of low-level drivers, |
1737 | just generate the list of .o files. | 1737 | just generate the list of .o files. |
1738 | 1738 | ||
1739 | * aha1542.c: Replace 16 with sizeof(SCpnt->sense_buffer). Add tests | 1739 | * aha1542.c: Replace 16 with sizeof(SCpnt->sense_buffer). Add tests |
1740 | for addresses > 16Mb, panic if we find one. | 1740 | for addresses > 16Mb, panic if we find one. |
1741 | 1741 | ||
1742 | * aha1740.c: Ditto with sizeof(). | 1742 | * aha1740.c: Ditto with sizeof(). |
1743 | 1743 | ||
1744 | * fdomain.c: Update to version 3.18. Add new signature, register IRQ | 1744 | * fdomain.c: Update to version 3.18. Add new signature, register IRQ |
1745 | with irqaction. Use ID 7 for new board. Be more intelligent about | 1745 | with irqaction. Use ID 7 for new board. Be more intelligent about |
1746 | obtaining the h/s/c numbers for biosparam. | 1746 | obtaining the h/s/c numbers for biosparam. |
1747 | 1747 | ||
1748 | * hosts.c: Do not depend upon Makefile generated count of the number | 1748 | * hosts.c: Do not depend upon Makefile generated count of the number |
1749 | of low-level host adapters. | 1749 | of low-level host adapters. |
1750 | 1750 | ||
1751 | * scsi.c: Use array for scsi_command_size instead of a function. Add | 1751 | * scsi.c: Use array for scsi_command_size instead of a function. Add |
1752 | Texel cdrom and Maxtor XT-4380S to blacklist. Allow compile time | 1752 | Texel cdrom and Maxtor XT-4380S to blacklist. Allow compile time |
1753 | option for no-multi lun scan. Add semaphore for possible problems | 1753 | option for no-multi lun scan. Add semaphore for possible problems |
1754 | with handshaking, assume device is faulty until we know it not to be | 1754 | with handshaking, assume device is faulty until we know it not to be |
1755 | the case. Add DEBUG_INIT symbol to dump info as we scan for devices. | 1755 | the case. Add DEBUG_INIT symbol to dump info as we scan for devices. |
1756 | Zero sense buffer so we can tell if we need to request it. When | 1756 | Zero sense buffer so we can tell if we need to request it. When |
1757 | examining sense information, request sense if buffer is all zero. | 1757 | examining sense information, request sense if buffer is all zero. |
1758 | If RESET, request sense information to see what to do next. | 1758 | If RESET, request sense information to see what to do next. |
1759 | 1759 | ||
1760 | * scsi_debug.c: Change some constants to use symbols like INT_MAX. | 1760 | * scsi_debug.c: Change some constants to use symbols like INT_MAX. |
1761 | 1761 | ||
1762 | * scsi_ioctl.c (kernel_scsi_ioctl): New function for making ioctl | 1762 | * scsi_ioctl.c (kernel_scsi_ioctl): New function for making ioctl |
1763 | calls from kernel space. | 1763 | calls from kernel space. |
1764 | 1764 | ||
1765 | * sd.c: Increase timeout to 300. Use functions in constants.h to | 1765 | * sd.c: Increase timeout to 300. Use functions in constants.h to |
1766 | display info. Use scsi_malloc buffer for READ_CAPACITY, since | 1766 | display info. Use scsi_malloc buffer for READ_CAPACITY, since |
1767 | we cannot guarantee that a stack based buffer is < 16Mb. | 1767 | we cannot guarantee that a stack based buffer is < 16Mb. |
1768 | 1768 | ||
1769 | * sd_ioctl.c: Add BLKFLSBUF ioctl. | 1769 | * sd_ioctl.c: Add BLKFLSBUF ioctl. |
1770 | 1770 | ||
1771 | * seagate.c: Add new compile time options for ARBITRATE, | 1771 | * seagate.c: Add new compile time options for ARBITRATE, |
1772 | SLOW_HANDSHAKE, and SLOW_RATE. Update assembly loops for transferring | 1772 | SLOW_HANDSHAKE, and SLOW_RATE. Update assembly loops for transferring |
1773 | data. Use kernel_scsi_ioctl to request mode page with geometry. | 1773 | data. Use kernel_scsi_ioctl to request mode page with geometry. |
1774 | 1774 | ||
1775 | * sr.c: Use functions in constants.c to display messages. | 1775 | * sr.c: Use functions in constants.c to display messages. |
1776 | 1776 | ||
1777 | * st.c: Support for variable block size. | 1777 | * st.c: Support for variable block size. |
1778 | 1778 | ||
1779 | * ultrastor.c: Do not use cache for tape drives. Set | 1779 | * ultrastor.c: Do not use cache for tape drives. Set |
1780 | unchecked_isa_dma flag, even though this may not be needed (gets set | 1780 | unchecked_isa_dma flag, even though this may not be needed (gets set |
1781 | later). | 1781 | later). |
1782 | 1782 | ||
1783 | Sat Jul 17 18:32:44 1993 | 1783 | Sat Jul 17 18:32:44 1993 |
1784 | 1784 | ||
1785 | * 0.99pl11 released. C++ compilable. | 1785 | * 0.99pl11 released. C++ compilable. |
1786 | 1786 | ||
1787 | * Throughout: Add type casts all over the place, and use "ip" instead | 1787 | * Throughout: Add type casts all over the place, and use "ip" instead |
1788 | of "info" in the various biosparam functions. | 1788 | of "info" in the various biosparam functions. |
1789 | 1789 | ||
1790 | * Makefile: Compile seagate.c with C++ compiler. | 1790 | * Makefile: Compile seagate.c with C++ compiler. |
1791 | 1791 | ||
1792 | * aha1542.c: Always set ccb pointer as this gets trashed somehow on | 1792 | * aha1542.c: Always set ccb pointer as this gets trashed somehow on |
1793 | some systems. Add a few type casts. Update biosparam function a little. | 1793 | some systems. Add a few type casts. Update biosparam function a little. |
1794 | 1794 | ||
1795 | * aha1740.c: Add a few type casts. | 1795 | * aha1740.c: Add a few type casts. |
1796 | 1796 | ||
1797 | * fdomain.c: Update to version 3.17 from 3.6. Now works with | 1797 | * fdomain.c: Update to version 3.17 from 3.6. Now works with |
1798 | TMC-18C50. | 1798 | TMC-18C50. |
1799 | 1799 | ||
1800 | * scsi.c: Minor changes here and there with datatypes. Save use_sg | 1800 | * scsi.c: Minor changes here and there with datatypes. Save use_sg |
1801 | when requesting sense information so that this can properly be | 1801 | when requesting sense information so that this can properly be |
1802 | restored if we retry the command. Set aside dma buffers assuming each | 1802 | restored if we retry the command. Set aside dma buffers assuming each |
1803 | block is 1 page, not 1Kb minix block. | 1803 | block is 1 page, not 1Kb minix block. |
1804 | 1804 | ||
1805 | * scsi_ioctl.c: Add a few type casts. Other minor changes. | 1805 | * scsi_ioctl.c: Add a few type casts. Other minor changes. |
1806 | 1806 | ||
1807 | * sd.c: Correctly free all scsi_malloc'd memory if we run out of | 1807 | * sd.c: Correctly free all scsi_malloc'd memory if we run out of |
1808 | dma_pool. Store blocksize information for each partition. | 1808 | dma_pool. Store blocksize information for each partition. |
1809 | 1809 | ||
1810 | * seagate.c: Minor cleanups here and there. | 1810 | * seagate.c: Minor cleanups here and there. |
1811 | 1811 | ||
1812 | * sr.c: Set up blocksize array for all discs. Fix bug in freeing | 1812 | * sr.c: Set up blocksize array for all discs. Fix bug in freeing |
1813 | buffers if we run out of dma pool. | 1813 | buffers if we run out of dma pool. |
1814 | 1814 | ||
1815 | Thu Jun 2 17:58:11 1993 | 1815 | Thu Jun 2 17:58:11 1993 |
1816 | 1816 | ||
1817 | * 0.99pl10 released. | 1817 | * 0.99pl10 released. |
1818 | 1818 | ||
1819 | * aha1542.c: Support for BT 445S (VL-bus board with no dma channel). | 1819 | * aha1542.c: Support for BT 445S (VL-bus board with no dma channel). |
1820 | 1820 | ||
1821 | * fdomain.c: Upgrade to version 3.6. Preliminary support for TMC-18C50. | 1821 | * fdomain.c: Upgrade to version 3.6. Preliminary support for TMC-18C50. |
1822 | 1822 | ||
1823 | * scsi.c: First attempt to fix problem with old_use_sg. Change | 1823 | * scsi.c: First attempt to fix problem with old_use_sg. Change |
1824 | NOT_READY to a SUGGEST_ABORT. Fix timeout race where time might | 1824 | NOT_READY to a SUGGEST_ABORT. Fix timeout race where time might |
1825 | get decremented past zero. | 1825 | get decremented past zero. |
1826 | 1826 | ||
1827 | * sd.c: Add block_fsync function to dispatch table. | 1827 | * sd.c: Add block_fsync function to dispatch table. |
1828 | 1828 | ||
1829 | * sr.c: Increase timeout to 500 from 250. Add entry for sync in | 1829 | * sr.c: Increase timeout to 500 from 250. Add entry for sync in |
1830 | dispatch table (supply NULL). If we do not have a sectorsize, | 1830 | dispatch table (supply NULL). If we do not have a sectorsize, |
1831 | try to get it in the sd_open function. Add new function just to | 1831 | try to get it in the sd_open function. Add new function just to |
1832 | obtain sectorsize. | 1832 | obtain sectorsize. |
1833 | 1833 | ||
1834 | * sr.h: Add needs_sector_size semaphore. | 1834 | * sr.h: Add needs_sector_size semaphore. |
1835 | 1835 | ||
1836 | * st.c: Add NULL for fsync in dispatch table. | 1836 | * st.c: Add NULL for fsync in dispatch table. |
1837 | 1837 | ||
1838 | * wd7000.c: Allow another condition for power on that are normal | 1838 | * wd7000.c: Allow another condition for power on that are normal |
1839 | and do not require a panic. | 1839 | and do not require a panic. |
1840 | 1840 | ||
1841 | Thu Apr 22 23:10:11 1993 | 1841 | Thu Apr 22 23:10:11 1993 |
1842 | 1842 | ||
1843 | * 0.99pl9 released. | 1843 | * 0.99pl9 released. |
1844 | 1844 | ||
1845 | * aha1542.c: Use (void) instead of () in setup_mailboxes. | 1845 | * aha1542.c: Use (void) instead of () in setup_mailboxes. |
1846 | 1846 | ||
1847 | * scsi.c: Initialize transfersize and underflow fields in SCmd to 0. | 1847 | * scsi.c: Initialize transfersize and underflow fields in SCmd to 0. |
1848 | Do not panic for unsupported message bytes. | 1848 | Do not panic for unsupported message bytes. |
1849 | 1849 | ||
1850 | * scsi.h: Allocate 12 bytes instead of 10 for commands. Add | 1850 | * scsi.h: Allocate 12 bytes instead of 10 for commands. Add |
1851 | transfersize and underflow fields. | 1851 | transfersize and underflow fields. |
1852 | 1852 | ||
1853 | * scsi_ioctl.c: Further bugfix to ioctl_probe. | 1853 | * scsi_ioctl.c: Further bugfix to ioctl_probe. |
1854 | 1854 | ||
1855 | * sd.c: Use long instead of int for last parameter in sd_ioctl. | 1855 | * sd.c: Use long instead of int for last parameter in sd_ioctl. |
1856 | Initialize transfersize and underflow fields. | 1856 | Initialize transfersize and underflow fields. |
1857 | 1857 | ||
1858 | * sd_ioctl.c: Ditto for sd_ioctl(,,,,); | 1858 | * sd_ioctl.c: Ditto for sd_ioctl(,,,,); |
1859 | 1859 | ||
1860 | * seagate.c: New version from Drew. Includes new signatures for FD | 1860 | * seagate.c: New version from Drew. Includes new signatures for FD |
1861 | cards. Support for 0ws jumper. Correctly initialize | 1861 | cards. Support for 0ws jumper. Correctly initialize |
1862 | scsi_hosts[hostnum].this_id. Improved handling of | 1862 | scsi_hosts[hostnum].this_id. Improved handling of |
1863 | disconnect/reconnect, and support command linking. Use | 1863 | disconnect/reconnect, and support command linking. Use |
1864 | transfersize and underflow fields. Support scatter-gather. | 1864 | transfersize and underflow fields. Support scatter-gather. |
1865 | 1865 | ||
1866 | * sr.c, sr_ioctl.c: Use long instead of int for last parameter in sr_ioctl. | 1866 | * sr.c, sr_ioctl.c: Use long instead of int for last parameter in sr_ioctl. |
1867 | Use buffer and buflength in do_ioctl. Patches from Chris Newbold for | 1867 | Use buffer and buflength in do_ioctl. Patches from Chris Newbold for |
1868 | scsi-2 audio commands. | 1868 | scsi-2 audio commands. |
1869 | 1869 | ||
1870 | * ultrastor.c: Comment out in_byte (compiler warning). | 1870 | * ultrastor.c: Comment out in_byte (compiler warning). |
1871 | 1871 | ||
1872 | * wd7000.c: Change () to (void) in wd7000_enable_dma. | 1872 | * wd7000.c: Change () to (void) in wd7000_enable_dma. |
1873 | 1873 | ||
1874 | Wed Mar 31 16:36:25 1993 | 1874 | Wed Mar 31 16:36:25 1993 |
1875 | 1875 | ||
1876 | * 0.99pl8 released. | 1876 | * 0.99pl8 released. |
1877 | 1877 | ||
1878 | * aha1542.c: Handle mailboxes better for 1542C. | 1878 | * aha1542.c: Handle mailboxes better for 1542C. |
1879 | Do not truncate number of cylinders at 1024 for biosparam call. | 1879 | Do not truncate number of cylinders at 1024 for biosparam call. |
1880 | 1880 | ||
1881 | * aha1740.c: Fix a few minor bugs for multiple devices. | 1881 | * aha1740.c: Fix a few minor bugs for multiple devices. |
1882 | Same as above for biosparam. | 1882 | Same as above for biosparam. |
1883 | 1883 | ||
1884 | * scsi.c: Add lockable semaphore for removable devices that can have | 1884 | * scsi.c: Add lockable semaphore for removable devices that can have |
1885 | media removal prevented. Add another signature for flopticals. | 1885 | media removal prevented. Add another signature for flopticals. |
1886 | (allocate_device): Fix race condition. Allow more space in dma pool | 1886 | (allocate_device): Fix race condition. Allow more space in dma pool |
1887 | for blocksizes of up to 4Kb. | 1887 | for blocksizes of up to 4Kb. |
1888 | 1888 | ||
1889 | * scsi.h: Define COMMAND_SIZE. Define a SCSI specific version of | 1889 | * scsi.h: Define COMMAND_SIZE. Define a SCSI specific version of |
1890 | INIT_REQUEST that can run with interrupts off. | 1890 | INIT_REQUEST that can run with interrupts off. |
1891 | 1891 | ||
1892 | * scsi_ioctl.c: Make ioctl_probe function more idiot-proof. If | 1892 | * scsi_ioctl.c: Make ioctl_probe function more idiot-proof. If |
1893 | a removable device says ILLEGAL REQUEST to a door-locking command, | 1893 | a removable device says ILLEGAL REQUEST to a door-locking command, |
1894 | clear lockable flag. Add SCSI_IOCTL_GET_IDLUN ioctl. Do not attempt | 1894 | clear lockable flag. Add SCSI_IOCTL_GET_IDLUN ioctl. Do not attempt |
1895 | to lock door for devices that do not have lockable semaphore set. | 1895 | to lock door for devices that do not have lockable semaphore set. |
1896 | 1896 | ||
1897 | * sd.c: Fix race condition for multiple disks. Use INIT_SCSI_REQUEST | 1897 | * sd.c: Fix race condition for multiple disks. Use INIT_SCSI_REQUEST |
1898 | instead of INIT_REQUEST. Allow sector sizes of 1024 and 256. For | 1898 | instead of INIT_REQUEST. Allow sector sizes of 1024 and 256. For |
1899 | removable disks that are not ready, mark them as having a media change | 1899 | removable disks that are not ready, mark them as having a media change |
1900 | (some drives do not report this later). | 1900 | (some drives do not report this later). |
1901 | 1901 | ||
1902 | * seagate.c: Use volatile keyword for memory-mapped register pointers. | 1902 | * seagate.c: Use volatile keyword for memory-mapped register pointers. |
1903 | 1903 | ||
1904 | * sr.c: Fix race condition, a la sd.c. Increase the number of retries | 1904 | * sr.c: Fix race condition, a la sd.c. Increase the number of retries |
1905 | to 1. Use INIT_SCSI_REQUEST. Allow 512 byte sector sizes. Do a | 1905 | to 1. Use INIT_SCSI_REQUEST. Allow 512 byte sector sizes. Do a |
1906 | read_capacity when we init the device so we know the size and | 1906 | read_capacity when we init the device so we know the size and |
1907 | sectorsize. | 1907 | sectorsize. |
1908 | 1908 | ||
1909 | * st.c: If ioctl not found in st.c, try scsi_ioctl for others. | 1909 | * st.c: If ioctl not found in st.c, try scsi_ioctl for others. |
1910 | 1910 | ||
1911 | * ultrastor.c: Do not truncate number of cylinders at 1024 for | 1911 | * ultrastor.c: Do not truncate number of cylinders at 1024 for |
1912 | biosparam call. | 1912 | biosparam call. |
1913 | 1913 | ||
1914 | * wd7000.c: Ditto. | 1914 | * wd7000.c: Ditto. |
1915 | Throughout: Use COMMAND_SIZE macro to determine length of scsi | 1915 | Throughout: Use COMMAND_SIZE macro to determine length of scsi |
1916 | command. | 1916 | command. |
1917 | 1917 | ||
1918 | 1918 | ||
1919 | 1919 | ||
1920 | Sat Mar 13 17:31:29 1993 | 1920 | Sat Mar 13 17:31:29 1993 |
1921 | 1921 | ||
1922 | * 0.99pl7 released. | 1922 | * 0.99pl7 released. |
1923 | 1923 | ||
1924 | Throughout: Improve punctuation in some messages, and use new | 1924 | Throughout: Improve punctuation in some messages, and use new |
1925 | verify_area syntax. | 1925 | verify_area syntax. |
1926 | 1926 | ||
1927 | * aha1542.c: Handle unexpected interrupts better. | 1927 | * aha1542.c: Handle unexpected interrupts better. |
1928 | 1928 | ||
1929 | * scsi.c: Ditto. Handle reset conditions a bit better, asking for | 1929 | * scsi.c: Ditto. Handle reset conditions a bit better, asking for |
1930 | sense information and retrying if required. | 1930 | sense information and retrying if required. |
1931 | 1931 | ||
1932 | * scsi_ioctl.c: Allow for 12 byte scsi commands. | 1932 | * scsi_ioctl.c: Allow for 12 byte scsi commands. |
1933 | 1933 | ||
1934 | * ultrastor.c: Update to use scatter-gather. | 1934 | * ultrastor.c: Update to use scatter-gather. |
1935 | 1935 | ||
1936 | Sat Feb 20 17:57:15 1993 | 1936 | Sat Feb 20 17:57:15 1993 |
1937 | 1937 | ||
1938 | * 0.99pl6 released. | 1938 | * 0.99pl6 released. |
1939 | 1939 | ||
1940 | * fdomain.c: Update to version 3.5. Handle spurious interrupts | 1940 | * fdomain.c: Update to version 3.5. Handle spurious interrupts |
1941 | better. | 1941 | better. |
1942 | 1942 | ||
1943 | * sd.c: Use register_blkdev function. | 1943 | * sd.c: Use register_blkdev function. |
1944 | 1944 | ||
1945 | * sr.c: Ditto. | 1945 | * sr.c: Ditto. |
1946 | 1946 | ||
1947 | * st.c: Use register_chrdev function. | 1947 | * st.c: Use register_chrdev function. |
1948 | 1948 | ||
1949 | * wd7000.c: Undo previous change. | 1949 | * wd7000.c: Undo previous change. |
1950 | 1950 | ||
1951 | Sat Feb 6 11:20:43 1993 | 1951 | Sat Feb 6 11:20:43 1993 |
1952 | 1952 | ||
1953 | * 0.99pl5 released. | 1953 | * 0.99pl5 released. |
1954 | 1954 | ||
1955 | * scsi.c: Fix bug in testing for UNIT_ATTENTION. | 1955 | * scsi.c: Fix bug in testing for UNIT_ATTENTION. |
1956 | 1956 | ||
1957 | * wd7000.c: Check at more addresses for bios. Fix bug in biosparam | 1957 | * wd7000.c: Check at more addresses for bios. Fix bug in biosparam |
1958 | (heads & sectors turned around). | 1958 | (heads & sectors turned around). |
1959 | 1959 | ||
1960 | Wed Jan 20 18:13:59 1993 | 1960 | Wed Jan 20 18:13:59 1993 |
1961 | 1961 | ||
1962 | * 0.99pl4 released. | 1962 | * 0.99pl4 released. |
1963 | 1963 | ||
1964 | * scsi.c: Ignore leading spaces when looking for blacklisted devices. | 1964 | * scsi.c: Ignore leading spaces when looking for blacklisted devices. |
1965 | 1965 | ||
1966 | * seagate.c: Add a few new signatures for FD cards. Another patch | 1966 | * seagate.c: Add a few new signatures for FD cards. Another patch |
1967 | with SCint to fix race condition. Use recursion_depth to keep track | 1967 | with SCint to fix race condition. Use recursion_depth to keep track |
1968 | of how many times we have been recursively called, and do not start | 1968 | of how many times we have been recursively called, and do not start |
1969 | another command unless we are on the outer level. Fixes bug | 1969 | another command unless we are on the outer level. Fixes bug |
1970 | with Syquest cartridge drives (used to crash kernel), because | 1970 | with Syquest cartridge drives (used to crash kernel), because |
1971 | they do not disconnect with large data transfers. | 1971 | they do not disconnect with large data transfers. |
1972 | 1972 | ||
1973 | Tue Jan 12 14:33:36 1993 | 1973 | Tue Jan 12 14:33:36 1993 |
1974 | 1974 | ||
1975 | * 0.99pl3 released. | 1975 | * 0.99pl3 released. |
1976 | 1976 | ||
1977 | * fdomain.c: Update to version 3.3 (a few new signatures). | 1977 | * fdomain.c: Update to version 3.3 (a few new signatures). |
1978 | 1978 | ||
1979 | * scsi.c: Add CDU-541, Denon DRD-25X to blacklist. | 1979 | * scsi.c: Add CDU-541, Denon DRD-25X to blacklist. |
1980 | (allocate_request, request_queueable): Init request.waiting to NULL if | 1980 | (allocate_request, request_queueable): Init request.waiting to NULL if |
1981 | non-buffer type of request. | 1981 | non-buffer type of request. |
1982 | 1982 | ||
1983 | * seagate.c: Allow controller to be overridden with CONTROLLER symbol. | 1983 | * seagate.c: Allow controller to be overridden with CONTROLLER symbol. |
1984 | Set SCint=NULL when we are done, to remove race condition. | 1984 | Set SCint=NULL when we are done, to remove race condition. |
1985 | 1985 | ||
1986 | * st.c: Changes from Kai. | 1986 | * st.c: Changes from Kai. |
1987 | 1987 | ||
1988 | Wed Dec 30 20:03:47 1992 | 1988 | Wed Dec 30 20:03:47 1992 |
1989 | 1989 | ||
1990 | * 0.99pl2 released. | 1990 | * 0.99pl2 released. |
1991 | 1991 | ||
1992 | * scsi.c: Blacklist back in. Remove Newbury drive as other bugfix | 1992 | * scsi.c: Blacklist back in. Remove Newbury drive as other bugfix |
1993 | eliminates need for it here. | 1993 | eliminates need for it here. |
1994 | 1994 | ||
1995 | * sd.c: Return ENODEV instead of EACCES if no such device available. | 1995 | * sd.c: Return ENODEV instead of EACCES if no such device available. |
1996 | (sd_init) Init blkdev_fops earlier so that sd_open is available sooner. | 1996 | (sd_init) Init blkdev_fops earlier so that sd_open is available sooner. |
1997 | 1997 | ||
1998 | * sr.c: Same as above for sd.c. | 1998 | * sr.c: Same as above for sd.c. |
1999 | 1999 | ||
2000 | * st.c: Return ENODEV instead of ENXIO if no device. Init chrdev_fops | 2000 | * st.c: Return ENODEV instead of ENXIO if no device. Init chrdev_fops |
2001 | sooner, so that it is always there even if no tapes. | 2001 | sooner, so that it is always there even if no tapes. |
2002 | 2002 | ||
2003 | * seagate.c (controller_type): New variable to keep track of ST0x or | 2003 | * seagate.c (controller_type): New variable to keep track of ST0x or |
2004 | FD. Modify signatures list to indicate controller type, and init | 2004 | FD. Modify signatures list to indicate controller type, and init |
2005 | controller_type once we find a match. | 2005 | controller_type once we find a match. |
2006 | 2006 | ||
2007 | * wd7000.c (wd7000_set_sync): Remove redundant function. | 2007 | * wd7000.c (wd7000_set_sync): Remove redundant function. |
2008 | 2008 | ||
2009 | Sun Dec 20 16:26:24 1992 | 2009 | Sun Dec 20 16:26:24 1992 |
2010 | 2010 | ||
2011 | * 0.99pl1 released. | 2011 | * 0.99pl1 released. |
2012 | 2012 | ||
2013 | * scsi_ioctl.c: Bugfix - check dev->index, not dev->id against | 2013 | * scsi_ioctl.c: Bugfix - check dev->index, not dev->id against |
2014 | NR_SCSI_DEVICES. | 2014 | NR_SCSI_DEVICES. |
2015 | 2015 | ||
2016 | * sr_ioctl.c: Verify that device exists before allowing an ioctl. | 2016 | * sr_ioctl.c: Verify that device exists before allowing an ioctl. |
2017 | 2017 | ||
2018 | * st.c: Patches from Kai - change timeout values, improve end of tape | 2018 | * st.c: Patches from Kai - change timeout values, improve end of tape |
2019 | handling. | 2019 | handling. |
2020 | 2020 | ||
2021 | Sun Dec 13 18:15:23 1992 | 2021 | Sun Dec 13 18:15:23 1992 |
2022 | 2022 | ||
2023 | * 0.99 kernel released. Baseline for this ChangeLog. | 2023 | * 0.99 kernel released. Baseline for this ChangeLog. |
2024 | 2024 |
Documentation/scsi/st.txt
1 | This file contains brief information about the SCSI tape driver. | 1 | This file contains brief information about the SCSI tape driver. |
2 | The driver is currently maintained by Kai Mäkisara (email | 2 | The driver is currently maintained by Kai Mäkisara (email |
3 | Kai.Makisara@kolumbus.fi) | 3 | Kai.Makisara@kolumbus.fi) |
4 | 4 | ||
5 | Last modified: Mon Mar 7 21:14:44 2005 by kai.makisara | 5 | Last modified: Mon Mar 7 21:14:44 2005 by kai.makisara |
6 | 6 | ||
7 | 7 | ||
8 | BASICS | 8 | BASICS |
9 | 9 | ||
10 | The driver is generic, i.e., it does not contain any code tailored | 10 | The driver is generic, i.e., it does not contain any code tailored |
11 | to any specific tape drive. The tape parameters can be specified with | 11 | to any specific tape drive. The tape parameters can be specified with |
12 | one of the following three methods: | 12 | one of the following three methods: |
13 | 13 | ||
14 | 1. Each user can specify the tape parameters he/she wants to use | 14 | 1. Each user can specify the tape parameters he/she wants to use |
15 | directly with ioctls. This is administratively a very simple and | 15 | directly with ioctls. This is administratively a very simple and |
16 | flexible method and applicable to single-user workstations. However, | 16 | flexible method and applicable to single-user workstations. However, |
17 | in a multiuser environment the next user finds the tape parameters in | 17 | in a multiuser environment the next user finds the tape parameters in |
18 | the state the previous user left them. | 18 | the state the previous user left them. |
19 | 19 | ||
20 | 2. The system manager (root) can define default values for some tape | 20 | 2. The system manager (root) can define default values for some tape |
21 | parameters, like block size and density using the MTSETDRVBUFFER ioctl. | 21 | parameters, like block size and density using the MTSETDRVBUFFER ioctl. |
22 | These parameters can be programmed to come into effect either when a | 22 | These parameters can be programmed to come into effect either when a |
23 | new tape is loaded into the drive or if writing begins at the | 23 | new tape is loaded into the drive or if writing begins at the |
24 | beginning of the tape. The second method is applicable if the tape | 24 | beginning of the tape. The second method is applicable if the tape |
25 | drive performs auto-detection of the tape format well (like some | 25 | drive performs auto-detection of the tape format well (like some |
26 | QIC-drives). The result is that any tape can be read, writing can be | 26 | QIC-drives). The result is that any tape can be read, writing can be |
27 | continued using existing format, and the default format is used if | 27 | continued using existing format, and the default format is used if |
28 | the tape is rewritten from the beginning (or a new tape is written | 28 | the tape is rewritten from the beginning (or a new tape is written |
29 | for the first time). The first method is applicable if the drive | 29 | for the first time). The first method is applicable if the drive |
30 | does not perform auto-detection well enough and there is a single | 30 | does not perform auto-detection well enough and there is a single |
31 | "sensible" mode for the device. An example is a DAT drive that is | 31 | "sensible" mode for the device. An example is a DAT drive that is |
32 | used only in variable block mode (I don't know if this is sensible | 32 | used only in variable block mode (I don't know if this is sensible |
33 | or not :-). | 33 | or not :-). |
34 | 34 | ||
35 | The user can override the parameters defined by the system | 35 | The user can override the parameters defined by the system |
36 | manager. The changes persist until the defaults again come into | 36 | manager. The changes persist until the defaults again come into |
37 | effect. | 37 | effect. |
38 | 38 | ||
39 | 3. By default, up to four modes can be defined and selected using the minor | 39 | 3. By default, up to four modes can be defined and selected using the minor |
40 | number (bits 5 and 6). The number of modes can be changed by changing | 40 | number (bits 5 and 6). The number of modes can be changed by changing |
41 | ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed | 41 | ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed |
42 | above. Additional modes are dormant until they are defined by the | 42 | above. Additional modes are dormant until they are defined by the |
43 | system manager (root). When specification of a new mode is started, | 43 | system manager (root). When specification of a new mode is started, |
44 | the configuration of mode 0 is used to provide a starting point for | 44 | the configuration of mode 0 is used to provide a starting point for |
45 | definition of the new mode. | 45 | definition of the new mode. |
46 | 46 | ||
47 | Using the modes allows the system manager to give the users choices | 47 | Using the modes allows the system manager to give the users choices |
48 | over some of the buffering parameters not directly accessible to the | 48 | over some of the buffering parameters not directly accessible to the |
49 | users (buffered and asynchronous writes). The modes also allow choices | 49 | users (buffered and asynchronous writes). The modes also allow choices |
50 | between formats in multi-tape operations (the explicitly overridden | 50 | between formats in multi-tape operations (the explicitly overridden |
51 | parameters are reset when a new tape is loaded). | 51 | parameters are reset when a new tape is loaded). |
52 | 52 | ||
53 | If more than one mode is used, all modes should contain definitions | 53 | If more than one mode is used, all modes should contain definitions |
54 | for the same set of parameters. | 54 | for the same set of parameters. |
55 | 55 | ||
56 | Many Unices contain internal tables that associate different modes to | 56 | Many Unices contain internal tables that associate different modes to |
57 | supported devices. The Linux SCSI tape driver does not contain such | 57 | supported devices. The Linux SCSI tape driver does not contain such |
58 | tables (and will not do that in future). Instead of that, a utility | 58 | tables (and will not do that in future). Instead of that, a utility |
59 | program can be made that fetches the inquiry data sent by the device, | 59 | program can be made that fetches the inquiry data sent by the device, |
60 | scans its database, and sets up the modes using the ioctls. Another | 60 | scans its database, and sets up the modes using the ioctls. Another |
61 | alternative is to make a small script that uses mt to set the defaults | 61 | alternative is to make a small script that uses mt to set the defaults |
62 | tailored to the system. | 62 | tailored to the system. |
63 | 63 | ||
64 | The driver supports fixed and variable block size (within buffer | 64 | The driver supports fixed and variable block size (within buffer |
65 | limits). Both the auto-rewind (minor equals device number) and | 65 | limits). Both the auto-rewind (minor equals device number) and |
66 | non-rewind devices (minor is 128 + device number) are implemented. | 66 | non-rewind devices (minor is 128 + device number) are implemented. |
67 | 67 | ||
68 | In variable block mode, the byte count in write() determines the size | 68 | In variable block mode, the byte count in write() determines the size |
69 | of the physical block on tape. When reading, the drive reads the next | 69 | of the physical block on tape. When reading, the drive reads the next |
70 | tape block and returns to the user the data if the read() byte count | 70 | tape block and returns to the user the data if the read() byte count |
71 | is at least the block size. Otherwise, error ENOMEM is returned. | 71 | is at least the block size. Otherwise, error ENOMEM is returned. |
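The variable block mode read rule can be modeled in a few lines of Python. This is only a sketch of the behaviour described above, not driver code; the function name variable_mode_read is hypothetical.

```python
import errno


def variable_mode_read(block: bytes, count: int) -> bytes:
    """Model one read() in variable block mode (illustrative only).

    `block` stands for the next physical block on tape and `count` for
    the byte count passed to read().
    """
    if count < len(block):
        # The request is smaller than the tape block: per the text,
        # the read fails with ENOMEM.
        raise OSError(errno.ENOMEM, "read() byte count smaller than tape block")
    # Otherwise the whole block is returned, however large `count` is.
    return block
```

A caller that does not know the block size in advance would therefore pass a read() byte count at least as large as the biggest block it expects.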
72 | 72 | ||
73 | In fixed block mode, the data transfer between the drive and the | 73 | In fixed block mode, the data transfer between the drive and the |
74 | driver is in multiples of the block size. The write() byte count must | 74 | driver is in multiples of the block size. The write() byte count must |
75 | be a multiple of the block size. This is not required when reading but | 75 | be a multiple of the block size. This is not required when reading but |
76 | may be advisable for portability. | 76 | may be advisable for portability. |
77 | 77 | ||
78 | Support is provided for changing the tape partition and partitioning | 78 | Support is provided for changing the tape partition and partitioning |
79 | of the tape with one or two partitions. By default support for | 79 | of the tape with one or two partitions. By default support for |
80 | partitioned tape is disabled for each driver and it can be enabled | 80 | partitioned tape is disabled for each driver and it can be enabled |
81 | with the ioctl MTSETDRVBUFFER. | 81 | with the ioctl MTSETDRVBUFFER. |
82 | 82 | ||
83 | By default the driver writes one filemark when the device is closed after | 83 | By default the driver writes one filemark when the device is closed after |
84 | writing and the last operation has been a write. Two filemarks can be | 84 | writing and the last operation has been a write. Two filemarks can be |
85 | optionally written. In both cases end of data is signified by | 85 | optionally written. In both cases end of data is signified by |
86 | returning zero bytes for two consecutive reads. | 86 | returning zero bytes for two consecutive reads. |
87 | 87 | ||
88 | If rewind, offline, bsf, or seek is done and the previous tape operation was | 88 | If rewind, offline, bsf, or seek is done and the previous tape operation was |
89 | a write, a filemark is written before moving the tape. | 89 | a write, a filemark is written before moving the tape. |
90 | 90 | ||
91 | The compile options are defined in the file linux/drivers/scsi/st_options.h. | 91 | The compile options are defined in the file linux/drivers/scsi/st_options.h. |
92 | 92 | ||
93 | 4. If the open option O_NONBLOCK is used, open succeeds even if the | 93 | 4. If the open option O_NONBLOCK is used, open succeeds even if the |
94 | drive is not ready. If O_NONBLOCK is not used, the driver waits for | 94 | drive is not ready. If O_NONBLOCK is not used, the driver waits for |
95 | the drive to become ready. If this does not happen in ST_BLOCK_SECONDS | 95 | the drive to become ready. If this does not happen in ST_BLOCK_SECONDS |
96 | seconds, open fails with the errno value EIO. With O_NONBLOCK the | 96 | seconds, open fails with the errno value EIO. With O_NONBLOCK the |
97 | device can be opened for writing even if there is a write protected | 97 | device can be opened for writing even if there is a write protected |
98 | tape in the drive (commands trying to write something return error if | 98 | tape in the drive (commands trying to write something return error if |
99 | attempted). | 99 | attempted). |
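A minimal sketch of opening the tape device with and without O_NONBLOCK from user space follows. The device path /dev/nst0 and the helper names are assumptions made for illustration; only os.open and the os.O_* flags are standard Python.

```python
import os


def tape_open_flags(writing: bool, nonblocking: bool) -> int:
    """Build open(2) flags for a tape device node."""
    flags = os.O_WRONLY if writing else os.O_RDONLY
    if nonblocking:
        # With O_NONBLOCK, open succeeds even if the drive is not ready.
        flags |= os.O_NONBLOCK
    return flags


def open_tape(path: str = "/dev/nst0", writing: bool = False,
              nonblocking: bool = True) -> int:
    """Open the tape device and return a file descriptor.

    Without O_NONBLOCK the call waits for the drive to become ready and,
    per the text above, fails with EIO if that takes too long.
    """
    return os.open(path, tape_open_flags(writing, nonblocking))
```

A program that only wants to query drive status would typically use the non-blocking form, since it does not need a ready tape.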
100 | 100 | ||
101 | 101 | ||
102 | MINOR NUMBERS | 102 | MINOR NUMBERS |
103 | 103 | ||
104 | The tape driver currently supports 128 drives by default. This number | 104 | The tape driver currently supports 128 drives by default. This number |
105 | can be increased by editing st.h and recompiling the driver if | 105 | can be increased by editing st.h and recompiling the driver if |
106 | necessary. The upper limit is 2^17 drives if 4 modes for each drive | 106 | necessary. The upper limit is 2^17 drives if 4 modes for each drive |
107 | are used. | 107 | are used. |
108 | 108 | ||
109 | The minor numbers consist of the following bit fields: | 109 | The minor numbers consist of the following bit fields: |
110 | 110 | ||
111 | dev_upper   non-rew   mode   dev-lower | 111 | dev_upper   non-rew   mode   dev-lower |
112 | 20 - 8         7      6  5    4     0 | 112 | 20 - 8         7      6  5    4     0 |
113 | The non-rewind bit is always bit 7 (the uppermost bit in the lowermost | 113 | The non-rewind bit is always bit 7 (the uppermost bit in the lowermost |
114 | byte). The bits defining the mode are below the non-rewind bit. The | 114 | byte). The bits defining the mode are below the non-rewind bit. The |
115 | remaining bits define the tape device number. This numbering is | 115 | remaining bits define the tape device number. This numbering is |
116 | backward compatible with the numbering used when the minor number was | 116 | backward compatible with the numbering used when the minor number was |
117 | only 8 bits wide. | 117 | only 8 bits wide. |
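The bit layout above can be decoded as in the following sketch. It assumes the default of two mode bits (bits 5 and 6); the constant and function names are hypothetical and are not taken from st.h.

```python
MODE_BITS = 2                  # assumed: the default of four modes
MODE_SHIFT = 7 - MODE_BITS     # the mode field sits just below bit 7
NON_REWIND_BIT = 1 << 7


def decode_minor(minor: int) -> dict:
    """Split a tape minor number into its fields (illustrative only)."""
    dev_lower = minor & ((1 << MODE_SHIFT) - 1)             # bits 0..4
    mode = (minor >> MODE_SHIFT) & ((1 << MODE_BITS) - 1)   # bits 5..6
    non_rewind = bool(minor & NON_REWIND_BIT)               # bit 7
    dev_upper = minor >> 8                                  # bits 8 and up
    # The device number is dev_upper placed directly above dev_lower.
    device = (dev_upper << MODE_SHIFT) | dev_lower
    return {"device": device, "mode": mode, "non_rewind": non_rewind}
```

For example, decode_minor(129) describes the non-rewind device of drive 1 in mode 0, which matches the older "128 + device number" convention for 8-bit minor numbers.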
118 | 118 | ||
119 | 119 | ||
120 | SYSFS SUPPORT | 120 | SYSFS SUPPORT |
121 | 121 | ||
122 | The driver creates the directory /sys/class/scsi_tape and populates it with | 122 | The driver creates the directory /sys/class/scsi_tape and populates it with |
123 | directories corresponding to the existing tape devices. There are autorewind | 123 | directories corresponding to the existing tape devices. There are autorewind |
124 | and non-rewind entries for each mode. The names are stxy and nstxy, where x | 124 | and non-rewind entries for each mode. The names are stxy and nstxy, where x |
125 | is the tape number and y a character corresponding to the mode (none, l, m, | 125 | is the tape number and y a character corresponding to the mode (none, l, m, |
126 | a). For example, the directories for the first tape device are (assuming four | 126 | a). For example, the directories for the first tape device are (assuming four |
127 | modes): st0 nst0 st0l nst0l st0m nst0m st0a nst0a. | 127 | modes): st0 nst0 st0l nst0l st0m nst0m st0a nst0a. |
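The naming scheme can be reproduced with a short sketch. The helper name sysfs_tape_names is hypothetical; the mode suffixes are the ones listed above.

```python
MODE_SUFFIXES = ["", "l", "m", "a"]   # suffix per mode: none, l, m, a


def sysfs_tape_names(tape_number: int, modes: int = 4) -> list:
    """List the class directory names for one tape device."""
    names = []
    for suffix in MODE_SUFFIXES[:modes]:
        names.append(f"st{tape_number}{suffix}")    # auto-rewind entry
        names.append(f"nst{tape_number}{suffix}")   # non-rewind entry
    return names
```

Calling sysfs_tape_names(0) yields the eight names given in the example above, in the same order.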
128 | 128 | ||
129 | Each directory contains the entries: default_blksize default_compression | 129 | Each directory contains the entries: default_blksize default_compression |
130 | default_density defined dev device driver. The file 'defined' contains 1 | 130 | default_density defined dev device driver. The file 'defined' contains 1 |
131 | if the mode is defined and zero if not defined. The files 'default_*' contain | 131 | if the mode is defined and zero if not defined. The files 'default_*' contain |
132 | the defaults set by the user. The value -1 means the default is not set. The | 132 | the defaults set by the user. The value -1 means the default is not set. The |
133 | file 'dev' contains the device numbers corresponding to this device. The links | 133 | file 'dev' contains the device numbers corresponding to this device. The links |
134 | 'device' and 'driver' point to the SCSI device and driver entries. | 134 | 'device' and 'driver' point to the SCSI device and driver entries. |
135 | 135 | ||
136 | A link named 'tape' is made from the SCSI device directory to the class | 136 | A link named 'tape' is made from the SCSI device directory to the class |
137 | directory corresponding to the mode 0 auto-rewind device (e.g., st0). | 137 | directory corresponding to the mode 0 auto-rewind device (e.g., st0). |
138 | 138 | ||
139 | 139 | ||
140 | BSD AND SYS V SEMANTICS | 140 | BSD AND SYS V SEMANTICS |
141 | 141 | ||
142 | The user can choose between these two behaviours of the tape driver by | 142 | The user can choose between these two behaviours of the tape driver by |
143 | defining the value of the symbol ST_SYSV. The semantics differ when a | 143 | defining the value of the symbol ST_SYSV. The semantics differ when a |
144 | file being read is closed. The BSD semantics leaves the tape where it | 144 | file being read is closed. The BSD semantics leaves the tape where it |
145 | currently is whereas the SYS V semantics moves the tape past the next | 145 | currently is whereas the SYS V semantics moves the tape past the next |
146 | filemark unless the filemark has just been crossed. | 146 | filemark unless the filemark has just been crossed. |
147 | 147 | ||
148 | The default is BSD semantics. | 148 | The default is BSD semantics. |
149 | 149 | ||
150 | 150 | ||
151 | BUFFERING | 151 | BUFFERING |
152 | 152 | ||
153 | The driver tries to do transfers directly to/from user space. If this | 153 | The driver tries to do transfers directly to/from user space. If this |
154 | is not possible, a driver buffer allocated at run-time is used. If | 154 | is not possible, a driver buffer allocated at run-time is used. If |
155 | direct i/o is not possible for the whole transfer, the driver buffer | 155 | direct i/o is not possible for the whole transfer, the driver buffer |
156 | is used (i.e., bounce buffers for individual pages are not | 156 | is used (i.e., bounce buffers for individual pages are not |
157 | used). Direct i/o can be impossible because of several reasons, e.g.: | 157 | used). Direct i/o can be impossible because of several reasons, e.g.: |
158 | - one or more pages are at addresses not reachable by the HBA | 158 | - one or more pages are at addresses not reachable by the HBA |
159 | - the number of pages in the transfer exceeds the number of | 159 | - the number of pages in the transfer exceeds the number of |
160 | scatter/gather segments permitted by the HBA | 160 | scatter/gather segments permitted by the HBA |
161 | - one or more pages can't be locked into memory (should not happen in | 161 | - one or more pages can't be locked into memory (should not happen in |
162 | any reasonable situation) | 162 | any reasonable situation) |
163 | 163 | ||
164 | The size of the driver buffers is always at least one tape block. In fixed | 164 | The size of the driver buffers is always at least one tape block. In fixed |
165 | block mode, the minimum buffer size is defined (in 1024 byte units) by | 165 | block mode, the minimum buffer size is defined (in 1024 byte units) by |
166 | ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of | 166 | ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of |
167 | several blocks and using one SCSI read or write to transfer all of the | 167 | several blocks and using one SCSI read or write to transfer all of the |
168 | blocks. Buffering of data across write calls in fixed block mode is | 168 | blocks. Buffering of data across write calls in fixed block mode is |
169 | allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used. | 169 | allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used. |
170 | Buffer allocation uses chunks of memory having sizes 2^n * (page | 170 | Buffer allocation uses chunks of memory having sizes 2^n * (page |
171 | size). Because of this the actual buffer size may be larger than the | 171 | size). Because of this the actual buffer size may be larger than the |
172 | minimum allowable buffer size. | 172 | minimum allowable buffer size. |
173 | 173 | ||
174 | NOTE that if direct i/o is used, the small writes are not buffered. This may | 174 | NOTE that if direct i/o is used, the small writes are not buffered. This may |
175 | cause a surprise when moving from 2.4. There small writes (e.g., tar without | 175 | cause a surprise when moving from 2.4. There small writes (e.g., tar without |
176 | -b option) may have had good throughput but this is not true any more with | 176 | -b option) may have had good throughput but this is not true any more with |
177 | 2.6. Direct i/o can be turned off to solve this problem but a better solution | 177 | 2.6. Direct i/o can be turned off to solve this problem but a better solution |
178 | is to use bigger write() byte counts (e.g., tar -b 64). | 178 | is to use bigger write() byte counts (e.g., tar -b 64). |
179 | 179 | ||
180 | Asynchronous writing. Writing the buffer contents to the tape is | 180 | Asynchronous writing. Writing the buffer contents to the tape is |
181 | started and the write call returns immediately. The status is checked | 181 | started and the write call returns immediately. The status is checked |
182 | at the next tape operation. Asynchronous writes are not done with | 182 | at the next tape operation. Asynchronous writes are not done with |
183 | direct i/o and not in fixed block mode. | 183 | direct i/o and not in fixed block mode. |
184 | 184 | ||
185 | Buffered writes and asynchronous writes may in some rare cases cause | 185 | Buffered writes and asynchronous writes may in some rare cases cause |
186 | problems in multivolume operations if there is not enough space on the | 186 | problems in multivolume operations if there is not enough space on the |
187 | tape after the early-warning mark to flush the driver buffer. | 187 | tape after the early-warning mark to flush the driver buffer. |
188 | 188 | ||
189 | Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is | 189 | Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is |
190 | attempted even if the user does not want to get all of the data at | 190 | attempted even if the user does not want to get all of the data at |
191 | this read command. Should be disabled for those drives that don't like | 191 | this read command. Should be disabled for those drives that don't like |
192 | a filemark to truncate a read request or that don't like backspacing. | 192 | a filemark to truncate a read request or that don't like backspacing. |
193 | 193 | ||
194 | Scatter/gather buffers (buffers that consist of chunks non-contiguous | 194 | Scatter/gather buffers (buffers that consist of chunks non-contiguous |
195 | in the physical memory) are used if contiguous buffers can't be | 195 | in the physical memory) are used if contiguous buffers can't be |
196 | allocated. To support all SCSI adapters (including those not | 196 | allocated. To support all SCSI adapters (including those not |
197 | supporting scatter/gather), buffer allocation is using the following | 197 | supporting scatter/gather), buffer allocation is using the following |
198 | three kinds of chunks: | 198 | three kinds of chunks: |
199 | 1. The initial segment that is used for all SCSI adapters including | 199 | 1. The initial segment that is used for all SCSI adapters including |
200 | those not supporting scatter/gather. The size of this buffer will be | 200 | those not supporting scatter/gather. The size of this buffer will be |
201 | (PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of | 201 | (PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of |
202 | this size (and it is not larger than the buffer size specified by | 202 | this size (and it is not larger than the buffer size specified by |
203 | ST_BUFFER_BLOCKS). If this size is not available, the driver halves | 203 | ST_BUFFER_BLOCKS). If this size is not available, the driver halves |
204 | the size and tries again until the size of one page. The default | 204 | the size and tries again until the size of one page. The default |
205 | settings in st_options.h make the driver try to allocate all of the | 205 | settings in st_options.h make the driver try to allocate all of the |
206 | buffer as one chunk. | 206 | buffer as one chunk. |
207 | 2. The scatter/gather segments to fill the specified buffer size are | 207 | 2. The scatter/gather segments to fill the specified buffer size are |
208 | allocated so that as many segments as possible are used but the number | 208 | allocated so that as many segments as possible are used but the number |
209 | of segments does not exceed ST_FIRST_SG. | 209 | of segments does not exceed ST_FIRST_SG. |
210 | 3. The remaining segments between ST_MAX_SG (or the module parameter | 210 | 3. The remaining segments between ST_MAX_SG (or the module parameter |
211 | max_sg_segs) and the number of segments used in phases 1 and 2 | 211 | max_sg_segs) and the number of segments used in phases 1 and 2 |
212 | are used to extend the buffer at run-time if this is necessary. The | 212 | are used to extend the buffer at run-time if this is necessary. The |
213 | number of scatter/gather segments allowed for the SCSI adapter is not | 213 | number of scatter/gather segments allowed for the SCSI adapter is not |
214 | exceeded if it is smaller than the maximum number of scatter/gather | 214 | exceeded if it is smaller than the maximum number of scatter/gather |
215 | segments specified. If the maximum number allowed for the SCSI adapter | 215 | segments specified. If the maximum number allowed for the SCSI adapter |
216 | is smaller than the number of segments used in phases 1 and 2, | 216 | is smaller than the number of segments used in phases 1 and 2, |
217 | extending the buffer will always fail. | 217 | extending the buffer will always fail. |
218 | 218 | ||
219 | 219 | ||
220 | EOM BEHAVIOUR WHEN WRITING | 220 | EOM BEHAVIOUR WHEN WRITING |
221 | 221 | ||
222 | When the end of medium early warning is encountered, the current write | 222 | When the end of medium early warning is encountered, the current write |
223 | is finished and the number of bytes is returned. The next write | 223 | is finished and the number of bytes is returned. The next write |
224 | returns -1 and errno is set to ENOSPC. To enable writing a trailer, | 224 | returns -1 and errno is set to ENOSPC. To enable writing a trailer, |
225 | the next write is allowed to proceed and, if successful, the number of | 225 | the next write is allowed to proceed and, if successful, the number of |
226 | bytes is returned. After this, -1 and the number of bytes are | 226 | bytes is returned. After this, -1 and the number of bytes are |
227 | alternately returned until the physical end of medium (or some other | 227 | alternately returned until the physical end of medium (or some other |
228 | error) is encountered. | 228 | error) is encountered. |
229 | 229 | ||
230 | 230 | ||
231 | MODULE PARAMETERS | 231 | MODULE PARAMETERS |
232 | 232 | ||
233 | The buffer size, write threshold, and the maximum number of allocated buffers | 233 | The buffer size, write threshold, and the maximum number of allocated buffers |
234 | are configurable when the driver is loaded as a module. The keywords are: | 234 | are configurable when the driver is loaded as a module. The keywords are: |
235 | 235 | ||
236 | buffer_kbs=xxx the buffer size for fixed block mode is set | 236 | buffer_kbs=xxx the buffer size for fixed block mode is set |
237 | to xxx kilobytes | 237 | to xxx kilobytes |
238 | write_threshold_kbs=xxx the write threshold in kilobytes set to xxx | 238 | write_threshold_kbs=xxx the write threshold in kilobytes set to xxx |
239 | max_sg_segs=xxx the maximum number of scatter/gather | 239 | max_sg_segs=xxx the maximum number of scatter/gather |
240 | segments | 240 | segments |
241 | try_direct_io=x try direct transfer between user buffer and | 241 | try_direct_io=x try direct transfer between user buffer and |
242 | tape drive if this is non-zero | 242 | tape drive if this is non-zero |
243 | 243 | ||
244 | Note that if the buffer size is changed but the write threshold is not | 244 | Note that if the buffer size is changed but the write threshold is not |
245 | set, the write threshold is set to the new buffer size - 2 kB. | 245 | set, the write threshold is set to the new buffer size - 2 kB. |
246 | 246 | ||
247 | 247 | ||
248 | BOOT TIME CONFIGURATION | 248 | BOOT TIME CONFIGURATION |
249 | 249 | ||
250 | If the driver is compiled into the kernel, the same parameters can be | 250 | If the driver is compiled into the kernel, the same parameters can be |
251 | also set using, e.g., the LILO command line. The preferred syntax is | 251 | also set using, e.g., the LILO command line. The preferred syntax is |
252 | is to use the same keyword used when loading as module but prepended | 252 | to use the same keyword used when loading as module but prepended |
253 | with 'st.'. For instance, to set the maximum number of scatter/gather | 253 | with 'st.'. For instance, to set the maximum number of scatter/gather |
254 | segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the | 254 | segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the |
255 | number of scatter/gather segments). | 255 | number of scatter/gather segments). |
256 | 256 | ||
257 | For compatibility, the old syntax from early 2.5 and 2.4 kernel | 257 | For compatibility, the old syntax from early 2.5 and 2.4 kernel |
258 | versions is supported. The same keywords can be used as when loading | 258 | versions is supported. The same keywords can be used as when loading |
259 | the driver as module. If several parameters are set, the keyword-value | 259 | the driver as module. If several parameters are set, the keyword-value |
260 | pairs are separated with a comma (no spaces allowed). A colon can be | 260 | pairs are separated with a comma (no spaces allowed). A colon can be |
261 | used instead of the equal mark. The definition is prepended by the | 261 | used instead of the equal mark. The definition is prepended by the |
262 | string st=. Here is an example: | 262 | string st=. Here is an example: |
263 | 263 | ||
264 | st=buffer_kbs:64,write_threshold_kbs:60 | 264 | st=buffer_kbs:64,write_threshold_kbs:60 |
265 | 265 | ||
266 | The following syntax used by the old kernel versions is also supported: | 266 | The following syntax used by the old kernel versions is also supported: |
267 | 267 | ||
268 | st=aa[,bb[,dd]] | 268 | st=aa[,bb[,dd]] |
269 | 269 | ||
270 | where | 270 | where |
271 | aa is the buffer size for fixed block mode in 1024 byte units | 271 | aa is the buffer size for fixed block mode in 1024 byte units |
272 | bb is the write threshold in 1024 byte units | 272 | bb is the write threshold in 1024 byte units |
273 | dd is the maximum number of scatter/gather segments | 273 | dd is the maximum number of scatter/gather segments |
274 | 274 | ||
275 | 275 | ||
276 | IOCTLS | 276 | IOCTLS |
277 | 277 | ||
278 | The tape is positioned and the drive parameters are set with ioctls | 278 | The tape is positioned and the drive parameters are set with ioctls |
279 | defined in mtio.h. The tape control program 'mt' uses these ioctls. Try | 279 | defined in mtio.h. The tape control program 'mt' uses these ioctls. Try |
280 | to find an mt that supports all of the Linux SCSI tape ioctls and | 280 | to find an mt that supports all of the Linux SCSI tape ioctls and |
281 | opens the device for writing if the tape contents will be modified | 281 | opens the device for writing if the tape contents will be modified |
282 | (look for a package mt-st* from the Linux ftp sites; the GNU mt does | 282 | (look for a package mt-st* from the Linux ftp sites; the GNU mt does |
283 | not open for writing for, e.g., erase). | 283 | not open for writing for, e.g., erase). |
284 | 284 | ||
285 | The supported ioctls are: | 285 | The supported ioctls are: |
286 | 286 | ||
287 | The following use the structure mtop: | 287 | The following use the structure mtop: |
288 | 288 | ||
289 | MTFSF Space forward over count filemarks. Tape positioned after filemark. | 289 | MTFSF Space forward over count filemarks. Tape positioned after filemark. |
290 | MTFSFM As above but tape positioned before filemark. | 290 | MTFSFM As above but tape positioned before filemark. |
291 | MTBSF Space backward over count filemarks. Tape positioned before | 291 | MTBSF Space backward over count filemarks. Tape positioned before |
292 | filemark. | 292 | filemark. |
293 | MTBSFM As above but tape positioned after filemark. | 293 | MTBSFM As above but tape positioned after filemark. |
294 | MTFSR Space forward over count records. | 294 | MTFSR Space forward over count records. |
295 | MTBSR Space backward over count records. | 295 | MTBSR Space backward over count records. |
296 | MTFSS Space forward over count setmarks. | 296 | MTFSS Space forward over count setmarks. |
297 | MTBSS Space backward over count setmarks. | 297 | MTBSS Space backward over count setmarks. |
298 | MTWEOF Write count filemarks. | 298 | MTWEOF Write count filemarks. |
299 | MTWSM Write count setmarks. | 299 | MTWSM Write count setmarks. |
300 | MTREW Rewind tape. | 300 | MTREW Rewind tape. |
301 | MTOFFL Set device off line (often rewind plus eject). | 301 | MTOFFL Set device off line (often rewind plus eject). |
302 | MTNOP Do nothing except flush the buffers. | 302 | MTNOP Do nothing except flush the buffers. |
303 | MTRETEN Re-tension tape. | 303 | MTRETEN Re-tension tape. |
304 | MTEOM Space to end of recorded data. | 304 | MTEOM Space to end of recorded data. |
305 | MTERASE Erase tape. If the argument is zero, the short erase command | 305 | MTERASE Erase tape. If the argument is zero, the short erase command |
306 | is used. The long erase command is used with all other values | 306 | is used. The long erase command is used with all other values |
307 | of the argument. | 307 | of the argument. |
308 | MTSEEK Seek to tape block count. Uses Tandberg-compatible seek (QFA) | 308 | MTSEEK Seek to tape block count. Uses Tandberg-compatible seek (QFA) |
309 | for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and | 309 | for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and |
310 | block numbers in the status are not valid after a seek. | 310 | block numbers in the status are not valid after a seek. |
311 | MTSETBLK Set the drive block size. Setting to zero sets the drive into | 311 | MTSETBLK Set the drive block size. Setting to zero sets the drive into |
312 | variable block mode (if applicable). | 312 | variable block mode (if applicable). |
313 | MTSETDENSITY Sets the drive density code to arg. See drive | 313 | MTSETDENSITY Sets the drive density code to arg. See drive |
314 | documentation for available codes. | 314 | documentation for available codes. |
315 | MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door. | 315 | MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door. |
316 | MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the | 316 | MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the |
317 | command argument x is between MT_ST_HPLOADER_OFFSET + 1 and | 317 | command argument x is between MT_ST_HPLOADER_OFFSET + 1 and |
318 | MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the | 318 | MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the |
319 | drive with the command and it selects the tape slot to use of | 319 | drive with the command and it selects the tape slot to use of |
320 | HP C1553A changer. | 320 | HP C1553A changer. |
321 | MTCOMPRESSION Sets compressing or uncompressing drive mode using the | 321 | MTCOMPRESSION Sets compressing or uncompressing drive mode using the |
322 | SCSI mode page 15. Note that some drives use other methods for | 322 | SCSI mode page 15. Note that some drives use other methods for |
323 | control of compression. Some drives (like the Exabytes) use | 323 | control of compression. Some drives (like the Exabytes) use |
324 | density codes for compression control. Some drives use another | 324 | density codes for compression control. Some drives use another |
325 | mode page but this page has not been implemented in the | 325 | mode page but this page has not been implemented in the |
326 | driver. Some drives without compression capability will accept | 326 | driver. Some drives without compression capability will accept |
327 | any compression mode without error. | 327 | any compression mode without error. |
328 | MTSETPART Moves the tape to the partition given by the argument at the | 328 | MTSETPART Moves the tape to the partition given by the argument at the |
329 | next tape operation. The block at which the tape is positioned | 329 | next tape operation. The block at which the tape is positioned |
330 | is the block where the tape was previously positioned in the | 330 | is the block where the tape was previously positioned in the |
331 | new active partition unless the next tape operation is | 331 | new active partition unless the next tape operation is |
332 | MTSEEK. In this case the tape is moved directly to the block | 332 | MTSEEK. In this case the tape is moved directly to the block |
333 | specified by MTSEEK. MTSETPART is inactive unless | 333 | specified by MTSEEK. MTSETPART is inactive unless |
334 | MT_ST_CAN_PARTITIONS set. | 334 | MT_ST_CAN_PARTITIONS set. |
335 | MTMKPART Formats the tape with one partition (argument zero) or two | 335 | MTMKPART Formats the tape with one partition (argument zero) or two |
336 | partitions (the argument gives in megabytes the size of | 336 | partitions (the argument gives in megabytes the size of |
337 | partition 1 that is physically the first partition of the | 337 | partition 1 that is physically the first partition of the |
338 | tape). The drive has to support partitions with size specified | 338 | tape). The drive has to support partitions with size specified |
339 | by the initiator. Inactive unless MT_ST_CAN_PARTITIONS set. | 339 | by the initiator. Inactive unless MT_ST_CAN_PARTITIONS set. |
340 | MTSETDRVBUFFER | 340 | MTSETDRVBUFFER |
341 | Is used for several purposes. The command is obtained from count | 341 | Is used for several purposes. The command is obtained from count |
342 | with mask MT_SET_OPTIONS, the low order bits are used as argument. | 342 | with mask MT_SET_OPTIONS, the low order bits are used as argument. |
343 | This command is only allowed for the superuser (root). The | 343 | This command is only allowed for the superuser (root). The |
344 | subcommands are: | 344 | subcommands are: |
345 | 0 | 345 | 0 |
346 | The drive buffer option is set to the argument. Zero means | 346 | The drive buffer option is set to the argument. Zero means |
347 | no buffering. | 347 | no buffering. |
348 | MT_ST_BOOLEANS | 348 | MT_ST_BOOLEANS |
349 | Sets the buffering options. The bits are the new states | 349 | Sets the buffering options. The bits are the new states |
350 | (enabled/disabled) of the following options (in the | 350 | (enabled/disabled) of the following options (in the |
351 | parenthesis is specified whether the option is global or | 351 | parenthesis is specified whether the option is global or |
352 | can be specified differently for each mode): | 352 | can be specified differently for each mode): |
353 | MT_ST_BUFFER_WRITES write buffering (mode) | 353 | MT_ST_BUFFER_WRITES write buffering (mode) |
354 | MT_ST_ASYNC_WRITES asynchronous writes (mode) | 354 | MT_ST_ASYNC_WRITES asynchronous writes (mode) |
355 | MT_ST_READ_AHEAD read ahead (mode) | 355 | MT_ST_READ_AHEAD read ahead (mode) |
356 | MT_ST_TWO_FM writing of two filemarks (global) | 356 | MT_ST_TWO_FM writing of two filemarks (global) |
357 | MT_ST_FAST_EOM using the SCSI spacing to EOD (global) | 357 | MT_ST_FAST_EOM using the SCSI spacing to EOD (global) |
358 | MT_ST_AUTO_LOCK automatic locking of the drive door (global) | 358 | MT_ST_AUTO_LOCK automatic locking of the drive door (global) |
359 | MT_ST_DEF_WRITES the defaults are meant only for writes (mode) | 359 | MT_ST_DEF_WRITES the defaults are meant only for writes (mode) |
360 | MT_ST_CAN_BSR backspacing over more than one record can | 360 | MT_ST_CAN_BSR backspacing over more than one record can |
361 | be used for repositioning the tape (global) | 361 | be used for repositioning the tape (global) |
362 | MT_ST_NO_BLKLIMS the driver does not ask the block limits | 362 | MT_ST_NO_BLKLIMS the driver does not ask the block limits |
363 | from the drive (block size can be changed only to | 363 | from the drive (block size can be changed only to |
364 | variable) (global) | 364 | variable) (global) |
365 | MT_ST_CAN_PARTITIONS enables support for partitioned | 365 | MT_ST_CAN_PARTITIONS enables support for partitioned |
366 | tapes (global) | 366 | tapes (global) |
367 | MT_ST_SCSI2LOGICAL the logical block number is used in | 367 | MT_ST_SCSI2LOGICAL the logical block number is used in |
368 | the MTSEEK and MTIOCPOS for SCSI-2 drives instead of | 368 | the MTSEEK and MTIOCPOS for SCSI-2 drives instead of |
369 | the device dependent address. It is recommended to set | 369 | the device dependent address. It is recommended to set |
370 | this flag unless there are tapes using the device | 370 | this flag unless there are tapes using the device |
371 | dependent address (from the old times) (global) | 371 | dependent address (from the old times) (global) |
372 | MT_ST_SYSV sets the SYSV semantics (mode) | 372 | MT_ST_SYSV sets the SYSV semantics (mode) |
373 | MT_ST_NOWAIT enables immediate mode (i.e., don't wait for | 373 | MT_ST_NOWAIT enables immediate mode (i.e., don't wait for |
374 | the command to finish) for some commands (e.g., rewind) | 374 | the command to finish) for some commands (e.g., rewind) |
375 | MT_ST_DEBUGGING debugging (global; debugging must be | 375 | MT_ST_DEBUGGING debugging (global; debugging must be |
376 | compiled into the driver) | 376 | compiled into the driver) |
377 | MT_ST_SETBOOLEANS | 377 | MT_ST_SETBOOLEANS |
378 | MT_ST_CLEARBOOLEANS | 378 | MT_ST_CLEARBOOLEANS |
379 | Sets or clears the option bits. | 379 | Sets or clears the option bits. |
380 | MT_ST_WRITE_THRESHOLD | 380 | MT_ST_WRITE_THRESHOLD |
381 | Sets the write threshold for this device to kilobytes | 381 | Sets the write threshold for this device to kilobytes |
382 | specified by the lowest bits. | 382 | specified by the lowest bits. |
383 | MT_ST_DEF_BLKSIZE | 383 | MT_ST_DEF_BLKSIZE |
384 | Defines the default block size set automatically. Value | 384 | Defines the default block size set automatically. Value |
385 | 0xffffff means that the default is not used any more. | 385 | 0xffffff means that the default is not used any more. |
386 | MT_ST_DEF_DENSITY | 386 | MT_ST_DEF_DENSITY |
387 | MT_ST_DEF_DRVBUFFER | 387 | MT_ST_DEF_DRVBUFFER |
388 | Used to set or clear the density (8 bits), and drive buffer | 388 | Used to set or clear the density (8 bits), and drive buffer |
389 | state (3 bits). If the value is MT_ST_CLEAR_DEFAULT | 389 | state (3 bits). If the value is MT_ST_CLEAR_DEFAULT |
390 | (0xfffff) the default will not be used any more. Otherwise | 390 | (0xfffff) the default will not be used any more. Otherwise |
391 | the lowermost bits of the value contain the new value of | 391 | the lowermost bits of the value contain the new value of |
392 | the parameter. | 392 | the parameter. |
393 | MT_ST_DEF_COMPRESSION | 393 | MT_ST_DEF_COMPRESSION |
394 | The compression default will not be used if the value of | 394 | The compression default will not be used if the value of |
395 | the lowermost byte is 0xff. Otherwise the lowermost bit | 395 | the lowermost byte is 0xff. Otherwise the lowermost bit |
396 | contains the new default. If the bits 8-15 are set to a | 396 | contains the new default. If the bits 8-15 are set to a |
397 | non-zero number, and this number is not 0xff, the number is | 397 | non-zero number, and this number is not 0xff, the number is |
398 | used as the compression algorithm. The value | 398 | used as the compression algorithm. The value |
399 | MT_ST_CLEAR_DEFAULT can be used to clear the compression | 399 | MT_ST_CLEAR_DEFAULT can be used to clear the compression |
400 | default. | 400 | default. |
401 | MT_ST_SET_TIMEOUT | 401 | MT_ST_SET_TIMEOUT |
402 | Set the normal timeout in seconds for this device. The | 402 | Set the normal timeout in seconds for this device. The |
403 | default is 900 seconds (15 minutes). The timeout should be | 403 | default is 900 seconds (15 minutes). The timeout should be |
404 | long enough for the retries done by the device while | 404 | long enough for the retries done by the device while |
405 | reading/writing. | 405 | reading/writing. |
406 | MT_ST_SET_LONG_TIMEOUT | 406 | MT_ST_SET_LONG_TIMEOUT |
407 | Set the long timeout that is used for operations that are | 407 | Set the long timeout that is used for operations that are |
408 | known to take a long time. The default is 14000 seconds | 408 | known to take a long time. The default is 14000 seconds |
409 | (3.9 hours). For erase this value is further multiplied by | 409 | (3.9 hours). For erase this value is further multiplied by |
410 | eight. | 410 | eight. |
411 | MT_ST_SET_CLN | 411 | MT_ST_SET_CLN |
412 | Set the cleaning request interpretation parameters using | 412 | Set the cleaning request interpretation parameters using |
413 | the lowest 24 bits of the argument. The driver can set the | 413 | the lowest 24 bits of the argument. The driver can set the |
414 | generic status bit GMT_CLN if a cleaning request bit pattern | 414 | generic status bit GMT_CLN if a cleaning request bit pattern |
415 | is found from the extended sense data. Many drives set one or | 415 | is found from the extended sense data. Many drives set one or |
416 | more bits in the extended sense data when the drive needs | 416 | more bits in the extended sense data when the drive needs |
417 | cleaning. The bits are device-dependent. The driver is | 417 | cleaning. The bits are device-dependent. The driver is |
418 | given the number of the sense data byte (the lowest eight | 418 | given the number of the sense data byte (the lowest eight |
419 | bits of the argument; must be >= 18 (values 1 - 17 | 419 | bits of the argument; must be >= 18 (values 1 - 17 |
420 | reserved) and <= the maximum requested sense data size), | 420 | reserved) and <= the maximum requested sense data size), |
421 | a mask to select the relevant bits (the bits 9-16), and the | 421 | a mask to select the relevant bits (the bits 9-16), and the |
422 | bit pattern (bits 17-23). If the bit pattern is zero, one | 422 | bit pattern (bits 17-23). If the bit pattern is zero, one |
423 | or more bits under the mask indicate cleaning request. If | 423 | or more bits under the mask indicate cleaning request. If |
424 | the pattern is non-zero, the pattern must match the masked | 424 | the pattern is non-zero, the pattern must match the masked |
425 | sense data byte. | 425 | sense data byte. |
426 | 426 | ||
427 | (The cleaning bit is set if the additional sense code and | 427 | (The cleaning bit is set if the additional sense code and |
428 | qualifier 00h 17h are seen regardless of the setting of | 428 | qualifier 00h 17h are seen regardless of the setting of |
429 | MT_ST_SET_CLN.) | 429 | MT_ST_SET_CLN.) |
430 | 430 | ||
431 | The following ioctl uses the structure mtpos: | 431 | The following ioctl uses the structure mtpos: |
432 | MTIOCPOS Reads the current position from the drive. Uses | 432 | MTIOCPOS Reads the current position from the drive. Uses |
433 | Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2 | 433 | Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2 |
434 | command for the SCSI-2 drives. | 434 | command for the SCSI-2 drives. |
435 | 435 | ||
436 | The following ioctl uses the structure mtget to return the status: | 436 | The following ioctl uses the structure mtget to return the status: |
437 | MTIOCGET Returns some status information. | 437 | MTIOCGET Returns some status information. |
438 | The file number and block number within file are returned. The | 438 | The file number and block number within file are returned. The |
439 | block is -1 when it can't be determined (e.g., after MTBSF). | 439 | block is -1 when it can't be determined (e.g., after MTBSF). |
440 | The drive type is either MTISSCSI1 or MTISSCSI2. | 440 | The drive type is either MTISSCSI1 or MTISSCSI2. |
441 | The number of recovered errors since the previous status call | 441 | The number of recovered errors since the previous status call |
442 | is stored in the lower word of the field mt_erreg. | 442 | is stored in the lower word of the field mt_erreg. |
443 | The current block size and the density code are stored in the field | 443 | The current block size and the density code are stored in the field |
444 | mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and | 444 | mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and |
445 | MT_ST_DENSITY_SHIFT). | 445 | MT_ST_DENSITY_SHIFT). |
446 | The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN | 446 | The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN |
447 | is set if there is no tape in the drive. GMT_EOD means either | 447 | is set if there is no tape in the drive. GMT_EOD means either |
448 | end of recorded data or end of tape. GMT_EOT means end of tape. | 448 | end of recorded data or end of tape. GMT_EOT means end of tape. |
449 | 449 | ||
450 | 450 | ||
451 | MISCELLANEOUS COMPILE OPTIONS | 451 | MISCELLANEOUS COMPILE OPTIONS |
452 | 452 | ||
453 | The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL | 453 | The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL |
454 | is defined. | 454 | is defined. |
455 | 455 | ||
456 | The maximum number of tape devices is determined by the define | 456 | The maximum number of tape devices is determined by the define |
457 | ST_MAX_TAPES. If more tapes are detected at driver initialization, the | 457 | ST_MAX_TAPES. If more tapes are detected at driver initialization, the |
458 | maximum is adjusted accordingly. | 458 | maximum is adjusted accordingly. |
459 | 459 | ||
460 | Immediate return from tape positioning SCSI commands can be enabled by | 460 | Immediate return from tape positioning SCSI commands can be enabled by |
461 | defining ST_NOWAIT. If this is defined, the user should take care that | 461 | defining ST_NOWAIT. If this is defined, the user should take care that |
462 | the next tape operation is not started before the previous one has | 462 | the next tape operation is not started before the previous one has |
463 | finished. The drives and SCSI adapters should handle this condition | 463 | finished. The drives and SCSI adapters should handle this condition |
464 | gracefully, but some drive/adapter combinations are known to hang the | 464 | gracefully, but some drive/adapter combinations are known to hang the |
465 | SCSI bus in this case. | 465 | SCSI bus in this case. |
466 | 466 | ||
467 | The MTEOM command is by default implemented as spacing over 32767 | 467 | The MTEOM command is by default implemented as spacing over 32767 |
468 | filemarks. With this method the file number in the status is | 468 | filemarks. With this method the file number in the status is |
469 | correct. The user can request using direct spacing to EOD by setting | 469 | correct. The user can request using direct spacing to EOD by setting |
470 | ST_FAST_EOM 1 (or using the MT_ST_OPTIONS ioctl). In this case the file | 470 | ST_FAST_EOM 1 (or using the MT_ST_OPTIONS ioctl). In this case the file |
471 | number will be invalid. | 471 | number will be invalid. |
472 | 472 | ||
473 | When using read ahead or buffered writes the position within the file | 473 | When using read ahead or buffered writes the position within the file |
474 | may not be correct after the file is closed (correct position may | 474 | may not be correct after the file is closed (correct position may |
475 | require backspacing over more than one record). The correct position | 475 | require backspacing over more than one record). The correct position |
476 | within file can be obtained if ST_IN_FILE_POS is defined at compile | 476 | within file can be obtained if ST_IN_FILE_POS is defined at compile |
477 | time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl. | 477 | time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl. |
478 | (The driver always backs over a filemark crossed by read ahead if the | 478 | (The driver always backs over a filemark crossed by read ahead if the |
479 | user does not request data that far.) | 479 | user does not request data that far.) |
480 | 480 | ||
481 | 481 | ||
482 | DEBUGGING HINTS | 482 | DEBUGGING HINTS |
483 | 483 | ||
484 | To enable debugging messages, edit st.c and #define DEBUG 1. As seen | 484 | To enable debugging messages, edit st.c and #define DEBUG 1. As seen |
485 | above, debugging can be switched off with an ioctl if debugging is | 485 | above, debugging can be switched off with an ioctl if debugging is |
486 | compiled into the driver. The debugging output is not voluminous. | 486 | compiled into the driver. The debugging output is not voluminous. |
487 | 487 | ||
488 | If the tape seems to hang, I would be very interested to hear where | 488 | If the tape seems to hang, I would be very interested to hear where |
489 | the driver is waiting. With the command 'ps -l' you can see the state | 489 | the driver is waiting. With the command 'ps -l' you can see the state |
490 | of the process using the tape. If the state is D, the process is | 490 | of the process using the tape. If the state is D, the process is |
491 | waiting for something. The field WCHAN tells where the driver is | 491 | waiting for something. The field WCHAN tells where the driver is |
492 | waiting. If you have the current System.map in the correct place (in | 492 | waiting. If you have the current System.map in the correct place (in |
493 | /boot for the procps I use) or have updated /etc/psdatabase (for kmem | 493 | /boot for the procps I use) or have updated /etc/psdatabase (for kmem |
494 | ps), ps writes the function name in the WCHAN field. If not, you have | 494 | ps), ps writes the function name in the WCHAN field. If not, you have |
495 | to look up the function from System.map. | 495 | to look up the function from System.map. |
496 | 496 | ||
497 | Note also that the timeouts are very long compared to most other | 497 | Note also that the timeouts are very long compared to most other |
498 | drivers. This means that the Linux driver may appear hung although the | 498 | drivers. This means that the Linux driver may appear hung although the |
499 | real reason is that the tape firmware has got confused. | 499 | real reason is that the tape firmware has got confused. |
500 | 500 |
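The mt_dsreg and GMT_xxx description in st.txt above can be illustrated with a small sketch. It assumes the MT_ST_* and GMT_* macros come from the C library's <sys/mtio.h>; the helper names st_block_size(), st_density() and st_classify() and the enum are hypothetical and are not part of the st driver or of mtio.h.

```c
#include <assert.h>
#include <sys/mtio.h>   /* MT_ST_* shift/mask macros and GMT_* status macros */

/* Hypothetical summary of the status bits discussed in st.txt. */
enum st_tape_state { ST_NO_TAPE, ST_AT_EOT, ST_AT_EOD, ST_OTHER };

/* Extract the current block size from the mt_dsreg field. */
static unsigned long st_block_size(long dsreg)
{
    return ((unsigned long)dsreg & MT_ST_BLKSIZE_MASK) >> MT_ST_BLKSIZE_SHIFT;
}

/* Extract the density code from the mt_dsreg field. */
static unsigned long st_density(long dsreg)
{
    unsigned long d =
        ((unsigned long)dsreg & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT;

    assert(d <= 0xff);  /* the density subfield is eight bits wide */
    return d;
}

/* Classify the mt_gstat field using the GMT_xxx bits. */
static enum st_tape_state st_classify(long gstat)
{
    if (GMT_DR_OPEN(gstat))
        return ST_NO_TAPE;  /* no tape in the drive */
    if (GMT_EOT(gstat))
        return ST_AT_EOT;   /* end of tape */
    if (GMT_EOD(gstat))
        return ST_AT_EOD;   /* end of recorded data or end of tape */
    return ST_OTHER;
}
```

In a real program the two values would typically come from a struct mtget filled in by the MTIOCGET ioctl, i.e. st_block_size(status.mt_dsreg) and st_classify(status.mt_gstat).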
Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
1 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> | 1 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> |
2 | 2 | ||
3 | <book> | 3 | <book> |
4 | <?dbhtml filename="index.html"> | 4 | <?dbhtml filename="index.html"> |
5 | 5 | ||
6 | <!-- ****************************************************** --> | 6 | <!-- ****************************************************** --> |
7 | <!-- Header --> | 7 | <!-- Header --> |
8 | <!-- ****************************************************** --> | 8 | <!-- ****************************************************** --> |
9 | <bookinfo> | 9 | <bookinfo> |
10 | <title>Writing an ALSA Driver</title> | 10 | <title>Writing an ALSA Driver</title> |
11 | <author> | 11 | <author> |
12 | <firstname>Takashi</firstname> | 12 | <firstname>Takashi</firstname> |
13 | <surname>Iwai</surname> | 13 | <surname>Iwai</surname> |
14 | <affiliation> | 14 | <affiliation> |
15 | <address> | 15 | <address> |
16 | <email>tiwai@suse.de</email> | 16 | <email>tiwai@suse.de</email> |
17 | </address> | 17 | </address> |
18 | </affiliation> | 18 | </affiliation> |
19 | </author> | 19 | </author> |
20 | 20 | ||
21 | <date>November 17, 2005</date> | 21 | <date>November 17, 2005</date> |
22 | <edition>0.3.6</edition> | 22 | <edition>0.3.6</edition> |
23 | 23 | ||
24 | <abstract> | 24 | <abstract> |
25 | <para> | 25 | <para> |
26 | This document describes how to write an ALSA (Advanced Linux | 26 | This document describes how to write an ALSA (Advanced Linux |
27 | Sound Architecture) driver. | 27 | Sound Architecture) driver. |
28 | </para> | 28 | </para> |
29 | </abstract> | 29 | </abstract> |
30 | 30 | ||
31 | <legalnotice> | 31 | <legalnotice> |
32 | <para> | 32 | <para> |
33 | Copyright (c) 2002-2005 Takashi Iwai <email>tiwai@suse.de</email> | 33 | Copyright (c) 2002-2005 Takashi Iwai <email>tiwai@suse.de</email> |
34 | </para> | 34 | </para> |
35 | 35 | ||
36 | <para> | 36 | <para> |
37 | This document is free; you can redistribute it and/or modify it | 37 | This document is free; you can redistribute it and/or modify it |
38 | under the terms of the GNU General Public License as published by | 38 | under the terms of the GNU General Public License as published by |
39 | the Free Software Foundation; either version 2 of the License, or | 39 | the Free Software Foundation; either version 2 of the License, or |
40 | (at your option) any later version. | 40 | (at your option) any later version. |
41 | </para> | 41 | </para> |
42 | 42 | ||
43 | <para> | 43 | <para> |
44 | This document is distributed in the hope that it will be useful, | 44 | This document is distributed in the hope that it will be useful, |
45 | but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the | 45 | but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the |
46 | implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A | 46 | implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A |
47 | PARTICULAR PURPOSE</emphasis>. See the GNU General Public License | 47 | PARTICULAR PURPOSE</emphasis>. See the GNU General Public License |
48 | for more details. | 48 | for more details. |
49 | </para> | 49 | </para> |
50 | 50 | ||
51 | <para> | 51 | <para> |
52 | You should have received a copy of the GNU General Public | 52 | You should have received a copy of the GNU General Public |
53 | License along with this program; if not, write to the Free | 53 | License along with this program; if not, write to the Free |
54 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | 54 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, |
55 | MA 02111-1307 USA | 55 | MA 02111-1307 USA |
56 | </para> | 56 | </para> |
57 | </legalnotice> | 57 | </legalnotice> |
58 | 58 | ||
59 | </bookinfo> | 59 | </bookinfo> |
60 | 60 | ||
61 | <!-- ****************************************************** --> | 61 | <!-- ****************************************************** --> |
62 | <!-- Preface --> | 62 | <!-- Preface --> |
63 | <!-- ****************************************************** --> | 63 | <!-- ****************************************************** --> |
64 | <preface id="preface"> | 64 | <preface id="preface"> |
65 | <title>Preface</title> | 65 | <title>Preface</title> |
66 | <para> | 66 | <para> |
67 | This document describes how to write an | 67 | This document describes how to write an |
68 | <ulink url="http://www.alsa-project.org/"><citetitle> | 68 | <ulink url="http://www.alsa-project.org/"><citetitle> |
69 | ALSA (Advanced Linux Sound Architecture)</citetitle></ulink> | 69 | ALSA (Advanced Linux Sound Architecture)</citetitle></ulink> |
70 | driver. The document focuses mainly on the PCI soundcard. | 70 | driver. The document focuses mainly on the PCI soundcard. |
71 | In the case of other device types, the API might | 71 | In the case of other device types, the API might |
72 | be different, too. However, at least the ALSA kernel API is | 72 | be different, too. However, at least the ALSA kernel API is |
73 | consistent, so this document should still be of some help in | 73 | consistent, so this document should still be of some help in |
74 | writing them. | 74 | writing them. |
75 | </para> | 75 | </para> |
76 | 76 | ||
77 | <para> | 77 | <para> |
78 | This document is aimed at readers who already have sufficient | 78 | This document is aimed at readers who already have sufficient |
79 | C language skills and a basic knowledge of Linux | 79 | C language skills and a basic knowledge of Linux |
80 | kernel programming. It does not explain the general | 80 | kernel programming. It does not explain the general |
81 | topics of Linux kernel code, nor does it cover the details of | 81 | topics of Linux kernel code, nor does it cover the details of |
82 | the implementation of each low-level driver. It describes only | 82 | the implementation of each low-level driver. It describes only |
83 | the standard way to write a PCI sound driver on ALSA. | 83 | the standard way to write a PCI sound driver on ALSA. |
84 | </para> | 84 | </para> |
85 | 85 | ||
86 | <para> | 86 | <para> |
87 | If you are already familiar with the older ALSA ver.0.5.x, you | 87 | If you are already familiar with the older ALSA ver.0.5.x, you |
88 | can check the drivers such as <filename>es1938.c</filename> or | 88 | can check the drivers such as <filename>es1938.c</filename> or |
89 | <filename>maestro3.c</filename> which have also almost the same | 89 | <filename>maestro3.c</filename> which have also almost the same |
90 | code-base in the ALSA 0.5.x tree, so you can compare the differences. | 90 | code-base in the ALSA 0.5.x tree, so you can compare the differences. |
91 | </para> | 91 | </para> |
92 | 92 | ||
93 | <para> | 93 | <para> |
94 | This document is still a draft version. Feedback and | 94 | This document is still a draft version. Feedback and |
95 | corrections are welcome! | 95 | corrections are welcome! |
96 | </para> | 96 | </para> |
97 | </preface> | 97 | </preface> |
98 | 98 | ||
99 | 99 | ||
100 | <!-- ****************************************************** --> | 100 | <!-- ****************************************************** --> |
101 | <!-- File Tree Structure --> | 101 | <!-- File Tree Structure --> |
102 | <!-- ****************************************************** --> | 102 | <!-- ****************************************************** --> |
103 | <chapter id="file-tree"> | 103 | <chapter id="file-tree"> |
104 | <title>File Tree Structure</title> | 104 | <title>File Tree Structure</title> |
105 | 105 | ||
106 | <section id="file-tree-general"> | 106 | <section id="file-tree-general"> |
107 | <title>General</title> | 107 | <title>General</title> |
108 | <para> | 108 | <para> |
109 | The ALSA drivers are provided in two ways. | 109 | The ALSA drivers are provided in two ways. |
110 | </para> | 110 | </para> |
111 | 111 | ||
112 | <para> | 112 | <para> |
113 | One is the tree provided as a tarball or via cvs from | 113 | One is the tree provided as a tarball or via cvs from |
114 | ALSA's ftp site, and the other is the 2.6 (or later) Linux kernel | 114 | ALSA's ftp site, and the other is the 2.6 (or later) Linux kernel |
115 | tree. To synchronize both, the ALSA driver tree is split into | 115 | tree. To synchronize both, the ALSA driver tree is split into |
116 | two different trees: alsa-kernel and alsa-driver. The former | 116 | two different trees: alsa-kernel and alsa-driver. The former |
117 | contains purely the source code for the Linux 2.6 (or later) | 117 | contains purely the source code for the Linux 2.6 (or later) |
118 | tree. This tree is designed only for compilation in a 2.6 or | 118 | tree. This tree is designed only for compilation in a 2.6 or |
119 | later environment. The latter, alsa-driver, contains many subtle | 119 | later environment. The latter, alsa-driver, contains many subtle |
120 | files for compiling the ALSA driver outside of the Linux | 120 | files for compiling the ALSA driver outside of the Linux |
121 | kernel tree, such as the configure script, the wrapper functions that let the older | 121 | kernel tree, such as the configure script, the wrapper functions that let the older |
122 | 2.2 and 2.4 kernels adapt to the latest kernel API, | 122 | 2.2 and 2.4 kernels adapt to the latest kernel API, |
123 | and additional drivers which are still in development or in | 123 | and additional drivers which are still in development or in |
124 | testing. The drivers in the alsa-driver tree will be moved to | 124 | testing. The drivers in the alsa-driver tree will be moved to |
125 | alsa-kernel (and eventually to the 2.6 kernel tree) once they are | 125 | alsa-kernel (and eventually to the 2.6 kernel tree) once they are |
126 | finished and confirmed to work fine. | 126 | finished and confirmed to work fine. |
127 | </para> | 127 | </para> |
128 | 128 | ||
129 | <para> | 129 | <para> |
130 | The file tree structure of the ALSA driver is depicted below. Both | 130 | The file tree structure of the ALSA driver is depicted below. Both |
131 | alsa-kernel and alsa-driver have almost the same file | 131 | alsa-kernel and alsa-driver have almost the same file |
132 | structure, except for the <quote>core</quote> directory, which is | 132 | structure, except for the <quote>core</quote> directory, which is |
133 | named <quote>acore</quote> in the alsa-driver tree. | 133 | named <quote>acore</quote> in the alsa-driver tree. |
134 | 134 | ||
135 | <example> | 135 | <example> |
136 | <title>ALSA File Tree Structure</title> | 136 | <title>ALSA File Tree Structure</title> |
137 | <literallayout> | 137 | <literallayout> |
138 | sound | 138 | sound |
139 | /core | 139 | /core |
140 | /oss | 140 | /oss |
141 | /seq | 141 | /seq |
142 | /oss | 142 | /oss |
143 | /instr | 143 | /instr |
144 | /ioctl32 | 144 | /ioctl32 |
145 | /include | 145 | /include |
146 | /drivers | 146 | /drivers |
147 | /mpu401 | 147 | /mpu401 |
148 | /opl3 | 148 | /opl3 |
149 | /i2c | 149 | /i2c |
150 | /l3 | 150 | /l3 |
151 | /synth | 151 | /synth |
152 | /emux | 152 | /emux |
153 | /pci | 153 | /pci |
154 | /(cards) | 154 | /(cards) |
155 | /isa | 155 | /isa |
156 | /(cards) | 156 | /(cards) |
157 | /arm | 157 | /arm |
158 | /ppc | 158 | /ppc |
159 | /sparc | 159 | /sparc |
160 | /usb | 160 | /usb |
161 | /pcmcia /(cards) | 161 | /pcmcia /(cards) |
162 | /oss | 162 | /oss |
163 | </literallayout> | 163 | </literallayout> |
164 | </example> | 164 | </example> |
165 | </para> | 165 | </para> |
166 | </section> | 166 | </section> |
167 | 167 | ||
168 | <section id="file-tree-core-directory"> | 168 | <section id="file-tree-core-directory"> |
169 | <title>core directory</title> | 169 | <title>core directory</title> |
170 | <para> | 170 | <para> |
171 | This directory contains the middle layer, that is, the heart | 171 | This directory contains the middle layer, that is, the heart |
172 | of ALSA drivers. In this directory, the native ALSA modules are | 172 | of ALSA drivers. In this directory, the native ALSA modules are |
173 | stored. The sub-directories contain different modules and are | 173 | stored. The sub-directories contain different modules and are |
174 | dependent upon the kernel config. | 174 | dependent upon the kernel config. |
175 | </para> | 175 | </para> |
176 | 176 | ||
177 | <section id="file-tree-core-directory-oss"> | 177 | <section id="file-tree-core-directory-oss"> |
178 | <title>core/oss</title> | 178 | <title>core/oss</title> |
179 | 179 | ||
180 | <para> | 180 | <para> |
181 | The codes for PCM and mixer OSS emulation modules are stored | 181 | The codes for PCM and mixer OSS emulation modules are stored |
182 | in this directory. The rawmidi OSS emulation is included in | 182 | in this directory. The rawmidi OSS emulation is included in |
183 | the ALSA rawmidi code since it's quite small. The sequencer | 183 | the ALSA rawmidi code since it's quite small. The sequencer |
184 | code is stored in core/seq/oss directory (see | 184 | code is stored in core/seq/oss directory (see |
185 | <link linkend="file-tree-core-directory-seq-oss"><citetitle> | 185 | <link linkend="file-tree-core-directory-seq-oss"><citetitle> |
186 | below</citetitle></link>). | 186 | below</citetitle></link>). |
187 | </para> | 187 | </para> |
188 | </section> | 188 | </section> |
189 | 189 | ||
190 | <section id="file-tree-core-directory-ioctl32"> | 190 | <section id="file-tree-core-directory-ioctl32"> |
191 | <title>core/ioctl32</title> | 191 | <title>core/ioctl32</title> |
192 | 192 | ||
193 | <para> | 193 | <para> |
194 | This directory contains the 32bit-ioctl wrappers for 64bit | 194 | This directory contains the 32bit-ioctl wrappers for 64bit |
195 | architectures such as x86-64, ppc64 and sparc64. For 32bit | 195 | architectures such as x86-64, ppc64 and sparc64. For 32bit |
196 | and alpha architectures, these are not compiled. | 196 | and alpha architectures, these are not compiled. |
197 | </para> | 197 | </para> |
198 | </section> | 198 | </section> |
199 | 199 | ||
200 | <section id="file-tree-core-directory-seq"> | 200 | <section id="file-tree-core-directory-seq"> |
201 | <title>core/seq</title> | 201 | <title>core/seq</title> |
202 | <para> | 202 | <para> |
203 | This directory and its sub-directories are for the ALSA | 203 | This directory and its sub-directories are for the ALSA |
204 | sequencer. This directory contains the sequencer core and | 204 | sequencer. This directory contains the sequencer core and |
205 | primary sequencer modules such as snd-seq-midi, | 205 | primary sequencer modules such as snd-seq-midi, |
206 | snd-seq-virmidi, etc. They are compiled only when | 206 | snd-seq-virmidi, etc. They are compiled only when |
207 | <constant>CONFIG_SND_SEQUENCER</constant> is set in the kernel | 207 | <constant>CONFIG_SND_SEQUENCER</constant> is set in the kernel |
208 | config. | 208 | config. |
209 | </para> | 209 | </para> |
210 | </section> | 210 | </section> |
211 | 211 | ||
212 | <section id="file-tree-core-directory-seq-oss"> | 212 | <section id="file-tree-core-directory-seq-oss"> |
213 | <title>core/seq/oss</title> | 213 | <title>core/seq/oss</title> |
214 | <para> | 214 | <para> |
215 | This contains the OSS sequencer emulation codes. | 215 | This contains the OSS sequencer emulation codes. |
216 | </para> | 216 | </para> |
217 | </section> | 217 | </section> |
218 | 218 | ||
219 | <section id="file-tree-core-directory-deq-instr"> | 219 | <section id="file-tree-core-directory-deq-instr"> |
220 | <title>core/seq/instr</title> | 220 | <title>core/seq/instr</title> |
221 | <para> | 221 | <para> |
222 | This directory contains the modules for the sequencer | 222 | This directory contains the modules for the sequencer |
223 | instrument layer. | 223 | instrument layer. |
224 | </para> | 224 | </para> |
225 | </section> | 225 | </section> |
226 | </section> | 226 | </section> |
227 | 227 | ||
228 | <section id="file-tree-include-directory"> | 228 | <section id="file-tree-include-directory"> |
229 | <title>include directory</title> | 229 | <title>include directory</title> |
230 | <para> | 230 | <para> |
231 | This is the place for the public header files of ALSA drivers, | 231 | This is the place for the public header files of ALSA drivers, |
232 | which are to be exported to user-space or included by | 232 | which are to be exported to user-space or included by |
233 | several files in different directories. Basically, the private | 233 | several files in different directories. Basically, the private |
234 | header files should not be placed in this directory, but you may | 234 | header files should not be placed in this directory, but you may |
235 | still find such files there, for historical reasons :) | 235 | still find such files there, for historical reasons :) |
236 | </para> | 236 | </para> |
237 | </section> | 237 | </section> |
238 | 238 | ||
239 | <section id="file-tree-drivers-directory"> | 239 | <section id="file-tree-drivers-directory"> |
240 | <title>drivers directory</title> | 240 | <title>drivers directory</title> |
241 | <para> | 241 | <para> |
242 | This directory contains the code shared among different drivers | 242 | This directory contains the code shared among different drivers |
243 | on different architectures. It is therefore not supposed to be | 243 | on different architectures. It is therefore not supposed to be |
244 | architecture-specific. | 244 | architecture-specific. |
245 | For example, the dummy pcm driver and the serial MIDI | 245 | For example, the dummy pcm driver and the serial MIDI |
246 | driver are found in this directory. In the sub-directories, | 246 | driver are found in this directory. In the sub-directories, |
247 | there is code for components which are independent of | 247 | there is code for components which are independent of |
248 | bus and cpu architectures. | 248 | bus and cpu architectures. |
249 | </para> | 249 | </para> |
250 | 250 | ||
251 | <section id="file-tree-drivers-directory-mpu401"> | 251 | <section id="file-tree-drivers-directory-mpu401"> |
252 | <title>drivers/mpu401</title> | 252 | <title>drivers/mpu401</title> |
253 | <para> | 253 | <para> |
254 | The MPU401 and MPU401-UART modules are stored here. | 254 | The MPU401 and MPU401-UART modules are stored here. |
255 | </para> | 255 | </para> |
256 | </section> | 256 | </section> |
257 | 257 | ||
258 | <section id="file-tree-drivers-directory-opl3"> | 258 | <section id="file-tree-drivers-directory-opl3"> |
259 | <title>drivers/opl3 and opl4</title> | 259 | <title>drivers/opl3 and opl4</title> |
260 | <para> | 260 | <para> |
261 | The OPL3 and OPL4 FM-synth stuff is found here. | 261 | The OPL3 and OPL4 FM-synth stuff is found here. |
262 | </para> | 262 | </para> |
263 | </section> | 263 | </section> |
264 | </section> | 264 | </section> |
265 | 265 | ||
266 | <section id="file-tree-i2c-directory"> | 266 | <section id="file-tree-i2c-directory"> |
267 | <title>i2c directory</title> | 267 | <title>i2c directory</title> |
268 | <para> | 268 | <para> |
269 | This contains the ALSA i2c components. | 269 | This contains the ALSA i2c components. |
270 | </para> | 270 | </para> |
271 | 271 | ||
272 | <para> | 272 | <para> |
273 | Although there is a standard i2c layer on Linux, ALSA has its | 273 | Although there is a standard i2c layer on Linux, ALSA has its |
274 | own i2c codes for some cards, because the soundcard needs only a | 274 | own i2c codes for some cards, because the soundcard needs only a |
275 | simple operation and the standard i2c API is too complicated for | 275 | simple operation and the standard i2c API is too complicated for |
276 | such a purpose. | 276 | such a purpose. |
277 | </para> | 277 | </para> |
278 | 278 | ||
279 | <section id="file-tree-i2c-directory-l3"> | 279 | <section id="file-tree-i2c-directory-l3"> |
280 | <title>i2c/l3</title> | 280 | <title>i2c/l3</title> |
281 | <para> | 281 | <para> |
282 | This is a sub-directory for ARM L3 i2c. | 282 | This is a sub-directory for ARM L3 i2c. |
283 | </para> | 283 | </para> |
284 | </section> | 284 | </section> |
285 | </section> | 285 | </section> |
286 | 286 | ||
287 | <section id="file-tree-synth-directory"> | 287 | <section id="file-tree-synth-directory"> |
288 | <title>synth directory</title> | 288 | <title>synth directory</title> |
289 | <para> | 289 | <para> |
290 | This contains the synth middle-level modules. | 290 | This contains the synth middle-level modules. |
291 | </para> | 291 | </para> |
292 | 292 | ||
293 | <para> | 293 | <para> |
294 | So far, there is only the Emu8000/Emu10k1 synth driver under | 294 | So far, there is only the Emu8000/Emu10k1 synth driver under |
295 | the synth/emux sub-directory. | 295 | the synth/emux sub-directory. |
296 | </para> | 296 | </para> |
297 | </section> | 297 | </section> |
298 | 298 | ||
299 | <section id="file-tree-pci-directory"> | 299 | <section id="file-tree-pci-directory"> |
300 | <title>pci directory</title> | 300 | <title>pci directory</title> |
301 | <para> | 301 | <para> |
302 | This and its sub-directories hold the top-level card modules | 302 | This and its sub-directories hold the top-level card modules |
303 | for PCI soundcards and the codes specific to the PCI BUS. | 303 | for PCI soundcards and the codes specific to the PCI BUS. |
304 | </para> | 304 | </para> |
305 | 305 | ||
306 | <para> | 306 | <para> |
307 | The drivers compiled from a single file are stored directly in | 307 | The drivers compiled from a single file are stored directly in |
308 | the pci directory, while the drivers with several source files are | 308 | the pci directory, while the drivers with several source files are |
309 | stored in their own sub-directories (e.g. emu10k1, ice1712). | 309 | stored in their own sub-directories (e.g. emu10k1, ice1712). |
310 | </para> | 310 | </para> |
311 | </section> | 311 | </section> |
312 | 312 | ||
313 | <section id="file-tree-isa-directory"> | 313 | <section id="file-tree-isa-directory"> |
314 | <title>isa directory</title> | 314 | <title>isa directory</title> |
315 | <para> | 315 | <para> |
316 | This and its sub-directories hold the top-level card modules | 316 | This and its sub-directories hold the top-level card modules |
317 | for ISA soundcards. | 317 | for ISA soundcards. |
318 | </para> | 318 | </para> |
319 | </section> | 319 | </section> |
320 | 320 | ||
321 | <section id="file-tree-arm-ppc-sparc-directories"> | 321 | <section id="file-tree-arm-ppc-sparc-directories"> |
322 | <title>arm, ppc, and sparc directories</title> | 322 | <title>arm, ppc, and sparc directories</title> |
323 | <para> | 323 | <para> |
324 | These are for the top-level card modules which are | 324 | These are for the top-level card modules which are |
325 | specific to each given architecture. | 325 | specific to each given architecture. |
326 | </para> | 326 | </para> |
327 | </section> | 327 | </section> |
328 | 328 | ||
329 | <section id="file-tree-usb-directory"> | 329 | <section id="file-tree-usb-directory"> |
330 | <title>usb directory</title> | 330 | <title>usb directory</title> |
331 | <para> | 331 | <para> |
332 | This contains the USB-audio driver. In the latest version, the | 332 | This contains the USB-audio driver. In the latest version, the |
333 | USB MIDI driver is integrated into the usb-audio driver. | 333 | USB MIDI driver is integrated into the usb-audio driver. |
334 | </para> | 334 | </para> |
335 | </section> | 335 | </section> |
336 | 336 | ||
337 | <section id="file-tree-pcmcia-directory"> | 337 | <section id="file-tree-pcmcia-directory"> |
338 | <title>pcmcia directory</title> | 338 | <title>pcmcia directory</title> |
339 | <para> | 339 | <para> |
340 | The PCMCIA drivers, especially PCCard drivers, will go here. CardBus | 340 | The PCMCIA drivers, especially PCCard drivers, will go here. CardBus |
341 | drivers will be in the pci directory, because their API is identical | 341 | drivers will be in the pci directory, because their API is identical |
342 | to that of the standard PCI cards. | 342 | to that of the standard PCI cards. |
343 | </para> | 343 | </para> |
344 | </section> | 344 | </section> |
345 | 345 | ||
346 | <section id="file-tree-oss-directory"> | 346 | <section id="file-tree-oss-directory"> |
347 | <title>oss directory</title> | 347 | <title>oss directory</title> |
348 | <para> | 348 | <para> |
349 | The OSS/Lite source files are stored here in the Linux 2.6 (or | 349 | The OSS/Lite source files are stored here in the Linux 2.6 (or |
350 | later) tree. (In the ALSA driver tarball, it's empty, of course :) | 350 | later) tree. (In the ALSA driver tarball, it's empty, of course :) |
351 | </para> | 351 | </para> |
352 | </section> | 352 | </section> |
353 | </chapter> | 353 | </chapter> |
354 | 354 | ||
355 | 355 | ||
356 | <!-- ****************************************************** --> | 356 | <!-- ****************************************************** --> |
357 | <!-- Basic Flow for PCI Drivers --> | 357 | <!-- Basic Flow for PCI Drivers --> |
358 | <!-- ****************************************************** --> | 358 | <!-- ****************************************************** --> |
359 | <chapter id="basic-flow"> | 359 | <chapter id="basic-flow"> |
360 | <title>Basic Flow for PCI Drivers</title> | 360 | <title>Basic Flow for PCI Drivers</title> |
361 | 361 | ||
362 | <section id="basic-flow-outline"> | 362 | <section id="basic-flow-outline"> |
363 | <title>Outline</title> | 363 | <title>Outline</title> |
364 | <para> | 364 | <para> |
365 | The minimum flow for a PCI soundcard driver is as follows: | 365 | The minimum flow for a PCI soundcard driver is as follows: |
366 | 366 | ||
367 | <itemizedlist> | 367 | <itemizedlist> |
368 | <listitem><para>define the PCI ID table (see the section | 368 | <listitem><para>define the PCI ID table (see the section |
369 | <link linkend="pci-resource-entries"><citetitle>PCI Entries | 369 | <link linkend="pci-resource-entries"><citetitle>PCI Entries |
370 | </citetitle></link>).</para></listitem> | 370 | </citetitle></link>).</para></listitem> |
371 | <listitem><para>create <function>probe()</function> callback.</para></listitem> | 371 | <listitem><para>create <function>probe()</function> callback.</para></listitem> |
372 | <listitem><para>create <function>remove()</function> callback.</para></listitem> | 372 | <listitem><para>create <function>remove()</function> callback.</para></listitem> |
373 | <listitem><para>create pci_driver table which contains the three pointers above.</para></listitem> | 373 | <listitem><para>create pci_driver table which contains the three pointers above.</para></listitem> |
374 | <listitem><para>create <function>init()</function> function just calling <function>pci_register_driver()</function> to register the pci_driver table defined above.</para></listitem> | 374 | <listitem><para>create <function>init()</function> function just calling <function>pci_register_driver()</function> to register the pci_driver table defined above.</para></listitem> |
375 | <listitem><para>create <function>exit()</function> function to call <function>pci_unregister_driver()</function> function.</para></listitem> | 375 | <listitem><para>create <function>exit()</function> function to call <function>pci_unregister_driver()</function> function.</para></listitem> |
376 | </itemizedlist> | 376 | </itemizedlist> |
377 | </para> | 377 | </para> |
378 | </section> | 378 | </section> |
379 | 379 | ||
380 | <section id="basic-flow-example"> | 380 | <section id="basic-flow-example"> |
381 | <title>Full Code Example</title> | 381 | <title>Full Code Example</title> |
382 | <para> | 382 | <para> |
383 | The code example is shown below. Some parts are left | 383 | The code example is shown below. Some parts are left |
384 | unimplemented for now but will be filled in by the | 384 | unimplemented for now but will be filled in by the |
385 | succeeding sections. The numbers in the comment lines of the | 385 | succeeding sections. The numbers in the comment lines of the |
386 | <function>snd_mychip_probe()</function> function are | 386 | <function>snd_mychip_probe()</function> function are |
387 | markers. | 387 | markers. |
388 | 388 | ||
389 | <example> | 389 | <example> |
390 | <title>Basic Flow for PCI Drivers Example</title> | 390 | <title>Basic Flow for PCI Drivers Example</title> |
391 | <programlisting> | 391 | <programlisting> |
392 | <![CDATA[ | 392 | <![CDATA[ |
393 | #include <sound/driver.h> | 393 | #include <sound/driver.h> |
394 | #include <linux/init.h> | 394 | #include <linux/init.h> |
395 | #include <linux/pci.h> | 395 | #include <linux/pci.h> |
396 | #include <linux/slab.h> | 396 | #include <linux/slab.h> |
397 | #include <sound/core.h> | 397 | #include <sound/core.h> |
398 | #include <sound/initval.h> | 398 | #include <sound/initval.h> |
399 | 399 | ||
400 | /* module parameters (see "Module Parameters") */ | 400 | /* module parameters (see "Module Parameters") */ |
401 | static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; | 401 | static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; |
402 | static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; | 402 | static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; |
403 | static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; | 403 | static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; |
404 | 404 | ||
405 | /* definition of the chip-specific record */ | 405 | /* definition of the chip-specific record */ |
406 | struct mychip { | 406 | struct mychip { |
407 | struct snd_card *card; | 407 | struct snd_card *card; |
408 | // rest of implementation will be in the section | 408 | // rest of implementation will be in the section |
409 | // "PCI Resource Managements" | 409 | // "PCI Resource Managements" |
410 | }; | 410 | }; |
411 | 411 | ||
412 | /* chip-specific destructor | 412 | /* chip-specific destructor |
413 | * (see "PCI Resource Managements") | 413 | * (see "PCI Resource Managements") |
414 | */ | 414 | */ |
415 | static int snd_mychip_free(struct mychip *chip) | 415 | static int snd_mychip_free(struct mychip *chip) |
416 | { | 416 | { |
417 | .... // will be implemented later... | 417 | .... // will be implemented later... |
418 | } | 418 | } |
419 | 419 | ||
420 | /* component-destructor | 420 | /* component-destructor |
421 | * (see "Management of Cards and Components") | 421 | * (see "Management of Cards and Components") |
422 | */ | 422 | */ |
423 | static int snd_mychip_dev_free(struct snd_device *device) | 423 | static int snd_mychip_dev_free(struct snd_device *device) |
424 | { | 424 | { |
425 | return snd_mychip_free(device->device_data); | 425 | return snd_mychip_free(device->device_data); |
426 | } | 426 | } |
427 | 427 | ||
428 | /* chip-specific constructor | 428 | /* chip-specific constructor |
429 | * (see "Management of Cards and Components") | 429 | * (see "Management of Cards and Components") |
430 | */ | 430 | */ |
431 | static int __devinit snd_mychip_create(struct snd_card *card, | 431 | static int __devinit snd_mychip_create(struct snd_card *card, |
432 | struct pci_dev *pci, | 432 | struct pci_dev *pci, |
433 | struct mychip **rchip) | 433 | struct mychip **rchip) |
434 | { | 434 | { |
435 | struct mychip *chip; | 435 | struct mychip *chip; |
436 | int err; | 436 | int err; |
437 | static struct snd_device_ops ops = { | 437 | static struct snd_device_ops ops = { |
438 | .dev_free = snd_mychip_dev_free, | 438 | .dev_free = snd_mychip_dev_free, |
439 | }; | 439 | }; |
440 | 440 | ||
441 | *rchip = NULL; | 441 | *rchip = NULL; |
442 | 442 | ||
443 | // check PCI availability here | 443 | // check PCI availability here |
444 | // (see "PCI Resource Managements") | 444 | // (see "PCI Resource Managements") |
445 | .... | 445 | .... |
446 | 446 | ||
447 | /* allocate zero-filled chip-specific data */ | 447 | /* allocate zero-filled chip-specific data */ |
448 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); | 448 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); |
449 | if (chip == NULL) | 449 | if (chip == NULL) |
450 | return -ENOMEM; | 450 | return -ENOMEM; |
451 | 451 | ||
452 | chip->card = card; | 452 | chip->card = card; |
453 | 453 | ||
454 | // rest of initialization here; will be implemented | 454 | // rest of initialization here; will be implemented |
455 | // later, see "PCI Resource Managements" | 455 | // later, see "PCI Resource Managements" |
456 | .... | 456 | .... |
457 | 457 | ||
458 | if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, | 458 | if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, |
459 | chip, &ops)) < 0) { | 459 | chip, &ops)) < 0) { |
460 | snd_mychip_free(chip); | 460 | snd_mychip_free(chip); |
461 | return err; | 461 | return err; |
462 | } | 462 | } |
463 | 463 | ||
464 | snd_card_set_dev(card, &pci->dev); | 464 | snd_card_set_dev(card, &pci->dev); |
465 | 465 | ||
466 | *rchip = chip; | 466 | *rchip = chip; |
467 | return 0; | 467 | return 0; |
468 | } | 468 | } |
469 | 469 | ||
470 | /* constructor -- see "Constructor" sub-section */ | 470 | /* constructor -- see "Constructor" sub-section */ |
471 | static int __devinit snd_mychip_probe(struct pci_dev *pci, | 471 | static int __devinit snd_mychip_probe(struct pci_dev *pci, |
472 | const struct pci_device_id *pci_id) | 472 | const struct pci_device_id *pci_id) |
473 | { | 473 | { |
474 | static int dev; | 474 | static int dev; |
475 | struct snd_card *card; | 475 | struct snd_card *card; |
476 | struct mychip *chip; | 476 | struct mychip *chip; |
477 | int err; | 477 | int err; |
478 | 478 | ||
479 | /* (1) */ | 479 | /* (1) */ |
480 | if (dev >= SNDRV_CARDS) | 480 | if (dev >= SNDRV_CARDS) |
481 | return -ENODEV; | 481 | return -ENODEV; |
482 | if (!enable[dev]) { | 482 | if (!enable[dev]) { |
483 | dev++; | 483 | dev++; |
484 | return -ENOENT; | 484 | return -ENOENT; |
485 | } | 485 | } |
486 | 486 | ||
487 | /* (2) */ | 487 | /* (2) */ |
488 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); | 488 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); |
489 | if (card == NULL) | 489 | if (card == NULL) |
490 | return -ENOMEM; | 490 | return -ENOMEM; |
491 | 491 | ||
492 | /* (3) */ | 492 | /* (3) */ |
493 | if ((err = snd_mychip_create(card, pci, &chip)) < 0) { | 493 | if ((err = snd_mychip_create(card, pci, &chip)) < 0) { |
494 | snd_card_free(card); | 494 | snd_card_free(card); |
495 | return err; | 495 | return err; |
496 | } | 496 | } |
497 | 497 | ||
498 | /* (4) */ | 498 | /* (4) */ |
499 | strcpy(card->driver, "My Chip"); | 499 | strcpy(card->driver, "My Chip"); |
500 | strcpy(card->shortname, "My Own Chip 123"); | 500 | strcpy(card->shortname, "My Own Chip 123"); |
501 | sprintf(card->longname, "%s at 0x%lx irq %i", | 501 | sprintf(card->longname, "%s at 0x%lx irq %i", |
502 | card->shortname, chip->ioport, chip->irq); | 502 | card->shortname, chip->ioport, chip->irq); |
503 | 503 | ||
504 | /* (5) */ | 504 | /* (5) */ |
505 | .... // implemented later | 505 | .... // implemented later |
506 | 506 | ||
507 | /* (6) */ | 507 | /* (6) */ |
508 | if ((err = snd_card_register(card)) < 0) { | 508 | if ((err = snd_card_register(card)) < 0) { |
509 | snd_card_free(card); | 509 | snd_card_free(card); |
510 | return err; | 510 | return err; |
511 | } | 511 | } |
512 | 512 | ||
513 | /* (7) */ | 513 | /* (7) */ |
514 | pci_set_drvdata(pci, card); | 514 | pci_set_drvdata(pci, card); |
515 | dev++; | 515 | dev++; |
516 | return 0; | 516 | return 0; |
517 | } | 517 | } |
518 | 518 | ||
519 | /* destructor -- see "Destructor" sub-section */ | 519 | /* destructor -- see "Destructor" sub-section */ |
520 | static void __devexit snd_mychip_remove(struct pci_dev *pci) | 520 | static void __devexit snd_mychip_remove(struct pci_dev *pci) |
521 | { | 521 | { |
522 | snd_card_free(pci_get_drvdata(pci)); | 522 | snd_card_free(pci_get_drvdata(pci)); |
523 | pci_set_drvdata(pci, NULL); | 523 | pci_set_drvdata(pci, NULL); |
524 | } | 524 | } |
525 | ]]> | 525 | ]]> |
526 | </programlisting> | 526 | </programlisting> |
527 | </example> | 527 | </example> |
528 | </para> | 528 | </para> |
529 | </section> | 529 | </section> |
530 | 530 | ||
531 | <section id="basic-flow-constructor"> | 531 | <section id="basic-flow-constructor"> |
532 | <title>Constructor</title> | 532 | <title>Constructor</title> |
533 | <para> | 533 | <para> |
534 | The real constructor of a PCI driver is the probe callback. The | 534 | The real constructor of a PCI driver is the probe callback. The |
535 | probe callback and other component constructors that are called | 535 | probe callback and other component constructors that are called |
536 | from the probe callback should be defined with the | 536 | from the probe callback should be defined with the |
537 | <parameter>__devinit</parameter> prefix. You | 537 | <parameter>__devinit</parameter> prefix. You |
538 | cannot use the <parameter>__init</parameter> prefix for them, | 538 | cannot use the <parameter>__init</parameter> prefix for them, |
539 | because any PCI device could be a hotplug device. | 539 | because any PCI device could be a hotplug device. |
540 | </para> | 540 | </para> |
541 | 541 | ||
542 | <para> | 542 | <para> |
543 | In the probe callback, the following scheme is often used. | 543 | In the probe callback, the following scheme is often used. |
544 | </para> | 544 | </para> |
545 | 545 | ||
546 | <section id="basic-flow-constructor-device-index"> | 546 | <section id="basic-flow-constructor-device-index"> |
547 | <title>1) Check and increment the device index.</title> | 547 | <title>1) Check and increment the device index.</title> |
548 | <para> | 548 | <para> |
549 | <informalexample> | 549 | <informalexample> |
550 | <programlisting> | 550 | <programlisting> |
551 | <![CDATA[ | 551 | <![CDATA[ |
552 | static int dev; | 552 | static int dev; |
553 | .... | 553 | .... |
554 | if (dev >= SNDRV_CARDS) | 554 | if (dev >= SNDRV_CARDS) |
555 | return -ENODEV; | 555 | return -ENODEV; |
556 | if (!enable[dev]) { | 556 | if (!enable[dev]) { |
557 | dev++; | 557 | dev++; |
558 | return -ENOENT; | 558 | return -ENOENT; |
559 | } | 559 | } |
560 | ]]> | 560 | ]]> |
561 | </programlisting> | 561 | </programlisting> |
562 | </informalexample> | 562 | </informalexample> |
563 | 563 | ||
564 | where enable[dev] is the module option. | 564 | where enable[dev] is the module option. |
565 | </para> | 565 | </para> |
566 | 566 | ||
567 | <para> | 567 | <para> |
568 | Each time the probe callback is called, check the | 568 | Each time the probe callback is called, check the |
569 | availability of the device. If it is not available, simply increment | 569 | availability of the device. If it is not available, simply increment |
570 | the device index and return. dev is also incremented | 570 | the device index and return. dev is also incremented |
571 | later (<link | 571 | later (<link |
572 | linkend="basic-flow-constructor-set-pci"><citetitle>step | 572 | linkend="basic-flow-constructor-set-pci"><citetitle>step |
573 | 7</citetitle></link>). | 573 | 7</citetitle></link>). |
574 | </para> | 574 | </para> |
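The index handling of step 1 can be tried outside the kernel with a small user-space mock. This is only a sketch: MOCK_CARDS, mock_enable, mock_dev and mock_probe() are hypothetical stand-ins for SNDRV_CARDS, the enable[] module option, the static dev counter and the probe callback, and the mock returns the claimed slot number where the real probe callback returns zero.

```c
#include <errno.h>

#define MOCK_CARDS 3    /* hypothetical stand-in for SNDRV_CARDS */

/* hypothetical stand-in for the enable[] module option: slot 1 is disabled */
static int mock_enable[MOCK_CARDS] = { 1, 0, 1 };

/* next card slot to try; persists across calls like "static int dev" */
static int mock_dev;

/*
 * Mock of the index handling in the probe callback.
 * Returns the claimed slot number on success or a negative errno.
 */
static int mock_probe(void)
{
	if (mock_dev >= MOCK_CARDS)
		return -ENODEV;         /* all slots are used up */
	if (!mock_enable[mock_dev]) {
		mock_dev++;             /* skip the disabled slot */
		return -ENOENT;
	}
	/* steps 2 to 6 would create and register the card here */
	return mock_dev++;              /* step 7: advance to the next slot */
}
```

With this enable table, four successive calls claim slot 0, reject the disabled slot 1 with -ENOENT, claim slot 2, and finally fail with -ENODEV. Note that a disabled slot consumes one probe call, which is why the index is incremented on that path as well.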
575 | </section> | 575 | </section> |
576 | 576 | ||
577 | <section id="basic-flow-constructor-create-card"> | 577 | <section id="basic-flow-constructor-create-card"> |
578 | <title>2) Create a card instance</title> | 578 | <title>2) Create a card instance</title> |
579 | <para> | 579 | <para> |
580 | <informalexample> | 580 | <informalexample> |
581 | <programlisting> | 581 | <programlisting> |
582 | <![CDATA[ | 582 | <![CDATA[ |
583 | struct snd_card *card; | 583 | struct snd_card *card; |
584 | .... | 584 | .... |
585 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); | 585 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); |
586 | ]]> | 586 | ]]> |
587 | </programlisting> | 587 | </programlisting> |
588 | </informalexample> | 588 | </informalexample> |
589 | </para> | 589 | </para> |
590 | 590 | ||
591 | <para> | 591 | <para> |
592 | The details are explained in the section | 592 | The details are explained in the section |
593 | <link linkend="card-management-card-instance"><citetitle> | 593 | <link linkend="card-management-card-instance"><citetitle> |
594 | Management of Cards and Components</citetitle></link>. | 594 | Management of Cards and Components</citetitle></link>. |
595 | </para> | 595 | </para> |
596 | </section> | 596 | </section> |
597 | 597 | ||
598 | <section id="basic-flow-constructor-create-main"> | 598 | <section id="basic-flow-constructor-create-main"> |
599 | <title>3) Create a main component</title> | 599 | <title>3) Create a main component</title> |
600 | <para> | 600 | <para> |
601 | In this part, the PCI resources are allocated. | 601 | In this part, the PCI resources are allocated. |
602 | 602 | ||
603 | <informalexample> | 603 | <informalexample> |
604 | <programlisting> | 604 | <programlisting> |
605 | <![CDATA[ | 605 | <![CDATA[ |
606 | struct mychip *chip; | 606 | struct mychip *chip; |
607 | .... | 607 | .... |
608 | if ((err = snd_mychip_create(card, pci, &chip)) < 0) { | 608 | if ((err = snd_mychip_create(card, pci, &chip)) < 0) { |
609 | snd_card_free(card); | 609 | snd_card_free(card); |
610 | return err; | 610 | return err; |
611 | } | 611 | } |
612 | ]]> | 612 | ]]> |
613 | </programlisting> | 613 | </programlisting> |
614 | </informalexample> | 614 | </informalexample> |
615 | 615 | ||
616 | The details are explained in the section <link | 616 | The details are explained in the section <link |
617 | linkend="pci-resource"><citetitle>PCI Resource | 617 | linkend="pci-resource"><citetitle>PCI Resource |
618 | Managements</citetitle></link>. | 618 | Managements</citetitle></link>. |
619 | </para> | 619 | </para> |
620 | </section> | 620 | </section> |
621 | 621 | ||
622 | <section id="basic-flow-constructor-main-component"> | 622 | <section id="basic-flow-constructor-main-component"> |
623 | <title>4) Set the driver ID and name strings.</title> | 623 | <title>4) Set the driver ID and name strings.</title> |
624 | <para> | 624 | <para> |
625 | <informalexample> | 625 | <informalexample> |
626 | <programlisting> | 626 | <programlisting> |
627 | <![CDATA[ | 627 | <![CDATA[ |
628 | strcpy(card->driver, "My Chip"); | 628 | strcpy(card->driver, "My Chip"); |
629 | strcpy(card->shortname, "My Own Chip 123"); | 629 | strcpy(card->shortname, "My Own Chip 123"); |
630 | sprintf(card->longname, "%s at 0x%lx irq %i", | 630 | sprintf(card->longname, "%s at 0x%lx irq %i", |
631 | card->shortname, chip->ioport, chip->irq); | 631 | card->shortname, chip->ioport, chip->irq); |
632 | ]]> | 632 | ]]> |
633 | </programlisting> | 633 | </programlisting> |
634 | </informalexample> | 634 | </informalexample> |
635 | 635 | ||
636 | The driver field holds the minimal ID string of the | 636 | The driver field holds the minimal ID string of the |
637 | chip. It is referred to by alsa-lib's configurator, so keep it | 637 | chip. It is referred to by alsa-lib's configurator, so keep it |
638 | simple but unique. | 638 | simple but unique. |
639 | Even a single driver can use different driver IDs to | 639 | Even a single driver can use different driver IDs to |
640 | distinguish the functionality of each chip type. | 640 | distinguish the functionality of each chip type. |
641 | </para> | 641 | </para> |
642 | 642 | ||
643 | <para> | 643 | <para> |
644 | The shortname field is a string shown as a more verbose | 644 | The shortname field is a string shown as a more verbose |
645 | name. The longname field contains the information | 645 | name. The longname field contains the information |
646 | shown in <filename>/proc/asound/cards</filename>. | 646 | shown in <filename>/proc/asound/cards</filename>. |
647 | </para> | 647 | </para> |
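The three name fields are fixed-size character arrays, so the strings must fit into them. The following user-space sketch fills a mock record with bounded copies. struct mock_card and mock_set_names() are hypothetical, and the buffer sizes (16, 32 and 80 bytes) are an assumption about the card record that should be checked against the kernel headers in use.

```c
#include <stdio.h>

/* hypothetical mock of the name fields of the card record;
 * the buffer sizes are assumptions, not taken from this document */
struct mock_card {
	char driver[16];
	char shortname[32];
	char longname[80];
};

/* fill the name strings; snprintf() truncates instead of overflowing */
static void mock_set_names(struct mock_card *card,
			   unsigned long ioport, int irq)
{
	snprintf(card->driver, sizeof(card->driver), "%s", "My Chip");
	snprintf(card->shortname, sizeof(card->shortname), "%s",
		 "My Own Chip 123");
	snprintf(card->longname, sizeof(card->longname),
		 "%s at 0x%lx irq %i", card->shortname, ioport, irq);
}
```

Called with an I/O port of 0xe000 and irq 5, the mock produces the long name "My Own Chip 123 at 0xe000 irq 5". The bounded variant is a design choice of this sketch; the example in the text uses strcpy() and sprintf() with short constant strings.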
648 | </section> | 648 | </section> |
649 | 649 | ||
650 | <section id="basic-flow-constructor-create-other"> | 650 | <section id="basic-flow-constructor-create-other"> |
651 | <title>5) Create other components, such as mixer, MIDI, etc.</title> | 651 | <title>5) Create other components, such as mixer, MIDI, etc.</title> |
652 | <para> | 652 | <para> |
653 | Here you define the basic components such as | 653 | Here you define the basic components such as |
654 | <link linkend="pcm-interface"><citetitle>PCM</citetitle></link>, | 654 | <link linkend="pcm-interface"><citetitle>PCM</citetitle></link>, |
655 | mixer (e.g. <link linkend="api-ac97"><citetitle>AC97</citetitle></link>), | 655 | mixer (e.g. <link linkend="api-ac97"><citetitle>AC97</citetitle></link>), |
656 | MIDI (e.g. <link linkend="midi-interface"><citetitle>MPU-401</citetitle></link>), | 656 | MIDI (e.g. <link linkend="midi-interface"><citetitle>MPU-401</citetitle></link>), |
657 | and other interfaces. | 657 | and other interfaces. |
658 | Also, if you want a <link linkend="proc-interface"><citetitle>proc | 658 | Also, if you want a <link linkend="proc-interface"><citetitle>proc |
659 | file</citetitle></link>, define it here, too. | 659 | file</citetitle></link>, define it here, too. |
660 | </para> | 660 | </para> |
661 | </section> | 661 | </section> |
662 | 662 | ||
663 | <section id="basic-flow-constructor-register-card"> | 663 | <section id="basic-flow-constructor-register-card"> |
664 | <title>6) Register the card instance.</title> | 664 | <title>6) Register the card instance.</title> |
665 | <para> | 665 | <para> |
666 | <informalexample> | 666 | <informalexample> |
667 | <programlisting> | 667 | <programlisting> |
668 | <![CDATA[ | 668 | <![CDATA[ |
669 | if ((err = snd_card_register(card)) < 0) { | 669 | if ((err = snd_card_register(card)) < 0) { |
670 | snd_card_free(card); | 670 | snd_card_free(card); |
671 | return err; | 671 | return err; |
672 | } | 672 | } |
673 | ]]> | 673 | ]]> |
674 | </programlisting> | 674 | </programlisting> |
675 | </informalexample> | 675 | </informalexample> |
676 | </para> | 676 | </para> |
677 | 677 | ||
678 | <para> | 678 | <para> |
679 | This is also explained in the section <link | 679 | This is also explained in the section <link |
680 | linkend="card-management-registration"><citetitle>Management | 680 | linkend="card-management-registration"><citetitle>Management |
681 | of Cards and Components</citetitle></link>. | 681 | of Cards and Components</citetitle></link>. |
682 | </para> | 682 | </para> |
683 | </section> | 683 | </section> |
684 | 684 | ||
685 | <section id="basic-flow-constructor-set-pci"> | 685 | <section id="basic-flow-constructor-set-pci"> |
686 | <title>7) Set the PCI driver data and return zero.</title> | 686 | <title>7) Set the PCI driver data and return zero.</title> |
687 | <para> | 687 | <para> |
688 | <informalexample> | 688 | <informalexample> |
689 | <programlisting> | 689 | <programlisting> |
690 | <![CDATA[ | 690 | <![CDATA[ |
691 | pci_set_drvdata(pci, card); | 691 | pci_set_drvdata(pci, card); |
692 | dev++; | 692 | dev++; |
693 | return 0; | 693 | return 0; |
694 | ]]> | 694 | ]]> |
695 | </programlisting> | 695 | </programlisting> |
696 | </informalexample> | 696 | </informalexample> |
697 | 697 | ||
698 | In the above, the card record is stored. This pointer is | 698 | In the above, the card record is stored. This pointer is |
699 | also referred to in the remove callback and the power-management | 699 | also referred to in the remove callback and the power-management |
700 | callbacks. | 700 | callbacks. |
701 | </para> | 701 | </para> |
702 | </section> | 702 | </section> |
703 | </section> | 703 | </section> |
704 | 704 | ||
705 | <section id="basic-flow-destructor"> | 705 | <section id="basic-flow-destructor"> |
706 | <title>Destructor</title> | 706 | <title>Destructor</title> |
707 | <para> | 707 | <para> |
708 | The destructor, i.e. the remove callback, simply releases the card | 708 | The destructor, i.e. the remove callback, simply releases the card |
709 | instance. The ALSA middle layer then releases all the | 709 | instance. The ALSA middle layer then releases all the |
710 | attached components automatically. | 710 | attached components automatically. |
711 | </para> | 711 | </para> |
712 | 712 | ||
713 | <para> | 713 | <para> |
714 | It typically looks like the following: | 714 | It typically looks like the following: |
715 | 715 | ||
716 | <informalexample> | 716 | <informalexample> |
717 | <programlisting> | 717 | <programlisting> |
718 | <![CDATA[ | 718 | <![CDATA[ |
719 | static void __devexit snd_mychip_remove(struct pci_dev *pci) | 719 | static void __devexit snd_mychip_remove(struct pci_dev *pci) |
720 | { | 720 | { |
721 | snd_card_free(pci_get_drvdata(pci)); | 721 | snd_card_free(pci_get_drvdata(pci)); |
722 | pci_set_drvdata(pci, NULL); | 722 | pci_set_drvdata(pci, NULL); |
723 | } | 723 | } |
724 | ]]> | 724 | ]]> |
725 | </programlisting> | 725 | </programlisting> |
726 | </informalexample> | 726 | </informalexample> |
727 | 727 | ||
728 | The above code assumes that the card pointer is set to the PCI | 728 | The above code assumes that the card pointer is set to the PCI |
729 | driver data. | 729 | driver data. |
730 | </para> | 730 | </para> |
731 | </section> | 731 | </section> |
732 | 732 | ||
733 | <section id="basic-flow-header-files"> | 733 | <section id="basic-flow-header-files"> |
734 | <title>Header Files</title> | 734 | <title>Header Files</title> |
735 | <para> | 735 | <para> |
736 | For the above example, at least the following include files | 736 | For the above example, at least the following include files |
737 | are necessary. | 737 | are necessary. |
738 | 738 | ||
739 | <informalexample> | 739 | <informalexample> |
740 | <programlisting> | 740 | <programlisting> |
741 | <![CDATA[ | 741 | <![CDATA[ |
742 | #include <sound/driver.h> | 742 | #include <sound/driver.h> |
743 | #include <linux/init.h> | 743 | #include <linux/init.h> |
744 | #include <linux/pci.h> | 744 | #include <linux/pci.h> |
745 | #include <linux/slab.h> | 745 | #include <linux/slab.h> |
746 | #include <sound/core.h> | 746 | #include <sound/core.h> |
747 | #include <sound/initval.h> | 747 | #include <sound/initval.h> |
748 | ]]> | 748 | ]]> |
749 | </programlisting> | 749 | </programlisting> |
750 | </informalexample> | 750 | </informalexample> |
751 | 751 | ||
752 | where the last one is necessary only when module options are | 752 | where the last one is necessary only when module options are |
753 | defined in the source file. If the code is split into several | 753 | defined in the source file. If the code is split into several |
754 | files, the files without module options don't need it. | 754 | files, the files without module options don't need it. |
755 | </para> | 755 | </para> |
756 | 756 | ||
757 | <para> | 757 | <para> |
758 | In addition to them, you'll need | 758 | In addition to them, you'll need |
759 | <filename><linux/interrupt.h></filename> for the interrupt | 759 | <filename><linux/interrupt.h></filename> for the interrupt |
760 | handling, and <filename><asm/io.h></filename> for the i/o | 760 | handling, and <filename><asm/io.h></filename> for the i/o |
761 | access. If you use <function>mdelay()</function> or | 761 | access. If you use <function>mdelay()</function> or |
762 | <function>udelay()</function> functions, you'll need to include | 762 | <function>udelay()</function> functions, you'll need to include |
763 | <filename><linux/delay.h></filename>, too. | 763 | <filename><linux/delay.h></filename>, too. |
764 | </para> | 764 | </para> |
765 | 765 | ||
766 | <para> | 766 | <para> |
767 | The ALSA interfaces like PCM or control API are defined in other | 767 | The ALSA interfaces like PCM or control API are defined in other |
768 | header files as <filename><sound/xxx.h></filename>. | 768 | header files as <filename><sound/xxx.h></filename>. |
769 | They have to be included after | 769 | They have to be included after |
770 | <filename><sound/core.h></filename>. | 770 | <filename><sound/core.h></filename>. |
771 | </para> | 771 | </para> |
772 | 772 | ||
773 | </section> | 773 | </section> |
774 | </chapter> | 774 | </chapter> |
775 | 775 | ||
776 | 776 | ||
777 | <!-- ****************************************************** --> | 777 | <!-- ****************************************************** --> |
778 | <!-- Management of Cards and Components --> | 778 | <!-- Management of Cards and Components --> |
779 | <!-- ****************************************************** --> | 779 | <!-- ****************************************************** --> |
780 | <chapter id="card-management"> | 780 | <chapter id="card-management"> |
781 | <title>Management of Cards and Components</title> | 781 | <title>Management of Cards and Components</title> |
782 | 782 | ||
783 | <section id="card-management-card-instance"> | 783 | <section id="card-management-card-instance"> |
784 | <title>Card Instance</title> | 784 | <title>Card Instance</title> |
785 | <para> | 785 | <para> |
786 | For each soundcard, a <quote>card</quote> record must be allocated. | 786 | For each soundcard, a <quote>card</quote> record must be allocated. |
787 | </para> | 787 | </para> |
788 | 788 | ||
789 | <para> | 789 | <para> |
790 | A card record is the headquarters of the soundcard. It manages | 790 | A card record is the headquarters of the soundcard. It manages |
791 | the list of all devices (components) on the soundcard, such as | 791 | the list of all devices (components) on the soundcard, such as |
792 | PCM, mixers, MIDI, synthesizer, and so on. Also, the card | 792 | PCM, mixers, MIDI, synthesizer, and so on. Also, the card |
793 | record holds the ID and the name strings of the card, manages | 793 | record holds the ID and the name strings of the card, manages |
794 | the root of proc files, and controls the power-management states | 794 | the root of proc files, and controls the power-management states |
795 | and hotplug disconnections. The component list on the card | 795 | and hotplug disconnections. The component list on the card |
796 | record is used to manage the proper releases of resources at | 796 | record is used to manage the proper releases of resources at |
797 | destruction. | 797 | destruction. |
798 | </para> | 798 | </para> |
799 | 799 | ||
800 | <para> | 800 | <para> |
801 | As mentioned above, to create a card instance, call | 801 | As mentioned above, to create a card instance, call |
802 | <function>snd_card_new()</function>. | 802 | <function>snd_card_new()</function>. |
803 | 803 | ||
804 | <informalexample> | 804 | <informalexample> |
805 | <programlisting> | 805 | <programlisting> |
806 | <![CDATA[ | 806 | <![CDATA[ |
807 | struct snd_card *card; | 807 | struct snd_card *card; |
808 | card = snd_card_new(index, id, module, extra_size); | 808 | card = snd_card_new(index, id, module, extra_size); |
809 | ]]> | 809 | ]]> |
810 | </programlisting> | 810 | </programlisting> |
811 | </informalexample> | 811 | </informalexample> |
812 | </para> | 812 | </para> |
813 | 813 | ||
814 | <para> | 814 | <para> |
815 | The function takes four arguments: the card-index number, the | 815 | The function takes four arguments: the card-index number, the |
816 | id string, the module pointer (usually | 816 | id string, the module pointer (usually |
817 | <constant>THIS_MODULE</constant>), | 817 | <constant>THIS_MODULE</constant>), |
818 | and the size of extra-data space. The last argument is used to | 818 | and the size of extra-data space. The last argument is used to |
819 | allocate card->private_data for the | 819 | allocate card->private_data for the |
820 | chip-specific data. Note that this data | 820 | chip-specific data. Note that this data |
821 | <emphasis>is</emphasis> allocated by | 821 | <emphasis>is</emphasis> allocated by |
822 | <function>snd_card_new()</function>. | 822 | <function>snd_card_new()</function>. |
823 | </para> | 823 | </para> |
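The effect of the extra-size argument can be pictured with a user-space mock that allocates the card record and the extra space in a single block. This is a sketch under assumptions, not the kernel implementation: struct mock_card, struct mock_chip, mock_card_new() and mock_card_free() are hypothetical names, and calloc() stands in for the kernel allocator.

```c
#include <stdlib.h>

/* hypothetical, heavily reduced mock of the card record */
struct mock_card {
	int number;
	void *private_data;     /* points into the extra space, or NULL */
};

/* hypothetical chip-specific record stored in the extra space */
struct mock_chip {
	unsigned long port;
	int irq;
};

/* allocate the card record plus extra_size zero-filled bytes at once */
static struct mock_card *mock_card_new(int idx, size_t extra_size)
{
	struct mock_card *card = calloc(1, sizeof(*card) + extra_size);

	if (card == NULL)
		return NULL;
	card->number = idx;
	if (extra_size > 0)
		card->private_data = (char *)card + sizeof(*card);
	return card;
}

/* one free() releases the record and the extra space together */
static void mock_card_free(struct mock_card *card)
{
	free(card);
}
```

A caller would pass sizeof(struct mock_chip) as the extra size and read the chip record back from card->private_data. Because both live in one allocation, the chip data needs no destructor of its own in this model.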
824 | </section> | 824 | </section> |
825 | 825 | ||
826 | <section id="card-management-component"> | 826 | <section id="card-management-component"> |
827 | <title>Components</title> | 827 | <title>Components</title> |
828 | <para> | 828 | <para> |
829 | After the card is created, you can attach the components | 829 | After the card is created, you can attach the components |
830 | (devices) to the card instance. In an ALSA driver, a component is | 830 | (devices) to the card instance. In an ALSA driver, a component is |
831 | represented as a struct <structname>snd_device</structname> object. | 831 | represented as a struct <structname>snd_device</structname> object. |
832 | A component can be a PCM instance, a control interface, a raw | 832 | A component can be a PCM instance, a control interface, a raw |
833 | MIDI interface, etc. Each such instance has one component | 833 | MIDI interface, etc. Each such instance has one component |
834 | entry. | 834 | entry. |
835 | </para> | 835 | </para> |
836 | 836 | ||
837 | <para> | 837 | <para> |
838 | A component can be created via | 838 | A component can be created via |
839 | <function>snd_device_new()</function> function. | 839 | <function>snd_device_new()</function> function. |
840 | 840 | ||
841 | <informalexample> | 841 | <informalexample> |
842 | <programlisting> | 842 | <programlisting> |
843 | <![CDATA[ | 843 | <![CDATA[ |
844 | snd_device_new(card, SNDRV_DEV_XXX, chip, &ops); | 844 | snd_device_new(card, SNDRV_DEV_XXX, chip, &ops); |
845 | ]]> | 845 | ]]> |
846 | </programlisting> | 846 | </programlisting> |
847 | </informalexample> | 847 | </informalexample> |
848 | </para> | 848 | </para> |
849 | 849 | ||
850 | <para> | 850 | <para> |
851 | This takes the card pointer, the device-level | 851 | This takes the card pointer, the device-level |
852 | (<constant>SNDRV_DEV_XXX</constant>), the data pointer, and the | 852 | (<constant>SNDRV_DEV_XXX</constant>), the data pointer, and the |
853 | callback pointers (<parameter>&ops</parameter>). The | 853 | callback pointers (<parameter>&ops</parameter>). The |
854 | device-level defines the type of components and the order of | 854 | device-level defines the type of components and the order of |
855 | registration and de-registration. For most components, the | 855 | registration and de-registration. For most components, the |
856 | device-level is already defined. For a user-defined component, | 856 | device-level is already defined. For a user-defined component, |
857 | you can use <constant>SNDRV_DEV_LOWLEVEL</constant>. | 857 | you can use <constant>SNDRV_DEV_LOWLEVEL</constant>. |
858 | </para> | 858 | </para> |
859 | 859 | ||
860 | <para> | 860 | <para> |
861 | This function itself doesn't allocate the data space. The data | 861 | This function itself doesn't allocate the data space. The data |
862 | must be allocated manually beforehand, and its pointer is passed | 862 | must be allocated manually beforehand, and its pointer is passed |
863 | as the argument. This pointer is used as the identifier | 863 | as the argument. This pointer is used as the identifier |
864 | (<parameter>chip</parameter> in the above example) for the | 864 | (<parameter>chip</parameter> in the above example) for the |
865 | instance. | 865 | instance. |
866 | </para> | 866 | </para> |
867 | 867 | ||
868 | <para> | 868 | <para> |
869 | Each ALSA pre-defined component such as ac97 or pcm calls | 869 | Each ALSA pre-defined component such as ac97 or pcm calls |
870 | <function>snd_device_new()</function> inside its | 870 | <function>snd_device_new()</function> inside its |
871 | constructor. The destructor for each component is defined in the | 871 | constructor. The destructor for each component is defined in the |
872 | callback pointers. Hence, you don't need to take care of | 872 | callback pointers. Hence, you don't need to take care of |
873 | calling a destructor for such a component. | 873 | calling a destructor for such a component. |
874 | </para> | 874 | </para> |
875 | 875 | ||
876 | <para> | 876 | <para> |
877 | If you would like to create your own component, you need to | 877 | If you would like to create your own component, you need to |
878 | set the destructor function in the dev_free callback of | 878 | set the destructor function in the dev_free callback of |
879 | <parameter>ops</parameter>, so that it can be released | 879 | <parameter>ops</parameter>, so that it can be released |
880 | automatically via <function>snd_card_free()</function>. An | 880 | automatically via <function>snd_card_free()</function>. An |
881 | example is shown later as an implementation of | 881 | example is shown later as an implementation of |
882 | chip-specific data. | 882 | chip-specific data. |
883 | </para> | 883 | </para> |
884 | </section> | 884 | </section> |
885 | 885 | ||
886 | <section id="card-management-chip-specific"> | 886 | <section id="card-management-chip-specific"> |
887 | <title>Chip-Specific Data</title> | 887 | <title>Chip-Specific Data</title> |
888 | <para> | 888 | <para> |
889 | The chip-specific information, e.g. the i/o port address, its | 889 | The chip-specific information, e.g. the i/o port address, its |
890 | resource pointer, or the irq number, is stored in the | 890 | resource pointer, or the irq number, is stored in the |
891 | chip-specific record. | 891 | chip-specific record. |
892 | 892 | ||
893 | <informalexample> | 893 | <informalexample> |
894 | <programlisting> | 894 | <programlisting> |
895 | <![CDATA[ | 895 | <![CDATA[ |
896 | struct mychip { | 896 | struct mychip { |
897 | .... | 897 | .... |
898 | }; | 898 | }; |
899 | ]]> | 899 | ]]> |
900 | </programlisting> | 900 | </programlisting> |
901 | </informalexample> | 901 | </informalexample> |
902 | </para> | 902 | </para> |
903 | 903 | ||
904 | <para> | 904 | <para> |
905 | In general, there are two ways to allocate the chip record. | 905 | In general, there are two ways to allocate the chip record. |
906 | </para> | 906 | </para> |
907 | 907 | ||
908 | <section id="card-management-chip-specific-snd-card-new"> | 908 | <section id="card-management-chip-specific-snd-card-new"> |
909 | <title>1. Allocating via <function>snd_card_new()</function>.</title> | 909 | <title>1. Allocating via <function>snd_card_new()</function>.</title> |
910 | <para> | 910 | <para> |
911 | As mentioned above, you can pass the extra-data-length to the 4th argument of <function>snd_card_new()</function>, i.e. | 911 | As mentioned above, you can pass the extra-data-length to the 4th argument of <function>snd_card_new()</function>, i.e. |
912 | 912 | ||
913 | <informalexample> | 913 | <informalexample> |
914 | <programlisting> | 914 | <programlisting> |
915 | <![CDATA[ | 915 | <![CDATA[ |
916 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(struct mychip)); | 916 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(struct mychip)); |
917 | ]]> | 917 | ]]> |
918 | </programlisting> | 918 | </programlisting> |
919 | </informalexample> | 919 | </informalexample> |
920 | 920 | ||
921 | where struct <structname>mychip</structname> is the type of the chip record. | 921 | where struct <structname>mychip</structname> is the type of the chip record. |
922 | </para> | 922 | </para> |
923 | 923 | ||
924 | <para> | 924 | <para> |
925 | In return, the allocated record can be accessed as | 925 | In return, the allocated record can be accessed as |
926 | 926 | ||
927 | <informalexample> | 927 | <informalexample> |
928 | <programlisting> | 928 | <programlisting> |
929 | <![CDATA[ | 929 | <![CDATA[ |
930 | struct mychip *chip = (struct mychip *)card->private_data; | 930 | struct mychip *chip = (struct mychip *)card->private_data; |
931 | ]]> | 931 | ]]> |
932 | </programlisting> | 932 | </programlisting> |
933 | </informalexample> | 933 | </informalexample> |
934 | 934 | ||
935 | With this method, you don't have to allocate twice. | 935 | With this method, you don't have to allocate twice. |
936 | The record is released together with the card instance. | 936 | The record is released together with the card instance. |
937 | </para> | 937 | </para> |
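The single-allocation idea behind this method can be modelled in plain userspace C. The sketch below is only an illustration under stated assumptions: `toy_card`, `toy_card_new()`, `toy_card_chip()` and `toy_card_free()` are hypothetical stand-ins, not the ALSA types or functions, and the layout shown is not guaranteed to match what the kernel does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the card instance; not the ALSA definition. */
struct toy_card {
	int number;
	void *private_data;	/* points just past this struct, or NULL */
};

struct mychip {
	unsigned long port;
	int irq;
};

/*
 * Allocate the card and extra_size bytes of zeroed chip data in one block.
 * Assumes the alignment of struct toy_card is sufficient for the record
 * stored behind it, which holds for the two structs in this sketch.
 */
static struct toy_card *toy_card_new(int number, size_t extra_size)
{
	struct toy_card *card = calloc(1, sizeof(*card) + extra_size);

	if (card == NULL)
		return NULL;
	card->number = number;
	if (extra_size > 0)
		card->private_data = (char *)card + sizeof(*card);
	return card;
}

/* Accessor for the chip record; only valid if extra space was requested. */
static struct mychip *toy_card_chip(struct toy_card *card)
{
	assert(card->private_data != NULL);
	return card->private_data;
}

/* One free() releases both the card and the chip record. */
static void toy_card_free(struct toy_card *card)
{
	free(card);
}
```

A caller would use `toy_card_new(0, sizeof(struct mychip))` and then `toy_card_chip(card)`. Because the record lives inside the card's allocation, it needs no destructor of its own.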
938 | </section> | 938 | </section> |
939 | 939 | ||
940 | <section id="card-management-chip-specific-allocate-extra"> | 940 | <section id="card-management-chip-specific-allocate-extra"> |
941 | <title>2. Allocating an extra device.</title> | 941 | <title>2. Allocating an extra device.</title> |
942 | 942 | ||
943 | <para> | 943 | <para> |
944 | After allocating a card instance via | 944 | After allocating a card instance via |
945 | <function>snd_card_new()</function> (with | 945 | <function>snd_card_new()</function> (with |
946 | <constant>0</constant> as the 4th argument), call | 946 | <constant>0</constant> as the 4th argument), call |
947 | <function>kzalloc()</function>. | 947 | <function>kzalloc()</function>. |
948 | 948 | ||
949 | <informalexample> | 949 | <informalexample> |
950 | <programlisting> | 950 | <programlisting> |
951 | <![CDATA[ | 951 | <![CDATA[ |
952 | struct snd_card *card; | 952 | struct snd_card *card; |
953 | struct mychip *chip; | 953 | struct mychip *chip; |
954 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); | 954 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); |
955 | ..... | 955 | ..... |
956 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); | 956 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); |
957 | ]]> | 957 | ]]> |
958 | </programlisting> | 958 | </programlisting> |
959 | </informalexample> | 959 | </informalexample> |
960 | </para> | 960 | </para> |
961 | 961 | ||
962 | <para> | 962 | <para> |
963 | The chip record should have at least a field to hold the | 963 | The chip record should have at least a field to hold the |
964 | card pointer, | 964 | card pointer, |
965 | 965 | ||
966 | <informalexample> | 966 | <informalexample> |
967 | <programlisting> | 967 | <programlisting> |
968 | <![CDATA[ | 968 | <![CDATA[ |
969 | struct mychip { | 969 | struct mychip { |
970 | struct snd_card *card; | 970 | struct snd_card *card; |
971 | .... | 971 | .... |
972 | }; | 972 | }; |
973 | ]]> | 973 | ]]> |
974 | </programlisting> | 974 | </programlisting> |
975 | </informalexample> | 975 | </informalexample> |
976 | </para> | 976 | </para> |
977 | 977 | ||
978 | <para> | 978 | <para> |
979 | Then, set the card pointer in the returned chip instance. | 979 | Then, set the card pointer in the returned chip instance. |
980 | 980 | ||
981 | <informalexample> | 981 | <informalexample> |
982 | <programlisting> | 982 | <programlisting> |
983 | <![CDATA[ | 983 | <![CDATA[ |
984 | chip->card = card; | 984 | chip->card = card; |
985 | ]]> | 985 | ]]> |
986 | </programlisting> | 986 | </programlisting> |
987 | </informalexample> | 987 | </informalexample> |
988 | </para> | 988 | </para> |
989 | 989 | ||
990 | <para> | 990 | <para> |
991 | Next, initialize the fields, and register this chip | 991 | Next, initialize the fields, and register this chip |
992 | record as a low-level device with a specified | 992 | record as a low-level device with a specified |
993 | <parameter>ops</parameter>, | 993 | <parameter>ops</parameter>, |
994 | 994 | ||
995 | <informalexample> | 995 | <informalexample> |
996 | <programlisting> | 996 | <programlisting> |
997 | <![CDATA[ | 997 | <![CDATA[ |
998 | static struct snd_device_ops ops = { | 998 | static struct snd_device_ops ops = { |
999 | .dev_free = snd_mychip_dev_free, | 999 | .dev_free = snd_mychip_dev_free, |
1000 | }; | 1000 | }; |
1001 | .... | 1001 | .... |
1002 | snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops); | 1002 | snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops); |
1003 | ]]> | 1003 | ]]> |
1004 | </programlisting> | 1004 | </programlisting> |
1005 | </informalexample> | 1005 | </informalexample> |
1006 | 1006 | ||
1007 | <function>snd_mychip_dev_free()</function> is the | 1007 | <function>snd_mychip_dev_free()</function> is the |
1008 | device-destructor function, which will call the real | 1008 | device-destructor function, which will call the real |
1009 | destructor. | 1009 | destructor. |
1010 | </para> | 1010 | </para> |
1011 | 1011 | ||
1012 | <para> | 1012 | <para> |
1013 | <informalexample> | 1013 | <informalexample> |
1014 | <programlisting> | 1014 | <programlisting> |
1015 | <![CDATA[ | 1015 | <![CDATA[ |
1016 | static int snd_mychip_dev_free(struct snd_device *device) | 1016 | static int snd_mychip_dev_free(struct snd_device *device) |
1017 | { | 1017 | { |
1018 | return snd_mychip_free(device->device_data); | 1018 | return snd_mychip_free(device->device_data); |
1019 | } | 1019 | } |
1020 | ]]> | 1020 | ]]> |
1021 | </programlisting> | 1021 | </programlisting> |
1022 | </informalexample> | 1022 | </informalexample> |
1023 | 1023 | ||
1024 | where <function>snd_mychip_free()</function> is the real destructor. | 1024 | where <function>snd_mychip_free()</function> is the real destructor. |
1025 | </para> | 1025 | </para> |
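The two-level destructor described above (a device-level callback that forwards to the real destructor) can be modelled in userspace C. This is a minimal sketch under stated assumptions: every `toy_` name is hypothetical, and the list handling only imitates the idea of components being released when the card is freed. It is not the ALSA implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_device;

/* Hypothetical counterpart of the ops table holding the dev_free callback. */
struct toy_device_ops {
	int (*dev_free)(struct toy_device *dev);
};

struct toy_device {
	void *device_data;	/* the pointer given at registration */
	const struct toy_device_ops *ops;
	struct toy_device *next;
};

struct toy_card {
	struct toy_device *devices;	/* singly linked list of components */
};

/* Register already-allocated data as a component of the card. */
static int toy_device_new(struct toy_card *card, void *data,
			  const struct toy_device_ops *ops)
{
	struct toy_device *dev;

	assert(card != NULL && ops != NULL);
	dev = calloc(1, sizeof(*dev));
	if (dev == NULL)
		return -1;
	dev->device_data = data;
	dev->ops = ops;
	dev->next = card->devices;
	card->devices = dev;
	return 0;
}

/* Release every component; returns how many dev_free callbacks ran. */
static int toy_card_free_devices(struct toy_card *card)
{
	int released = 0;

	while (card->devices != NULL) {
		struct toy_device *dev = card->devices;

		card->devices = dev->next;
		if (dev->ops->dev_free != NULL) {
			dev->ops->dev_free(dev);
			released++;
		}
		free(dev);
	}
	return released;
}

struct mychip {
	int irq;
};

static int mychip_free_calls;	/* counts real destructor calls, for illustration */

/* The real destructor: releases the chip record itself. */
static int toy_mychip_free(struct mychip *chip)
{
	mychip_free_calls++;
	free(chip);
	return 0;
}

/* The device-level destructor: unwraps the data and forwards it. */
static int toy_mychip_dev_free(struct toy_device *dev)
{
	return toy_mychip_free(dev->device_data);
}
```

Keeping the real destructor separate from the callback lets the constructor call it directly on a failure that happens before the component has been registered.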
1026 | </section> | 1026 | </section> |
1027 | </section> | 1027 | </section> |
1028 | 1028 | ||
1029 | <section id="card-management-registration"> | 1029 | <section id="card-management-registration"> |
1030 | <title>Registration and Release</title> | 1030 | <title>Registration and Release</title> |
1031 | <para> | 1031 | <para> |
1032 | After all components are assigned, register the card instance | 1032 | After all components are assigned, register the card instance |
1033 | by calling <function>snd_card_register()</function>. Access | 1033 | by calling <function>snd_card_register()</function>. Access |
1034 | to the device files is enabled at this point. That is, before | 1034 | to the device files is enabled at this point. That is, before |
1035 | <function>snd_card_register()</function> is called, the | 1035 | <function>snd_card_register()</function> is called, the |
1036 | components are safely inaccessible from the outside. If this | 1036 | components are safely inaccessible from the outside. If this |
1037 | call fails, exit the probe function after releasing the card via | 1037 | call fails, exit the probe function after releasing the card via |
1038 | <function>snd_card_free()</function>. | 1038 | <function>snd_card_free()</function>. |
1039 | </para> | 1039 | </para> |
1040 | 1040 | ||
1041 | <para> | 1041 | <para> |
1042 | For releasing the card instance, you can simply call | 1042 | For releasing the card instance, you can simply call |
1043 | <function>snd_card_free()</function>. As already mentioned, all | 1043 | <function>snd_card_free()</function>. As already mentioned, all |
1044 | components are released automatically by this call. | 1044 | components are released automatically by this call. |
1045 | </para> | 1045 | </para> |
1046 | 1046 | ||
1047 | <para> | 1047 | <para> |
1048 | As a further note, the destructors (both | 1048 | As a further note, the destructors (both |
1049 | <function>snd_mychip_dev_free</function> and | 1049 | <function>snd_mychip_dev_free</function> and |
1050 | <function>snd_mychip_free</function>) cannot be defined with | 1050 | <function>snd_mychip_free</function>) cannot be defined with |
1051 | the <parameter>__devexit</parameter> prefix, because they may also be | 1051 | the <parameter>__devexit</parameter> prefix, because they may also be |
1052 | called from the constructor on the error path. | 1052 | called from the constructor on the error path. |
1053 | </para> | 1053 | </para> |
1054 | 1054 | ||
1055 | <para> | 1055 | <para> |
1056 | For a device which allows hotplugging, you can use | 1056 | For a device which allows hotplugging, you can use |
1057 | <function>snd_card_free_when_closed</function>. This one will | 1057 | <function>snd_card_free_when_closed</function>. This one will |
1058 | postpone the destruction until all devices are closed. | 1058 | postpone the destruction until all devices are closed. |
1059 | </para> | 1059 | </para> |
1060 | 1060 | ||
1061 | </section> | 1061 | </section> |
1062 | 1062 | ||
1063 | </chapter> | 1063 | </chapter> |
1064 | 1064 | ||
1065 | 1065 | ||
1066 | <!-- ****************************************************** --> | 1066 | <!-- ****************************************************** --> |
1067 | <!-- PCI Resource Managements --> | 1067 | <!-- PCI Resource Managements --> |
1068 | <!-- ****************************************************** --> | 1068 | <!-- ****************************************************** --> |
1069 | <chapter id="pci-resource"> | 1069 | <chapter id="pci-resource"> |
1070 | <title>PCI Resource Managements</title> | 1070 | <title>PCI Resource Managements</title> |
1071 | 1071 | ||
1072 | <section id="pci-resource-example"> | 1072 | <section id="pci-resource-example"> |
1073 | <title>Full Code Example</title> | 1073 | <title>Full Code Example</title> |
1074 | <para> | 1074 | <para> |
1075 | In this section, we'll finish the chip-specific constructor, | 1075 | In this section, we'll finish the chip-specific constructor, |
1076 | destructor and PCI entries. The example code is shown first, | 1076 | destructor and PCI entries. The example code is shown first, |
1077 | below. | 1077 | below. |
1078 | 1078 | ||
1079 | <example> | 1079 | <example> |
1080 | <title>PCI Resource Managements Example</title> | 1080 | <title>PCI Resource Managements Example</title> |
1081 | <programlisting> | 1081 | <programlisting> |
1082 | <![CDATA[ | 1082 | <![CDATA[ |
1083 | struct mychip { | 1083 | struct mychip { |
1084 | struct snd_card *card; | 1084 | struct snd_card *card; |
1085 | struct pci_dev *pci; | 1085 | struct pci_dev *pci; |
1086 | 1086 | ||
1087 | unsigned long port; | 1087 | unsigned long port; |
1088 | int irq; | 1088 | int irq; |
1089 | }; | 1089 | }; |
1090 | 1090 | ||
1091 | static int snd_mychip_free(struct mychip *chip) | 1091 | static int snd_mychip_free(struct mychip *chip) |
1092 | { | 1092 | { |
1093 | /* disable hardware here if any */ | 1093 | /* disable hardware here if any */ |
1094 | .... // (not implemented in this document) | 1094 | .... // (not implemented in this document) |
1095 | 1095 | ||
1096 | /* release the irq */ | 1096 | /* release the irq */ |
1097 | if (chip->irq >= 0) | 1097 | if (chip->irq >= 0) |
1098 | free_irq(chip->irq, (void *)chip); | 1098 | free_irq(chip->irq, (void *)chip); |
1099 | /* release the i/o ports & memory */ | 1099 | /* release the i/o ports & memory */ |
1100 | pci_release_regions(chip->pci); | 1100 | pci_release_regions(chip->pci); |
1101 | /* disable the PCI entry */ | 1101 | /* disable the PCI entry */ |
1102 | pci_disable_device(chip->pci); | 1102 | pci_disable_device(chip->pci); |
1103 | /* release the data */ | 1103 | /* release the data */ |
1104 | kfree(chip); | 1104 | kfree(chip); |
1105 | return 0; | 1105 | return 0; |
1106 | } | 1106 | } |
1107 | 1107 | ||
1108 | /* chip-specific constructor */ | 1108 | /* chip-specific constructor */ |
1109 | static int __devinit snd_mychip_create(struct snd_card *card, | 1109 | static int __devinit snd_mychip_create(struct snd_card *card, |
1110 | struct pci_dev *pci, | 1110 | struct pci_dev *pci, |
1111 | struct mychip **rchip) | 1111 | struct mychip **rchip) |
1112 | { | 1112 | { |
1113 | struct mychip *chip; | 1113 | struct mychip *chip; |
1114 | int err; | 1114 | int err; |
1115 | static struct snd_device_ops ops = { | 1115 | static struct snd_device_ops ops = { |
1116 | .dev_free = snd_mychip_dev_free, | 1116 | .dev_free = snd_mychip_dev_free, |
1117 | }; | 1117 | }; |
1118 | 1118 | ||
1119 | *rchip = NULL; | 1119 | *rchip = NULL; |
1120 | 1120 | ||
1121 | /* initialize the PCI entry */ | 1121 | /* initialize the PCI entry */ |
1122 | if ((err = pci_enable_device(pci)) < 0) | 1122 | if ((err = pci_enable_device(pci)) < 0) |
1123 | return err; | 1123 | return err; |
1124 | /* check PCI availability (28bit DMA) */ | 1124 | /* check PCI availability (28bit DMA) */ |
1125 | if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || | 1125 | if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || |
1126 | pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { | 1126 | pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { |
1127 | printk(KERN_ERR "error to set 28bit mask DMA\n"); | 1127 | printk(KERN_ERR "error to set 28bit mask DMA\n"); |
1128 | pci_disable_device(pci); | 1128 | pci_disable_device(pci); |
1129 | return -ENXIO; | 1129 | return -ENXIO; |
1130 | } | 1130 | } |
1131 | 1131 | ||
1132 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); | 1132 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); |
1133 | if (chip == NULL) { | 1133 | if (chip == NULL) { |
1134 | pci_disable_device(pci); | 1134 | pci_disable_device(pci); |
1135 | return -ENOMEM; | 1135 | return -ENOMEM; |
1136 | } | 1136 | } |
1137 | 1137 | ||
1138 | /* initialize the stuff */ | 1138 | /* initialize the stuff */ |
1139 | chip->card = card; | 1139 | chip->card = card; |
1140 | chip->pci = pci; | 1140 | chip->pci = pci; |
1141 | chip->irq = -1; | 1141 | chip->irq = -1; |
1142 | 1142 | ||
1143 | /* (1) PCI resource allocation */ | 1143 | /* (1) PCI resource allocation */ |
1144 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { | 1144 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { |
1145 | kfree(chip); | 1145 | kfree(chip); |
1146 | pci_disable_device(pci); | 1146 | pci_disable_device(pci); |
1147 | return err; | 1147 | return err; |
1148 | } | 1148 | } |
1149 | chip->port = pci_resource_start(pci, 0); | 1149 | chip->port = pci_resource_start(pci, 0); |
1150 | if (request_irq(pci->irq, snd_mychip_interrupt, | 1150 | if (request_irq(pci->irq, snd_mychip_interrupt, |
1151 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { | 1151 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { |
1152 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); | 1152 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); |
1153 | snd_mychip_free(chip); | 1153 | snd_mychip_free(chip); |
1154 | return -EBUSY; | 1154 | return -EBUSY; |
1155 | } | 1155 | } |
1156 | chip->irq = pci->irq; | 1156 | chip->irq = pci->irq; |
1157 | 1157 | ||
1158 | /* (2) initialization of the chip hardware */ | 1158 | /* (2) initialization of the chip hardware */ |
1159 | .... // (not implemented in this document) | 1159 | .... // (not implemented in this document) |
1160 | 1160 | ||
1161 | if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, | 1161 | if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, |
1162 | chip, &ops)) < 0) { | 1162 | chip, &ops)) < 0) { |
1163 | snd_mychip_free(chip); | 1163 | snd_mychip_free(chip); |
1164 | return err; | 1164 | return err; |
1165 | } | 1165 | } |
1166 | 1166 | ||
1167 | snd_card_set_dev(card, &pci->dev); | 1167 | snd_card_set_dev(card, &pci->dev); |
1168 | 1168 | ||
1169 | *rchip = chip; | 1169 | *rchip = chip; |
1170 | return 0; | 1170 | return 0; |
1171 | } | 1171 | } |
1172 | 1172 | ||
1173 | /* PCI IDs */ | 1173 | /* PCI IDs */ |
1174 | static struct pci_device_id snd_mychip_ids[] = { | 1174 | static struct pci_device_id snd_mychip_ids[] = { |
1175 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, | 1175 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, |
1176 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, | 1176 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, |
1177 | .... | 1177 | .... |
1178 | { 0, } | 1178 | { 0, } |
1179 | }; | 1179 | }; |
1180 | MODULE_DEVICE_TABLE(pci, snd_mychip_ids); | 1180 | MODULE_DEVICE_TABLE(pci, snd_mychip_ids); |
1181 | 1181 | ||
1182 | /* pci_driver definition */ | 1182 | /* pci_driver definition */ |
1183 | static struct pci_driver driver = { | 1183 | static struct pci_driver driver = { |
1184 | .name = "My Own Chip", | 1184 | .name = "My Own Chip", |
1185 | .id_table = snd_mychip_ids, | 1185 | .id_table = snd_mychip_ids, |
1186 | .probe = snd_mychip_probe, | 1186 | .probe = snd_mychip_probe, |
1187 | .remove = __devexit_p(snd_mychip_remove), | 1187 | .remove = __devexit_p(snd_mychip_remove), |
1188 | }; | 1188 | }; |
1189 | 1189 | ||
1190 | /* initialization of the module */ | 1190 | /* initialization of the module */ |
1191 | static int __init alsa_card_mychip_init(void) | 1191 | static int __init alsa_card_mychip_init(void) |
1192 | { | 1192 | { |
1193 | return pci_register_driver(&driver); | 1193 | return pci_register_driver(&driver); |
1194 | } | 1194 | } |
1195 | 1195 | ||
1196 | /* clean up the module */ | 1196 | /* clean up the module */ |
1197 | static void __exit alsa_card_mychip_exit(void) | 1197 | static void __exit alsa_card_mychip_exit(void) |
1198 | { | 1198 | { |
1199 | pci_unregister_driver(&driver); | 1199 | pci_unregister_driver(&driver); |
1200 | } | 1200 | } |
1201 | 1201 | ||
1202 | module_init(alsa_card_mychip_init) | 1202 | module_init(alsa_card_mychip_init) |
1203 | module_exit(alsa_card_mychip_exit) | 1203 | module_exit(alsa_card_mychip_exit) |
1204 | 1204 | ||
1205 | EXPORT_NO_SYMBOLS; /* for old kernels only */ | 1205 | EXPORT_NO_SYMBOLS; /* for old kernels only */ |
1206 | ]]> | 1206 | ]]> |
1207 | </programlisting> | 1207 | </programlisting> |
1208 | </example> | 1208 | </example> |
1209 | </para> | 1209 | </para> |
1210 | </section> | 1210 | </section> |
1211 | 1211 | ||
1212 | <section id="pci-resource-some-haftas"> | 1212 | <section id="pci-resource-some-haftas"> |
1213 | <title>Some Hafta's</title> | 1213 | <title>Some Hafta's</title> |
1214 | <para> | 1214 | <para> |
1215 | The allocation of PCI resources is done in the | 1215 | The allocation of PCI resources is done in the |
1216 | <function>probe()</function> function, and usually an extra | 1216 | <function>probe()</function> function, and usually an extra |
1217 | <function>xxx_create()</function> function is written for this | 1217 | <function>xxx_create()</function> function is written for this |
1218 | purpose. | 1218 | purpose. |
1219 | </para> | 1219 | </para> |
1220 | 1220 | ||
1221 | <para> | 1221 | <para> |
1222 | In the case of PCI devices, you first have to call the | 1222 | In the case of PCI devices, you first have to call the |
1223 | <function>pci_enable_device()</function> function before | 1223 | <function>pci_enable_device()</function> function before |
1224 | allocating resources. Also, you need to set the proper PCI DMA | 1224 | allocating resources. Also, you need to set the proper PCI DMA |
1225 | mask to limit the accessed i/o range. In some cases, you might | 1225 | mask to limit the accessed i/o range. In some cases, you might |
1226 | need to call the <function>pci_set_master()</function> function, | 1226 | need to call the <function>pci_set_master()</function> function, |
1227 | too. | 1227 | too. |
1228 | </para> | 1228 | </para> |
1229 | 1229 | ||
1230 | <para> | 1230 | <para> |
1231 | Assuming a 28bit mask, the code to be added would look like this: | 1231 | Assuming a 28bit mask, the code to be added would look like this: |
1232 | 1232 | ||
1233 | <informalexample> | 1233 | <informalexample> |
1234 | <programlisting> | 1234 | <programlisting> |
1235 | <![CDATA[ | 1235 | <![CDATA[ |
1236 | if ((err = pci_enable_device(pci)) < 0) | 1236 | if ((err = pci_enable_device(pci)) < 0) |
1237 | return err; | 1237 | return err; |
1238 | if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || | 1238 | if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || |
1239 | pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { | 1239 | pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { |
1240 | printk(KERN_ERR "error to set 28bit mask DMA\n"); | 1240 | printk(KERN_ERR "error to set 28bit mask DMA\n"); |
1241 | pci_disable_device(pci); | 1241 | pci_disable_device(pci); |
1242 | return -ENXIO; | 1242 | return -ENXIO; |
1243 | } | 1243 | } |
1244 | 1244 | ||
1245 | ]]> | 1245 | ]]> |
1246 | </programlisting> | 1246 | </programlisting> |
1247 | </informalexample> | 1247 | </informalexample> |
1248 | </para> | 1248 | </para> |
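To make the meaning of a 28bit mask concrete: it has the low 28 bits set (0x0fffffff), so the device can only address the first 256 MB of bus address space. The userspace sketch below computes such a mask and checks whether a buffer fits under it. `toy_dma_bit_mask()` and `toy_dma_addr_ok()` are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `bits` address bits set; valid for 1..64 bits. */
static uint64_t toy_dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* Nonzero if a buffer of len bytes starting at addr lies within the mask. */
static int toy_dma_addr_ok(uint64_t addr, uint64_t len, uint64_t mask)
{
	return len > 0 && addr <= mask && len - 1 <= mask - addr;
}
```

For example, a 16-byte buffer at 0x0ffffff0 ends exactly at the 28bit limit and is accepted, while the same buffer one byte longer is rejected.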
1249 | </section> | 1249 | </section> |
1250 | 1250 | ||
1251 | <section id="pci-resource-resource-allocation"> | 1251 | <section id="pci-resource-resource-allocation"> |
1252 | <title>Resource Allocation</title> | 1252 | <title>Resource Allocation</title> |
1253 | <para> | 1253 | <para> |
1254 | The allocation of I/O ports and irqs is done via standard kernel | 1254 | The allocation of I/O ports and irqs is done via standard kernel |
1255 | functions. Unlike ALSA ver.0.5.x, there are no helpers for | 1255 | functions. Unlike ALSA ver.0.5.x, there are no helpers for |
1256 | that. These resources must be released in the destructor | 1256 | that. These resources must be released in the destructor |
1257 | function (see below). Also, on ALSA 0.9.x, you don't need to | 1257 | function (see below). Also, on ALSA 0.9.x, you don't need to |
1258 | allocate (pseudo-)DMA for PCI as you did on ALSA 0.5.x. | 1258 | allocate (pseudo-)DMA for PCI as you did on ALSA 0.5.x. |
1259 | </para> | 1259 | </para> |
1260 | 1260 | ||
1261 | <para> | 1261 | <para> |
1262 | Now assume that this PCI device has an I/O port with 8 bytes | 1262 | Now assume that this PCI device has an I/O port with 8 bytes |
1263 | and an interrupt. Then struct <structname>mychip</structname> will have the | 1263 | and an interrupt. Then struct <structname>mychip</structname> will have the |
1264 | following fields: | 1264 | following fields: |
1265 | 1265 | ||
1266 | <informalexample> | 1266 | <informalexample> |
1267 | <programlisting> | 1267 | <programlisting> |
1268 | <![CDATA[ | 1268 | <![CDATA[ |
1269 | struct mychip { | 1269 | struct mychip { |
1270 | struct snd_card *card; | 1270 | struct snd_card *card; |
1271 | 1271 | ||
1272 | unsigned long port; | 1272 | unsigned long port; |
1273 | int irq; | 1273 | int irq; |
1274 | }; | 1274 | }; |
1275 | ]]> | 1275 | ]]> |
1276 | </programlisting> | 1276 | </programlisting> |
1277 | </informalexample> | 1277 | </informalexample> |
1278 | </para> | 1278 | </para> |
1279 | 1279 | ||
1280 | <para> | 1280 | <para> |
1281 | For an i/o port (and also a memory region), you need to have | 1281 | For an i/o port (and also a memory region), you need to have |
1282 | the resource pointer for the standard resource management. For | 1282 | the resource pointer for the standard resource management. For |
1283 | an irq, you have to keep only the irq number (integer). But you | 1283 | an irq, you have to keep only the irq number (integer). But you |
1284 | need to initialize this number as -1 before actual allocation, | 1284 | need to initialize this number as -1 before actual allocation, |
1285 | since irq 0 is valid. The port address and its resource pointer | 1285 | since irq 0 is valid. The port address and its resource pointer |
1286 | are initialized to null automatically by | 1286 | are initialized to null automatically by |
1287 | <function>kzalloc()</function>, so you | 1287 | <function>kzalloc()</function>, so you |
1288 | don't have to take care of resetting them. | 1288 | don't have to take care of resetting them. |
1289 | </para> | 1289 | </para> |
1290 | 1290 | ||
1291 | <para> | 1291 | <para> |
1292 | The allocation of an i/o port is done like this: | 1292 | The allocation of an i/o port is done like this: |
1293 | 1293 | ||
1294 | <informalexample> | 1294 | <informalexample> |
1295 | <programlisting> | 1295 | <programlisting> |
1296 | <![CDATA[ | 1296 | <![CDATA[ |
1297 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { | 1297 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { |
1298 | kfree(chip); | 1298 | kfree(chip); |
1299 | pci_disable_device(pci); | 1299 | pci_disable_device(pci); |
1300 | return err; | 1300 | return err; |
1301 | } | 1301 | } |
1302 | chip->port = pci_resource_start(pci, 0); | 1302 | chip->port = pci_resource_start(pci, 0); |
1303 | ]]> | 1303 | ]]> |
1304 | </programlisting> | 1304 | </programlisting> |
1305 | </informalexample> | 1305 | </informalexample> |
1306 | </para> | 1306 | </para> |
1307 | 1307 | ||
1308 | <para> | 1308 | <para> |
1309 | <!-- obsolete --> | 1309 | <!-- obsolete --> |
1310 | It will reserve the i/o port region of 8 bytes of the given | 1310 | It will reserve the i/o port region of 8 bytes of the given |
1311 | PCI device. The returned value, chip->res_port, is allocated | 1311 | PCI device. The returned value, chip->res_port, is allocated |
1312 | via <function>kmalloc()</function> by | 1312 | via <function>kmalloc()</function> by |
1313 | <function>request_region()</function>. The pointer must be | 1313 | <function>request_region()</function>. The pointer must be |
1314 | released via <function>kfree()</function>, but there is some | 1314 | released via <function>kfree()</function>, but there is some |
1315 | problem regarding this. This issue will be explained more below. | 1315 | problem regarding this. This issue will be explained more below. |
1316 | </para> | 1316 | </para> |
1317 | 1317 | ||
1318 | <para> | 1318 | <para> |
1319 | The allocation of an interrupt source is done like this: | 1319 | The allocation of an interrupt source is done like this: |
1320 | 1320 | ||
1321 | <informalexample> | 1321 | <informalexample> |
1322 | <programlisting> | 1322 | <programlisting> |
1323 | <![CDATA[ | 1323 | <![CDATA[ |
1324 | if (request_irq(pci->irq, snd_mychip_interrupt, | 1324 | if (request_irq(pci->irq, snd_mychip_interrupt, |
1325 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { | 1325 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { |
1326 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); | 1326 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); |
1327 | snd_mychip_free(chip); | 1327 | snd_mychip_free(chip); |
1328 | return -EBUSY; | 1328 | return -EBUSY; |
1329 | } | 1329 | } |
1330 | chip->irq = pci->irq; | 1330 | chip->irq = pci->irq; |
1331 | ]]> | 1331 | ]]> |
1332 | </programlisting> | 1332 | </programlisting> |
1333 | </informalexample> | 1333 | </informalexample> |
1334 | 1334 | ||
1335 | where <function>snd_mychip_interrupt()</function> is the | 1335 | where <function>snd_mychip_interrupt()</function> is the |
1336 | interrupt handler defined <link | 1336 | interrupt handler defined <link |
1337 | linkend="pcm-interface-interrupt-handler"><citetitle>later</citetitle></link>. | 1337 | linkend="pcm-interface-interrupt-handler"><citetitle>later</citetitle></link>. |
1338 | Note that chip->irq should be set | 1338 | Note that chip->irq should be set |
1339 | only after <function>request_irq()</function> has succeeded. | 1339 | only after <function>request_irq()</function> has succeeded. |
1340 | </para> | 1340 | </para> |
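The reason for setting chip->irq only after a successful request can be shown with a small userspace model. In this sketch `toy_request_irq()` and `toy_free_irq()` are hypothetical stand-ins for the kernel calls (the second argument of `toy_request_irq()` just forces a failure for demonstration), and the error codes are arbitrary. The point is the -1 sentinel, which makes the destructor safe to call on a partially constructed chip.

```c
#include <assert.h>
#include <stdlib.h>

struct mychip {
	int irq;	/* -1 while no interrupt is held */
};

static int toy_irqs_held;	/* number of interrupts currently requested */

/* Hypothetical stand-in for request_irq(); fails on demand. */
static int toy_request_irq(int irq, int should_fail)
{
	(void)irq;
	if (should_fail)
		return -1;
	toy_irqs_held++;
	return 0;
}

/* Hypothetical stand-in for free_irq(); freeing an unheld irq is a bug. */
static void toy_free_irq(int irq)
{
	(void)irq;
	assert(toy_irqs_held > 0);
	toy_irqs_held--;
}

/* Destructor: release only what was actually acquired. */
static int toy_mychip_free(struct mychip *chip)
{
	if (chip->irq >= 0)
		toy_free_irq(chip->irq);
	free(chip);
	return 0;
}

/* Constructor: on failure, the destructor cleans up the partial state. */
static int toy_mychip_create(int irq, int fail_irq, struct mychip **rchip)
{
	struct mychip *chip;

	*rchip = NULL;
	chip = calloc(1, sizeof(*chip));
	if (chip == NULL)
		return -1;
	chip->irq = -1;	/* calloc() left 0 here, which is a valid irq number */

	if (toy_request_irq(irq, fail_irq)) {
		toy_mychip_free(chip);	/* irq is still -1: nothing to release */
		return -2;
	}
	chip->irq = irq;	/* record it only after the request succeeded */

	*rchip = chip;
	return 0;
}
```

If chip->irq were left at 0 or assigned before the request, the failure path would try to free an interrupt that was never acquired.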
1341 | 1341 | ||
1342 | <para> | 1342 | <para> |
1343 | On the PCI bus, the interrupts can be shared. Thus, | 1343 | On the PCI bus, the interrupts can be shared. Thus, |
1344 | <constant>IRQF_SHARED</constant> is given as the interrupt flag of | 1344 | <constant>IRQF_SHARED</constant> is given as the interrupt flag of |
1345 | <function>request_irq()</function>. | 1345 | <function>request_irq()</function>. |
1346 | </para> | 1346 | </para> |
1347 | 1347 | ||
1348 | <para> | 1348 | <para> |
1349 | The last argument of <function>request_irq()</function> is the | 1349 | The last argument of <function>request_irq()</function> is the |
1350 | data pointer passed to the interrupt handler. Usually, the | 1350 | data pointer passed to the interrupt handler. Usually, the |
1351 | chip-specific record is used for that, but you can use what you | 1351 | chip-specific record is used for that, but you can use what you |
1352 | like, too. | 1352 | like, too. |
1353 | </para> | 1353 | </para> |
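How the data pointer lets several handlers share one interrupt line can be sketched in userspace C. Everything here is a simplified model: the `toy_` names, the two-argument handler signature and the `pending` flag are assumptions made for illustration, and real interrupt dispatch in the kernel is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

typedef enum toy_irqreturn (*toy_handler_t)(int irq, void *dev_id);

/* One registered handler on a shared line, with its private cookie. */
struct toy_irq_action {
	toy_handler_t handler;
	void *dev_id;
};

struct mychip {
	int pending;	/* models the chip's interrupt status bit */
	int handled;	/* how many interrupts this chip has serviced */
};

/* The handler recovers its chip record from the opaque data pointer. */
static enum toy_irqreturn toy_mychip_interrupt(int irq, void *dev_id)
{
	struct mychip *chip = dev_id;

	(void)irq;
	if (!chip->pending)
		return TOY_IRQ_NONE;	/* not ours: another device raised it */
	chip->pending = 0;
	chip->handled++;
	return TOY_IRQ_HANDLED;
}

/* Call every handler on the line; returns how many claimed the interrupt. */
static int toy_dispatch_irq(int irq, const struct toy_irq_action *actions,
			    size_t count)
{
	int handled = 0;
	size_t i;

	assert(actions != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (actions[i].handler(irq, actions[i].dev_id) == TOY_IRQ_HANDLED)
			handled++;
	return handled;
}
```

The same handler function can be registered once per chip instance, because each registration carries its own data pointer.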
1354 | 1354 | ||
1355 | <para> | 1355 | <para> |
1356 | The details of the interrupt handler are not defined at this | 1356 | The details of the interrupt handler are not defined at this |
1357 | point, but at least its appearance can be explained now. The | 1357 | point, but at least its appearance can be explained now. The |
1358 | interrupt handler usually looks like the following: | 1358 | interrupt handler usually looks like the following: |
1359 | 1359 | ||
1360 | <informalexample> | 1360 | <informalexample> |
1361 | <programlisting> | 1361 | <programlisting> |
1362 | <![CDATA[ | 1362 | <![CDATA[ |
1363 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, | 1363 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, |
1364 | struct pt_regs *regs) | 1364 | struct pt_regs *regs) |
1365 | { | 1365 | { |
1366 | struct mychip *chip = dev_id; | 1366 | struct mychip *chip = dev_id; |
1367 | .... | 1367 | .... |
1368 | return IRQ_HANDLED; | 1368 | return IRQ_HANDLED; |
1369 | } | 1369 | } |
1370 | ]]> | 1370 | ]]> |
1371 | </programlisting> | 1371 | </programlisting> |
1372 | </informalexample> | 1372 | </informalexample> |
1373 | </para> | 1373 | </para> |
1374 | 1374 | ||
1375 | <para> | 1375 | <para> |
1376 | Now let's write the corresponding destructor for the resources | 1376 | Now let's write the corresponding destructor for the resources |
1377 | above. The role of the destructor is simple: disable the hardware | 1377 | above. The role of the destructor is simple: disable the hardware |
1378 | (if already activated) and release the resources. So far, we | 1378 | (if already activated) and release the resources. So far, we |
1379 | have no hardware part, so the disabling is not written here. | 1379 | have no hardware part, so the disabling is not written here. |
1380 | </para> | 1380 | </para> |
1381 | 1381 | ||
1382 | <para> | 1382 | <para> |
1383 | For releasing the resources, the <quote>check-and-release</quote> | 1383 | For releasing the resources, the <quote>check-and-release</quote> |
1384 | method is the safer way. For the interrupt, do it like this: | 1384 | method is the safer way. For the interrupt, do it like this: |
1385 | 1385 | ||
1386 | <informalexample> | 1386 | <informalexample> |
1387 | <programlisting> | 1387 | <programlisting> |
1388 | <![CDATA[ | 1388 | <![CDATA[ |
1389 | if (chip->irq >= 0) | 1389 | if (chip->irq >= 0) |
1390 | free_irq(chip->irq, (void *)chip); | 1390 | free_irq(chip->irq, (void *)chip); |
1391 | ]]> | 1391 | ]]> |
1392 | </programlisting> | 1392 | </programlisting> |
1393 | </informalexample> | 1393 | </informalexample> |
1394 | 1394 | ||
1395 | Since the irq number can start from 0, you should initialize | 1395 | Since the irq number can start from 0, you should initialize |
1396 | chip->irq with a negative value (e.g. -1), so that you can | 1396 | chip->irq with a negative value (e.g. -1), so that you can |
1397 | check the validity of the irq number as above. | 1397 | check the validity of the irq number as above. |
1398 | </para> | 1398 | </para> |
1399 | 1399 | ||
1400 | <para> | 1400 | <para> |
1401 | When you have requested I/O ports or memory regions via | 1401 | When you have requested I/O ports or memory regions via |
1402 | <function>pci_request_region()</function> or | 1402 | <function>pci_request_region()</function> or |
1403 | <function>pci_request_regions()</function> as in this example, | 1403 | <function>pci_request_regions()</function> as in this example, |
1404 | release the resource(s) using the corresponding function, | 1404 | release the resource(s) using the corresponding function, |
1405 | <function>pci_release_region()</function> or | 1405 | <function>pci_release_region()</function> or |
1406 | <function>pci_release_regions()</function>. | 1406 | <function>pci_release_regions()</function>. |
1407 | 1407 | ||
1408 | <informalexample> | 1408 | <informalexample> |
1409 | <programlisting> | 1409 | <programlisting> |
1410 | <![CDATA[ | 1410 | <![CDATA[ |
1411 | pci_release_regions(chip->pci); | 1411 | pci_release_regions(chip->pci); |
1412 | ]]> | 1412 | ]]> |
1413 | </programlisting> | 1413 | </programlisting> |
1414 | </informalexample> | 1414 | </informalexample> |
1415 | </para> | 1415 | </para> |
1416 | 1416 | ||
1417 | <para> | 1417 | <para> |
1418 | When you have requested manually via <function>request_region()</function> | 1418 | When you have requested manually via <function>request_region()</function> |
1419 | or <function>request_mem_region()</function>, you can release it via | 1419 | or <function>request_mem_region()</function>, you can release it via |
1420 | <function>release_resource()</function> or, as here, the ALSA helper <function>release_and_free_resource()</function>, which also frees the resource record. Supposing that you keep | 1420 | <function>release_resource()</function> or, as here, the ALSA helper <function>release_and_free_resource()</function>, which also frees the resource record. Supposing that you keep |
1421 | the resource pointer returned from <function>request_region()</function> | 1421 | the resource pointer returned from <function>request_region()</function> |
1422 | in chip->res_port, the release procedure looks like this: | 1422 | in chip->res_port, the release procedure looks like this: |
1423 | 1423 | ||
1424 | <informalexample> | 1424 | <informalexample> |
1425 | <programlisting> | 1425 | <programlisting> |
1426 | <![CDATA[ | 1426 | <![CDATA[ |
1427 | release_and_free_resource(chip->res_port); | 1427 | release_and_free_resource(chip->res_port); |
1428 | ]]> | 1428 | ]]> |
1429 | </programlisting> | 1429 | </programlisting> |
1430 | </informalexample> | 1430 | </informalexample> |
1431 | </para> | 1431 | </para> |
1432 | 1432 | ||
1433 | <para> | 1433 | <para> |
1434 | Don't forget to call <function>pci_disable_device()</function> | 1434 | Don't forget to call <function>pci_disable_device()</function> |
1435 | before everything is finished. | 1435 | before everything is finished. |
1436 | </para> | 1436 | </para> |
1437 | 1437 | ||
1438 | <para> | 1438 | <para> |
1439 | And finally, release the chip-specific record. | 1439 | And finally, release the chip-specific record. |
1440 | 1440 | ||
1441 | <informalexample> | 1441 | <informalexample> |
1442 | <programlisting> | 1442 | <programlisting> |
1443 | <![CDATA[ | 1443 | <![CDATA[ |
1444 | kfree(chip); | 1444 | kfree(chip); |
1445 | ]]> | 1445 | ]]> |
1446 | </programlisting> | 1446 | </programlisting> |
1447 | </informalexample> | 1447 | </informalexample> |
1448 | </para> | 1448 | </para> |
1449 | 1449 | ||
1450 | <para> | 1450 | <para> |
1451 | Again, remember that you cannot | 1451 | Again, remember that you cannot |
1452 | use the <parameter>__devexit</parameter> prefix for this destructor. | 1452 | use the <parameter>__devexit</parameter> prefix for this destructor. |
1453 | </para> | 1453 | </para> |
1454 | 1454 | ||
1455 | <para> | 1455 | <para> |
1456 | We didn't implement the hardware-disabling part in the above. | 1456 | We didn't implement the hardware-disabling part in the above. |
1457 | If you need to do this, please note that the destructor may be | 1457 | If you need to do this, please note that the destructor may be |
1458 | called even before the initialization of the chip is completed. | 1458 | called even before the initialization of the chip is completed. |
1459 | It would be better to have a flag to skip the hardware-disabling | 1459 | It would be better to have a flag to skip the hardware-disabling |
1460 | if the hardware was not initialized yet. | 1460 | if the hardware was not initialized yet. |
1461 | </para> | 1461 | </para> |
1462 | 1462 | ||
1463 | <para> | 1463 | <para> |
1464 | When the chip-data is assigned to the card using | 1464 | When the chip-data is assigned to the card using |
1465 | <function>snd_device_new()</function> with | 1465 | <function>snd_device_new()</function> with |
1466 | <constant>SNDRV_DEV_LOWLEVEL</constant>, its destructor is | 1466 | <constant>SNDRV_DEV_LOWLEVEL</constant>, its destructor is |
1467 | called last. That is, it is assured that all other | 1467 | called last. That is, it is assured that all other |
1468 | components like PCMs and controls have already been released. | 1468 | components like PCMs and controls have already been released. |
1469 | You don't have to stop PCMs, etc. explicitly, but just | 1469 | You don't have to stop PCMs, etc. explicitly, but just |
1470 | stop the hardware at the low level. | 1470 | stop the hardware at the low level. |
1471 | </para> | 1471 | </para> |
1472 | 1472 | ||
1473 | <para> | 1473 | <para> |
1474 | The management of a memory-mapped region is almost the same as | 1474 | The management of a memory-mapped region is almost the same as |
1475 | the management of an I/O port. You'll need two fields like | 1475 | the management of an I/O port. You'll need two fields like |
1476 | the following: | 1476 | the following: |
1477 | 1477 | ||
1478 | <informalexample> | 1478 | <informalexample> |
1479 | <programlisting> | 1479 | <programlisting> |
1480 | <![CDATA[ | 1480 | <![CDATA[ |
1481 | struct mychip { | 1481 | struct mychip { |
1482 | .... | 1482 | .... |
1483 | unsigned long iobase_phys; | 1483 | unsigned long iobase_phys; |
1484 | void __iomem *iobase_virt; | 1484 | void __iomem *iobase_virt; |
1485 | }; | 1485 | }; |
1486 | ]]> | 1486 | ]]> |
1487 | </programlisting> | 1487 | </programlisting> |
1488 | </informalexample> | 1488 | </informalexample> |
1489 | 1489 | ||
1490 | and the allocation would be like below: | 1490 | and the allocation would be like below: |
1491 | 1491 | ||
1492 | <informalexample> | 1492 | <informalexample> |
1493 | <programlisting> | 1493 | <programlisting> |
1494 | <![CDATA[ | 1494 | <![CDATA[ |
1495 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { | 1495 | if ((err = pci_request_regions(pci, "My Chip")) < 0) { |
1496 | kfree(chip); | 1496 | kfree(chip); |
1497 | return err; | 1497 | return err; |
1498 | } | 1498 | } |
1499 | chip->iobase_phys = pci_resource_start(pci, 0); | 1499 | chip->iobase_phys = pci_resource_start(pci, 0); |
1500 | chip->iobase_virt = ioremap_nocache(chip->iobase_phys, | 1500 | chip->iobase_virt = ioremap_nocache(chip->iobase_phys, |
1501 | pci_resource_len(pci, 0)); | 1501 | pci_resource_len(pci, 0)); |
1502 | ]]> | 1502 | ]]> |
1503 | </programlisting> | 1503 | </programlisting> |
1504 | </informalexample> | 1504 | </informalexample> |
1505 | 1505 | ||
1506 | and the corresponding destructor would be: | 1506 | and the corresponding destructor would be: |
1507 | 1507 | ||
1508 | <informalexample> | 1508 | <informalexample> |
1509 | <programlisting> | 1509 | <programlisting> |
1510 | <![CDATA[ | 1510 | <![CDATA[ |
1511 | static int snd_mychip_free(struct mychip *chip) | 1511 | static int snd_mychip_free(struct mychip *chip) |
1512 | { | 1512 | { |
1513 | .... | 1513 | .... |
1514 | if (chip->iobase_virt) | 1514 | if (chip->iobase_virt) |
1515 | iounmap(chip->iobase_virt); | 1515 | iounmap(chip->iobase_virt); |
1516 | .... | 1516 | .... |
1517 | pci_release_regions(chip->pci); | 1517 | pci_release_regions(chip->pci); |
1518 | .... | 1518 | .... |
1519 | } | 1519 | } |
1520 | ]]> | 1520 | ]]> |
1521 | </programlisting> | 1521 | </programlisting> |
1522 | </informalexample> | 1522 | </informalexample> |
1523 | </para> | 1523 | </para> |
1524 | 1524 | ||
1525 | </section> | 1525 | </section> |
1526 | 1526 | ||
1527 | <section id="pci-resource-device-struct"> | 1527 | <section id="pci-resource-device-struct"> |
1528 | <title>Registration of Device Struct</title> | 1528 | <title>Registration of Device Struct</title> |
1529 | <para> | 1529 | <para> |
1530 | At some point, typically after calling <function>snd_device_new()</function>, | 1530 | At some point, typically after calling <function>snd_device_new()</function>, |
1531 | you need to register the struct <structname>device</structname> of the chip | 1531 | you need to register the struct <structname>device</structname> of the chip |
1532 | you're handling for udev and co. ALSA provides a macro for compatibility with | 1532 | you're handling for udev and co. ALSA provides a macro for compatibility with |
1533 | older kernels. Simply call it as follows: | 1533 | older kernels. Simply call it as follows: |
1534 | <informalexample> | 1534 | <informalexample> |
1535 | <programlisting> | 1535 | <programlisting> |
1536 | <![CDATA[ | 1536 | <![CDATA[ |
1537 | snd_card_set_dev(card, &pci->dev); | 1537 | snd_card_set_dev(card, &pci->dev); |
1538 | ]]> | 1538 | ]]> |
1539 | </programlisting> | 1539 | </programlisting> |
1540 | </informalexample> | 1540 | </informalexample> |
1541 | so that it stores the PCI device pointer in the card. This will be | 1541 | so that it stores the PCI device pointer in the card. This will be |
1542 | referred to by ALSA core functions later when the devices are registered. | 1542 | referred to by ALSA core functions later when the devices are registered. |
1543 | </para> | 1543 | </para> |
1544 | <para> | 1544 | <para> |
1545 | In the non-PCI case, pass the proper device struct pointer of the bus | 1545 | In the non-PCI case, pass the proper device struct pointer of the bus |
1546 | instead. (In the case of legacy ISA without PnP, you don't have to do | 1546 | instead. (In the case of legacy ISA without PnP, you don't have to do |
1547 | anything.) | 1547 | anything.) |
1548 | </para> | 1548 | </para> |
1549 | </section> | 1549 | </section> |
1550 | 1550 | ||
1551 | <section id="pci-resource-entries"> | 1551 | <section id="pci-resource-entries"> |
1552 | <title>PCI Entries</title> | 1552 | <title>PCI Entries</title> |
1553 | <para> | 1553 | <para> |
1554 | So far, so good. Let's finish the remaining PCI | 1554 | So far, so good. Let's finish the remaining PCI |
1555 | pieces. First, we need a | 1555 | pieces. First, we need a |
1556 | <structname>pci_device_id</structname> table for this | 1556 | <structname>pci_device_id</structname> table for this |
1557 | chipset. It's a table of PCI vendor/device ID numbers and some | 1557 | chipset. It's a table of PCI vendor/device ID numbers and some |
1558 | masks. | 1558 | masks. |
1559 | </para> | 1559 | </para> |
1560 | 1560 | ||
1561 | <para> | 1561 | <para> |
1562 | For example, | 1562 | For example, |
1563 | 1563 | ||
1564 | <informalexample> | 1564 | <informalexample> |
1565 | <programlisting> | 1565 | <programlisting> |
1566 | <![CDATA[ | 1566 | <![CDATA[ |
1567 | static struct pci_device_id snd_mychip_ids[] = { | 1567 | static struct pci_device_id snd_mychip_ids[] = { |
1568 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, | 1568 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, |
1569 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, | 1569 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, |
1570 | .... | 1570 | .... |
1571 | { 0, } | 1571 | { 0, } |
1572 | }; | 1572 | }; |
1573 | MODULE_DEVICE_TABLE(pci, snd_mychip_ids); | 1573 | MODULE_DEVICE_TABLE(pci, snd_mychip_ids); |
1574 | ]]> | 1574 | ]]> |
1575 | </programlisting> | 1575 | </programlisting> |
1576 | </informalexample> | 1576 | </informalexample> |
1577 | </para> | 1577 | </para> |
1578 | 1578 | ||
1579 | <para> | 1579 | <para> |
1580 | The first and second fields of the | 1580 | The first and second fields of the |
1581 | <structname>pci_device_id</structname> struct are the vendor and | 1581 | <structname>pci_device_id</structname> struct are the vendor and |
1582 | device IDs. If you have no special need to filter the matching | 1582 | device IDs. If you have no special need to filter the matching |
1583 | devices, you can fill the remaining fields as above. The last | 1583 | devices, you can fill the remaining fields as above. The last |
1584 | field of the <structname>pci_device_id</structname> struct is | 1584 | field of the <structname>pci_device_id</structname> struct is |
1585 | private data for this entry. You can specify any value here, for | 1585 | private data for this entry. You can specify any value here, for |
1586 | example, to distinguish the operations needed for each | 1586 | example, to distinguish the operations needed for each |
1587 | device ID. Such an example is found in the intel8x0 driver. | 1587 | device ID. Such an example is found in the intel8x0 driver. |
1588 | </para> | 1588 | </para> |
1589 | 1589 | ||
1590 | <para> | 1590 | <para> |
1591 | The last entry of this list is the terminator. You must | 1591 | The last entry of this list is the terminator. You must |
1592 | specify this all-zero entry. | 1592 | specify this all-zero entry. |
1593 | </para> | 1593 | </para> |
1594 | 1594 | ||
1595 | <para> | 1595 | <para> |
1596 | Then, prepare the <structname>pci_driver</structname> record: | 1596 | Then, prepare the <structname>pci_driver</structname> record: |
1597 | 1597 | ||
1598 | <informalexample> | 1598 | <informalexample> |
1599 | <programlisting> | 1599 | <programlisting> |
1600 | <![CDATA[ | 1600 | <![CDATA[ |
1601 | static struct pci_driver driver = { | 1601 | static struct pci_driver driver = { |
1602 | .name = "My Own Chip", | 1602 | .name = "My Own Chip", |
1603 | .id_table = snd_mychip_ids, | 1603 | .id_table = snd_mychip_ids, |
1604 | .probe = snd_mychip_probe, | 1604 | .probe = snd_mychip_probe, |
1605 | .remove = __devexit_p(snd_mychip_remove), | 1605 | .remove = __devexit_p(snd_mychip_remove), |
1606 | }; | 1606 | }; |
1607 | ]]> | 1607 | ]]> |
1608 | </programlisting> | 1608 | </programlisting> |
1609 | </informalexample> | 1609 | </informalexample> |
1610 | </para> | 1610 | </para> |
1611 | 1611 | ||
1612 | <para> | 1612 | <para> |
1613 | The <structfield>probe</structfield> and | 1613 | The <structfield>probe</structfield> and |
1614 | <structfield>remove</structfield> functions are what we already | 1614 | <structfield>remove</structfield> functions are what we already |
1615 | defined in | 1615 | defined in |
1616 | the previous sections. The <structfield>remove</structfield> should | 1616 | the previous sections. The <structfield>remove</structfield> should |
1617 | be defined with the | 1617 | be defined with the |
1618 | <function>__devexit_p()</function> macro, so that it's not | 1618 | <function>__devexit_p()</function> macro, so that it's not |
1619 | defined for the built-in (and non-hot-pluggable) case. The | 1619 | defined for the built-in (and non-hot-pluggable) case. The |
1620 | <structfield>name</structfield> | 1620 | <structfield>name</structfield> |
1621 | field is the name string of this device. Note that you must not | 1621 | field is the name string of this device. Note that you must not |
1622 | use a slash <quote>/</quote> in this string. | 1622 | use a slash <quote>/</quote> in this string. |
1623 | </para> | 1623 | </para> |
1624 | 1624 | ||
1625 | <para> | 1625 | <para> |
1626 | And finally, the module entries: | 1626 | And finally, the module entries: |
1627 | 1627 | ||
1628 | <informalexample> | 1628 | <informalexample> |
1629 | <programlisting> | 1629 | <programlisting> |
1630 | <![CDATA[ | 1630 | <![CDATA[ |
1631 | static int __init alsa_card_mychip_init(void) | 1631 | static int __init alsa_card_mychip_init(void) |
1632 | { | 1632 | { |
1633 | return pci_register_driver(&driver); | 1633 | return pci_register_driver(&driver); |
1634 | } | 1634 | } |
1635 | 1635 | ||
1636 | static void __exit alsa_card_mychip_exit(void) | 1636 | static void __exit alsa_card_mychip_exit(void) |
1637 | { | 1637 | { |
1638 | pci_unregister_driver(&driver); | 1638 | pci_unregister_driver(&driver); |
1639 | } | 1639 | } |
1640 | 1640 | ||
1641 | module_init(alsa_card_mychip_init) | 1641 | module_init(alsa_card_mychip_init) |
1642 | module_exit(alsa_card_mychip_exit) | 1642 | module_exit(alsa_card_mychip_exit) |
1643 | ]]> | 1643 | ]]> |
1644 | </programlisting> | 1644 | </programlisting> |
1645 | </informalexample> | 1645 | </informalexample> |
1646 | </para> | 1646 | </para> |
1647 | 1647 | ||
1648 | <para> | 1648 | <para> |
1649 | Note that these module entries are tagged with | 1649 | Note that these module entries are tagged with |
1650 | <parameter>__init</parameter> and | 1650 | <parameter>__init</parameter> and |
1651 | <parameter>__exit</parameter> prefixes, not with | 1651 | <parameter>__exit</parameter> prefixes, not with |
1652 | <parameter>__devinit</parameter> or | 1652 | <parameter>__devinit</parameter> or |
1653 | <parameter>__devexit</parameter>. | 1653 | <parameter>__devexit</parameter>. |
1654 | </para> | 1654 | </para> |
1655 | 1655 | ||
1656 | <para> | 1656 | <para> |
1657 | Oh, one thing was forgotten. If you have no exported symbols, | 1657 | Oh, one thing was forgotten. If you have no exported symbols, |
1658 | you need to declare it on 2.2 or 2.4 kernels (on 2.6 kernels | 1658 | you need to declare it on 2.2 or 2.4 kernels (on 2.6 kernels |
1659 | it's not necessary, though). | 1659 | it's not necessary, though). |
1660 | 1660 | ||
1661 | <informalexample> | 1661 | <informalexample> |
1662 | <programlisting> | 1662 | <programlisting> |
1663 | <![CDATA[ | 1663 | <![CDATA[ |
1664 | EXPORT_NO_SYMBOLS; | 1664 | EXPORT_NO_SYMBOLS; |
1665 | ]]> | 1665 | ]]> |
1666 | </programlisting> | 1666 | </programlisting> |
1667 | </informalexample> | 1667 | </informalexample> |
1668 | 1668 | ||
1669 | That's all! | 1669 | That's all! |
1670 | </para> | 1670 | </para> |
1671 | </section> | 1671 | </section> |
1672 | </chapter> | 1672 | </chapter> |
1673 | 1673 | ||
1674 | 1674 | ||
1675 | <!-- ****************************************************** --> | 1675 | <!-- ****************************************************** --> |
1676 | <!-- PCM Interface --> | 1676 | <!-- PCM Interface --> |
1677 | <!-- ****************************************************** --> | 1677 | <!-- ****************************************************** --> |
1678 | <chapter id="pcm-interface"> | 1678 | <chapter id="pcm-interface"> |
1679 | <title>PCM Interface</title> | 1679 | <title>PCM Interface</title> |
1680 | 1680 | ||
1681 | <section id="pcm-interface-general"> | 1681 | <section id="pcm-interface-general"> |
1682 | <title>General</title> | 1682 | <title>General</title> |
1683 | <para> | 1683 | <para> |
1684 | The PCM middle layer of ALSA is quite powerful and it is only | 1684 | The PCM middle layer of ALSA is quite powerful and it is only |
1685 | necessary for each driver to implement the low-level functions | 1685 | necessary for each driver to implement the low-level functions |
1686 | to access its hardware. | 1686 | to access its hardware. |
1687 | </para> | 1687 | </para> |
1688 | 1688 | ||
1689 | <para> | 1689 | <para> |
1690 | To access the PCM layer, you need to include | 1690 | To access the PCM layer, you need to include |
1691 | <filename>&lt;sound/pcm.h&gt;</filename> first. In addition, | 1691 | <filename>&lt;sound/pcm.h&gt;</filename> first. In addition, |
1692 | <filename>&lt;sound/pcm_params.h&gt;</filename> might be needed | 1692 | <filename>&lt;sound/pcm_params.h&gt;</filename> might be needed |
1693 | if you access functions related to hw_param. | 1693 | if you access functions related to hw_param. |
1694 | </para> | 1694 | </para> |
1695 | 1695 | ||
1696 | <para> | 1696 | <para> |
1697 | Each card device can have up to four pcm instances. A pcm | 1697 | Each card device can have up to four pcm instances. A pcm |
1698 | instance corresponds to a pcm device file. The limit on the | 1698 | instance corresponds to a pcm device file. The limit on the |
1699 | number of instances comes only from the available bit size of | 1699 | number of instances comes only from the available bit size of |
1700 | Linux's device numbers. Once 64-bit device numbers are | 1700 | Linux's device numbers. Once 64-bit device numbers are |
1701 | used, we'll have more available pcm instances. | 1701 | used, we'll have more available pcm instances. |
1702 | </para> | 1702 | </para> |
1703 | 1703 | ||
1704 | <para> | 1704 | <para> |
1705 | A pcm instance consists of pcm playback and capture streams, | 1705 | A pcm instance consists of pcm playback and capture streams, |
1706 | and each pcm stream consists of one or more pcm substreams. Some | 1706 | and each pcm stream consists of one or more pcm substreams. Some |
1707 | sound cards support the multiple-playback function. For example, | 1707 | sound cards support the multiple-playback function. For example, |
1708 | emu10k1 has a PCM playback of 32 stereo substreams. In this case, at | 1708 | emu10k1 has a PCM playback of 32 stereo substreams. In this case, at |
1709 | each open, a free substream is (usually) automatically chosen | 1709 | each open, a free substream is (usually) automatically chosen |
1710 | and opened. Meanwhile, when only one substream exists and it is | 1710 | and opened. Meanwhile, when only one substream exists and it is |
1711 | already open, a subsequent open will either block | 1711 | already open, a subsequent open will either block |
1712 | or fail with <constant>EAGAIN</constant>, according to the | 1712 | or fail with <constant>EAGAIN</constant>, according to the |
1713 | file open mode. But you don't have to know the details in your | 1713 | file open mode. But you don't have to know the details in your |
1714 | driver. The PCM middle layer takes care of all such jobs. | 1714 | driver. The PCM middle layer takes care of all such jobs. |
1715 | </para> | 1715 | </para> |
1716 | </section> | 1716 | </section> |
1717 | 1717 | ||
1718 | <section id="pcm-interface-example"> | 1718 | <section id="pcm-interface-example"> |
1719 | <title>Full Code Example</title> | 1719 | <title>Full Code Example</title> |
1720 | <para> | 1720 | <para> |
1721 | The example code below does not include any hardware access | 1721 | The example code below does not include any hardware access |
1722 | routines but shows only the skeleton of how to build up the PCM | 1722 | routines but shows only the skeleton of how to build up the PCM |
1723 | interfaces. | 1723 | interfaces. |
1724 | 1724 | ||
1725 | <example> | 1725 | <example> |
1726 | <title>PCM Example Code</title> | 1726 | <title>PCM Example Code</title> |
1727 | <programlisting> | 1727 | <programlisting> |
1728 | <![CDATA[ | 1728 | <![CDATA[ |
1729 | #include <sound/pcm.h> | 1729 | #include <sound/pcm.h> |
1730 | .... | 1730 | .... |
1731 | 1731 | ||
1732 | /* hardware definition */ | 1732 | /* hardware definition */ |
1733 | static struct snd_pcm_hardware snd_mychip_playback_hw = { | 1733 | static struct snd_pcm_hardware snd_mychip_playback_hw = { |
1734 | .info = (SNDRV_PCM_INFO_MMAP | | 1734 | .info = (SNDRV_PCM_INFO_MMAP | |
1735 | SNDRV_PCM_INFO_INTERLEAVED | | 1735 | SNDRV_PCM_INFO_INTERLEAVED | |
1736 | SNDRV_PCM_INFO_BLOCK_TRANSFER | | 1736 | SNDRV_PCM_INFO_BLOCK_TRANSFER | |
1737 | SNDRV_PCM_INFO_MMAP_VALID), | 1737 | SNDRV_PCM_INFO_MMAP_VALID), |
1738 | .formats = SNDRV_PCM_FMTBIT_S16_LE, | 1738 | .formats = SNDRV_PCM_FMTBIT_S16_LE, |
1739 | .rates = SNDRV_PCM_RATE_8000_48000, | 1739 | .rates = SNDRV_PCM_RATE_8000_48000, |
1740 | .rate_min = 8000, | 1740 | .rate_min = 8000, |
1741 | .rate_max = 48000, | 1741 | .rate_max = 48000, |
1742 | .channels_min = 2, | 1742 | .channels_min = 2, |
1743 | .channels_max = 2, | 1743 | .channels_max = 2, |
1744 | .buffer_bytes_max = 32768, | 1744 | .buffer_bytes_max = 32768, |
1745 | .period_bytes_min = 4096, | 1745 | .period_bytes_min = 4096, |
1746 | .period_bytes_max = 32768, | 1746 | .period_bytes_max = 32768, |
1747 | .periods_min = 1, | 1747 | .periods_min = 1, |
1748 | .periods_max = 1024, | 1748 | .periods_max = 1024, |
1749 | }; | 1749 | }; |
1750 | 1750 | ||
1751 | /* hardware definition */ | 1751 | /* hardware definition */ |
1752 | static struct snd_pcm_hardware snd_mychip_capture_hw = { | 1752 | static struct snd_pcm_hardware snd_mychip_capture_hw = { |
1753 | .info = (SNDRV_PCM_INFO_MMAP | | 1753 | .info = (SNDRV_PCM_INFO_MMAP | |
1754 | SNDRV_PCM_INFO_INTERLEAVED | | 1754 | SNDRV_PCM_INFO_INTERLEAVED | |
1755 | SNDRV_PCM_INFO_BLOCK_TRANSFER | | 1755 | SNDRV_PCM_INFO_BLOCK_TRANSFER | |
1756 | SNDRV_PCM_INFO_MMAP_VALID), | 1756 | SNDRV_PCM_INFO_MMAP_VALID), |
1757 | .formats = SNDRV_PCM_FMTBIT_S16_LE, | 1757 | .formats = SNDRV_PCM_FMTBIT_S16_LE, |
1758 | .rates = SNDRV_PCM_RATE_8000_48000, | 1758 | .rates = SNDRV_PCM_RATE_8000_48000, |
1759 | .rate_min = 8000, | 1759 | .rate_min = 8000, |
1760 | .rate_max = 48000, | 1760 | .rate_max = 48000, |
1761 | .channels_min = 2, | 1761 | .channels_min = 2, |
1762 | .channels_max = 2, | 1762 | .channels_max = 2, |
1763 | .buffer_bytes_max = 32768, | 1763 | .buffer_bytes_max = 32768, |
1764 | .period_bytes_min = 4096, | 1764 | .period_bytes_min = 4096, |
1765 | .period_bytes_max = 32768, | 1765 | .period_bytes_max = 32768, |
1766 | .periods_min = 1, | 1766 | .periods_min = 1, |
1767 | .periods_max = 1024, | 1767 | .periods_max = 1024, |
1768 | }; | 1768 | }; |
1769 | 1769 | ||
1770 | /* open callback */ | 1770 | /* open callback */ |
1771 | static int snd_mychip_playback_open(struct snd_pcm_substream *substream) | 1771 | static int snd_mychip_playback_open(struct snd_pcm_substream *substream) |
1772 | { | 1772 | { |
1773 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1773 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1774 | struct snd_pcm_runtime *runtime = substream->runtime; | 1774 | struct snd_pcm_runtime *runtime = substream->runtime; |
1775 | 1775 | ||
1776 | runtime->hw = snd_mychip_playback_hw; | 1776 | runtime->hw = snd_mychip_playback_hw; |
1777 | // more hardware-initialization will be done here | 1777 | // more hardware-initialization will be done here |
1778 | return 0; | 1778 | return 0; |
1779 | } | 1779 | } |
1780 | 1780 | ||
1781 | /* close callback */ | 1781 | /* close callback */ |
1782 | static int snd_mychip_playback_close(struct snd_pcm_substream *substream) | 1782 | static int snd_mychip_playback_close(struct snd_pcm_substream *substream) |
1783 | { | 1783 | { |
1784 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1784 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1785 | // the hardware-specific codes will be here | 1785 | // the hardware-specific codes will be here |
1786 | return 0; | 1786 | return 0; |
1787 | 1787 | ||
1788 | } | 1788 | } |
1789 | 1789 | ||
1790 | /* open callback */ | 1790 | /* open callback */ |
1791 | static int snd_mychip_capture_open(struct snd_pcm_substream *substream) | 1791 | static int snd_mychip_capture_open(struct snd_pcm_substream *substream) |
1792 | { | 1792 | { |
1793 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1793 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1794 | struct snd_pcm_runtime *runtime = substream->runtime; | 1794 | struct snd_pcm_runtime *runtime = substream->runtime; |
1795 | 1795 | ||
1796 | runtime->hw = snd_mychip_capture_hw; | 1796 | runtime->hw = snd_mychip_capture_hw; |
1797 | // more hardware-initialization will be done here | 1797 | // more hardware-initialization will be done here |
1798 | return 0; | 1798 | return 0; |
1799 | } | 1799 | } |
1800 | 1800 | ||
1801 | /* close callback */ | 1801 | /* close callback */ |
1802 | static int snd_mychip_capture_close(struct snd_pcm_substream *substream) | 1802 | static int snd_mychip_capture_close(struct snd_pcm_substream *substream) |
1803 | { | 1803 | { |
1804 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1804 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1805 | // the hardware-specific codes will be here | 1805 | // the hardware-specific codes will be here |
1806 | return 0; | 1806 | return 0; |
1807 | 1807 | ||
1808 | } | 1808 | } |
1809 | 1809 | ||
1810 | /* hw_params callback */ | 1810 | /* hw_params callback */ |
1811 | static int snd_mychip_pcm_hw_params(struct snd_pcm_substream *substream, | 1811 | static int snd_mychip_pcm_hw_params(struct snd_pcm_substream *substream, |
1812 | struct snd_pcm_hw_params *hw_params) | 1812 | struct snd_pcm_hw_params *hw_params) |
1813 | { | 1813 | { |
1814 | return snd_pcm_lib_malloc_pages(substream, | 1814 | return snd_pcm_lib_malloc_pages(substream, |
1815 | params_buffer_bytes(hw_params)); | 1815 | params_buffer_bytes(hw_params)); |
1816 | } | 1816 | } |
1817 | 1817 | ||
1818 | /* hw_free callback */ | 1818 | /* hw_free callback */ |
1819 | static int snd_mychip_pcm_hw_free(struct snd_pcm_substream *substream) | 1819 | static int snd_mychip_pcm_hw_free(struct snd_pcm_substream *substream) |
1820 | { | 1820 | { |
1821 | return snd_pcm_lib_free_pages(substream); | 1821 | return snd_pcm_lib_free_pages(substream); |
1822 | } | 1822 | } |
1823 | 1823 | ||
1824 | /* prepare callback */ | 1824 | /* prepare callback */ |
1825 | static int snd_mychip_pcm_prepare(struct snd_pcm_substream *substream) | 1825 | static int snd_mychip_pcm_prepare(struct snd_pcm_substream *substream) |
1826 | { | 1826 | { |
1827 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1827 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1828 | struct snd_pcm_runtime *runtime = substream->runtime; | 1828 | struct snd_pcm_runtime *runtime = substream->runtime; |
1829 | 1829 | ||
1830 | /* set up the hardware with the current configuration | 1830 | /* set up the hardware with the current configuration |
1831 | * for example... | 1831 | * for example... |
1832 | */ | 1832 | */ |
1833 | mychip_set_sample_format(chip, runtime->format); | 1833 | mychip_set_sample_format(chip, runtime->format); |
1834 | mychip_set_sample_rate(chip, runtime->rate); | 1834 | mychip_set_sample_rate(chip, runtime->rate); |
1835 | mychip_set_channels(chip, runtime->channels); | 1835 | mychip_set_channels(chip, runtime->channels); |
1836 | mychip_set_dma_setup(chip, runtime->dma_addr, | 1836 | mychip_set_dma_setup(chip, runtime->dma_addr, |
1837 | chip->buffer_size, | 1837 | chip->buffer_size, |
1838 | chip->period_size); | 1838 | chip->period_size); |
1839 | return 0; | 1839 | return 0; |
1840 | } | 1840 | } |
1841 | 1841 | ||
1842 | /* trigger callback */ | 1842 | /* trigger callback */ |
1843 | static int snd_mychip_pcm_trigger(struct snd_pcm_substream *substream, | 1843 | static int snd_mychip_pcm_trigger(struct snd_pcm_substream *substream, |
1844 | int cmd) | 1844 | int cmd) |
1845 | { | 1845 | { |
1846 | switch (cmd) { | 1846 | switch (cmd) { |
1847 | case SNDRV_PCM_TRIGGER_START: | 1847 | case SNDRV_PCM_TRIGGER_START: |
1848 | // do something to start the PCM engine | 1848 | // do something to start the PCM engine |
1849 | return 0; | 1849 | return 0; |
1850 | case SNDRV_PCM_TRIGGER_STOP: | 1850 | case SNDRV_PCM_TRIGGER_STOP: |
1851 | // do something to stop the PCM engine | 1851 | // do something to stop the PCM engine |
1852 | return 0; | 1852 | return 0; |
1853 | default: | 1853 | default: |
1854 | return -EINVAL; | 1854 | return -EINVAL; |
1855 | } | 1855 | } |
1856 | } | 1856 | } |
1857 | 1857 | ||
1858 | /* pointer callback */ | 1858 | /* pointer callback */ |
1859 | static snd_pcm_uframes_t | 1859 | static snd_pcm_uframes_t |
1860 | snd_mychip_pcm_pointer(struct snd_pcm_substream *substream) | 1860 | snd_mychip_pcm_pointer(struct snd_pcm_substream *substream) |
1861 | { | 1861 | { |
1862 | struct mychip *chip = snd_pcm_substream_chip(substream); | 1862 | struct mychip *chip = snd_pcm_substream_chip(substream); |
1863 | unsigned int current_ptr; | 1863 | unsigned int current_ptr; |
1864 | 1864 | ||
1865 | /* get the current hardware pointer */ | 1865 | /* get the current hardware pointer */ |
1866 | current_ptr = mychip_get_hw_pointer(chip); | 1866 | current_ptr = mychip_get_hw_pointer(chip); |
1867 | return current_ptr; | 1867 | return current_ptr; |
1868 | } | 1868 | } |
1869 | 1869 | ||
1870 | /* operators */ | 1870 | /* operators */ |
1871 | static struct snd_pcm_ops snd_mychip_playback_ops = { | 1871 | static struct snd_pcm_ops snd_mychip_playback_ops = { |
1872 | .open = snd_mychip_playback_open, | 1872 | .open = snd_mychip_playback_open, |
1873 | .close = snd_mychip_playback_close, | 1873 | .close = snd_mychip_playback_close, |
1874 | .ioctl = snd_pcm_lib_ioctl, | 1874 | .ioctl = snd_pcm_lib_ioctl, |
1875 | .hw_params = snd_mychip_pcm_hw_params, | 1875 | .hw_params = snd_mychip_pcm_hw_params, |
1876 | .hw_free = snd_mychip_pcm_hw_free, | 1876 | .hw_free = snd_mychip_pcm_hw_free, |
1877 | .prepare = snd_mychip_pcm_prepare, | 1877 | .prepare = snd_mychip_pcm_prepare, |
1878 | .trigger = snd_mychip_pcm_trigger, | 1878 | .trigger = snd_mychip_pcm_trigger, |
1879 | .pointer = snd_mychip_pcm_pointer, | 1879 | .pointer = snd_mychip_pcm_pointer, |
1880 | }; | 1880 | }; |
1881 | 1881 | ||
1882 | /* operators */ | 1882 | /* operators */ |
1883 | static struct snd_pcm_ops snd_mychip_capture_ops = { | 1883 | static struct snd_pcm_ops snd_mychip_capture_ops = { |
1884 | .open = snd_mychip_capture_open, | 1884 | .open = snd_mychip_capture_open, |
1885 | .close = snd_mychip_capture_close, | 1885 | .close = snd_mychip_capture_close, |
1886 | .ioctl = snd_pcm_lib_ioctl, | 1886 | .ioctl = snd_pcm_lib_ioctl, |
1887 | .hw_params = snd_mychip_pcm_hw_params, | 1887 | .hw_params = snd_mychip_pcm_hw_params, |
1888 | .hw_free = snd_mychip_pcm_hw_free, | 1888 | .hw_free = snd_mychip_pcm_hw_free, |
1889 | .prepare = snd_mychip_pcm_prepare, | 1889 | .prepare = snd_mychip_pcm_prepare, |
1890 | .trigger = snd_mychip_pcm_trigger, | 1890 | .trigger = snd_mychip_pcm_trigger, |
1891 | .pointer = snd_mychip_pcm_pointer, | 1891 | .pointer = snd_mychip_pcm_pointer, |
1892 | }; | 1892 | }; |
1893 | 1893 | ||
1894 | /* | 1894 | /* |
1895 | * definitions of capture are omitted here... | 1895 | * definitions of capture are omitted here... |
1896 | */ | 1896 | */ |
1897 | 1897 | ||
1898 | /* create a pcm device */ | 1898 | /* create a pcm device */ |
1899 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) | 1899 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) |
1900 | { | 1900 | { |
1901 | struct snd_pcm *pcm; | 1901 | struct snd_pcm *pcm; |
1902 | int err; | 1902 | int err; |
1903 | 1903 | ||
1904 | if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, | 1904 | if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, |
1905 | &pcm)) < 0) | 1905 | &pcm)) < 0) |
1906 | return err; | 1906 | return err; |
1907 | pcm->private_data = chip; | 1907 | pcm->private_data = chip; |
1908 | strcpy(pcm->name, "My Chip"); | 1908 | strcpy(pcm->name, "My Chip"); |
1909 | chip->pcm = pcm; | 1909 | chip->pcm = pcm; |
1910 | /* set operators */ | 1910 | /* set operators */ |
1911 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, | 1911 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, |
1912 | &snd_mychip_playback_ops); | 1912 | &snd_mychip_playback_ops); |
1913 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, | 1913 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, |
1914 | &snd_mychip_capture_ops); | 1914 | &snd_mychip_capture_ops); |
1915 | /* pre-allocation of buffers */ | 1915 | /* pre-allocation of buffers */ |
1916 | /* NOTE: this may fail */ | 1916 | /* NOTE: this may fail */ |
1917 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, | 1917 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, |
1918 | snd_dma_pci_data(chip->pci), | 1918 | snd_dma_pci_data(chip->pci), |
1919 | 64*1024, 64*1024); | 1919 | 64*1024, 64*1024); |
1920 | return 0; | 1920 | return 0; |
1921 | } | 1921 | } |
1922 | ]]> | 1922 | ]]> |
1923 | </programlisting> | 1923 | </programlisting> |
1924 | </example> | 1924 | </example> |
1925 | </para> | 1925 | </para> |
1926 | </section> | 1926 | </section> |
1927 | 1927 | ||
1928 | <section id="pcm-interface-constructor"> | 1928 | <section id="pcm-interface-constructor"> |
1929 | <title>Constructor</title> | 1929 | <title>Constructor</title> |
1930 | <para> | 1930 | <para> |
1931 | A pcm instance is allocated by the <function>snd_pcm_new()</function> | 1931 | A pcm instance is allocated by the <function>snd_pcm_new()</function> |
1932 | function. It is better to wrap this call in a constructor for the pcm, | 1932 | function. It is better to wrap this call in a constructor for the pcm, |
1933 | for example: | 1933 | for example: |
1934 | 1934 | ||
1935 | <informalexample> | 1935 | <informalexample> |
1936 | <programlisting> | 1936 | <programlisting> |
1937 | <![CDATA[ | 1937 | <![CDATA[ |
1938 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) | 1938 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) |
1939 | { | 1939 | { |
1940 | struct snd_pcm *pcm; | 1940 | struct snd_pcm *pcm; |
1941 | int err; | 1941 | int err; |
1942 | 1942 | ||
1943 | if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, | 1943 | if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, |
1944 | &pcm)) < 0) | 1944 | &pcm)) < 0) |
1945 | return err; | 1945 | return err; |
1946 | pcm->private_data = chip; | 1946 | pcm->private_data = chip; |
1947 | strcpy(pcm->name, "My Chip"); | 1947 | strcpy(pcm->name, "My Chip"); |
1948 | chip->pcm = pcm; | 1948 | chip->pcm = pcm; |
1949 | .... | 1949 | .... |
1950 | return 0; | 1950 | return 0; |
1951 | } | 1951 | } |
1952 | ]]> | 1952 | ]]> |
1953 | </programlisting> | 1953 | </programlisting> |
1954 | </informalexample> | 1954 | </informalexample> |
1955 | </para> | 1955 | </para> |
1956 | 1956 | ||
1957 | <para> | 1957 | <para> |
1958 | The <function>snd_pcm_new()</function> function takes six | 1958 | The <function>snd_pcm_new()</function> function takes six |
1959 | arguments. The first argument is the card pointer to which this | 1959 | arguments. The first argument is the card pointer to which this |
1960 | pcm is assigned, and the second is the ID string. | 1960 | pcm is assigned, and the second is the ID string. |
1961 | </para> | 1961 | </para> |
1962 | 1962 | ||
1963 | <para> | 1963 | <para> |
1964 | The third argument (<parameter>index</parameter>, 0 in the | 1964 | The third argument (<parameter>index</parameter>, 0 in the |
1965 | above) is the index of this new pcm. It begins from zero. If | 1965 | above) is the index of this new pcm. It begins from zero. If |
1966 | you create more than one pcm instance, specify a | 1966 | you create more than one pcm instance, specify a |
1967 | different number for each in this argument. For example, | 1967 | different number for each in this argument. For example, |
1968 | <parameter>index</parameter> = 1 for the second PCM device. | 1968 | <parameter>index</parameter> = 1 for the second PCM device. |
1969 | </para> | 1969 | </para> |
1970 | 1970 | ||
1971 | <para> | 1971 | <para> |
1972 | The fourth and fifth arguments are the number of substreams | 1972 | The fourth and fifth arguments are the number of substreams |
1973 | for playback and capture, respectively. In the example above, 1 is | 1973 | for playback and capture, respectively. In the example above, 1 is |
1974 | given for both. When no playback or no capture is available, | 1974 | given for both. When no playback or no capture is available, |
1975 | pass 0 to the corresponding argument. | 1975 | pass 0 to the corresponding argument. |
1976 | </para> | 1976 | </para> |
1977 | 1977 | ||
1978 | <para> | 1978 | <para> |
1979 | If a chip supports multiple playbacks or captures, you can | 1979 | If a chip supports multiple playbacks or captures, you can |
1980 | specify larger numbers, but the substreams must then be handled properly in | 1980 | specify larger numbers, but the substreams must then be handled properly in |
1981 | the open, close, and other callbacks. When you need to know which | 1981 | the open, close, and other callbacks. When you need to know which |
1982 | substream a callback refers to, its index can be obtained from the | 1982 | substream a callback refers to, its index can be obtained from the |
1983 | struct <structname>snd_pcm_substream</structname> data passed to each callback | 1983 | struct <structname>snd_pcm_substream</structname> data passed to each callback |
1984 | as follows: | 1984 | as follows: |
1985 | 1985 | ||
1986 | <informalexample> | 1986 | <informalexample> |
1987 | <programlisting> | 1987 | <programlisting> |
1988 | <![CDATA[ | 1988 | <![CDATA[ |
1989 | struct snd_pcm_substream *substream; | 1989 | struct snd_pcm_substream *substream; |
1990 | int index = substream->number; | 1990 | int index = substream->number; |
1991 | ]]> | 1991 | ]]> |
1992 | </programlisting> | 1992 | </programlisting> |
1993 | </informalexample> | 1993 | </informalexample> |
1994 | </para> | 1994 | </para> |
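The lookup above can be tried out in userspace. The following is a minimal sketch, not driver code: the structs are cut-down mocks of the kernel types, and `MYCHIP_MAX_SUBSTREAMS`, `struct mychip_voice`, `mychip_voice_of()` and `mychip_open_sketch()` are hypothetical names introduced only for illustration. It shows one way a driver might use `substream->number` to pick per-substream state when more than one substream was requested in `snd_pcm_new()`.

```c
#include <assert.h>
#include <stddef.h>

#define MYCHIP_MAX_SUBSTREAMS 4	/* hypothetical limit for this sketch */

/* Mock of the kernel struct: only the fields used here. */
struct snd_pcm_substream {
	int number;		/* index of this substream within its stream */
	void *private_data;	/* chip pointer, set by the driver */
};

/* Hypothetical per-substream state kept by the driver. */
struct mychip_voice {
	int in_use;
};

struct mychip {
	struct mychip_voice voices[MYCHIP_MAX_SUBSTREAMS];
};

/* Return the driver state that belongs to the given substream. */
static struct mychip_voice *mychip_voice_of(struct mychip *chip,
					    struct snd_pcm_substream *substream)
{
	int index = substream->number;

	assert(index >= 0 && index < MYCHIP_MAX_SUBSTREAMS);
	return &chip->voices[index];
}

/* What an open callback might do: mark the matching voice as busy. */
static int mychip_open_sketch(struct snd_pcm_substream *substream)
{
	struct mychip *chip = substream->private_data;

	mychip_voice_of(chip, substream)->in_use = 1;
	return 0;
}
```

In a real driver the chip pointer would come from `snd_pcm_substream_chip(substream)` rather than from a mock field, and the close callback would clear the same slot.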
1995 | 1995 | ||
1996 | <para> | 1996 | <para> |
1997 | After the pcm is created, you need to set operators for each | 1997 | After the pcm is created, you need to set operators for each |
1998 | pcm stream. | 1998 | pcm stream. |
1999 | 1999 | ||
2000 | <informalexample> | 2000 | <informalexample> |
2001 | <programlisting> | 2001 | <programlisting> |
2002 | <![CDATA[ | 2002 | <![CDATA[ |
2003 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, | 2003 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, |
2004 | &snd_mychip_playback_ops); | 2004 | &snd_mychip_playback_ops); |
2005 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, | 2005 | snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, |
2006 | &snd_mychip_capture_ops); | 2006 | &snd_mychip_capture_ops); |
2007 | ]]> | 2007 | ]]> |
2008 | </programlisting> | 2008 | </programlisting> |
2009 | </informalexample> | 2009 | </informalexample> |
2010 | </para> | 2010 | </para> |
2011 | 2011 | ||
2012 | <para> | 2012 | <para> |
2013 | The operators are typically defined like this: | 2013 | The operators are typically defined like this: |
2014 | 2014 | ||
2015 | <informalexample> | 2015 | <informalexample> |
2016 | <programlisting> | 2016 | <programlisting> |
2017 | <![CDATA[ | 2017 | <![CDATA[ |
2018 | static struct snd_pcm_ops snd_mychip_playback_ops = { | 2018 | static struct snd_pcm_ops snd_mychip_playback_ops = { |
2019 | .open = snd_mychip_pcm_open, | 2019 | .open = snd_mychip_pcm_open, |
2020 | .close = snd_mychip_pcm_close, | 2020 | .close = snd_mychip_pcm_close, |
2021 | .ioctl = snd_pcm_lib_ioctl, | 2021 | .ioctl = snd_pcm_lib_ioctl, |
2022 | .hw_params = snd_mychip_pcm_hw_params, | 2022 | .hw_params = snd_mychip_pcm_hw_params, |
2023 | .hw_free = snd_mychip_pcm_hw_free, | 2023 | .hw_free = snd_mychip_pcm_hw_free, |
2024 | .prepare = snd_mychip_pcm_prepare, | 2024 | .prepare = snd_mychip_pcm_prepare, |
2025 | .trigger = snd_mychip_pcm_trigger, | 2025 | .trigger = snd_mychip_pcm_trigger, |
2026 | .pointer = snd_mychip_pcm_pointer, | 2026 | .pointer = snd_mychip_pcm_pointer, |
2027 | }; | 2027 | }; |
2028 | ]]> | 2028 | ]]> |
2029 | </programlisting> | 2029 | </programlisting> |
2030 | </informalexample> | 2030 | </informalexample> |
2031 | 2031 | ||
2032 | Each of the callbacks is explained in the subsection | 2032 | Each of the callbacks is explained in the subsection |
2033 | <link linkend="pcm-interface-operators"><citetitle> | 2033 | <link linkend="pcm-interface-operators"><citetitle> |
2034 | Operators</citetitle></link>. | 2034 | Operators</citetitle></link>. |
2035 | </para> | 2035 | </para> |
2036 | 2036 | ||
2037 | <para> | 2037 | <para> |
2038 | After setting the operators, you will most likely want to | 2038 | After setting the operators, you will most likely want to |
2039 | pre-allocate the buffer. To do so, simply call | 2039 | pre-allocate the buffer. To do so, simply call |
2040 | the following: | 2040 | the following: |
2041 | 2041 | ||
2042 | <informalexample> | 2042 | <informalexample> |
2043 | <programlisting> | 2043 | <programlisting> |
2044 | <![CDATA[ | 2044 | <![CDATA[ |
2045 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, | 2045 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, |
2046 | snd_dma_pci_data(chip->pci), | 2046 | snd_dma_pci_data(chip->pci), |
2047 | 64*1024, 64*1024); | 2047 | 64*1024, 64*1024); |
2048 | ]]> | 2048 | ]]> |
2049 | </programlisting> | 2049 | </programlisting> |
2050 | </informalexample> | 2050 | </informalexample> |
2051 | 2051 | ||
2052 | This allocates a buffer of up to 64kB by default. The details of | 2052 | This allocates a buffer of up to 64kB by default. The details of |
2053 | buffer management are described in the later section <link | 2053 | buffer management are described in the later section <link |
2054 | linkend="buffer-and-memory"><citetitle>Buffer and Memory | 2054 | linkend="buffer-and-memory"><citetitle>Buffer and Memory |
2055 | Management</citetitle></link>. | 2055 | Management</citetitle></link>. |
2056 | </para> | 2056 | </para> |
2057 | 2057 | ||
2058 | <para> | 2058 | <para> |
2059 | Additionally, you can set some extra information for this pcm | 2059 | Additionally, you can set some extra information for this pcm |
2060 | in pcm->info_flags. | 2060 | in pcm->info_flags. |
2061 | The available values are defined as | 2061 | The available values are defined as |
2062 | <constant>SNDRV_PCM_INFO_XXX</constant> in | 2062 | <constant>SNDRV_PCM_INFO_XXX</constant> in |
2063 | <filename>&lt;sound/asound.h&gt;</filename>, which are also used for | 2063 | <filename>&lt;sound/asound.h&gt;</filename>, which are also used for |
2064 | the hardware definition (described later). If your sound chip | 2064 | the hardware definition (described later). If your sound chip |
2065 | supports only half-duplex, specify it like this: | 2065 | supports only half-duplex, specify it like this: |
2066 | 2066 | ||
2067 | <informalexample> | 2067 | <informalexample> |
2068 | <programlisting> | 2068 | <programlisting> |
2069 | <![CDATA[ | 2069 | <![CDATA[ |
2070 | pcm->info_flags = SNDRV_PCM_INFO_HALF_DUPLEX; | 2070 | pcm->info_flags = SNDRV_PCM_INFO_HALF_DUPLEX; |
2071 | ]]> | 2071 | ]]> |
2072 | </programlisting> | 2072 | </programlisting> |
2073 | </informalexample> | 2073 | </informalexample> |
2074 | </para> | 2074 | </para> |
2075 | </section> | 2075 | </section> |
2076 | 2076 | ||
2077 | <section id="pcm-interface-destructor"> | 2077 | <section id="pcm-interface-destructor"> |
2078 | <title>... And the Destructor?</title> | 2078 | <title>... And the Destructor?</title> |
2079 | <para> | 2079 | <para> |
2080 | The destructor for a pcm instance is not always | 2080 | The destructor for a pcm instance is not always |
2081 | necessary. Since the pcm device will be released by the middle | 2081 | necessary. Since the pcm device will be released by the middle |
2082 | layer code automatically, you don't have to call the destructor | 2082 | layer code automatically, you don't have to call the destructor |
2083 | explicitly. | 2083 | explicitly. |
2084 | </para> | 2084 | </para> |
2085 | 2085 | ||
2086 | <para> | 2086 | <para> |
2087 | A destructor is necessary if you have created some | 2087 | A destructor is necessary if you have created some |
2088 | special records internally and need to release them. In such a | 2088 | special records internally and need to release them. In such a |
2089 | case, set the destructor function to | 2089 | case, set the destructor function to |
2090 | pcm->private_free: | 2090 | pcm->private_free: |
2091 | 2091 | ||
2092 | <example> | 2092 | <example> |
2093 | <title>PCM Instance with a Destructor</title> | 2093 | <title>PCM Instance with a Destructor</title> |
2094 | <programlisting> | 2094 | <programlisting> |
2095 | <![CDATA[ | 2095 | <![CDATA[ |
2096 | static void mychip_pcm_free(struct snd_pcm *pcm) | 2096 | static void mychip_pcm_free(struct snd_pcm *pcm) |
2097 | { | 2097 | { |
2098 | struct mychip *chip = snd_pcm_chip(pcm); | 2098 | struct mychip *chip = snd_pcm_chip(pcm); |
2099 | /* free your own data */ | 2099 | /* free your own data */ |
2100 | kfree(chip->my_private_pcm_data); | 2100 | kfree(chip->my_private_pcm_data); |
2101 | // do any other cleanup you need | 2101 | // do any other cleanup you need |
2102 | .... | 2102 | .... |
2103 | } | 2103 | } |
2104 | 2104 | ||
2105 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) | 2105 | static int __devinit snd_mychip_new_pcm(struct mychip *chip) |
2106 | { | 2106 | { |
2107 | struct snd_pcm *pcm; | 2107 | struct snd_pcm *pcm; |
2108 | .... | 2108 | .... |
2109 | /* allocate your own data */ | 2109 | /* allocate your own data */ |
2110 | chip->my_private_pcm_data = kmalloc(...); | 2110 | chip->my_private_pcm_data = kmalloc(...); |
2111 | /* set the destructor */ | 2111 | /* set the destructor */ |
2112 | pcm->private_data = chip; | 2112 | pcm->private_data = chip; |
2113 | pcm->private_free = mychip_pcm_free; | 2113 | pcm->private_free = mychip_pcm_free; |
2114 | .... | 2114 | .... |
2115 | } | 2115 | } |
2116 | ]]> | 2116 | ]]> |
2117 | </programlisting> | 2117 | </programlisting> |
2118 | </example> | 2118 | </example> |
2119 | </para> | 2119 | </para> |
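The pairing of an allocation in the constructor with a release in `private_free` can be exercised outside the kernel. The sketch below is an assumption-laden mock: `struct snd_pcm` is reduced to the two fields involved, `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`, and `mychip_new_pcm_sketch()` and `mock_pcm_release()` are hypothetical helpers, the latter imitating the middle layer calling the destructor on release. Unlike the listing above, it also checks the allocation result.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock of the kernel struct: only the two fields involved here. */
struct snd_pcm {
	void *private_data;
	void (*private_free)(struct snd_pcm *pcm);
};

struct mychip {
	void *my_private_pcm_data;
};

/* Destructor: releases what the constructor allocated. */
static void mychip_pcm_free(struct snd_pcm *pcm)
{
	struct mychip *chip = pcm->private_data;

	free(chip->my_private_pcm_data);
	chip->my_private_pcm_data = NULL;	/* avoid a dangling pointer */
}

/* Constructor side: allocate private data, then register the destructor. */
static int mychip_new_pcm_sketch(struct mychip *chip, struct snd_pcm *pcm)
{
	chip->my_private_pcm_data = malloc(64);	/* arbitrary size */
	if (!chip->my_private_pcm_data)
		return -1;	/* kernel code would return -ENOMEM */
	pcm->private_data = chip;
	pcm->private_free = mychip_pcm_free;
	return 0;
}

/* Hypothetical stand-in for the middle layer releasing the pcm. */
static void mock_pcm_release(struct snd_pcm *pcm)
{
	assert(pcm != NULL);
	if (pcm->private_free)
		pcm->private_free(pcm);
}
```

Setting `private_data` before `private_free` matters in this sketch because the destructor reaches the chip through `pcm->private_data`.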
2120 | </section> | 2120 | </section> |
2121 | 2121 | ||
2122 | <section id="pcm-interface-runtime"> | 2122 | <section id="pcm-interface-runtime"> |
2123 | <title>Runtime Pointer - The Chest of PCM Information</title> | 2123 | <title>Runtime Pointer - The Chest of PCM Information</title> |
2124 | <para> | 2124 | <para> |
2125 | When the PCM substream is opened, a PCM runtime instance is | 2125 | When the PCM substream is opened, a PCM runtime instance is |
2126 | allocated and assigned to the substream. This pointer is | 2126 | allocated and assigned to the substream. This pointer is |
2127 | accessible via <constant>substream->runtime</constant>. | 2127 | accessible via <constant>substream->runtime</constant>. |
2128 | The runtime pointer holds various information: | 2128 | The runtime pointer holds various information: |
2129 | a copy of the hw_params and sw_params configurations, the buffer | 2129 | a copy of the hw_params and sw_params configurations, the buffer |
2130 | pointers, mmap records, spinlocks, etc. Almost everything you | 2130 | pointers, mmap records, spinlocks, etc. Almost everything you |
2131 | need for controlling the PCM can be found there. | 2131 | need for controlling the PCM can be found there. |
2132 | </para> | 2132 | </para> |
2133 | 2133 | ||
2134 | <para> | 2134 | <para> |
2135 | The definition of the runtime instance is found in | 2135 | The definition of the runtime instance is found in |
2136 | <filename>&lt;sound/pcm.h&gt;</filename>. Here is a | 2136 | <filename>&lt;sound/pcm.h&gt;</filename>. Here is a |
2137 | copy from the file. | 2137 | copy from the file. |
2138 | <informalexample> | 2138 | <informalexample> |
2139 | <programlisting> | 2139 | <programlisting> |
2140 | <![CDATA[ | 2140 | <![CDATA[ |
2141 | struct _snd_pcm_runtime { | 2141 | struct _snd_pcm_runtime { |
2142 | /* -- Status -- */ | 2142 | /* -- Status -- */ |
2143 | struct snd_pcm_substream *trigger_master; | 2143 | struct snd_pcm_substream *trigger_master; |
2144 | snd_timestamp_t trigger_tstamp; /* trigger timestamp */ | 2144 | snd_timestamp_t trigger_tstamp; /* trigger timestamp */ |
2145 | int overrange; | 2145 | int overrange; |
2146 | snd_pcm_uframes_t avail_max; | 2146 | snd_pcm_uframes_t avail_max; |
2147 | snd_pcm_uframes_t hw_ptr_base; /* Position at buffer restart */ | 2147 | snd_pcm_uframes_t hw_ptr_base; /* Position at buffer restart */ |
2148 | snd_pcm_uframes_t hw_ptr_interrupt; /* Position at interrupt time*/ | 2148 | snd_pcm_uframes_t hw_ptr_interrupt; /* Position at interrupt time*/ |
2149 | 2149 | ||
2150 | /* -- HW params -- */ | 2150 | /* -- HW params -- */ |
2151 | snd_pcm_access_t access; /* access mode */ | 2151 | snd_pcm_access_t access; /* access mode */ |
2152 | snd_pcm_format_t format; /* SNDRV_PCM_FORMAT_* */ | 2152 | snd_pcm_format_t format; /* SNDRV_PCM_FORMAT_* */ |
2153 | snd_pcm_subformat_t subformat; /* subformat */ | 2153 | snd_pcm_subformat_t subformat; /* subformat */ |
2154 | unsigned int rate; /* rate in Hz */ | 2154 | unsigned int rate; /* rate in Hz */ |
2155 | unsigned int channels; /* channels */ | 2155 | unsigned int channels; /* channels */ |
2156 | snd_pcm_uframes_t period_size; /* period size */ | 2156 | snd_pcm_uframes_t period_size; /* period size */ |
2157 | unsigned int periods; /* periods */ | 2157 | unsigned int periods; /* periods */ |
2158 | snd_pcm_uframes_t buffer_size; /* buffer size */ | 2158 | snd_pcm_uframes_t buffer_size; /* buffer size */ |
2159 | unsigned int tick_time; /* tick time */ | 2159 | unsigned int tick_time; /* tick time */ |
2160 | snd_pcm_uframes_t min_align; /* Min alignment for the format */ | 2160 | snd_pcm_uframes_t min_align; /* Min alignment for the format */ |
2161 | size_t byte_align; | 2161 | size_t byte_align; |
2162 | unsigned int frame_bits; | 2162 | unsigned int frame_bits; |
2163 | unsigned int sample_bits; | 2163 | unsigned int sample_bits; |
2164 | unsigned int info; | 2164 | unsigned int info; |
2165 | unsigned int rate_num; | 2165 | unsigned int rate_num; |
2166 | unsigned int rate_den; | 2166 | unsigned int rate_den; |
2167 | 2167 | ||
2168 | /* -- SW params -- */ | 2168 | /* -- SW params -- */ |
2169 | struct timespec tstamp_mode; /* mmap timestamp is updated */ | 2169 | struct timespec tstamp_mode; /* mmap timestamp is updated */ |
2170 | unsigned int period_step; | 2170 | unsigned int period_step; |
2171 | unsigned int sleep_min; /* min ticks to sleep */ | 2171 | unsigned int sleep_min; /* min ticks to sleep */ |
2172 | snd_pcm_uframes_t xfer_align; /* xfer size need to be a multiple */ | 2172 | snd_pcm_uframes_t xfer_align; /* xfer size need to be a multiple */ |
2173 | snd_pcm_uframes_t start_threshold; | 2173 | snd_pcm_uframes_t start_threshold; |
2174 | snd_pcm_uframes_t stop_threshold; | 2174 | snd_pcm_uframes_t stop_threshold; |
2175 | snd_pcm_uframes_t silence_threshold; /* Silence filling happens when | 2175 | snd_pcm_uframes_t silence_threshold; /* Silence filling happens when |
2176 | noise is nearest than this */ | 2176 | noise is nearest than this */ |
2177 | snd_pcm_uframes_t silence_size; /* Silence filling size */ | 2177 | snd_pcm_uframes_t silence_size; /* Silence filling size */ |
2178 | snd_pcm_uframes_t boundary; /* pointers wrap point */ | 2178 | snd_pcm_uframes_t boundary; /* pointers wrap point */ |
2179 | 2179 | ||
2180 | snd_pcm_uframes_t silenced_start; | 2180 | snd_pcm_uframes_t silenced_start; |
2181 | snd_pcm_uframes_t silenced_size; | 2181 | snd_pcm_uframes_t silenced_size; |
2182 | 2182 | ||
2183 | snd_pcm_sync_id_t sync; /* hardware synchronization ID */ | 2183 | snd_pcm_sync_id_t sync; /* hardware synchronization ID */ |
2184 | 2184 | ||
2185 | /* -- mmap -- */ | 2185 | /* -- mmap -- */ |
2186 | volatile struct snd_pcm_mmap_status *status; | 2186 | volatile struct snd_pcm_mmap_status *status; |
2187 | volatile struct snd_pcm_mmap_control *control; | 2187 | volatile struct snd_pcm_mmap_control *control; |
2188 | atomic_t mmap_count; | 2188 | atomic_t mmap_count; |
2189 | 2189 | ||
2190 | /* -- locking / scheduling -- */ | 2190 | /* -- locking / scheduling -- */ |
2191 | spinlock_t lock; | 2191 | spinlock_t lock; |
2192 | wait_queue_head_t sleep; | 2192 | wait_queue_head_t sleep; |
2193 | struct timer_list tick_timer; | 2193 | struct timer_list tick_timer; |
2194 | struct fasync_struct *fasync; | 2194 | struct fasync_struct *fasync; |
2195 | 2195 | ||
2196 | /* -- private section -- */ | 2196 | /* -- private section -- */ |
2197 | void *private_data; | 2197 | void *private_data; |
2198 | void (*private_free)(struct snd_pcm_runtime *runtime); | 2198 | void (*private_free)(struct snd_pcm_runtime *runtime); |
2199 | 2199 | ||
2200 | /* -- hardware description -- */ | 2200 | /* -- hardware description -- */ |
2201 | struct snd_pcm_hardware hw; | 2201 | struct snd_pcm_hardware hw; |
2202 | struct snd_pcm_hw_constraints hw_constraints; | 2202 | struct snd_pcm_hw_constraints hw_constraints; |
2203 | 2203 | ||
2204 | /* -- interrupt callbacks -- */ | 2204 | /* -- interrupt callbacks -- */ |
2205 | void (*transfer_ack_begin)(struct snd_pcm_substream *substream); | 2205 | void (*transfer_ack_begin)(struct snd_pcm_substream *substream); |
2206 | void (*transfer_ack_end)(struct snd_pcm_substream *substream); | 2206 | void (*transfer_ack_end)(struct snd_pcm_substream *substream); |
2207 | 2207 | ||
2208 | /* -- timer -- */ | 2208 | /* -- timer -- */ |
2209 | unsigned int timer_resolution; /* timer resolution */ | 2209 | unsigned int timer_resolution; /* timer resolution */ |
2210 | 2210 | ||
2211 | /* -- DMA -- */ | 2211 | /* -- DMA -- */ |
2212 | unsigned char *dma_area; /* DMA area */ | 2212 | unsigned char *dma_area; /* DMA area */ |
2213 | dma_addr_t dma_addr; /* physical bus address (not accessible from main CPU) */ | 2213 | dma_addr_t dma_addr; /* physical bus address (not accessible from main CPU) */ |
2214 | size_t dma_bytes; /* size of DMA area */ | 2214 | size_t dma_bytes; /* size of DMA area */ |
2215 | 2215 | ||
2216 | struct snd_dma_buffer *dma_buffer_p; /* allocated buffer */ | 2216 | struct snd_dma_buffer *dma_buffer_p; /* allocated buffer */ |
2217 | 2217 | ||
2218 | #if defined(CONFIG_SND_PCM_OSS) || defined(CONFIG_SND_PCM_OSS_MODULE) | 2218 | #if defined(CONFIG_SND_PCM_OSS) || defined(CONFIG_SND_PCM_OSS_MODULE) |
2219 | /* -- OSS things -- */ | 2219 | /* -- OSS things -- */ |
2220 | struct snd_pcm_oss_runtime oss; | 2220 | struct snd_pcm_oss_runtime oss; |
2221 | #endif | 2221 | #endif |
2222 | }; | 2222 | }; |
2223 | ]]> | 2223 | ]]> |
2224 | </programlisting> | 2224 | </programlisting> |
2225 | </informalexample> | 2225 | </informalexample> |
2226 | </para> | 2226 | </para> |
2227 | 2227 | ||
2228 | <para> | 2228 | <para> |
2229 | For the operators (callbacks) of each sound driver, most of | 2229 | For the operators (callbacks) of each sound driver, most of |
2230 | these records are supposed to be read-only. Only the PCM | 2230 | these records are supposed to be read-only. Only the PCM |
2231 | middle layer changes or updates this information. The exceptions are | 2231 | middle layer changes or updates this information. The exceptions are |
2232 | the hardware description (hw), interrupt callbacks | 2232 | the hardware description (hw), interrupt callbacks |
2233 | (transfer_ack_xxx), DMA buffer information, and the private | 2233 | (transfer_ack_xxx), DMA buffer information, and the private |
2234 | data. In addition, if you use the standard buffer allocation | 2234 | data. In addition, if you use the standard buffer allocation |
2235 | method via <function>snd_pcm_lib_malloc_pages()</function>, | 2235 | method via <function>snd_pcm_lib_malloc_pages()</function>, |
2236 | you don't need to set the DMA buffer information yourself. | 2236 | you don't need to set the DMA buffer information yourself. |
2237 | </para> | 2237 | </para> |
2238 | 2238 | ||
2239 | <para> | 2239 | <para> |
2240 | The important records are explained in the sections below. | 2240 | The important records are explained in the sections below. |
2241 | </para> | 2241 | </para> |
2242 | 2242 | ||
2243 | <section id="pcm-interface-runtime-hw"> | 2243 | <section id="pcm-interface-runtime-hw"> |
2244 | <title>Hardware Description</title> | 2244 | <title>Hardware Description</title> |
2245 | <para> | 2245 | <para> |
2246 | The hardware descriptor (struct <structname>snd_pcm_hardware</structname>) | 2246 | The hardware descriptor (struct <structname>snd_pcm_hardware</structname>) |
2247 | contains the definitions of the fundamental hardware | 2247 | contains the definitions of the fundamental hardware |
2248 | configuration. Above all, you'll need to define this in | 2248 | configuration. Above all, you'll need to define this in |
2249 | <link linkend="pcm-interface-operators-open-callback"><citetitle> | 2249 | <link linkend="pcm-interface-operators-open-callback"><citetitle> |
2250 | the open callback</citetitle></link>. | 2250 | the open callback</citetitle></link>. |
2251 | Note that the runtime instance holds a copy of the | 2251 | Note that the runtime instance holds a copy of the |
2252 | descriptor, not a pointer to the existing descriptor. That | 2252 | descriptor, not a pointer to the existing descriptor. That |
2253 | is, in the open callback, you can modify the copied descriptor | 2253 | is, in the open callback, you can modify the copied descriptor |
2254 | (<constant>runtime->hw</constant>) as you need. For example, if the maximum | 2254 | (<constant>runtime->hw</constant>) as you need. For example, if the maximum |
2255 | number of channels is 1 only on some chip models, you can | 2255 | number of channels is 1 only on some chip models, you can |
2256 | still use the same hardware descriptor and change the | 2256 | still use the same hardware descriptor and change the |
2257 | channels_max later: | 2257 | channels_max later: |
2258 | <informalexample> | 2258 | <informalexample> |
2259 | <programlisting> | 2259 | <programlisting> |
2260 | <![CDATA[ | 2260 | <![CDATA[ |
2261 | struct snd_pcm_runtime *runtime = substream->runtime; | 2261 | struct snd_pcm_runtime *runtime = substream->runtime; |
2262 | ... | 2262 | ... |
2263 | runtime->hw = snd_mychip_playback_hw; /* common definition */ | 2263 | runtime->hw = snd_mychip_playback_hw; /* common definition */ |
2264 | if (chip->model == VERY_OLD_ONE) | 2264 | if (chip->model == VERY_OLD_ONE) |
2265 | runtime->hw.channels_max = 1; | 2265 | runtime->hw.channels_max = 1; |
2266 | ]]> | 2266 | ]]> |
2267 | </programlisting> | 2267 | </programlisting> |
2268 | </informalexample> | 2268 | </informalexample> |
2269 | </para> | 2269 | </para> |
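The copy semantics described above rest on plain C struct assignment. As a rough userspace illustration, with the kernel structs mocked down to the two channel fields and with `enum mychip_model` and `mychip_setup_hw()` as hypothetical names, the following shows that adjusting `runtime->hw` leaves the shared descriptor untouched:

```c
#include <assert.h>

/* Mock of the kernel struct: only the fields this sketch touches. */
struct snd_pcm_hardware {
	unsigned int channels_min;
	unsigned int channels_max;
};

struct snd_pcm_runtime {
	struct snd_pcm_hardware hw;	/* a copy, not a pointer */
};

enum mychip_model { MYCHIP_CURRENT, VERY_OLD_ONE };	/* hypothetical */

/* Shared template for all chip models; never modified at run time. */
static const struct snd_pcm_hardware snd_mychip_playback_hw = {
	.channels_min = 1,
	.channels_max = 2,
};

/* Fill runtime->hw the way an open callback would. */
static void mychip_setup_hw(struct snd_pcm_runtime *runtime,
			    enum mychip_model model)
{
	runtime->hw = snd_mychip_playback_hw;	/* struct assignment copies */
	if (model == VERY_OLD_ONE)
		runtime->hw.channels_max = 1;	/* changes only this copy */
	assert(runtime->hw.channels_min <= runtime->hw.channels_max);
}
```

Because each opened substream gets its own runtime, two substreams on different chip models can hold different limits derived from the same template.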
2270 | 2270 | ||
2271 | <para> | 2271 | <para> |
2272 | Typically, you'll have a hardware descriptor like the one below: | 2272 | Typically, you'll have a hardware descriptor like the one below: |
2273 | <informalexample> | 2273 | <informalexample> |
2274 | <programlisting> | 2274 | <programlisting> |
2275 | <![CDATA[ | 2275 | <![CDATA[ |
2276 | static struct snd_pcm_hardware snd_mychip_playback_hw = { | 2276 | static struct snd_pcm_hardware snd_mychip_playback_hw = { |
2277 | .info = (SNDRV_PCM_INFO_MMAP | | 2277 | .info = (SNDRV_PCM_INFO_MMAP | |
2278 | SNDRV_PCM_INFO_INTERLEAVED | | 2278 | SNDRV_PCM_INFO_INTERLEAVED | |
2279 | SNDRV_PCM_INFO_BLOCK_TRANSFER | | 2279 | SNDRV_PCM_INFO_BLOCK_TRANSFER | |
2280 | SNDRV_PCM_INFO_MMAP_VALID), | 2280 | SNDRV_PCM_INFO_MMAP_VALID), |
2281 | .formats = SNDRV_PCM_FMTBIT_S16_LE, | 2281 | .formats = SNDRV_PCM_FMTBIT_S16_LE, |
2282 | .rates = SNDRV_PCM_RATE_8000_48000, | 2282 | .rates = SNDRV_PCM_RATE_8000_48000, |
2283 | .rate_min = 8000, | 2283 | .rate_min = 8000, |
2284 | .rate_max = 48000, | 2284 | .rate_max = 48000, |
2285 | .channels_min = 2, | 2285 | .channels_min = 2, |
2286 | .channels_max = 2, | 2286 | .channels_max = 2, |
2287 | .buffer_bytes_max = 32768, | 2287 | .buffer_bytes_max = 32768, |
2288 | .period_bytes_min = 4096, | 2288 | .period_bytes_min = 4096, |
2289 | .period_bytes_max = 32768, | 2289 | .period_bytes_max = 32768, |
2290 | .periods_min = 1, | 2290 | .periods_min = 1, |
2291 | .periods_max = 1024, | 2291 | .periods_max = 1024, |
2292 | }; | 2292 | }; |
2293 | ]]> | 2293 | ]]> |
2294 | </programlisting> | 2294 | </programlisting> |
2295 | </informalexample> | 2295 | </informalexample> |
2296 | </para> | 2296 | </para> |
2297 | 2297 | ||
2298 | <para> | 2298 | <para> |
2299 | <itemizedlist> | 2299 | <itemizedlist> |
2300 | <listitem><para> | 2300 | <listitem><para> |
2301 | The <structfield>info</structfield> field contains the type and | 2301 | The <structfield>info</structfield> field contains the type and |
2302 | capabilities of this pcm. The bit flags are defined in | 2302 | capabilities of this pcm. The bit flags are defined in |
2303 | <filename><sound/asound.h></filename> as | 2303 | <filename><sound/asound.h></filename> as |
2304 | <constant>SNDRV_PCM_INFO_XXX</constant>. Here, at least, you | 2304 | <constant>SNDRV_PCM_INFO_XXX</constant>. Here, at least, you |
2305 | have to specify whether the mmap is supported and which | 2305 | have to specify whether the mmap is supported and which |
2306 | interleaved format is supported. | 2306 | interleaved format is supported. |
2307 | When the mmap is supported, add | 2307 | When the mmap is supported, add |
2308 | <constant>SNDRV_PCM_INFO_MMAP</constant> flag here. When the | 2308 | <constant>SNDRV_PCM_INFO_MMAP</constant> flag here. When the |
2309 | hardware supports the interleaved or the non-interleaved | 2309 | hardware supports the interleaved or the non-interleaved |
2310 | format, <constant>SNDRV_PCM_INFO_INTERLEAVED</constant> or | 2310 | format, <constant>SNDRV_PCM_INFO_INTERLEAVED</constant> or |
2311 | <constant>SNDRV_PCM_INFO_NONINTERLEAVED</constant> flag must | 2311 | <constant>SNDRV_PCM_INFO_NONINTERLEAVED</constant> flag must |
2312 | be set, respectively. If both are supported, you can set both, | 2312 | be set, respectively. If both are supported, you can set both, |
2313 | too. | 2313 | too. |
2314 | </para> | 2314 | </para> |
2315 | 2315 | ||
2316 | <para> | 2316 | <para> |
2317 | In the above example, <constant>MMAP_VALID</constant> and | 2317 | In the above example, <constant>MMAP_VALID</constant> and |
2318 | <constant>BLOCK_TRANSFER</constant> are specified for OSS mmap | 2318 | <constant>BLOCK_TRANSFER</constant> are specified for OSS mmap |
2319 | mode. Usually both are set. Of course, | 2319 | mode. Usually both are set. Of course, |
2320 | <constant>MMAP_VALID</constant> is set only if the mmap is | 2320 | <constant>MMAP_VALID</constant> is set only if the mmap is |
2321 | really supported. | 2321 | really supported. |
2322 | </para> | 2322 | </para> |
2323 | 2323 | ||
2324 | <para> | 2324 | <para> |
2325 | The other possible flags are | 2325 | The other possible flags are |
2326 | <constant>SNDRV_PCM_INFO_PAUSE</constant> and | 2326 | <constant>SNDRV_PCM_INFO_PAUSE</constant> and |
2327 | <constant>SNDRV_PCM_INFO_RESUME</constant>. The | 2327 | <constant>SNDRV_PCM_INFO_RESUME</constant>. The |
2328 | <constant>PAUSE</constant> bit means that the pcm supports the | 2328 | <constant>PAUSE</constant> bit means that the pcm supports the |
2329 | <quote>pause</quote> operation, while the | 2329 | <quote>pause</quote> operation, while the |
2330 | <constant>RESUME</constant> bit means that the pcm supports | 2330 | <constant>RESUME</constant> bit means that the pcm supports |
2331 | the full <quote>suspend/resume</quote> operation. | 2331 | the full <quote>suspend/resume</quote> operation. |
2332 | If the <constant>PAUSE</constant> flag is set, | 2332 | If the <constant>PAUSE</constant> flag is set, |
2333 | the <structfield>trigger</structfield> callback below | 2333 | the <structfield>trigger</structfield> callback below |
2334 | must handle the corresponding (pause push/release) commands. | 2334 | must handle the corresponding (pause push/release) commands. |
2335 | The suspend/resume trigger commands can be defined even without the | 2335 | The suspend/resume trigger commands can be defined even without the |
2336 | <constant>RESUME</constant> flag. See the <link | 2336 | <constant>RESUME</constant> flag. See the <link |
2337 | linkend="power-management"><citetitle> | 2337 | linkend="power-management"><citetitle> |
2338 | Power Management</citetitle></link> section for details. | 2338 | Power Management</citetitle></link> section for details. |
2339 | </para> | 2339 | </para> |
2340 | 2340 | ||
2341 | <para> | 2341 | <para> |
2342 | When the PCM substreams can be synchronized (typically, | 2342 | When the PCM substreams can be synchronized (typically, |
2343 | synchronized start/stop of playback and capture streams), | 2343 | synchronized start/stop of playback and capture streams), |
2344 | you can give <constant>SNDRV_PCM_INFO_SYNC_START</constant>, | 2344 | you can give <constant>SNDRV_PCM_INFO_SYNC_START</constant>, |
2345 | too. In this case, you'll need to check the linked list of | 2345 | too. In this case, you'll need to check the linked list of |
2346 | PCM substreams in the trigger callback. This will be | 2346 | PCM substreams in the trigger callback. This will be |
2347 | described in a later section. | 2347 | described in a later section. |
2348 | </para> | 2348 | </para> |
2349 | </listitem> | 2349 | </listitem> |
2350 | 2350 | ||
2351 | <listitem> | 2351 | <listitem> |
2352 | <para> | 2352 | <para> |
2353 | The <structfield>formats</structfield> field contains the bit-flags | 2353 | The <structfield>formats</structfield> field contains the bit-flags |
2354 | of supported formats (<constant>SNDRV_PCM_FMTBIT_XXX</constant>). | 2354 | of supported formats (<constant>SNDRV_PCM_FMTBIT_XXX</constant>). |
2355 | If the hardware supports more than one format, give all of the | 2355 | If the hardware supports more than one format, give all of the |
2356 | bits OR'ed together. In the example above, the signed 16-bit little-endian | 2356 | bits OR'ed together. In the example above, the signed 16-bit little-endian |
2357 | format is specified. | 2357 | format is specified. |
2358 | </para> | 2358 | </para> |
2359 | </listitem> | 2359 | </listitem> |
2360 | 2360 | ||
2361 | <listitem> | 2361 | <listitem> |
2362 | <para> | 2362 | <para> |
2363 | The <structfield>rates</structfield> field contains the bit-flags of | 2363 | The <structfield>rates</structfield> field contains the bit-flags of |
2364 | supported rates (<constant>SNDRV_PCM_RATE_XXX</constant>). | 2364 | supported rates (<constant>SNDRV_PCM_RATE_XXX</constant>). |
2365 | When the chip supports continuous rates, pass the | 2365 | When the chip supports continuous rates, pass the |
2366 | <constant>CONTINUOUS</constant> bit additionally. | 2366 | <constant>CONTINUOUS</constant> bit additionally. |
2367 | The pre-defined rate bits are provided only for typical | 2367 | The pre-defined rate bits are provided only for typical |
2368 | rates. If your chip supports unconventional rates, you need to add the | 2368 | rates. If your chip supports unconventional rates, you need to add the |
2369 | <constant>KNOT</constant> bit and set up the hardware | 2369 | <constant>KNOT</constant> bit and set up the hardware |
2370 | constraint manually (explained later). | 2370 | constraint manually (explained later). |
2371 | </para> | 2371 | </para> |
2372 | </listitem> | 2372 | </listitem> |
2373 | 2373 | ||
2374 | <listitem> | 2374 | <listitem> |
2375 | <para> | 2375 | <para> |
2376 | <structfield>rate_min</structfield> and | 2376 | <structfield>rate_min</structfield> and |
2377 | <structfield>rate_max</structfield> define the minimal and | 2377 | <structfield>rate_max</structfield> define the minimal and |
2378 | maximal sample rate. These should be consistent with the | 2378 | maximal sample rate. These should be consistent with the |
2379 | <structfield>rates</structfield> bits. | 2379 | <structfield>rates</structfield> bits. |
2380 | </para> | 2380 | </para> |
2381 | </listitem> | 2381 | </listitem> |
2382 | 2382 | ||
2383 | <listitem> | 2383 | <listitem> |
2384 | <para> | 2384 | <para> |
2385 | <structfield>channels_min</structfield> and | 2385 | <structfield>channels_min</structfield> and |
2386 | <structfield>channels_max</structfield> | 2386 | <structfield>channels_max</structfield> |
2387 | define, as you might expect, the minimal and maximal | 2387 | define, as you might expect, the minimal and maximal |
2388 | number of channels. | 2388 | number of channels. |
2389 | </para> | 2389 | </para> |
2390 | </listitem> | 2390 | </listitem> |
2391 | 2391 | ||
2392 | <listitem> | 2392 | <listitem> |
2393 | <para> | 2393 | <para> |
2394 | <structfield>buffer_bytes_max</structfield> defines the | 2394 | <structfield>buffer_bytes_max</structfield> defines the |
2395 | maximal buffer size in bytes. There is no | 2395 | maximal buffer size in bytes. There is no |
2396 | <structfield>buffer_bytes_min</structfield> field, since | 2396 | <structfield>buffer_bytes_min</structfield> field, since |
2397 | it can be calculated from the minimal period size and the | 2397 | it can be calculated from the minimal period size and the |
2398 | minimal number of periods. | 2398 | minimal number of periods. |
2399 | Meanwhile, <structfield>period_bytes_min</structfield> and <structfield>period_bytes_max</structfield> | 2399 | Meanwhile, <structfield>period_bytes_min</structfield> and <structfield>period_bytes_max</structfield> |
2400 | define the minimal and maximal size of the period in bytes. | 2400 | define the minimal and maximal size of the period in bytes. |
2401 | <structfield>periods_max</structfield> and | 2401 | <structfield>periods_max</structfield> and |
2402 | <structfield>periods_min</structfield> define the maximal and | 2402 | <structfield>periods_min</structfield> define the maximal and |
2403 | minimal number of periods in the buffer. | 2403 | minimal number of periods in the buffer. |
2404 | </para> | 2404 | </para> |
2405 | 2405 | ||
2406 | <para> | 2406 | <para> |
2407 | The <quote>period</quote> is a term that corresponds to a | 2407 | The <quote>period</quote> is a term that corresponds to a |
2408 | fragment in the OSS world. The period defines the size at | 2408 | fragment in the OSS world. The period defines the size at |
2409 | which the PCM interrupt is generated. This size strongly | 2409 | which the PCM interrupt is generated. This size strongly |
2410 | depends on the hardware. | 2410 | depends on the hardware. |
2411 | Generally, a smaller period size gives you more | 2411 | Generally, a smaller period size gives you more |
2412 | interrupts, that is, finer control. | 2412 | interrupts, that is, finer control. |
2413 | In the case of capture, this size defines the input latency. | 2413 | In the case of capture, this size defines the input latency. |
2414 | On the other hand, the whole buffer size defines the | 2414 | On the other hand, the whole buffer size defines the |
2415 | output latency for the playback direction. | 2415 | output latency for the playback direction. |
2416 | </para> | 2416 | </para> |
2417 | </listitem> | 2417 | </listitem> |
2418 | 2418 | ||
2419 | <listitem> | 2419 | <listitem> |
2420 | <para> | 2420 | <para> |
2421 | There is also a field <structfield>fifo_size</structfield>. | 2421 | There is also a field <structfield>fifo_size</structfield>. |
2422 | This specifies the size of the hardware FIFO, but it is | 2422 | This specifies the size of the hardware FIFO, but it is |
2423 | currently used neither in the driver nor in alsa-lib. So, you | 2423 | currently used neither in the driver nor in alsa-lib. So, you |
2424 | can ignore this field. | 2424 | can ignore this field. |
2425 | </para> | 2425 | </para> |
2426 | </listitem> | 2426 | </listitem> |
2427 | </itemizedlist> | 2427 | </itemizedlist> |
2428 | </para> | 2428 | </para> |
2429 | </section> | 2429 | </section> |
2430 | 2430 | ||
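The relationship between the buffer and period limits can be made concrete with a little arithmetic. The sketch below is a userspace illustration under stated assumptions: `struct hw_limits` is a hypothetical cut-down copy of the size-related fields, and `limits_are_consistent()` is a sanity check invented for this example, not a check the ALSA middle layer is claimed to perform. It shows how the minimal buffer size follows from the minimal period size and the minimal number of periods.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the size-related descriptor fields. */
struct hw_limits {
	size_t buffer_bytes_max;
	size_t period_bytes_min;
	size_t period_bytes_max;
	unsigned int periods_min;
	unsigned int periods_max;
};

/* Smallest possible buffer: minimal period size times minimal period count. */
static size_t buffer_bytes_min(const struct hw_limits *hw)
{
	return hw->period_bytes_min * hw->periods_min;
}

/* Illustrative check only: every minimum must fit under its maximum. */
static bool limits_are_consistent(const struct hw_limits *hw)
{
	return hw->period_bytes_min <= hw->period_bytes_max &&
	       hw->periods_min >= 1 &&
	       hw->periods_min <= hw->periods_max &&
	       buffer_bytes_min(hw) <= hw->buffer_bytes_max;
}
```

With the values from the descriptor above (4096-byte minimal period, at least 1 period, 32768-byte maximal buffer), the derived minimal buffer size is 4096 bytes, which fits under the maximum.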
2431 | <section id="pcm-interface-runtime-config"> | 2431 | <section id="pcm-interface-runtime-config"> |
2432 | <title>PCM Configurations</title> | 2432 | <title>PCM Configurations</title> |
2433 | <para> | 2433 | <para> |
2434 | OK, let's go back again to the PCM runtime records. | 2434 | OK, let's go back again to the PCM runtime records. |
2435 | The most frequently referenced records in the runtime instance are | 2435 | The most frequently referenced records in the runtime instance are |
2436 | the PCM configurations. | 2436 | the PCM configurations. |
2437 | The PCM configurations are stored in the runtime instance | 2437 | The PCM configurations are stored in the runtime instance |
2438 | after the application sends <type>hw_params</type> data via | 2438 | after the application sends <type>hw_params</type> data via |
2439 | alsa-lib. There are many fields copied from hw_params and | 2439 | alsa-lib. There are many fields copied from hw_params and |
2440 | sw_params structs. For example, | 2440 | sw_params structs. For example, |
2441 | <structfield>format</structfield> holds the format type | 2441 | <structfield>format</structfield> holds the format type |
2442 | chosen by the application. This field contains the enum value | 2442 | chosen by the application. This field contains the enum value |
2443 | <constant>SNDRV_PCM_FORMAT_XXX</constant>. | 2443 | <constant>SNDRV_PCM_FORMAT_XXX</constant>. |
2444 | </para> | 2444 | </para> |
2445 | 2445 | ||
2446 | <para> | 2446 | <para> |
2447 | One thing to be noted is that the configured buffer and period | 2447 | One thing to be noted is that the configured buffer and period |
2448 | sizes are stored in <quote>frames</quote> in the runtime. | 2448 | sizes are stored in <quote>frames</quote> in the runtime. |
2449 | In the ALSA world, 1 frame = channels * samples-size. | 2449 | In the ALSA world, 1 frame = channels * samples-size. |
2450 | For conversion between frames and bytes, you can use the | 2450 | For conversion between frames and bytes, you can use the |
2451 | helper functions, <function>frames_to_bytes()</function> and | 2451 | helper functions, <function>frames_to_bytes()</function> and |
2452 | <function>bytes_to_frames()</function>. | 2452 | <function>bytes_to_frames()</function>. |
2453 | <informalexample> | 2453 | <informalexample> |
2454 | <programlisting> | 2454 | <programlisting> |
2455 | <![CDATA[ | 2455 | <![CDATA[ |
2456 | period_bytes = frames_to_bytes(runtime, runtime->period_size); | 2456 | period_bytes = frames_to_bytes(runtime, runtime->period_size); |
2457 | ]]> | 2457 | ]]> |
2458 | </programlisting> | 2458 | </programlisting> |
2459 | </informalexample> | 2459 | </informalexample> |
2460 | </para> | 2460 | </para> |
2461 | 2461 | ||
2462 | <para> | 2462 | <para> |
2463 | Also, many software parameters (sw_params) are | 2463 | Also, many software parameters (sw_params) are |
2464 | stored in frames, too. Please check the type of the field. | 2464 | stored in frames, too. Please check the type of the field. |
2465 | <type>snd_pcm_uframes_t</type> is for the frames as unsigned | 2465 | <type>snd_pcm_uframes_t</type> is for the frames as unsigned |
2466 | integer while <type>snd_pcm_sframes_t</type> is for the frames | 2466 | integer while <type>snd_pcm_sframes_t</type> is for the frames |
2467 | as signed integer. | 2467 | as signed integer. |
2468 | </para> | 2468 | </para> |
2469 | </section> | 2469 | </section> |
2470 | 2470 | ||
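The frame arithmetic can be sketched in plain C. This is only an illustration of the rule "1 frame = channels * sample size"; `struct pcm_config`, `my_frames_to_bytes()` and `my_bytes_to_frames()` are hypothetical names chosen for this example, and the sketch does not reproduce how the kernel helpers `frames_to_bytes()` and `bytes_to_frames()` are implemented.

```c
#include <stddef.h>

/* Hypothetical stand-in for the configuration stored in the runtime. */
struct pcm_config {
	unsigned int channels;     /* e.g. 2 for stereo */
	unsigned int sample_bytes; /* e.g. 2 for a signed 16-bit format */
};

/* One frame holds one sample for every channel. */
static size_t frame_bytes(const struct pcm_config *cfg)
{
	return (size_t)cfg->channels * cfg->sample_bytes;
}

static size_t my_frames_to_bytes(const struct pcm_config *cfg, size_t frames)
{
	return frames * frame_bytes(cfg);
}

static size_t my_bytes_to_frames(const struct pcm_config *cfg, size_t bytes)
{
	return bytes / frame_bytes(cfg);
}
```

For a stereo 16-bit stream a frame is 4 bytes, so a period of 1024 frames occupies 4096 bytes.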
2471 | <section id="pcm-interface-runtime-dma"> | 2471 | <section id="pcm-interface-runtime-dma"> |
2472 | <title>DMA Buffer Information</title> | 2472 | <title>DMA Buffer Information</title> |
2473 | <para> | 2473 | <para> |
2474 | The DMA buffer is defined by the following four fields, | 2474 | The DMA buffer is defined by the following four fields, |
2475 | <structfield>dma_area</structfield>, | 2475 | <structfield>dma_area</structfield>, |
2476 | <structfield>dma_addr</structfield>, | 2476 | <structfield>dma_addr</structfield>, |
2477 | <structfield>dma_bytes</structfield> and | 2477 | <structfield>dma_bytes</structfield> and |
2478 | <structfield>dma_private</structfield>. | 2478 | <structfield>dma_private</structfield>. |
2479 | The <structfield>dma_area</structfield> holds the buffer | 2479 | The <structfield>dma_area</structfield> holds the buffer |
2480 | pointer (the logical address). You can call | 2480 | pointer (the logical address). You can call |
2481 | <function>memcpy</function> from/to | 2481 | <function>memcpy</function> from/to |
2482 | this pointer. Meanwhile, <structfield>dma_addr</structfield> | 2482 | this pointer. Meanwhile, <structfield>dma_addr</structfield> |
2483 | holds the physical address of the buffer. This field is | 2483 | holds the physical address of the buffer. This field is |
2484 | specified only when the buffer is a linear buffer. | 2484 | specified only when the buffer is a linear buffer. |
2485 | <structfield>dma_bytes</structfield> holds the size of the buffer | 2485 | <structfield>dma_bytes</structfield> holds the size of the buffer |
2486 | in bytes. <structfield>dma_private</structfield> is used for | 2486 | in bytes. <structfield>dma_private</structfield> is used for |
2487 | the ALSA DMA allocator. | 2487 | the ALSA DMA allocator. |
2488 | </para> | 2488 | </para> |
2489 | 2489 | ||
2490 | <para> | 2490 | <para> |
2491 | If you use a standard ALSA function, | 2491 | If you use a standard ALSA function, |
2492 | <function>snd_pcm_lib_malloc_pages()</function>, for | 2492 | <function>snd_pcm_lib_malloc_pages()</function>, for |
2493 | allocating the buffer, these fields are set by the ALSA middle | 2493 | allocating the buffer, these fields are set by the ALSA middle |
2494 | layer, and you should <emphasis>not</emphasis> change them by | 2494 | layer, and you should <emphasis>not</emphasis> change them by |
2495 | yourself. You can read them but not write them. | 2495 | yourself. You can read them but not write them. |
2496 | On the other hand, if you want to allocate the buffer by | 2496 | On the other hand, if you want to allocate the buffer by |
2497 | yourself, you'll need to manage it in hw_params callback. | 2497 | yourself, you'll need to manage it in hw_params callback. |
2498 | At least, <structfield>dma_bytes</structfield> is mandatory. | 2498 | At least, <structfield>dma_bytes</structfield> is mandatory. |
2499 | <structfield>dma_area</structfield> is necessary when the | 2499 | <structfield>dma_area</structfield> is necessary when the |
2500 | buffer is mmapped. If your driver doesn't support mmap, this | 2500 | buffer is mmapped. If your driver doesn't support mmap, this |
2501 | field is not necessary. <structfield>dma_addr</structfield> | 2501 | field is not necessary. <structfield>dma_addr</structfield> |
2502 | is also not mandatory. You can use | 2502 | is also not mandatory. You can use |
2503 | <structfield>dma_private</structfield> as you like, too. | 2503 | <structfield>dma_private</structfield> as you like, too. |
2504 | </para> | 2504 | </para> |
2505 | </section> | 2505 | </section> |
2506 | 2506 | ||
2507 | <section id="pcm-interface-runtime-status"> | 2507 | <section id="pcm-interface-runtime-status"> |
2508 | <title>Running Status</title> | 2508 | <title>Running Status</title> |
2509 | <para> | 2509 | <para> |
2510 | The running status can be accessed via <constant>runtime->status</constant>. | 2510 | The running status can be accessed via <constant>runtime->status</constant>. |
2511 | This is a pointer to a struct <structname>snd_pcm_mmap_status</structname> | 2511 | This is a pointer to a struct <structname>snd_pcm_mmap_status</structname> |
2512 | record. For example, you can get the current DMA hardware | 2512 | record. For example, you can get the current DMA hardware |
2513 | pointer via <constant>runtime->status->hw_ptr</constant>. | 2513 | pointer via <constant>runtime->status->hw_ptr</constant>. |
2514 | </para> | 2514 | </para> |
2515 | 2515 | ||
2516 | <para> | 2516 | <para> |
2517 | The DMA application pointer can be accessed via | 2517 | The DMA application pointer can be accessed via |
2518 | <constant>runtime->control</constant>, which points to a | 2518 | <constant>runtime->control</constant>, which points to a |
2519 | struct <structname>snd_pcm_mmap_control</structname> record. | 2519 | struct <structname>snd_pcm_mmap_control</structname> record. |
2520 | However, accessing this value directly is not recommended. | 2520 | However, accessing this value directly is not recommended. |
2521 | </para> | 2521 | </para> |
2522 | </section> | 2522 | </section> |
2523 | 2523 | ||
2524 | <section id="pcm-interface-runtime-private"> | 2524 | <section id="pcm-interface-runtime-private"> |
2525 | <title>Private Data</title> | 2525 | <title>Private Data</title> |
2526 | <para> | 2526 | <para> |
2527 | You can allocate a record for the substream and store it in | 2527 | You can allocate a record for the substream and store it in |
2528 | <constant>runtime->private_data</constant>. Usually, this is | 2528 | <constant>runtime->private_data</constant>. Usually, this is |
2529 | done in | 2529 | done in |
2530 | <link linkend="pcm-interface-operators-open-callback"><citetitle> | 2530 | <link linkend="pcm-interface-operators-open-callback"><citetitle> |
2531 | the open callback</citetitle></link>. | 2531 | the open callback</citetitle></link>. |
2532 | Don't confuse this with <constant>pcm->private_data</constant>. | 2532 | Don't confuse this with <constant>pcm->private_data</constant>. |
2533 | The <constant>pcm->private_data</constant> usually points to the | 2533 | The <constant>pcm->private_data</constant> usually points to the |
2534 | chip instance assigned statically at the creation of PCM, while the | 2534 | chip instance assigned statically at the creation of PCM, while the |
2535 | <constant>runtime->private_data</constant> points to dynamic | 2535 | <constant>runtime->private_data</constant> points to dynamic |
2536 | data created in the PCM open callback. | 2536 | data created in the PCM open callback. |
2537 | 2537 | ||
2538 | <informalexample> | 2538 | <informalexample> |
2539 | <programlisting> | 2539 | <programlisting> |
2540 | <![CDATA[ | 2540 | <![CDATA[ |
2541 | static int snd_xxx_open(struct snd_pcm_substream *substream) | 2541 | static int snd_xxx_open(struct snd_pcm_substream *substream) |
2542 | { | 2542 | { |
2543 | struct my_pcm_data *data; | 2543 | struct my_pcm_data *data; |
2544 | .... | 2544 | .... |
2545 | data = kmalloc(sizeof(*data), GFP_KERNEL); | 2545 | data = kmalloc(sizeof(*data), GFP_KERNEL); |
2546 | substream->runtime->private_data = data; | 2546 | substream->runtime->private_data = data; |
2547 | .... | 2547 | .... |
2548 | } | 2548 | } |
2549 | ]]> | 2549 | ]]> |
2550 | </programlisting> | 2550 | </programlisting> |
2551 | </informalexample> | 2551 | </informalexample> |
2552 | </para> | 2552 | </para> |
2553 | 2553 | ||
2554 | <para> | 2554 | <para> |
2555 | The allocated object must be released in | 2555 | The allocated object must be released in |
2556 | <link linkend="pcm-interface-operators-close-callback"><citetitle> | 2556 | <link linkend="pcm-interface-operators-close-callback"><citetitle> |
2557 | the close callback</citetitle></link>. | 2557 | the close callback</citetitle></link>. |
2558 | </para> | 2558 | </para> |
2559 | </section> | 2559 | </section> |
2560 | 2560 | ||
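The open/close pairing for private data can be sketched in userspace as follows. This is an illustration under stated assumptions, not driver code: `struct substream`, `struct runtime`, `my_open()` and `my_close()` are hypothetical stand-ins, and `calloc()`/`free()` take the place of `kmalloc()`/`kfree()`. Unlike the abbreviated listing above, the sketch checks the allocation result and returns `-ENOMEM` on failure, which follows the callback convention of returning a negative error number.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-stream record, like struct my_pcm_data above. */
struct my_pcm_data {
	int dummy;
};

/* Minimal stand-ins for the runtime and substream objects. */
struct runtime {
	void *private_data;
};

struct substream {
	struct runtime *runtime;
};

/* Open: allocate the per-stream record and attach it to the runtime. */
static int my_open(struct substream *ss)
{
	struct my_pcm_data *data = calloc(1, sizeof(*data));

	if (!data)
		return -ENOMEM; /* report allocation failure to the caller */
	ss->runtime->private_data = data;
	return 0;
}

/* Close: release whatever open attached, and clear the stale pointer. */
static int my_close(struct substream *ss)
{
	free(ss->runtime->private_data);
	ss->runtime->private_data = NULL;
	return 0;
}
```

Clearing the pointer after freeing is a defensive choice made for this sketch; it is not required by the text above.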
2561 | <section id="pcm-interface-runtime-intr"> | 2561 | <section id="pcm-interface-runtime-intr"> |
2562 | <title>Interrupt Callbacks</title> | 2562 | <title>Interrupt Callbacks</title> |
2563 | <para> | 2563 | <para> |
2564 | The callbacks in the fields <structfield>transfer_ack_begin</structfield> and | 2564 | The callbacks in the fields <structfield>transfer_ack_begin</structfield> and |
2565 | <structfield>transfer_ack_end</structfield> are called at | 2565 | <structfield>transfer_ack_end</structfield> are called at |
2566 | the beginning and the end of | 2566 | the beginning and the end of |
2567 | <function>snd_pcm_period_elapsed()</function>, respectively. | 2567 | <function>snd_pcm_period_elapsed()</function>, respectively. |
2568 | </para> | 2568 | </para> |
2569 | </section> | 2569 | </section> |
2570 | 2570 | ||
2571 | </section> | 2571 | </section> |
2572 | 2572 | ||
2573 | <section id="pcm-interface-operators"> | 2573 | <section id="pcm-interface-operators"> |
2574 | <title>Operators</title> | 2574 | <title>Operators</title> |
2575 | <para> | 2575 | <para> |
2576 | OK, now let me explain the details of each pcm callback | 2576 | OK, now let me explain the details of each pcm callback |
2577 | (<parameter>ops</parameter>). In general, every callback must | 2577 | (<parameter>ops</parameter>). In general, every callback must |
2578 | return 0 if successful, or a negative error | 2578 | return 0 if successful, or a negative error |
2579 | number such as <constant>-EINVAL</constant> on any | 2579 | number such as <constant>-EINVAL</constant> on any |
2580 | error. | 2580 | error. |
2581 | </para> | 2581 | </para> |
2582 | 2582 | ||
2583 | <para> | 2583 | <para> |
2584 | Each callback function takes at least a | 2584 | Each callback function takes at least a |
2585 | <structname>snd_pcm_substream</structname> pointer as an argument. To retrieve the | 2585 | <structname>snd_pcm_substream</structname> pointer as an argument. To retrieve the |
2586 | chip record from the given substream instance, you can use the | 2586 | chip record from the given substream instance, you can use the |
2587 | following macro. | 2587 | following macro. |
2588 | 2588 | ||
2589 | <informalexample> | 2589 | <informalexample> |
2590 | <programlisting> | 2590 | <programlisting> |
2591 | <![CDATA[ | 2591 | <![CDATA[ |
2592 | int xxx() { | 2592 | int xxx() { |
2593 | struct mychip *chip = snd_pcm_substream_chip(substream); | 2593 | struct mychip *chip = snd_pcm_substream_chip(substream); |
2594 | .... | 2594 | .... |
2595 | } | 2595 | } |
2596 | ]]> | 2596 | ]]> |
2597 | </programlisting> | 2597 | </programlisting> |
2598 | </informalexample> | 2598 | </informalexample> |
2599 | 2599 | ||
2600 | The macro reads <constant>substream->private_data</constant>, | 2600 | The macro reads <constant>substream->private_data</constant>, |
2601 | which is a copy of <constant>pcm->private_data</constant>. | 2601 | which is a copy of <constant>pcm->private_data</constant>. |
2602 | You can override the former if you need to assign different data | 2602 | You can override the former if you need to assign different data |
2603 | records per PCM substream. For example, cmi8330 driver assigns | 2603 | records per PCM substream. For example, cmi8330 driver assigns |
2604 | different private_data for playback and capture directions, | 2604 | different private_data for playback and capture directions, |
2605 | because it uses two different codecs (SB- and AD-compatible) for | 2605 | because it uses two different codecs (SB- and AD-compatible) for |
2606 | different directions. | 2606 | different directions. |
2607 | </para> | 2607 | </para> |
2608 | 2608 | ||
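The relationship between the two `private_data` pointers can be modelled in a few lines. The sketch below is a userspace illustration: `struct pcm`, `struct substream`, `substream_chip()` and `attach_substream()` are hypothetical stand-ins, with `attach_substream()` playing the role of whatever step copies `pcm->private_data` into each substream. It shows that the lookup only reads the substream's own pointer, so overriding it for one direction does not affect the other.

```c
#include <stddef.h>

/* Minimal stand-ins for the PCM and substream objects. */
struct pcm {
	void *private_data;
};

struct substream {
	struct pcm *pcm;
	void *private_data;
};

/* Stand-in for the lookup macro: it reads the substream's pointer. */
#define substream_chip(ss) ((ss)->private_data)

/* Hypothetical setup step: each substream starts with a copy of the
 * pointer stored in its parent PCM. */
static void attach_substream(struct pcm *pcm, struct substream *ss)
{
	ss->pcm = pcm;
	ss->private_data = pcm->private_data;
}
```

A driver that needs different records per direction, as described for cmi8330 above, would overwrite `private_data` on one substream after this step; the other substream keeps the shared pointer.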
2609 | <section id="pcm-interface-operators-open-callback"> | 2609 | <section id="pcm-interface-operators-open-callback"> |
2610 | <title>open callback</title> | 2610 | <title>open callback</title> |
2611 | <para> | 2611 | <para> |
2612 | <informalexample> | 2612 | <informalexample> |
2613 | <programlisting> | 2613 | <programlisting> |
2614 | <![CDATA[ | 2614 | <![CDATA[ |
2615 | static int snd_xxx_open(struct snd_pcm_substream *substream); | 2615 | static int snd_xxx_open(struct snd_pcm_substream *substream); |
2616 | ]]> | 2616 | ]]> |
2617 | </programlisting> | 2617 | </programlisting> |
2618 | </informalexample> | 2618 | </informalexample> |
2619 | 2619 | ||
2620 | This is called when a pcm substream is opened. | 2620 | This is called when a pcm substream is opened. |
2621 | </para> | 2621 | </para> |
2622 | 2622 | ||
2623 | <para> | 2623 | <para> |
2624 | At a minimum, you have to initialize the runtime->hw | 2624 | At a minimum, you have to initialize the runtime->hw |
2625 | record here. Typically, this is done like this: | 2625 | record here. Typically, this is done like this: |
2626 | 2626 | ||
2627 | <informalexample> | 2627 | <informalexample> |
2628 | <programlisting> | 2628 | <programlisting> |
2629 | <![CDATA[ | 2629 | <![CDATA[ |
2630 | static int snd_xxx_open(struct snd_pcm_substream *substream) | 2630 | static int snd_xxx_open(struct snd_pcm_substream *substream) |
2631 | { | 2631 | { |
2632 | struct mychip *chip = snd_pcm_substream_chip(substream); | 2632 | struct mychip *chip = snd_pcm_substream_chip(substream); |
2633 | struct snd_pcm_runtime *runtime = substream->runtime; | 2633 | struct snd_pcm_runtime *runtime = substream->runtime; |
2634 | 2634 | ||
2635 | runtime->hw = snd_mychip_playback_hw; | 2635 | runtime->hw = snd_mychip_playback_hw; |
2636 | return 0; | 2636 | return 0; |
2637 | } | 2637 | } |
2638 | ]]> | 2638 | ]]> |
2639 | </programlisting> | 2639 | </programlisting> |
2640 | </informalexample> | 2640 | </informalexample> |
2641 | 2641 | ||
2642 | where <parameter>snd_mychip_playback_hw</parameter> is the | 2642 | where <parameter>snd_mychip_playback_hw</parameter> is the |
2643 | pre-defined hardware description. | 2643 | pre-defined hardware description. |
2644 | </para> | 2644 | </para> |
2645 | 2645 | ||
2646 | <para> | 2646 | <para> |
2647 | You can allocate private data in this callback, as described | 2647 | You can allocate private data in this callback, as described |
2648 | in <link linkend="pcm-interface-runtime-private"><citetitle> | 2648 | in <link linkend="pcm-interface-runtime-private"><citetitle> |
2649 | Private Data</citetitle></link> section. | 2649 | Private Data</citetitle></link> section. |
2650 | </para> | 2650 | </para> |
2651 | 2651 | ||
2652 | <para> | 2652 | <para> |
2653 | If the hardware configuration needs more constraints, set the | 2653 | If the hardware configuration needs more constraints, set the |
2654 | hardware constraints here, too. | 2654 | hardware constraints here, too. |
2655 | See <link linkend="pcm-interface-constraints"><citetitle> | 2655 | See <link linkend="pcm-interface-constraints"><citetitle> |
2656 | Constraints</citetitle></link> for more details. | 2656 | Constraints</citetitle></link> for more details. |
2657 | </para> | 2657 | </para> |
2658 | </section> | 2658 | </section> |
2659 | 2659 | ||
2660 | <section id="pcm-interface-operators-close-callback"> | 2660 | <section id="pcm-interface-operators-close-callback"> |
2661 | <title>close callback</title> | 2661 | <title>close callback</title> |
2662 | <para> | 2662 | <para> |
2663 | <informalexample> | 2663 | <informalexample> |
2664 | <programlisting> | 2664 | <programlisting> |
2665 | <![CDATA[ | 2665 | <![CDATA[ |
2666 | static int snd_xxx_close(struct snd_pcm_substream *substream); | 2666 | static int snd_xxx_close(struct snd_pcm_substream *substream); |
2667 | ]]> | 2667 | ]]> |
2668 | </programlisting> | 2668 | </programlisting> |
2669 | </informalexample> | 2669 | </informalexample> |
2670 | 2670 | ||
2671 | Obviously, this is called when a pcm substream is closed. | 2671 | Obviously, this is called when a pcm substream is closed. |
2672 | </para> | 2672 | </para> |
2673 | 2673 | ||
2674 | <para> | 2674 | <para> |
2675 | Any private instance for a pcm substream allocated in the | 2675 | Any private instance for a pcm substream allocated in the |
2676 | open callback will be released here. | 2676 | open callback will be released here. |
2677 | 2677 | ||
2678 | <informalexample> | 2678 | <informalexample> |
2679 | <programlisting> | 2679 | <programlisting> |
2680 | <![CDATA[ | 2680 | <![CDATA[ |
2681 | static int snd_xxx_close(struct snd_pcm_substream *substream) | 2681 | static int snd_xxx_close(struct snd_pcm_substream *substream) |
2682 | { | 2682 | { |
2683 | .... | 2683 | .... |
2684 | kfree(substream->runtime->private_data); | 2684 | kfree(substream->runtime->private_data); |
2685 | .... | 2685 | .... |
2686 | } | 2686 | } |
2687 | ]]> | 2687 | ]]> |
2688 | </programlisting> | 2688 | </programlisting> |
2689 | </informalexample> | 2689 | </informalexample> |
2690 | </para> | 2690 | </para> |
2691 | </section> | 2691 | </section> |
2692 | 2692 | ||
2693 | <section id="pcm-interface-operators-ioctl-callback"> | 2693 | <section id="pcm-interface-operators-ioctl-callback"> |
2694 | <title>ioctl callback</title> | 2694 | <title>ioctl callback</title> |
2695 | <para> | 2695 | <para> |
2696 | This is used for any special handling of pcm ioctls. Usually, | 2696 | This is used for any special handling of pcm ioctls. Usually, |
2697 | however, you can pass the generic ioctl callback, | 2697 | however, you can pass the generic ioctl callback, |
2698 | <function>snd_pcm_lib_ioctl</function>. | 2698 | <function>snd_pcm_lib_ioctl</function>. |
2699 | </para> | 2699 | </para> |
2700 | </section> | 2700 | </section> |
2701 | 2701 | ||
2702 | <section id="pcm-interface-operators-hw-params-callback"> | 2702 | <section id="pcm-interface-operators-hw-params-callback"> |
2703 | <title>hw_params callback</title> | 2703 | <title>hw_params callback</title> |
2704 | <para> | 2704 | <para> |
2705 | <informalexample> | 2705 | <informalexample> |
2706 | <programlisting> | 2706 | <programlisting> |
2707 | <![CDATA[ | 2707 | <![CDATA[ |
2708 | static int snd_xxx_hw_params(struct snd_pcm_substream *substream, | 2708 | static int snd_xxx_hw_params(struct snd_pcm_substream *substream, |
2709 | struct snd_pcm_hw_params *hw_params); | 2709 | struct snd_pcm_hw_params *hw_params); |
2710 | ]]> | 2710 | ]]> |
2711 | </programlisting> | 2711 | </programlisting> |
2712 | </informalexample> | 2712 | </informalexample> |
2713 | 2713 | ||
2714 | This and <structfield>hw_free</structfield> callbacks exist | 2714 | This and <structfield>hw_free</structfield> callbacks exist |
2715 | only on ALSA 0.9.x. | 2715 | only on ALSA 0.9.x. |
2716 | </para> | 2716 | </para> |
2717 | 2717 | ||
2718 | <para> | 2718 | <para> |
2719 | This is called when the hardware parameters | 2719 | This is called when the hardware parameters |
2720 | (<structfield>hw_params</structfield>) are set | 2720 | (<structfield>hw_params</structfield>) are set |
2721 | up by the application, | 2721 | up by the application, |
2722 | that is, once the buffer size, the period size, the | 2722 | that is, once the buffer size, the period size, the |
2723 | format, etc. are defined for the pcm substream. | 2723 | format, etc. are defined for the pcm substream. |
2724 | </para> | 2724 | </para> |
2725 | 2725 | ||
2726 | <para> | 2726 | <para> |
2727 | Much of the hardware set-up should be done in this callback, | 2727 | Much of the hardware set-up should be done in this callback, |
2728 | including the allocation of buffers. | 2728 | including the allocation of buffers. |
2729 | </para> | 2729 | </para> |
2730 | 2730 | ||
2731 | <para> | 2731 | <para> |
2732 | Parameters to be initialized are retrieved by | 2732 | Parameters to be initialized are retrieved by |
2733 | <function>params_xxx()</function> macros. For allocating a | 2733 | <function>params_xxx()</function> macros. For allocating a |
2734 | buffer, you can call a helper function, | 2734 | buffer, you can call a helper function, |
2735 | 2735 | ||
2736 | <informalexample> | 2736 | <informalexample> |
2737 | <programlisting> | 2737 | <programlisting> |
2738 | <![CDATA[ | 2738 | <![CDATA[ |
2739 | snd_pcm_lib_malloc_pages(substream, params_buffer_bytes(hw_params)); | 2739 | snd_pcm_lib_malloc_pages(substream, params_buffer_bytes(hw_params)); |
2740 | ]]> | 2740 | ]]> |
2741 | </programlisting> | 2741 | </programlisting> |
2742 | </informalexample> | 2742 | </informalexample> |
2743 | 2743 | ||
2744 | <function>snd_pcm_lib_malloc_pages()</function> is available | 2744 | <function>snd_pcm_lib_malloc_pages()</function> is available |
2745 | only when the DMA buffers have been pre-allocated. | 2745 | only when the DMA buffers have been pre-allocated. |
2746 | See the section <link | 2746 | See the section <link |
2747 | linkend="buffer-and-memory-buffer-types"><citetitle> | 2747 | linkend="buffer-and-memory-buffer-types"><citetitle> |
2748 | Buffer Types</citetitle></link> for more details. | 2748 | Buffer Types</citetitle></link> for more details. |
2749 | </para> | 2749 | </para> |
2750 | 2750 | ||
2751 | <para> | 2751 | <para> |
2752 | Note that this and <structfield>prepare</structfield> callbacks | 2752 | Note that this and <structfield>prepare</structfield> callbacks |
2753 | may be called multiple times per initialization. | 2753 | may be called multiple times per initialization. |
2754 | For example, the OSS emulation may | 2754 | For example, the OSS emulation may |
2755 | call these callbacks at each change via its ioctl. | 2755 | call these callbacks at each change via its ioctl. |
2756 | </para> | 2756 | </para> |
2757 | 2757 | ||
2758 | <para> | 2758 | <para> |
2759 | Thus, you need to take care not to allocate the same buffers | 2759 | Thus, you need to take care not to allocate the same buffers |
2760 | many times, which would lead to memory leaks! Calling the | 2760 | many times, which would lead to memory leaks! Calling the |
2761 | helper function above many times is OK. It will release the | 2761 | helper function above many times is OK. It will release the |
2762 | previous buffer automatically if one was already allocated. | 2762 | previous buffer automatically if one was already allocated. |
2763 | </para> | 2763 | </para> |
2764 | 2764 | ||
2765 | <para> | 2765 | <para> |
2766 | Another note is that this callback is non-atomic | 2766 | Another note is that this callback is non-atomic |
2767 | (schedulable). This is important, because the | 2767 | (schedulable). This is important, because the |
2768 | <structfield>trigger</structfield> callback | 2768 | <structfield>trigger</structfield> callback |
2769 | is atomic (non-schedulable). That is, mutexes and other | 2769 | is atomic (non-schedulable). That is, mutexes and other |
2770 | schedule-related functions are not available in the | 2770 | schedule-related functions are not available in the |
2771 | <structfield>trigger</structfield> callback. | 2771 | <structfield>trigger</structfield> callback. |
2772 | Please see the subsection | 2772 | Please see the subsection |
2773 | <link linkend="pcm-interface-atomicity"><citetitle> | 2773 | <link linkend="pcm-interface-atomicity"><citetitle> |
2774 | Atomicity</citetitle></link> for details. | 2774 | Atomicity</citetitle></link> for details. |
2775 | </para> | 2775 | </para> |
2776 | </section> | 2776 | </section> |
2777 | 2777 | ||
2778 | <section id="pcm-interface-operators-hw-free-callback"> | 2778 | <section id="pcm-interface-operators-hw-free-callback"> |
2779 | <title>hw_free callback</title> | 2779 | <title>hw_free callback</title> |
2780 | <para> | 2780 | <para> |
2781 | <informalexample> | 2781 | <informalexample> |
2782 | <programlisting> | 2782 | <programlisting> |
2783 | <![CDATA[ | 2783 | <![CDATA[ |
2784 | static int snd_xxx_hw_free(struct snd_pcm_substream *substream); | 2784 | static int snd_xxx_hw_free(struct snd_pcm_substream *substream); |
2785 | ]]> | 2785 | ]]> |
2786 | </programlisting> | 2786 | </programlisting> |
2787 | </informalexample> | 2787 | </informalexample> |
2788 | </para> | 2788 | </para> |
2789 | 2789 | ||
2790 | <para> | 2790 | <para> |
2791 | This is called to release the resources allocated via | 2791 | This is called to release the resources allocated via |
2792 | <structfield>hw_params</structfield>. For example, releasing the | 2792 | <structfield>hw_params</structfield>. For example, releasing the |
2793 | buffer allocated via | 2793 | buffer allocated via |
2794 | <function>snd_pcm_lib_malloc_pages()</function> is done by | 2794 | <function>snd_pcm_lib_malloc_pages()</function> is done by |
2795 | calling the following: | 2795 | calling the following: |
2796 | 2796 | ||
2797 | <informalexample> | 2797 | <informalexample> |
2798 | <programlisting> | 2798 | <programlisting> |
2799 | <![CDATA[ | 2799 | <![CDATA[ |
2800 | snd_pcm_lib_free_pages(substream); | 2800 | snd_pcm_lib_free_pages(substream); |
2801 | ]]> | 2801 | ]]> |
2802 | </programlisting> | 2802 | </programlisting> |
2803 | </informalexample> | 2803 | </informalexample> |
2804 | </para> | 2804 | </para> |
2805 | 2805 | ||
2806 | <para> | 2806 | <para> |
2807 | This function is always called before the close callback is called. | 2807 | This function is always called before the close callback is called. |
2808 | The callback may also be called multiple times. | 2808 | The callback may also be called multiple times. |
2809 | Keep track of whether the resource was already released. | 2809 | Keep track of whether the resource was already released. |
2810 | </para> | 2810 | </para> |
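<para>
The pairing of repeated hw_params and hw_free calls can be modelled in user space. The following is only a sketch: <function>model_hw_params()</function>, <function>model_hw_free()</function> and <structname>model_substream</structname> are hypothetical names invented for illustration, and plain malloc/free stand in for the kernel helpers. It shows one way to keep track of the allocation so that neither callback leaks or double-frees when called several times.
</para>

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-substream buffer state. */
struct model_substream {
	void *buf;        /* NULL when nothing is allocated */
	size_t buf_bytes; /* size of the current allocation */
};

/* Models an hw_params-style allocation that may be called repeatedly.
 * A buffer of the same size is kept; otherwise the previous buffer is
 * released before the new one is allocated, so nothing leaks.
 * Returns 0 on success, -1 on allocation failure. */
static int model_hw_params(struct model_substream *s, size_t bytes)
{
	if (s->buf && s->buf_bytes == bytes)
		return 0;          /* already allocated with this size */
	free(s->buf);              /* free(NULL) is a no-op */
	s->buf = malloc(bytes);
	s->buf_bytes = s->buf ? bytes : 0;
	return s->buf ? 0 : -1;
}

/* Models hw_free: safe to call several times in a row. */
static int model_hw_free(struct model_substream *s)
{
	free(s->buf);
	s->buf = NULL;             /* remember that the buffer is released */
	s->buf_bytes = 0;
	return 0;
}
```

<para>
Resetting the pointer to NULL after freeing is what makes the second hw_free call harmless in this model.
</para>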
2811 | </section> | 2811 | </section> |
2812 | 2812 | ||
2813 | <section id="pcm-interface-operators-prepare-callback"> | 2813 | <section id="pcm-interface-operators-prepare-callback"> |
2814 | <title>prepare callback</title> | 2814 | <title>prepare callback</title> |
2815 | <para> | 2815 | <para> |
2816 | <informalexample> | 2816 | <informalexample> |
2817 | <programlisting> | 2817 | <programlisting> |
2818 | <![CDATA[ | 2818 | <![CDATA[ |
2819 | static int snd_xxx_prepare(struct snd_pcm_substream *substream); | 2819 | static int snd_xxx_prepare(struct snd_pcm_substream *substream); |
2820 | ]]> | 2820 | ]]> |
2821 | </programlisting> | 2821 | </programlisting> |
2822 | </informalexample> | 2822 | </informalexample> |
2823 | </para> | 2823 | </para> |
2824 | 2824 | ||
2825 | <para> | 2825 | <para> |
2826 | This callback is called when the pcm is | 2826 | This callback is called when the pcm is |
2827 | <quote>prepared</quote>. You can set the format type, sample | 2827 | <quote>prepared</quote>. You can set the format type, sample |
2828 | rate, etc. here. The difference from | 2828 | rate, etc. here. The difference from |
2829 | <structfield>hw_params</structfield> is that the | 2829 | <structfield>hw_params</structfield> is that the |
2830 | <structfield>prepare</structfield> callback will be called each | 2830 | <structfield>prepare</structfield> callback will be called each |
2831 | time | 2831 | time |
2832 | <function>snd_pcm_prepare()</function> is called, e.g. when | 2832 | <function>snd_pcm_prepare()</function> is called, e.g. when |
2833 | recovering after an underrun. | 2833 | recovering after an underrun. |
2834 | </para> | 2834 | </para> |
2835 | 2835 | ||
2836 | <para> | 2836 | <para> |
2837 | Note that this callback is non-atomic in recent versions. | 2837 | Note that this callback is non-atomic in recent versions. |
2838 | You can use schedule-related functions safely in this callback now. | 2838 | You can use schedule-related functions safely in this callback now. |
2839 | </para> | 2839 | </para> |
2840 | 2840 | ||
2841 | <para> | 2841 | <para> |
2842 | In this and the following callbacks, you can refer to the | 2842 | In this and the following callbacks, you can refer to the |
2843 | values via the runtime record, | 2843 | values via the runtime record, |
2844 | substream->runtime. | 2844 | substream->runtime. |
2845 | For example, to get the current | 2845 | For example, to get the current |
2846 | rate, format or channels, access | 2846 | rate, format or channels, access |
2847 | runtime->rate, | 2847 | runtime->rate, |
2848 | runtime->format or | 2848 | runtime->format or |
2849 | runtime->channels, respectively. | 2849 | runtime->channels, respectively. |
2850 | The virtual address of the allocated buffer is set to | 2850 | The virtual address of the allocated buffer is set to |
2851 | runtime->dma_area (the physical address is in runtime->dma_addr). The buffer and period sizes are | 2851 | runtime->dma_area (the physical address is in runtime->dma_addr). The buffer and period sizes are |
2852 | in runtime->buffer_size and runtime->period_size, | 2852 | in runtime->buffer_size and runtime->period_size, |
2853 | respectively. | 2853 | respectively. |
2854 | </para> | 2854 | </para> |
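<para>
As a rough illustration of how the runtime values relate to each other, the sketch below derives byte sizes and the period count from frame-based sizes. <structname>model_runtime</structname> and both helper functions are hypothetical stand-ins written for user space, not the kernel's runtime record; only the field names rate, channels, buffer_size and period_size follow the text above, and sample_bits is an assumed field giving the storage width of one sample.
</para>

```c
#include <stddef.h>

/* Hypothetical subset of the runtime record, for illustration only. */
struct model_runtime {
	unsigned int rate;         /* frames per second */
	unsigned int channels;     /* samples per frame */
	unsigned int sample_bits;  /* storage bits per sample (assumed field) */
	unsigned long buffer_size; /* in frames */
	unsigned long period_size; /* in frames */
};

/* Converts a length in frames to bytes: one frame holds one sample
 * for every channel. */
static size_t model_frames_to_bytes(const struct model_runtime *rt,
				    unsigned long frames)
{
	return (size_t)frames * rt->channels * (rt->sample_bits / 8);
}

/* Number of whole periods that fit into the buffer. */
static unsigned long model_periods(const struct model_runtime *rt)
{
	return rt->buffer_size / rt->period_size;
}
```

<para>
For 16-bit stereo with a 4096-frame buffer and 1024-frame periods, this model gives 4096 bytes per period and four periods per buffer.
</para>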
2855 | 2855 | ||
2856 | <para> | 2856 | <para> |
2857 | Be aware that this callback, too, may be called many times | 2857 | Be aware that this callback, too, may be called many times |
2858 | during each set-up. | 2858 | during each set-up. |
2859 | </para> | 2859 | </para> |
2860 | </section> | 2860 | </section> |
2861 | 2861 | ||
2862 | <section id="pcm-interface-operators-trigger-callback"> | 2862 | <section id="pcm-interface-operators-trigger-callback"> |
2863 | <title>trigger callback</title> | 2863 | <title>trigger callback</title> |
2864 | <para> | 2864 | <para> |
2865 | <informalexample> | 2865 | <informalexample> |
2866 | <programlisting> | 2866 | <programlisting> |
2867 | <![CDATA[ | 2867 | <![CDATA[ |
2868 | static int snd_xxx_trigger(struct snd_pcm_substream *substream, int cmd); | 2868 | static int snd_xxx_trigger(struct snd_pcm_substream *substream, int cmd); |
2869 | ]]> | 2869 | ]]> |
2870 | </programlisting> | 2870 | </programlisting> |
2871 | </informalexample> | 2871 | </informalexample> |
2872 | 2872 | ||
2873 | This is called when the pcm is started, stopped or paused. | 2873 | This is called when the pcm is started, stopped or paused. |
2874 | </para> | 2874 | </para> |
2875 | 2875 | ||
2876 | <para> | 2876 | <para> |
2877 | Which action to take is specified in the second argument, | 2877 | Which action to take is specified in the second argument, |
2878 | one of the <constant>SNDRV_PCM_TRIGGER_XXX</constant> values defined in | 2878 | one of the <constant>SNDRV_PCM_TRIGGER_XXX</constant> values defined in |
2879 | <filename><sound/pcm.h></filename>. At least the | 2879 | <filename><sound/pcm.h></filename>. At least the |
2880 | <constant>START</constant> and <constant>STOP</constant> | 2880 | <constant>START</constant> and <constant>STOP</constant> |
2881 | commands must be handled in this callback. | 2881 | commands must be handled in this callback. |
2882 | 2882 | ||
2883 | <informalexample> | 2883 | <informalexample> |
2884 | <programlisting> | 2884 | <programlisting> |
2885 | <![CDATA[ | 2885 | <![CDATA[ |
2886 | switch (cmd) { | 2886 | switch (cmd) { |
2887 | case SNDRV_PCM_TRIGGER_START: | 2887 | case SNDRV_PCM_TRIGGER_START: |
2888 | // do something to start the PCM engine | 2888 | // do something to start the PCM engine |
2889 | break; | 2889 | break; |
2890 | case SNDRV_PCM_TRIGGER_STOP: | 2890 | case SNDRV_PCM_TRIGGER_STOP: |
2891 | // do something to stop the PCM engine | 2891 | // do something to stop the PCM engine |
2892 | break; | 2892 | break; |
2893 | default: | 2893 | default: |
2894 | return -EINVAL; | 2894 | return -EINVAL; |
2895 | } | 2895 | } |
2896 | ]]> | 2896 | ]]> |
2897 | </programlisting> | 2897 | </programlisting> |
2898 | </informalexample> | 2898 | </informalexample> |
2899 | </para> | 2899 | </para> |
2900 | 2900 | ||
2901 | <para> | 2901 | <para> |
2902 | When the pcm supports the pause operation (given in the info | 2902 | When the pcm supports the pause operation (given in the info |
2903 | field of the hardware table), the <constant>PAUSE_PUSH</constant> | 2903 | field of the hardware table), the <constant>PAUSE_PUSH</constant> |
2904 | and <constant>PAUSE_RELEASE</constant> commands must be | 2904 | and <constant>PAUSE_RELEASE</constant> commands must be |
2905 | handled here, too. The former is the command to pause the pcm, | 2905 | handled here, too. The former is the command to pause the pcm, |
2906 | and the latter to restart it. | 2906 | and the latter to restart it. |
2907 | </para> | 2907 | </para> |
2908 | 2908 | ||
2909 | <para> | 2909 | <para> |
2910 | When the pcm supports the suspend/resume operation, | 2910 | When the pcm supports the suspend/resume operation, |
2911 | regardless of full or partial suspend/resume support, | 2911 | regardless of full or partial suspend/resume support, |
2912 | <constant>SUSPEND</constant> and <constant>RESUME</constant> | 2912 | <constant>SUSPEND</constant> and <constant>RESUME</constant> |
2913 | commands must be handled, too. | 2913 | commands must be handled, too. |
2914 | These commands are issued when the power-management status is | 2914 | These commands are issued when the power-management status is |
2915 | changed. The <constant>SUSPEND</constant> and | 2915 | changed. The <constant>SUSPEND</constant> and |
2916 | <constant>RESUME</constant> commands | 2916 | <constant>RESUME</constant> commands |
2917 | suspend and resume the pcm substream, and usually they | 2917 | suspend and resume the pcm substream, and usually they |
2918 | are identical to the <constant>STOP</constant> and | 2918 | are identical to the <constant>STOP</constant> and |
2919 | <constant>START</constant> commands, respectively. | 2919 | <constant>START</constant> commands, respectively. |
2920 | See the <link linkend="power-management"><citetitle> | 2920 | See the <link linkend="power-management"><citetitle> |
2921 | Power Management</citetitle></link> section for details. | 2921 | Power Management</citetitle></link> section for details. |
2922 | </para> | 2922 | </para> |
2923 | 2923 | ||
2924 | <para> | 2924 | <para> |
2925 | As mentioned, this callback is atomic. You cannot call | 2925 | As mentioned, this callback is atomic. You cannot call |
2926 | functions that may sleep. | 2926 | functions that may sleep. |
2927 | The trigger callback should be as minimal as possible, | 2927 | The trigger callback should be as minimal as possible, |
2928 | just really triggering the DMA. Everything else should be | 2928 | just really triggering the DMA. Everything else should be |
2929 | initialized properly in the hw_params and prepare callbacks | 2929 | initialized properly in the hw_params and prepare callbacks |
2930 | beforehand. | 2930 | beforehand. |
2931 | </para> | 2931 | </para> |
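<para>
A trigger handler that also covers the pause and suspend/resume commands might be shaped like the user-space sketch below. The <constant>MODEL_TRIGGER_XXX</constant> enumerators are local stand-ins for the <constant>SNDRV_PCM_TRIGGER_XXX</constant> constants, and <structname>model_chip</structname> with its dma_running flag is a hypothetical substitute for real register writes. Treating pause exactly like stop is a simplification; hardware with real pause support would keep its position.
</para>

```c
#include <errno.h>

/* Local stand-ins for the SNDRV_PCM_TRIGGER_XXX constants. */
enum model_trigger {
	MODEL_TRIGGER_STOP,
	MODEL_TRIGGER_START,
	MODEL_TRIGGER_PAUSE_PUSH,
	MODEL_TRIGGER_PAUSE_RELEASE,
	MODEL_TRIGGER_SUSPEND,
	MODEL_TRIGGER_RESUME
};

/* Hypothetical chip record; the flag replaces a real DMA control bit. */
struct model_chip {
	int dma_running;
};

/* Does nothing but flip the DMA state, as a trigger callback should.
 * No allocation and no sleeping calls appear here. */
static int model_trigger(struct model_chip *chip, int cmd)
{
	switch (cmd) {
	case MODEL_TRIGGER_START:
	case MODEL_TRIGGER_PAUSE_RELEASE:
	case MODEL_TRIGGER_RESUME:
		chip->dma_running = 1;  /* start the PCM engine */
		break;
	case MODEL_TRIGGER_STOP:
	case MODEL_TRIGGER_PAUSE_PUSH:
	case MODEL_TRIGGER_SUSPEND:
		chip->dma_running = 0;  /* stop the PCM engine */
		break;
	default:
		return -EINVAL;         /* unknown command */
	}
	return 0;
}
```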
2932 | </section> | 2932 | </section> |
2933 | 2933 | ||
2934 | <section id="pcm-interface-operators-pointer-callback"> | 2934 | <section id="pcm-interface-operators-pointer-callback"> |
2935 | <title>pointer callback</title> | 2935 | <title>pointer callback</title> |
2936 | <para> | 2936 | <para> |
2937 | <informalexample> | 2937 | <informalexample> |
2938 | <programlisting> | 2938 | <programlisting> |
2939 | <![CDATA[ | 2939 | <![CDATA[ |
2940 | static snd_pcm_uframes_t snd_xxx_pointer(struct snd_pcm_substream *substream); | 2940 | static snd_pcm_uframes_t snd_xxx_pointer(struct snd_pcm_substream *substream); |
2941 | ]]> | 2941 | ]]> |
2942 | </programlisting> | 2942 | </programlisting> |
2943 | </informalexample> | 2943 | </informalexample> |
2944 | 2944 | ||
2945 | This callback is called when the PCM middle layer inquires | 2945 | This callback is called when the PCM middle layer inquires |
2946 | the current hardware position on the buffer. The position must | 2946 | the current hardware position on the buffer. The position must |
2947 | be returned in frames (which was in bytes on ALSA 0.5.x), | 2947 | be returned in frames (which was in bytes on ALSA 0.5.x), |
2948 | ranging from 0 to buffer_size - 1. | 2948 | ranging from 0 to buffer_size - 1. |
2949 | </para> | 2949 | </para> |
2950 | 2950 | ||
2951 | <para> | 2951 | <para> |
2952 | This is usually called from the buffer-update routine in the | 2952 | This is usually called from the buffer-update routine in the |
2953 | pcm middle layer, which is invoked when | 2953 | pcm middle layer, which is invoked when |
2954 | <function>snd_pcm_period_elapsed()</function> is called in the | 2954 | <function>snd_pcm_period_elapsed()</function> is called in the |
2955 | interrupt routine. Then the pcm middle layer updates the | 2955 | interrupt routine. Then the pcm middle layer updates the |
2956 | position and calculates the available space, and wakes up the | 2956 | position and calculates the available space, and wakes up the |
2957 | sleeping poll threads, etc. | 2957 | sleeping poll threads, etc. |
2958 | </para> | 2958 | </para> |
2959 | 2959 | ||
2960 | <para> | 2960 | <para> |
2961 | This callback is also atomic. | 2961 | This callback is also atomic. |
2962 | </para> | 2962 | </para> |
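<para>
Many chips report their position in bytes, so the value has to be converted before it is returned. The user-space sketch below assumes a hypothetical hardware byte counter and shows the conversion to frames together with the wrap into the range 0 to buffer_size - 1. <function>model_pointer()</function> and its parameters are illustrative names, not part of the ALSA API.
</para>

```c
/* Converts a hardware position in bytes to a position in frames.
 * hw_byte_pos: byte offset read from the (hypothetical) hardware
 * frame_bytes: bytes per frame, i.e. channels * bytes per sample
 * buffer_size: buffer length in frames
 * The result always lies in 0 .. buffer_size - 1. */
static unsigned long model_pointer(unsigned long hw_byte_pos,
				   unsigned int frame_bytes,
				   unsigned long buffer_size)
{
	unsigned long frames = hw_byte_pos / frame_bytes;

	/* A position equal to the buffer end wraps back to 0. */
	return frames % buffer_size;
}
```

<para>
The function only does arithmetic, which fits the requirement that the pointer callback be atomic.
</para>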
2963 | </section> | 2963 | </section> |
2964 | 2964 | ||
2965 | <section id="pcm-interface-operators-copy-silence"> | 2965 | <section id="pcm-interface-operators-copy-silence"> |
2966 | <title>copy and silence callbacks</title> | 2966 | <title>copy and silence callbacks</title> |
2967 | <para> | 2967 | <para> |
2968 | These callbacks are not mandatory, and can be omitted in | 2968 | These callbacks are not mandatory, and can be omitted in |
2969 | most cases. These callbacks are used when the hardware buffer | 2969 | most cases. These callbacks are used when the hardware buffer |
2970 | cannot be on the normal memory space. Some chips have their | 2970 | cannot be on the normal memory space. Some chips have their |
2971 | own buffer on the hardware which is not mappable. In such a | 2971 | own buffer on the hardware which is not mappable. In such a |
2972 | case, you have to transfer the data manually from the memory | 2972 | case, you have to transfer the data manually from the memory |
2973 | buffer to the hardware buffer. Or, if the buffer is | 2973 | buffer to the hardware buffer. Or, if the buffer is |
2974 | non-contiguous on both physical and virtual memory spaces, | 2974 | non-contiguous on both physical and virtual memory spaces, |
2975 | these callbacks must be defined, too. | 2975 | these callbacks must be defined, too. |
2976 | </para> | 2976 | </para> |
2977 | 2977 | ||
2978 | <para> | 2978 | <para> |
2979 | If these two callbacks are defined, copy and set-silence | 2979 | If these two callbacks are defined, copy and set-silence |
2980 | operations are done by them. The details are described in | 2980 | operations are done by them. The details are described in |
2981 | the later section <link | 2981 | the later section <link |
2982 | linkend="buffer-and-memory"><citetitle>Buffer and Memory | 2982 | linkend="buffer-and-memory"><citetitle>Buffer and Memory |
2983 | Management</citetitle></link>. | 2983 | Management</citetitle></link>. |
2984 | </para> | 2984 | </para> |
2985 | </section> | 2985 | </section> |
2986 | 2986 | ||
2987 | <section id="pcm-interface-operators-ack"> | 2987 | <section id="pcm-interface-operators-ack"> |
2988 | <title>ack callback</title> | 2988 | <title>ack callback</title> |
2989 | <para> | 2989 | <para> |
2990 | This callback is also not mandatory. This callback is called | 2990 | This callback is also not mandatory. This callback is called |
2991 | when the appl_ptr is updated in read or write operations. | 2991 | when the appl_ptr is updated in read or write operations. |
2992 | Some drivers like emu10k1-fx and cs46xx need to track the | 2992 | Some drivers like emu10k1-fx and cs46xx need to track the |
2993 | current appl_ptr for the internal buffer, and this callback | 2993 | current appl_ptr for the internal buffer, and this callback |
2994 | is useful only for such a purpose. | 2994 | is useful only for such a purpose. |
2995 | </para> | 2995 | </para> |
2996 | <para> | 2996 | <para> |
2997 | This callback is atomic. | 2997 | This callback is atomic. |
2998 | </para> | 2998 | </para> |
2999 | </section> | 2999 | </section> |
3000 | 3000 | ||
3001 | <section id="pcm-interface-operators-page-callback"> | 3001 | <section id="pcm-interface-operators-page-callback"> |
3002 | <title>page callback</title> | 3002 | <title>page callback</title> |
3003 | 3003 | ||
3004 | <para> | 3004 | <para> |
3005 | This callback is also not mandatory. This callback is used | 3005 | This callback is also not mandatory. This callback is used |
3006 | mainly for the non-contiguous buffer. The mmap calls this | 3006 | mainly for the non-contiguous buffer. The mmap calls this |
3007 | callback to get the page address. Some examples will be | 3007 | callback to get the page address. Some examples will be |
3008 | explained in the later section <link | 3008 | explained in the later section <link |
3009 | linkend="buffer-and-memory"><citetitle>Buffer and Memory | 3009 | linkend="buffer-and-memory"><citetitle>Buffer and Memory |
3010 | Management</citetitle></link>, too. | 3010 | Management</citetitle></link>, too. |
3011 | </para> | 3011 | </para> |
3012 | </section> | 3012 | </section> |
3013 | </section> | 3013 | </section> |
3014 | 3014 | ||
3015 | <section id="pcm-interface-interrupt-handler"> | 3015 | <section id="pcm-interface-interrupt-handler"> |
3016 | <title>Interrupt Handler</title> | 3016 | <title>Interrupt Handler</title> |
3017 | <para> | 3017 | <para> |
3018 | The remaining piece of the pcm code is the PCM interrupt handler. The | 3018 | The remaining piece of the pcm code is the PCM interrupt handler. The |
3019 | role of the PCM interrupt handler in the sound driver is to update | 3019 | role of the PCM interrupt handler in the sound driver is to update |
3020 | the buffer position and to tell the PCM middle layer when the | 3020 | the buffer position and to tell the PCM middle layer when the |
3021 | buffer position goes across the prescribed period size. To | 3021 | buffer position goes across the prescribed period size. To |
3022 | do so, call the <function>snd_pcm_period_elapsed()</function> | 3022 | do so, call the <function>snd_pcm_period_elapsed()</function> |
3023 | function. | 3023 | function. |
3024 | </para> | 3024 | </para> |
3025 | 3025 | ||
3026 | <para> | 3026 | <para> |
3027 | Sound chips generate interrupts in several different ways. | 3027 | Sound chips generate interrupts in several different ways. |
3028 | </para> | 3028 | </para> |
3029 | 3029 | ||
3030 | <section id="pcm-interface-interrupt-handler-boundary"> | 3030 | <section id="pcm-interface-interrupt-handler-boundary"> |
3031 | <title>Interrupts at the period (fragment) boundary</title> | 3031 | <title>Interrupts at the period (fragment) boundary</title> |
3032 | <para> | 3032 | <para> |
3033 | This is the most frequently found type: the hardware | 3033 | This is the most frequently found type: the hardware |
3034 | generates an interrupt at each period boundary. | 3034 | generates an interrupt at each period boundary. |
3035 | In this case, you can call | 3035 | In this case, you can call |
3036 | <function>snd_pcm_period_elapsed()</function> at each | 3036 | <function>snd_pcm_period_elapsed()</function> at each |
3037 | interrupt. | 3037 | interrupt. |
3038 | </para> | 3038 | </para> |
3039 | 3039 | ||
3040 | <para> | 3040 | <para> |
3041 | <function>snd_pcm_period_elapsed()</function> takes the | 3041 | <function>snd_pcm_period_elapsed()</function> takes the |
3042 | substream pointer as its argument. Thus, you need to keep the | 3042 | substream pointer as its argument. Thus, you need to keep the |
3043 | substream pointer accessible from the chip instance. For | 3043 | substream pointer accessible from the chip instance. For |
3044 | example, define a substream field in the chip record to hold the | 3044 | example, define a substream field in the chip record to hold the |
3045 | current running substream pointer, and set the pointer value | 3045 | current running substream pointer, and set the pointer value |
3046 | in the open callback (and reset it in the close callback). | 3046 | in the open callback (and reset it in the close callback). |
3047 | </para> | 3047 | </para> |
3048 | 3048 | ||
3049 | <para> | 3049 | <para> |
3050 | If you acquire a spinlock in the interrupt handler, and the | 3050 | If you acquire a spinlock in the interrupt handler, and the |
3051 | lock is used in other pcm callbacks, too, then you have to | 3051 | lock is used in other pcm callbacks, too, then you have to |
3052 | release the lock before calling | 3052 | release the lock before calling |
3053 | <function>snd_pcm_period_elapsed()</function>, because | 3053 | <function>snd_pcm_period_elapsed()</function>, because |
3054 | <function>snd_pcm_period_elapsed()</function> calls other pcm | 3054 | <function>snd_pcm_period_elapsed()</function> calls other pcm |
3055 | callbacks inside. | 3055 | callbacks inside. |
3056 | </para> | 3056 | </para> |
3057 | 3057 | ||
3058 | <para> | 3058 | <para> |
3059 | Typical code would look like this: | 3059 | Typical code would look like this: |
3060 | 3060 | ||
3061 | <example> | 3061 | <example> |
3062 | <title>Interrupt Handler Case #1</title> | 3062 | <title>Interrupt Handler Case #1</title> |
3063 | <programlisting> | 3063 | <programlisting> |
3064 | <![CDATA[ | 3064 | <![CDATA[ |
3065 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, | 3065 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, |
3066 | struct pt_regs *regs) | 3066 | struct pt_regs *regs) |
3067 | { | 3067 | { |
3068 | struct mychip *chip = dev_id; | 3068 | struct mychip *chip = dev_id; |
3069 | spin_lock(&chip->lock); | 3069 | spin_lock(&chip->lock); |
3070 | .... | 3070 | .... |
3071 | if (pcm_irq_invoked(chip)) { | 3071 | if (pcm_irq_invoked(chip)) { |
3072 | /* call updater, unlock before it */ | 3072 | /* call updater, unlock before it */ |
3073 | spin_unlock(&chip->lock); | 3073 | spin_unlock(&chip->lock); |
3074 | snd_pcm_period_elapsed(chip->substream); | 3074 | snd_pcm_period_elapsed(chip->substream); |
3075 | spin_lock(&chip->lock); | 3075 | spin_lock(&chip->lock); |
3076 | // acknowledge the interrupt if necessary | 3076 | // acknowledge the interrupt if necessary |
3077 | } | 3077 | } |
3078 | .... | 3078 | .... |
3079 | spin_unlock(&chip->lock); | 3079 | spin_unlock(&chip->lock); |
3080 | return IRQ_HANDLED; | 3080 | return IRQ_HANDLED; |
3081 | } | 3081 | } |
3082 | ]]> | 3082 | ]]> |
3083 | </programlisting> | 3083 | </programlisting> |
3084 | </example> | 3084 | </example> |
3085 | </para> | 3085 | </para> |
3086 | </section> | 3086 | </section> |
3087 | 3087 | ||
3088 | <section id="pcm-interface-interrupt-handler-timer"> | 3088 | <section id="pcm-interface-interrupt-handler-timer"> |
3089 | <title>High-frequency timer interrupts</title> | 3089 | <title>High-frequency timer interrupts</title> |
3090 | <para> | 3090 | <para> |
3091 | This is the case when the hardware doesn't generate interrupts | 3091 | This is the case when the hardware doesn't generate interrupts |
3092 | at the period boundary but issues timer interrupts at a fixed | 3092 | at the period boundary but issues timer interrupts at a fixed |
3093 | timer rate (e.g. es1968 or ymfpci drivers). | 3093 | timer rate (e.g. es1968 or ymfpci drivers). |
3094 | In this case, you need to check the current hardware | 3094 | In this case, you need to check the current hardware |
3095 | position and accumulate the processed sample length at each | 3095 | position and accumulate the processed sample length at each |
3096 | interrupt. When the accumulated size reaches or exceeds the period | 3096 | interrupt. When the accumulated size reaches or exceeds the period |
3097 | size, call | 3097 | size, call |
3098 | <function>snd_pcm_period_elapsed()</function> and reset the | 3098 | <function>snd_pcm_period_elapsed()</function> and reset the |
3099 | accumulator. | 3099 | accumulator. |
3100 | </para> | 3100 | </para> |
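<para>
The accumulation logic can be tried out in user space before it goes into an interrupt handler. The sketch below is a model under stated assumptions: <structname>model_timer_chip</structname> and <function>model_timer_irq()</function> are hypothetical names, the hardware pointer is passed in as a plain argument, and the return value stands in for the point where a driver would call <function>snd_pcm_period_elapsed()</function>. The accumulator is reduced with a modulo rather than set to zero, so frames beyond the boundary carry over to the next period.
</para>

```c
/* Hypothetical per-chip state for the timer-interrupt case. */
struct model_timer_chip {
	unsigned int last_ptr; /* hardware position at the last interrupt */
	unsigned int size;     /* frames accumulated since the last period */
};

/* Processes one timer interrupt.
 * hw_ptr: current hardware position in frames, below buffer_size
 * Returns 1 when a period boundary was reached, 0 otherwise. */
static int model_timer_irq(struct model_timer_chip *chip,
			   unsigned int hw_ptr,
			   unsigned int buffer_size,
			   unsigned int period_size)
{
	unsigned int delta;

	/* Frames processed since the last update; handle buffer wrap. */
	if (hw_ptr < chip->last_ptr)
		delta = buffer_size + hw_ptr - chip->last_ptr;
	else
		delta = hw_ptr - chip->last_ptr;

	chip->last_ptr = hw_ptr;  /* remember the last updated point */
	chip->size += delta;      /* accumulate the size */

	if (chip->size >= period_size) {
		chip->size %= period_size; /* keep the remainder */
		return 1;
	}
	return 0;
}
```

<para>
Note that this model reports at most one boundary per call, even if the timer fires so rarely that several periods have passed.
</para>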
3101 | 3101 | ||
3102 | <para> | 3102 | <para> |
3103 | Typical code would look like the following. | 3103 | Typical code would look like the following. |
3104 | 3104 | ||
3105 | <example> | 3105 | <example> |
3106 | <title>Interrupt Handler Case #2</title> | 3106 | <title>Interrupt Handler Case #2</title> |
3107 | <programlisting> | 3107 | <programlisting> |
3108 | <![CDATA[ | 3108 | <![CDATA[ |
3109 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, | 3109 | static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id, |
3110 | struct pt_regs *regs) | 3110 | struct pt_regs *regs) |
3111 | { | 3111 | { |
3112 | struct mychip *chip = dev_id; | 3112 | struct mychip *chip = dev_id; |
3113 | spin_lock(&chip->lock); | 3113 | spin_lock(&chip->lock); |
3114 | .... | 3114 | .... |
3115 | if (pcm_irq_invoked(chip)) { | 3115 | if (pcm_irq_invoked(chip)) { |
3116 | unsigned int last_ptr, size; | 3116 | unsigned int last_ptr, size; |
3117 | /* get the current hardware pointer (in frames) */ | 3117 | /* get the current hardware pointer (in frames) */ |
3118 | last_ptr = get_hw_ptr(chip); | 3118 | last_ptr = get_hw_ptr(chip); |
3119 | /* calculate the processed frames since the | 3119 | /* calculate the processed frames since the |
3120 | * last update | 3120 | * last update |
3121 | */ | 3121 | */ |
3122 | if (last_ptr < chip->last_ptr) | 3122 | if (last_ptr < chip->last_ptr) |
3123 | size = runtime->buffer_size + last_ptr | 3123 | size = runtime->buffer_size + last_ptr |
3124 | - chip->last_ptr; | 3124 | - chip->last_ptr; |
3125 | else | 3125 | else |
3126 | size = last_ptr - chip->last_ptr; | 3126 | size = last_ptr - chip->last_ptr; |
3127 | /* remember the last updated point */ | 3127 | /* remember the last updated point */ |
3128 | chip->last_ptr = last_ptr; | 3128 | chip->last_ptr = last_ptr; |
3129 | /* accumulate the size */ | 3129 | /* accumulate the size */ |
3130 | chip->size += size; | 3130 | chip->size += size; |
3131 | /* over the period boundary? */ | 3131 | /* over the period boundary? */ |
3132 | if (chip->size >= runtime->period_size) { | 3132 | if (chip->size >= runtime->period_size) { |
3133 | /* reset the accumulator */ | 3133 | /* reset the accumulator */ |
3134 | chip->size %= runtime->period_size; | 3134 | chip->size %= runtime->period_size; |
3135 | /* call updater */ | 3135 | /* call updater */ |
3136 | spin_unlock(&chip->lock); | 3136 | spin_unlock(&chip->lock); |
3137 | snd_pcm_period_elapsed(substream); | 3137 | snd_pcm_period_elapsed(substream); |
3138 | spin_lock(&chip->lock); | 3138 | spin_lock(&chip->lock); |
3139 | } | 3139 | } |
3140 | // acknowledge the interrupt if necessary | 3140 | // acknowledge the interrupt if necessary |
3141 | } | 3141 | } |
3142 | .... | 3142 | .... |
3143 | spin_unlock(&chip->lock); | 3143 | spin_unlock(&chip->lock); |
3144 | return IRQ_HANDLED; | 3144 | return IRQ_HANDLED; |
3145 | } | 3145 | } |
3146 | ]]> | 3146 | ]]> |
3147 | </programlisting> | 3147 | </programlisting> |
3148 | </example> | 3148 | </example> |
3149 | </para> | 3149 | </para> |
3150 | </section> | 3150 | </section> |
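The accumulator logic of Case #2 can be tried outside the kernel. The following is a minimal user-space sketch, not driver code: `struct timer_acc` and `acc_update()` are hypothetical names introduced here to stand in for `chip->last_ptr`, `chip->size` and the body of the interrupt handler. The function returns 1 at the point where a driver would call `snd_pcm_period_elapsed()` once.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-chip state used in the handler. */
struct timer_acc {
	unsigned int buffer_size;	/* ring buffer size in frames */
	unsigned int period_size;	/* period size in frames */
	unsigned int last_ptr;		/* hardware pointer at the last update */
	unsigned int size;		/* frames accumulated since the last period */
};

/*
 * Feed the current hardware pointer (in frames).  Returns 1 when at
 * least one period boundary was crossed, 0 otherwise.
 */
static int acc_update(struct timer_acc *acc, unsigned int hw_ptr)
{
	unsigned int delta;

	assert(acc->period_size > 0);	/* avoid a modulo by zero below */

	if (hw_ptr < acc->last_ptr)	/* the pointer wrapped around */
		delta = acc->buffer_size + hw_ptr - acc->last_ptr;
	else
		delta = hw_ptr - acc->last_ptr;

	acc->last_ptr = hw_ptr;		/* remember the last updated point */
	acc->size += delta;		/* accumulate the processed frames */

	if (acc->size >= acc->period_size) {
		/* keep the remainder so no frames are lost */
		acc->size %= acc->period_size;
		return 1;
	}
	return 0;
}
```

With a 1024-frame buffer and a 256-frame period, feeding the pointers 100, 300, 900 and 50 reports no period for the first value and one elapsed period for each of the others, including the last one where the pointer has wrapped.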
3151 | 3151 | ||
3152 | <section id="pcm-interface-interrupt-handler-both"> | 3152 | <section id="pcm-interface-interrupt-handler-both"> |
3153 | <title>On calling <function>snd_pcm_period_elapsed()</function></title> | 3153 | <title>On calling <function>snd_pcm_period_elapsed()</function></title> |
3154 | <para> | 3154 | <para> |
3155 | In both cases, even if more than one period has elapsed, you | 3155 | In both cases, even if more than one period has elapsed, you |
3156 | don't have to call | 3156 | don't have to call |
3157 | <function>snd_pcm_period_elapsed()</function> many times. Call | 3157 | <function>snd_pcm_period_elapsed()</function> many times. Call |
3158 | it only once, and the PCM layer will check the current hardware | 3158 | it only once, and the PCM layer will check the current hardware |
3159 | pointer and update to the latest status. | 3159 | pointer and update to the latest status. |
3160 | </para> | 3160 | </para> |
3161 | </section> | 3161 | </section> |
3162 | </section> | 3162 | </section> |
3163 | 3163 | ||
3164 | <section id="pcm-interface-atomicity"> | 3164 | <section id="pcm-interface-atomicity"> |
3165 | <title>Atomicity</title> | 3165 | <title>Atomicity</title> |
3166 | <para> | 3166 | <para> |
3167 | One of the most important (and thus difficult to debug) problems | 3167 | One of the most important (and thus difficult to debug) problems |
3168 | in kernel programming is the race condition. | 3168 | in kernel programming is the race condition. |
3169 | In the Linux kernel, it's usually solved via spinlocks or | 3169 | In the Linux kernel, it's usually solved via spinlocks or |
3170 | semaphores. In general, if the race condition may | 3170 | semaphores. In general, if the race condition may |
3171 | happen in the interrupt handler, it's handled as atomic, and you | 3171 | happen in the interrupt handler, it's handled as atomic, and you |
3172 | have to use a spinlock to protect the critical section. If it | 3172 | have to use a spinlock to protect the critical section. If it |
3173 | never happens in the interrupt and it may take a relatively long | 3173 | never happens in the interrupt and it may take a relatively long |
3174 | time, you should use a semaphore. | 3174 | time, you should use a semaphore. |
3175 | </para> | 3175 | </para> |
3176 | 3176 | ||
3177 | <para> | 3177 | <para> |
3178 | As already seen, some pcm callbacks are atomic and some are | 3178 | As already seen, some pcm callbacks are atomic and some are |
3179 | not. For example, <parameter>hw_params</parameter> callback is | 3179 | not. For example, <parameter>hw_params</parameter> callback is |
3180 | non-atomic, while <parameter>trigger</parameter> callback is | 3180 | non-atomic, while <parameter>trigger</parameter> callback is |
3181 | atomic. This means the latter is called with a spinlock already | 3181 | atomic. This means the latter is called with a spinlock already |
3182 | held by the PCM middle layer. Please take this atomicity into | 3182 | held by the PCM middle layer. Please take this atomicity into |
3183 | account when you use a spinlock or a semaphore in the callbacks. | 3183 | account when you use a spinlock or a semaphore in the callbacks. |
3184 | </para> | 3184 | </para> |
3185 | 3185 | ||
3186 | <para> | 3186 | <para> |
3187 | In the atomic callbacks, you cannot use functions which may call | 3187 | In the atomic callbacks, you cannot use functions which may call |
3188 | <function>schedule</function> or go to | 3188 | <function>schedule</function> or go to |
3189 | <function>sleep</function>. Semaphores and mutexes can sleep, | 3189 | <function>sleep</function>. Semaphores and mutexes can sleep, |
3190 | and hence they cannot be used inside the atomic callbacks | 3190 | and hence they cannot be used inside the atomic callbacks |
3191 | (e.g. <parameter>trigger</parameter> callback). | 3191 | (e.g. <parameter>trigger</parameter> callback). |
3192 | To insert a delay in such a callback, use | 3192 | To insert a delay in such a callback, use |
3193 | <function>udelay()</function> or <function>mdelay()</function>. | 3193 | <function>udelay()</function> or <function>mdelay()</function>. |
3194 | </para> | 3194 | </para> |
3195 | 3195 | ||
3196 | <para> | 3196 | <para> |
3197 | All three atomic callbacks (trigger, pointer, and ack) are | 3197 | All three atomic callbacks (trigger, pointer, and ack) are |
3198 | called with local interrupts disabled. | 3198 | called with local interrupts disabled. |
3199 | </para> | 3199 | </para> |
3200 | 3200 | ||
3201 | </section> | 3201 | </section> |
3202 | <section id="pcm-interface-constraints"> | 3202 | <section id="pcm-interface-constraints"> |
3203 | <title>Constraints</title> | 3203 | <title>Constraints</title> |
3204 | <para> | 3204 | <para> |
3205 | If your chip supports unconventional sample rates, or only a | 3205 | If your chip supports unconventional sample rates, or only a |
3206 | limited set of sample rates, you need to set a constraint for the | 3206 | limited set of sample rates, you need to set a constraint for the |
3207 | condition. | 3207 | condition. |
3208 | </para> | 3208 | </para> |
3209 | 3209 | ||
3210 | <para> | 3210 | <para> |
3211 | For example, in order to restrict the sample rates to a few | 3211 | For example, in order to restrict the sample rates to a few |
3212 | supported values, use | 3212 | supported values, use |
3213 | <function>snd_pcm_hw_constraint_list()</function>. | 3213 | <function>snd_pcm_hw_constraint_list()</function>. |
3214 | You need to call this function in the open callback. | 3214 | You need to call this function in the open callback. |
3215 | 3215 | ||
3216 | <example> | 3216 | <example> |
3217 | <title>Example of Hardware Constraints</title> | 3217 | <title>Example of Hardware Constraints</title> |
3218 | <programlisting> | 3218 | <programlisting> |
3219 | <![CDATA[ | 3219 | <![CDATA[ |
3220 | static unsigned int rates[] = | 3220 | static unsigned int rates[] = |
3221 | {4000, 10000, 22050, 44100}; | 3221 | {4000, 10000, 22050, 44100}; |
3222 | static struct snd_pcm_hw_constraint_list constraints_rates = { | 3222 | static struct snd_pcm_hw_constraint_list constraints_rates = { |
3223 | .count = ARRAY_SIZE(rates), | 3223 | .count = ARRAY_SIZE(rates), |
3224 | .list = rates, | 3224 | .list = rates, |
3225 | .mask = 0, | 3225 | .mask = 0, |
3226 | }; | 3226 | }; |
3227 | 3227 | ||
3228 | static int snd_mychip_pcm_open(struct snd_pcm_substream *substream) | 3228 | static int snd_mychip_pcm_open(struct snd_pcm_substream *substream) |
3229 | { | 3229 | { |
3230 | int err; | 3230 | int err; |
3231 | .... | 3231 | .... |
3232 | err = snd_pcm_hw_constraint_list(substream->runtime, 0, | 3232 | err = snd_pcm_hw_constraint_list(substream->runtime, 0, |
3233 | SNDRV_PCM_HW_PARAM_RATE, | 3233 | SNDRV_PCM_HW_PARAM_RATE, |
3234 | &constraints_rates); | 3234 | &constraints_rates); |
3235 | if (err < 0) | 3235 | if (err < 0) |
3236 | return err; | 3236 | return err; |
3237 | .... | 3237 | .... |
3238 | } | 3238 | } |
3239 | ]]> | 3239 | ]]> |
3240 | </programlisting> | 3240 | </programlisting> |
3241 | </example> | 3241 | </example> |
3242 | </para> | 3242 | </para> |
3243 | 3243 | ||
3244 | <para> | 3244 | <para> |
3245 | There are many different constraints. | 3245 | There are many different constraints. |
3246 | Look in <filename>sound/pcm.h</filename> for a complete list. | 3246 | Look in <filename>sound/pcm.h</filename> for a complete list. |
3247 | You can even define your own constraint rules. | 3247 | You can even define your own constraint rules. |
3248 | For example, let's suppose my_chip can manage a substream of 1 channel | 3248 | For example, let's suppose my_chip can manage a substream of 1 channel |
3249 | if and only if the format is S16_LE, otherwise it supports any format | 3249 | if and only if the format is S16_LE, otherwise it supports any format |
3250 | specified in the <structname>snd_pcm_hardware</structname> structure (or in any | 3250 | specified in the <structname>snd_pcm_hardware</structname> structure (or in any |
3251 | other constraint_list). You can build a rule like this: | 3251 | other constraint_list). You can build a rule like this: |
3252 | 3252 | ||
3253 | <example> | 3253 | <example> |
3254 | <title>Example of Hardware Constraints for Channels</title> | 3254 | <title>Example of Hardware Constraints for Channels</title> |
3255 | <programlisting> | 3255 | <programlisting> |
3256 | <![CDATA[ | 3256 | <![CDATA[ |
3257 | static int hw_rule_format_by_channels(struct snd_pcm_hw_params *params, | 3257 | static int hw_rule_format_by_channels(struct snd_pcm_hw_params *params, |
3258 | struct snd_pcm_hw_rule *rule) | 3258 | struct snd_pcm_hw_rule *rule) |
3259 | { | 3259 | { |
3260 | struct snd_interval *c = hw_param_interval(params, | 3260 | struct snd_interval *c = hw_param_interval(params, |
3261 | SNDRV_PCM_HW_PARAM_CHANNELS); | 3261 | SNDRV_PCM_HW_PARAM_CHANNELS); |
3262 | struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); | 3262 | struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); |
3263 | struct snd_mask fmt; | 3263 | struct snd_mask fmt; |
3264 | 3264 | ||
3265 | snd_mask_any(&fmt); /* Init the struct */ | 3265 | snd_mask_any(&fmt); /* Init the struct */ |
3266 | if (c->min < 2) { | 3266 | if (c->min < 2) { |
3267 | fmt.bits[0] &= SNDRV_PCM_FMTBIT_S16_LE; | 3267 | fmt.bits[0] &= SNDRV_PCM_FMTBIT_S16_LE; |
3268 | return snd_mask_refine(f, &fmt); | 3268 | return snd_mask_refine(f, &fmt); |
3269 | } | 3269 | } |
3270 | return 0; | 3270 | return 0; |
3271 | } | 3271 | } |
3272 | ]]> | 3272 | ]]> |
3273 | </programlisting> | 3273 | </programlisting> |
3274 | </example> | 3274 | </example> |
3275 | </para> | 3275 | </para> |
3276 | 3276 | ||
3277 | <para> | 3277 | <para> |
3278 | Then you need to call this function to add your rule: | 3278 | Then you need to call this function to add your rule: |
3279 | 3279 | ||
3280 | <informalexample> | 3280 | <informalexample> |
3281 | <programlisting> | 3281 | <programlisting> |
3282 | <![CDATA[ | 3282 | <![CDATA[ |
3283 | snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_FORMAT, | 3283 | snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_FORMAT, |
3284 | hw_rule_format_by_channels, 0, SNDRV_PCM_HW_PARAM_CHANNELS, | 3284 | hw_rule_format_by_channels, 0, SNDRV_PCM_HW_PARAM_CHANNELS, |
3285 | -1); | 3285 | -1); |
3286 | ]]> | 3286 | ]]> |
3287 | </programlisting> | 3287 | </programlisting> |
3288 | </informalexample> | 3288 | </informalexample> |
3289 | </para> | 3289 | </para> |
3290 | 3290 | ||
3291 | <para> | 3291 | <para> |
3292 | The rule function is called when an application sets the number of | 3292 | The rule function is called when an application sets the number of |
3293 | channels. But an application can set the format before the number of | 3293 | channels. But an application can set the format before the number of |
3294 | channels. Thus you also need to define the inverse rule: | 3294 | channels. Thus you also need to define the inverse rule: |
3295 | 3295 | ||
3296 | <example> | 3296 | <example> |
3297 | <title>Example of Hardware Constraints for Channels</title> | 3297 | <title>Example of Hardware Constraints for Channels</title> |
3298 | <programlisting> | 3298 | <programlisting> |
3299 | <![CDATA[ | 3299 | <![CDATA[ |
3300 | static int hw_rule_channels_by_format(struct snd_pcm_hw_params *params, | 3300 | static int hw_rule_channels_by_format(struct snd_pcm_hw_params *params, |
3301 | struct snd_pcm_hw_rule *rule) | 3301 | struct snd_pcm_hw_rule *rule) |
3302 | { | 3302 | { |
3303 | struct snd_interval *c = hw_param_interval(params, | 3303 | struct snd_interval *c = hw_param_interval(params, |
3304 | SNDRV_PCM_HW_PARAM_CHANNELS); | 3304 | SNDRV_PCM_HW_PARAM_CHANNELS); |
3305 | struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); | 3305 | struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); |
3306 | struct snd_interval ch; | 3306 | struct snd_interval ch; |
3307 | 3307 | ||
3308 | snd_interval_any(&ch); | 3308 | snd_interval_any(&ch); |
3309 | if (f->bits[0] == SNDRV_PCM_FMTBIT_S16_LE) { | 3309 | if (f->bits[0] == SNDRV_PCM_FMTBIT_S16_LE) { |
3310 | ch.min = ch.max = 1; | 3310 | ch.min = ch.max = 1; |
3311 | ch.integer = 1; | 3311 | ch.integer = 1; |
3312 | return snd_interval_refine(c, &ch); | 3312 | return snd_interval_refine(c, &ch); |
3313 | } | 3313 | } |
3314 | return 0; | 3314 | return 0; |
3315 | } | 3315 | } |
3316 | ]]> | 3316 | ]]> |
3317 | </programlisting> | 3317 | </programlisting> |
3318 | </example> | 3318 | </example> |
3319 | </para> | 3319 | </para> |
3320 | 3320 | ||
3321 | <para> | 3321 | <para> |
3322 | ...and in the open callback: | 3322 | ...and in the open callback: |
3323 | <informalexample> | 3323 | <informalexample> |
3324 | <programlisting> | 3324 | <programlisting> |
3325 | <![CDATA[ | 3325 | <![CDATA[ |
3326 | snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_CHANNELS, | 3326 | snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_CHANNELS, |
3327 | hw_rule_channels_by_format, 0, SNDRV_PCM_HW_PARAM_FORMAT, | 3327 | hw_rule_channels_by_format, 0, SNDRV_PCM_HW_PARAM_FORMAT, |
3328 | -1); | 3328 | -1); |
3329 | ]]> | 3329 | ]]> |
3330 | </programlisting> | 3330 | </programlisting> |
3331 | </informalexample> | 3331 | </informalexample> |
3332 | </para> | 3332 | </para> |
3333 | 3333 | ||
3334 | <para> | 3334 | <para> |
3335 | I won't explain more details here, rather I | 3335 | I won't explain more details here, rather I |
3336 | would like to say, <quote>Luke, use the source.</quote> | 3336 | would like to say, <quote>Luke, use the source.</quote> |
3337 | </para> | 3337 | </para> |
3338 | </section> | 3338 | </section> |
3339 | 3339 | ||
3340 | </chapter> | 3340 | </chapter> |
3341 | 3341 | ||
3342 | 3342 | ||
3343 | <!-- ****************************************************** --> | 3343 | <!-- ****************************************************** --> |
3344 | <!-- Control Interface --> | 3344 | <!-- Control Interface --> |
3345 | <!-- ****************************************************** --> | 3345 | <!-- ****************************************************** --> |
3346 | <chapter id="control-interface"> | 3346 | <chapter id="control-interface"> |
3347 | <title>Control Interface</title> | 3347 | <title>Control Interface</title> |
3348 | 3348 | ||
3349 | <section id="control-interface-general"> | 3349 | <section id="control-interface-general"> |
3350 | <title>General</title> | 3350 | <title>General</title> |
3351 | <para> | 3351 | <para> |
3352 | The control interface is used widely for many switches, | 3352 | The control interface is used widely for many switches, |
3353 | sliders, etc. which are accessed from the user-space. Its most | 3353 | sliders, etc. which are accessed from the user-space. Its most |
3354 | important use is the mixer interface. In other words, on ALSA | 3354 | important use is the mixer interface. In other words, on ALSA |
3355 | 0.9.x, all the mixer stuff is implemented on the control kernel | 3355 | 0.9.x, all the mixer stuff is implemented on the control kernel |
3356 | API (while there was an independent mixer kernel API on 0.5.x). | 3356 | API (while there was an independent mixer kernel API on 0.5.x). |
3357 | </para> | 3357 | </para> |
3358 | 3358 | ||
3359 | <para> | 3359 | <para> |
3360 | ALSA has a well-defined AC97 control module. If your chip | 3360 | ALSA has a well-defined AC97 control module. If your chip |
3361 | supports only the AC97 and nothing else, you can skip this | 3361 | supports only the AC97 and nothing else, you can skip this |
3362 | section. | 3362 | section. |
3363 | </para> | 3363 | </para> |
3364 | 3364 | ||
3365 | <para> | 3365 | <para> |
3366 | The control API is defined in | 3366 | The control API is defined in |
3367 | <filename><sound/control.h></filename>. | 3367 | <filename><sound/control.h></filename>. |
3368 | Include this file if you add your own controls. | 3368 | Include this file if you add your own controls. |
3369 | </para> | 3369 | </para> |
3370 | </section> | 3370 | </section> |
3371 | 3371 | ||
3372 | <section id="control-interface-definition"> | 3372 | <section id="control-interface-definition"> |
3373 | <title>Definition of Controls</title> | 3373 | <title>Definition of Controls</title> |
3374 | <para> | 3374 | <para> |
3375 | For creating a new control, you need to define the three | 3375 | For creating a new control, you need to define the three |
3376 | callbacks: <structfield>info</structfield>, | 3376 | callbacks: <structfield>info</structfield>, |
3377 | <structfield>get</structfield> and | 3377 | <structfield>get</structfield> and |
3378 | <structfield>put</structfield>. Then, define a | 3378 | <structfield>put</structfield>. Then, define a |
3379 | struct <structname>snd_kcontrol_new</structname> record, such as: | 3379 | struct <structname>snd_kcontrol_new</structname> record, such as: |
3380 | 3380 | ||
3381 | <example> | 3381 | <example> |
3382 | <title>Definition of a Control</title> | 3382 | <title>Definition of a Control</title> |
3383 | <programlisting> | 3383 | <programlisting> |
3384 | <![CDATA[ | 3384 | <![CDATA[ |
3385 | static struct snd_kcontrol_new my_control __devinitdata = { | 3385 | static struct snd_kcontrol_new my_control __devinitdata = { |
3386 | .iface = SNDRV_CTL_ELEM_IFACE_MIXER, | 3386 | .iface = SNDRV_CTL_ELEM_IFACE_MIXER, |
3387 | .name = "PCM Playback Switch", | 3387 | .name = "PCM Playback Switch", |
3388 | .index = 0, | 3388 | .index = 0, |
3389 | .access = SNDRV_CTL_ELEM_ACCESS_READWRITE, | 3389 | .access = SNDRV_CTL_ELEM_ACCESS_READWRITE, |
3390 | .private_value = 0xffff, | 3390 | .private_value = 0xffff, |
3391 | .info = my_control_info, | 3391 | .info = my_control_info, |
3392 | .get = my_control_get, | 3392 | .get = my_control_get, |
3393 | .put = my_control_put | 3393 | .put = my_control_put |
3394 | }; | 3394 | }; |
3395 | ]]> | 3395 | ]]> |
3396 | </programlisting> | 3396 | </programlisting> |
3397 | </example> | 3397 | </example> |
3398 | </para> | 3398 | </para> |
3399 | 3399 | ||
3400 | <para> | 3400 | <para> |
3401 | Most likely the control is created via | 3401 | Most likely the control is created via |
3402 | <function>snd_ctl_new1()</function>, and in such a case, you can | 3402 | <function>snd_ctl_new1()</function>, and in such a case, you can |
3403 | add <parameter>__devinitdata</parameter> prefix to the | 3403 | add <parameter>__devinitdata</parameter> prefix to the |
3404 | definition like above. | 3404 | definition like above. |
3405 | </para> | 3405 | </para> |
3406 | 3406 | ||
3407 | <para> | 3407 | <para> |
3408 | The <structfield>iface</structfield> field specifies the type of | 3408 | The <structfield>iface</structfield> field specifies the type of |
3409 | the control, <constant>SNDRV_CTL_ELEM_IFACE_XXX</constant>, which | 3409 | the control, <constant>SNDRV_CTL_ELEM_IFACE_XXX</constant>, which |
3410 | is usually <constant>MIXER</constant>. | 3410 | is usually <constant>MIXER</constant>. |
3411 | Use <constant>CARD</constant> for global controls that are not | 3411 | Use <constant>CARD</constant> for global controls that are not |
3412 | logically part of the mixer. | 3412 | logically part of the mixer. |
3413 | If the control is closely associated with some specific device on | 3413 | If the control is closely associated with some specific device on |
3414 | the sound card, use <constant>HWDEP</constant>, | 3414 | the sound card, use <constant>HWDEP</constant>, |
3415 | <constant>PCM</constant>, <constant>RAWMIDI</constant>, | 3415 | <constant>PCM</constant>, <constant>RAWMIDI</constant>, |
3416 | <constant>TIMER</constant>, or <constant>SEQUENCER</constant>, and | 3416 | <constant>TIMER</constant>, or <constant>SEQUENCER</constant>, and |
3417 | specify the device number with the | 3417 | specify the device number with the |
3418 | <structfield>device</structfield> and | 3418 | <structfield>device</structfield> and |
3419 | <structfield>subdevice</structfield> fields. | 3419 | <structfield>subdevice</structfield> fields. |
3420 | </para> | 3420 | </para> |
3421 | 3421 | ||
3422 | <para> | 3422 | <para> |
3423 | The <structfield>name</structfield> is the name identifier | 3423 | The <structfield>name</structfield> is the name identifier |
3424 | string. On ALSA 0.9.x, the control name is very important, | 3424 | string. On ALSA 0.9.x, the control name is very important, |
3425 | because its role is classified from its name. There are | 3425 | because its role is classified from its name. There are |
3426 | pre-defined standard control names. The details are described in | 3426 | pre-defined standard control names. The details are described in |
3427 | the subsection | 3427 | the subsection |
3428 | <link linkend="control-interface-control-names"><citetitle> | 3428 | <link linkend="control-interface-control-names"><citetitle> |
3429 | Control Names</citetitle></link>. | 3429 | Control Names</citetitle></link>. |
3430 | </para> | 3430 | </para> |
3431 | 3431 | ||
3432 | <para> | 3432 | <para> |
3433 | The <structfield>index</structfield> field holds the index number | 3433 | The <structfield>index</structfield> field holds the index number |
3434 | of this control. If there are several different controls with | 3434 | of this control. If there are several different controls with |
3435 | the same name, they can be distinguished by the index | 3435 | the same name, they can be distinguished by the index |
3436 | number. This is the case when | 3436 | number. This is the case when |
3437 | several codecs exist on the card. If the index is zero, you can | 3437 | several codecs exist on the card. If the index is zero, you can |
3438 | omit the definition above. | 3438 | omit the definition above. |
3439 | </para> | 3439 | </para> |
3440 | 3440 | ||
3441 | <para> | 3441 | <para> |
3442 | The <structfield>access</structfield> field contains the access | 3442 | The <structfield>access</structfield> field contains the access |
3443 | type of this control. Give the combination of bit masks, | 3443 | type of this control. Give the combination of bit masks, |
3444 | <constant>SNDRV_CTL_ELEM_ACCESS_XXX</constant>, there. | 3444 | <constant>SNDRV_CTL_ELEM_ACCESS_XXX</constant>, there. |
3445 | The details are explained in the subsection | 3445 | The details are explained in the subsection |
3446 | <link linkend="control-interface-access-flags"><citetitle> | 3446 | <link linkend="control-interface-access-flags"><citetitle> |
3447 | Access Flags</citetitle></link>. | 3447 | Access Flags</citetitle></link>. |
3448 | </para> | 3448 | </para> |
3449 | 3449 | ||
3450 | <para> | 3450 | <para> |
3451 | The <structfield>private_value</structfield> field contains | 3451 | The <structfield>private_value</structfield> field contains |
3452 | an arbitrary long integer value for this record. When using | 3452 | an arbitrary long integer value for this record. When using |
3453 | generic <structfield>info</structfield>, | 3453 | generic <structfield>info</structfield>, |
3454 | <structfield>get</structfield> and | 3454 | <structfield>get</structfield> and |
3455 | <structfield>put</structfield> callbacks, you can pass a value | 3455 | <structfield>put</structfield> callbacks, you can pass a value |
3456 | through this field. If several small numbers are necessary, you can | 3456 | through this field. If several small numbers are necessary, you can |
3457 | combine them bitwise. It's also possible to store a pointer | 3457 | combine them bitwise. It's also possible to store a pointer |
3458 | (cast to unsigned long) to some record in this field. | 3458 | (cast to unsigned long) to some record in this field. |
3459 | </para> | 3459 | </para> |
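As an illustration of combining several small numbers bitwise in `private_value`, here is a minimal user-space sketch. The bit layout (register in bits 0-7, shift in bits 8-11, mask in bits 12-19) and all `my_ctl_*` names are assumptions made up for this example; they are not an ALSA convention, and a real driver is free to pick any layout that suits its hardware.

```c
#include <assert.h>

/*
 * Hypothetical layout of private_value:
 *   bits  0-7   register offset
 *   bits  8-11  bit shift of the field inside the register
 *   bits 12-19  mask of the field (after shifting)
 */
static unsigned long my_ctl_pack(unsigned int reg, unsigned int shift,
				 unsigned int mask)
{
	/* each value must fit into the bits reserved for it */
	assert(reg <= 0xff && shift <= 0xf && mask <= 0xff);
	return (unsigned long)reg |
	       ((unsigned long)shift << 8) |
	       ((unsigned long)mask << 12);
}

static unsigned int my_ctl_reg(unsigned long pv)
{
	return pv & 0xff;
}

static unsigned int my_ctl_shift(unsigned long pv)
{
	return (pv >> 8) & 0xf;
}

static unsigned int my_ctl_mask(unsigned long pv)
{
	return (pv >> 12) & 0xff;
}

/* Extract the control's field from a raw register value. */
static unsigned int my_ctl_extract(unsigned long pv, unsigned int regval)
{
	return (regval >> my_ctl_shift(pv)) & my_ctl_mask(pv);
}
```

A generic `get` callback written along these lines could unpack the three values from the record's `private_value` and serve many controls that differ only in register, shift and mask.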
3460 | 3460 | ||
3461 | <para> | 3461 | <para> |
3462 | The other three are | 3462 | The other three are |
3463 | <link linkend="control-interface-callbacks"><citetitle> | 3463 | <link linkend="control-interface-callbacks"><citetitle> |
3464 | callback functions</citetitle></link>. | 3464 | callback functions</citetitle></link>. |
3465 | </para> | 3465 | </para> |
3466 | </section> | 3466 | </section> |
3467 | 3467 | ||
3468 | <section id="control-interface-control-names"> | 3468 | <section id="control-interface-control-names"> |
3469 | <title>Control Names</title> | 3469 | <title>Control Names</title> |
3470 | <para> | 3470 | <para> |
3471 | There are some standards for defining the control names. A | 3471 | There are some standards for defining the control names. A |
3472 | control name is usually composed of three parts, as in | 3472 | control name is usually composed of three parts, as in |
3473 | <quote>SOURCE DIRECTION FUNCTION</quote>. | 3473 | <quote>SOURCE DIRECTION FUNCTION</quote>. |
3474 | </para> | 3474 | </para> |
3475 | 3475 | ||
3476 | <para> | 3476 | <para> |
3477 | The first, <constant>SOURCE</constant>, specifies the source | 3477 | The first, <constant>SOURCE</constant>, specifies the source |
3478 | of the control, and is a string such as <quote>Master</quote>, | 3478 | of the control, and is a string such as <quote>Master</quote>, |
3479 | <quote>PCM</quote>, <quote>CD</quote> or | 3479 | <quote>PCM</quote>, <quote>CD</quote> or |
3480 | <quote>Line</quote>. There are many pre-defined sources. | 3480 | <quote>Line</quote>. There are many pre-defined sources. |
3481 | </para> | 3481 | </para> |
3482 | 3482 | ||
3483 | <para> | 3483 | <para> |
3484 | The second, <constant>DIRECTION</constant>, is one of the | 3484 | The second, <constant>DIRECTION</constant>, is one of the |
3485 | following strings according to the direction of the control: | 3485 | following strings according to the direction of the control: |
3486 | <quote>Playback</quote>, <quote>Capture</quote>, <quote>Bypass | 3486 | <quote>Playback</quote>, <quote>Capture</quote>, <quote>Bypass |
3487 | Playback</quote> and <quote>Bypass Capture</quote>. Or, it can | 3487 | Playback</quote> and <quote>Bypass Capture</quote>. Or, it can |
3488 | be omitted, meaning both playback and capture directions. | 3488 | be omitted, meaning both playback and capture directions. |
3489 | </para> | 3489 | </para> |
3490 | 3490 | ||
3491 | <para> | 3491 | <para> |
3492 | The third, <constant>FUNCTION</constant>, is one of the | 3492 | The third, <constant>FUNCTION</constant>, is one of the |
3493 | following strings according to the function of the control: | 3493 | following strings according to the function of the control: |
3494 | <quote>Switch</quote>, <quote>Volume</quote> and | 3494 | <quote>Switch</quote>, <quote>Volume</quote> and |
3495 | <quote>Route</quote>. | 3495 | <quote>Route</quote>. |
3496 | </para> | 3496 | </para> |
3497 | 3497 | ||
3498 | <para> | 3498 | <para> |
3499 | Examples of control names are thus <quote>Master Capture | 3499 | Examples of control names are thus <quote>Master Capture |
3500 | Switch</quote> or <quote>PCM Playback Volume</quote>. | 3500 | Switch</quote> or <quote>PCM Playback Volume</quote>. |
3501 | </para> | 3501 | </para> |
3502 | 3502 | ||
3503 | <para> | 3503 | <para> |
3504 | There are some exceptions: | 3504 | There are some exceptions: |
3505 | </para> | 3505 | </para> |
3506 | 3506 | ||
3507 | <section id="control-interface-control-names-global"> | 3507 | <section id="control-interface-control-names-global"> |
3508 | <title>Global capture and playback</title> | 3508 | <title>Global capture and playback</title> |
3509 | <para> | 3509 | <para> |
3510 | <quote>Capture Source</quote>, <quote>Capture Switch</quote> | 3510 | <quote>Capture Source</quote>, <quote>Capture Switch</quote> |
3511 | and <quote>Capture Volume</quote> are used for the global | 3511 | and <quote>Capture Volume</quote> are used for the global |
3512 | capture (input) source, switch and volume. Similarly, | 3512 | capture (input) source, switch and volume. Similarly, |
3513 | <quote>Playback Switch</quote> and <quote>Playback | 3513 | <quote>Playback Switch</quote> and <quote>Playback |
3514 | Volume</quote> are used for the global output gain switch and | 3514 | Volume</quote> are used for the global output gain switch and |
3515 | volume. | 3515 | volume. |
3516 | </para> | 3516 | </para> |
3517 | </section> | 3517 | </section> |
3518 | 3518 | ||
3519 | <section id="control-interface-control-names-tone"> | 3519 | <section id="control-interface-control-names-tone"> |
3520 | <title>Tone-controls</title> | 3520 | <title>Tone-controls</title> |
3521 | <para> | 3521 | <para> |
3522 | Tone-control switches and volumes are specified like | 3522 | Tone-control switches and volumes are specified like |
3523 | <quote>Tone Control - XXX</quote>, e.g. <quote>Tone Control - | 3523 | <quote>Tone Control - XXX</quote>, e.g. <quote>Tone Control - |
3524 | Switch</quote>, <quote>Tone Control - Bass</quote>, | 3524 | Switch</quote>, <quote>Tone Control - Bass</quote>, |
3525 | <quote>Tone Control - Center</quote>. | 3525 | <quote>Tone Control - Center</quote>. |
3526 | </para> | 3526 | </para> |
3527 | </section> | 3527 | </section> |
3528 | 3528 | ||
3529 | <section id="control-interface-control-names-3d"> | 3529 | <section id="control-interface-control-names-3d"> |
3530 | <title>3D controls</title> | 3530 | <title>3D controls</title> |
3531 | <para> | 3531 | <para> |
3532 | 3D-control switches and volumes are specified like <quote>3D | 3532 | 3D-control switches and volumes are specified like <quote>3D |
3533 | Control - XXX</quote>, e.g. <quote>3D Control - | 3533 | Control - XXX</quote>, e.g. <quote>3D Control - |
3534 | Switch</quote>, <quote>3D Control - Center</quote>, <quote>3D | 3534 | Switch</quote>, <quote>3D Control - Center</quote>, <quote>3D |
3535 | Control - Space</quote>. | 3535 | Control - Space</quote>. |
3536 | </para> | 3536 | </para> |
3537 | </section> | 3537 | </section> |
3538 | 3538 | ||
3539 | <section id="control-interface-control-names-mic"> | 3539 | <section id="control-interface-control-names-mic"> |
3540 | <title>Mic boost</title> | 3540 | <title>Mic boost</title> |
3541 | <para> | 3541 | <para> |
3542 | The mic-boost switch is named <quote>Mic Boost</quote> or | 3542 | The mic-boost switch is named <quote>Mic Boost</quote> or |
3543 | <quote>Mic Boost (6dB)</quote>. | 3543 | <quote>Mic Boost (6dB)</quote>. |
3544 | </para> | 3544 | </para> |
3545 | 3545 | ||
3546 | <para> | 3546 | <para> |
3547 | More precise information can be found in | 3547 | More precise information can be found in |
3548 | <filename>Documentation/sound/alsa/ControlNames.txt</filename>. | 3548 | <filename>Documentation/sound/alsa/ControlNames.txt</filename>. |
3549 | </para> | 3549 | </para> |
3550 | </section> | 3550 | </section> |
3551 | </section> | 3551 | </section> |
3552 | 3552 | ||
3553 | <section id="control-interface-access-flags"> | 3553 | <section id="control-interface-access-flags"> |
3554 | <title>Access Flags</title> | 3554 | <title>Access Flags</title> |
3555 | 3555 | ||
3556 | <para> | 3556 | <para> |
3557 | The access flags are a bitmask that specifies the access type | 3557 | The access flags are a bitmask that specifies the access type |
3558 | of the given control. The default access type is | 3558 | of the given control. The default access type is |
3559 | <constant>SNDRV_CTL_ELEM_ACCESS_READWRITE</constant>, | 3559 | <constant>SNDRV_CTL_ELEM_ACCESS_READWRITE</constant>, |
3560 | which means that both reads and writes are allowed for this control. | 3560 | which means that both reads and writes are allowed for this control. |
3561 | When the access flag is omitted (i.e. = 0), it is | 3561 | When the access flag is omitted (i.e. = 0), it is |
3562 | treated as <constant>READWRITE</constant> access by default. | 3562 | treated as <constant>READWRITE</constant> access by default. |
3563 | </para> | 3563 | </para> |
3564 | 3564 | ||
3565 | <para> | 3565 | <para> |
3566 | When the control is read-only, pass | 3566 | When the control is read-only, pass |
3567 | <constant>SNDRV_CTL_ELEM_ACCESS_READ</constant> instead. | 3567 | <constant>SNDRV_CTL_ELEM_ACCESS_READ</constant> instead. |
3568 | In this case, you don't have to define the | 3568 | In this case, you don't have to define the |
3569 | <structfield>put</structfield> callback. | 3569 | <structfield>put</structfield> callback. |
3570 | Similarly, when the control is write-only (although that is a rare | 3570 | Similarly, when the control is write-only (although that is a rare |
3571 | case), you can use the <constant>WRITE</constant> flag instead, and | 3571 | case), you can use the <constant>WRITE</constant> flag instead, and |
3572 | you don't need the <structfield>get</structfield> callback. | 3572 | you don't need the <structfield>get</structfield> callback. |
3573 | </para> | 3573 | </para> |
3574 | 3574 | ||
3575 | <para> | 3575 | <para> |
3576 | If the control value changes frequently (e.g. the VU meter), | 3576 | If the control value changes frequently (e.g. the VU meter), |
3577 | the <constant>VOLATILE</constant> flag should be given. This means | 3577 | the <constant>VOLATILE</constant> flag should be given. This means |
3578 | that the control may be changed without | 3578 | that the control may be changed without |
3579 | <link linkend="control-interface-change-notification"><citetitle> | 3579 | <link linkend="control-interface-change-notification"><citetitle> |
3580 | notification</citetitle></link>. Applications should poll such | 3580 | notification</citetitle></link>. Applications should poll such |
3581 | a control constantly. | 3581 | a control constantly. |
3582 | </para> | 3582 | </para> |
3583 | 3583 | ||
3584 | <para> | 3584 | <para> |
3585 | When the control is inactive, set | 3585 | When the control is inactive, set |
3586 | the <constant>INACTIVE</constant> flag, too. | 3586 | the <constant>INACTIVE</constant> flag, too. |
3587 | There are <constant>LOCK</constant> and | 3587 | There are <constant>LOCK</constant> and |
3588 | <constant>OWNER</constant> flags for changing the write | 3588 | <constant>OWNER</constant> flags for changing the write |
3589 | permissions. | 3589 | permissions. |
3590 | </para> | 3590 | </para> |
3591 | 3591 | ||
3592 | </section> | 3592 | </section> |
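The defaulting rule from the Access Flags section can be sketched in plain user-space C. This is only an illustration under stated assumptions: the MOCK_ACCESS_* constants and the helper functions are hypothetical stand-ins, not the kernel definitions, and their numeric values are chosen for the example only.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SNDRV_CTL_ELEM_ACCESS_* bits. The values
 * are illustrative only; the real ones are defined in the ALSA headers. */
enum {
	MOCK_ACCESS_READ      = 1 << 0,
	MOCK_ACCESS_WRITE     = 1 << 1,
	MOCK_ACCESS_VOLATILE  = 1 << 2,
	MOCK_ACCESS_READWRITE = MOCK_ACCESS_READ | MOCK_ACCESS_WRITE,
	MOCK_ACCESS_KNOWN     = MOCK_ACCESS_READWRITE | MOCK_ACCESS_VOLATILE
};

/* An omitted access flag (0) is treated as READWRITE. */
static unsigned int effective_access(unsigned int access)
{
	/* reject bits this sketch does not model */
	assert((access & ~(unsigned int)MOCK_ACCESS_KNOWN) == 0);
	return access == 0 ? (unsigned int)MOCK_ACCESS_READWRITE : access;
}

/* A get callback is only needed when the control is readable. */
static int needs_get(unsigned int access)
{
	return (effective_access(access) & MOCK_ACCESS_READ) != 0;
}

/* A put callback is only needed when the control is writable. */
static int needs_put(unsigned int access)
{
	return (effective_access(access) & MOCK_ACCESS_WRITE) != 0;
}
```

With these helpers, an access value of 0 requires both callbacks, while a read-only control requires only get.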
3593 | 3593 | ||
3594 | <section id="control-interface-callbacks"> | 3594 | <section id="control-interface-callbacks"> |
3595 | <title>Callbacks</title> | 3595 | <title>Callbacks</title> |
3596 | 3596 | ||
3597 | <section id="control-interface-callbacks-info"> | 3597 | <section id="control-interface-callbacks-info"> |
3598 | <title>info callback</title> | 3598 | <title>info callback</title> |
3599 | <para> | 3599 | <para> |
3600 | The <structfield>info</structfield> callback is used to get | 3600 | The <structfield>info</structfield> callback is used to get |
3601 | detailed information about this control. It must store the | 3601 | detailed information about this control. It must store the |
3602 | values in the given struct <structname>snd_ctl_elem_info</structname> | 3602 | values in the given struct <structname>snd_ctl_elem_info</structname> |
3603 | object. For example, the callback for a boolean control with a single | 3603 | object. For example, the callback for a boolean control with a single |
3604 | element looks like this: | 3604 | element looks like this: |
3605 | 3605 | ||
3606 | <example> | 3606 | <example> |
3607 | <title>Example of info callback</title> | 3607 | <title>Example of info callback</title> |
3608 | <programlisting> | 3608 | <programlisting> |
3609 | <![CDATA[ | 3609 | <![CDATA[ |
3610 | static int snd_myctl_info(struct snd_kcontrol *kcontrol, | 3610 | static int snd_myctl_info(struct snd_kcontrol *kcontrol, |
3611 | struct snd_ctl_elem_info *uinfo) | 3611 | struct snd_ctl_elem_info *uinfo) |
3612 | { | 3612 | { |
3613 | uinfo->type = SNDRV_CTL_ELEM_TYPE_BOOLEAN; | 3613 | uinfo->type = SNDRV_CTL_ELEM_TYPE_BOOLEAN; |
3614 | uinfo->count = 1; | 3614 | uinfo->count = 1; |
3615 | uinfo->value.integer.min = 0; | 3615 | uinfo->value.integer.min = 0; |
3616 | uinfo->value.integer.max = 1; | 3616 | uinfo->value.integer.max = 1; |
3617 | return 0; | 3617 | return 0; |
3618 | } | 3618 | } |
3619 | ]]> | 3619 | ]]> |
3620 | </programlisting> | 3620 | </programlisting> |
3621 | </example> | 3621 | </example> |
3622 | </para> | 3622 | </para> |
3623 | 3623 | ||
3624 | <para> | 3624 | <para> |
3625 | The <structfield>type</structfield> field specifies the type | 3625 | The <structfield>type</structfield> field specifies the type |
3626 | of the control. There are <constant>BOOLEAN</constant>, | 3626 | of the control. There are <constant>BOOLEAN</constant>, |
3627 | <constant>INTEGER</constant>, <constant>ENUMERATED</constant>, | 3627 | <constant>INTEGER</constant>, <constant>ENUMERATED</constant>, |
3628 | <constant>BYTES</constant>, <constant>IEC958</constant> and | 3628 | <constant>BYTES</constant>, <constant>IEC958</constant> and |
3629 | <constant>INTEGER64</constant>. The | 3629 | <constant>INTEGER64</constant>. The |
3630 | <structfield>count</structfield> field specifies the | 3630 | <structfield>count</structfield> field specifies the |
3631 | number of elements in this control. For example, a stereo | 3631 | number of elements in this control. For example, a stereo |
3632 | volume would have count = 2. The | 3632 | volume would have count = 2. The |
3633 | <structfield>value</structfield> field is a union, and | 3633 | <structfield>value</structfield> field is a union, and |
3634 | the values stored depend on the type. The boolean and | 3634 | the values stored depend on the type. The boolean and |
3635 | integer types use the same fields. | 3635 | integer types use the same fields. |
3636 | </para> | 3636 | </para> |
3637 | 3637 | ||
3638 | <para> | 3638 | <para> |
3639 | The enumerated type is a bit different from the others. You'll | 3639 | The enumerated type is a bit different from the others. You'll |
3640 | need to set the string for the currently given item index. | 3640 | need to set the string for the currently given item index. |
3641 | 3641 | ||
3642 | <informalexample> | 3642 | <informalexample> |
3643 | <programlisting> | 3643 | <programlisting> |
3644 | <![CDATA[ | 3644 | <![CDATA[ |
3645 | static int snd_myctl_info(struct snd_kcontrol *kcontrol, | 3645 | static int snd_myctl_info(struct snd_kcontrol *kcontrol, |
3646 | struct snd_ctl_elem_info *uinfo) | 3646 | struct snd_ctl_elem_info *uinfo) |
3647 | { | 3647 | { |
3648 | static char *texts[4] = { | 3648 | static char *texts[4] = { |
3649 | "First", "Second", "Third", "Fourth" | 3649 | "First", "Second", "Third", "Fourth" |
3650 | }; | 3650 | }; |
3651 | uinfo->type = SNDRV_CTL_ELEM_TYPE_ENUMERATED; | 3651 | uinfo->type = SNDRV_CTL_ELEM_TYPE_ENUMERATED; |
3652 | uinfo->count = 1; | 3652 | uinfo->count = 1; |
3653 | uinfo->value.enumerated.items = 4; | 3653 | uinfo->value.enumerated.items = 4; |
3654 | if (uinfo->value.enumerated.item > 3) | 3654 | if (uinfo->value.enumerated.item > 3) |
3655 | uinfo->value.enumerated.item = 3; | 3655 | uinfo->value.enumerated.item = 3; |
3656 | strcpy(uinfo->value.enumerated.name, | 3656 | strcpy(uinfo->value.enumerated.name, |
3657 | texts[uinfo->value.enumerated.item]); | 3657 | texts[uinfo->value.enumerated.item]); |
3658 | return 0; | 3658 | return 0; |
3659 | } | 3659 | } |
3660 | ]]> | 3660 | ]]> |
3661 | </programlisting> | 3661 | </programlisting> |
3662 | </informalexample> | 3662 | </informalexample> |
3663 | </para> | 3663 | </para> |
3664 | </section> | 3664 | </section> |
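The index clamping in the enumerated info callback above can be tried outside the kernel with a small mock. This is a sketch under assumptions: struct mock_enum_info and mock_enum_info_fill() are hypothetical names that only mimic the items, item and name fields used in the listing, and the 64-byte name buffer is an arbitrary choice for the example.

```c
#include <assert.h>
#include <string.h>

#define NUM_ITEMS 4

/* Hypothetical user-space mock of the enumerated part of the info record. */
struct mock_enum_info {
	unsigned int items;	/* number of available items (output) */
	unsigned int item;	/* requested item index (input, clamped) */
	char name[64];		/* name of the selected item (output) */
};

static const char *const texts[NUM_ITEMS] = {
	"First", "Second", "Third", "Fourth"
};

static int mock_enum_info_fill(struct mock_enum_info *info)
{
	assert(info != NULL);
	info->items = NUM_ITEMS;
	/* clamp an out-of-range request to the last valid index */
	if (info->item >= NUM_ITEMS)
		info->item = NUM_ITEMS - 1;
	/* bounded copy, always NUL-terminated */
	strncpy(info->name, texts[info->item], sizeof(info->name) - 1);
	info->name[sizeof(info->name) - 1] = '\0';
	return 0;
}
```

The caller sets item before the call and reads items and name afterwards, so an application can enumerate all names by stepping item from 0 upwards.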
3665 | 3665 | ||
3666 | <section id="control-interface-callbacks-get"> | 3666 | <section id="control-interface-callbacks-get"> |
3667 | <title>get callback</title> | 3667 | <title>get callback</title> |
3668 | 3668 | ||
3669 | <para> | 3669 | <para> |
3670 | This callback is used to read the current value of the | 3670 | This callback is used to read the current value of the |
3671 | control and return it to user space. | 3671 | control and return it to user space. |
3672 | </para> | 3672 | </para> |
3673 | 3673 | ||
3674 | <para> | 3674 | <para> |
3675 | For example, | 3675 | For example, |
3676 | 3676 | ||
3677 | <example> | 3677 | <example> |
3678 | <title>Example of get callback</title> | 3678 | <title>Example of get callback</title> |
3679 | <programlisting> | 3679 | <programlisting> |
3680 | <![CDATA[ | 3680 | <![CDATA[ |
3681 | static int snd_myctl_get(struct snd_kcontrol *kcontrol, | 3681 | static int snd_myctl_get(struct snd_kcontrol *kcontrol, |
3682 | struct snd_ctl_elem_value *ucontrol) | 3682 | struct snd_ctl_elem_value *ucontrol) |
3683 | { | 3683 | { |
3684 | struct mychip *chip = snd_kcontrol_chip(kcontrol); | 3684 | struct mychip *chip = snd_kcontrol_chip(kcontrol); |
3685 | ucontrol->value.integer.value[0] = get_some_value(chip); | 3685 | ucontrol->value.integer.value[0] = get_some_value(chip); |
3686 | return 0; | 3686 | return 0; |
3687 | } | 3687 | } |
3688 | ]]> | 3688 | ]]> |
3689 | </programlisting> | 3689 | </programlisting> |
3690 | </example> | 3690 | </example> |
3691 | </para> | 3691 | </para> |
3692 | 3692 | ||
3693 | <para> | 3693 | <para> |
3694 | Here, the chip instance is retrieved via the | 3694 | Here, the chip instance is retrieved via the |
3695 | <function>snd_kcontrol_chip()</function> macro. This macro | 3695 | <function>snd_kcontrol_chip()</function> macro. This macro |
3696 | just accesses kcontrol->private_data. The | 3696 | just accesses kcontrol->private_data. The |
3697 | kcontrol->private_data field is | 3697 | kcontrol->private_data field is |
3698 | given as the argument of <function>snd_ctl_new1()</function> | 3698 | given as the argument of <function>snd_ctl_new1()</function> |
3699 | (see the later subsection | 3699 | (see the later subsection |
3700 | <link linkend="control-interface-constructor"><citetitle>Constructor</citetitle></link>). | 3700 | <link linkend="control-interface-constructor"><citetitle>Constructor</citetitle></link>). |
3701 | </para> | 3701 | </para> |
3702 | 3702 | ||
3703 | <para> | 3703 | <para> |
3704 | The <structfield>value</structfield> field depends on | 3704 | The <structfield>value</structfield> field depends on |
3705 | the type of control as well as on the info callback. For example, | 3705 | the type of control as well as on the info callback. For example, |
3706 | the sb driver stores the register offset, the bit-shift and | 3706 | the sb driver stores the register offset, the bit-shift and |
3707 | the bit-mask in the <structfield>private_value</structfield> field. The | 3707 | the bit-mask in the <structfield>private_value</structfield> field. The |
3708 | <structfield>private_value</structfield> is set like | 3708 | <structfield>private_value</structfield> is set like |
3709 | <informalexample> | 3709 | <informalexample> |
3710 | <programlisting> | 3710 | <programlisting> |
3711 | <![CDATA[ | 3711 | <![CDATA[ |
3712 | .private_value = reg | (shift << 16) | (mask << 24) | 3712 | .private_value = reg | (shift << 16) | (mask << 24) |
3713 | ]]> | 3713 | ]]> |
3714 | </programlisting> | 3714 | </programlisting> |
3715 | </informalexample> | 3715 | </informalexample> |
3716 | and is retrieved in callbacks like | 3716 | and is retrieved in callbacks like |
3717 | <informalexample> | 3717 | <informalexample> |
3718 | <programlisting> | 3718 | <programlisting> |
3719 | <![CDATA[ | 3719 | <![CDATA[ |
3720 | static int snd_sbmixer_get_single(struct snd_kcontrol *kcontrol, | 3720 | static int snd_sbmixer_get_single(struct snd_kcontrol *kcontrol, |
3721 | struct snd_ctl_elem_value *ucontrol) | 3721 | struct snd_ctl_elem_value *ucontrol) |
3722 | { | 3722 | { |
3723 | int reg = kcontrol->private_value & 0xff; | 3723 | int reg = kcontrol->private_value & 0xff; |
3724 | int shift = (kcontrol->private_value >> 16) & 0xff; | 3724 | int shift = (kcontrol->private_value >> 16) & 0xff; |
3725 | int mask = (kcontrol->private_value >> 24) & 0xff; | 3725 | int mask = (kcontrol->private_value >> 24) & 0xff; |
3726 | .... | 3726 | .... |
3727 | } | 3727 | } |
3728 | ]]> | 3728 | ]]> |
3729 | </programlisting> | 3729 | </programlisting> |
3730 | </informalexample> | 3730 | </informalexample> |
3731 | </para> | 3731 | </para> |
3732 | 3732 | ||
3733 | <para> | 3733 | <para> |
3734 | In the <structfield>get</structfield> callback, you have to fill all the elements if the | 3734 | In the <structfield>get</structfield> callback, you have to fill all the elements if the |
3735 | control has more than one element, | 3735 | control has more than one element, |
3736 | i.e. <structfield>count</structfield> > 1. | 3736 | i.e. <structfield>count</structfield> > 1. |
3737 | In the example above, we filled only one element | 3737 | In the example above, we filled only one element |
3738 | (<structfield>value.integer.value[0]</structfield>) since | 3738 | (<structfield>value.integer.value[0]</structfield>) since |
3739 | <structfield>count</structfield> = 1 is assumed. | 3739 | <structfield>count</structfield> = 1 is assumed. |
3740 | </para> | 3740 | </para> |
3741 | </section> | 3741 | </section> |
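The private_value packing shown in the get callback section can be checked with a user-space round trip. This is a minimal sketch, assuming each field fits in 8 bits as the decoding masks in the listing imply; pack_private_value(), unpack_private_value(), extract_field() and struct ctl_fields are hypothetical names introduced for illustration, not part of the sb driver.

```c
#include <assert.h>

/* Hypothetical container for the three decoded fields. */
struct ctl_fields {
	unsigned int reg;	/* register offset */
	unsigned int shift;	/* bit position of the field in the register */
	unsigned int mask;	/* mask applied after shifting */
};

/* Pack the fields as: reg | (shift << 16) | (mask << 24).
 * Unsigned arithmetic keeps the shift into the top byte well defined. */
static unsigned long pack_private_value(unsigned int reg, unsigned int shift,
					unsigned int mask)
{
	assert(reg <= 0xff && shift <= 0xff && mask <= 0xff);
	return (unsigned long)reg |
	       ((unsigned long)shift << 16) |
	       ((unsigned long)mask << 24);
}

/* Decode the fields the same way the callback in the listing does. */
static struct ctl_fields unpack_private_value(unsigned long pv)
{
	struct ctl_fields f;

	f.reg = pv & 0xff;
	f.shift = (pv >> 16) & 0xff;
	f.mask = (pv >> 24) & 0xff;
	return f;
}

/* Extract the control value from a raw register value. */
static unsigned int extract_field(unsigned int raw, struct ctl_fields f)
{
	return (raw >> f.shift) & f.mask;
}
```

Packing the description into a single word lets one pair of callbacks serve many controls that differ only in register, shift and mask.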
3742 | 3742 | ||
3743 | <section id="control-interface-callbacks-put"> | 3743 | <section id="control-interface-callbacks-put"> |
3744 | <title>put callback</title> | 3744 | <title>put callback</title> |
3745 | 3745 | ||
3746 | <para> | 3746 | <para> |
3747 | This callback is used to write a value from the user-space. | 3747 | This callback is used to write a value from the user-space. |
3748 | </para> | 3748 | </para> |
3749 | 3749 | ||
3750 | <para> | 3750 | <para> |
3751 | For example, | 3751 | For example, |
3752 | 3752 | ||
3753 | <example> | 3753 | <example> |
3754 | <title>Example of put callback</title> | 3754 | <title>Example of put callback</title> |
3755 | <programlisting> | 3755 | <programlisting> |
3756 | <![CDATA[ | 3756 | <![CDATA[ |
3757 | static int snd_myctl_put(struct snd_kcontrol *kcontrol, | 3757 | static int snd_myctl_put(struct snd_kcontrol *kcontrol, |
3758 | struct snd_ctl_elem_value *ucontrol) | 3758 | struct snd_ctl_elem_value *ucontrol) |
3759 | { | 3759 | { |
3760 | struct mychip *chip = snd_kcontrol_chip(kcontrol); | 3760 | struct mychip *chip = snd_kcontrol_chip(kcontrol); |
3761 | int changed = 0; | 3761 | int changed = 0; |
3762 | if (chip->current_value != | 3762 | if (chip->current_value != |
3763 | ucontrol->value.integer.value[0]) { | 3763 | ucontrol->value.integer.value[0]) { |
3764 | change_current_value(chip, | 3764 | change_current_value(chip, |
3765 | ucontrol->value.integer.value[0]); | 3765 | ucontrol->value.integer.value[0]); |
3766 | changed = 1; | 3766 | changed = 1; |
3767 | } | 3767 | } |
3768 | return changed; | 3768 | return changed; |
3769 | } | 3769 | } |
3770 | ]]> | 3770 | ]]> |
3771 | </programlisting> | 3771 | </programlisting> |
3772 | </example> | 3772 | </example> |
3773 | 3773 | ||
3774 | As seen above, you have to return 1 if the value is | 3774 | As seen above, you have to return 1 if the value is |
3775 | changed. If the value is not changed, return 0 instead. | 3775 | changed. If the value is not changed, return 0 instead. |
3776 | If any fatal error happens, return a negative error code as | 3776 | If any fatal error happens, return a negative error code as |
3777 | usual. | 3777 | usual. |
3778 | </para> | 3778 | </para> |
3779 | 3779 | ||
3780 | <para> | 3780 | <para> |
3781 | Like the <structfield>get</structfield> callback, | 3781 | Like the <structfield>get</structfield> callback, |
3782 | when the control has more than one element, | 3782 | when the control has more than one element, |
3783 | all elements must be evaluated in this callback, too. | 3783 | all elements must be evaluated in this callback, too. |
3784 | </para> | 3784 | </para> |
3785 | </section> | 3785 | </section> |
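The return-value convention of the put callback (1 when the value changed, 0 when it did not, a negative error code on failure) can be exercised with a user-space mock. This is a sketch only: struct mock_chip and mock_put() are hypothetical, and treating an out-of-range request as -EINVAL is an assumption made for the example, not something the text above prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's chip record. */
struct mock_chip {
	long current_value;
	long max_value;
};

/* Returns 1 if the value changed, 0 if it was already set,
 * or a negative error code if the request is rejected. */
static int mock_put(struct mock_chip *chip, long requested)
{
	assert(chip != NULL);
	/* assumption for this sketch: out-of-range input is an error */
	if (requested < 0 || requested > chip->max_value)
		return -EINVAL;
	if (chip->current_value == requested)
		return 0;
	chip->current_value = requested;
	return 1;
}
```

A caller can use the positive return value to decide whether listeners need to be told about the change.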
3786 | 3786 | ||
3787 | <section id="control-interface-callbacks-all"> | 3787 | <section id="control-interface-callbacks-all"> |
3788 | <title>Callbacks are not atomic</title> | 3788 | <title>Callbacks are not atomic</title> |
3789 | <para> | 3789 | <para> |
3790 | All these three callbacks are basically not atomic. | 3790 | All these three callbacks are basically not atomic. |
3791 | </para> | 3791 | </para> |
3792 | </section> | 3792 | </section> |
3793 | </section> | 3793 | </section> |
3794 | 3794 | ||
3795 | <section id="control-interface-constructor"> | 3795 | <section id="control-interface-constructor"> |
3796 | <title>Constructor</title> | 3796 | <title>Constructor</title> |
3797 | <para> | 3797 | <para> |
3798 | When everything is ready, we can finally create a new | 3798 | When everything is ready, we can finally create a new |
3799 | control. To create a control, two functions must be | 3799 | control. To create a control, two functions must be |
3800 | called: <function>snd_ctl_new1()</function> and | 3800 | called: <function>snd_ctl_new1()</function> and |
3801 | <function>snd_ctl_add()</function>. | 3801 | <function>snd_ctl_add()</function>. |
3802 | </para> | 3802 | </para> |
3803 | 3803 | ||
3804 | <para> | 3804 | <para> |
3805 | In the simplest case, you can do it like this: | 3805 | In the simplest case, you can do it like this: |
3806 | 3806 | ||
3807 | <informalexample> | 3807 | <informalexample> |
3808 | <programlisting> | 3808 | <programlisting> |
3809 | <![CDATA[ | 3809 | <![CDATA[ |
3810 | if ((err = snd_ctl_add(card, snd_ctl_new1(&my_control, chip))) < 0) | 3810 | if ((err = snd_ctl_add(card, snd_ctl_new1(&my_control, chip))) < 0) |
3811 | return err; | 3811 | return err; |
3812 | ]]> | 3812 | ]]> |
3813 | </programlisting> | 3813 | </programlisting> |
3814 | </informalexample> | 3814 | </informalexample> |
3815 | 3815 | ||
3816 | where <parameter>my_control</parameter> is the | 3816 | where <parameter>my_control</parameter> is the |
3817 | struct <structname>snd_kcontrol_new</structname> object defined above, and chip | 3817 | struct <structname>snd_kcontrol_new</structname> object defined above, and chip |
3818 | is the object pointer to be passed to | 3818 | is the object pointer to be passed to |
3819 | kcontrol->private_data | 3819 | kcontrol->private_data |
3820 | which can be referred to in callbacks. | 3820 | which can be referred to in callbacks. |
3821 | </para> | 3821 | </para> |
3822 | 3822 | ||
3823 | <para> | 3823 | <para> |
3824 | <function>snd_ctl_new1()</function> allocates a new | 3824 | <function>snd_ctl_new1()</function> allocates a new |
3825 | <structname>snd_kcontrol</structname> instance (that's why the definition | 3825 | <structname>snd_kcontrol</structname> instance (that's why the definition |
3826 | of <parameter>my_control</parameter> can carry the | 3826 | of <parameter>my_control</parameter> can carry the |
3827 | <parameter>__devinitdata</parameter> | 3827 | <parameter>__devinitdata</parameter> |
3828 | prefix), and <function>snd_ctl_add()</function> assigns the given | 3828 | prefix), and <function>snd_ctl_add()</function> assigns the given |
3829 | control component to the card. | 3829 | control component to the card. |
3830 | </para> | 3830 | </para> |
3831 | </section> | 3831 | </section> |
3832 | 3832 | ||
3833 | <section id="control-interface-change-notification"> | 3833 | <section id="control-interface-change-notification"> |
3834 | <title>Change Notification</title> | 3834 | <title>Change Notification</title> |
3835 | <para> | 3835 | <para> |
3836 | If you need to change and update a control in the interrupt | 3836 | If you need to change and update a control in the interrupt |
3837 | routine, you can call <function>snd_ctl_notify()</function>. For | 3837 | routine, you can call <function>snd_ctl_notify()</function>. For |
3838 | example, | 3838 | example, |
3839 | 3839 | ||
3840 | <informalexample> | 3840 | <informalexample> |
3841 | <programlisting> | 3841 | <programlisting> |
3842 | <![CDATA[ | 3842 | <![CDATA[ |
3843 | snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_VALUE, id_pointer); | 3843 | snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_VALUE, id_pointer); |
3844 | ]]> | 3844 | ]]> |
3845 | </programlisting> | 3845 | </programlisting> |
3846 | </informalexample> | 3846 | </informalexample> |
3847 | 3847 | ||
3848 | This function takes the card pointer, the event-mask, and the | 3848 | This function takes the card pointer, the event-mask, and the |
3849 | control id pointer for the notification. The event-mask | 3849 | control id pointer for the notification. The event-mask |
3850 | specifies the types of notification; in the above | 3850 | specifies the types of notification; in the above |
3851 | example, a change of control values is notified. | 3851 | example, a change of control values is notified. |
3852 | The id pointer is a pointer to the struct <structname>snd_ctl_elem_id</structname> | 3852 | The id pointer is a pointer to the struct <structname>snd_ctl_elem_id</structname> |
3853 | to be notified. | 3853 | to be notified. |
3854 | You can find some examples in <filename>es1938.c</filename> or | 3854 | You can find some examples in <filename>es1938.c</filename> or |
3855 | <filename>es1968.c</filename> for hardware volume interrupts. | 3855 | <filename>es1968.c</filename> for hardware volume interrupts. |
3856 | </para> | 3856 | </para> |
3857 | </section> | 3857 | </section> |
3858 | 3858 | ||
3859 | </chapter> | 3859 | </chapter> |
3860 | 3860 | ||
3861 | 3861 | ||
3862 | <!-- ****************************************************** --> | 3862 | <!-- ****************************************************** --> |
3863 | <!-- API for AC97 Codec --> | 3863 | <!-- API for AC97 Codec --> |
3864 | <!-- ****************************************************** --> | 3864 | <!-- ****************************************************** --> |
3865 | <chapter id="api-ac97"> | 3865 | <chapter id="api-ac97"> |
3866 | <title>API for AC97 Codec</title> | 3866 | <title>API for AC97 Codec</title> |
3867 | 3867 | ||
3868 | <section> | 3868 | <section> |
3869 | <title>General</title> | 3869 | <title>General</title> |
3870 | <para> | 3870 | <para> |
3871 | The ALSA AC97 codec layer is well defined, and you don't | 3871 | The ALSA AC97 codec layer is well defined, and you don't |
3872 | have to write much code to control it. Only low-level control | 3872 | have to write much code to control it. Only low-level control |
3873 | routines are necessary. The AC97 codec API is defined in | 3873 | routines are necessary. The AC97 codec API is defined in |
3874 | <filename>&lt;sound/ac97_codec.h&gt;</filename>. | 3874 | <filename>&lt;sound/ac97_codec.h&gt;</filename>. |
3875 | </para> | 3875 | </para> |
3876 | </section> | 3876 | </section> |
3877 | 3877 | ||
3878 | <section id="api-ac97-example"> | 3878 | <section id="api-ac97-example"> |
3879 | <title>Full Code Example</title> | 3879 | <title>Full Code Example</title> |
3880 | <para> | 3880 | <para> |
3881 | <example> | 3881 | <example> |
3882 | <title>Example of AC97 Interface</title> | 3882 | <title>Example of AC97 Interface</title> |
3883 | <programlisting> | 3883 | <programlisting> |
3884 | <![CDATA[ | 3884 | <![CDATA[ |
3885 | struct mychip { | 3885 | struct mychip { |
3886 | .... | 3886 | .... |
3887 | struct snd_ac97 *ac97; | 3887 | struct snd_ac97 *ac97; |
3888 | .... | 3888 | .... |
3889 | }; | 3889 | }; |
3890 | 3890 | ||
3891 | static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97, | 3891 | static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97, |
3892 | unsigned short reg) | 3892 | unsigned short reg) |
3893 | { | 3893 | { |
3894 | struct mychip *chip = ac97->private_data; | 3894 | struct mychip *chip = ac97->private_data; |
3895 | .... | 3895 | .... |
3896 | // read a register value here from the codec | 3896 | // read a register value here from the codec |
3897 | return the_register_value; | 3897 | return the_register_value; |
3898 | } | 3898 | } |
3899 | 3899 | ||
3900 | static void snd_mychip_ac97_write(struct snd_ac97 *ac97, | 3900 | static void snd_mychip_ac97_write(struct snd_ac97 *ac97, |
3901 | unsigned short reg, unsigned short val) | 3901 | unsigned short reg, unsigned short val) |
3902 | { | 3902 | { |
3903 | struct mychip *chip = ac97->private_data; | 3903 | struct mychip *chip = ac97->private_data; |
3904 | .... | 3904 | .... |
3905 | // write the given register value to the codec | 3905 | // write the given register value to the codec |
3906 | } | 3906 | } |
3907 | 3907 | ||
3908 | static int snd_mychip_ac97(struct mychip *chip) | 3908 | static int snd_mychip_ac97(struct mychip *chip) |
3909 | { | 3909 | { |
3910 | struct snd_ac97_bus *bus; | 3910 | struct snd_ac97_bus *bus; |
3911 | struct snd_ac97_template ac97; | 3911 | struct snd_ac97_template ac97; |
3912 | int err; | 3912 | int err; |
3913 | static struct snd_ac97_bus_ops ops = { | 3913 | static struct snd_ac97_bus_ops ops = { |
3914 | .write = snd_mychip_ac97_write, | 3914 | .write = snd_mychip_ac97_write, |
3915 | .read = snd_mychip_ac97_read, | 3915 | .read = snd_mychip_ac97_read, |
3916 | }; | 3916 | }; |
3917 | 3917 | ||
3918 | if ((err = snd_ac97_bus(chip->card, 0, &ops, NULL, &bus)) < 0) | 3918 | if ((err = snd_ac97_bus(chip->card, 0, &ops, NULL, &bus)) < 0) |
3919 | return err; | 3919 | return err; |
3920 | memset(&ac97, 0, sizeof(ac97)); | 3920 | memset(&ac97, 0, sizeof(ac97)); |
3921 | ac97.private_data = chip; | 3921 | ac97.private_data = chip; |
3922 | return snd_ac97_mixer(bus, &ac97, &chip->ac97); | 3922 | return snd_ac97_mixer(bus, &ac97, &chip->ac97); |
3923 | } | 3923 | } |
3924 | 3924 | ||
3925 | ]]> | 3925 | ]]> |
3926 | </programlisting> | 3926 | </programlisting> |
3927 | </example> | 3927 | </example> |
3928 | </para> | 3928 | </para> |
3929 | </section> | 3929 | </section> |
3930 | 3930 | ||
3931 | <section id="api-ac97-constructor"> | 3931 | <section id="api-ac97-constructor"> |
3932 | <title>Constructor</title> | 3932 | <title>Constructor</title> |
3933 | <para> | 3933 | <para> |
3934 | To create an ac97 instance, first call <function>snd_ac97_bus()</function> | 3934 | To create an ac97 instance, first call <function>snd_ac97_bus()</function> |
3935 | with a struct <structname>snd_ac97_bus_ops</structname> record containing the callback functions. | 3935 | with a struct <structname>snd_ac97_bus_ops</structname> record containing the callback functions. |
3936 | 3936 | ||
3937 | <informalexample> | 3937 | <informalexample> |
3938 | <programlisting> | 3938 | <programlisting> |
3939 | <![CDATA[ | 3939 | <![CDATA[ |
3940 | struct snd_ac97_bus *bus; | 3940 | struct snd_ac97_bus *bus; |
3941 | static struct snd_ac97_bus_ops ops = { | 3941 | static struct snd_ac97_bus_ops ops = { |
3942 | .write = snd_mychip_ac97_write, | 3942 | .write = snd_mychip_ac97_write, |
3943 | .read = snd_mychip_ac97_read, | 3943 | .read = snd_mychip_ac97_read, |
3944 | }; | 3944 | }; |
3945 | 3945 | ||
3946 | snd_ac97_bus(card, 0, &ops, NULL, &bus); | 3946 | snd_ac97_bus(card, 0, &ops, NULL, &bus); |
3947 | ]]> | 3947 | ]]> |
3948 | </programlisting> | 3948 | </programlisting> |
3949 | </informalexample> | 3949 | </informalexample> |
3950 | 3950 | ||
3951 | The bus record is shared among all ac97 instances that belong to it. | 3951 | The bus record is shared among all ac97 instances that belong to it. |
3952 | </para> | 3952 | </para> |
3953 | 3953 | ||
3954 | <para> | 3954 | <para> |
3955 | Then call <function>snd_ac97_mixer()</function> with a | 3955 | Then call <function>snd_ac97_mixer()</function> with a |
3956 | struct <structname>snd_ac97_template</structname> | 3956 | struct <structname>snd_ac97_template</structname> |
3957 | record together with the bus pointer created above. | 3957 | record together with the bus pointer created above. |
3958 | 3958 | ||
3959 | <informalexample> | 3959 | <informalexample> |
3960 | <programlisting> | 3960 | <programlisting> |
3961 | <![CDATA[ | 3961 | <![CDATA[ |
3962 | struct snd_ac97_template ac97; | 3962 | struct snd_ac97_template ac97; |
3963 | int err; | 3963 | int err; |
3964 | 3964 | ||
3965 | memset(&ac97, 0, sizeof(ac97)); | 3965 | memset(&ac97, 0, sizeof(ac97)); |
3966 | ac97.private_data = chip; | 3966 | ac97.private_data = chip; |
3967 | snd_ac97_mixer(bus, &ac97, &chip->ac97); | 3967 | snd_ac97_mixer(bus, &ac97, &chip->ac97); |
3968 | ]]> | 3968 | ]]> |
3969 | </programlisting> | 3969 | </programlisting> |
3970 | </informalexample> | 3970 | </informalexample> |
3971 | 3971 | ||
3972 | where chip->ac97 is a pointer to the newly created | 3972 | where chip->ac97 is a pointer to the newly created |
3973 | struct <structname>snd_ac97</structname> instance. | 3973 | struct <structname>snd_ac97</structname> instance. |
3974 | In this case, the chip pointer is set as the private data, so that | 3974 | In this case, the chip pointer is set as the private data, so that |
3975 | the read/write callback functions can refer to this chip instance. | 3975 | the read/write callback functions can refer to this chip instance. |
3976 | This pointer does not necessarily have to be stored in the chip | 3976 | This pointer does not necessarily have to be stored in the chip |
3977 | record. When you need to change the register values from the | 3977 | record. When you need to change the register values from the |
3978 | driver, or need the suspend/resume of ac97 codecs, keep this | 3978 | driver, or need the suspend/resume of ac97 codecs, keep this |
3979 | pointer to pass to the corresponding functions. | 3979 | pointer to pass to the corresponding functions. |
3980 | </para> | 3980 | </para> |
3981 | </section> | 3981 | </section> |
3982 | 3982 | ||
3983 | <section id="api-ac97-callbacks"> | 3983 | <section id="api-ac97-callbacks"> |
3984 | <title>Callbacks</title> | 3984 | <title>Callbacks</title> |
3985 | <para> | 3985 | <para> |
3986 | The standard callbacks are <structfield>read</structfield> and | 3986 | The standard callbacks are <structfield>read</structfield> and |
3987 | <structfield>write</structfield>. They | 3987 | <structfield>write</structfield>. They |
3988 | correspond to the functions for low-level read and write | 3988 | correspond to the functions for low-level read and write |
3989 | accesses to the codec hardware. | 3989 | accesses to the codec hardware. |
3990 | </para> | 3990 | </para> |
3991 | 3991 | ||
3992 | <para> | 3992 | <para> |
3993 | The <structfield>read</structfield> callback returns the | 3993 | The <structfield>read</structfield> callback returns the |
3994 | register value specified in the argument. | 3994 | register value specified in the argument. |
3995 | 3995 | ||
3996 | <informalexample> | 3996 | <informalexample> |
3997 | <programlisting> | 3997 | <programlisting> |
3998 | <![CDATA[ | 3998 | <![CDATA[ |
3999 | static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97, | 3999 | static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97, |
4000 | unsigned short reg) | 4000 | unsigned short reg) |
4001 | { | 4001 | { |
4002 | struct mychip *chip = ac97->private_data; | 4002 | struct mychip *chip = ac97->private_data; |
4003 | .... | 4003 | .... |
4004 | return the_register_value; | 4004 | return the_register_value; |
4005 | } | 4005 | } |
4006 | ]]> | 4006 | ]]> |
4007 | </programlisting> | 4007 | </programlisting> |
4008 | </informalexample> | 4008 | </informalexample> |
4009 | 4009 | ||
4010 | Here, the chip can be cast from ac97->private_data. | 4010 | Here, the chip can be cast from ac97->private_data. |
4011 | </para> | 4011 | </para> |
4012 | 4012 | ||
4013 | <para> | 4013 | <para> |
4014 | Meanwhile, the <structfield>write</structfield> callback is | 4014 | Meanwhile, the <structfield>write</structfield> callback is |
4015 | used to set the register value. | 4015 | used to set the register value. |
4016 | 4016 | ||
4017 | <informalexample> | 4017 | <informalexample> |
4018 | <programlisting> | 4018 | <programlisting> |
4019 | <![CDATA[ | 4019 | <![CDATA[ |
4020 | static void snd_mychip_ac97_write(struct snd_ac97 *ac97, | 4020 | static void snd_mychip_ac97_write(struct snd_ac97 *ac97, |
4021 | unsigned short reg, unsigned short val) | 4021 | unsigned short reg, unsigned short val) |
4022 | ]]> | 4022 | ]]> |
4023 | </programlisting> | 4023 | </programlisting> |
4024 | </informalexample> | 4024 | </informalexample> |
4025 | </para> | 4025 | </para> |
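<para>
The private_data lookup used by both callbacks can be tried outside the kernel with a small user-space model. This is only a sketch: struct model_chip, struct model_ac97 and the model_ac97_* functions are hypothetical stand-ins for the driver's chip record, struct snd_ac97 and the real callbacks, and the register file is a plain array instead of hardware.
</para>

```c
#include <assert.h>
#include <stdint.h>

/* AC97 registers are 16 bits wide and sit at even addresses 0x00-0x7e. */
#define MODEL_NUM_REGS 64

/* Hypothetical stand-in for the driver's chip record. */
struct model_chip {
	uint16_t regs[MODEL_NUM_REGS];	/* simulated codec register file */
};

/* Hypothetical stand-in for struct snd_ac97; only the field used here. */
struct model_ac97 {
	void *private_data;
};

/* Read callback: recover the chip from private_data, return the register. */
unsigned short model_ac97_read(struct model_ac97 *ac97, unsigned short reg)
{
	struct model_chip *chip = ac97->private_data;

	assert(reg % 2 == 0 && reg / 2 < MODEL_NUM_REGS);
	return chip->regs[reg / 2];
}

/* Write callback: same lookup, then store the new value. */
void model_ac97_write(struct model_ac97 *ac97,
		      unsigned short reg, unsigned short val)
{
	struct model_chip *chip = ac97->private_data;

	assert(reg % 2 == 0 && reg / 2 < MODEL_NUM_REGS);
	chip->regs[reg / 2] = val;
}
```

<para>
In a real driver the array accesses would be replaced by whatever bus accesses the chip needs; the way the chip pointer is recovered stays the same.
</para>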
4026 | 4026 | ||
4027 | <para> | 4027 | <para> |
4028 | Like the callbacks of the control API, these callbacks are non-atomic. | 4028 | Like the callbacks of the control API, these callbacks are non-atomic. |
4029 | </para> | 4029 | </para> |
4030 | 4030 | ||
4031 | <para> | 4031 | <para> |
4032 | There are also other callbacks: | 4032 | There are also other callbacks: |
4033 | <structfield>reset</structfield>, | 4033 | <structfield>reset</structfield>, |
4034 | <structfield>wait</structfield> and | 4034 | <structfield>wait</structfield> and |
4035 | <structfield>init</structfield>. | 4035 | <structfield>init</structfield>. |
4036 | </para> | 4036 | </para> |
4037 | 4037 | ||
4038 | <para> | 4038 | <para> |
4039 | The <structfield>reset</structfield> callback is used to reset | 4039 | The <structfield>reset</structfield> callback is used to reset |
4040 | the codec. If the chip requires a special kind of reset, you can | 4040 | the codec. If the chip requires a special kind of reset, you can |
4041 | define this callback. | 4041 | define this callback. |
4042 | </para> | 4042 | </para> |
4043 | 4043 | ||
4044 | <para> | 4044 | <para> |
4045 | The <structfield>wait</structfield> callback is used to add a | 4045 | The <structfield>wait</structfield> callback is used to add a |
4046 | delay during the standard initialization of the codec. If the | 4046 | delay during the standard initialization of the codec. If the |
4047 | chip requires extra wait time, define this callback. | 4047 | chip requires extra wait time, define this callback. |
4048 | </para> | 4048 | </para> |
4049 | 4049 | ||
4050 | <para> | 4050 | <para> |
4051 | The <structfield>init</structfield> callback is used for | 4051 | The <structfield>init</structfield> callback is used for |
4052 | additional initialization of the codec. | 4052 | additional initialization of the codec. |
4053 | </para> | 4053 | </para> |
4054 | </section> | 4054 | </section> |
4055 | 4055 | ||
4056 | <section id="api-ac97-updating-registers"> | 4056 | <section id="api-ac97-updating-registers"> |
4057 | <title>Updating Registers in The Driver</title> | 4057 | <title>Updating Registers in The Driver</title> |
4058 | <para> | 4058 | <para> |
4059 | If you need to access the codec from the driver, you can | 4059 | If you need to access the codec from the driver, you can |
4060 | call the following functions: | 4060 | call the following functions: |
4061 | <function>snd_ac97_write()</function>, | 4061 | <function>snd_ac97_write()</function>, |
4062 | <function>snd_ac97_read()</function>, | 4062 | <function>snd_ac97_read()</function>, |
4063 | <function>snd_ac97_update()</function> and | 4063 | <function>snd_ac97_update()</function> and |
4064 | <function>snd_ac97_update_bits()</function>. | 4064 | <function>snd_ac97_update_bits()</function>. |
4065 | </para> | 4065 | </para> |
4066 | 4066 | ||
4067 | <para> | 4067 | <para> |
4068 | Both <function>snd_ac97_write()</function> and | 4068 | Both <function>snd_ac97_write()</function> and |
4069 | <function>snd_ac97_update()</function> functions are used to | 4069 | <function>snd_ac97_update()</function> functions are used to |
4070 | set a value to the given register | 4070 | set a value to the given register |
4071 | (<constant>AC97_XXX</constant>). The difference between them is | 4071 | (<constant>AC97_XXX</constant>). The difference between them is |
4072 | that <function>snd_ac97_update()</function> doesn't write a | 4072 | that <function>snd_ac97_update()</function> doesn't write a |
4073 | value if the given value has already been set, while | 4073 | value if the given value has already been set, while |
4074 | <function>snd_ac97_write()</function> always rewrites the | 4074 | <function>snd_ac97_write()</function> always rewrites the |
4075 | value. | 4075 | value. |
4076 | 4076 | ||
4077 | <informalexample> | 4077 | <informalexample> |
4078 | <programlisting> | 4078 | <programlisting> |
4079 | <![CDATA[ | 4079 | <![CDATA[ |
4080 | snd_ac97_write(ac97, AC97_MASTER, 0x8080); | 4080 | snd_ac97_write(ac97, AC97_MASTER, 0x8080); |
4081 | snd_ac97_update(ac97, AC97_MASTER, 0x8080); | 4081 | snd_ac97_update(ac97, AC97_MASTER, 0x8080); |
4082 | ]]> | 4082 | ]]> |
4083 | </programlisting> | 4083 | </programlisting> |
4084 | </informalexample> | 4084 | </informalexample> |
4085 | </para> | 4085 | </para> |
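<para>
The difference between the two calls can be illustrated with a user-space model that counts simulated hardware accesses. This is a sketch under assumptions, not the ALSA implementation: struct model_codec, model_write() and model_update() are hypothetical names, and the return convention of model_update() (1 if the value changed, 0 otherwise) is chosen for this example.
</para>

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_REGS 64	/* even register addresses 0x00-0x7e */

/* Hypothetical codec model: a register cache plus an access counter. */
struct model_codec {
	uint16_t cache[MODEL_NUM_REGS];	/* last value written per register */
	unsigned int hw_writes;		/* number of simulated hardware writes */
};

/* Always pushes the value to the (simulated) hardware. */
void model_write(struct model_codec *c, unsigned short reg, unsigned short val)
{
	assert(reg % 2 == 0 && reg / 2 < MODEL_NUM_REGS);
	c->cache[reg / 2] = val;
	c->hw_writes++;
}

/* Writes only when the cached value differs from the requested one. */
int model_update(struct model_codec *c, unsigned short reg, unsigned short val)
{
	assert(reg % 2 == 0 && reg / 2 < MODEL_NUM_REGS);
	if (c->cache[reg / 2] == val)
		return 0;	/* value already set: skip the hardware access */
	model_write(c, reg, val);
	return 1;
}
```

<para>
Calling model_update() twice with the same value touches the simulated hardware at most once, while every model_write() call increments the counter.
</para>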
4086 | 4086 | ||
4087 | <para> | 4087 | <para> |
4088 | <function>snd_ac97_read()</function> is used to read the value | 4088 | <function>snd_ac97_read()</function> is used to read the value |
4089 | of the given register. For example, | 4089 | of the given register. For example, |
4090 | 4090 | ||
4091 | <informalexample> | 4091 | <informalexample> |
4092 | <programlisting> | 4092 | <programlisting> |
4093 | <![CDATA[ | 4093 | <![CDATA[ |
4094 | value = snd_ac97_read(ac97, AC97_MASTER); | 4094 | value = snd_ac97_read(ac97, AC97_MASTER); |
4095 | ]]> | 4095 | ]]> |
4096 | </programlisting> | 4096 | </programlisting> |
4097 | </informalexample> | 4097 | </informalexample> |
4098 | </para> | 4098 | </para> |
4099 | 4099 | ||
4100 | <para> | 4100 | <para> |
4101 | <function>snd_ac97_update_bits()</function> is used to update | 4101 | <function>snd_ac97_update_bits()</function> is used to update |
4102 | some bits of the given register. | 4102 | some bits of the given register. |
4103 | 4103 | ||
4104 | <informalexample> | 4104 | <informalexample> |
4105 | <programlisting> | 4105 | <programlisting> |
4106 | <![CDATA[ | 4106 | <![CDATA[ |
4107 | snd_ac97_update_bits(ac97, reg, mask, value); | 4107 | snd_ac97_update_bits(ac97, reg, mask, value); |
4108 | ]]> | 4108 | ]]> |
4109 | </programlisting> | 4109 | </programlisting> |
4110 | </informalexample> | 4110 | </informalexample> |
4111 | </para> | 4111 | </para> |
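<para>
A masked update is commonly a read-modify-write: bits selected by the mask are taken from the new value and all other bits keep their old contents. The helper below is a hypothetical user-space sketch of that arithmetic, assuming these semantics; it does not access any codec.
</para>

```c
#include <stdint.h>

/*
 * Combine the old register contents with new bits.
 * Bits set in mask come from value; bits clear in mask come from old.
 */
uint16_t model_update_bits(uint16_t old, uint16_t mask, uint16_t value)
{
	return (uint16_t)((old & ~mask) | (value & mask));
}
```

<para>
For example, with old = 0x0808, mask = 0x8000 and value = 0x8000, only the top bit changes and the result is 0x8808.
</para>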
4112 | 4112 | ||
4113 | <para> | 4113 | <para> |
4114 | Also, there is a function to change the sample rate (of a | 4114 | Also, there is a function to change the sample rate (of a |
4115 | certain register such as | 4115 | certain register such as |
4116 | <constant>AC97_PCM_FRONT_DAC_RATE</constant>) when VRA or | 4116 | <constant>AC97_PCM_FRONT_DAC_RATE</constant>) when VRA or |
4117 | DRA is supported by the codec: | 4117 | DRA is supported by the codec: |
4118 | <function>snd_ac97_set_rate()</function>. | 4118 | <function>snd_ac97_set_rate()</function>. |
4119 | 4119 | ||
4120 | <informalexample> | 4120 | <informalexample> |
4121 | <programlisting> | 4121 | <programlisting> |
4122 | <![CDATA[ | 4122 | <![CDATA[ |
4123 | snd_ac97_set_rate(ac97, AC97_PCM_FRONT_DAC_RATE, 44100); | 4123 | snd_ac97_set_rate(ac97, AC97_PCM_FRONT_DAC_RATE, 44100); |
4124 | ]]> | 4124 | ]]> |
4125 | </programlisting> | 4125 | </programlisting> |
4126 | </informalexample> | 4126 | </informalexample> |
4127 | </para> | 4127 | </para> |
4128 | 4128 | ||
4129 | <para> | 4129 | <para> |
4130 | The following registers are available for setting the rate: | 4130 | The following registers are available for setting the rate: |
4131 | <constant>AC97_PCM_MIC_ADC_RATE</constant>, | 4131 | <constant>AC97_PCM_MIC_ADC_RATE</constant>, |
4132 | <constant>AC97_PCM_FRONT_DAC_RATE</constant>, | 4132 | <constant>AC97_PCM_FRONT_DAC_RATE</constant>, |
4133 | <constant>AC97_PCM_LR_ADC_RATE</constant>, | 4133 | <constant>AC97_PCM_LR_ADC_RATE</constant>, |
4134 | <constant>AC97_SPDIF</constant>. When the | 4134 | <constant>AC97_SPDIF</constant>. When the |
4135 | <constant>AC97_SPDIF</constant> is specified, the register is | 4135 | <constant>AC97_SPDIF</constant> is specified, the register is |
4136 | not really changed but the corresponding IEC958 status bits will | 4136 | not really changed but the corresponding IEC958 status bits will |
4137 | be updated. | 4137 | be updated. |
4138 | </para> | 4138 | </para> |
4139 | </section> | 4139 | </section> |
4140 | 4140 | ||
4141 | <section id="api-ac97-clock-adjustment"> | 4141 | <section id="api-ac97-clock-adjustment"> |
4142 | <title>Clock Adjustment</title> | 4142 | <title>Clock Adjustment</title> |
4143 | <para> | 4143 | <para> |
4144 | On some chips, the clock of the codec isn't 48000 but is derived from a | 4144 | On some chips, the clock of the codec isn't 48000 but is derived from a |
4145 | PCI clock (to save a quartz!). In this case, change the field | 4145 | PCI clock (to save a quartz!). In this case, change the field |
4146 | bus->clock to the corresponding | 4146 | bus->clock to the corresponding |
4147 | value. For example, the intel8x0 | 4147 | value. For example, the intel8x0 |
4148 | and es1968 drivers have their own function to measure the | 4148 | and es1968 drivers have their own function to measure the |
4149 | clock. | 4149 | clock. |
4150 | </para> | 4150 | </para> |
4151 | </section> | 4151 | </section> |
4152 | 4152 | ||
4153 | <section id="api-ac97-proc-files"> | 4153 | <section id="api-ac97-proc-files"> |
4154 | <title>Proc Files</title> | 4154 | <title>Proc Files</title> |
4155 | <para> | 4155 | <para> |
4156 | The ALSA AC97 interface will create a proc file such as | 4156 | The ALSA AC97 interface will create a proc file such as |
4157 | <filename>/proc/asound/card0/codec97#0/ac97#0-0</filename> and | 4157 | <filename>/proc/asound/card0/codec97#0/ac97#0-0</filename> and |
4158 | <filename>ac97#0-0+regs</filename>. You can refer to these files to | 4158 | <filename>ac97#0-0+regs</filename>. You can refer to these files to |
4159 | see the current status and registers of the codec. | 4159 | see the current status and registers of the codec. |
4160 | </para> | 4160 | </para> |
4161 | </section> | 4161 | </section> |
4162 | 4162 | ||
4163 | <section id="api-ac97-multiple-codecs"> | 4163 | <section id="api-ac97-multiple-codecs"> |
4164 | <title>Multiple Codecs</title> | 4164 | <title>Multiple Codecs</title> |
4165 | <para> | 4165 | <para> |
4166 | When there are several codecs on the same card, you need to | 4166 | When there are several codecs on the same card, you need to |
4167 | call <function>snd_ac97_mixer()</function> multiple times with | 4167 | call <function>snd_ac97_mixer()</function> multiple times with |
4168 | ac97.num=1 or greater. The <structfield>num</structfield> field | 4168 | ac97.num=1 or greater. The <structfield>num</structfield> field |
4169 | specifies the codec | 4169 | specifies the codec |
4170 | number. | 4170 | number. |
4171 | </para> | 4171 | </para> |
4172 | 4172 | ||
4173 | <para> | 4173 | <para> |
4174 | If you have set up multiple codecs, you need to either write | 4174 | If you have set up multiple codecs, you need to either write |
4175 | different callbacks for each codec or check | 4175 | different callbacks for each codec or check |
4176 | ac97->num in the | 4176 | ac97->num in the |
4177 | callback routines. | 4177 | callback routines. |
4178 | </para> | 4178 | </para> |
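<para>
The second option, one set of callbacks that checks the codec number, might look like the following user-space sketch. All names (struct multi_chip, struct multi_ac97, multi_read(), multi_write(), MULTI_MAX_CODECS) are hypothetical; the model keeps one simulated register file per codec and selects it by the num field.
</para>

```c
#include <assert.h>
#include <stdint.h>

#define MULTI_MAX_CODECS 4	/* hypothetical limit for this sketch */
#define MULTI_NUM_REGS 64	/* even register addresses 0x00-0x7e */

/* Hypothetical chip record holding one register file per codec. */
struct multi_chip {
	uint16_t regs[MULTI_MAX_CODECS][MULTI_NUM_REGS];
};

/* Hypothetical stand-in for struct snd_ac97. */
struct multi_ac97 {
	unsigned short num;	/* codec number, as set in the template */
	void *private_data;	/* points to the shared chip record */
};

/* Shared read callback: ac97->num selects the codec to access. */
unsigned short multi_read(struct multi_ac97 *ac97, unsigned short reg)
{
	struct multi_chip *chip = ac97->private_data;

	assert(ac97->num < MULTI_MAX_CODECS);
	assert(reg % 2 == 0 && reg / 2 < MULTI_NUM_REGS);
	return chip->regs[ac97->num][reg / 2];
}

/* Shared write callback: same selection by codec number. */
void multi_write(struct multi_ac97 *ac97,
		 unsigned short reg, unsigned short val)
{
	struct multi_chip *chip = ac97->private_data;

	assert(ac97->num < MULTI_MAX_CODECS);
	assert(reg % 2 == 0 && reg / 2 < MULTI_NUM_REGS);
	chip->regs[ac97->num][reg / 2] = val;
}
```

<para>
Writing the same register through two instances with different num values leaves each codec with its own contents.
</para>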
4179 | </section> | 4179 | </section> |
4180 | 4180 | ||
4181 | </chapter> | 4181 | </chapter> |
4182 | 4182 | ||
4183 | 4183 | ||
4184 | <!-- ****************************************************** --> | 4184 | <!-- ****************************************************** --> |
4185 | <!-- MIDI (MPU401-UART) Interface --> | 4185 | <!-- MIDI (MPU401-UART) Interface --> |
4186 | <!-- ****************************************************** --> | 4186 | <!-- ****************************************************** --> |
4187 | <chapter id="midi-interface"> | 4187 | <chapter id="midi-interface"> |
4188 | <title>MIDI (MPU401-UART) Interface</title> | 4188 | <title>MIDI (MPU401-UART) Interface</title> |
4189 | 4189 | ||
4190 | <section id="midi-interface-general"> | 4190 | <section id="midi-interface-general"> |
4191 | <title>General</title> | 4191 | <title>General</title> |
4192 | <para> | 4192 | <para> |
4193 | Many soundcards have built-in MIDI (MPU401-UART) | 4193 | Many soundcards have built-in MIDI (MPU401-UART) |
4194 | interfaces. When the soundcard supports the standard MPU401-UART | 4194 | interfaces. When the soundcard supports the standard MPU401-UART |
4195 | interface, most likely you can use the ALSA MPU401-UART API. The | 4195 | interface, most likely you can use the ALSA MPU401-UART API. The |
4196 | MPU401-UART API is defined in | 4196 | MPU401-UART API is defined in |
4197 | <filename><sound/mpu401.h></filename>. | 4197 | <filename><sound/mpu401.h></filename>. |
4198 | </para> | 4198 | </para> |
4199 | 4199 | ||
4200 | <para> | 4200 | <para> |
4201 | Some soundchips have a similar but slightly different | 4201 | Some soundchips have a similar but slightly different |
4202 | implementation of the mpu401 interface. For example, emu10k1 has its own | 4202 | implementation of the mpu401 interface. For example, emu10k1 has its own |
4203 | mpu401 routines. | 4203 | mpu401 routines. |
4204 | </para> | 4204 | </para> |
4205 | </section> | 4205 | </section> |
4206 | 4206 | ||
4207 | <section id="midi-interface-constructor"> | 4207 | <section id="midi-interface-constructor"> |
4208 | <title>Constructor</title> | 4208 | <title>Constructor</title> |
4209 | <para> | 4209 | <para> |
4210 | For creating a rawmidi object, call | 4210 | For creating a rawmidi object, call |
4211 | <function>snd_mpu401_uart_new()</function>. | 4211 | <function>snd_mpu401_uart_new()</function>. |
4212 | 4212 | ||
4213 | <informalexample> | 4213 | <informalexample> |
4214 | <programlisting> | 4214 | <programlisting> |
4215 | <![CDATA[ | 4215 | <![CDATA[ |
4216 | struct snd_rawmidi *rmidi; | 4216 | struct snd_rawmidi *rmidi; |
4217 | snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, info_flags, | 4217 | snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, info_flags, |
4218 | irq, irq_flags, &rmidi); | 4218 | irq, irq_flags, &rmidi); |
4219 | ]]> | 4219 | ]]> |
4220 | </programlisting> | 4220 | </programlisting> |
4221 | </informalexample> | 4221 | </informalexample> |
4222 | </para> | 4222 | </para> |
4223 | 4223 | ||
4224 | <para> | 4224 | <para> |
4225 | The first argument is the card pointer, and the second is the | 4225 | The first argument is the card pointer, and the second is the |
4226 | index of this component. You can create up to 8 rawmidi | 4226 | index of this component. You can create up to 8 rawmidi |
4227 | devices. | 4227 | devices. |
4228 | </para> | 4228 | </para> |
4229 | 4229 | ||
4230 | <para> | 4230 | <para> |
4231 | The third argument is the type of the hardware, | 4231 | The third argument is the type of the hardware, |
4232 | <constant>MPU401_HW_XXX</constant>. If it's not a special one, | 4232 | <constant>MPU401_HW_XXX</constant>. If it's not a special one, |
4233 | you can use <constant>MPU401_HW_MPU401</constant>. | 4233 | you can use <constant>MPU401_HW_MPU401</constant>. |
4234 | </para> | 4234 | </para> |
4235 | 4235 | ||
4236 | <para> | 4236 | <para> |
4237 | The 4th argument is the i/o port address. Many | 4237 | The 4th argument is the i/o port address. Many |
4238 | backward-compatible MPU401 devices have an i/o port such as 0x330. Or, it | 4238 | backward-compatible MPU401 devices have an i/o port such as 0x330. Or, it |
4239 | might be a part of its own PCI i/o region. It depends on the | 4239 | might be a part of its own PCI i/o region. It depends on the |
4240 | chip design. | 4240 | chip design. |
4241 | </para> | 4241 | </para> |
4242 | 4242 | ||
4243 | <para> | 4243 | <para> |
4244 | The 5th argument is bitflags for additional information. | 4244 | The 5th argument is bitflags for additional information. |
4245 | When the i/o port address above is a part of the PCI i/o | 4245 | When the i/o port address above is a part of the PCI i/o |
4246 | region, the MPU401 i/o port might have been already allocated | 4246 | region, the MPU401 i/o port might have been already allocated |
4247 | (reserved) by the driver itself. In such a case, pass a bit flag | 4247 | (reserved) by the driver itself. In such a case, pass a bit flag |
4248 | <constant>MPU401_INFO_INTEGRATED</constant>, | 4248 | <constant>MPU401_INFO_INTEGRATED</constant>, |
4249 | and | 4249 | and |
4250 | the mpu401-uart layer will not allocate the i/o ports by itself. | 4250 | the mpu401-uart layer will not allocate the i/o ports by itself. |
4251 | </para> | 4251 | </para> |
4252 | 4252 | ||
4253 | <para> | 4253 | <para> |
4254 | When the controller supports only the input or output MIDI stream, | 4254 | When the controller supports only the input or output MIDI stream, |
4255 | pass <constant>MPU401_INFO_INPUT</constant> or | 4255 | pass <constant>MPU401_INFO_INPUT</constant> or |
4256 | <constant>MPU401_INFO_OUTPUT</constant> bitflag, respectively. | 4256 | <constant>MPU401_INFO_OUTPUT</constant> bitflag, respectively. |
4257 | Then the rawmidi instance is created as a single stream. | 4257 | Then the rawmidi instance is created as a single stream. |
4258 | </para> | 4258 | </para> |
4259 | 4259 | ||
4260 | <para> | 4260 | <para> |
4261 | The <constant>MPU401_INFO_MMIO</constant> bitflag is used to change | 4261 | The <constant>MPU401_INFO_MMIO</constant> bitflag is used to change |
4262 | the access method to MMIO (via readb and writeb) instead of | 4262 | the access method to MMIO (via readb and writeb) instead of |
4263 | inb and outb. In this case, you have to pass the iomapped address | 4263 | inb and outb. In this case, you have to pass the iomapped address |
4264 | to <function>snd_mpu401_uart_new()</function>. | 4264 | to <function>snd_mpu401_uart_new()</function>. |
4265 | </para> | 4265 | </para> |
4266 | 4266 | ||
4267 | <para> | 4267 | <para> |
4268 | When <constant>MPU401_INFO_TX_IRQ</constant> is set, the output | 4268 | When <constant>MPU401_INFO_TX_IRQ</constant> is set, the output |
4269 | stream isn't checked in the default interrupt handler. The driver | 4269 | stream isn't checked in the default interrupt handler. The driver |
4270 | needs to call <function>snd_mpu401_uart_interrupt_tx()</function> | 4270 | needs to call <function>snd_mpu401_uart_interrupt_tx()</function> |
4271 | by itself to start processing the output stream in the irq handler. | 4271 | by itself to start processing the output stream in the irq handler. |
4272 | </para> | 4272 | </para> |
4273 | 4273 | ||
4274 | <para> | 4274 | <para> |
4275 | Usually, the port address corresponds to the data port and | 4275 | Usually, the port address corresponds to the data port and |
4276 | port + 1 corresponds to the command port. If not, you may change | 4276 | port + 1 corresponds to the command port. If not, you may change |
4277 | the <structfield>cport</structfield> field of | 4277 | the <structfield>cport</structfield> field of |
4278 | struct <structname>snd_mpu401</structname> manually | 4278 | struct <structname>snd_mpu401</structname> manually |
4279 | afterward. However, the <structname>snd_mpu401</structname> pointer is not | 4279 | afterward. However, the <structname>snd_mpu401</structname> pointer is not |
4280 | returned explicitly by | 4280 | returned explicitly by |
4281 | <function>snd_mpu401_uart_new()</function>. You need to cast | 4281 | <function>snd_mpu401_uart_new()</function>. You need to cast |
4282 | rmidi->private_data to | 4282 | rmidi->private_data to |
4283 | <structname>snd_mpu401</structname> explicitly, | 4283 | <structname>snd_mpu401</structname> explicitly, |
4284 | 4284 | ||
4285 | <informalexample> | 4285 | <informalexample> |
4286 | <programlisting> | 4286 | <programlisting> |
4287 | <![CDATA[ | 4287 | <![CDATA[ |
4288 | struct snd_mpu401 *mpu; | 4288 | struct snd_mpu401 *mpu; |
4289 | mpu = rmidi->private_data; | 4289 | mpu = rmidi->private_data; |
4290 | ]]> | 4290 | ]]> |
4291 | </programlisting> | 4291 | </programlisting> |
4292 | </informalexample> | 4292 | </informalexample> |
4293 | 4293 | ||
4294 | and reset the cport as you like: | 4294 | and reset the cport as you like: |
4295 | 4295 | ||
4296 | <informalexample> | 4296 | <informalexample> |
4297 | <programlisting> | 4297 | <programlisting> |
4298 | <![CDATA[ | 4298 | <![CDATA[ |
4299 | mpu->cport = my_own_control_port; | 4299 | mpu->cport = my_own_control_port; |
4300 | ]]> | 4300 | ]]> |
4301 | </programlisting> | 4301 | </programlisting> |
4302 | </informalexample> | 4302 | </informalexample> |
4303 | </para> | 4303 | </para> |
4304 | 4304 | ||
4305 | <para> | 4305 | <para> |
4306 | The 6th argument specifies the irq number for UART. If the irq | 4306 | The 6th argument specifies the irq number for UART. If the irq |
4307 | is already allocated, pass 0 to the 7th argument | 4307 | is already allocated, pass 0 to the 7th argument |
4308 | (<parameter>irq_flags</parameter>). Otherwise, pass the flags | 4308 | (<parameter>irq_flags</parameter>). Otherwise, pass the flags |
4309 | for irq allocation | 4309 | for irq allocation |
4310 | (<constant>SA_XXX</constant> bits) to it, and the irq will be | 4310 | (<constant>SA_XXX</constant> bits) to it, and the irq will be |
4311 | reserved by the mpu401-uart layer. If the card doesn't generate | 4311 | reserved by the mpu401-uart layer. If the card doesn't generate |
4312 | UART interrupts, pass -1 as the irq number. Then a timer | 4312 | UART interrupts, pass -1 as the irq number. Then a timer |
4313 | interrupt will be invoked for polling. | 4313 | interrupt will be invoked for polling. |
4314 | </para> | 4314 | </para> |
4315 | </section> | 4315 | </section> |
4316 | 4316 | ||
4317 | <section id="midi-interface-interrupt-handler"> | 4317 | <section id="midi-interface-interrupt-handler"> |
4318 | <title>Interrupt Handler</title> | 4318 | <title>Interrupt Handler</title> |
4319 | <para> | 4319 | <para> |
4320 | When the interrupt is allocated in | 4320 | When the interrupt is allocated in |
4321 | <function>snd_mpu401_uart_new()</function>, the private | 4321 | <function>snd_mpu401_uart_new()</function>, the private |
4322 | interrupt handler is used, hence you don't have to do anything | 4322 | interrupt handler is used, hence you don't have to do anything |
4323 | other than create the mpu401 stuff. Otherwise, you have to call | 4323 | other than create the mpu401 stuff. Otherwise, you have to call |
4324 | <function>snd_mpu401_uart_interrupt()</function> explicitly when | 4324 | <function>snd_mpu401_uart_interrupt()</function> explicitly when |
4325 | a UART interrupt is invoked and checked in your own interrupt | 4325 | a UART interrupt is invoked and checked in your own interrupt |
4326 | handler. | 4326 | handler. |
4327 | </para> | 4327 | </para> |
4328 | 4328 | ||
4329 | <para> | 4329 | <para> |
4330 | In this case, you need to pass the private_data of the | 4330 | In this case, you need to pass the private_data of the |
4331 | returned rawmidi object from | 4331 | returned rawmidi object from |
4332 | <function>snd_mpu401_uart_new()</function> as the second | 4332 | <function>snd_mpu401_uart_new()</function> as the second |
4333 | argument of <function>snd_mpu401_uart_interrupt()</function>. | 4333 | argument of <function>snd_mpu401_uart_interrupt()</function>. |
4334 | 4334 | ||
4335 | <informalexample> | 4335 | <informalexample> |
4336 | <programlisting> | 4336 | <programlisting> |
4337 | <![CDATA[ | 4337 | <![CDATA[ |
4338 | snd_mpu401_uart_interrupt(irq, rmidi->private_data, regs); | 4338 | snd_mpu401_uart_interrupt(irq, rmidi->private_data, regs); |
4339 | ]]> | 4339 | ]]> |
4340 | </programlisting> | 4340 | </programlisting> |
4341 | </informalexample> | 4341 | </informalexample> |
4342 | </para> | 4342 | </para> |
4343 | </section> | 4343 | </section> |
4344 | 4344 | ||
4345 | </chapter> | 4345 | </chapter> |
4346 | 4346 | ||
4347 | 4347 | ||
4348 | <!-- ****************************************************** --> | 4348 | <!-- ****************************************************** --> |
4349 | <!-- RawMIDI Interface --> | 4349 | <!-- RawMIDI Interface --> |
4350 | <!-- ****************************************************** --> | 4350 | <!-- ****************************************************** --> |
4351 | <chapter id="rawmidi-interface"> | 4351 | <chapter id="rawmidi-interface"> |
4352 | <title>RawMIDI Interface</title> | 4352 | <title>RawMIDI Interface</title> |
4353 | 4353 | ||
4354 | <section id="rawmidi-interface-overview"> | 4354 | <section id="rawmidi-interface-overview"> |
4355 | <title>Overview</title> | 4355 | <title>Overview</title> |
4356 | 4356 | ||
4357 | <para> | 4357 | <para> |
4358 | The raw MIDI interface is used for hardware MIDI ports that can | 4358 | The raw MIDI interface is used for hardware MIDI ports that can |
4359 | be accessed as a byte stream. It is not used for synthesizer | 4359 | be accessed as a byte stream. It is not used for synthesizer |
4360 | chips that do not directly understand MIDI. | 4360 | chips that do not directly understand MIDI. |
4361 | </para> | 4361 | </para> |
4362 | 4362 | ||
4363 | <para> | 4363 | <para> |
4364 | ALSA handles file and buffer management. All you have to do is | 4364 | ALSA handles file and buffer management. All you have to do is |
4365 | to write some code to move data between the buffer and the | 4365 | to write some code to move data between the buffer and the |
4366 | hardware. | 4366 | hardware. |
4367 | </para> | 4367 | </para> |
4368 | 4368 | ||
4369 | <para> | 4369 | <para> |
4370 | The rawmidi API is defined in | 4370 | The rawmidi API is defined in |
4371 | <filename><sound/rawmidi.h></filename>. | 4371 | <filename><sound/rawmidi.h></filename>. |
4372 | </para> | 4372 | </para> |
4373 | </section> | 4373 | </section> |
4374 | 4374 | ||
4375 | <section id="rawmidi-interface-constructor"> | 4375 | <section id="rawmidi-interface-constructor"> |
4376 | <title>Constructor</title> | 4376 | <title>Constructor</title> |
4377 | 4377 | ||
4378 | <para> | 4378 | <para> |
4379 | To create a rawmidi device, call the | 4379 | To create a rawmidi device, call the |
4380 | <function>snd_rawmidi_new</function> function: | 4380 | <function>snd_rawmidi_new</function> function: |
4381 | <informalexample> | 4381 | <informalexample> |
4382 | <programlisting> | 4382 | <programlisting> |
4383 | <![CDATA[ | 4383 | <![CDATA[ |
4384 | struct snd_rawmidi *rmidi; | 4384 | struct snd_rawmidi *rmidi; |
4385 | err = snd_rawmidi_new(chip->card, "MyMIDI", 0, outs, ins, &rmidi); | 4385 | err = snd_rawmidi_new(chip->card, "MyMIDI", 0, outs, ins, &rmidi); |
4386 | if (err < 0) | 4386 | if (err < 0) |
4387 | return err; | 4387 | return err; |
4388 | rmidi->private_data = chip; | 4388 | rmidi->private_data = chip; |
4389 | strcpy(rmidi->name, "My MIDI"); | 4389 | strcpy(rmidi->name, "My MIDI"); |
4390 | rmidi->info_flags = SNDRV_RAWMIDI_INFO_OUTPUT | | 4390 | rmidi->info_flags = SNDRV_RAWMIDI_INFO_OUTPUT | |
4391 | SNDRV_RAWMIDI_INFO_INPUT | | 4391 | SNDRV_RAWMIDI_INFO_INPUT | |
4392 | SNDRV_RAWMIDI_INFO_DUPLEX; | 4392 | SNDRV_RAWMIDI_INFO_DUPLEX; |
4393 | ]]> | 4393 | ]]> |
4394 | </programlisting> | 4394 | </programlisting> |
4395 | </informalexample> | 4395 | </informalexample> |
4396 | </para> | 4396 | </para> |
4397 | 4397 | ||
4398 | <para> | 4398 | <para> |
4399 | The first argument is the card pointer, the second argument is | 4399 | The first argument is the card pointer, the second argument is |
4400 | the ID string. | 4400 | the ID string. |
4401 | </para> | 4401 | </para> |
4402 | 4402 | ||
4403 | <para> | 4403 | <para> |
4404 | The third argument is the index of this component. You can | 4404 | The third argument is the index of this component. You can |
4405 | create up to 8 rawmidi devices. | 4405 | create up to 8 rawmidi devices. |
4406 | </para> | 4406 | </para> |
4407 | 4407 | ||
4408 | <para> | 4408 | <para> |
4409 | The fourth and fifth arguments are the number of output and | 4409 | The fourth and fifth arguments are the number of output and |
4410 | input substreams, respectively, of this device. (A substream is | 4410 | input substreams, respectively, of this device. (A substream is |
4411 | the equivalent of a MIDI port.) | 4411 | the equivalent of a MIDI port.) |
4412 | </para> | 4412 | </para> |
4413 | 4413 | ||
4414 | <para> | 4414 | <para> |
4415 | Set the <structfield>info_flags</structfield> field to specify | 4415 | Set the <structfield>info_flags</structfield> field to specify |
4416 | the capabilities of the device. | 4416 | the capabilities of the device. |
4417 | Set <constant>SNDRV_RAWMIDI_INFO_OUTPUT</constant> if there is | 4417 | Set <constant>SNDRV_RAWMIDI_INFO_OUTPUT</constant> if there is |
4418 | at least one output port, | 4418 | at least one output port, |
4419 | <constant>SNDRV_RAWMIDI_INFO_INPUT</constant> if there is at | 4419 | <constant>SNDRV_RAWMIDI_INFO_INPUT</constant> if there is at |
4420 | least one input port, | 4420 | least one input port, |
4421 | and <constant>SNDRV_RAWMIDI_INFO_DUPLEX</constant> if the device | 4421 | and <constant>SNDRV_RAWMIDI_INFO_DUPLEX</constant> if the device |
4422 | can handle output and input at the same time. | 4422 | can handle output and input at the same time. |
4423 | </para> | 4423 | </para> |
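<para>
The rule for choosing the capability bits can be written as a small helper. This is a user-space sketch: the MODEL_INFO_* values are placeholders defined only for this example (a driver uses the SNDRV_RAWMIDI_INFO_* constants from the ALSA headers), and model_rawmidi_info_flags() and its full_duplex parameter are hypothetical.
</para>

```c
/* Placeholder bit values for this sketch only. */
#define MODEL_INFO_OUTPUT 0x1u
#define MODEL_INFO_INPUT  0x2u
#define MODEL_INFO_DUPLEX 0x4u

/*
 * Derive the capability bits from the number of output and input
 * substreams. full_duplex is non-zero when the hardware can handle
 * output and input at the same time.
 */
unsigned int model_rawmidi_info_flags(int outs, int ins, int full_duplex)
{
	unsigned int flags = 0;

	if (outs > 0)
		flags |= MODEL_INFO_OUTPUT;	/* at least one output port */
	if (ins > 0)
		flags |= MODEL_INFO_INPUT;	/* at least one input port */
	if (outs > 0 && ins > 0 && full_duplex)
		flags |= MODEL_INFO_DUPLEX;	/* both directions at once */
	return flags;
}
```

<para>
A device with one output and one input port that can run both at once gets all three bits; an output-only device gets only the output bit.
</para>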
4424 | 4424 | ||
4425 | <para> | 4425 | <para> |
4426 | After the rawmidi device is created, you need to set the | 4426 | After the rawmidi device is created, you need to set the |
4427 | operators (callbacks) for each substream. There are helper | 4427 | operators (callbacks) for each substream. There are helper |
4428 | functions to set the operators for all substreams of a device: | 4428 | functions to set the operators for all substreams of a device: |
4429 | <informalexample> | 4429 | <informalexample> |
4430 | <programlisting> | 4430 | <programlisting> |
4431 | <![CDATA[ | 4431 | <![CDATA[ |
4432 | snd_rawmidi_set_ops(rmidi, SNDRV_RAWMIDI_STREAM_OUTPUT, &snd_mymidi_output_ops); | 4432 | snd_rawmidi_set_ops(rmidi, SNDRV_RAWMIDI_STREAM_OUTPUT, &snd_mymidi_output_ops); |
4433 | snd_rawmidi_set_ops(rmidi, SNDRV_RAWMIDI_STREAM_INPUT, &snd_mymidi_input_ops); | 4433 | snd_rawmidi_set_ops(rmidi, SNDRV_RAWMIDI_STREAM_INPUT, &snd_mymidi_input_ops); |
4434 | ]]> | 4434 | ]]> |
4435 | </programlisting> | 4435 | </programlisting> |
4436 | </informalexample> | 4436 | </informalexample> |
4437 | </para> | 4437 | </para> |
4438 | 4438 | ||
4439 | <para> | 4439 | <para> |
4440 | The operators are usually defined like this: | 4440 | The operators are usually defined like this: |
4441 | <informalexample> | 4441 | <informalexample> |
4442 | <programlisting> | 4442 | <programlisting> |
4443 | <![CDATA[ | 4443 | <![CDATA[ |
4444 | static struct snd_rawmidi_ops snd_mymidi_output_ops = { | 4444 | static struct snd_rawmidi_ops snd_mymidi_output_ops = { |
4445 | .open = snd_mymidi_output_open, | 4445 | .open = snd_mymidi_output_open, |
4446 | .close = snd_mymidi_output_close, | 4446 | .close = snd_mymidi_output_close, |
4447 | .trigger = snd_mymidi_output_trigger, | 4447 | .trigger = snd_mymidi_output_trigger, |
4448 | }; | 4448 | }; |
4449 | ]]> | 4449 | ]]> |
4450 | </programlisting> | 4450 | </programlisting> |
4451 | </informalexample> | 4451 | </informalexample> |
4452 | These callbacks are explained in the <link | 4452 | These callbacks are explained in the <link |
4453 | linkend="rawmidi-interface-callbacks"><citetitle>Callbacks</citetitle></link> | 4453 | linkend="rawmidi-interface-callbacks"><citetitle>Callbacks</citetitle></link> |
4454 | section. | 4454 | section. |
4455 | </para> | 4455 | </para> |
4456 | 4456 | ||
4457 | <para> | 4457 | <para> |
4458 | If there is more than one substream, you should give each one a | 4458 | If there is more than one substream, you should give each one a |
4459 | unique name: | 4459 | unique name: |
4460 | <informalexample> | 4460 | <informalexample> |
4461 | <programlisting> | 4461 | <programlisting> |
4462 | <![CDATA[ | 4462 | <![CDATA[ |
4463 | struct list_head *list; | 4463 | struct list_head *list; |
4464 | struct snd_rawmidi_substream *substream; | 4464 | struct snd_rawmidi_substream *substream; |
4465 | list_for_each(list, &rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams) { | 4465 | list_for_each(list, &rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams) { |
4466 | substream = list_entry(list, struct snd_rawmidi_substream, list); | 4466 | substream = list_entry(list, struct snd_rawmidi_substream, list); |
4467 | sprintf(substream->name, "My MIDI Port %d", substream->number + 1); | 4467 | sprintf(substream->name, "My MIDI Port %d", substream->number + 1); |
4468 | } | 4468 | } |
4469 | /* same for SNDRV_RAWMIDI_STREAM_INPUT */ | 4469 | /* same for SNDRV_RAWMIDI_STREAM_INPUT */ |
4470 | ]]> | 4470 | ]]> |
4471 | </programlisting> | 4471 | </programlisting> |
4472 | </informalexample> | 4472 | </informalexample> |
4473 | </para> | 4473 | </para> |
4474 | </section> | 4474 | </section> |
4475 | 4475 | ||
4476 | <section id="rawmidi-interface-callbacks"> | 4476 | <section id="rawmidi-interface-callbacks"> |
4477 | <title>Callbacks</title> | 4477 | <title>Callbacks</title> |
4478 | 4478 | ||
4479 | <para> | 4479 | <para> |
4480 | In all callbacks, the private data that you've set for the | 4480 | In all callbacks, the private data that you've set for the |
4481 | rawmidi device can be accessed as | 4481 | rawmidi device can be accessed as |
4482 | substream->rmidi->private_data. | 4482 | substream->rmidi->private_data. |
4483 | <!-- <code> isn't available before DocBook 4.3 --> | 4483 | <!-- <code> isn't available before DocBook 4.3 --> |
4484 | </para> | 4484 | </para> |
4485 | 4485 | ||
4486 | <para> | 4486 | <para> |
4487 | If there is more than one port, your callbacks can determine the | 4487 | If there is more than one port, your callbacks can determine the |
4488 | port index from the struct snd_rawmidi_substream data passed to each | 4488 | port index from the struct snd_rawmidi_substream data passed to each |
4489 | callback: | 4489 | callback: |
4490 | <informalexample> | 4490 | <informalexample> |
4491 | <programlisting> | 4491 | <programlisting> |
4492 | <![CDATA[ | 4492 | <![CDATA[ |
4493 | struct snd_rawmidi_substream *substream; | 4493 | struct snd_rawmidi_substream *substream; |
4494 | int index = substream->number; | 4494 | int index = substream->number; |
4495 | ]]> | 4495 | ]]> |
4496 | </programlisting> | 4496 | </programlisting> |
4497 | </informalexample> | 4497 | </informalexample> |
4498 | </para> | 4498 | </para> |
4499 | 4499 | ||
4500 | <section id="rawmidi-interface-op-open"> | 4500 | <section id="rawmidi-interface-op-open"> |
4501 | <title><function>open</function> callback</title> | 4501 | <title><function>open</function> callback</title> |
4502 | 4502 | ||
4503 | <informalexample> | 4503 | <informalexample> |
4504 | <programlisting> | 4504 | <programlisting> |
4505 | <![CDATA[ | 4505 | <![CDATA[ |
4506 | static int snd_xxx_open(struct snd_rawmidi_substream *substream); | 4506 | static int snd_xxx_open(struct snd_rawmidi_substream *substream); |
4507 | ]]> | 4507 | ]]> |
4508 | </programlisting> | 4508 | </programlisting> |
4509 | </informalexample> | 4509 | </informalexample> |
4510 | 4510 | ||
4511 | <para> | 4511 | <para> |
4512 | This is called when a substream is opened. | 4512 | This is called when a substream is opened. |
4513 | You can initialize the hardware here, but you should not yet | 4513 | You can initialize the hardware here, but you should not yet |
4514 | start transmitting/receiving data. | 4514 | start transmitting/receiving data. |
4515 | </para> | 4515 | </para> |
4516 | </section> | 4516 | </section> |
4517 | 4517 | ||
4518 | <section id="rawmidi-interface-op-close"> | 4518 | <section id="rawmidi-interface-op-close"> |
4519 | <title><function>close</function> callback</title> | 4519 | <title><function>close</function> callback</title> |
4520 | 4520 | ||
4521 | <informalexample> | 4521 | <informalexample> |
4522 | <programlisting> | 4522 | <programlisting> |
4523 | <![CDATA[ | 4523 | <![CDATA[ |
4524 | static int snd_xxx_close(struct snd_rawmidi_substream *substream); | 4524 | static int snd_xxx_close(struct snd_rawmidi_substream *substream); |
4525 | ]]> | 4525 | ]]> |
4526 | </programlisting> | 4526 | </programlisting> |
4527 | </informalexample> | 4527 | </informalexample> |
4528 | 4528 | ||
4529 | <para> | 4529 | <para> |
4530 | This is called when a substream is closed. | 4530 | This is called when a substream is closed. |
4531 | </para> | 4531 | </para> |
4532 | 4532 | ||
4533 | <para> | 4533 | <para> |
4534 | The <function>open</function> and <function>close</function> | 4534 | The <function>open</function> and <function>close</function> |
4535 | callbacks of a rawmidi device are serialized with a mutex, | 4535 | callbacks of a rawmidi device are serialized with a mutex, |
4536 | and can sleep. | 4536 | and can sleep. |
4537 | </para> | 4537 | </para> |
4538 | </section> | 4538 | </section> |
4539 | 4539 | ||
4540 | <section id="rawmidi-interface-op-trigger-out"> | 4540 | <section id="rawmidi-interface-op-trigger-out"> |
4541 | <title><function>trigger</function> callback for output | 4541 | <title><function>trigger</function> callback for output |
4542 | substreams</title> | 4542 | substreams</title> |
4543 | 4543 | ||
4544 | <informalexample> | 4544 | <informalexample> |
4545 | <programlisting> | 4545 | <programlisting> |
4546 | <![CDATA[ | 4546 | <![CDATA[ |
4547 | static void snd_xxx_output_trigger(struct snd_rawmidi_substream *substream, int up); | 4547 | static void snd_xxx_output_trigger(struct snd_rawmidi_substream *substream, int up); |
4548 | ]]> | 4548 | ]]> |
4549 | </programlisting> | 4549 | </programlisting> |
4550 | </informalexample> | 4550 | </informalexample> |
4551 | 4551 | ||
4552 | <para> | 4552 | <para> |
4553 | This is called with a nonzero <parameter>up</parameter> | 4553 | This is called with a nonzero <parameter>up</parameter> |
4554 | parameter when there is some data in the substream buffer that | 4554 | parameter when there is some data in the substream buffer that |
4555 | must be transmitted. | 4555 | must be transmitted. |
4556 | </para> | 4556 | </para> |
4557 | 4557 | ||
4558 | <para> | 4558 | <para> |
4559 | To read data from the buffer, call | 4559 | To read data from the buffer, call |
4560 | <function>snd_rawmidi_transmit_peek</function>. It will | 4560 | <function>snd_rawmidi_transmit_peek</function>. It will |
4561 | return the number of bytes that have been read; this will be | 4561 | return the number of bytes that have been read; this will be |
4562 | less than the number of bytes requested when there is no more | 4562 | less than the number of bytes requested when there is no more |
4563 | data in the buffer. | 4563 | data in the buffer. |
4564 | After the data has been transmitted successfully, call | 4564 | After the data has been transmitted successfully, call |
4565 | <function>snd_rawmidi_transmit_ack</function> to remove the | 4565 | <function>snd_rawmidi_transmit_ack</function> to remove the |
4566 | data from the substream buffer: | 4566 | data from the substream buffer: |
4567 | <informalexample> | 4567 | <informalexample> |
4568 | <programlisting> | 4568 | <programlisting> |
4569 | <![CDATA[ | 4569 | <![CDATA[ |
4570 | unsigned char data; | 4570 | unsigned char data; |
4571 | while (snd_rawmidi_transmit_peek(substream, &data, 1) == 1) { | 4571 | while (snd_rawmidi_transmit_peek(substream, &data, 1) == 1) { |
4572 | if (snd_mychip_try_to_transmit(data)) | 4572 | if (snd_mychip_try_to_transmit(data)) |
4573 | snd_rawmidi_transmit_ack(substream, 1); | 4573 | snd_rawmidi_transmit_ack(substream, 1); |
4574 | else | 4574 | else |
4575 | break; /* hardware FIFO full */ | 4575 | break; /* hardware FIFO full */ |
4576 | } | 4576 | } |
4577 | ]]> | 4577 | ]]> |
4578 | </programlisting> | 4578 | </programlisting> |
4579 | </informalexample> | 4579 | </informalexample> |
4580 | </para> | 4580 | </para> |
4581 | 4581 | ||
4582 | <para> | 4582 | <para> |
4583 | If you know beforehand that the hardware will accept data, you | 4583 | If you know beforehand that the hardware will accept data, you |
4584 | can use the <function>snd_rawmidi_transmit</function> function | 4584 | can use the <function>snd_rawmidi_transmit</function> function |
4585 | which reads some data and removes it from the buffer at once: | 4585 | which reads some data and removes it from the buffer at once: |
4586 | <informalexample> | 4586 | <informalexample> |
4587 | <programlisting> | 4587 | <programlisting> |
4588 | <![CDATA[ | 4588 | <![CDATA[ |
4589 | while (snd_mychip_transmit_possible()) { | 4589 | while (snd_mychip_transmit_possible()) { |
4590 | unsigned char data; | 4590 | unsigned char data; |
4591 | if (snd_rawmidi_transmit(substream, &data, 1) != 1) | 4591 | if (snd_rawmidi_transmit(substream, &data, 1) != 1) |
4592 | break; /* no more data */ | 4592 | break; /* no more data */ |
4593 | snd_mychip_transmit(data); | 4593 | snd_mychip_transmit(data); |
4594 | } | 4594 | } |
4595 | ]]> | 4595 | ]]> |
4596 | </programlisting> | 4596 | </programlisting> |
4597 | </informalexample> | 4597 | </informalexample> |
4598 | </para> | 4598 | </para> |
4599 | 4599 | ||
4600 | <para> | 4600 | <para> |
4601 | If you know beforehand how many bytes you can accept, you can | 4601 | If you know beforehand how many bytes you can accept, you can |
4602 | use a buffer size greater than one with the | 4602 | use a buffer size greater than one with the |
4603 | <function>snd_rawmidi_transmit*</function> functions. | 4603 | <function>snd_rawmidi_transmit*</function> functions. |
4604 | </para> | 4604 | </para> |
4605 | 4605 | ||
4606 | <para> | 4606 | <para> |
4607 | The <function>trigger</function> callback must not sleep. If | 4607 | The <function>trigger</function> callback must not sleep. If |
4608 | the hardware FIFO is full before the substream buffer has been | 4608 | the hardware FIFO is full before the substream buffer has been |
4609 | emptied, you have to continue transmitting data later, either | 4609 | emptied, you have to continue transmitting data later, either |
4610 | in an interrupt handler, or with a timer if the hardware | 4610 | in an interrupt handler, or with a timer if the hardware |
4611 | doesn't have a MIDI transmit interrupt. | 4611 | doesn't have a MIDI transmit interrupt. |
4612 | </para> | 4612 | </para> |
4613 | 4613 | ||
4614 | <para> | 4614 | <para> |
4615 | The <function>trigger</function> callback is called with a | 4615 | The <function>trigger</function> callback is called with a |
4616 | zero <parameter>up</parameter> parameter when the transmission | 4616 | zero <parameter>up</parameter> parameter when the transmission |
4617 | of data should be aborted. | 4617 | of data should be aborted. |
4618 | </para> | 4618 | </para> |
4619 | </section> | 4619 | </section> |
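The peek-then-acknowledge pattern described for output substreams can be modelled in user space as follows. This is a sketch under stated assumptions, not the ALSA implementation: `struct my_buf`, `my_peek()` and `my_ack()` are hypothetical stand-ins for the substream buffer and for `snd_rawmidi_transmit_peek` / `snd_rawmidi_transmit_ack`, and `struct my_fifo` is an invented four-byte hardware FIFO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the substream buffer. */
struct my_buf {
	const unsigned char *data;
	size_t len;   /* total bytes queued */
	size_t pos;   /* bytes already acknowledged */
};

/* Copy up to count pending bytes without consuming them (models peek). */
static size_t my_peek(const struct my_buf *b, unsigned char *out, size_t count)
{
	size_t avail = b->len - b->pos;
	size_t n = count < avail ? count : avail;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = b->data[b->pos + i];
	return n;
}

/* Drop count bytes that the hardware has accepted (models ack). */
static void my_ack(struct my_buf *b, size_t count)
{
	assert(count <= b->len - b->pos);
	b->pos += count;
}

/* Hypothetical hardware FIFO with a fixed number of slots. */
struct my_fifo {
	unsigned char slots[4];
	size_t used;
};

static int my_fifo_try_put(struct my_fifo *f, unsigned char c)
{
	if (f->used == sizeof(f->slots))
		return 0;               /* FIFO full */
	f->slots[f->used++] = c;
	return 1;
}

/* Peek one byte and ack it only if the FIFO took it; returns bytes sent. */
static size_t my_output_pump(struct my_buf *b, struct my_fifo *f)
{
	unsigned char c;
	size_t sent = 0;

	while (my_peek(b, &c, 1) == 1) {
		if (!my_fifo_try_put(f, c))
			break;          /* byte stays queued for a later call */
		my_ack(b, 1);
		sent++;
	}
	return sent;
}
```

Because a byte refused by the FIFO is never acknowledged, it remains in the buffer, so a later call (from an interrupt handler or a timer in a real driver) resumes where the previous one stopped.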
4620 | 4620 | ||
4621 | <section id="rawmidi-interface-op-trigger-in"> | 4621 | <section id="rawmidi-interface-op-trigger-in"> |
4622 | <title><function>trigger</function> callback for input | 4622 | <title><function>trigger</function> callback for input |
4623 | substreams</title> | 4623 | substreams</title> |
4624 | 4624 | ||
4625 | <informalexample> | 4625 | <informalexample> |
4626 | <programlisting> | 4626 | <programlisting> |
4627 | <![CDATA[ | 4627 | <![CDATA[ |
4628 | static void snd_xxx_input_trigger(struct snd_rawmidi_substream *substream, int up); | 4628 | static void snd_xxx_input_trigger(struct snd_rawmidi_substream *substream, int up); |
4629 | ]]> | 4629 | ]]> |
4630 | </programlisting> | 4630 | </programlisting> |
4631 | </informalexample> | 4631 | </informalexample> |
4632 | 4632 | ||
4633 | <para> | 4633 | <para> |
4634 | This is called with a nonzero <parameter>up</parameter> | 4634 | This is called with a nonzero <parameter>up</parameter> |
4635 | parameter to enable receiving data, or with a zero | 4635 | parameter to enable receiving data, or with a zero |
4636 | <parameter>up</parameter> parameter to disable receiving data. | 4636 | <parameter>up</parameter> parameter to disable receiving data. |
4637 | </para> | 4637 | </para> |
4638 | 4638 | ||
4639 | <para> | 4639 | <para> |
4640 | The <function>trigger</function> callback must not sleep; the | 4640 | The <function>trigger</function> callback must not sleep; the |
4641 | actual reading of data from the device is usually done in an | 4641 | actual reading of data from the device is usually done in an |
4642 | interrupt handler. | 4642 | interrupt handler. |
4643 | </para> | 4643 | </para> |
4644 | 4644 | ||
4645 | <para> | 4645 | <para> |
4646 | When data reception is enabled, your interrupt handler should | 4646 | When data reception is enabled, your interrupt handler should |
4647 | call <function>snd_rawmidi_receive</function> for all received | 4647 | call <function>snd_rawmidi_receive</function> for all received |
4648 | data: | 4648 | data: |
4649 | <informalexample> | 4649 | <informalexample> |
4650 | <programlisting> | 4650 | <programlisting> |
4651 | <![CDATA[ | 4651 | <![CDATA[ |
4652 | void snd_mychip_midi_interrupt(...) | 4652 | void snd_mychip_midi_interrupt(...) |
4653 | { | 4653 | { |
4654 | while (mychip_midi_available()) { | 4654 | while (mychip_midi_available()) { |
4655 | unsigned char data; | 4655 | unsigned char data; |
4656 | data = mychip_midi_read(); | 4656 | data = mychip_midi_read(); |
4657 | snd_rawmidi_receive(substream, &data, 1); | 4657 | snd_rawmidi_receive(substream, &data, 1); |
4658 | } | 4658 | } |
4659 | } | 4659 | } |
4660 | ]]> | 4660 | ]]> |
4661 | </programlisting> | 4661 | </programlisting> |
4662 | </informalexample> | 4662 | </informalexample> |
4663 | </para> | 4663 | </para> |
4664 | </section> | 4664 | </section> |
4665 | 4665 | ||
4666 | <section id="rawmidi-interface-op-drain"> | 4666 | <section id="rawmidi-interface-op-drain"> |
4667 | <title><function>drain</function> callback</title> | 4667 | <title><function>drain</function> callback</title> |
4668 | 4668 | ||
4669 | <informalexample> | 4669 | <informalexample> |
4670 | <programlisting> | 4670 | <programlisting> |
4671 | <![CDATA[ | 4671 | <![CDATA[ |
4672 | static void snd_xxx_drain(struct snd_rawmidi_substream *substream); | 4672 | static void snd_xxx_drain(struct snd_rawmidi_substream *substream); |
4673 | ]]> | 4673 | ]]> |
4674 | </programlisting> | 4674 | </programlisting> |
4675 | </informalexample> | 4675 | </informalexample> |
4676 | 4676 | ||
4677 | <para> | 4677 | <para> |
4678 | This is only used with output substreams. This function should wait | 4678 | This is only used with output substreams. This function should wait |
4679 | until all data read from the substream buffer has been transmitted. | 4679 | until all data read from the substream buffer has been transmitted. |
4680 | This ensures that the device can be closed and the driver unloaded | 4680 | This ensures that the device can be closed and the driver unloaded |
4681 | without losing data. | 4681 | without losing data. |
4682 | </para> | 4682 | </para> |
4683 | 4683 | ||
4684 | <para> | 4684 | <para> |
4685 | This callback is optional. If you do not set | 4685 | This callback is optional. If you do not set |
4686 | <structfield>drain</structfield> in the struct snd_rawmidi_ops | 4686 | <structfield>drain</structfield> in the struct snd_rawmidi_ops |
4687 | structure, ALSA will simply wait for 50 milliseconds | 4687 | structure, ALSA will simply wait for 50 milliseconds |
4688 | instead. | 4688 | instead. |
4689 | </para> | 4689 | </para> |
4690 | </section> | 4690 | </section> |
4691 | </section> | 4691 | </section> |
4692 | 4692 | ||
4693 | </chapter> | 4693 | </chapter> |
4694 | 4694 | ||
4695 | 4695 | ||
4696 | <!-- ****************************************************** --> | 4696 | <!-- ****************************************************** --> |
4697 | <!-- Miscellaneous Devices --> | 4697 | <!-- Miscellaneous Devices --> |
4698 | <!-- ****************************************************** --> | 4698 | <!-- ****************************************************** --> |
4699 | <chapter id="misc-devices"> | 4699 | <chapter id="misc-devices"> |
4700 | <title>Miscellaneous Devices</title> | 4700 | <title>Miscellaneous Devices</title> |
4701 | 4701 | ||
4702 | <section id="misc-devices-opl3"> | 4702 | <section id="misc-devices-opl3"> |
4703 | <title>FM OPL3</title> | 4703 | <title>FM OPL3</title> |
4704 | <para> | 4704 | <para> |
4705 | The FM OPL3 is still used on many chips (mainly for backward | 4705 | The FM OPL3 is still used on many chips (mainly for backward |
4706 | compatibility). ALSA has a nice OPL3 FM control layer, too. The | 4706 | compatibility). ALSA has a nice OPL3 FM control layer, too. The |
4707 | OPL3 API is defined in | 4707 | OPL3 API is defined in |
4708 | <filename><sound/opl3.h></filename>. | 4708 | <filename><sound/opl3.h></filename>. |
4709 | </para> | 4709 | </para> |
4710 | 4710 | ||
4711 | <para> | 4711 | <para> |
4712 | FM registers can be directly accessed through the direct-FM API, | 4712 | FM registers can be directly accessed through the direct-FM API, |
4713 | defined in <filename><sound/asound_fm.h></filename>. In | 4713 | defined in <filename><sound/asound_fm.h></filename>. In |
4714 | ALSA native mode, FM registers are accessed through the | 4714 | ALSA native mode, FM registers are accessed through the |
4715 | Hardware-Dependent Device direct-FM extension API, whereas in | 4715 | Hardware-Dependent Device direct-FM extension API, whereas in |
4716 | OSS compatible mode, FM registers can be accessed with the OSS | 4716 | OSS compatible mode, FM registers can be accessed with the OSS |
4717 | direct-FM compatible API on the <filename>/dev/dmfmX</filename> device. | 4717 | direct-FM compatible API on the <filename>/dev/dmfmX</filename> device. |
4718 | </para> | 4718 | </para> |
4719 | 4719 | ||
4720 | <para> | 4720 | <para> |
4721 | To create the OPL3 component, you have two functions to | 4721 | To create the OPL3 component, you have two functions to |
4722 | call. The first one is a constructor for the <type>opl3_t</type> | 4722 | call. The first one is a constructor for the <type>opl3_t</type> |
4723 | instance. | 4723 | instance. |
4724 | 4724 | ||
4725 | <informalexample> | 4725 | <informalexample> |
4726 | <programlisting> | 4726 | <programlisting> |
4727 | <![CDATA[ | 4727 | <![CDATA[ |
4728 | struct snd_opl3 *opl3; | 4728 | struct snd_opl3 *opl3; |
4729 | snd_opl3_create(card, lport, rport, OPL3_HW_OPL3_XXX, | 4729 | snd_opl3_create(card, lport, rport, OPL3_HW_OPL3_XXX, |
4730 | integrated, &opl3); | 4730 | integrated, &opl3); |
4731 | ]]> | 4731 | ]]> |
4732 | </programlisting> | 4732 | </programlisting> |
4733 | </informalexample> | 4733 | </informalexample> |
4734 | </para> | 4734 | </para> |
4735 | 4735 | ||
4736 | <para> | 4736 | <para> |
4737 | The first argument is the card pointer, the second one is the | 4737 | The first argument is the card pointer, the second one is the |
4738 | left port address, and the third is the right port address. In | 4738 | left port address, and the third is the right port address. In |
4739 | most cases, the right port is placed at the left port + 2. | 4739 | most cases, the right port is placed at the left port + 2. |
4740 | </para> | 4740 | </para> |
4741 | 4741 | ||
4742 | <para> | 4742 | <para> |
4743 | The fourth argument is the hardware type. | 4743 | The fourth argument is the hardware type. |
4744 | </para> | 4744 | </para> |
4745 | 4745 | ||
4746 | <para> | 4746 | <para> |
4747 | When the left and right ports have already been allocated by | 4747 | When the left and right ports have already been allocated by |
4748 | the card driver, pass non-zero to the fifth argument | 4748 | the card driver, pass non-zero to the fifth argument |
4749 | (<parameter>integrated</parameter>). Otherwise, the opl3 module will | 4749 | (<parameter>integrated</parameter>). Otherwise, the opl3 module will |
4750 | allocate the specified ports by itself. | 4750 | allocate the specified ports by itself. |
4751 | </para> | 4751 | </para> |
4752 | 4752 | ||
4753 | <para> | 4753 | <para> |
4754 | When accessing the hardware requires a special method | 4754 | When accessing the hardware requires a special method |
4755 | instead of the standard I/O access, you can create the opl3 instance | 4755 | instead of the standard I/O access, you can create the opl3 instance |
4756 | separately with <function>snd_opl3_new()</function>. | 4756 | separately with <function>snd_opl3_new()</function>. |
4757 | 4757 | ||
4758 | <informalexample> | 4758 | <informalexample> |
4759 | <programlisting> | 4759 | <programlisting> |
4760 | <![CDATA[ | 4760 | <![CDATA[ |
4761 | struct snd_opl3 *opl3; | 4761 | struct snd_opl3 *opl3; |
4762 | snd_opl3_new(card, OPL3_HW_OPL3_XXX, &opl3); | 4762 | snd_opl3_new(card, OPL3_HW_OPL3_XXX, &opl3); |
4763 | ]]> | 4763 | ]]> |
4764 | </programlisting> | 4764 | </programlisting> |
4765 | </informalexample> | 4765 | </informalexample> |
4766 | </para> | 4766 | </para> |
4767 | 4767 | ||
4768 | <para> | 4768 | <para> |
4769 | Then set <structfield>command</structfield>, | 4769 | Then set <structfield>command</structfield>, |
4770 | <structfield>private_data</structfield> and | 4770 | <structfield>private_data</structfield> and |
4771 | <structfield>private_free</structfield> for the private | 4771 | <structfield>private_free</structfield> for the private |
4772 | access function, the private data and the destructor. | 4772 | access function, the private data and the destructor. |
4773 | The l_port and r_port fields need not be set. Only the | 4773 | The l_port and r_port fields need not be set. Only the |
4774 | command must be set properly. You can retrieve the data | 4774 | command must be set properly. You can retrieve the data |
4775 | from the opl3->private_data field. | 4775 | from the opl3->private_data field. |
4776 | </para> | 4776 | </para> |
4777 | 4777 | ||
4778 | <para> | 4778 | <para> |
4779 | After creating the opl3 instance via <function>snd_opl3_new()</function>, | 4779 | After creating the opl3 instance via <function>snd_opl3_new()</function>, |
4780 | call <function>snd_opl3_init()</function> to initialize the chip to the | 4780 | call <function>snd_opl3_init()</function> to initialize the chip to the |
4781 | proper state. Note that <function>snd_opl3_create()</function> always | 4781 | proper state. Note that <function>snd_opl3_create()</function> always |
4782 | calls it internally. | 4782 | calls it internally. |
4783 | </para> | 4783 | </para> |
4784 | 4784 | ||
4785 | <para> | 4785 | <para> |
4786 | If the opl3 instance is created successfully, then create a | 4786 | If the opl3 instance is created successfully, then create a |
4787 | hwdep device for this opl3. | 4787 | hwdep device for this opl3. |
4788 | 4788 | ||
4789 | <informalexample> | 4789 | <informalexample> |
4790 | <programlisting> | 4790 | <programlisting> |
4791 | <![CDATA[ | 4791 | <![CDATA[ |
4792 | struct snd_hwdep *opl3hwdep; | 4792 | struct snd_hwdep *opl3hwdep; |
4793 | snd_opl3_hwdep_new(opl3, 0, 1, &opl3hwdep); | 4793 | snd_opl3_hwdep_new(opl3, 0, 1, &opl3hwdep); |
4794 | ]]> | 4794 | ]]> |
4795 | </programlisting> | 4795 | </programlisting> |
4796 | </informalexample> | 4796 | </informalexample> |
4797 | </para> | 4797 | </para> |
4798 | 4798 | ||
4799 | <para> | 4799 | <para> |
4800 | The first argument is the <type>opl3_t</type> instance you | 4800 | The first argument is the <type>opl3_t</type> instance you |
4801 | created, and the second is the index number, usually 0. | 4801 | created, and the second is the index number, usually 0. |
4802 | </para> | 4802 | </para> |
4803 | 4803 | ||
4804 | <para> | 4804 | <para> |
4805 | The third argument is the index-offset for the sequencer | 4805 | The third argument is the index-offset for the sequencer |
4806 | client assigned to the OPL3 port. When there is an MPU401-UART, | 4806 | client assigned to the OPL3 port. When there is an MPU401-UART, |
4807 | give 1 here (UART always takes 0). | 4807 | give 1 here (UART always takes 0). |
4808 | </para> | 4808 | </para> |
4809 | </section> | 4809 | </section> |
4810 | 4810 | ||
4811 | <section id="misc-devices-hardware-dependent"> | 4811 | <section id="misc-devices-hardware-dependent"> |
4812 | <title>Hardware-Dependent Devices</title> | 4812 | <title>Hardware-Dependent Devices</title> |
4813 | <para> | 4813 | <para> |
4814 | Some chips need access from user space for special | 4814 | Some chips need access from user space for special |
4815 | controls or for loading microcode. In such a case, you can | 4815 | controls or for loading microcode. In such a case, you can |
4816 | create a hwdep (hardware-dependent) device. The hwdep API is | 4816 | create a hwdep (hardware-dependent) device. The hwdep API is |
4817 | defined in <filename><sound/hwdep.h></filename>. You can | 4817 | defined in <filename><sound/hwdep.h></filename>. You can |
4818 | find examples in the opl3 driver or | 4818 | find examples in the opl3 driver or |
4819 | <filename>isa/sb/sb16_csp.c</filename>. | 4819 | <filename>isa/sb/sb16_csp.c</filename>. |
4820 | </para> | 4820 | </para> |
4821 | 4821 | ||
4822 | <para> | 4822 | <para> |
4823 | Creation of the <type>hwdep</type> instance is done via | 4823 | Creation of the <type>hwdep</type> instance is done via |
4824 | <function>snd_hwdep_new()</function>. | 4824 | <function>snd_hwdep_new()</function>. |
4825 | 4825 | ||
4826 | <informalexample> | 4826 | <informalexample> |
4827 | <programlisting> | 4827 | <programlisting> |
4828 | <![CDATA[ | 4828 | <![CDATA[ |
4829 | struct snd_hwdep *hw; | 4829 | struct snd_hwdep *hw; |
4830 | snd_hwdep_new(card, "My HWDEP", 0, &hw); | 4830 | snd_hwdep_new(card, "My HWDEP", 0, &hw); |
4831 | ]]> | 4831 | ]]> |
4832 | </programlisting> | 4832 | </programlisting> |
4833 | </informalexample> | 4833 | </informalexample> |
4834 | 4834 | ||
4835 | where the third argument is the index number. | 4835 | where the third argument is the index number. |
4836 | </para> | 4836 | </para> |
4837 | 4837 | ||
4838 | <para> | 4838 | <para> |
4839 | You can then assign any pointer value to the | 4839 | You can then assign any pointer value to the |
4840 | <parameter>private_data</parameter> field. | 4840 | <parameter>private_data</parameter> field. |
4841 | If you assign private data, you should define the | 4841 | If you assign private data, you should define the |
4842 | destructor, too. The destructor function is set in the | 4842 | destructor, too. The destructor function is set in the |
4843 | <structfield>private_free</structfield> field. | 4843 | <structfield>private_free</structfield> field. |
4844 | 4844 | ||
4845 | <informalexample> | 4845 | <informalexample> |
4846 | <programlisting> | 4846 | <programlisting> |
4847 | <![CDATA[ | 4847 | <![CDATA[ |
4848 | struct mydata *p = kmalloc(sizeof(*p), GFP_KERNEL); | 4848 | struct mydata *p = kmalloc(sizeof(*p), GFP_KERNEL); |
4849 | hw->private_data = p; | 4849 | hw->private_data = p; |
4850 | hw->private_free = mydata_free; | 4850 | hw->private_free = mydata_free; |
4851 | ]]> | 4851 | ]]> |
4852 | </programlisting> | 4852 | </programlisting> |
4853 | </informalexample> | 4853 | </informalexample> |
4854 | 4854 | ||
4855 | and the implementation of the destructor would be: | 4855 | and the implementation of the destructor would be: |
4856 | 4856 | ||
4857 | <informalexample> | 4857 | <informalexample> |
4858 | <programlisting> | 4858 | <programlisting> |
4859 | <![CDATA[ | 4859 | <![CDATA[ |
4860 | static void mydata_free(struct snd_hwdep *hw) | 4860 | static void mydata_free(struct snd_hwdep *hw) |
4861 | { | 4861 | { |
4862 | struct mydata *p = hw->private_data; | 4862 | struct mydata *p = hw->private_data; |
4863 | kfree(p); | 4863 | kfree(p); |
4864 | } | 4864 | } |
4865 | ]]> | 4865 | ]]> |
4866 | </programlisting> | 4866 | </programlisting> |
4867 | </informalexample> | 4867 | </informalexample> |
4868 | </para> | 4868 | </para> |
4869 | 4869 | ||
4870 | <para> | 4870 | <para> |
4871 | Arbitrary file operations can be defined for this | 4871 | Arbitrary file operations can be defined for this |
4872 | instance. The file operators are defined in the | 4872 | instance. The file operators are defined in the |
4873 | <parameter>ops</parameter> table. For example, assume that | 4873 | <parameter>ops</parameter> table. For example, assume that |
4874 | this chip needs an ioctl. | 4874 | this chip needs an ioctl. |
4875 | 4875 | ||
4876 | <informalexample> | 4876 | <informalexample> |
4877 | <programlisting> | 4877 | <programlisting> |
4878 | <![CDATA[ | 4878 | <![CDATA[ |
4879 | hw->ops.open = mydata_open; | 4879 | hw->ops.open = mydata_open; |
4880 | hw->ops.ioctl = mydata_ioctl; | 4880 | hw->ops.ioctl = mydata_ioctl; |
4881 | hw->ops.release = mydata_release; | 4881 | hw->ops.release = mydata_release; |
4882 | ]]> | 4882 | ]]> |
4883 | </programlisting> | 4883 | </programlisting> |
4884 | </informalexample> | 4884 | </informalexample> |
4885 | 4885 | ||
4886 | And implement the callback functions as you like. | 4886 | And implement the callback functions as you like. |
4887 | </para> | 4887 | </para> |
4888 | </section> | 4888 | </section> |
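The ownership rule described above (whoever sets `private_data` also supplies `private_free`) can be illustrated with a user-space model. This is a sketch, not kernel code: `struct my_dev`, `my_dev_attach()` and `my_dev_release()` are hypothetical stand-ins for the hwdep object and for the layer that owns it, and `malloc`/`free` take the place of `kmalloc`/`kfree`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device object that owns private data. */
struct my_dev {
	void *private_data;
	void (*private_free)(struct my_dev *dev);
};

struct mydata {
	int value;
};

static int mydata_freed;    /* counts destructor calls, for illustration */

/* Destructor: releases what my_dev_attach() allocated. */
static void mydata_free(struct my_dev *dev)
{
	free(dev->private_data);
	dev->private_data = NULL;
	mydata_freed++;
}

/* Attach private data; returns 0 on success, -1 if allocation fails. */
static int my_dev_attach(struct my_dev *dev, int value)
{
	struct mydata *p;

	assert(dev != NULL);
	p = malloc(sizeof(*p));
	if (!p)
		return -1;
	p->value = value;
	dev->private_data = p;
	dev->private_free = mydata_free;
	return 0;
}

/* Models what the owning layer does on teardown. */
static void my_dev_release(struct my_dev *dev)
{
	if (dev->private_free)
		dev->private_free(dev);
}
```

Checking the allocation result before storing the pointer keeps the destructor from ever being installed for data that does not exist.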
4889 | 4889 | ||
4890 | <section id="misc-devices-IEC958"> | 4890 | <section id="misc-devices-IEC958"> |
4891 | <title>IEC958 (S/PDIF)</title> | 4891 | <title>IEC958 (S/PDIF)</title> |
4892 | <para> | 4892 | <para> |
4893 | Usually the controls for IEC958 devices are implemented via | 4893 | Usually the controls for IEC958 devices are implemented via |
4894 | the control interface. There is a macro to compose a name string for | 4894 | the control interface. There is a macro to compose a name string for |
4895 | IEC958 controls, <function>SNDRV_CTL_NAME_IEC958()</function> | 4895 | IEC958 controls, <function>SNDRV_CTL_NAME_IEC958()</function> |
4896 | defined in <filename><include/asound.h></filename>. | 4896 | defined in <filename><include/asound.h></filename>. |
4897 | </para> | 4897 | </para> |
4898 | 4898 | ||
4899 | <para> | 4899 | <para> |
4900 | There are some standard controls for IEC958 status bits. These | 4900 | There are some standard controls for IEC958 status bits. These |
4901 | controls use the type <type>SNDRV_CTL_ELEM_TYPE_IEC958</type>, | 4901 | controls use the type <type>SNDRV_CTL_ELEM_TYPE_IEC958</type>, |
4902 | and the element size is fixed as a 4-byte array | 4902 | and the element size is fixed as a 4-byte array |
4903 | (value.iec958.status[x]). For <structfield>info</structfield> | 4903 | (value.iec958.status[x]). For <structfield>info</structfield> |
4904 | callback, you don't specify | 4904 | callback, you don't specify |
4905 | the value field for this type (the count field must be set, | 4905 | the value field for this type (the count field must be set, |
4906 | though). | 4906 | though). |
4907 | </para> | 4907 | </para> |
4908 | 4908 | ||
4909 | <para> | 4909 | <para> |
4910 | <quote>IEC958 Playback Con Mask</quote> is used to return the | 4910 | <quote>IEC958 Playback Con Mask</quote> is used to return the |
4911 | bit-mask for the IEC958 status bits of consumer mode. Similarly, | 4911 | bit-mask for the IEC958 status bits of consumer mode. Similarly, |
4912 | <quote>IEC958 Playback Pro Mask</quote> returns the bitmask for | 4912 | <quote>IEC958 Playback Pro Mask</quote> returns the bitmask for |
4913 | professional mode. They are read-only controls, and are defined | 4913 | professional mode. They are read-only controls, and are defined |
4914 | as MIXER controls (iface = | 4914 | as MIXER controls (iface = |
4915 | <constant>SNDRV_CTL_ELEM_IFACE_MIXER</constant>). | 4915 | <constant>SNDRV_CTL_ELEM_IFACE_MIXER</constant>). |
4916 | </para> | 4916 | </para> |
4917 | 4917 | ||
4918 | <para> | 4918 | <para> |
4919 | Meanwhile, <quote>IEC958 Playback Default</quote> control is | 4919 | Meanwhile, <quote>IEC958 Playback Default</quote> control is |
4920 | defined for getting and setting the current default IEC958 | 4920 | defined for getting and setting the current default IEC958 |
4921 | bits. Note that this one is usually defined as a PCM control | 4921 | bits. Note that this one is usually defined as a PCM control |
4922 | (iface = <constant>SNDRV_CTL_ELEM_IFACE_PCM</constant>), | 4922 | (iface = <constant>SNDRV_CTL_ELEM_IFACE_PCM</constant>), |
4923 | although in some places it's defined as a MIXER control. | 4923 | although in some places it's defined as a MIXER control. |
4924 | </para> | 4924 | </para> |
4925 | 4925 | ||
4926 | <para> | 4926 | <para> |
4927 | In addition, you can define the control switches to | 4927 | In addition, you can define the control switches to |
4928 | enable/disable or to set the raw bit mode. The implementation | 4928 | enable/disable or to set the raw bit mode. The implementation |
4929 | will depend on the chip, but the control should be named as | 4929 | will depend on the chip, but the control should be named as |
4930 | <quote>IEC958 xxx</quote>, preferably using | 4930 | <quote>IEC958 xxx</quote>, preferably using |
4931 | <function>SNDRV_CTL_NAME_IEC958()</function> macro. | 4931 | <function>SNDRV_CTL_NAME_IEC958()</function> macro. |
4932 | </para> | 4932 | </para> |
4933 | 4933 | ||
4934 | <para> | 4934 | <para> |
4935 | You can find several cases, for example, | 4935 | You can find several cases, for example, |
4936 | <filename>pci/emu10k1</filename>, | 4936 | <filename>pci/emu10k1</filename>, |
4937 | <filename>pci/ice1712</filename>, or | 4937 | <filename>pci/ice1712</filename>, or |
4938 | <filename>pci/cmipci.c</filename>. | 4938 | <filename>pci/cmipci.c</filename>. |
4939 | </para> | 4939 | </para> |
4940 | </section> | 4940 | </section> |
4941 | 4941 | ||
4942 | </chapter> | 4942 | </chapter> |
4943 | 4943 | ||
4944 | 4944 | ||
4945 | <!-- ****************************************************** --> | 4945 | <!-- ****************************************************** --> |
4946 | <!-- Buffer and Memory Management --> | 4946 | <!-- Buffer and Memory Management --> |
4947 | <!-- ****************************************************** --> | 4947 | <!-- ****************************************************** --> |
4948 | <chapter id="buffer-and-memory"> | 4948 | <chapter id="buffer-and-memory"> |
4949 | <title>Buffer and Memory Management</title> | 4949 | <title>Buffer and Memory Management</title> |
4950 | 4950 | ||
4951 | <section id="buffer-and-memory-buffer-types"> | 4951 | <section id="buffer-and-memory-buffer-types"> |
4952 | <title>Buffer Types</title> | 4952 | <title>Buffer Types</title> |
4953 | <para> | 4953 | <para> |
4954 | ALSA provides several different buffer allocation functions | 4954 | ALSA provides several different buffer allocation functions |
4955 | depending on the bus and the architecture. All these have a | 4955 | depending on the bus and the architecture. All these have a |
4956 | consistent API. The allocation of physically-contiguous pages is | 4956 | consistent API. The allocation of physically-contiguous pages is |
4957 | done via | 4957 | done via |
4958 | <function>snd_malloc_xxx_pages()</function> function, where xxx | 4958 | <function>snd_malloc_xxx_pages()</function> function, where xxx |
4959 | is the bus type. | 4959 | is the bus type. |
4960 | </para> | 4960 | </para> |
4961 | 4961 | ||
4962 | <para> | 4962 | <para> |
4963 | The allocation of pages with fallback is | 4963 | The allocation of pages with fallback is |
4964 | <function>snd_malloc_xxx_pages_fallback()</function>. This | 4964 | <function>snd_malloc_xxx_pages_fallback()</function>. This |
4965 | function tries to allocate the specified pages but if the pages | 4965 | function tries to allocate the specified pages but if the pages |
4966 | are not available, it tries to reduce the page sizes until | 4966 | are not available, it tries to reduce the page sizes until |
4967 | enough space is found. | 4967 | enough space is found. |
4968 | </para> | 4968 | </para> |
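The fallback behaviour described above can be modelled in user space. The following is only a sketch under stated assumptions: the names sketch_alloc_fallback, sketch_alloc_fn and sketch_alloc_max_16k are hypothetical, the page size is assumed to be 4096 bytes, and the request is halved on each failure, which is one plausible reading of "reduce the page sizes" rather than a statement about the actual ALSA implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumed page size for this sketch */

/* Hypothetical allocator hook, so that allocation failures can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t bytes);

/*
 * Try to allocate `wanted` bytes. On failure, halve the request and retry,
 * never going below one page. The size actually obtained is stored in *got
 * (0 if nothing could be allocated).
 */
static void *sketch_alloc_fallback(sketch_alloc_fn alloc, size_t wanted,
                                   size_t *got)
{
    size_t size = wanted < SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : wanted;

    assert(alloc != NULL && got != NULL);
    while (size >= SKETCH_PAGE_SIZE) {
        void *p = alloc(size);
        if (p != NULL) {
            *got = size;
            return p;
        }
        size /= 2; /* assumption: shrink by half on each failure */
    }
    *got = 0;
    return NULL;
}

/* Example hook: pretends that only requests of up to 16 KiB can succeed. */
static void *sketch_alloc_max_16k(size_t bytes)
{
    return bytes <= 16384 ? malloc(bytes) : NULL;
}
```

With the example hook, a 64 KiB request is retried at 32 KiB and then satisfied at 16 KiB, so the caller has to check the returned size instead of assuming it received what it asked for.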
4969 | 4969 | ||
4970 | <para> | 4970 | <para> |
4971 | For releasing the space, call | 4971 | For releasing the space, call |
4972 | <function>snd_free_xxx_pages()</function> function. | 4972 | <function>snd_free_xxx_pages()</function> function. |
4973 | </para> | 4973 | </para> |
4974 | 4974 | ||
4975 | <para> | 4975 | <para> |
4976 | Usually, ALSA drivers try to allocate and reserve | 4976 | Usually, ALSA drivers try to allocate and reserve |
4977 | a large contiguous physical space | 4977 | a large contiguous physical space |
4978 | at the time the module is loaded for later use. | 4978 | at the time the module is loaded for later use. |
4979 | This is called <quote>pre-allocation</quote>. | 4979 | This is called <quote>pre-allocation</quote>. |
4980 | As already written, you can call the following function at the | 4980 | As already written, you can call the following function at the |
4981 | construction of pcm instance (in the case of PCI bus). | 4981 | construction of pcm instance (in the case of PCI bus). |
4982 | 4982 | ||
4983 | <informalexample> | 4983 | <informalexample> |
4984 | <programlisting> | 4984 | <programlisting> |
4985 | <![CDATA[ | 4985 | <![CDATA[ |
4986 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, | 4986 | snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, |
4987 | snd_dma_pci_data(pci), size, max); | 4987 | snd_dma_pci_data(pci), size, max); |
4988 | ]]> | 4988 | ]]> |
4989 | </programlisting> | 4989 | </programlisting> |
4990 | </informalexample> | 4990 | </informalexample> |
4991 | 4991 | ||
4992 | where <parameter>size</parameter> is the byte size to be | 4992 | where <parameter>size</parameter> is the byte size to be |
4993 | pre-allocated and the <parameter>max</parameter> is the maximal | 4993 | pre-allocated and the <parameter>max</parameter> is the maximal |
4994 | size to be changed via <filename>prealloc</filename> proc file. | 4994 | size to be changed via <filename>prealloc</filename> proc file. |
4995 | The allocator will try to get as large an area as possible | 4995 | The allocator will try to get as large an area as possible |
4996 | within the given size. | 4996 | within the given size. |
4997 | </para> | 4997 | </para> |
4998 | 4998 | ||
4999 | <para> | 4999 | <para> |
5000 | The second argument (type) and the third argument (device pointer) | 5000 | The second argument (type) and the third argument (device pointer) |
5001 | are dependent on the bus. | 5001 | are dependent on the bus. |
5002 | In the case of ISA bus, pass <function>snd_dma_isa_data()</function> | 5002 | In the case of ISA bus, pass <function>snd_dma_isa_data()</function> |
5003 | as the third argument with <constant>SNDRV_DMA_TYPE_DEV</constant> type. | 5003 | as the third argument with <constant>SNDRV_DMA_TYPE_DEV</constant> type. |
5004 | A continuous buffer unrelated to the bus can be pre-allocated | 5004 | A continuous buffer unrelated to the bus can be pre-allocated |
5005 | with <constant>SNDRV_DMA_TYPE_CONTINUOUS</constant> type and the | 5005 | with <constant>SNDRV_DMA_TYPE_CONTINUOUS</constant> type and the |
5006 | <function>snd_dma_continuous_data(GFP_KERNEL)</function> device pointer, | 5006 | <function>snd_dma_continuous_data(GFP_KERNEL)</function> device pointer, |
5007 | where <constant>GFP_KERNEL</constant> is the kernel allocation flag to | 5007 | where <constant>GFP_KERNEL</constant> is the kernel allocation flag to |
5008 | use. For the SBUS, <constant>SNDRV_DMA_TYPE_SBUS</constant> and | 5008 | use. For the SBUS, <constant>SNDRV_DMA_TYPE_SBUS</constant> and |
5009 | <function>snd_dma_sbus_data(sbus_dev)</function> are used instead. | 5009 | <function>snd_dma_sbus_data(sbus_dev)</function> are used instead. |
5010 | For the PCI scatter-gather buffers, use | 5010 | For the PCI scatter-gather buffers, use |
5011 | <constant>SNDRV_DMA_TYPE_DEV_SG</constant> with | 5011 | <constant>SNDRV_DMA_TYPE_DEV_SG</constant> with |
5012 | <function>snd_dma_pci_data(pci)</function> | 5012 | <function>snd_dma_pci_data(pci)</function> |
5013 | (see the section | 5013 | (see the section |
5014 | <link linkend="buffer-and-memory-non-contiguous"><citetitle>Non-Contiguous Buffers | 5014 | <link linkend="buffer-and-memory-non-contiguous"><citetitle>Non-Contiguous Buffers |
5015 | </citetitle></link>). | 5015 | </citetitle></link>). |
5016 | </para> | 5016 | </para> |
5017 | 5017 | ||
5018 | <para> | 5018 | <para> |
5019 | Once the buffer is pre-allocated, you can use the | 5019 | Once the buffer is pre-allocated, you can use the |
5020 | allocator in the <structfield>hw_params</structfield> callback | 5020 | allocator in the <structfield>hw_params</structfield> callback |
5021 | 5021 | ||
5022 | <informalexample> | 5022 | <informalexample> |
5023 | <programlisting> | 5023 | <programlisting> |
5024 | <![CDATA[ | 5024 | <![CDATA[ |
5025 | snd_pcm_lib_malloc_pages(substream, size); | 5025 | snd_pcm_lib_malloc_pages(substream, size); |
5026 | ]]> | 5026 | ]]> |
5027 | </programlisting> | 5027 | </programlisting> |
5028 | </informalexample> | 5028 | </informalexample> |
5029 | 5029 | ||
5030 | Note that you have to pre-allocate to use this function. | 5030 | Note that you have to pre-allocate to use this function. |
5031 | </para> | 5031 | </para> |
5032 | </section> | 5032 | </section> |
5033 | 5033 | ||
5034 | <section id="buffer-and-memory-external-hardware"> | 5034 | <section id="buffer-and-memory-external-hardware"> |
5035 | <title>External Hardware Buffers</title> | 5035 | <title>External Hardware Buffers</title> |
5036 | <para> | 5036 | <para> |
5037 | Some chips have their own hardware buffers and the DMA | 5037 | Some chips have their own hardware buffers and the DMA |
5038 | transfer from the host memory is not available. In such a case, | 5038 | transfer from the host memory is not available. In such a case, |
5039 | you need to either 1) copy/set the audio data directly to the | 5039 | you need to either 1) copy/set the audio data directly to the |
5040 | external hardware buffer, or 2) make an intermediate buffer and | 5040 | external hardware buffer, or 2) make an intermediate buffer and |
5041 | copy/set the data from it to the external hardware buffer in | 5041 | copy/set the data from it to the external hardware buffer in |
5042 | interrupts (or in tasklets, preferably). | 5042 | interrupts (or in tasklets, preferably). |
5043 | </para> | 5043 | </para> |
5044 | 5044 | ||
5045 | <para> | 5045 | <para> |
5046 | The first case works fine if the external hardware buffer is large | 5046 | The first case works fine if the external hardware buffer is large |
5047 | enough. This method doesn't need any extra buffers and thus is | 5047 | enough. This method doesn't need any extra buffers and thus is |
5048 | more effective. You need to define the | 5048 | more effective. You need to define the |
5049 | <structfield>copy</structfield> and | 5049 | <structfield>copy</structfield> and |
5050 | <structfield>silence</structfield> callbacks for | 5050 | <structfield>silence</structfield> callbacks for |
5051 | the data transfer. However, there is a drawback: it cannot | 5051 | the data transfer. However, there is a drawback: it cannot |
5052 | be mmapped. The examples are GUS's GF1 PCM or emu8000's | 5052 | be mmapped. The examples are GUS's GF1 PCM or emu8000's |
5053 | wavetable PCM. | 5053 | wavetable PCM. |
5054 | </para> | 5054 | </para> |
5055 | 5055 | ||
5056 | <para> | 5056 | <para> |
5057 | The second case allows the mmap of the buffer, although you have | 5057 | The second case allows the mmap of the buffer, although you have |
5058 | to handle an interrupt or a tasklet for transferring the data | 5058 | to handle an interrupt or a tasklet for transferring the data |
5059 | from the intermediate buffer to the hardware buffer. You can find an | 5059 | from the intermediate buffer to the hardware buffer. You can find an |
5060 | example in vxpocket driver. | 5060 | example in vxpocket driver. |
5061 | </para> | 5061 | </para> |
5062 | 5062 | ||
5063 | <para> | 5063 | <para> |
5064 | Another case is that the chip uses a PCI memory-map | 5064 | Another case is that the chip uses a PCI memory-map |
5065 | region for the buffer instead of the host memory. In this case, | 5065 | region for the buffer instead of the host memory. In this case, |
5066 | mmap is available only on certain architectures like intel. In | 5066 | mmap is available only on certain architectures like intel. In |
5067 | non-mmap mode, the data cannot be transferred in the normal | 5067 | non-mmap mode, the data cannot be transferred in the normal |
5068 | way. Thus you need to define <structfield>copy</structfield> and | 5068 | way. Thus you need to define <structfield>copy</structfield> and |
5069 | <structfield>silence</structfield> callbacks as well | 5069 | <structfield>silence</structfield> callbacks as well |
5070 | as in the cases above. The examples are found in | 5070 | as in the cases above. The examples are found in |
5071 | <filename>rme32.c</filename> and <filename>rme96.c</filename>. | 5071 | <filename>rme32.c</filename> and <filename>rme96.c</filename>. |
5072 | </para> | 5072 | </para> |
5073 | 5073 | ||
5074 | <para> | 5074 | <para> |
5075 | The implementation of <structfield>copy</structfield> and | 5075 | The implementation of <structfield>copy</structfield> and |
5076 | <structfield>silence</structfield> callbacks depends upon | 5076 | <structfield>silence</structfield> callbacks depends upon |
5077 | whether the hardware supports interleaved or non-interleaved | 5077 | whether the hardware supports interleaved or non-interleaved |
5078 | samples. The <structfield>copy</structfield> callback is | 5078 | samples. The <structfield>copy</structfield> callback is |
5079 | defined like below, a bit | 5079 | defined like below, a bit |
5080 | differently depending whether the direction is playback or | 5080 | differently depending whether the direction is playback or |
5081 | capture: | 5081 | capture: |
5082 | 5082 | ||
5083 | <informalexample> | 5083 | <informalexample> |
5084 | <programlisting> | 5084 | <programlisting> |
5085 | <![CDATA[ | 5085 | <![CDATA[ |
5086 | static int playback_copy(struct snd_pcm_substream *substream, int channel, | 5086 | static int playback_copy(struct snd_pcm_substream *substream, int channel, |
5087 | snd_pcm_uframes_t pos, void *src, snd_pcm_uframes_t count); | 5087 | snd_pcm_uframes_t pos, void *src, snd_pcm_uframes_t count); |
5088 | static int capture_copy(struct snd_pcm_substream *substream, int channel, | 5088 | static int capture_copy(struct snd_pcm_substream *substream, int channel, |
5089 | snd_pcm_uframes_t pos, void *dst, snd_pcm_uframes_t count); | 5089 | snd_pcm_uframes_t pos, void *dst, snd_pcm_uframes_t count); |
5090 | ]]> | 5090 | ]]> |
5091 | </programlisting> | 5091 | </programlisting> |
5092 | </informalexample> | 5092 | </informalexample> |
5093 | </para> | 5093 | </para> |
5094 | 5094 | ||
5095 | <para> | 5095 | <para> |
5096 | In the case of interleaved samples, the second argument | 5096 | In the case of interleaved samples, the second argument |
5097 | (<parameter>channel</parameter>) is not used. The third argument | 5097 | (<parameter>channel</parameter>) is not used. The third argument |
5098 | (<parameter>pos</parameter>) gives the | 5098 | (<parameter>pos</parameter>) gives the |
5099 | current position offset in frames. | 5099 | current position offset in frames. |
5100 | </para> | 5100 | </para> |
5101 | 5101 | ||
5102 | <para> | 5102 | <para> |
5103 | The meaning of the fourth argument is different between | 5103 | The meaning of the fourth argument is different between |
5104 | playback and capture. For playback, it holds the source data | 5104 | playback and capture. For playback, it holds the source data |
5105 | pointer, and for capture, it's the destination data pointer. | 5105 | pointer, and for capture, it's the destination data pointer. |
5106 | </para> | 5106 | </para> |
5107 | 5107 | ||
5108 | <para> | 5108 | <para> |
5109 | The last argument is the number of frames to be copied. | 5109 | The last argument is the number of frames to be copied. |
5110 | </para> | 5110 | </para> |
5111 | 5111 | ||
5112 | <para> | 5112 | <para> |
5113 | What you have to do in this callback is again different | 5113 | What you have to do in this callback is again different |
5114 | between playback and capture directions. In the case of | 5114 | between playback and capture directions. In the case of |
5115 | playback, you do: copy the given amount of data | 5115 | playback, you do: copy the given amount of data |
5116 | (<parameter>count</parameter>) at the specified pointer | 5116 | (<parameter>count</parameter>) at the specified pointer |
5117 | (<parameter>src</parameter>) to the specified offset | 5117 | (<parameter>src</parameter>) to the specified offset |
5118 | (<parameter>pos</parameter>) on the hardware buffer. When | 5118 | (<parameter>pos</parameter>) on the hardware buffer. When |
5119 | coded in a memcpy-like way, the copy would look like: | 5119 | coded in a memcpy-like way, the copy would look like: |
5120 | 5120 | ||
5121 | <informalexample> | 5121 | <informalexample> |
5122 | <programlisting> | 5122 | <programlisting> |
5123 | <![CDATA[ | 5123 | <![CDATA[ |
5124 | my_memcpy(my_buffer + frames_to_bytes(runtime, pos), src, | 5124 | my_memcpy(my_buffer + frames_to_bytes(runtime, pos), src, |
5125 | frames_to_bytes(runtime, count)); | 5125 | frames_to_bytes(runtime, count)); |
5126 | ]]> | 5126 | ]]> |
5127 | </programlisting> | 5127 | </programlisting> |
5128 | </informalexample> | 5128 | </informalexample> |
5129 | </para> | 5129 | </para> |
5130 | 5130 | ||
5131 | <para> | 5131 | <para> |
5132 | For the capture direction, you do: copy the given amount of | 5132 | For the capture direction, you do: copy the given amount of |
5133 | data (<parameter>count</parameter>) at the specified offset | 5133 | data (<parameter>count</parameter>) at the specified offset |
5134 | (<parameter>pos</parameter>) on the hardware buffer to the | 5134 | (<parameter>pos</parameter>) on the hardware buffer to the |
5135 | specified pointer (<parameter>dst</parameter>). | 5135 | specified pointer (<parameter>dst</parameter>). |
5136 | 5136 | ||
5137 | <informalexample> | 5137 | <informalexample> |
5138 | <programlisting> | 5138 | <programlisting> |
5139 | <![CDATA[ | 5139 | <![CDATA[ |
5140 | my_memcpy(dst, my_buffer + frames_to_bytes(runtime, pos), | 5140 | my_memcpy(dst, my_buffer + frames_to_bytes(runtime, pos), |
5141 | frames_to_bytes(runtime, count)); | 5141 | frames_to_bytes(runtime, count)); |
5142 | ]]> | 5142 | ]]> |
5143 | </programlisting> | 5143 | </programlisting> |
5144 | </informalexample> | 5144 | </informalexample> |
5145 | 5145 | ||
5146 | Note that both the position and the data amount are given | 5146 | Note that both the position and the data amount are given |
5147 | in frames. | 5147 | in frames. |
5148 | </para> | 5148 | </para> |
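The two interleaved copy directions can be tried out in user space. This is a minimal sketch, not driver code: struct sketch_runtime and the sketch_* functions are hypothetical stand-ins for the PCM runtime, frames_to_bytes() and my_memcpy(), and a plain byte array plays the role of the hardware buffer. A real callback would use whatever transfer primitive the hardware and the caller's address space require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the runtime fields a frame/byte conversion needs. */
struct sketch_runtime {
    unsigned int channels;         /* samples per frame */
    unsigned int bytes_per_sample; /* e.g. 2 for 16-bit samples */
};

/* One interleaved frame holds one sample for every channel. */
static size_t sketch_frames_to_bytes(const struct sketch_runtime *rt,
                                     size_t frames)
{
    return frames * rt->channels * rt->bytes_per_sample;
}

/* Playback: copy `count` frames from src to frame offset `pos` of hw_buf. */
static void sketch_playback_copy(const struct sketch_runtime *rt,
                                 unsigned char *hw_buf, size_t pos,
                                 const void *src, size_t count)
{
    memcpy(hw_buf + sketch_frames_to_bytes(rt, pos), src,
           sketch_frames_to_bytes(rt, count));
}

/* Capture: copy `count` frames from frame offset `pos` of hw_buf to dst. */
static void sketch_capture_copy(const struct sketch_runtime *rt,
                                const unsigned char *hw_buf, size_t pos,
                                void *dst, size_t count)
{
    memcpy(dst, hw_buf + sketch_frames_to_bytes(rt, pos),
           sketch_frames_to_bytes(rt, count));
}
```

For 16-bit stereo, a frame is 4 bytes, so pos = 2 and count = 2 touch bytes 8 to 15 of the buffer. Forgetting the frame-to-byte conversion on either argument is the typical mistake in these callbacks.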
5149 | 5149 | ||
5150 | <para> | 5150 | <para> |
5151 | In the case of non-interleaved samples, the implementation | 5151 | In the case of non-interleaved samples, the implementation |
5152 | will be a bit more complicated. | 5152 | will be a bit more complicated. |
5153 | </para> | 5153 | </para> |
5154 | 5154 | ||
5155 | <para> | 5155 | <para> |
5156 | You need to check the channel argument, and if it's -1, copy | 5156 | You need to check the channel argument, and if it's -1, copy |
5157 | all the channels. Otherwise, you have to copy only the | 5157 | all the channels. Otherwise, you have to copy only the |
5158 | specified channel. Please check | 5158 | specified channel. Please check |
5159 | <filename>isa/gus/gus_pcm.c</filename> as an example. | 5159 | <filename>isa/gus/gus_pcm.c</filename> as an example. |
5160 | </para> | 5160 | </para> |
5161 | 5161 | ||
5162 | <para> | 5162 | <para> |
5163 | The <structfield>silence</structfield> callback is also | 5163 | The <structfield>silence</structfield> callback is also |
5164 | implemented in a similar way. | 5164 | implemented in a similar way. |
5165 | 5165 | ||
5166 | <informalexample> | 5166 | <informalexample> |
5167 | <programlisting> | 5167 | <programlisting> |
5168 | <![CDATA[ | 5168 | <![CDATA[ |
5169 | static int silence(struct snd_pcm_substream *substream, int channel, | 5169 | static int silence(struct snd_pcm_substream *substream, int channel, |
5170 | snd_pcm_uframes_t pos, snd_pcm_uframes_t count); | 5170 | snd_pcm_uframes_t pos, snd_pcm_uframes_t count); |
5171 | ]]> | 5171 | ]]> |
5172 | </programlisting> | 5172 | </programlisting> |
5173 | </informalexample> | 5173 | </informalexample> |
5174 | </para> | 5174 | </para> |
5175 | 5175 | ||
5176 | <para> | 5176 | <para> |
5177 | The meanings of arguments are identical with the | 5177 | The meanings of arguments are identical with the |
5178 | <structfield>copy</structfield> | 5178 | <structfield>copy</structfield> |
5179 | callback, although there is no <parameter>src/dst</parameter> | 5179 | callback, although there is no <parameter>src/dst</parameter> |
5180 | argument. In the case of interleaved samples, the channel | 5180 | argument. In the case of interleaved samples, the channel |
5181 | argument has no meaning, as well as on | 5181 | argument has no meaning, as well as on |
5182 | <structfield>copy</structfield> callback. | 5182 | <structfield>copy</structfield> callback. |
5183 | </para> | 5183 | </para> |
5184 | 5184 | ||
5185 | <para> | 5185 | <para> |
5186 | The role of <structfield>silence</structfield> callback is to | 5186 | The role of <structfield>silence</structfield> callback is to |
5187 | set the given amount | 5187 | set the given amount |
5188 | (<parameter>count</parameter>) of silence data at the | 5188 | (<parameter>count</parameter>) of silence data at the |
5189 | specified offset (<parameter>pos</parameter>) on the hardware | 5189 | specified offset (<parameter>pos</parameter>) on the hardware |
5190 | buffer. Suppose that the data format is signed (that is, the | 5190 | buffer. Suppose that the data format is signed (that is, the |
5191 | silent-data is 0), and the implementation using a memset-like | 5191 | silent-data is 0), and the implementation using a memset-like |
5192 | function would be like: | 5192 | function would be like: |
5193 | 5193 | ||
5194 | <informalexample> | 5194 | <informalexample> |
5195 | <programlisting> | 5195 | <programlisting> |
5196 | <![CDATA[ | 5196 | <![CDATA[ |
5197 | my_memset(my_buffer + frames_to_bytes(runtime, pos), 0, | 5197 | my_memset(my_buffer + frames_to_bytes(runtime, pos), 0, |
5198 | frames_to_bytes(runtime, count)); | 5198 | frames_to_bytes(runtime, count)); |
5199 | ]]> | 5199 | ]]> |
5200 | </programlisting> | 5200 | </programlisting> |
5201 | </informalexample> | 5201 | </informalexample> |
5202 | </para> | 5202 | </para> |
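The silence operation for interleaved samples can be sketched in user space as well. The names struct sketch_fmt and sketch_silence are hypothetical, and a byte array again stands in for the hardware buffer. The sketch also covers unsigned 8-bit data, where the silent value is 0x80 rather than 0; wider unsigned formats are left out because they cannot be filled with a single memset() byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the sample format used by this sketch. */
struct sketch_fmt {
    unsigned int channels;         /* samples per frame */
    unsigned int bytes_per_sample; /* 1 for 8-bit, 2 for 16-bit, ... */
    int is_unsigned_8bit;          /* non-zero: silence is 0x80, not 0 */
};

/* Fill `count` frames starting at frame offset `pos` with silence. */
static void sketch_silence(const struct sketch_fmt *fmt,
                           unsigned char *hw_buf, size_t pos, size_t count)
{
    size_t frame_bytes = (size_t)fmt->channels * fmt->bytes_per_sample;
    int fill = fmt->is_unsigned_8bit ? 0x80 : 0x00;

    memset(hw_buf + pos * frame_bytes, fill, count * frame_bytes);
}
```

As with the copy callbacks, both pos and count are in frames and have to be converted to bytes before touching the buffer.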
5203 | 5203 | ||
5204 | <para> | 5204 | <para> |
5205 | In the case of non-interleaved samples, again, the | 5205 | In the case of non-interleaved samples, again, the |
5206 | implementation becomes a bit more complicated. See, for example, | 5206 | implementation becomes a bit more complicated. See, for example, |
5207 | <filename>isa/gus/gus_pcm.c</filename>. | 5207 | <filename>isa/gus/gus_pcm.c</filename>. |
5208 | </para> | 5208 | </para> |
5209 | </section> | 5209 | </section> |
5210 | 5210 | ||
5211 | <section id="buffer-and-memory-non-contiguous"> | 5211 | <section id="buffer-and-memory-non-contiguous"> |
5212 | <title>Non-Contiguous Buffers</title> | 5212 | <title>Non-Contiguous Buffers</title> |
5213 | <para> | 5213 | <para> |
5214 | If your hardware supports the page table like emu10k1 or the | 5214 | If your hardware supports the page table like emu10k1 or the |
5215 | buffer descriptors like via82xx, you can use the scatter-gather | 5215 | buffer descriptors like via82xx, you can use the scatter-gather |
5216 | (SG) DMA. ALSA provides an interface for handling SG-buffers. | 5216 | (SG) DMA. ALSA provides an interface for handling SG-buffers. |
5217 | The API is provided in <filename><sound/pcm.h></filename>. | 5217 | The API is provided in <filename><sound/pcm.h></filename>. |
5218 | </para> | 5218 | </para> |
5219 | 5219 | ||
5220 | <para> | 5220 | <para> |
5221 | For creating the SG-buffer handler, call | 5221 | For creating the SG-buffer handler, call |
5222 | <function>snd_pcm_lib_preallocate_pages()</function> or | 5222 | <function>snd_pcm_lib_preallocate_pages()</function> or |
5223 | <function>snd_pcm_lib_preallocate_pages_for_all()</function> | 5223 | <function>snd_pcm_lib_preallocate_pages_for_all()</function> |
5224 | with <constant>SNDRV_DMA_TYPE_DEV_SG</constant> | 5224 | with <constant>SNDRV_DMA_TYPE_DEV_SG</constant> |
5225 | in the PCM constructor like other PCI pre-allocator. | 5225 | in the PCM constructor like other PCI pre-allocator. |
5226 | You need to pass the <function>snd_dma_pci_data(pci)</function>, | 5226 | You need to pass the <function>snd_dma_pci_data(pci)</function>, |
5227 | where pci is the struct <structname>pci_dev</structname> pointer | 5227 | where pci is the struct <structname>pci_dev</structname> pointer |
5228 | of the chip as well. | 5228 | of the chip as well. |
5229 | The <type>struct snd_sg_buf</type> instance is created as | 5229 | The <type>struct snd_sg_buf</type> instance is created as |
5230 | substream->dma_private. You can cast | 5230 | substream->dma_private. You can cast |
5231 | the pointer like: | 5231 | the pointer like: |
5232 | 5232 | ||
5233 | <informalexample> | 5233 | <informalexample> |
5234 | <programlisting> | 5234 | <programlisting> |
5235 | <![CDATA[ | 5235 | <![CDATA[ |
5236 | struct snd_sg_buf *sgbuf = (struct snd_sg_buf *)substream->dma_private; | 5236 | struct snd_sg_buf *sgbuf = (struct snd_sg_buf *)substream->dma_private; |
5237 | ]]> | 5237 | ]]> |
5238 | </programlisting> | 5238 | </programlisting> |
5239 | </informalexample> | 5239 | </informalexample> |
5240 | </para> | 5240 | </para> |
5241 | 5241 | ||
5242 | <para> | 5242 | <para> |
5243 | Then call <function>snd_pcm_lib_malloc_pages()</function> | 5243 | Then call <function>snd_pcm_lib_malloc_pages()</function> |
5244 | in <structfield>hw_params</structfield> callback | 5244 | in <structfield>hw_params</structfield> callback |
5245 | as well as in the case of normal PCI buffer. | 5245 | as well as in the case of normal PCI buffer. |
5246 | The SG-buffer handler will allocate the non-contiguous kernel | 5246 | The SG-buffer handler will allocate the non-contiguous kernel |
5247 | pages of the given size and map them onto the virtually contiguous | 5247 | pages of the given size and map them onto the virtually contiguous |
5248 | memory. The virtual pointer is addressed in runtime->dma_area. | 5248 | memory. The virtual pointer is addressed in runtime->dma_area. |
5249 | The physical address (runtime->dma_addr) is set to zero, | 5249 | The physical address (runtime->dma_addr) is set to zero, |
5250 | because the buffer is physically non-contiguous. | 5250 | because the buffer is physically non-contiguous. |
5251 | The physical address table is set up in sgbuf->table. | 5251 | The physical address table is set up in sgbuf->table. |
5252 | You can get the physical address at a certain offset via | 5252 | You can get the physical address at a certain offset via |
5253 | <function>snd_pcm_sgbuf_get_addr()</function>. | 5253 | <function>snd_pcm_sgbuf_get_addr()</function>. |
5254 | </para> | 5254 | </para> |
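The idea behind looking up a physical address in an SG-buffer can be illustrated with a small user-space model. This is only a sketch: sketch_sg_addr is a hypothetical name, the table is reduced to one address per page, and a 4096-byte page is assumed. It is not the definition of snd_pcm_sgbuf_get_addr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumed: 4096-byte pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Return the (simulated) physical address of byte `offset` in a buffer
 * whose pages are scattered. table[i] holds the address of the i-th page.
 */
static uint64_t sketch_sg_addr(const uint64_t *table, size_t pages,
                               size_t offset)
{
    size_t page = offset >> SKETCH_PAGE_SHIFT;

    assert(page < pages);
    /* page base address plus the offset within that page */
    return table[page] + (offset & (SKETCH_PAGE_SIZE - 1));
}
```

Because consecutive table entries need not be adjacent, an address computed for one offset says nothing about the next page; each page boundary requires a fresh lookup.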
5255 | 5255 | ||
5256 | <para> | 5256 | <para> |
5257 | When a SG-handler is used, you need to set | 5257 | When a SG-handler is used, you need to set |
5258 | <function>snd_pcm_sgbuf_ops_page</function> as | 5258 | <function>snd_pcm_sgbuf_ops_page</function> as |
5259 | the <structfield>page</structfield> callback. | 5259 | the <structfield>page</structfield> callback. |
5260 | (See <link linkend="pcm-interface-operators-page-callback"> | 5260 | (See <link linkend="pcm-interface-operators-page-callback"> |
5261 | <citetitle>page callback section</citetitle></link>.) | 5261 | <citetitle>page callback section</citetitle></link>.) |
5262 | </para> | 5262 | </para> |
5263 | 5263 | ||
5264 | <para> | 5264 | <para> |
5265 | For releasing the data, call | 5265 | For releasing the data, call |
5266 | <function>snd_pcm_lib_free_pages()</function> in the | 5266 | <function>snd_pcm_lib_free_pages()</function> in the |
5267 | <structfield>hw_free</structfield> callback as usual. | 5267 | <structfield>hw_free</structfield> callback as usual. |
5268 | </para> | 5268 | </para> |
5269 | </section> | 5269 | </section> |
5270 | 5270 | ||
5271 | <section id="buffer-and-memory-vmalloced"> | 5271 | <section id="buffer-and-memory-vmalloced"> |
5272 | <title>Vmalloc'ed Buffers</title> | 5272 | <title>Vmalloc'ed Buffers</title> |
5273 | <para> | 5273 | <para> |
5274 | It's possible to use a buffer allocated via | 5274 | It's possible to use a buffer allocated via |
5275 | <function>vmalloc</function>, for example, for an intermediate | 5275 | <function>vmalloc</function>, for example, for an intermediate |
5276 | buffer. Since the allocated pages are not contiguous, you need | 5276 | buffer. Since the allocated pages are not contiguous, you need |
5277 | to set the <structfield>page</structfield> callback to obtain | 5277 | to set the <structfield>page</structfield> callback to obtain |
5278 | the physical address at every offset. | 5278 | the physical address at every offset. |
5279 | </para> | 5279 | </para> |
5280 | 5280 | ||
5281 | <para> | 5281 | <para> |
5282 | The implementation of <structfield>page</structfield> callback | 5282 | The implementation of <structfield>page</structfield> callback |
5283 | would be like this: | 5283 | would be like this: |
5284 | 5284 | ||
5285 | <informalexample> | 5285 | <informalexample> |
5286 | <programlisting> | 5286 | <programlisting> |
5287 | <![CDATA[ | 5287 | <![CDATA[ |
5288 | #include <linux/vmalloc.h> | 5288 | #include <linux/vmalloc.h> |
5289 | 5289 | ||
5290 | /* get the physical page pointer on the given offset */ | 5290 | /* get the physical page pointer on the given offset */ |
5291 | static struct page *mychip_page(struct snd_pcm_substream *substream, | 5291 | static struct page *mychip_page(struct snd_pcm_substream *substream, |
5292 | unsigned long offset) | 5292 | unsigned long offset) |
5293 | { | 5293 | { |
5294 | void *pageptr = substream->runtime->dma_area + offset; | 5294 | void *pageptr = substream->runtime->dma_area + offset; |
5295 | return vmalloc_to_page(pageptr); | 5295 | return vmalloc_to_page(pageptr); |
5296 | } | 5296 | } |
5297 | ]]> | 5297 | ]]> |
5298 | </programlisting> | 5298 | </programlisting> |
5299 | </informalexample> | 5299 | </informalexample> |
5300 | </para> | 5300 | </para> |
5301 | </section> | 5301 | </section> |
5302 | 5302 | ||
5303 | </chapter> | 5303 | </chapter> |
5304 | 5304 | ||
5305 | 5305 | ||
5306 | <!-- ****************************************************** --> | 5306 | <!-- ****************************************************** --> |
5307 | <!-- Proc Interface --> | 5307 | <!-- Proc Interface --> |
5308 | <!-- ****************************************************** --> | 5308 | <!-- ****************************************************** --> |
5309 | <chapter id="proc-interface"> | 5309 | <chapter id="proc-interface"> |
5310 | <title>Proc Interface</title> | 5310 | <title>Proc Interface</title> |
5311 | <para> | 5311 | <para> |
5312 | ALSA provides an easy interface for procfs. The proc files are | 5312 | ALSA provides an easy interface for procfs. The proc files are |
5313 | very useful for debugging. I recommend you set up proc files if | 5313 | very useful for debugging. I recommend you set up proc files if |
5314 | you write a driver and want to get a running status or register | 5314 | you write a driver and want to get a running status or register |
5315 | dumps. The API is found in | 5315 | dumps. The API is found in |
5316 | <filename><sound/info.h></filename>. | 5316 | <filename><sound/info.h></filename>. |
5317 | </para> | 5317 | </para> |
5318 | 5318 | ||
5319 | <para> | 5319 | <para> |
5320 | For creating a proc file, call | 5320 | For creating a proc file, call |
5321 | <function>snd_card_proc_new()</function>. | 5321 | <function>snd_card_proc_new()</function>. |
5322 | 5322 | ||
5323 | <informalexample> | 5323 | <informalexample> |
5324 | <programlisting> | 5324 | <programlisting> |
5325 | <![CDATA[ | 5325 | <![CDATA[ |
5326 | struct snd_info_entry *entry; | 5326 | struct snd_info_entry *entry; |
5327 | int err = snd_card_proc_new(card, "my-file", &entry); | 5327 | int err = snd_card_proc_new(card, "my-file", &entry); |
5328 | ]]> | 5328 | ]]> |
5329 | </programlisting> | 5329 | </programlisting> |
5330 | </informalexample> | 5330 | </informalexample> |
5331 | 5331 | ||
5332 | where the second argument specifies the proc-file name to be | 5332 | where the second argument specifies the proc-file name to be |
5333 | created. The above example will create a file | 5333 | created. The above example will create a file |
5334 | <filename>my-file</filename> under the card directory, | 5334 | <filename>my-file</filename> under the card directory, |
5335 | e.g. <filename>/proc/asound/card0/my-file</filename>. | 5335 | e.g. <filename>/proc/asound/card0/my-file</filename>. |
5336 | </para> | 5336 | </para> |
5337 | 5337 | ||
5338 | <para> | 5338 | <para> |
5339 | Like other components, the proc entry created via | 5339 | Like other components, the proc entry created via |
5340 | <function>snd_card_proc_new()</function> will be registered and | 5340 | <function>snd_card_proc_new()</function> will be registered and |
5341 | released automatically in the card registration and release | 5341 | released automatically in the card registration and release |
5342 | functions. | 5342 | functions. |
5343 | </para> | 5343 | </para> |
5344 | 5344 | ||
5345 | <para> | 5345 | <para> |
5346 | When the creation is successful, the function stores a new | 5346 | When the creation is successful, the function stores a new |
5347 | instance at the pointer given in the third argument. | 5347 | instance at the pointer given in the third argument. |
5348 | It is initialized as a read-only text proc file. To use | 5348 | It is initialized as a read-only text proc file. To use |
5349 | this proc file as a read-only text file as it is, set the read | 5349 | this proc file as a read-only text file as it is, set the read |
5350 | callback with private data via | 5350 | callback with private data via |
5351 | <function>snd_info_set_text_ops()</function>. | 5351 | <function>snd_info_set_text_ops()</function>. |
5352 | 5352 | ||
5353 | <informalexample> | 5353 | <informalexample> |
5354 | <programlisting> | 5354 | <programlisting> |
5355 | <![CDATA[ | 5355 | <![CDATA[ |
5356 | snd_info_set_text_ops(entry, chip, my_proc_read); | 5356 | snd_info_set_text_ops(entry, chip, my_proc_read); |
5357 | ]]> | 5357 | ]]> |
5358 | </programlisting> | 5358 | </programlisting> |
5359 | </informalexample> | 5359 | </informalexample> |
5360 | 5360 | ||
5361 | where the second argument (<parameter>chip</parameter>) is the | 5361 | where the second argument (<parameter>chip</parameter>) is the |
5362 | private data to be used in the callbacks. The third argument | 5362 | private data to be used in the callbacks. The third argument |
5363 | (<parameter>my_proc_read</parameter>) is the read callback | 5363 | (<parameter>my_proc_read</parameter>) is the read callback |
5364 | function, which | 5364 | function, which |
5365 | is defined like | 5365 | is defined like |
5366 | 5366 | ||
5367 | <informalexample> | 5367 | <informalexample> |
5368 | <programlisting> | 5368 | <programlisting> |
5369 | <![CDATA[ | 5369 | <![CDATA[ |
5370 | static void my_proc_read(struct snd_info_entry *entry, | 5370 | static void my_proc_read(struct snd_info_entry *entry, |
5371 | struct snd_info_buffer *buffer); | 5371 | struct snd_info_buffer *buffer); |
5372 | ]]> | 5372 | ]]> |
5373 | </programlisting> | 5373 | </programlisting> |
5374 | </informalexample> | 5374 | </informalexample> |
5375 | 5375 | ||
5376 | </para> | 5376 | </para> |
5377 | 5377 | ||
5378 | <para> | 5378 | <para> |
5379 | In the read callback, use <function>snd_iprintf()</function> for | 5379 | In the read callback, use <function>snd_iprintf()</function> for |
5380 | output strings, which works just like normal | 5380 | output strings, which works just like normal |
5381 | <function>printf()</function>. For example, | 5381 | <function>printf()</function>. For example, |
5382 | 5382 | ||
5383 | <informalexample> | 5383 | <informalexample> |
5384 | <programlisting> | 5384 | <programlisting> |
5385 | <![CDATA[ | 5385 | <![CDATA[ |
5386 | static void my_proc_read(struct snd_info_entry *entry, | 5386 | static void my_proc_read(struct snd_info_entry *entry, |
5387 | struct snd_info_buffer *buffer) | 5387 | struct snd_info_buffer *buffer) |
5388 | { | 5388 | { |
5389 | struct my_chip *chip = entry->private_data; | 5389 | struct my_chip *chip = entry->private_data; |
5390 | 5390 | ||
5391 | snd_iprintf(buffer, "This is my chip!\n"); | 5391 | snd_iprintf(buffer, "This is my chip!\n"); |
5392 | snd_iprintf(buffer, "Port = %ld\n", chip->port); | 5392 | snd_iprintf(buffer, "Port = %ld\n", chip->port); |
5393 | } | 5393 | } |
5394 | ]]> | 5394 | ]]> |
5395 | </programlisting> | 5395 | </programlisting> |
5396 | </informalexample> | 5396 | </informalexample> |
5397 | </para> | 5397 | </para> |
5398 | 5398 | ||
5399 | <para> | 5399 | <para> |
5400 | The file permission can be changed afterwards. As default, it's | 5400 | The file permission can be changed afterwards. As default, it's |
5401 | set as read only for all users. If you want to add the write | 5401 | set as read only for all users. If you want to add the write |
5402 | permission to the user (root as default), set like below: | 5402 | permission to the user (root as default), set like below: |
5403 | 5403 | ||
5404 | <informalexample> | 5404 | <informalexample> |
5405 | <programlisting> | 5405 | <programlisting> |
5406 | <![CDATA[ | 5406 | <![CDATA[ |
5407 | entry->mode = S_IFREG | S_IRUGO | S_IWUSR; | 5407 | entry->mode = S_IFREG | S_IRUGO | S_IWUSR; |
5408 | ]]> | 5408 | ]]> |
5409 | </programlisting> | 5409 | </programlisting> |
5410 | </informalexample> | 5410 | </informalexample> |
5411 | 5411 | ||
5412 | and set the write callback | 5412 | and set the write callback |
5413 | 5413 | ||
5414 | <informalexample> | 5414 | <informalexample> |
5415 | <programlisting> | 5415 | <programlisting> |
5416 | <![CDATA[ | 5416 | <![CDATA[ |
5417 | entry->c.text.write = my_proc_write; | 5417 | entry->c.text.write = my_proc_write; |
5418 | ]]> | 5418 | ]]> |
5419 | </programlisting> | 5419 | </programlisting> |
5420 | </informalexample> | 5420 | </informalexample> |
5421 | </para> | 5421 | </para> |
5422 | 5422 | ||
5423 | <para> | 5423 | <para> |
5424 | For the write callback, you can use | 5424 | For the write callback, you can use |
5425 | <function>snd_info_get_line()</function> to get a text line, and | 5425 | <function>snd_info_get_line()</function> to get a text line, and |
5426 | <function>snd_info_get_str()</function> to retrieve a string from | 5426 | <function>snd_info_get_str()</function> to retrieve a string from |
5427 | the line. Some examples are found in | 5427 | the line. Some examples are found in |
5428 | <filename>core/oss/mixer_oss.c</filename> and | 5428 | <filename>core/oss/mixer_oss.c</filename> and |
5429 | <filename>core/oss/pcm_oss.c</filename>. | 5429 | <filename>core/oss/pcm_oss.c</filename>. |
5430 | </para> | 5430 | </para> |
5431 | 5431 | ||
5432 | <para> | 5432 | <para> |
5433 | For a raw-data proc-file, set the attributes like the following: | 5433 | For a raw-data proc-file, set the attributes like the following: |
5434 | 5434 | ||
5435 | <informalexample> | 5435 | <informalexample> |
5436 | <programlisting> | 5436 | <programlisting> |
5437 | <![CDATA[ | 5437 | <![CDATA[ |
5438 | static struct snd_info_entry_ops my_file_io_ops = { | 5438 | static struct snd_info_entry_ops my_file_io_ops = { |
5439 | .read = my_file_io_read, | 5439 | .read = my_file_io_read, |
5440 | }; | 5440 | }; |
5441 | 5441 | ||
5442 | entry->content = SNDRV_INFO_CONTENT_DATA; | 5442 | entry->content = SNDRV_INFO_CONTENT_DATA; |
5443 | entry->private_data = chip; | 5443 | entry->private_data = chip; |
5444 | entry->c.ops = &my_file_io_ops; | 5444 | entry->c.ops = &my_file_io_ops; |
5445 | entry->size = 4096; | 5445 | entry->size = 4096; |
5446 | entry->mode = S_IFREG | S_IRUGO; | 5446 | entry->mode = S_IFREG | S_IRUGO; |
5447 | ]]> | 5447 | ]]> |
5448 | </programlisting> | 5448 | </programlisting> |
5449 | </informalexample> | 5449 | </informalexample> |
5450 | </para> | 5450 | </para> |
5451 | 5451 | ||
5452 | <para> | 5452 | <para> |
5453 | The callback is much more complicated than the text-file | 5453 | The callback is much more complicated than the text-file |
5454 | version. You need to use low-level I/O functions such as | 5454 | version. You need to use low-level I/O functions such as |
5455 | <function>copy_from/to_user()</function> to transfer the | 5455 | <function>copy_from/to_user()</function> to transfer the |
5456 | data. | 5456 | data. |
5457 | 5457 | ||
5458 | <informalexample> | 5458 | <informalexample> |
5459 | <programlisting> | 5459 | <programlisting> |
5460 | <![CDATA[ | 5460 | <![CDATA[ |
5461 | static long my_file_io_read(struct snd_info_entry *entry, | 5461 | static long my_file_io_read(struct snd_info_entry *entry, |
5462 | void *file_private_data, | 5462 | void *file_private_data, |
5463 | struct file *file, | 5463 | struct file *file, |
5464 | char *buf, | 5464 | char *buf, |
5465 | unsigned long count, | 5465 | unsigned long count, |
5466 | unsigned long pos) | 5466 | unsigned long pos) |
5467 | { | 5467 | { |
5468 | long size = count; | 5468 | long size = count; |
5469 | if (pos + size > local_max_size) | 5469 | if (pos + size > local_max_size) |
5470 | size = local_max_size - pos; | 5470 | size = local_max_size - pos; |
5471 | if (copy_to_user(buf, local_data + pos, size)) | 5471 | if (copy_to_user(buf, local_data + pos, size)) |
5472 | return -EFAULT; | 5472 | return -EFAULT; |
5473 | return size; | 5473 | return size; |
5474 | } | 5474 | } |
5475 | ]]> | 5475 | ]]> |
5476 | </programlisting> | 5476 | </programlisting> |
5477 | </informalexample> | 5477 | </informalexample> |
5478 | </para> | 5478 | </para> |
5479 | 5479 | ||
5480 | </chapter> | 5480 | </chapter> |
5481 | 5481 | ||
5482 | 5482 | ||
5483 | <!-- ****************************************************** --> | 5483 | <!-- ****************************************************** --> |
5484 | <!-- Power Management --> | 5484 | <!-- Power Management --> |
5485 | <!-- ****************************************************** --> | 5485 | <!-- ****************************************************** --> |
5486 | <chapter id="power-management"> | 5486 | <chapter id="power-management"> |
5487 | <title>Power Management</title> | 5487 | <title>Power Management</title> |
5488 | <para> | 5488 | <para> |
5489 | If the chip is supposed to work with with suspend/resume | 5489 | If the chip is supposed to work with suspend/resume |
5490 | functions, you need to add the power-management codes to the | 5490 | functions, you need to add the power-management codes to the |
5491 | driver. The additional codes for the power-management should be | 5491 | driver. The additional codes for the power-management should be |
5492 | <function>ifdef</function>'ed with | 5492 | <function>ifdef</function>'ed with |
5493 | <constant>CONFIG_PM</constant>. | 5493 | <constant>CONFIG_PM</constant>. |
5494 | </para> | 5494 | </para> |
5495 | 5495 | ||
5496 | <para> | 5496 | <para> |
5497 | If the driver supports the suspend/resume | 5497 | If the driver supports the suspend/resume |
5498 | <emphasis>fully</emphasis>, that is, the device can be | 5498 | <emphasis>fully</emphasis>, that is, the device can be |
5499 | properly resumed to the state it was in when suspend was called, | 5499 | properly resumed to the state it was in when suspend was called, |
5500 | you can set the <constant>SNDRV_PCM_INFO_RESUME</constant> flag | 5500 | you can set the <constant>SNDRV_PCM_INFO_RESUME</constant> flag |
5501 | in the pcm info field. Usually, this is possible when the | 5501 | in the pcm info field. Usually, this is possible when the |
5502 | registers of the chip can be safely saved to and restored from | 5502 | registers of the chip can be safely saved to and restored from |
5503 | RAM. If this is set, the trigger callback is called with | 5503 | RAM. If this is set, the trigger callback is called with |
5504 | <constant>SNDRV_PCM_TRIGGER_RESUME</constant> after resume | 5504 | <constant>SNDRV_PCM_TRIGGER_RESUME</constant> after resume |
5505 | callback is finished. | 5505 | callback is finished. |
5506 | </para> | 5506 | </para> |
5507 | 5507 | ||
5508 | <para> | 5508 | <para> |
5509 | Even if the driver doesn't support PM fully but only the | 5509 | Even if the driver doesn't support PM fully but only the |
5510 | partial suspend/resume is possible, it's still worth | 5510 | partial suspend/resume is possible, it's still worth |
5511 | implementing suspend/resume callbacks. In such a case, applications | 5511 | implementing suspend/resume callbacks. In such a case, applications |
5512 | would reset the status by calling | 5512 | would reset the status by calling |
5513 | <function>snd_pcm_prepare()</function> and restart the stream | 5513 | <function>snd_pcm_prepare()</function> and restart the stream |
5514 | appropriately. Hence, you can define suspend/resume callbacks | 5514 | appropriately. Hence, you can define suspend/resume callbacks |
5515 | below but don't set <constant>SNDRV_PCM_INFO_RESUME</constant> | 5515 | below but don't set <constant>SNDRV_PCM_INFO_RESUME</constant> |
5516 | info flag to the PCM. | 5516 | info flag to the PCM. |
5517 | </para> | 5517 | </para> |
5518 | 5518 | ||
5519 | <para> | 5519 | <para> |
5520 | Note that the trigger with SUSPEND can be always called when | 5520 | Note that the trigger with SUSPEND can be always called when |
5521 | <function>snd_pcm_suspend_all</function> is called, | 5521 | <function>snd_pcm_suspend_all</function> is called, |
5522 | regardless of <constant>SNDRV_PCM_INFO_RESUME</constant> flag. | 5522 | regardless of <constant>SNDRV_PCM_INFO_RESUME</constant> flag. |
5523 | The <constant>RESUME</constant> flag affects only the behavior | 5523 | The <constant>RESUME</constant> flag affects only the behavior |
5524 | of <function>snd_pcm_resume()</function>. | 5524 | of <function>snd_pcm_resume()</function>. |
5525 | (Thus, in theory, | 5525 | (Thus, in theory, |
5526 | <constant>SNDRV_PCM_TRIGGER_RESUME</constant> doesn't need | 5526 | <constant>SNDRV_PCM_TRIGGER_RESUME</constant> doesn't need |
5527 | to be handled in the trigger callback when no | 5527 | to be handled in the trigger callback when no |
5528 | <constant>SNDRV_PCM_INFO_RESUME</constant> flag is set. But | 5528 | <constant>SNDRV_PCM_INFO_RESUME</constant> flag is set. But |
5529 | it's better to keep it for compatibility reasons.) | 5529 | it's better to keep it for compatibility reasons.) |
5530 | </para> | 5530 | </para> |
5531 | <para> | 5531 | <para> |
5532 | In the earlier version of ALSA drivers, a common | 5532 | In the earlier version of ALSA drivers, a common |
5533 | power-management layer was provided, but it has been removed. | 5533 | power-management layer was provided, but it has been removed. |
5534 | The driver needs to define the suspend/resume hooks according to | 5534 | The driver needs to define the suspend/resume hooks according to |
5535 | the bus the device is assigned to. In the case of a PCI driver, the | 5535 | the bus the device is assigned to. In the case of a PCI driver, the |
5536 | callbacks look like below: | 5536 | callbacks look like below: |
5537 | 5537 | ||
5538 | <informalexample> | 5538 | <informalexample> |
5539 | <programlisting> | 5539 | <programlisting> |
5540 | <![CDATA[ | 5540 | <![CDATA[ |
5541 | #ifdef CONFIG_PM | 5541 | #ifdef CONFIG_PM |
5542 | static int snd_my_suspend(struct pci_dev *pci, pm_message_t state) | 5542 | static int snd_my_suspend(struct pci_dev *pci, pm_message_t state) |
5543 | { | 5543 | { |
5544 | .... /* do things for suspend */ | 5544 | .... /* do things for suspend */ |
5545 | return 0; | 5545 | return 0; |
5546 | } | 5546 | } |
5547 | static int snd_my_resume(struct pci_dev *pci) | 5547 | static int snd_my_resume(struct pci_dev *pci) |
5548 | { | 5548 | { |
5549 | .... /* do things for resume */ | 5549 | .... /* do things for resume */ |
5550 | return 0; | 5550 | return 0; |
5551 | } | 5551 | } |
5552 | #endif | 5552 | #endif |
5553 | ]]> | 5553 | ]]> |
5554 | </programlisting> | 5554 | </programlisting> |
5555 | </informalexample> | 5555 | </informalexample> |
5556 | </para> | 5556 | </para> |
5557 | 5557 | ||
5558 | <para> | 5558 | <para> |
5559 | The scheme of the real suspend job is as follows. | 5559 | The scheme of the real suspend job is as follows. |
5560 | 5560 | ||
5561 | <orderedlist> | 5561 | <orderedlist> |
5562 | <listitem><para>Retrieve the card and the chip data.</para></listitem> | 5562 | <listitem><para>Retrieve the card and the chip data.</para></listitem> |
5563 | <listitem><para>Call <function>snd_power_change_state()</function> with | 5563 | <listitem><para>Call <function>snd_power_change_state()</function> with |
5564 | <constant>SNDRV_CTL_POWER_D3hot</constant> to change the | 5564 | <constant>SNDRV_CTL_POWER_D3hot</constant> to change the |
5565 | power status.</para></listitem> | 5565 | power status.</para></listitem> |
5566 | <listitem><para>Call <function>snd_pcm_suspend_all()</function> to suspend the running PCM streams.</para></listitem> | 5566 | <listitem><para>Call <function>snd_pcm_suspend_all()</function> to suspend the running PCM streams.</para></listitem> |
5567 | <listitem><para>If AC97 codecs are used, call | 5567 | <listitem><para>If AC97 codecs are used, call |
5568 | <function>snd_ac97_suspend()</function> for each codec.</para></listitem> | 5568 | <function>snd_ac97_suspend()</function> for each codec.</para></listitem> |
5569 | <listitem><para>Save the register values if necessary.</para></listitem> | 5569 | <listitem><para>Save the register values if necessary.</para></listitem> |
5570 | <listitem><para>Stop the hardware if necessary.</para></listitem> | 5570 | <listitem><para>Stop the hardware if necessary.</para></listitem> |
5571 | <listitem><para>Disable the PCI device by calling | 5571 | <listitem><para>Disable the PCI device by calling |
5572 | <function>pci_disable_device()</function>. Then, call | 5572 | <function>pci_disable_device()</function>. Then, call |
5573 | <function>pci_save_state()</function> at last.</para></listitem> | 5573 | <function>pci_save_state()</function> at last.</para></listitem> |
5574 | </orderedlist> | 5574 | </orderedlist> |
5575 | </para> | 5575 | </para> |
5576 | 5576 | ||
5577 | <para> | 5577 | <para> |
5578 | Typical code would look like: | 5578 | Typical code would look like: |
5579 | 5579 | ||
5580 | <informalexample> | 5580 | <informalexample> |
5581 | <programlisting> | 5581 | <programlisting> |
5582 | <![CDATA[ | 5582 | <![CDATA[ |
5583 | static int mychip_suspend(struct pci_dev *pci, pm_message_t state) | 5583 | static int mychip_suspend(struct pci_dev *pci, pm_message_t state) |
5584 | { | 5584 | { |
5585 | /* (1) */ | 5585 | /* (1) */ |
5586 | struct snd_card *card = pci_get_drvdata(pci); | 5586 | struct snd_card *card = pci_get_drvdata(pci); |
5587 | struct mychip *chip = card->private_data; | 5587 | struct mychip *chip = card->private_data; |
5588 | /* (2) */ | 5588 | /* (2) */ |
5589 | snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); | 5589 | snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); |
5590 | /* (3) */ | 5590 | /* (3) */ |
5591 | snd_pcm_suspend_all(chip->pcm); | 5591 | snd_pcm_suspend_all(chip->pcm); |
5592 | /* (4) */ | 5592 | /* (4) */ |
5593 | snd_ac97_suspend(chip->ac97); | 5593 | snd_ac97_suspend(chip->ac97); |
5594 | /* (5) */ | 5594 | /* (5) */ |
5595 | snd_mychip_save_registers(chip); | 5595 | snd_mychip_save_registers(chip); |
5596 | /* (6) */ | 5596 | /* (6) */ |
5597 | snd_mychip_stop_hardware(chip); | 5597 | snd_mychip_stop_hardware(chip); |
5598 | /* (7) */ | 5598 | /* (7) */ |
5599 | pci_disable_device(pci); | 5599 | pci_disable_device(pci); |
5600 | pci_save_state(pci); | 5600 | pci_save_state(pci); |
5601 | return 0; | 5601 | return 0; |
5602 | } | 5602 | } |
5603 | ]]> | 5603 | ]]> |
5604 | </programlisting> | 5604 | </programlisting> |
5605 | </informalexample> | 5605 | </informalexample> |
5606 | </para> | 5606 | </para> |
5607 | 5607 | ||
5608 | <para> | 5608 | <para> |
5609 | The scheme of the real resume job is as follows. | 5609 | The scheme of the real resume job is as follows. |
5610 | 5610 | ||
5611 | <orderedlist> | 5611 | <orderedlist> |
5612 | <listitem><para>Retrieve the card and the chip data.</para></listitem> | 5612 | <listitem><para>Retrieve the card and the chip data.</para></listitem> |
5613 | <listitem><para>Set up PCI. First, call <function>pci_restore_state()</function>. | 5613 | <listitem><para>Set up PCI. First, call <function>pci_restore_state()</function>. |
5614 | Then enable the pci device again by calling <function>pci_enable_device()</function>. | 5614 | Then enable the pci device again by calling <function>pci_enable_device()</function>. |
5615 | Call <function>pci_set_master()</function> if necessary, too.</para></listitem> | 5615 | Call <function>pci_set_master()</function> if necessary, too.</para></listitem> |
5616 | <listitem><para>Re-initialize the chip.</para></listitem> | 5616 | <listitem><para>Re-initialize the chip.</para></listitem> |
5617 | <listitem><para>Restore the saved registers if necessary.</para></listitem> | 5617 | <listitem><para>Restore the saved registers if necessary.</para></listitem> |
5618 | <listitem><para>Resume the mixer, e.g. calling | 5618 | <listitem><para>Resume the mixer, e.g. calling |
5619 | <function>snd_ac97_resume()</function>.</para></listitem> | 5619 | <function>snd_ac97_resume()</function>.</para></listitem> |
5620 | <listitem><para>Restart the hardware (if any).</para></listitem> | 5620 | <listitem><para>Restart the hardware (if any).</para></listitem> |
5621 | <listitem><para>Call <function>snd_power_change_state()</function> with | 5621 | <listitem><para>Call <function>snd_power_change_state()</function> with |
5622 | <constant>SNDRV_CTL_POWER_D0</constant> to notify the processes.</para></listitem> | 5622 | <constant>SNDRV_CTL_POWER_D0</constant> to notify the processes.</para></listitem> |
5623 | </orderedlist> | 5623 | </orderedlist> |
5624 | </para> | 5624 | </para> |
5625 | 5625 | ||
5626 | <para> | 5626 | <para> |
5627 | Typical code would look like: | 5627 | Typical code would look like: |
5628 | 5628 | ||
5629 | <informalexample> | 5629 | <informalexample> |
5630 | <programlisting> | 5630 | <programlisting> |
5631 | <![CDATA[ | 5631 | <![CDATA[ |
5632 | static int mychip_resume(struct pci_dev *pci) | 5632 | static int mychip_resume(struct pci_dev *pci) |
5633 | { | 5633 | { |
5634 | /* (1) */ | 5634 | /* (1) */ |
5635 | struct snd_card *card = pci_get_drvdata(pci); | 5635 | struct snd_card *card = pci_get_drvdata(pci); |
5636 | struct mychip *chip = card->private_data; | 5636 | struct mychip *chip = card->private_data; |
5637 | /* (2) */ | 5637 | /* (2) */ |
5638 | pci_restore_state(pci); | 5638 | pci_restore_state(pci); |
5639 | pci_enable_device(pci); | 5639 | pci_enable_device(pci); |
5640 | pci_set_master(pci); | 5640 | pci_set_master(pci); |
5641 | /* (3) */ | 5641 | /* (3) */ |
5642 | snd_mychip_reinit_chip(chip); | 5642 | snd_mychip_reinit_chip(chip); |
5643 | /* (4) */ | 5643 | /* (4) */ |
5644 | snd_mychip_restore_registers(chip); | 5644 | snd_mychip_restore_registers(chip); |
5645 | /* (5) */ | 5645 | /* (5) */ |
5646 | snd_ac97_resume(chip->ac97); | 5646 | snd_ac97_resume(chip->ac97); |
5647 | /* (6) */ | 5647 | /* (6) */ |
5648 | snd_mychip_restart_chip(chip); | 5648 | snd_mychip_restart_chip(chip); |
5649 | /* (7) */ | 5649 | /* (7) */ |
5650 | snd_power_change_state(card, SNDRV_CTL_POWER_D0); | 5650 | snd_power_change_state(card, SNDRV_CTL_POWER_D0); |
5651 | return 0; | 5651 | return 0; |
5652 | } | 5652 | } |
5653 | ]]> | 5653 | ]]> |
5654 | </programlisting> | 5654 | </programlisting> |
5655 | </informalexample> | 5655 | </informalexample> |
5656 | </para> | 5656 | </para> |
5657 | 5657 | ||
5658 | <para> | 5658 | <para> |
5659 | As shown above, it's better to save registers after | 5659 | As shown above, it's better to save registers after |
5660 | suspending the PCM operations via | 5660 | suspending the PCM operations via |
5661 | <function>snd_pcm_suspend_all()</function> or | 5661 | <function>snd_pcm_suspend_all()</function> or |
5662 | <function>snd_pcm_suspend()</function>. It means that the PCM | 5662 | <function>snd_pcm_suspend()</function>. It means that the PCM |
5663 | streams are already stopped when the register snapshot is | 5663 | streams are already stopped when the register snapshot is |
5664 | taken. But remember that you don't have to restart the PCM | 5664 | taken. But remember that you don't have to restart the PCM |
5665 | stream in the resume callback. It'll be restarted via | 5665 | stream in the resume callback. It'll be restarted via |
5666 | trigger call with <constant>SNDRV_PCM_TRIGGER_RESUME</constant> | 5666 | trigger call with <constant>SNDRV_PCM_TRIGGER_RESUME</constant> |
5667 | when necessary. | 5667 | when necessary. |
5668 | </para> | 5668 | </para> |
5669 | 5669 | ||
5670 | <para> | 5670 | <para> |
5671 | OK, we have all callbacks now. Let's set them up. In the | 5671 | OK, we have all callbacks now. Let's set them up. In the |
5672 | initialization of the card, make sure that you can get the chip | 5672 | initialization of the card, make sure that you can get the chip |
5673 | data from the card instance, typically via | 5673 | data from the card instance, typically via |
5674 | <structfield>private_data</structfield> field, in case you | 5674 | <structfield>private_data</structfield> field, in case you |
5675 | created the chip data individually. | 5675 | created the chip data individually. |
5676 | 5676 | ||
5677 | <informalexample> | 5677 | <informalexample> |
5678 | <programlisting> | 5678 | <programlisting> |
5679 | <![CDATA[ | 5679 | <![CDATA[ |
5680 | static int __devinit snd_mychip_probe(struct pci_dev *pci, | 5680 | static int __devinit snd_mychip_probe(struct pci_dev *pci, |
5681 | const struct pci_device_id *pci_id) | 5681 | const struct pci_device_id *pci_id) |
5682 | { | 5682 | { |
5683 | .... | 5683 | .... |
5684 | struct snd_card *card; | 5684 | struct snd_card *card; |
5685 | struct mychip *chip; | 5685 | struct mychip *chip; |
5686 | .... | 5686 | .... |
5687 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL); | 5687 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL); |
5688 | .... | 5688 | .... |
5689 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); | 5689 | chip = kzalloc(sizeof(*chip), GFP_KERNEL); |
5690 | .... | 5690 | .... |
5691 | card->private_data = chip; | 5691 | card->private_data = chip; |
5692 | .... | 5692 | .... |
5693 | } | 5693 | } |
5694 | ]]> | 5694 | ]]> |
5695 | </programlisting> | 5695 | </programlisting> |
5696 | </informalexample> | 5696 | </informalexample> |
5697 | 5697 | ||
5698 | When you create the chip data with | 5698 | When you create the chip data with |
5699 | <function>snd_card_new()</function>, it's accessible anyway | 5699 | <function>snd_card_new()</function>, it's accessible anyway |
5700 | via the <structfield>private_data</structfield> field. | 5700 | via the <structfield>private_data</structfield> field. |
5701 | 5701 | ||
5702 | <informalexample> | 5702 | <informalexample> |
5703 | <programlisting> | 5703 | <programlisting> |
5704 | <![CDATA[ | 5704 | <![CDATA[ |
5705 | static int __devinit snd_mychip_probe(struct pci_dev *pci, | 5705 | static int __devinit snd_mychip_probe(struct pci_dev *pci, |
5706 | const struct pci_device_id *pci_id) | 5706 | const struct pci_device_id *pci_id) |
5707 | { | 5707 | { |
5708 | .... | 5708 | .... |
5709 | struct snd_card *card; | 5709 | struct snd_card *card; |
5710 | struct mychip *chip; | 5710 | struct mychip *chip; |
5711 | .... | 5711 | .... |
5712 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, | 5712 | card = snd_card_new(index[dev], id[dev], THIS_MODULE, |
5713 | sizeof(struct mychip)); | 5713 | sizeof(struct mychip)); |
5714 | .... | 5714 | .... |
5715 | chip = card->private_data; | 5715 | chip = card->private_data; |
5716 | .... | 5716 | .... |
5717 | } | 5717 | } |
5718 | ]]> | 5718 | ]]> |
5719 | </programlisting> | 5719 | </programlisting> |
5720 | </informalexample> | 5720 | </informalexample> |
5721 | 5721 | ||
5722 | </para> | 5722 | </para> |
5723 | 5723 | ||
5724 | <para> | 5724 | <para> |
5725 | If you need a space for saving the registers, allocate the | 5725 | If you need a space for saving the registers, allocate the |
5726 | buffer for it here, too, since it would be fatal | 5726 | buffer for it here, too, since it would be fatal |
5727 | if you cannot allocate memory in the suspend phase. | 5727 | if you cannot allocate memory in the suspend phase. |
5728 | The allocated buffer should be released in the corresponding | 5728 | The allocated buffer should be released in the corresponding |
5729 | destructor. | 5729 | destructor. |
5730 | </para> | 5730 | </para> |
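A minimal userspace sketch of the point made in this paragraph: the register save area is allocated once in the constructor, so the suspend path only copies and cannot fail for lack of memory. The names `struct mychip` fields, `MYCHIP_NUM_REGS`, `mychip_create()`, `mychip_suspend()`, `mychip_resume()` and `mychip_free()` are hypothetical stand-ins, and `calloc()`/`free()` replace the kernel allocators; this is not the driver code from the document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MYCHIP_NUM_REGS 16              /* hypothetical register count */

/* Hypothetical stand-in for the driver's chip record. */
struct mychip {
    unsigned int regs[MYCHIP_NUM_REGS]; /* pretend hardware registers */
    unsigned int *saved_regs;           /* save area, allocated up front */
};

/* Constructor: allocate the save area here, where failure is easy to handle. */
static struct mychip *mychip_create(void)
{
    struct mychip *chip = calloc(1, sizeof(*chip));

    if (!chip)
        return NULL;
    chip->saved_regs = calloc(MYCHIP_NUM_REGS, sizeof(*chip->saved_regs));
    if (!chip->saved_regs) {
        free(chip);
        return NULL;
    }
    return chip;
}

/* Suspend: only copies the registers; nothing here can run out of memory. */
static void mychip_suspend(struct mychip *chip)
{
    memcpy(chip->saved_regs, chip->regs, sizeof(chip->regs));
}

/* Resume: write the saved values back. */
static void mychip_resume(struct mychip *chip)
{
    memcpy(chip->regs, chip->saved_regs, sizeof(chip->regs));
}

/* Destructor: release the save area together with the chip record. */
static void mychip_free(struct mychip *chip)
{
    if (!chip)
        return;
    free(chip->saved_regs);
    free(chip);
}
```

A caller would pair `mychip_create()` with `mychip_free()` and call `mychip_suspend()` and `mychip_resume()` in between; in a real driver the equivalent allocation would sit in the chip constructor and the release in its destructor, as the paragraph says.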
5731 | 5731 | ||
5732 | <para> | 5732 | <para> |
5733 | And next, set suspend/resume callbacks to the pci_driver. | 5733 | And next, set suspend/resume callbacks to the pci_driver. |
5734 | 5734 | ||
5735 | <informalexample> | 5735 | <informalexample> |
5736 | <programlisting> | 5736 | <programlisting> |
5737 | <![CDATA[ | 5737 | <![CDATA[ |
5738 | static struct pci_driver driver = { | 5738 | static struct pci_driver driver = { |
5739 | .name = "My Chip", | 5739 | .name = "My Chip", |
5740 | .id_table = snd_my_ids, | 5740 | .id_table = snd_my_ids, |
5741 | .probe = snd_my_probe, | 5741 | .probe = snd_my_probe, |
5742 | .remove = __devexit_p(snd_my_remove), | 5742 | .remove = __devexit_p(snd_my_remove), |
5743 | #ifdef CONFIG_PM | 5743 | #ifdef CONFIG_PM |
5744 | .suspend = snd_my_suspend, | 5744 | .suspend = snd_my_suspend, |
5745 | .resume = snd_my_resume, | 5745 | .resume = snd_my_resume, |
5746 | #endif | 5746 | #endif |
5747 | }; | 5747 | }; |
5748 | ]]> | 5748 | ]]> |
5749 | </programlisting> | 5749 | </programlisting> |
5750 | </informalexample> | 5750 | </informalexample> |
5751 | </para> | 5751 | </para> |
5752 | 5752 | ||
5753 | </chapter> | 5753 | </chapter> |
5754 | 5754 | ||
5755 | 5755 | ||
5756 | <!-- ****************************************************** --> | 5756 | <!-- ****************************************************** --> |
5757 | <!-- Module Parameters --> | 5757 | <!-- Module Parameters --> |
5758 | <!-- ****************************************************** --> | 5758 | <!-- ****************************************************** --> |
5759 | <chapter id="module-parameters"> | 5759 | <chapter id="module-parameters"> |
5760 | <title>Module Parameters</title> | 5760 | <title>Module Parameters</title> |
5761 | <para> | 5761 | <para> |
5762 | There are standard module options for ALSA. At least, each | 5762 | There are standard module options for ALSA. At least, each |
5763 | module should have <parameter>index</parameter>, | 5763 | module should have <parameter>index</parameter>, |
5764 | <parameter>id</parameter> and <parameter>enable</parameter> | 5764 | <parameter>id</parameter> and <parameter>enable</parameter> |
5765 | options. | 5765 | options. |
5766 | </para> | 5766 | </para> |
5767 | 5767 | ||
5768 | <para> | 5768 | <para> |
5769 | If the module supports multiple cards (usually up to | 5769 | If the module supports multiple cards (usually up to |
5770 | 8 = <constant>SNDRV_CARDS</constant> cards), they should be | 5770 | 8 = <constant>SNDRV_CARDS</constant> cards), they should be |
5771 | arrays. The default initial values are defined already as | 5771 | arrays. The default initial values are defined already as |
5772 | constants for ease of programming: | 5772 | constants for ease of programming: |
5773 | 5773 | ||
5774 | <informalexample> | 5774 | <informalexample> |
5775 | <programlisting> | 5775 | <programlisting> |
5776 | <![CDATA[ | 5776 | <![CDATA[ |
5777 | static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; | 5777 | static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; |
5778 | static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; | 5778 | static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; |
5779 | static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; | 5779 | static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; |
5780 | ]]> | 5780 | ]]> |
5781 | </programlisting> | 5781 | </programlisting> |
5782 | </informalexample> | 5782 | </informalexample> |
5783 | </para> | 5783 | </para> |
5784 | 5784 | ||
5785 | <para> | 5785 | <para> |
5786 | If the module supports only a single card, they could be single | 5786 | If the module supports only a single card, they could be single |
5787 | variables, instead. <parameter>enable</parameter> option is not | 5787 | variables, instead. <parameter>enable</parameter> option is not |
5788 | always necessary in this case, but it wouldn't be so bad to have a | 5788 | always necessary in this case, but it wouldn't be so bad to have a |
5789 | dummy option for compatibility. | 5789 | dummy option for compatibility. |
5790 | </para> | 5790 | </para> |
5791 | 5791 | ||
5792 | <para> | 5792 | <para> |
5793 | The module parameters must be declared with the standard | 5793 | The module parameters must be declared with the standard |
5794 | <function>module_param()</function>, | 5794 | <function>module_param()</function>, |
5795 | <function>module_param_array()</function> and | 5795 | <function>module_param_array()</function> and |
5796 | <function>MODULE_PARM_DESC()</function> macros. | 5796 | <function>MODULE_PARM_DESC()</function> macros. |
5797 | </para> | 5797 | </para> |
5798 | 5798 | ||
5799 | <para> | 5799 | <para> |
5800 | The typical coding would be like below: | 5800 | The typical coding would be like below: |
5801 | 5801 | ||
5802 | <informalexample> | 5802 | <informalexample> |
5803 | <programlisting> | 5803 | <programlisting> |
5804 | <![CDATA[ | 5804 | <![CDATA[ |
5805 | #define CARD_NAME "My Chip" | 5805 | #define CARD_NAME "My Chip" |
5806 | 5806 | ||
5807 | module_param_array(index, int, NULL, 0444); | 5807 | module_param_array(index, int, NULL, 0444); |
5808 | MODULE_PARM_DESC(index, "Index value for " CARD_NAME " soundcard."); | 5808 | MODULE_PARM_DESC(index, "Index value for " CARD_NAME " soundcard."); |
5809 | module_param_array(id, charp, NULL, 0444); | 5809 | module_param_array(id, charp, NULL, 0444); |
5810 | MODULE_PARM_DESC(id, "ID string for " CARD_NAME " soundcard."); | 5810 | MODULE_PARM_DESC(id, "ID string for " CARD_NAME " soundcard."); |
5811 | module_param_array(enable, bool, NULL, 0444); | 5811 | module_param_array(enable, bool, NULL, 0444); |
5812 | MODULE_PARM_DESC(enable, "Enable " CARD_NAME " soundcard."); | 5812 | MODULE_PARM_DESC(enable, "Enable " CARD_NAME " soundcard."); |
5813 | ]]> | 5813 | ]]> |
5814 | </programlisting> | 5814 | </programlisting> |
5815 | </informalexample> | 5815 | </informalexample> |
5816 | </para> | 5816 | </para> |
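As a rough illustration of why the three options are arrays, the following userspace sketch shows one way a probe routine might walk the per-card slots and skip disabled ones. The probe-time check is an assumption modelled on common ALSA PCI drivers, not something this chapter spells out. The names `card_index`, `card_id`, `card_enable`, `struct slot_opts` and `mychip_probe_slot()` are hypothetical (the arrays are renamed from `index`, `id` and `enable` to avoid clashing with C library symbols in userspace), and the default values are written out by hand instead of using the `SNDRV_DEFAULT_*` constants.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SNDRV_CARDS 8   /* matches the "usually up to 8" in the text */

/* Userspace stand-ins for the three standard per-card options. */
static int card_index[SNDRV_CARDS] = { -1, -1, -1, -1, -1, -1, -1, -1 };
static const char *card_id[SNDRV_CARDS];   /* all NULL by default */
static int card_enable[SNDRV_CARDS] = { 1, 1, 1, 1, 1, 1, 1, 1 };

/* Options picked up for the slot that was claimed. */
struct slot_opts {
    int index;
    const char *id;
};

/*
 * Claim the next slot. Returns the slot number on success,
 * -ENOENT if the slot is disabled (the slot is consumed anyway),
 * or -ENODEV once all slots have been used.
 */
static int mychip_probe_slot(struct slot_opts *out)
{
    static int dev;   /* next slot to try, kept across calls */

    if (dev >= SNDRV_CARDS)
        return -ENODEV;
    if (!card_enable[dev]) {
        dev++;
        return -ENOENT;
    }
    out->index = card_index[dev];
    out->id = card_id[dev];
    return dev++;
}
```

With `card_enable[0]` cleared, the first call reports `-ENOENT` and the second call claims slot 1 with its own `index` and `id` values, which is the behaviour a user gets from passing a per-card `enable` list on the module command line.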
5817 | 5817 | ||
5818 | <para> | 5818 | <para> |
5819 | Also, don't forget to define the module description, classes, | 5819 | Also, don't forget to define the module description, classes, |
5820 | license and devices. In particular, recent modprobe versions require | 5820 | license and devices. In particular, recent modprobe versions require |
5821 | the module license to be defined as GPL, etc.; otherwise the system is | 5821 | the module license to be defined as GPL, etc.; otherwise the system is |
5822 | shown as <quote>tainted</quote>. | 5822 | shown as <quote>tainted</quote>. |
5823 | 5823 | ||
5824 | <informalexample> | 5824 | <informalexample> |
5825 | <programlisting> | 5825 | <programlisting> |
5826 | <![CDATA[ | 5826 | <![CDATA[ |
5827 | MODULE_DESCRIPTION("My Chip"); | 5827 | MODULE_DESCRIPTION("My Chip"); |
5828 | MODULE_LICENSE("GPL"); | 5828 | MODULE_LICENSE("GPL"); |
5829 | MODULE_SUPPORTED_DEVICE("{{Vendor,My Chip Name}}"); | 5829 | MODULE_SUPPORTED_DEVICE("{{Vendor,My Chip Name}}"); |
5830 | ]]> | 5830 | ]]> |
5831 | </programlisting> | 5831 | </programlisting> |
5832 | </informalexample> | 5832 | </informalexample> |
5833 | </para> | 5833 | </para> |
5834 | 5834 | ||
5835 | </chapter> | 5835 | </chapter> |
5836 | 5836 | ||
5837 | 5837 | ||
5838 | <!-- ****************************************************** --> | 5838 | <!-- ****************************************************** --> |
5839 | <!-- How To Put Your Driver --> | 5839 | <!-- How To Put Your Driver --> |
5840 | <!-- ****************************************************** --> | 5840 | <!-- ****************************************************** --> |
5841 | <chapter id="how-to-put-your-driver"> | 5841 | <chapter id="how-to-put-your-driver"> |
5842 | <title>How To Put Your Driver Into ALSA Tree</title> | 5842 | <title>How To Put Your Driver Into ALSA Tree</title> |
5843 | <section> | 5843 | <section> |
5844 | <title>General</title> | 5844 | <title>General</title> |
5845 | <para> | 5845 | <para> |
5846 | So far, you've learned how to write the driver codes. | 5846 | So far, you've learned how to write the driver codes. |
5847 | And you might have a question now: how to put my own | 5847 | And you might have a question now: how to put my own |
5848 | driver into the ALSA driver tree? | 5848 | driver into the ALSA driver tree? |
5849 | Here (finally :) the standard procedure is described briefly. | 5849 | Here (finally :) the standard procedure is described briefly. |
5850 | </para> | 5850 | </para> |
5851 | 5851 | ||
5852 | <para> | 5852 | <para> |
5853 | Suppose that you'll create a new PCI driver for the card | 5853 | Suppose that you'll create a new PCI driver for the card |
5854 | <quote>xyz</quote>. The card module name would be | 5854 | <quote>xyz</quote>. The card module name would be |
5855 | snd-xyz. The new driver is usually put into alsa-driver | 5855 | snd-xyz. The new driver is usually put into alsa-driver |
5856 | tree, <filename>alsa-driver/pci</filename> directory in | 5856 | tree, <filename>alsa-driver/pci</filename> directory in |
5857 | the case of PCI cards. | 5857 | the case of PCI cards. |
5858 | Then the driver is evaluated, audited and tested | 5858 | Then the driver is evaluated, audited and tested |
5859 | by developers and users. After a certain time, the driver | 5859 | by developers and users. After a certain time, the driver |
5860 | will go to alsa-kernel tree (to the corresponding directory, | 5860 | will go to alsa-kernel tree (to the corresponding directory, |
5861 | such as <filename>alsa-kernel/pci</filename>) and eventually | 5861 | such as <filename>alsa-kernel/pci</filename>) and eventually |
5862 | integrated into Linux 2.6 tree (the directory would be | 5862 | integrated into Linux 2.6 tree (the directory would be |
5863 | <filename>linux/sound/pci</filename>). | 5863 | <filename>linux/sound/pci</filename>). |
5864 | </para> | 5864 | </para> |
5865 | 5865 | ||
5866 | <para> | 5866 | <para> |
5867 | In the following sections, the driver code is supposed | 5867 | In the following sections, the driver code is supposed |
5868 | to be put into alsa-driver tree. The two cases are assumed: | 5868 | to be put into alsa-driver tree. The two cases are assumed: |
5869 | a driver consisting of a single source file and one consisting | 5869 | a driver consisting of a single source file and one consisting |
5870 | of several source files. | 5870 | of several source files. |
5871 | </para> | 5871 | </para> |
5872 | </section> | 5872 | </section> |
5873 | 5873 | ||
5874 | <section> | 5874 | <section> |
5875 | <title>Driver with A Single Source File</title> | 5875 | <title>Driver with A Single Source File</title> |
5876 | <para> | 5876 | <para> |
5877 | <orderedlist> | 5877 | <orderedlist> |
5878 | <listitem> | 5878 | <listitem> |
5879 | <para> | 5879 | <para> |
5880 | Modify alsa-driver/pci/Makefile | 5880 | Modify alsa-driver/pci/Makefile |
5881 | </para> | 5881 | </para> |
5882 | 5882 | ||
5883 | <para> | 5883 | <para> |
5884 | Suppose you have a file xyz.c. Add the following | 5884 | Suppose you have a file xyz.c. Add the following |
5885 | two lines | 5885 | two lines |
5886 | <informalexample> | 5886 | <informalexample> |
5887 | <programlisting> | 5887 | <programlisting> |
5888 | <![CDATA[ | 5888 | <![CDATA[ |
5889 | snd-xyz-objs := xyz.o | 5889 | snd-xyz-objs := xyz.o |
5890 | obj-$(CONFIG_SND_XYZ) += snd-xyz.o | 5890 | obj-$(CONFIG_SND_XYZ) += snd-xyz.o |
5891 | ]]> | 5891 | ]]> |
5892 | </programlisting> | 5892 | </programlisting> |
5893 | </informalexample> | 5893 | </informalexample> |
5894 | </para> | 5894 | </para> |
5895 | </listitem> | 5895 | </listitem> |
5896 | 5896 | ||
5897 | <listitem> | 5897 | <listitem> |
5898 | <para> | 5898 | <para> |
5899 | Create the Kconfig entry | 5899 | Create the Kconfig entry |
5900 | </para> | 5900 | </para> |
5901 | 5901 | ||
5902 | <para> | 5902 | <para> |
5903 | Add the new entry of Kconfig for your xyz driver. | 5903 | Add the new entry of Kconfig for your xyz driver. |
5904 | <informalexample> | 5904 | <informalexample> |
5905 | <programlisting> | 5905 | <programlisting> |
5906 | <![CDATA[ | 5906 | <![CDATA[ |
5907 | config SND_XYZ | 5907 | config SND_XYZ |
5908 | tristate "Foobar XYZ" | 5908 | tristate "Foobar XYZ" |
5909 | depends on SND | 5909 | depends on SND |
5910 | select SND_PCM | 5910 | select SND_PCM |
5911 | help | 5911 | help |
5912 | Say Y here to include support for Foobar XYZ soundcard. | 5912 | Say Y here to include support for Foobar XYZ soundcard. |
5913 | 5913 | ||
5914 | To compile this driver as a module, choose M here: the module | 5914 | To compile this driver as a module, choose M here: the module |
5915 | will be called snd-xyz. | 5915 | will be called snd-xyz. |
5916 | ]]> | 5916 | ]]> |
5917 | </programlisting> | 5917 | </programlisting> |
5918 | </informalexample> | 5918 | </informalexample> |
5919 | 5919 | ||
5920 | The line, select SND_PCM, specifies that the driver xyz supports | 5920 | The line, select SND_PCM, specifies that the driver xyz supports |
5921 | PCM. In addition to SND_PCM, the following components are | 5921 | PCM. In addition to SND_PCM, the following components are |
5922 | supported for select command: | 5922 | supported for select command: |
5923 | SND_RAWMIDI, SND_TIMER, SND_HWDEP, SND_MPU401_UART, | 5923 | SND_RAWMIDI, SND_TIMER, SND_HWDEP, SND_MPU401_UART, |
5924 | SND_OPL3_LIB, SND_OPL4_LIB, SND_VX_LIB, SND_AC97_CODEC. | 5924 | SND_OPL3_LIB, SND_OPL4_LIB, SND_VX_LIB, SND_AC97_CODEC. |
5925 | Add the select command for each supported component. | 5925 | Add the select command for each supported component. |
5926 | </para> | 5926 | </para> |
5927 | 5927 | ||
5928 | <para> | 5928 | <para> |
5929 | Note that some selections imply the lowlevel selections. | 5929 | Note that some selections imply the lowlevel selections. |
5930 | For example, PCM includes TIMER, MPU401_UART includes RAWMIDI, | 5930 | For example, PCM includes TIMER, MPU401_UART includes RAWMIDI, |
5931 | AC97_CODEC includes PCM, and OPL3_LIB includes HWDEP. | 5931 | AC97_CODEC includes PCM, and OPL3_LIB includes HWDEP. |
5932 | You don't need to give the lowlevel selections again. | 5932 | You don't need to give the lowlevel selections again. |
5933 | </para> | 5933 | </para> |
5934 | 5934 | ||
5935 | <para> | 5935 | <para> |
5936 | For the details of Kconfig script, refer to the kbuild | 5936 | For the details of Kconfig script, refer to the kbuild |
5937 | documentation. | 5937 | documentation. |
5938 | </para> | 5938 | </para> |
5939 | 5939 | ||
5940 | </listitem> | 5940 | </listitem> |
5941 | 5941 | ||
5942 | <listitem> | 5942 | <listitem> |
5943 | <para> | 5943 | <para> |
5944 | Run cvscompile script to re-generate the configure script and | 5944 | Run cvscompile script to re-generate the configure script and |
5945 | build the whole stuff again. | 5945 | build the whole stuff again. |
5946 | </para> | 5946 | </para> |
5947 | </listitem> | 5947 | </listitem> |
5948 | </orderedlist> | 5948 | </orderedlist> |
5949 | </para> | 5949 | </para> |
5950 | </section> | 5950 | </section> |
5951 | 5951 | ||
5952 | <section> | 5952 | <section> |
5953 | <title>Drivers with Several Source Files</title> | 5953 | <title>Drivers with Several Source Files</title> |
5954 | <para> | 5954 | <para> |
5955 | Suppose that the driver snd-xyz has several source files. | 5955 | Suppose that the driver snd-xyz has several source files. |
5956 | They are located in the new subdirectory, | 5956 | They are located in the new subdirectory, |
5957 | pci/xyz. | 5957 | pci/xyz. |
5958 | 5958 | ||
5959 | <orderedlist> | 5959 | <orderedlist> |
5960 | <listitem> | 5960 | <listitem> |
5961 | <para> | 5961 | <para> |
5962 | Add a new directory (<filename>xyz</filename>) in | 5962 | Add a new directory (<filename>xyz</filename>) in |
5963 | <filename>alsa-driver/pci/Makefile</filename> like below | 5963 | <filename>alsa-driver/pci/Makefile</filename> like below |
5964 | 5964 | ||
5965 | <informalexample> | 5965 | <informalexample> |
5966 | <programlisting> | 5966 | <programlisting> |
5967 | <![CDATA[ | 5967 | <![CDATA[ |
5968 | obj-$(CONFIG_SND) += xyz/ | 5968 | obj-$(CONFIG_SND) += xyz/ |
5969 | ]]> | 5969 | ]]> |
5970 | </programlisting> | 5970 | </programlisting> |
5971 | </informalexample> | 5971 | </informalexample> |
5972 | </para> | 5972 | </para> |
5973 | </listitem> | 5973 | </listitem> |
5974 | 5974 | ||
5975 | <listitem> | 5975 | <listitem> |
5976 | <para> | 5976 | <para> |
5977 | Under the directory <filename>xyz</filename>, create a Makefile | 5977 | Under the directory <filename>xyz</filename>, create a Makefile |
5978 | 5978 | ||
5979 | <example> | 5979 | <example> |
5980 | <title>Sample Makefile for a driver xyz</title> | 5980 | <title>Sample Makefile for a driver xyz</title> |
5981 | <programlisting> | 5981 | <programlisting> |
5982 | <![CDATA[ | 5982 | <![CDATA[ |
5983 | ifndef SND_TOPDIR | 5983 | ifndef SND_TOPDIR |
5984 | SND_TOPDIR=../.. | 5984 | SND_TOPDIR=../.. |
5985 | endif | 5985 | endif |
5986 | 5986 | ||
5987 | include $(SND_TOPDIR)/toplevel.config | 5987 | include $(SND_TOPDIR)/toplevel.config |
5988 | include $(SND_TOPDIR)/Makefile.conf | 5988 | include $(SND_TOPDIR)/Makefile.conf |
5989 | 5989 | ||
5990 | snd-xyz-objs := xyz.o abc.o def.o | 5990 | snd-xyz-objs := xyz.o abc.o def.o |
5991 | 5991 | ||
5992 | obj-$(CONFIG_SND_XYZ) += snd-xyz.o | 5992 | obj-$(CONFIG_SND_XYZ) += snd-xyz.o |
5993 | 5993 | ||
5994 | include $(SND_TOPDIR)/Rules.make | 5994 | include $(SND_TOPDIR)/Rules.make |
5995 | ]]> | 5995 | ]]> |
5996 | </programlisting> | 5996 | </programlisting> |
5997 | </example> | 5997 | </example> |
5998 | </para> | 5998 | </para> |
5999 | </listitem> | 5999 | </listitem> |
6000 | 6000 | ||
6001 | <listitem> | 6001 | <listitem> |
6002 | <para> | 6002 | <para> |
6003 | Create the Kconfig entry | 6003 | Create the Kconfig entry |
6004 | </para> | 6004 | </para> |
6005 | 6005 | ||
6006 | <para> | 6006 | <para> |
6007 | This procedure is the same as in the last section. | 6007 | This procedure is the same as in the last section. |
6008 | </para> | 6008 | </para> |
6009 | </listitem> | 6009 | </listitem> |
6010 | 6010 | ||
6011 | <listitem> | 6011 | <listitem> |
6012 | <para> | 6012 | <para> |
6013 | Run cvscompile script to re-generate the configure script and | 6013 | Run cvscompile script to re-generate the configure script and |
6014 | build the whole stuff again. | 6014 | build the whole stuff again. |
6015 | </para> | 6015 | </para> |
6016 | </listitem> | 6016 | </listitem> |
6017 | </orderedlist> | 6017 | </orderedlist> |
6018 | </para> | 6018 | </para> |
6019 | </section> | 6019 | </section> |
6020 | 6020 | ||
6021 | </chapter> | 6021 | </chapter> |
6022 | 6022 | ||
6023 | <!-- ****************************************************** --> | 6023 | <!-- ****************************************************** --> |
6024 | <!-- Useful Functions --> | 6024 | <!-- Useful Functions --> |
6025 | <!-- ****************************************************** --> | 6025 | <!-- ****************************************************** --> |
6026 | <chapter id="useful-functions"> | 6026 | <chapter id="useful-functions"> |
6027 | <title>Useful Functions</title> | 6027 | <title>Useful Functions</title> |
6028 | 6028 | ||
6029 | <section id="useful-functions-snd-printk"> | 6029 | <section id="useful-functions-snd-printk"> |
6030 | <title><function>snd_printk()</function> and friends</title> | 6030 | <title><function>snd_printk()</function> and friends</title> |
6031 | <para> | 6031 | <para> |
6032 | ALSA provides a verbose version of | 6032 | ALSA provides a verbose version of |
6033 | <function>printk()</function> function. If a kernel config | 6033 | <function>printk()</function> function. If a kernel config |
6034 | <constant>CONFIG_SND_VERBOSE_PRINTK</constant> is set, this | 6034 | <constant>CONFIG_SND_VERBOSE_PRINTK</constant> is set, this |
6035 | function prints the given message together with the file name | 6035 | function prints the given message together with the file name |
6036 | and the line of the caller. The <constant>KERN_XXX</constant> | 6036 | and the line of the caller. The <constant>KERN_XXX</constant> |
6037 | prefix is processed just as | 6037 | prefix is processed just as |
6038 | the original <function>printk()</function> does, so it's | 6038 | the original <function>printk()</function> does, so it's |
6039 | recommended to add this prefix, e.g. | 6039 | recommended to add this prefix, e.g. |
6040 | 6040 | ||
6041 | <informalexample> | 6041 | <informalexample> |
6042 | <programlisting> | 6042 | <programlisting> |
6043 | <![CDATA[ | 6043 | <![CDATA[ |
6044 | snd_printk(KERN_ERR "Oh my, sorry, it's extremely bad!\n"); | 6044 | snd_printk(KERN_ERR "Oh my, sorry, it's extremely bad!\n"); |
6045 | ]]> | 6045 | ]]> |
6046 | </programlisting> | 6046 | </programlisting> |
6047 | </informalexample> | 6047 | </informalexample> |
6048 | </para> | 6048 | </para> |
6049 | 6049 | ||
6050 | <para> | 6050 | <para> |
6051 | There are also <function>printk()</function>'s for | 6051 | There are also <function>printk()</function>'s for |
6052 | debugging. <function>snd_printd()</function> can be used for | 6052 | debugging. <function>snd_printd()</function> can be used for |
6053 | general debugging purposes. If | 6053 | general debugging purposes. If |
6054 | <constant>CONFIG_SND_DEBUG</constant> is set, this function is | 6054 | <constant>CONFIG_SND_DEBUG</constant> is set, this function is |
6055 | compiled, and works just like | 6055 | compiled, and works just like |
6056 | <function>snd_printk()</function>. If the ALSA is compiled | 6056 | <function>snd_printk()</function>. If the ALSA is compiled |
6057 | without the debugging flag, it's ignored. | 6057 | without the debugging flag, it's ignored. |
6058 | </para> | 6058 | </para> |
6059 | 6059 | ||
6060 | <para> | 6060 | <para> |
6061 | <function>snd_printdd()</function> is compiled in only when | 6061 | <function>snd_printdd()</function> is compiled in only when |
6062 | <constant>CONFIG_SND_DEBUG_DETECT</constant> is set. Please note | 6062 | <constant>CONFIG_SND_DEBUG_DETECT</constant> is set. Please note |
6063 | that <constant>DEBUG_DETECT</constant> is not set as default | 6063 | that <constant>DEBUG_DETECT</constant> is not set as default |
6064 | even if you configure the alsa-driver with | 6064 | even if you configure the alsa-driver with |
6065 | <option>--with-debug=full</option> option. You need to give | 6065 | <option>--with-debug=full</option> option. You need to give |
6066 | explicitly <option>--with-debug=detect</option> option instead. | 6066 | explicitly <option>--with-debug=detect</option> option instead. |
6067 | </para> | 6067 | </para> |
6068 | </section> | 6068 | </section> |
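The idea behind a verbose print helper can be sketched in userspace as follows. This is not ALSA's macro: `MY_VERBOSE_PRINTK` is a hypothetical switch standing in for `CONFIG_SND_VERBOSE_PRINTK`, `my_format()` and `my_printk()` are made-up names, and the message is formatted into a caller-supplied buffer with `snprintf()` rather than sent to the kernel log.

```c
#include <assert.h>
#include <stdio.h>

#define MY_VERBOSE_PRINTK 1   /* hypothetical stand-in for CONFIG_SND_VERBOSE_PRINTK */

/*
 * Format msg into buf. In verbose mode the caller's file name and
 * line number are put in front of the message.
 * Returns what snprintf() returns.
 */
static int my_format(char *buf, size_t len, const char *file, int line,
                     const char *msg)
{
#if MY_VERBOSE_PRINTK
    return snprintf(buf, len, "%s:%d: %s", file, line, msg);
#else
    (void)file;
    (void)line;
    return snprintf(buf, len, "%s", msg);
#endif
}

/* The macro captures the call site, so callers never pass file or line. */
#define my_printk(buf, len, msg) \
    my_format((buf), (len), __FILE__, __LINE__, (msg))
```

Because `__FILE__` and `__LINE__` are expanded where the macro is used, each message is tagged with the location of the caller rather than of the helper, which is the behaviour the section describes for the verbose configuration.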
6069 | 6069 | ||
6070 | <section id="useful-functions-snd-assert"> | 6070 | <section id="useful-functions-snd-assert"> |
6071 | <title><function>snd_assert()</function></title> | 6071 | <title><function>snd_assert()</function></title> |
6072 | <para> | 6072 | <para> |
6073 | <function>snd_assert()</function> macro is similar to the | 6073 | <function>snd_assert()</function> macro is similar to the |
6074 | normal <function>assert()</function> macro. For example, | 6074 | normal <function>assert()</function> macro. For example, |
6075 | 6075 | ||
6076 | <informalexample> | 6076 | <informalexample> |
6077 | <programlisting> | 6077 | <programlisting> |
6078 | <![CDATA[ | 6078 | <![CDATA[ |
6079 | snd_assert(pointer != NULL, return -EINVAL); | 6079 | snd_assert(pointer != NULL, return -EINVAL); |
6080 | ]]> | 6080 | ]]> |
6081 | </programlisting> | 6081 | </programlisting> |
6082 | </informalexample> | 6082 | </informalexample> |
6083 | </para> | 6083 | </para> |
6084 | 6084 | ||
6085 | <para> | 6085 | <para> |
6086 | The first argument is the expression to evaluate, and the | 6086 | The first argument is the expression to evaluate, and the |
6087 | second argument is the action if it fails. When | 6087 | second argument is the action if it fails. When |
6088 | <constant>CONFIG_SND_DEBUG</constant> is set, it will show an | 6088 | <constant>CONFIG_SND_DEBUG</constant> is set, it will show an |
6089 | error message such as <computeroutput>BUG? (xxx)</computeroutput> | 6089 | error message such as <computeroutput>BUG? (xxx)</computeroutput> |
6090 | together with stack trace. | 6090 | together with stack trace. |
6091 | </para> | 6091 | </para> |
6092 | <para> | 6092 | <para> |
6093 | When no debug flag is set, this macro is ignored. | 6093 | When no debug flag is set, this macro is ignored. |
6094 | </para> | 6094 | </para> |
6095 | </section> | 6095 | </section> |
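An assert macro that takes a failure action as its second argument can be sketched in userspace like this. It is an illustration under stated assumptions, not ALSA's definition: `MY_DEBUG` is a hypothetical switch standing in for `CONFIG_SND_DEBUG`, `my_assert()` and `use_pointer()` are made-up names, and the sketch prints to `stderr` without a stack trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MY_DEBUG 1   /* hypothetical stand-in for CONFIG_SND_DEBUG */

#if MY_DEBUG
/* Check expr; on failure report it and run the caller's action. */
#define my_assert(expr, action) do {                          \
    if (!(expr)) {                                            \
        fprintf(stderr, "BUG? (%s) at %s:%d\n",               \
                #expr, __FILE__, __LINE__);                   \
        action;                                               \
    }                                                         \
} while (0)
#else
/* Without the debug switch the check has no effect. */
#define my_assert(expr, action) do { (void)(expr); } while (0)
#endif

/* Example caller: bail out with -EINVAL instead of dereferencing NULL. */
static int use_pointer(const int *pointer)
{
    my_assert(pointer != NULL, return -EINVAL);
    return *pointer;
}
```

Note the consequence of the last paragraph of this section: with the debug switch off, the action is never taken, so a check written this way must not be the only thing protecting a code path that can legitimately fail.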
6096 | 6096 | ||
6097 | <section id="useful-functions-snd-bug"> | 6097 | <section id="useful-functions-snd-bug"> |
6098 | <title><function>snd_BUG()</function></title> | 6098 | <title><function>snd_BUG()</function></title> |
6099 | <para> | 6099 | <para> |
6100 | It shows a <computeroutput>BUG?</computeroutput> message and a | 6100 | It shows a <computeroutput>BUG?</computeroutput> message and a |
6101 | stack trace at that point, just as <function>snd_assert()</function> does. | 6101 | stack trace at that point, just as <function>snd_assert()</function> does. |
6102 | It's useful to show that a fatal error happens there. | 6102 | It's useful to show that a fatal error happens there. |
6103 | </para> | 6103 | </para> |
6104 | <para> | 6104 | <para> |
6105 | When no debug flag is set, this macro is ignored. | 6105 | When no debug flag is set, this macro is ignored. |
6106 | </para> | 6106 | </para> |
6107 | </section> | 6107 | </section> |
6108 | </chapter> | 6108 | </chapter> |
6109 | 6109 | ||
6110 | 6110 | ||
6111 | <!-- ****************************************************** --> | 6111 | <!-- ****************************************************** --> |
6112 | <!-- Acknowledgments --> | 6112 | <!-- Acknowledgments --> |
6113 | <!-- ****************************************************** --> | 6113 | <!-- ****************************************************** --> |
6114 | <chapter id="acknowledments"> | 6114 | <chapter id="acknowledments"> |
6115 | <title>Acknowledgments</title> | 6115 | <title>Acknowledgments</title> |
6116 | <para> | 6116 | <para> |
6117 | I would like to thank Phil Kerr for his help for improvement and | 6117 | I would like to thank Phil Kerr for his help for improvement and |
6118 | corrections of this document. | 6118 | corrections of this document. |
6119 | </para> | 6119 | </para> |
6120 | <para> | 6120 | <para> |
6121 | Kevin Conder reformatted the original plain-text to the | 6121 | Kevin Conder reformatted the original plain-text to the |
6122 | DocBook format. | 6122 | DocBook format. |
6123 | </para> | 6123 | </para> |
6124 | <para> | 6124 | <para> |
6125 | Giuliano Pochini corrected typos and contributed the example codes | 6125 | Giuliano Pochini corrected typos and contributed the example codes |
6126 | in the hardware constraints section. | 6126 | in the hardware constraints section. |
6127 | </para> | 6127 | </para> |
6128 | </chapter> | 6128 | </chapter> |
6129 | 6129 | ||
6130 | 6130 | ||
6131 | </book> | 6131 | </book> |
6132 | 6132 |
Documentation/sound/oss/AWE32
1 | Installing and using Creative AWE midi sound under Linux. | 1 | Installing and using Creative AWE midi sound under Linux. |
2 | 2 | ||
3 | This documentation is devoted to the Creative Sound Blaster AWE32, AWE64 and | 3 | This documentation is devoted to the Creative Sound Blaster AWE32, AWE64 and |
4 | SB32. | 4 | SB32. |
5 | 5 | ||
6 | 1) Make sure you have an ORIGINAL Creative SB32, AWE32 or AWE64 card. This | 6 | 1) Make sure you have an ORIGINAL Creative SB32, AWE32 or AWE64 card. This |
7 | is important, because the driver works only with real Creative cards. | 7 | is important, because the driver works only with real Creative cards. |
8 | 8 | ||
9 | 2) The first thing you need to do is re-compile your kernel with support for | 9 | 2) The first thing you need to do is re-compile your kernel with support for |
10 | your sound card. Run your favourite tool to configure the kernel and when | 10 | your sound card. Run your favourite tool to configure the kernel and when |
11 | you get to the "Sound" menu you should enable support for the following: | 11 | you get to the "Sound" menu you should enable support for the following: |
12 | 12 | ||
13 | Sound card support, | 13 | Sound card support, |
14 | OSS sound modules, | 14 | OSS sound modules, |
15 | 100% Sound Blaster compatibles (SB16/32/64, ESS, Jazz16) support, | 15 | 100% Sound Blaster compatibles (SB16/32/64, ESS, Jazz16) support, |
16 | AWE32 synth | 16 | AWE32 synth |
17 | 17 | ||
18 | If your card is "Plug and Play" you will also need to enable these two | 18 | If your card is "Plug and Play" you will also need to enable these two |
19 | options, found under the "Plug and Play configuration" menu: | 19 | options, found under the "Plug and Play configuration" menu: |
20 | 20 | ||
21 | Plug and Play support | 21 | Plug and Play support |
22 | ISA Plug and Play support | 22 | ISA Plug and Play support |
23 | 23 | ||
24 | Now compile and install the kernel in normal fashion. If you don't know | 24 | Now compile and install the kernel in normal fashion. If you don't know |
25 | how to do this you can find instructions for this in the README file | 25 | how to do this you can find instructions for this in the README file |
26 | located in the root directory of the kernel source. | 26 | located in the root directory of the kernel source. |
27 | 27 | ||
28 | 3) Before you can start playing midi files you will have to load a sound | 28 | 3) Before you can start playing midi files you will have to load a sound |
29 | bank file. The utility needed for doing this is called "sfxload", and it | 29 | bank file. The utility needed for doing this is called "sfxload", and it |
30 | is one of the utilities found in a package called "awesfx". If this | 30 | is one of the utilities found in a package called "awesfx". If this |
31 | package is not available in your distribution you can download the AWE | 31 | package is not available in your distribution you can download the AWE |
32 | snapshot from Creative Labs Open Source website: | 32 | snapshot from Creative Labs Open Source website: |
33 | 33 | ||
34 | http://www.opensource.creative.com/snapshot.html | 34 | http://www.opensource.creative.com/snapshot.html |
35 | 35 | ||
36 | Once you have unpacked the AWE snapshot you will see an "awesfx" | 36 | Once you have unpacked the AWE snapshot you will see an "awesfx" |
37 | directory. Follow the instructions in awesfx/docs/INSTALL to install the | 37 | directory. Follow the instructions in awesfx/docs/INSTALL to install the |
38 | utilities in this package. After doing this, sfxload should be installed | 38 | utilities in this package. After doing this, sfxload should be installed |
39 | as: | 39 | as: |
40 | 40 | ||
41 | /usr/local/bin/sfxload | 41 | /usr/local/bin/sfxload |
42 | 42 | ||
43 | To enable AWE general midi synthesis you should also get the sound bank | 43 | To enable AWE general midi synthesis you should also get the sound bank |
44 | file for general midi from: | 44 | file for general midi from: |
45 | 45 | ||
46 | http://members.xoom.com/yar/synthgm.sbk.gz | 46 | http://members.xoom.com/yar/synthgm.sbk.gz |
47 | 47 | ||
48 | Copy it to a directory of your choice, and unpack it there. | 48 | Copy it to a directory of your choice, and unpack it there. |
49 | 49 | ||
50 | 4) Edit /etc/modprobe.conf, and insert the following lines at the end of the | 50 | 4) Edit /etc/modprobe.conf, and insert the following lines at the end of the |
51 | file: | 51 | file: |
52 | 52 | ||
53 | alias sound-slot-0 sb | 53 | alias sound-slot-0 sb |
54 | alias sound-service-0-1 awe_wave | 54 | alias sound-service-0-1 awe_wave |
55 | install awe_wave /sbin/modprobe --first-time -i awe_wave && /usr/local/bin/sfxload PATH_TO_SOUND_BANK_FILE | 55 | install awe_wave /sbin/modprobe --first-time -i awe_wave && /usr/local/bin/sfxload PATH_TO_SOUND_BANK_FILE |
56 | 56 | ||
57 | You will of course have to change "PATH_TO_SOUND_BANK_FILE" to the full | 57 | You will of course have to change "PATH_TO_SOUND_BANK_FILE" to the full |
58 | path of of the sound bank file. That will enable the Sound Blaster and AWE | 58 | path of the sound bank file. That will enable the Sound Blaster and AWE |
59 | wave synthesis. To play midi files you should get one of these programs if | 59 | wave synthesis. To play midi files you should get one of these programs if |
60 | you don't already have them: | 60 | you don't already have them: |
61 | 61 | ||
62 | Playmidi: http://playmidi.openprojects.net | 62 | Playmidi: http://playmidi.openprojects.net |
63 | 63 | ||
64 | AWEMidi Player (drvmidi) Included in the previously mentioned AWE | 64 | AWEMidi Player (drvmidi) Included in the previously mentioned AWE |
65 | snapshot. | 65 | snapshot. |
66 | 66 | ||
67 | You will probably have to pass the "-e" switch to playmidi to have it use | 67 | You will probably have to pass the "-e" switch to playmidi to have it use |
68 | your midi device. drvmidi should work without switches. | 68 | your midi device. drvmidi should work without switches. |
69 | 69 | ||
70 | If something goes wrong please e-mail me. All comments and suggestions are | 70 | If something goes wrong please e-mail me. All comments and suggestions are |
71 | welcome. | 71 | welcome. |
72 | 72 | ||
73 | Yaroslav Rosomakho (alons55@dialup.ptt.ru) | 73 | Yaroslav Rosomakho (alons55@dialup.ptt.ru) |
74 | http://www.yar.opennet.ru | 74 | http://www.yar.opennet.ru |
75 | 75 | ||
76 | Last Updated: Feb 3 2001 | 76 | Last Updated: Feb 3 2001 |
77 | 77 |
Documentation/sound/oss/solo1
1 | Recording | 1 | Recording |
2 | --------- | 2 | --------- |
3 | 3 | ||
4 | Recording does not work on the author's card, but there | 4 | Recording does not work on the author's card, but there |
5 | is at least one report of it working on later silicon. | 5 | is at least one report of it working on later silicon. |
6 | The chip behaves differently than described in the data sheet, | 6 | The chip behaves differently than described in the data sheet, |
7 | likely due to a chip bug. Working around this would require | 7 | likely due to a chip bug. Working around this would require |
8 | the help of ESS (for example by publishing an errata sheet), | 8 | the help of ESS (for example by publishing an errata sheet), |
9 | but ESS has not done so so far. | 9 | but ESS has not done so far. |
10 | 10 | ||
11 | Also, the chip only supports 24 bit addresses for recording, | 11 | Also, the chip only supports 24 bit addresses for recording, |
12 | which means it cannot work on some Alpha mainboards. | 12 | which means it cannot work on some Alpha mainboards. |
13 | 13 | ||
14 | 14 | ||
15 | /proc/sound, /dev/sndstat | 15 | /proc/sound, /dev/sndstat |
16 | ------------------------- | 16 | ------------------------- |
17 | 17 | ||
18 | /proc/sound and /dev/sndstat is not supported by the | 18 | /proc/sound and /dev/sndstat is not supported by the |
19 | driver. To find out whether the driver succeeded loading, | 19 | driver. To find out whether the driver succeeded loading, |
20 | check the kernel log (dmesg). | 20 | check the kernel log (dmesg). |
21 | 21 | ||
22 | 22 | ||
23 | ALaw/uLaw sample formats | 23 | ALaw/uLaw sample formats |
24 | ------------------------ | 24 | ------------------------ |
25 | 25 | ||
26 | This driver does not support the ALaw/uLaw sample formats. | 26 | This driver does not support the ALaw/uLaw sample formats. |
27 | ALaw is the default mode when opening a sound device | 27 | ALaw is the default mode when opening a sound device |
28 | using OSS/Free. The reason for the lack of support is | 28 | using OSS/Free. The reason for the lack of support is |
29 | that the hardware does not support these formats, and adding | 29 | that the hardware does not support these formats, and adding |
30 | conversion routines to the kernel would lead to very ugly | 30 | conversion routines to the kernel would lead to very ugly |
31 | code in the presence of the mmap interface to the driver. | 31 | code in the presence of the mmap interface to the driver. |
32 | And since xquake uses mmap, mmap is considered important :-) | 32 | And since xquake uses mmap, mmap is considered important :-) |
33 | and no sane application uses ALaw/uLaw these days anyway. | 33 | and no sane application uses ALaw/uLaw these days anyway. |
34 | In short, playing a Sun .au file as follows: | 34 | In short, playing a Sun .au file as follows: |
35 | 35 | ||
36 | cat my_file.au > /dev/dsp | 36 | cat my_file.au > /dev/dsp |
37 | 37 | ||
38 | does not work. Instead, you may use the play script from | 38 | does not work. Instead, you may use the play script from |
39 | Chris Bagwell's sox-12.14 package (or later, available from the URL | 39 | Chris Bagwell's sox-12.14 package (or later, available from the URL |
40 | below) to play many different audio file formats. | 40 | below) to play many different audio file formats. |
41 | The script automatically determines the audio format | 41 | The script automatically determines the audio format |
42 | and does do audio conversions if necessary. | 42 | and does do audio conversions if necessary. |
43 | http://home.sprynet.com/sprynet/cbagwell/projects.html | 43 | http://home.sprynet.com/sprynet/cbagwell/projects.html |
44 | 44 | ||
45 | 45 | ||
46 | Blocking vs. nonblocking IO | 46 | Blocking vs. nonblocking IO |
47 | --------------------------- | 47 | --------------------------- |
48 | 48 | ||
49 | Unlike OSS/Free this driver honours the O_NONBLOCK file flag | 49 | Unlike OSS/Free this driver honours the O_NONBLOCK file flag |
50 | not only during open, but also during read and write. | 50 | not only during open, but also during read and write. |
51 | This is an effort to make the sound driver interface more | 51 | This is an effort to make the sound driver interface more |
52 | regular. Timidity has problems with this; a patch | 52 | regular. Timidity has problems with this; a patch |
53 | is available from http://www.ife.ee.ethz.ch/~sailer/linux/pciaudio.html. | 53 | is available from http://www.ife.ee.ethz.ch/~sailer/linux/pciaudio.html. |
54 | (Timidity patched will also run on OSS/Free). | 54 | (Timidity patched will also run on OSS/Free). |
55 | 55 | ||
56 | 56 | ||
57 | MIDI UART | 57 | MIDI UART |
58 | --------- | 58 | --------- |
59 | 59 | ||
60 | The driver supports a simple MIDI UART interface, with | 60 | The driver supports a simple MIDI UART interface, with |
61 | no ioctl's supported. | 61 | no ioctl's supported. |
62 | 62 | ||
63 | 63 | ||
64 | MIDI synthesizer | 64 | MIDI synthesizer |
65 | ---------------- | 65 | ---------------- |
66 | 66 | ||
67 | The card has an OPL compatible FM synthesizer. | 67 | The card has an OPL compatible FM synthesizer. |
68 | 68 | ||
69 | Thomas Sailer | 69 | Thomas Sailer |
70 | t.sailer@alumni.ethz.ch | 70 | t.sailer@alumni.ethz.ch |
71 | 71 |
Documentation/sound/oss/ultrasound
1 | modprobe sound | 1 | modprobe sound |
2 | insmod ad1848 | 2 | insmod ad1848 |
3 | insmod gus io=* irq=* dma=* ... | 3 | insmod gus io=* irq=* dma=* ... |
4 | 4 | ||
5 | This loads the driver for the Gravis Ultrasound family of sound cards. | 5 | This loads the driver for the Gravis Ultrasound family of sound cards. |
6 | 6 | ||
7 | The gus module takes the following arguments | 7 | The gus module takes the following arguments |
8 | 8 | ||
9 | io I/O address of the Ultrasound card (eg. io=0x220) | 9 | io I/O address of the Ultrasound card (eg. io=0x220) |
10 | irq IRQ of the Sound Blaster card | 10 | irq IRQ of the Sound Blaster card |
11 | dma DMA channel for the Sound Blaster | 11 | dma DMA channel for the Sound Blaster |
12 | dma16 2nd DMA channel, only needed for full duplex operation | 12 | dma16 2nd DMA channel, only needed for full duplex operation |
13 | type 1 for PnP card | 13 | type 1 for PnP card |
14 | gus16 1 for using 16 bit sampling daughter board | 14 | gus16 1 for using 16 bit sampling daughter board |
15 | no_wave_dma Set to disable DMA usage for wavetable (see note) | 15 | no_wave_dma Set to disable DMA usage for wavetable (see note) |
16 | db16 ??? | 16 | db16 ??? |
17 | 17 | ||
18 | 18 | ||
19 | no_wave_dma option | 19 | no_wave_dma option |
20 | 20 | ||
21 | This option defaults to a value of 0, which allows the Ultrasound wavetable | 21 | This option defaults to a value of 0, which allows the Ultrasound wavetable |
22 | DSP to use DMA for for playback and downloading samples. This is the same | 22 | DSP to use DMA for playback and downloading samples. This is the same |
23 | as the old behaviour. If set to 1, no DMA is needed for downloading samples, | 23 | as the old behaviour. If set to 1, no DMA is needed for downloading samples, |
24 | and allows owners of a GUS MAX to make use of simultaneous digital audio | 24 | and allows owners of a GUS MAX to make use of simultaneous digital audio |
25 | (/dev/dsp), MIDI, and wavetable playback. | 25 | (/dev/dsp), MIDI, and wavetable playback. |
26 | 26 | ||
27 | 27 | ||
28 | If you have problems in recording with GUS MAX, you could try to use | 28 | If you have problems in recording with GUS MAX, you could try to use |
29 | just one 8 bit DMA channel. Recording will not work with one DMA | 29 | just one 8 bit DMA channel. Recording will not work with one DMA |
30 | channel if it's a 16 bit one. | 30 | channel if it's a 16 bit one. |
31 | 31 |
Documentation/sound/oss/vwsnd
1 | vwsnd - Sound driver for the Silicon Graphics 320 and 540 Visual | 1 | vwsnd - Sound driver for the Silicon Graphics 320 and 540 Visual |
2 | Workstations' onboard audio. | 2 | Workstations' onboard audio. |
3 | 3 | ||
4 | Copyright 1999 Silicon Graphics, Inc. All rights reserved. | 4 | Copyright 1999 Silicon Graphics, Inc. All rights reserved. |
5 | 5 | ||
6 | 6 | ||
7 | At the time of this writing, March 1999, there are two models of | 7 | At the time of this writing, March 1999, there are two models of |
8 | Visual Workstation, the 320 and the 540. This document only describes | 8 | Visual Workstation, the 320 and the 540. This document only describes |
9 | those models. Future Visual Workstation models may have different | 9 | those models. Future Visual Workstation models may have different |
10 | sound capabilities, and this driver will probably not work on those | 10 | sound capabilities, and this driver will probably not work on those |
11 | boxes. | 11 | boxes. |
12 | 12 | ||
13 | The Visual Workstation has an Analog Devices AD1843 "SoundComm" audio | 13 | The Visual Workstation has an Analog Devices AD1843 "SoundComm" audio |
14 | codec chip. The AD1843 is accessed through the Cobalt I/O ASIC, also | 14 | codec chip. The AD1843 is accessed through the Cobalt I/O ASIC, also |
15 | known as Lithium. This driver programs both both chips. | 15 | known as Lithium. This driver programs both chips. |
16 | 16 | ||
17 | ============================================================================== | 17 | ============================================================================== |
18 | QUICK CONFIGURATION | 18 | QUICK CONFIGURATION |
19 | 19 | ||
20 | # insmod soundcore | 20 | # insmod soundcore |
21 | # insmod vwsnd | 21 | # insmod vwsnd |
22 | 22 | ||
23 | ============================================================================== | 23 | ============================================================================== |
24 | I/O CONNECTIONS | 24 | I/O CONNECTIONS |
25 | 25 | ||
26 | On the Visual Workstation, only three of the AD1843 inputs are hooked | 26 | On the Visual Workstation, only three of the AD1843 inputs are hooked |
27 | up. The analog line in jacks are connected to the AD1843's AUX1 | 27 | up. The analog line in jacks are connected to the AD1843's AUX1 |
28 | input. The CD audio lines are connected to the AD1843's AUX2 input. | 28 | input. The CD audio lines are connected to the AD1843's AUX2 input. |
29 | The microphone jack is connected to the AD1843's MIC input. The mic | 29 | The microphone jack is connected to the AD1843's MIC input. The mic |
30 | jack is mono, but the signal is delivered to both the left and right | 30 | jack is mono, but the signal is delivered to both the left and right |
31 | MIC inputs. You can record in stereo from the mic input, but you will | 31 | MIC inputs. You can record in stereo from the mic input, but you will |
32 | get the same signal on both channels (within the limits of A/D | 32 | get the same signal on both channels (within the limits of A/D |
33 | accuracy). Full scale on the Line input is +/- 2.0 V. Full scale on | 33 | accuracy). Full scale on the Line input is +/- 2.0 V. Full scale on |
34 | the MIC input is 20 dB less, or +/- 0.2 V. | 34 | the MIC input is 20 dB less, or +/- 0.2 V. |
35 | 35 | ||
36 | The AD1843's LOUT1 outputs are connected to the Line Out jacks. The | 36 | The AD1843's LOUT1 outputs are connected to the Line Out jacks. The |
37 | AD1843's HPOUT outputs are connected to the speaker/headphone jack. | 37 | AD1843's HPOUT outputs are connected to the speaker/headphone jack. |
38 | LOUT2 is not connected. Line out's maximum level is +/- 2.0 V peak to | 38 | LOUT2 is not connected. Line out's maximum level is +/- 2.0 V peak to |
39 | peak. The speaker/headphone out's maximum is +/- 4.0 V peak to peak. | 39 | peak. The speaker/headphone out's maximum is +/- 4.0 V peak to peak. |
40 | 40 | ||
41 | The AD1843's PCM input channel and one of its output channels (DAC1) | 41 | The AD1843's PCM input channel and one of its output channels (DAC1) |
42 | are connected to Lithium. The other output channel (DAC2) is not | 42 | are connected to Lithium. The other output channel (DAC2) is not |
43 | connected. | 43 | connected. |
44 | 44 | ||
45 | ============================================================================== | 45 | ============================================================================== |
46 | CAPABILITIES | 46 | CAPABILITIES |
47 | 47 | ||
48 | The AD1843 has PCM input and output (Pulse Code Modulation, also known | 48 | The AD1843 has PCM input and output (Pulse Code Modulation, also known |
49 | as wavetable). PCM input and output can be mono or stereo in any of | 49 | as wavetable). PCM input and output can be mono or stereo in any of |
50 | four formats. The formats are 16 bit signed and 8 bit unsigned, | 50 | four formats. The formats are 16 bit signed and 8 bit unsigned, |
51 | u-Law, and A-Law format. Any sample rate from 4 KHz to 49 KHz is | 51 | u-Law, and A-Law format. Any sample rate from 4 KHz to 49 KHz is |
52 | available, in 1 Hz increments. | 52 | available, in 1 Hz increments. |
53 | 53 | ||
54 | The AD1843 includes an analog mixer that can mix all three input | 54 | The AD1843 includes an analog mixer that can mix all three input |
55 | signals (line, mic and CD) into the analog outputs. The mixer has a | 55 | signals (line, mic and CD) into the analog outputs. The mixer has a |
56 | separate gain control and mute switch for each input. | 56 | separate gain control and mute switch for each input. |
57 | 57 | ||
58 | There are two outputs, line out and speaker/headphone out. They | 58 | There are two outputs, line out and speaker/headphone out. They |
59 | always produce the same signal, and the speaker always has 3 dB more | 59 | always produce the same signal, and the speaker always has 3 dB more |
60 | gain than the line out. The speaker/headphone output can be muted, | 60 | gain than the line out. The speaker/headphone output can be muted, |
61 | but this driver does not export that function. | 61 | but this driver does not export that function. |
62 | 62 | ||
63 | The hardware can sync audio to the video clock, but this driver does | 63 | The hardware can sync audio to the video clock, but this driver does |
64 | not have a way to specify syncing to video. | 64 | not have a way to specify syncing to video. |
65 | 65 | ||
66 | ============================================================================== | 66 | ============================================================================== |
67 | PROGRAMMING | 67 | PROGRAMMING |
68 | 68 | ||
69 | This section explains the API supported by the driver. Also see the | 69 | This section explains the API supported by the driver. Also see the |
70 | Open Sound Programming Guide at http://www.opensound.com/pguide/ . | 70 | Open Sound Programming Guide at http://www.opensound.com/pguide/ . |
71 | This section assumes familiarity with that document. | 71 | This section assumes familiarity with that document. |
72 | 72 | ||
73 | The driver has two interfaces, an I/O interface and a mixer interface. | 73 | The driver has two interfaces, an I/O interface and a mixer interface. |
74 | There is no MIDI or sequencer capability. | 74 | There is no MIDI or sequencer capability. |
75 | 75 | ||
76 | ============================================================================== | 76 | ============================================================================== |
77 | PROGRAMMING PCM I/O | 77 | PROGRAMMING PCM I/O |
78 | 78 | ||
79 | The I/O interface is usually accessed as /dev/audio or /dev/dsp. | 79 | The I/O interface is usually accessed as /dev/audio or /dev/dsp. |
80 | Using the standard Open Sound System (OSS) ioctl calls, the sample | 80 | Using the standard Open Sound System (OSS) ioctl calls, the sample |
81 | rate, number of channels, and sample format may be set within the | 81 | rate, number of channels, and sample format may be set within the |
82 | limitations described above. The driver supports triggering. It also | 82 | limitations described above. The driver supports triggering. It also |
83 | supports getting the input and output pointers with one-sample | 83 | supports getting the input and output pointers with one-sample |
84 | accuracy. | 84 | accuracy. |
85 | 85 | ||
86 | The SNDCTL_DSP_GETCAP ioctl returns these capabilities. | 86 | The SNDCTL_DSP_GETCAP ioctl returns these capabilities. |
87 | 87 | ||
88 | DSP_CAP_DUPLEX - driver supports full duplex. | 88 | DSP_CAP_DUPLEX - driver supports full duplex. |
89 | 89 | ||
90 | DSP_CAP_TRIGGER - driver supports triggering. | 90 | DSP_CAP_TRIGGER - driver supports triggering. |
91 | 91 | ||
92 | DSP_CAP_REALTIME - values returned by SNDCTL_DSP_GETIPTR | 92 | DSP_CAP_REALTIME - values returned by SNDCTL_DSP_GETIPTR |
93 | and SNDCTL_DSP_GETOPTR are accurate to a few samples. | 93 | and SNDCTL_DSP_GETOPTR are accurate to a few samples. |
94 | 94 | ||
95 | Memory mapping (mmap) is not implemented. | 95 | Memory mapping (mmap) is not implemented. |
96 | 96 | ||
97 | The driver permits subdivided fragment sizes from 64 to 4096 bytes. | 97 | The driver permits subdivided fragment sizes from 64 to 4096 bytes. |
98 | The number of fragments can be anything from 3 fragments to however | 98 | The number of fragments can be anything from 3 fragments to however |
99 | many fragments fit into 124 kilobytes. It is up to the user to | 99 | many fragments fit into 124 kilobytes. It is up to the user to |
100 | determine how few/small fragments can be used without introducing | 100 | determine how few/small fragments can be used without introducing |
101 | glitches with a given workload. Linux is not realtime, so we can't | 101 | glitches with a given workload. Linux is not realtime, so we can't |
102 | promise anything. (sigh...) | 102 | promise anything. (sigh...) |
103 | 103 | ||
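As a rough illustration of the fragment limits above, the sketch below builds the argument for the standard OSS SNDCTL_DSP_SETFRAGMENT ioctl, which carries the maximum fragment count in the upper 16 bits and log2 of the fragment size in the lower 16 bits. The helper name vwsnd_frag_arg is hypothetical, and the 124 kilobyte check assumes 1024-byte kilobytes; neither is taken from the driver source.

```c
/* Hypothetical helper: returns a SNDCTL_DSP_SETFRAGMENT argument,
 * or -1 if the request falls outside the limits described above. */
static int vwsnd_frag_arg(int frag_size, int nfrags)
{
	int shift = 0;

	/* The ioctl encodes the size as a power of two; the text above
	 * allows 64 to 4096 bytes. */
	if (frag_size < 64 || frag_size > 4096 || (frag_size & (frag_size - 1)))
		return -1;
	/* At least 3 fragments, and the total must fit into 124 KB
	 * (assumed here to mean 124 * 1024 bytes). */
	if (nfrags < 3 || (long)frag_size * nfrags > 124L * 1024)
		return -1;
	/* Find log2(frag_size). */
	while ((1 << shift) < frag_size)
		shift++;
	return (nfrags << 16) | shift;
}
```

The result would be passed by pointer to ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &arg) on an open /dev/dsp descriptor; whether a given setting plays without glitches still has to be found by experiment.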
104 | When this driver is switched into or out of mu-Law or A-Law mode on | 104 | When this driver is switched into or out of mu-Law or A-Law mode on |
105 | output, it may produce an audible click. This is unavoidable. To | 105 | output, it may produce an audible click. This is unavoidable. To |
106 | prevent clicking, use signed 16-bit mode instead, and convert from | 106 | prevent clicking, use signed 16-bit mode instead, and convert from |
107 | mu-Law or A-Law format in software. | 107 | mu-Law or A-Law format in software. |
108 | 108 | ||
109 | ============================================================================== | 109 | ============================================================================== |
110 | PROGRAMMING THE MIXER INTERFACE | 110 | PROGRAMMING THE MIXER INTERFACE |
111 | 111 | ||
112 | The mixer interface is usually accessed as /dev/mixer. It is accessed | 112 | The mixer interface is usually accessed as /dev/mixer. It is accessed |
113 | through ioctls. The mixer allows the application to control gain or | 113 | through ioctls. The mixer allows the application to control gain or |
114 | mute several audio signal paths, and also allows selection of the | 114 | mute several audio signal paths, and also allows selection of the |
115 | recording source. | 115 | recording source. |
116 | 116 | ||
117 | Each of the constants described here can be read using the | 117 | Each of the constants described here can be read using the |
118 | MIXER_READ(SOUND_MIXER_xxx) ioctl. Those that are not read-only can | 118 | MIXER_READ(SOUND_MIXER_xxx) ioctl. Those that are not read-only can |
119 | also be written using the MIXER_WRITE(SOUND_MIXER_xxx) ioctl. In most | 119 | also be written using the MIXER_WRITE(SOUND_MIXER_xxx) ioctl. In most |
120 | cases, <sys/soundcard.h> defines constants SOUND_MIXER_READ_xxx and | 120 | cases, <sys/soundcard.h> defines constants SOUND_MIXER_READ_xxx and |
121 | SOUND_MIXER_WRITE_xxx which work just as well. | 121 | SOUND_MIXER_WRITE_xxx which work just as well. |
122 | 122 | ||
123 | SOUND_MIXER_CAPS Read-only | 123 | SOUND_MIXER_CAPS Read-only |
124 | 124 | ||
125 | This is a mask of optional driver capabilities that are implemented. | 125 | This is a mask of optional driver capabilities that are implemented. |
126 | This driver's only capability is SOUND_CAP_EXCL_INPUT, which means | 126 | This driver's only capability is SOUND_CAP_EXCL_INPUT, which means |
127 | that only one recording source can be active at a time. | 127 | that only one recording source can be active at a time. |
128 | 128 | ||
129 | SOUND_MIXER_DEVMASK Read-only | 129 | SOUND_MIXER_DEVMASK Read-only |
130 | 130 | ||
131 | This is a mask of the sound channels. This driver's channels are PCM, | 131 | This is a mask of the sound channels. This driver's channels are PCM, |
132 | LINE, MIC, CD, and RECLEV. | 132 | LINE, MIC, CD, and RECLEV. |
133 | 133 | ||
134 | SOUND_MIXER_STEREODEVS Read-only | 134 | SOUND_MIXER_STEREODEVS Read-only |
135 | 135 | ||
136 | This is a mask of which sound channels are capable of stereo. All | 136 | This is a mask of which sound channels are capable of stereo. All |
137 | channels are capable of stereo. (But see caveat on MIC input in I/O | 137 | channels are capable of stereo. (But see caveat on MIC input in I/O |
138 | CONNECTIONS section above). | 138 | CONNECTIONS section above). |
139 | 139 | ||
140 | SOUND_MIXER_OUTMASK Read-only | 140 | SOUND_MIXER_OUTMASK Read-only |
141 | 141 | ||
142 | This is a mask of channels that route inputs through to outputs. | 142 | This is a mask of channels that route inputs through to outputs. |
143 | Those are LINE, MIC, and CD. | 143 | Those are LINE, MIC, and CD. |
144 | 144 | ||
145 | SOUND_MIXER_RECMASK Read-only | 145 | SOUND_MIXER_RECMASK Read-only |
146 | 146 | ||
147 | This is a mask of channels that can be recording sources. Those are | 147 | This is a mask of channels that can be recording sources. Those are |
148 | PCM, LINE, MIC, CD. | 148 | PCM, LINE, MIC, CD. |
149 | 149 | ||
150 | SOUND_MIXER_PCM Default: 0x5757 (0 dB) | 150 | SOUND_MIXER_PCM Default: 0x5757 (0 dB) |
151 | 151 | ||
152 | This is the gain control for PCM output. The left and right channel | 152 | This is the gain control for PCM output. The left and right channel |
153 | gain are controlled independently. This gain control has 64 levels, | 153 | gain are controlled independently. This gain control has 64 levels, |
154 | which range from -82.5 dB to +12.0 dB in 1.5 dB steps. Those 64 | 154 | which range from -82.5 dB to +12.0 dB in 1.5 dB steps. Those 64 |
155 | levels are mapped onto 100 levels at the ioctl, see below. | 155 | levels are mapped onto 100 levels at the ioctl, see below. |
156 | 156 | ||
157 | SOUND_MIXER_LINE Default: 0x4a4a (0 dB) | 157 | SOUND_MIXER_LINE Default: 0x4a4a (0 dB) |
158 | 158 | ||
159 | This is the gain control for mixing the Line In source into the | 159 | This is the gain control for mixing the Line In source into the |
160 | outputs. The left and right channel gain are controlled | 160 | outputs. The left and right channel gain are controlled |
161 | independently. This gain control has 32 levels, which range from | 161 | independently. This gain control has 32 levels, which range from |
162 | -34.5 dB to +12.0 dB in 1.5 dB steps. Those 32 levels are mapped onto | 162 | -34.5 dB to +12.0 dB in 1.5 dB steps. Those 32 levels are mapped onto |
163 | 100 levels at the ioctl, see below. | 163 | 100 levels at the ioctl, see below. |
164 | 164 | ||
165 | SOUND_MIXER_MIC Default: 0x4a4a (0 dB) | 165 | SOUND_MIXER_MIC Default: 0x4a4a (0 dB) |
166 | 166 | ||
167 | This is the gain control for mixing the MIC source into the outputs. | 167 | This is the gain control for mixing the MIC source into the outputs. |
168 | The left and right channel gain are controlled independently. This | 168 | The left and right channel gain are controlled independently. This |
169 | gain control has 32 levels, which range from -34.5 dB to +12.0 dB in | 169 | gain control has 32 levels, which range from -34.5 dB to +12.0 dB in |
170 | 1.5 dB steps. Those 32 levels are mapped onto 100 levels at the | 170 | 1.5 dB steps. Those 32 levels are mapped onto 100 levels at the |
171 | ioctl, see below. | 171 | ioctl, see below. |
172 | 172 | ||
173 | SOUND_MIXER_CD Default: 0x4a4a (0 dB) | 173 | SOUND_MIXER_CD Default: 0x4a4a (0 dB) |
174 | 174 | ||
175 | This is the gain control for mixing the CD audio source into the | 175 | This is the gain control for mixing the CD audio source into the |
176 | outputs. The left and right channel gain are controlled | 176 | outputs. The left and right channel gain are controlled |
177 | independently. This gain control has 32 levels, which range from | 177 | independently. This gain control has 32 levels, which range from |
178 | -34.5 dB to +12.0 dB in 1.5 dB steps. Those 32 levels are mapped onto | 178 | -34.5 dB to +12.0 dB in 1.5 dB steps. Those 32 levels are mapped onto |
179 | 100 levels at the ioctl, see below. | 179 | 100 levels at the ioctl, see below. |
180 | 180 | ||
181 | SOUND_MIXER_RECLEV Default: 0 (0 dB) | 181 | SOUND_MIXER_RECLEV Default: 0 (0 dB) |
182 | 182 | ||
183 | This is the gain control for PCM input (RECording LEVel). The left | 183 | This is the gain control for PCM input (RECording LEVel). The left |
184 | and right channel gain are controlled independently. This gain | 184 | and right channel gain are controlled independently. This gain |
185 | control has 16 levels, which range from 0 dB to +22.5 dB in 1.5 dB | 185 | control has 16 levels, which range from 0 dB to +22.5 dB in 1.5 dB |
186 | steps. Those 16 levels are mapped onto 100 levels at the ioctl, see | 186 | steps. Those 16 levels are mapped onto 100 levels at the ioctl, see |
187 | below. | 187 | below. |
188 | 188 | ||
189 | SOUND_MIXER_RECSRC Default: SOUND_MASK_LINE | 189 | SOUND_MIXER_RECSRC Default: SOUND_MASK_LINE |
190 | 190 | ||
191 | This is a mask of currently selected PCM input sources (RECording | 191 | This is a mask of currently selected PCM input sources (RECording |
192 | SouRCes). Because the AD1843 can only have a single recording source | 192 | SouRCes). Because the AD1843 can only have a single recording source |
193 | at a time, only one bit at a time can be set in this mask. The | 193 | at a time, only one bit at a time can be set in this mask. The |
194 | allowable values are SOUND_MASK_PCM, SOUND_MASK_LINE, SOUND_MASK_MIC, | 194 | allowable values are SOUND_MASK_PCM, SOUND_MASK_LINE, SOUND_MASK_MIC, |
195 | or SOUND_MASK_CD. Selecting SOUND_MASK_PCM sets up internal | 195 | or SOUND_MASK_CD. Selecting SOUND_MASK_PCM sets up internal |
196 | resampling which is useful for loopback testing and for hardware | 196 | resampling which is useful for loopback testing and for hardware |
197 | sample rate conversion. But software sample rate conversion is | 197 | sample rate conversion. But software sample rate conversion is |
198 | probably faster, so I don't know how useful that is. | 198 | probably faster, so I don't know how useful that is. |
199 | 199 | ||
200 | SOUND_MIXER_OUTSRC DEFAULT: SOUND_MASK_LINE|SOUND_MASK_MIC|SOUND_MASK_CD | 200 | SOUND_MIXER_OUTSRC DEFAULT: SOUND_MASK_LINE|SOUND_MASK_MIC|SOUND_MASK_CD |
201 | 201 | ||
202 | This is a mask of sources that are currently passed through to the | 202 | This is a mask of sources that are currently passed through to the |
203 | outputs. Those sources whose bits are not set are muted. | 203 | outputs. Those sources whose bits are not set are muted. |
204 | 204 | ||
205 | ============================================================================== | 205 | ============================================================================== |
206 | GAIN CONTROL | 206 | GAIN CONTROL |
207 | 207 | ||
208 | There are five gain controls listed above. Each has 16, 32, or 64 | 208 | There are five gain controls listed above. Each has 16, 32, or 64 |
209 | steps. Each control has 1.5 dB of gain per step. Each control is | 209 | steps. Each control has 1.5 dB of gain per step. Each control is |
210 | stereo. | 210 | stereo. |
211 | 211 | ||
212 | The OSS defines the argument to a channel gain ioctl as having two | 212 | The OSS defines the argument to a channel gain ioctl as having two |
213 | components, left and right, each of which ranges from 0 to 100. The | 213 | components, left and right, each of which ranges from 0 to 100. The |
214 | two components are packed into the same word, with the left side gain | 214 | two components are packed into the same word, with the left side gain |
215 | in the least significant byte, and the right side gain in the second | 215 | in the least significant byte, and the right side gain in the second |
216 | least significant byte. In C, we would say this. | 216 | least significant byte. In C, we would say this. |
217 | 217 | ||
218 | #include <assert.h> | 218 | #include <assert.h> |
219 | 219 | ||
220 | ... | 220 | ... |
221 | 221 | ||
222 | assert(leftgain >= 0 && leftgain <= 100); | 222 | assert(leftgain >= 0 && leftgain <= 100); |
223 | assert(rightgain >= 0 && rightgain <= 100); | 223 | assert(rightgain >= 0 && rightgain <= 100); |
224 | arg = leftgain | rightgain << 8; | 224 | arg = leftgain | rightgain << 8; |
225 | 225 | ||
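The packing shown above can be wrapped in a pair of small helpers. This is only a sketch: the function names pack_gain and unpack_gain are made up for illustration and do not appear in the driver or in <sys/soundcard.h>.

```c
#include <assert.h>

/* Pack left/right gains (each 0..100) into one OSS mixer word:
 * left in the least significant byte, right in the next byte. */
static int pack_gain(int leftgain, int rightgain)
{
	assert(leftgain >= 0 && leftgain <= 100);
	assert(rightgain >= 0 && rightgain <= 100);
	return leftgain | rightgain << 8;
}

/* Inverse of pack_gain(): split a mixer word into its two gains. */
static void unpack_gain(int word, int *leftgain, int *rightgain)
{
	*leftgain = word & 0xFF;
	*rightgain = (word >> 8) & 0xFF;
}
```

For example, the documented SOUND_MIXER_PCM default of 0x5757 unpacks to 87 on both sides, and the 0x4a4a default of the LINE, MIC, and CD controls unpacks to 74.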
226 | So each OSS gain control has 101 steps. But the hardware has 16, 32, | 226 | So each OSS gain control has 101 steps. But the hardware has 16, 32, |
227 | or 64 steps. The hardware steps are spread across the 101 OSS steps | 227 | or 64 steps. The hardware steps are spread across the 101 OSS steps |
228 | nearly evenly. The conversion formulas are like this, given N equals | 228 | nearly evenly. The conversion formulas are like this, given N equals |
229 | 16, 32, or 64. | 229 | 16, 32, or 64. |
230 | 230 | ||
231 | int round = N/2 - 1; | 231 | int round = N/2 - 1; |
232 | OSS_gain_steps = (hw_gain_steps * 100 + round) / (N - 1); | 232 | OSS_gain_steps = (hw_gain_steps * 100 + round) / (N - 1); |
233 | hw_gain_steps = (OSS_gain_steps * (N - 1) + round) / 100; | 233 | hw_gain_steps = (OSS_gain_steps * (N - 1) + round) / 100; |
234 | 234 | ||
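Written out as functions, the two formulas above look roughly like this. The function names hw_to_oss and oss_to_hw are hypothetical; the arithmetic is copied from the formulas, with N passed in as a parameter.

```c
/* N is the number of hardware gain steps: 16, 32, or 64. */

/* Convert a hardware step (0 .. N-1) to an OSS level (0 .. 100). */
static int hw_to_oss(int hw_gain_steps, int N)
{
	int round = N / 2 - 1;

	return (hw_gain_steps * 100 + round) / (N - 1);
}

/* Convert an OSS level (0 .. 100) to a hardware step (0 .. N-1). */
static int oss_to_hw(int OSS_gain_steps, int N)
{
	int round = N / 2 - 1;

	return (OSS_gain_steps * (N - 1) + round) / 100;
}
```

As a check against the defaults listed earlier: the PCM control has N = 64 and reaches 0 dB at hardware step 55 (-82.5 + 55 * 1.5 = 0), which converts to OSS level 87, or 0x57. The LINE control has N = 32 and reaches 0 dB at step 23, which converts to 74, or 0x4a.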
235 | Here is a snippet of C code that will return the left and right gain | 235 | Here is a snippet of C code that will return the left and right gain |
236 | of any channel in dB. Pass it one of the predefined gain_desc_t | 236 | of any channel in dB. Pass it one of the predefined gain_desc_t |
237 | structures to access any of the five channels' gains. | 237 | structures to access any of the five channels' gains. |
238 | 238 | ||
239 | typedef struct gain_desc { | 239 | typedef struct gain_desc { |
240 | float min_gain; | 240 | float min_gain; |
241 | float gain_step; | 241 | float gain_step; |
242 | int nbits; | 242 | int nbits; |
243 | int chan; | 243 | int chan; |
244 | } gain_desc_t; | 244 | } gain_desc_t; |
245 | 245 | ||
246 | const gain_desc_t gain_pcm = { -82.5, 1.5, 6, SOUND_MIXER_PCM }; | 246 | const gain_desc_t gain_pcm = { -82.5, 1.5, 6, SOUND_MIXER_PCM }; |
247 | const gain_desc_t gain_line = { -34.5, 1.5, 5, SOUND_MIXER_LINE }; | 247 | const gain_desc_t gain_line = { -34.5, 1.5, 5, SOUND_MIXER_LINE }; |
248 | const gain_desc_t gain_mic = { -34.5, 1.5, 5, SOUND_MIXER_MIC }; | 248 | const gain_desc_t gain_mic = { -34.5, 1.5, 5, SOUND_MIXER_MIC }; |
249 | const gain_desc_t gain_cd = { -34.5, 1.5, 5, SOUND_MIXER_CD }; | 249 | const gain_desc_t gain_cd = { -34.5, 1.5, 5, SOUND_MIXER_CD }; |
250 | const gain_desc_t gain_reclev = { 0.0, 1.5, 4, SOUND_MIXER_RECLEV }; | 250 | const gain_desc_t gain_reclev = { 0.0, 1.5, 4, SOUND_MIXER_RECLEV }; |
251 | 251 | ||
252 | int get_gain_dB(int fd, const gain_desc_t *gp, | 252 | int get_gain_dB(int fd, const gain_desc_t *gp, |
253 | float *left, float *right) | 253 | float *left, float *right) |
254 | { | 254 | { |
255 | int word; | 255 | int word; |
256 | int lg, rg; | 256 | int lg, rg; |
257 | int mask = (1 << gp->nbits) - 1; | 257 | int mask = (1 << gp->nbits) - 1; |
258 | 258 | ||
259 | if (ioctl(fd, MIXER_READ(gp->chan), &word) != 0) | 259 | if (ioctl(fd, MIXER_READ(gp->chan), &word) != 0) |
260 | return -1; /* fail */ | 260 | return -1; /* fail */ |
261 | lg = word & 0xFF; | 261 | lg = word & 0xFF; |
262 | rg = word >> 8 & 0xFF; | 262 | rg = word >> 8 & 0xFF; |
263 | lg = (lg * mask + mask / 2) / 100; | 263 | lg = (lg * mask + mask / 2) / 100; |
264 | rg = (rg * mask + mask / 2) / 100; | 264 | rg = (rg * mask + mask / 2) / 100; |
265 | *left = gp->min_gain + gp->gain_step * lg; | 265 | *left = gp->min_gain + gp->gain_step * lg; |
266 | *right = gp->min_gain + gp->gain_step * rg; | 266 | *right = gp->min_gain + gp->gain_step * rg; |
267 | return 0; | 267 | return 0; |
268 | } | 268 | } |
269 | 269 | ||
270 | And here is the corresponding routine to set a channel's gain in dB. | 270 | And here is the corresponding routine to set a channel's gain in dB. |
271 | 271 | ||
272 | int set_gain_dB(int fd, const gain_desc_t *gp, float left, float right) | 272 | int set_gain_dB(int fd, const gain_desc_t *gp, float left, float right) |
273 | { | 273 | { |
274 | float max_gain = | 274 | float max_gain = |
275 | gp->min_gain + (1 << gp->nbits) * gp->gain_step; | 275 | gp->min_gain + (1 << gp->nbits) * gp->gain_step; |
276 | float round = gp->gain_step / 2; | 276 | float round = gp->gain_step / 2; |
277 | int mask = (1 << gp->nbits) - 1; | 277 | int mask = (1 << gp->nbits) - 1; |
278 | int word; | 278 | int word; |
279 | int lg, rg; | 279 | int lg, rg; |
280 | 280 | ||
281 | if (left < gp->min_gain || right < gp->min_gain) | 281 | if (left < gp->min_gain || right < gp->min_gain) |
282 | return EINVAL; | 282 | return EINVAL; |
283 | lg = (left - gp->min_gain + round) / gp->gain_step; | 283 | lg = (left - gp->min_gain + round) / gp->gain_step; |
284 | rg = (right - gp->min_gain + round) / gp->gain_step; | 284 | rg = (right - gp->min_gain + round) / gp->gain_step; |
285 | if (lg >= (1 << gp->nbits) || rg >= (1 << gp->nbits)) | 285 | if (lg >= (1 << gp->nbits) || rg >= (1 << gp->nbits)) |
286 | return EINVAL; | 286 | return EINVAL; |
287 | lg = (100 * lg + mask / 2) / mask; | 287 | lg = (100 * lg + mask / 2) / mask; |
288 | rg = (100 * rg + mask / 2) / mask; | 288 | rg = (100 * rg + mask / 2) / mask; |
289 | word = lg | rg << 8; | 289 | word = lg | rg << 8; |
290 | 290 | ||
291 | return ioctl(fd, MIXER_WRITE(gp->chan), &word); | 291 | return ioctl(fd, MIXER_WRITE(gp->chan), &word); |
292 | } | 292 | } |
293 | 293 | ||
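The dB arithmetic used by the two routines above can also be exercised without a mixer device. The following is a sketch only: struct gain_range, oss_to_dB and dB_to_oss are hypothetical names that mirror the calculations in get_gain_dB and set_gain_dB with the ioctl calls left out.

```c
#include <assert.h>

/* Hypothetical, ioctl-free subset of gain_desc_t. */
struct gain_range {
    float min_gain;   /* dB at hardware step 0 */
    float gain_step;  /* dB per hardware step */
    int nbits;        /* hardware control width in bits */
};

/* OSS step (0..100) -> dB, as computed in get_gain_dB. */
static float oss_to_dB(const struct gain_range *gp, int oss)
{
    int mask = (1 << gp->nbits) - 1;
    int hw;

    assert(oss >= 0 && oss <= 100);
    hw = (oss * mask + mask / 2) / 100;
    return gp->min_gain + gp->gain_step * hw;
}

/* dB -> OSS step, as computed in set_gain_dB; returns -1 if out of range. */
static int dB_to_oss(const struct gain_range *gp, float dB)
{
    float round = gp->gain_step / 2;
    int mask = (1 << gp->nbits) - 1;
    int hw;

    if (dB < gp->min_gain)
        return -1;
    hw = (int)((dB - gp->min_gain + round) / gp->gain_step);
    if (hw >= (1 << gp->nbits))
        return -1;
    return (100 * hw + mask / 2) / mask;
}
```

Using the PCM values from the table above (-82.5 dB minimum, 1.5 dB steps, 6 bits), OSS step 0 corresponds to -82.5 dB and OSS step 100 to +12.0 dB.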
294 | 294 |
Documentation/spi/pxa2xx
1 | PXA2xx SPI on SSP driver HOWTO | 1 | PXA2xx SPI on SSP driver HOWTO |
2 | =================================================== | 2 | =================================================== |
3 | This is a mini howto on the pxa2xx_spi driver. The driver turns a PXA2xx | 3 | This is a mini howto on the pxa2xx_spi driver. The driver turns a PXA2xx |
4 | synchronous serial port into a SPI master controller | 4 | synchronous serial port into a SPI master controller |
5 | (see Documentation/spi/spi_summary). The driver has the following features | 5 | (see Documentation/spi/spi_summary). The driver has the following features |
6 | 6 | ||
7 | - Support for any PXA2xx SSP | 7 | - Support for any PXA2xx SSP |
8 | - SSP PIO and SSP DMA data transfers. | 8 | - SSP PIO and SSP DMA data transfers. |
9 | - External and Internal (SSPFRM) chip selects. | 9 | - External and Internal (SSPFRM) chip selects. |
10 | - Per slave device (chip) configuration. | 10 | - Per slave device (chip) configuration. |
11 | - Full suspend, freeze, resume support. | 11 | - Full suspend, freeze, resume support. |
12 | 12 | ||
13 | The driver is built around a "spi_message" fifo serviced by a workqueue and a | 13 | The driver is built around a "spi_message" fifo serviced by a workqueue and a |
14 | tasklet. The workqueue, "pump_messages", drives the message fifo and the tasklet | 14 | tasklet. The workqueue, "pump_messages", drives the message fifo and the tasklet |
15 | (pump_transfer) is responsible for queuing SPI transactions and setting up and | 15 | (pump_transfer) is responsible for queuing SPI transactions and setting up and |
16 | launching the dma/interrupt driven transfers. | 16 | launching the dma/interrupt driven transfers. |
17 | 17 | ||
18 | Declaring PXA2xx Master Controllers | 18 | Declaring PXA2xx Master Controllers |
19 | ----------------------------------- | 19 | ----------------------------------- |
20 | Typically a SPI master is defined in the arch/.../mach-*/board-*.c as a | 20 | Typically a SPI master is defined in the arch/.../mach-*/board-*.c as a |
21 | "platform device". The master configuration is passed to the driver via a table | 21 | "platform device". The master configuration is passed to the driver via a table |
22 | found in include/asm-arm/arch-pxa/pxa2xx_spi.h: | 22 | found in include/asm-arm/arch-pxa/pxa2xx_spi.h: |
23 | 23 | ||
24 | struct pxa2xx_spi_master { | 24 | struct pxa2xx_spi_master { |
25 | enum pxa_ssp_type ssp_type; | 25 | enum pxa_ssp_type ssp_type; |
26 | u32 clock_enable; | 26 | u32 clock_enable; |
27 | u16 num_chipselect; | 27 | u16 num_chipselect; |
28 | u8 enable_dma; | 28 | u8 enable_dma; |
29 | }; | 29 | }; |
30 | 30 | ||
31 | The "pxa2xx_spi_master.ssp_type" field must have a value between 1 and 3 and | 31 | The "pxa2xx_spi_master.ssp_type" field must have a value between 1 and 3 and |
32 | informs the driver which features a particular SSP supports. | 32 | informs the driver which features a particular SSP supports. |
33 | 33 | ||
34 | The "pxa2xx_spi_master.clock_enable" field is used to enable/disable the | 34 | The "pxa2xx_spi_master.clock_enable" field is used to enable/disable the |
35 | corresponding SSP peripheral block in the "Clock Enable Register" (CKEN). See | 35 | corresponding SSP peripheral block in the "Clock Enable Register" (CKEN). See |
36 | the "PXA2xx Developer Manual" section "Clocks and Power Management". | 36 | the "PXA2xx Developer Manual" section "Clocks and Power Management". |
37 | 37 | ||
38 | The "pxa2xx_spi_master.num_chipselect" field is used to determine the number of | 38 | The "pxa2xx_spi_master.num_chipselect" field is used to determine the number of |
39 | slave devices (chips) attached to this SPI master. | 39 | slave devices (chips) attached to this SPI master. |
40 | 40 | ||
41 | The "pxa2xx_spi_master.enable_dma" field informs the driver that SSP DMA should | 41 | The "pxa2xx_spi_master.enable_dma" field informs the driver that SSP DMA should |
42 | be used. This causes the driver to acquire two DMA channels: rx_channel and | 42 | be used. This causes the driver to acquire two DMA channels: rx_channel and |
43 | tx_channel. The rx_channel has a higher DMA service priority than the tx_channel. | 43 | tx_channel. The rx_channel has a higher DMA service priority than the tx_channel. |
44 | See the "PXA2xx Developer Manual" section "DMA Controller". | 44 | See the "PXA2xx Developer Manual" section "DMA Controller". |
45 | 45 | ||
46 | NSSP MASTER SAMPLE | 46 | NSSP MASTER SAMPLE |
47 | ------------------ | 47 | ------------------ |
48 | Below is a sample configuration using the PXA255 NSSP. | 48 | Below is a sample configuration using the PXA255 NSSP. |
49 | 49 | ||
50 | static struct resource pxa_spi_nssp_resources[] = { | 50 | static struct resource pxa_spi_nssp_resources[] = { |
51 | [0] = { | 51 | [0] = { |
52 | .start = __PREG(SSCR0_P(2)), /* Start address of NSSP */ | 52 | .start = __PREG(SSCR0_P(2)), /* Start address of NSSP */ |
53 | .end = __PREG(SSCR0_P(2)) + 0x2c, /* Range of registers */ | 53 | .end = __PREG(SSCR0_P(2)) + 0x2c, /* Range of registers */ |
54 | .flags = IORESOURCE_MEM, | 54 | .flags = IORESOURCE_MEM, |
55 | }, | 55 | }, |
56 | [1] = { | 56 | [1] = { |
57 | .start = IRQ_NSSP, /* NSSP IRQ */ | 57 | .start = IRQ_NSSP, /* NSSP IRQ */ |
58 | .end = IRQ_NSSP, | 58 | .end = IRQ_NSSP, |
59 | .flags = IORESOURCE_IRQ, | 59 | .flags = IORESOURCE_IRQ, |
60 | }, | 60 | }, |
61 | }; | 61 | }; |
62 | 62 | ||
63 | static struct pxa2xx_spi_master pxa_nssp_master_info = { | 63 | static struct pxa2xx_spi_master pxa_nssp_master_info = { |
64 | .ssp_type = PXA25x_NSSP, /* Type of SSP */ | 64 | .ssp_type = PXA25x_NSSP, /* Type of SSP */ |
65 | .clock_enable = CKEN9_NSSP, /* NSSP Peripheral clock */ | 65 | .clock_enable = CKEN9_NSSP, /* NSSP Peripheral clock */ |
66 | .num_chipselect = 1, /* Matches the number of chips attached to NSSP */ | 66 | .num_chipselect = 1, /* Matches the number of chips attached to NSSP */ |
67 | .enable_dma = 1, /* Enables NSSP DMA */ | 67 | .enable_dma = 1, /* Enables NSSP DMA */ |
68 | }; | 68 | }; |
69 | 69 | ||
70 | static struct platform_device pxa_spi_nssp = { | 70 | static struct platform_device pxa_spi_nssp = { |
71 | .name = "pxa2xx-spi", /* MUST BE THIS VALUE, so device match driver */ | 71 | .name = "pxa2xx-spi", /* MUST BE THIS VALUE, so device match driver */ |
72 | .id = 2, /* Bus number, MUST MATCH SSP number 1..n */ | 72 | .id = 2, /* Bus number, MUST MATCH SSP number 1..n */ |
73 | .resource = pxa_spi_nssp_resources, | 73 | .resource = pxa_spi_nssp_resources, |
74 | .num_resources = ARRAY_SIZE(pxa_spi_nssp_resources), | 74 | .num_resources = ARRAY_SIZE(pxa_spi_nssp_resources), |
75 | .dev = { | 75 | .dev = { |
76 | .platform_data = &pxa_nssp_master_info, /* Passed to driver */ | 76 | .platform_data = &pxa_nssp_master_info, /* Passed to driver */ |
77 | }, | 77 | }, |
78 | }; | 78 | }; |
79 | 79 | ||
80 | static struct platform_device *devices[] __initdata = { | 80 | static struct platform_device *devices[] __initdata = { |
81 | &pxa_spi_nssp, | 81 | &pxa_spi_nssp, |
82 | }; | 82 | }; |
83 | 83 | ||
84 | static void __init board_init(void) | 84 | static void __init board_init(void) |
85 | { | 85 | { |
86 | (void)platform_add_devices(devices, ARRAY_SIZE(devices)); | 86 | (void)platform_add_devices(devices, ARRAY_SIZE(devices)); |
87 | } | 87 | } |
88 | 88 | ||
89 | Declaring Slave Devices | 89 | Declaring Slave Devices |
90 | ----------------------- | 90 | ----------------------- |
91 | Typically each SPI slave (chip) is defined in the arch/.../mach-*/board-*.c | 91 | Typically each SPI slave (chip) is defined in the arch/.../mach-*/board-*.c |
92 | using the "spi_board_info" structure found in "linux/spi/spi.h". See | 92 | using the "spi_board_info" structure found in "linux/spi/spi.h". See |
93 | "Documentation/spi/spi_summary" for additional information. | 93 | "Documentation/spi/spi_summary" for additional information. |
94 | 94 | ||
95 | Each slave device attached to the PXA must provide slave specific configuration | 95 | Each slave device attached to the PXA must provide slave specific configuration |
96 | information via the structure "pxa2xx_spi_chip" found in | 96 | information via the structure "pxa2xx_spi_chip" found in |
97 | "include/asm-arm/arch-pxa/pxa2xx_spi.h". The pxa2xx_spi master controller driver | 97 | "include/asm-arm/arch-pxa/pxa2xx_spi.h". The pxa2xx_spi master controller driver |
98 | uses the configuration whenever the driver communicates with the slave | 98 | uses the configuration whenever the driver communicates with the slave |
99 | device. | 99 | device. |
100 | 100 | ||
101 | struct pxa2xx_spi_chip { | 101 | struct pxa2xx_spi_chip { |
102 | u8 tx_threshold; | 102 | u8 tx_threshold; |
103 | u8 rx_threshold; | 103 | u8 rx_threshold; |
104 | u8 dma_burst_size; | 104 | u8 dma_burst_size; |
105 | u32 timeout_microsecs; | 105 | u32 timeout_microsecs; |
106 | u8 enable_loopback; | 106 | u8 enable_loopback; |
107 | void (*cs_control)(u32 command); | 107 | void (*cs_control)(u32 command); |
108 | }; | 108 | }; |
109 | 109 | ||
110 | The "pxa2xx_spi_chip.tx_threshold" and "pxa2xx_spi_chip.rx_threshold" fields are | 110 | The "pxa2xx_spi_chip.tx_threshold" and "pxa2xx_spi_chip.rx_threshold" fields are |
111 | used to configure the SSP hardware fifo. These fields are critical to the | 111 | used to configure the SSP hardware fifo. These fields are critical to the |
112 | performance of pxa2xx_spi driver and misconfiguration will result in rx | 112 | performance of pxa2xx_spi driver and misconfiguration will result in rx |
113 | fifo overruns (especially in PIO mode transfers). Good default values are | 113 | fifo overruns (especially in PIO mode transfers). Good default values are |
114 | 114 | ||
115 | .tx_threshold = 12, | 115 | .tx_threshold = 12, |
116 | .rx_threshold = 4, | 116 | .rx_threshold = 4, |
117 | 117 | ||
118 | The "pxa2xx_spi_chip.dma_burst_size" field is used to configure PXA2xx DMA | 118 | The "pxa2xx_spi_chip.dma_burst_size" field is used to configure PXA2xx DMA |
119 | engine and is related to the "spi_device.bits_per_word" field. Read and understand | 119 | engine and is related to the "spi_device.bits_per_word" field. Read and understand |
120 | the PXA2xx "Developer Manual" sections on the DMA controller and SSP Controllers | 120 | the PXA2xx "Developer Manual" sections on the DMA controller and SSP Controllers |
121 | to determine the correct value. An SSP configured for byte-wide transfers would | 121 | to determine the correct value. An SSP configured for byte-wide transfers would |
122 | use a value of 8. | 122 | use a value of 8. |
123 | 123 | ||
124 | The "pxa2xx_spi_chip.timeout_microsecs" field is used to efficiently handle | 124 | The "pxa2xx_spi_chip.timeout_microsecs" field is used to efficiently handle |
125 | trailing bytes in the SSP receiver fifo. The correct value for this field is | 125 | trailing bytes in the SSP receiver fifo. The correct value for this field is |
126 | dependent on the SPI bus speed ("spi_board_info.max_speed_hz") and the specific | 126 | dependent on the SPI bus speed ("spi_board_info.max_speed_hz") and the specific |
127 | slave device. Please note the the PXA2xx SSP 1 does not support trailing byte | 127 | slave device. Please note that the PXA2xx SSP 1 does not support trailing byte |
128 | timeouts and must busy-wait any trailing bytes. | 128 | timeouts and must busy-wait any trailing bytes. |
129 | 129 | ||
130 | The "pxa2xx_spi_chip.enable_loopback" field is used to place the SSP port | 130 | The "pxa2xx_spi_chip.enable_loopback" field is used to place the SSP port |
131 | into internal loopback mode. In this mode the SSP controller internally | 131 | into internal loopback mode. In this mode the SSP controller internally |
132 | connects the SSPTX pin the the SSPRX pin. This is useful for initial setup | 132 | connects the SSPTX pin to the SSPRX pin. This is useful for initial setup |
133 | testing. | 133 | testing. |
134 | 134 | ||
135 | The "pxa2xx_spi_chip.cs_control" field is used to point to a board specific | 135 | The "pxa2xx_spi_chip.cs_control" field is used to point to a board specific |
136 | function for asserting/deasserting a slave device chip select. If the field is | 136 | function for asserting/deasserting a slave device chip select. If the field is |
137 | NULL, the pxa2xx_spi master controller driver assumes that the SSP port is | 137 | NULL, the pxa2xx_spi master controller driver assumes that the SSP port is |
138 | configured to use SSPFRM instead. | 138 | configured to use SSPFRM instead. |
139 | 139 | ||
140 | NSSP SLAVE SAMPLE | 140 | NSSP SLAVE SAMPLE |
141 | ----------------- | 141 | ----------------- |
142 | The pxa2xx_spi_chip structure is passed to the pxa2xx_spi driver in the | 142 | The pxa2xx_spi_chip structure is passed to the pxa2xx_spi driver in the |
143 | "spi_board_info.controller_data" field. Below is a sample configuration using | 143 | "spi_board_info.controller_data" field. Below is a sample configuration using |
144 | the PXA255 NSSP. | 144 | the PXA255 NSSP. |
145 | 145 | ||
146 | /* Chip Select control for the CS8415A SPI slave device */ | 146 | /* Chip Select control for the CS8415A SPI slave device */ |
147 | static void cs8415a_cs_control(u32 command) | 147 | static void cs8415a_cs_control(u32 command) |
148 | { | 148 | { |
149 | if (command & PXA2XX_CS_ASSERT) | 149 | if (command & PXA2XX_CS_ASSERT) |
150 | GPCR(2) = GPIO_bit(2); | 150 | GPCR(2) = GPIO_bit(2); |
151 | else | 151 | else |
152 | GPSR(2) = GPIO_bit(2); | 152 | GPSR(2) = GPIO_bit(2); |
153 | } | 153 | } |
154 | 154 | ||
155 | /* Chip Select control for the CS8405A SPI slave device */ | 155 | /* Chip Select control for the CS8405A SPI slave device */ |
156 | static void cs8405a_cs_control(u32 command) | 156 | static void cs8405a_cs_control(u32 command) |
157 | { | 157 | { |
158 | if (command & PXA2XX_CS_ASSERT) | 158 | if (command & PXA2XX_CS_ASSERT) |
159 | GPCR(3) = GPIO_bit(3); | 159 | GPCR(3) = GPIO_bit(3); |
160 | else | 160 | else |
161 | GPSR(3) = GPIO_bit(3); | 161 | GPSR(3) = GPIO_bit(3); |
162 | } | 162 | } |
163 | 163 | ||
164 | static struct pxa2xx_spi_chip cs8415a_chip_info = { | 164 | static struct pxa2xx_spi_chip cs8415a_chip_info = { |
165 | .tx_threshold = 12, /* SSP hardware FIFO threshold */ | 165 | .tx_threshold = 12, /* SSP hardware FIFO threshold */ |
166 | .rx_threshold = 4, /* SSP hardware FIFO threshold */ | 166 | .rx_threshold = 4, /* SSP hardware FIFO threshold */ |
167 | .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ | 167 | .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ |
168 | .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ | 168 | .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ |
169 | .cs_control = cs8415a_cs_control, /* Use external chip select */ | 169 | .cs_control = cs8415a_cs_control, /* Use external chip select */ |
170 | }; | 170 | }; |
171 | 171 | ||
172 | static struct pxa2xx_spi_chip cs8405a_chip_info = { | 172 | static struct pxa2xx_spi_chip cs8405a_chip_info = { |
173 | .tx_threshold = 12, /* SSP hardware FIFO threshold */ | 173 | .tx_threshold = 12, /* SSP hardware FIFO threshold */ |
174 | .rx_threshold = 4, /* SSP hardware FIFO threshold */ | 174 | .rx_threshold = 4, /* SSP hardware FIFO threshold */ |
175 | .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ | 175 | .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ |
176 | .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ | 176 | .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ |
177 | .cs_control = cs8405a_cs_control, /* Use external chip select */ | 177 | .cs_control = cs8405a_cs_control, /* Use external chip select */ |
178 | }; | 178 | }; |
179 | 179 | ||
180 | static struct spi_board_info streetracer_spi_board_info[] __initdata = { | 180 | static struct spi_board_info streetracer_spi_board_info[] __initdata = { |
181 | { | 181 | { |
182 | .modalias = "cs8415a", /* Name of spi_driver for this device */ | 182 | .modalias = "cs8415a", /* Name of spi_driver for this device */ |
183 | .max_speed_hz = 3686400, /* Run SSP as fast as possible */ | 183 | .max_speed_hz = 3686400, /* Run SSP as fast as possible */ |
184 | .bus_num = 2, /* Framework bus number */ | 184 | .bus_num = 2, /* Framework bus number */ |
185 | .chip_select = 0, /* Framework chip select */ | 185 | .chip_select = 0, /* Framework chip select */ |
186 | .platform_data = NULL, /* No spi_driver specific config */ | 186 | .platform_data = NULL, /* No spi_driver specific config */ |
187 | .controller_data = &cs8415a_chip_info, /* Master chip config */ | 187 | .controller_data = &cs8415a_chip_info, /* Master chip config */ |
188 | .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ | 188 | .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ |
189 | }, | 189 | }, |
190 | { | 190 | { |
191 | .modalias = "cs8405a", /* Name of spi_driver for this device */ | 191 | .modalias = "cs8405a", /* Name of spi_driver for this device */ |
192 | .max_speed_hz = 3686400, /* Run SSP as fast as possible */ | 192 | .max_speed_hz = 3686400, /* Run SSP as fast as possible */ |
193 | .bus_num = 2, /* Framework bus number */ | 193 | .bus_num = 2, /* Framework bus number */ |
194 | .chip_select = 1, /* Framework chip select */ | 194 | .chip_select = 1, /* Framework chip select */ |
195 | .controller_data = &cs8405a_chip_info, /* Master chip config */ | 195 | .controller_data = &cs8405a_chip_info, /* Master chip config */ |
196 | .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ | 196 | .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ |
197 | }, | 197 | }, |
198 | }; | 198 | }; |
199 | 199 | ||
200 | static void __init streetracer_init(void) | 200 | static void __init streetracer_init(void) |
201 | { | 201 | { |
202 | spi_register_board_info(streetracer_spi_board_info, | 202 | spi_register_board_info(streetracer_spi_board_info, |
203 | ARRAY_SIZE(streetracer_spi_board_info)); | 203 | ARRAY_SIZE(streetracer_spi_board_info)); |
204 | } | 204 | } |
205 | 205 | ||
206 | 206 | ||
207 | DMA and PIO I/O Support | 207 | DMA and PIO I/O Support |
208 | ----------------------- | 208 | ----------------------- |
209 | The pxa2xx_spi driver supports both DMA and interrupt driven PIO message | 209 | The pxa2xx_spi driver supports both DMA and interrupt driven PIO message |
210 | transfers. The driver defaults to PIO mode and DMA transfers must be enabled by | 210 | transfers. The driver defaults to PIO mode and DMA transfers must be enabled by |
211 | setting the "enable_dma" flag in the "pxa2xx_spi_master" structure and and | 211 | setting the "enable_dma" flag in the "pxa2xx_spi_master" structure and |
212 | ensuring that the "pxa2xx_spi_chip.dma_burst_size" field is non-zero. The DMA | 212 | ensuring that the "pxa2xx_spi_chip.dma_burst_size" field is non-zero. The DMA |
213 | mode supports both coherent and stream based DMA mappings. | 213 | mode supports both coherent and stream based DMA mappings. |
214 | 214 | ||
215 | The following logic is used to determine the type of I/O to be used on | 215 | The following logic is used to determine the type of I/O to be used on |
216 | a per "spi_transfer" basis: | 216 | a per "spi_transfer" basis: |
217 | 217 | ||
218 | if !enable_dma or dma_burst_size == 0 then | 218 | if !enable_dma or dma_burst_size == 0 then |
219 | always use PIO transfers | 219 | always use PIO transfers |
220 | 220 | ||
221 | if spi_message.is_dma_mapped and rx_dma_buf != 0 and tx_dma_buf != 0 then | 221 | if spi_message.is_dma_mapped and rx_dma_buf != 0 and tx_dma_buf != 0 then |
222 | use coherent DMA mode | 222 | use coherent DMA mode |
223 | 223 | ||
224 | if rx_buf and tx_buf are aligned on 8 byte boundary then | 224 | if rx_buf and tx_buf are aligned on 8 byte boundary then |
225 | use streaming DMA mode | 225 | use streaming DMA mode |
226 | 226 | ||
227 | otherwise | 227 | otherwise |
228 | use PIO transfer | 228 | use PIO transfer |
229 | 229 | ||
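The selection logic above can be sketched as a single C function. This is an illustration under stated assumptions, not the driver's code: struct xfer_desc, enum io_mode and choose_io_mode are hypothetical names that gather the fields named in the pseudocode into one place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum io_mode { IO_PIO, IO_DMA_COHERENT, IO_DMA_STREAMING };

/* Hypothetical flattened view of the fields the pseudocode consults. */
struct xfer_desc {
    int enable_dma;            /* pxa2xx_spi_master.enable_dma */
    int dma_burst_size;        /* pxa2xx_spi_chip.dma_burst_size */
    int is_dma_mapped;         /* spi_message.is_dma_mapped */
    uintptr_t rx_dma_buf;      /* pre-mapped DMA addresses, 0 if unset */
    uintptr_t tx_dma_buf;
    const void *rx_buf;        /* CPU buffers of the transfer */
    const void *tx_buf;
};

static int aligned8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}

/* Apply the checks in the same order as the pseudocode. */
static enum io_mode choose_io_mode(const struct xfer_desc *x)
{
    assert(x != NULL);
    if (!x->enable_dma || x->dma_burst_size == 0)
        return IO_PIO;
    if (x->is_dma_mapped && x->rx_dma_buf != 0 && x->tx_dma_buf != 0)
        return IO_DMA_COHERENT;
    if (aligned8(x->rx_buf) && aligned8(x->tx_buf))
        return IO_DMA_STREAMING;
    return IO_PIO;
}
```

The order of the tests matters: a transfer with misaligned buffers still uses DMA when the message is already DMA mapped, and falls back to PIO only when neither DMA condition holds.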
230 | THANKS TO | 230 | THANKS TO |
231 | --------- | 231 | --------- |
232 | 232 | ||
233 | David Brownell and others for mentoring the development of this driver. | 233 | David Brownell and others for mentoring the development of this driver. |
234 | 234 | ||
235 | 235 |
Documentation/spi/spi-summary
1 | Overview of Linux kernel SPI support | 1 | Overview of Linux kernel SPI support |
2 | ==================================== | 2 | ==================================== |
3 | 3 | ||
4 | 02-Dec-2005 | 4 | 02-Dec-2005 |
5 | 5 | ||
6 | What is SPI? | 6 | What is SPI? |
7 | ------------ | 7 | ------------ |
8 | The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial | 8 | The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial |
9 | link used to connect microcontrollers to sensors, memory, and peripherals. | 9 | link used to connect microcontrollers to sensors, memory, and peripherals. |
10 | 10 | ||
11 | The three signal wires hold a clock (SCLK, often on the order of 10 MHz), | 11 | The three signal wires hold a clock (SCLK, often on the order of 10 MHz), |
12 | and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In, | 12 | and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In, |
13 | Slave Out" (MISO) signals. (Other names are also used.) There are four | 13 | Slave Out" (MISO) signals. (Other names are also used.) There are four |
14 | clocking modes through which data is exchanged; mode-0 and mode-3 are most | 14 | clocking modes through which data is exchanged; mode-0 and mode-3 are most |
15 | commonly used. Each clock cycle shifts data out and data in; the clock | 15 | commonly used. Each clock cycle shifts data out and data in; the clock |
16 | doesn't cycle except when there is data to shift. | 16 | doesn't cycle except when there is data to shift. |
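As a rough model of "each clock cycle shifts data out and data in", the sketch below simulates one eight bit word exchanged between a master and a slave shift register, most significant bit first. The function names are hypothetical, and clock polarity and phase (the four modes) are deliberately left out of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate eight clock cycles; each cycle moves one bit in each direction. */
static void spi_exchange8(uint8_t *master, uint8_t *slave)
{
    int i;

    for (i = 0; i < 8; i++) {
        int mosi = (*master >> 7) & 1;   /* bit leaving the master */
        int miso = (*slave >> 7) & 1;    /* bit leaving the slave */

        *master = (uint8_t)((*master << 1) | miso);
        *slave = (uint8_t)((*slave << 1) | mosi);
    }
}

/* After eight cycles the two registers have swapped contents. */
static void spi_exchange8_selfcheck(void)
{
    uint8_t m = 0xA5, s = 0x3C;

    spi_exchange8(&m, &s);
    assert(m == 0x3C && s == 0xA5);
}
```

This is why a transfer is always full duplex at the wire level: a device that only wants to read still clocks out some (often ignored) data.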
17 | 17 | ||
18 | SPI masters may use a "chip select" line to activate a given SPI slave | 18 | SPI masters may use a "chip select" line to activate a given SPI slave |
19 | device, so those three signal wires may be connected to several chips | 19 | device, so those three signal wires may be connected to several chips |
20 | in parallel. All SPI slaves support chipselects. Some devices have | 20 | in parallel. All SPI slaves support chipselects. Some devices have |
21 | other signals, often including an interrupt to the master. | 21 | other signals, often including an interrupt to the master. |
22 | 22 | ||
23 | Unlike serial busses like USB or SMBUS, even low level protocols for | 23 | Unlike serial busses like USB or SMBUS, even low level protocols for |
24 | SPI slave functions are usually not interoperable between vendors | 24 | SPI slave functions are usually not interoperable between vendors |
25 | (except for cases like SPI memory chips). | 25 | (except for cases like SPI memory chips). |
26 | 26 | ||
27 | - SPI may be used for request/response style device protocols, as with | 27 | - SPI may be used for request/response style device protocols, as with |
28 | touchscreen sensors and memory chips. | 28 | touchscreen sensors and memory chips. |
29 | 29 | ||
30 | - It may also be used to stream data in either direction (half duplex), | 30 | - It may also be used to stream data in either direction (half duplex), |
31 | or both of them at the same time (full duplex). | 31 | or both of them at the same time (full duplex). |
32 | 32 | ||
33 | - Some devices may use eight bit words. Others may use different word | 33 | - Some devices may use eight bit words. Others may use different word |
34 | lengths, such as streams of 12-bit or 20-bit digital samples. | 34 | lengths, such as streams of 12-bit or 20-bit digital samples. |
35 | 35 | ||
36 | In the same way, SPI slaves will only rarely support any kind of automatic | 36 | In the same way, SPI slaves will only rarely support any kind of automatic |
37 | discovery/enumeration protocol. The tree of slave devices accessible from | 37 | discovery/enumeration protocol. The tree of slave devices accessible from |
38 | a given SPI master will normally be set up manually, with configuration | 38 | a given SPI master will normally be set up manually, with configuration |
39 | tables. | 39 | tables. |
40 | 40 | ||
41 | SPI is only one of the names used by such four-wire protocols, and | 41 | SPI is only one of the names used by such four-wire protocols, and |
42 | most controllers have no problem handling "MicroWire" (think of it as | 42 | most controllers have no problem handling "MicroWire" (think of it as |
43 | half-duplex SPI, for request/response protocols), SSP ("Synchronous | 43 | half-duplex SPI, for request/response protocols), SSP ("Synchronous |
44 | Serial Protocol"), PSP ("Programmable Serial Protocol"), and other | 44 | Serial Protocol"), PSP ("Programmable Serial Protocol"), and other |
45 | related protocols. | 45 | related protocols. |
46 | 46 | ||
47 | Microcontrollers often support both master and slave sides of the SPI | 47 | Microcontrollers often support both master and slave sides of the SPI |
48 | protocol. This document (and Linux) currently only supports the master | 48 | protocol. This document (and Linux) currently only supports the master |
49 | side of SPI interactions. | 49 | side of SPI interactions. |
50 | 50 | ||
51 | 51 | ||
52 | Who uses it? On what kinds of systems? | 52 | Who uses it? On what kinds of systems? |
53 | --------------------------------------- | 53 | --------------------------------------- |
54 | Linux developers using SPI are probably writing device drivers for embedded | 54 | Linux developers using SPI are probably writing device drivers for embedded |
55 | systems boards. SPI is used to control external chips, and it is also a | 55 | systems boards. SPI is used to control external chips, and it is also a |
56 | protocol supported by every MMC or SD memory card. (The older "DataFlash" | 56 | protocol supported by every MMC or SD memory card. (The older "DataFlash" |
57 | cards, predating MMC cards but using the same connectors and card shape, | 57 | cards, predating MMC cards but using the same connectors and card shape, |
58 | support only SPI.) Some PC hardware uses SPI flash for BIOS code. | 58 | support only SPI.) Some PC hardware uses SPI flash for BIOS code. |
59 | 59 | ||
60 | SPI slave chips range from digital/analog converters used for analog | 60 | SPI slave chips range from digital/analog converters used for analog |
61 | sensors and codecs, to memory, to peripherals like USB controllers | 61 | sensors and codecs, to memory, to peripherals like USB controllers |
62 | or Ethernet adapters; and more. | 62 | or Ethernet adapters; and more. |
63 | 63 | ||
64 | Most systems using SPI will integrate a few devices on a mainboard. | 64 | Most systems using SPI will integrate a few devices on a mainboard. |
65 | Some provide SPI links on expansion connectors; in cases where no | 65 | Some provide SPI links on expansion connectors; in cases where no |
66 | dedicated SPI controller exists, GPIO pins can be used to create a | 66 | dedicated SPI controller exists, GPIO pins can be used to create a |
67 | low speed "bitbanging" adapter. Very few systems will "hotplug" an SPI | 67 | low speed "bitbanging" adapter. Very few systems will "hotplug" an SPI |
68 | controller; the reasons to use SPI focus on low cost and simple operation, | 68 | controller; the reasons to use SPI focus on low cost and simple operation, |
69 | and if dynamic reconfiguration is important, USB will often be a more | 69 | and if dynamic reconfiguration is important, USB will often be a more |
70 | appropriate low-pincount peripheral bus. | 70 | appropriate low-pincount peripheral bus. |
71 | 71 | ||
72 | Many microcontrollers that can run Linux integrate one or more I/O | 72 | Many microcontrollers that can run Linux integrate one or more I/O |
73 | interfaces with SPI modes. Given SPI support, they could use MMC or SD | 73 | interfaces with SPI modes. Given SPI support, they could use MMC or SD |
74 | cards without needing a special purpose MMC/SD/SDIO controller. | 74 | cards without needing a special purpose MMC/SD/SDIO controller. |
75 | 75 | ||
76 | 76 | ||
77 | How do these driver programming interfaces work? | 77 | How do these driver programming interfaces work? |
78 | ------------------------------------------------ | 78 | ------------------------------------------------ |
79 | The <linux/spi/spi.h> header file includes kerneldoc, as does the | 79 | The <linux/spi/spi.h> header file includes kerneldoc, as does the |
80 | main source code, and you should certainly read that. This is just | 80 | main source code, and you should certainly read that. This is just |
81 | an overview, so you get the big picture before the details. | 81 | an overview, so you get the big picture before the details. |
82 | 82 | ||
83 | SPI requests always go into I/O queues. Requests for a given SPI device | 83 | SPI requests always go into I/O queues. Requests for a given SPI device |
84 | are always executed in FIFO order, and complete asynchronously through | 84 | are always executed in FIFO order, and complete asynchronously through |
85 | completion callbacks. There are also some simple synchronous wrappers | 85 | completion callbacks. There are also some simple synchronous wrappers |
86 | for those calls, including ones for common transaction types like writing | 86 | for those calls, including ones for common transaction types like writing |
87 | a command and then reading its response. | 87 | a command and then reading its response. |
88 | 88 | ||
89 | There are two types of SPI driver, here called: | 89 | There are two types of SPI driver, here called: |
90 | 90 | ||
91 | Controller drivers ... these are often built in to System-On-Chip | 91 | Controller drivers ... these are often built in to System-On-Chip |
92 | processors, and often support both Master and Slave roles. | 92 | processors, and often support both Master and Slave roles. |
93 | These drivers touch hardware registers and may use DMA. | 93 | These drivers touch hardware registers and may use DMA. |
94 | Or they can be PIO bitbangers, needing just GPIO pins. | 94 | Or they can be PIO bitbangers, needing just GPIO pins. |
95 | 95 | ||
96 | Protocol drivers ... these pass messages through the controller | 96 | Protocol drivers ... these pass messages through the controller |
97 | driver to communicate with a Slave or Master device on the | 97 | driver to communicate with a Slave or Master device on the |
98 | other side of an SPI link. | 98 | other side of an SPI link. |
99 | 99 | ||
100 | So for example one protocol driver might talk to the MTD layer to export | 100 | So for example one protocol driver might talk to the MTD layer to export |
101 | data to filesystems stored on SPI flash like DataFlash; and others might | 101 | data to filesystems stored on SPI flash like DataFlash; and others might |
102 | control audio interfaces, present touchscreen sensors as input interfaces, | 102 | control audio interfaces, present touchscreen sensors as input interfaces, |
103 | or monitor temperature and voltage levels during industrial processing. | 103 | or monitor temperature and voltage levels during industrial processing. |
104 | And those might all be sharing the same controller driver. | 104 | And those might all be sharing the same controller driver. |
105 | 105 | ||
106 | A "struct spi_device" encapsulates the master-side interface between | 106 | A "struct spi_device" encapsulates the master-side interface between |
107 | those two types of driver. At this writing, Linux has no slave side | 107 | those two types of driver. At this writing, Linux has no slave side |
108 | programming interface. | 108 | programming interface. |
109 | 109 | ||
110 | There is a minimal core of SPI programming interfaces, focussing on | 110 | There is a minimal core of SPI programming interfaces, focussing on |
111 | using the driver model to connect controller and protocol drivers using | 111 | using the driver model to connect controller and protocol drivers using |
112 | device tables provided by board specific initialization code. SPI | 112 | device tables provided by board specific initialization code. SPI |
113 | shows up in sysfs in several locations: | 113 | shows up in sysfs in several locations: |
114 | 114 | ||
115 | /sys/devices/.../CTLR/spiB.C ... spi_device on bus "B", | 115 | /sys/devices/.../CTLR/spiB.C ... spi_device on bus "B", |
116 | chipselect C, accessed through CTLR. | 116 | chipselect C, accessed through CTLR. |
117 | 117 | ||
118 | /sys/devices/.../CTLR/spiB.C/modalias ... identifies the driver | 118 | /sys/devices/.../CTLR/spiB.C/modalias ... identifies the driver |
119 | that should be used with this device (for hotplug/coldplug) | 119 | that should be used with this device (for hotplug/coldplug) |
120 | 120 | ||
121 | /sys/bus/spi/devices/spiB.C ... symlink to the physical | 121 | /sys/bus/spi/devices/spiB.C ... symlink to the physical |
122 | spiB.C device | 122 | spiB.C device |
123 | 123 | ||
124 | /sys/bus/spi/drivers/D ... driver for one or more spi*.* devices | 124 | /sys/bus/spi/drivers/D ... driver for one or more spi*.* devices |
125 | 125 | ||
126 | /sys/class/spi_master/spiB ... class device for the controller | 126 | /sys/class/spi_master/spiB ... class device for the controller |
127 | managing bus "B". All the spiB.* devices share the same | 127 | managing bus "B". All the spiB.* devices share the same |
128 | physical SPI bus segment, with SCLK, MOSI, and MISO. | 128 | physical SPI bus segment, with SCLK, MOSI, and MISO. |
129 | 129 | ||
130 | 130 | ||
131 | How does board-specific init code declare SPI devices? | 131 | How does board-specific init code declare SPI devices? |
132 | ------------------------------------------------------ | 132 | ------------------------------------------------------ |
133 | Linux needs several kinds of information to properly configure SPI devices. | 133 | Linux needs several kinds of information to properly configure SPI devices. |
134 | That information is normally provided by board-specific code, even for | 134 | That information is normally provided by board-specific code, even for |
135 | chips that do support some form of automated discovery/enumeration. | 135 | chips that do support some form of automated discovery/enumeration. |
136 | 136 | ||
137 | DECLARE CONTROLLERS | 137 | DECLARE CONTROLLERS |
138 | 138 | ||
139 | The first kind of information is a list of what SPI controllers exist. | 139 | The first kind of information is a list of what SPI controllers exist. |
140 | For System-on-Chip (SOC) based boards, these will usually be platform | 140 | For System-on-Chip (SOC) based boards, these will usually be platform |
141 | devices, and the controller may need some platform_data in order to | 141 | devices, and the controller may need some platform_data in order to |
142 | operate properly. The "struct platform_device" will include resources | 142 | operate properly. The "struct platform_device" will include resources |
143 | like the physical address of the controller's first register and its IRQ. | 143 | like the physical address of the controller's first register and its IRQ. |
144 | 144 | ||
145 | Platforms will often abstract the "register SPI controller" operation, | 145 | Platforms will often abstract the "register SPI controller" operation, |
146 | maybe coupling it with code to initialize pin configurations, so that | 146 | maybe coupling it with code to initialize pin configurations, so that |
147 | the arch/.../mach-*/board-*.c files for several boards can all share the | 147 | the arch/.../mach-*/board-*.c files for several boards can all share the |
148 | same basic controller setup code. This is because most SOCs have several | 148 | same basic controller setup code. This is because most SOCs have several |
149 | SPI-capable controllers, and only the ones actually usable on a given | 149 | SPI-capable controllers, and only the ones actually usable on a given |
150 | board should normally be set up and registered. | 150 | board should normally be set up and registered. |
151 | 151 | ||
152 | So for example arch/.../mach-*/board-*.c files might have code like: | 152 | So for example arch/.../mach-*/board-*.c files might have code like: |
153 | 153 | ||
154 | #include <asm/arch/spi.h> /* for mysoc_spi_data */ | 154 | #include <asm/arch/spi.h> /* for mysoc_spi_data */ |
155 | 155 | ||
156 | /* if your mach-* infrastructure doesn't support kernels that can | 156 | /* if your mach-* infrastructure doesn't support kernels that can |
157 | * run on multiple boards, pdata wouldn't benefit from "__initdata". | 157 | * run on multiple boards, pdata wouldn't benefit from "__initdata". |
158 | */ | 158 | */ |
159 | static struct mysoc_spi_data pdata __initdata = { ... }; | 159 | static struct mysoc_spi_data pdata __initdata = { ... }; |
160 | 160 | ||
161 | static void __init board_init(void) | 161 | static void __init board_init(void) |
162 | { | 162 | { |
163 | ... | 163 | ... |
164 | /* this board only uses SPI controller #2 */ | 164 | /* this board only uses SPI controller #2 */ |
165 | mysoc_register_spi(2, &pdata); | 165 | mysoc_register_spi(2, &pdata); |
166 | ... | 166 | ... |
167 | } | 167 | } |
168 | 168 | ||
169 | And SOC-specific utility code might look something like: | 169 | And SOC-specific utility code might look something like: |
170 | 170 | ||
171 | #include <asm/arch/spi.h> | 171 | #include <asm/arch/spi.h> |
172 | 172 | ||
173 | static struct platform_device spi2 = { ... }; | 173 | static struct platform_device spi2 = { ... }; |
174 | 174 | ||
175 | void mysoc_register_spi(unsigned n, struct mysoc_spi_data *pdata) | 175 | void mysoc_register_spi(unsigned n, struct mysoc_spi_data *pdata) |
176 | { | 176 | { |
177 | struct mysoc_spi_data *pdata2; | 177 | struct mysoc_spi_data *pdata2; |
178 | 178 | ||
179 | pdata2 = kmalloc(sizeof *pdata2, GFP_KERNEL); | 179 | pdata2 = kmalloc(sizeof *pdata2, GFP_KERNEL); |
180 | *pdata2 = *pdata; | 180 | *pdata2 = *pdata; |
181 | ... | 181 | ... |
182 | if (n == 2) { | 182 | if (n == 2) { |
183 | spi2.dev.platform_data = pdata2; | 183 | spi2.dev.platform_data = pdata2; |
184 | platform_device_register(&spi2); | 184 | platform_device_register(&spi2); |
185 | 185 | ||
186 | /* also: set up pin modes so the spi2 signals are | 186 | /* also: set up pin modes so the spi2 signals are |
187 | * visible on the relevant pins ... bootloaders on | 187 | * visible on the relevant pins ... bootloaders on |
188 | * production boards may already have done this, but | 188 | * production boards may already have done this, but |
189 | * developer boards will often need Linux to do it. | 189 | * developer boards will often need Linux to do it. |
190 | */ | 190 | */ |
191 | } | 191 | } |
192 | ... | 192 | ... |
193 | } | 193 | } |
194 | 194 | ||
195 | Notice how the platform_data for boards may be different, even if the | 195 | Notice how the platform_data for boards may be different, even if the |
196 | same SOC controller is used. For example, on one board SPI might use | 196 | same SOC controller is used. For example, on one board SPI might use |
197 | an external clock, where another derives the SPI clock from current | 197 | an external clock, where another derives the SPI clock from current |
198 | settings of some master clock. | 198 | settings of some master clock. |
199 | 199 | ||
200 | 200 | ||
201 | DECLARE SLAVE DEVICES | 201 | DECLARE SLAVE DEVICES |
202 | 202 | ||
203 | The second kind of information is a list of what SPI slave devices exist | 203 | The second kind of information is a list of what SPI slave devices exist |
204 | on the target board, often with some board-specific data needed for the | 204 | on the target board, often with some board-specific data needed for the |
205 | driver to work correctly. | 205 | driver to work correctly. |
206 | 206 | ||
207 | Normally your arch/.../mach-*/board-*.c files would provide a small table | 207 | Normally your arch/.../mach-*/board-*.c files would provide a small table |
208 | listing the SPI devices on each board. (This would typically be only a | 208 | listing the SPI devices on each board. (This would typically be only a |
209 | small handful.) That might look like: | 209 | small handful.) That might look like: |
210 | 210 | ||
211 | static struct ads7846_platform_data ads_info = { | 211 | static struct ads7846_platform_data ads_info = { |
212 | .vref_delay_usecs = 100, | 212 | .vref_delay_usecs = 100, |
213 | .x_plate_ohms = 580, | 213 | .x_plate_ohms = 580, |
214 | .y_plate_ohms = 410, | 214 | .y_plate_ohms = 410, |
215 | }; | 215 | }; |
216 | 216 | ||
217 | static struct spi_board_info spi_board_info[] __initdata = { | 217 | static struct spi_board_info spi_board_info[] __initdata = { |
218 | { | 218 | { |
219 | .modalias = "ads7846", | 219 | .modalias = "ads7846", |
220 | .platform_data = &ads_info, | 220 | .platform_data = &ads_info, |
221 | .mode = SPI_MODE_0, | 221 | .mode = SPI_MODE_0, |
222 | .irq = GPIO_IRQ(31), | 222 | .irq = GPIO_IRQ(31), |
223 | .max_speed_hz = 120000 /* max sample rate at 3V */ * 16, | 223 | .max_speed_hz = 120000 /* max sample rate at 3V */ * 16, |
224 | .bus_num = 1, | 224 | .bus_num = 1, |
225 | .chip_select = 0, | 225 | .chip_select = 0, |
226 | }, | 226 | }, |
227 | }; | 227 | }; |
228 | 228 | ||
229 | Again, notice how board-specific information is provided; each chip may need | 229 | Again, notice how board-specific information is provided; each chip may need |
230 | several types. This example shows generic constraints like the fastest SPI | 230 | several types. This example shows generic constraints like the fastest SPI |
231 | clock to allow (a function of board voltage in this case) or how an IRQ pin | 231 | clock to allow (a function of board voltage in this case) or how an IRQ pin |
232 | is wired, plus chip-specific constraints like an important delay that's | 232 | is wired, plus chip-specific constraints like an important delay that's |
233 | changed by the capacitance at one pin. | 233 | changed by the capacitance at one pin. |
234 | 234 | ||
235 | (There's also "controller_data", information that may be useful to the | 235 | (There's also "controller_data", information that may be useful to the |
236 | controller driver. An example would be peripheral-specific DMA tuning | 236 | controller driver. An example would be peripheral-specific DMA tuning |
237 | data or chipselect callbacks. This is stored in spi_device later.) | 237 | data or chipselect callbacks. This is stored in spi_device later.) |
238 | 238 | ||
239 | The board_info should provide enough information to let the system work | 239 | The board_info should provide enough information to let the system work |
240 | without the chip's driver being loaded. The most troublesome aspect of | 240 | without the chip's driver being loaded. The most troublesome aspect of |
241 | that is likely the SPI_CS_HIGH bit in the spi_device.mode field, since | 241 | that is likely the SPI_CS_HIGH bit in the spi_device.mode field, since |
242 | sharing a bus with a device that interprets chipselect "backwards" is | 242 | sharing a bus with a device that interprets chipselect "backwards" is |
243 | not possible. | 243 | not possible. |
244 | 244 | ||
245 | Then your board initialization code would register that table with the SPI | 245 | Then your board initialization code would register that table with the SPI |
246 | infrastructure, so that it's available later when the SPI master controller | 246 | infrastructure, so that it's available later when the SPI master controller |
247 | driver is registered: | 247 | driver is registered: |
248 | 248 | ||
249 | spi_register_board_info(spi_board_info, ARRAY_SIZE(spi_board_info)); | 249 | spi_register_board_info(spi_board_info, ARRAY_SIZE(spi_board_info)); |
250 | 250 | ||
251 | As with other static board-specific setup, you won't unregister those. | 251 | As with other static board-specific setup, you won't unregister those. |
252 | 252 | ||
253 | The widely used "card" style computers bundle memory, cpu, and little else | 253 | The widely used "card" style computers bundle memory, cpu, and little else |
254 | onto a card that's maybe just thirty square centimeters. On such systems, | 254 | onto a card that's maybe just thirty square centimeters. On such systems, |
255 | your arch/.../mach-.../board-*.c file would primarily provide information | 255 | your arch/.../mach-.../board-*.c file would primarily provide information |
256 | about the devices on the mainboard into which such a card is plugged. That | 256 | about the devices on the mainboard into which such a card is plugged. That |
257 | certainly includes SPI devices hooked up through the card connectors! | 257 | certainly includes SPI devices hooked up through the card connectors! |
258 | 258 | ||
259 | 259 | ||
260 | NON-STATIC CONFIGURATIONS | 260 | NON-STATIC CONFIGURATIONS |
261 | 261 | ||
262 | Developer boards often play by different rules than product boards, and one | 262 | Developer boards often play by different rules than product boards, and one |
263 | example is the potential need to hotplug SPI devices and/or controllers. | 263 | example is the potential need to hotplug SPI devices and/or controllers. |
264 | 264 | ||
265 | For those cases you might need to use use spi_busnum_to_master() to look | 265 | For those cases you might need to use spi_busnum_to_master() to look |
266 | up the spi bus master, and will likely need spi_new_device() to provide the | 266 | up the spi bus master, and will likely need spi_new_device() to provide the |
267 | board info based on the board that was hotplugged. Of course, you'd later | 267 | board info based on the board that was hotplugged. Of course, you'd later |
268 | call at least spi_unregister_device() when that board is removed. | 268 | call at least spi_unregister_device() when that board is removed. |
269 | 269 | ||
270 | When Linux includes support for MMC/SD/SDIO/DataFlash cards through SPI, those | 270 | When Linux includes support for MMC/SD/SDIO/DataFlash cards through SPI, those |
271 | configurations will also be dynamic. Fortunately, those devices all support | 271 | configurations will also be dynamic. Fortunately, those devices all support |
272 | basic device identification probes, so that support should hotplug normally. | 272 | basic device identification probes, so that support should hotplug normally. |
273 | 273 | ||
274 | 274 | ||
275 | How do I write an "SPI Protocol Driver"? | 275 | How do I write an "SPI Protocol Driver"? |
276 | ---------------------------------------- | 276 | ---------------------------------------- |
277 | All SPI drivers are currently kernel drivers. A userspace driver API | 277 | All SPI drivers are currently kernel drivers. A userspace driver API |
278 | would just be another kernel driver, probably offering some lowlevel | 278 | would just be another kernel driver, probably offering some lowlevel |
279 | access through aio_read(), aio_write(), and ioctl() calls and using the | 279 | access through aio_read(), aio_write(), and ioctl() calls and using the |
280 | standard userspace sysfs mechanisms to bind to a given SPI device. | 280 | standard userspace sysfs mechanisms to bind to a given SPI device. |
281 | 281 | ||
282 | SPI protocol drivers somewhat resemble platform device drivers: | 282 | SPI protocol drivers somewhat resemble platform device drivers: |
283 | 283 | ||
284 | static struct spi_driver CHIP_driver = { | 284 | static struct spi_driver CHIP_driver = { |
285 | .driver = { | 285 | .driver = { |
286 | .name = "CHIP", | 286 | .name = "CHIP", |
287 | .bus = &spi_bus_type, | 287 | .bus = &spi_bus_type, |
288 | .owner = THIS_MODULE, | 288 | .owner = THIS_MODULE, |
289 | }, | 289 | }, |
290 | 290 | ||
291 | .probe = CHIP_probe, | 291 | .probe = CHIP_probe, |
292 | .remove = __devexit_p(CHIP_remove), | 292 | .remove = __devexit_p(CHIP_remove), |
293 | .suspend = CHIP_suspend, | 293 | .suspend = CHIP_suspend, |
294 | .resume = CHIP_resume, | 294 | .resume = CHIP_resume, |
295 | }; | 295 | }; |
296 | 296 | ||
297 | The driver core will automatically attempt to bind this driver to any SPI | 297 | The driver core will automatically attempt to bind this driver to any SPI |
298 | device whose board_info gave a modalias of "CHIP". Your probe() code | 298 | device whose board_info gave a modalias of "CHIP". Your probe() code |
299 | might look like this unless you're creating a class_device: | 299 | might look like this unless you're creating a class_device: |
300 | 300 | ||
301 | static int __devinit CHIP_probe(struct spi_device *spi) | 301 | static int __devinit CHIP_probe(struct spi_device *spi) |
302 | { | 302 | { |
303 | struct CHIP *chip; | 303 | struct CHIP *chip; |
304 | struct CHIP_platform_data *pdata; | 304 | struct CHIP_platform_data *pdata; |
305 | 305 | ||
306 | /* assuming the driver requires board-specific data: */ | 306 | /* assuming the driver requires board-specific data: */ |
307 | pdata = spi->dev.platform_data; | 307 | pdata = spi->dev.platform_data; |
308 | if (!pdata) | 308 | if (!pdata) |
309 | return -ENODEV; | 309 | return -ENODEV; |
310 | 310 | ||
311 | /* get memory for driver's per-chip state */ | 311 | /* get memory for driver's per-chip state */ |
312 | chip = kzalloc(sizeof *chip, GFP_KERNEL); | 312 | chip = kzalloc(sizeof *chip, GFP_KERNEL); |
313 | if (!chip) | 313 | if (!chip) |
314 | return -ENOMEM; | 314 | return -ENOMEM; |
315 | dev_set_drvdata(&spi->dev, chip); | 315 | dev_set_drvdata(&spi->dev, chip); |
316 | 316 | ||
317 | ... etc | 317 | ... etc |
318 | return 0; | 318 | return 0; |
319 | } | 319 | } |
320 | 320 | ||
321 | As soon as it enters probe(), the driver may issue I/O requests to | 321 | As soon as it enters probe(), the driver may issue I/O requests to |
322 | the SPI device using "struct spi_message". When remove() returns, | 322 | the SPI device using "struct spi_message". When remove() returns, |
323 | the driver guarantees that it won't submit any more such messages. | 323 | the driver guarantees that it won't submit any more such messages. |
324 | 324 | ||
325 | - An spi_message is a sequence of of protocol operations, executed | 325 | - An spi_message is a sequence of protocol operations, executed |
326 | as one atomic sequence. SPI driver controls include: | 326 | as one atomic sequence. SPI driver controls include: |
327 | 327 | ||
328 | + when bidirectional reads and writes start ... by how its | 328 | + when bidirectional reads and writes start ... by how its |
329 | sequence of spi_transfer requests is arranged; | 329 | sequence of spi_transfer requests is arranged; |
330 | 330 | ||
331 | + optionally defining short delays after transfers ... using | 331 | + optionally defining short delays after transfers ... using |
332 | the spi_transfer.delay_usecs setting; | 332 | the spi_transfer.delay_usecs setting; |
333 | 333 | ||
334 | + whether the chipselect becomes inactive after a transfer and | 334 | + whether the chipselect becomes inactive after a transfer and |
335 | any delay ... by using the spi_transfer.cs_change flag; | 335 | any delay ... by using the spi_transfer.cs_change flag; |
336 | 336 | ||
337 | + hinting whether the next message is likely to go to this same | 337 | + hinting whether the next message is likely to go to this same |
338 | device ... using the spi_transfer.cs_change flag on the last | 338 | device ... using the spi_transfer.cs_change flag on the last |
339 | transfer in that atomic group, and potentially saving costs | 339 | transfer in that atomic group, and potentially saving costs |
340 | for chip deselect and select operations. | 340 | for chip deselect and select operations. |
341 | 341 | ||
342 | - Follow standard kernel rules, and provide DMA-safe buffers in | 342 | - Follow standard kernel rules, and provide DMA-safe buffers in |
343 | your messages. That way controller drivers using DMA aren't forced | 343 | your messages. That way controller drivers using DMA aren't forced |
344 | to make extra copies unless the hardware requires it (e.g. working | 344 | to make extra copies unless the hardware requires it (e.g. working |
345 | around hardware errata that force the use of bounce buffering). | 345 | around hardware errata that force the use of bounce buffering). |
346 | 346 | ||
347 | If standard dma_map_single() handling of these buffers is inappropriate, | 347 | If standard dma_map_single() handling of these buffers is inappropriate, |
348 | you can use spi_message.is_dma_mapped to tell the controller driver | 348 | you can use spi_message.is_dma_mapped to tell the controller driver |
349 | that you've already provided the relevant DMA addresses. | 349 | that you've already provided the relevant DMA addresses. |
350 | 350 | ||
351 | - The basic I/O primitive is spi_async(). Async requests may be | 351 | - The basic I/O primitive is spi_async(). Async requests may be |
352 | issued in any context (irq handler, task, etc) and completion | 352 | issued in any context (irq handler, task, etc) and completion |
353 | is reported using a callback provided with the message. | 353 | is reported using a callback provided with the message. |
354 | After any detected error, the chip is deselected and processing | 354 | After any detected error, the chip is deselected and processing |
355 | of that spi_message is aborted. | 355 | of that spi_message is aborted. |
356 | 356 | ||
357 | - There are also synchronous wrappers like spi_sync(), and wrappers | 357 | - There are also synchronous wrappers like spi_sync(), and wrappers |
358 | like spi_read(), spi_write(), and spi_write_then_read(). These | 358 | like spi_read(), spi_write(), and spi_write_then_read(). These |
359 | may be issued only in contexts that may sleep, and they're all | 359 | may be issued only in contexts that may sleep, and they're all |
360 | clean (and small, and "optional") layers over spi_async(). | 360 | clean (and small, and "optional") layers over spi_async(). |
361 | 361 | ||
362 | - The spi_write_then_read() call, and convenience wrappers around | 362 | - The spi_write_then_read() call, and convenience wrappers around |
363 | it, should only be used with small amounts of data where the | 363 | it, should only be used with small amounts of data where the |
364 | cost of an extra copy may be ignored. It's designed to support | 364 | cost of an extra copy may be ignored. It's designed to support |
365 | common RPC-style requests, such as writing an eight bit command | 365 | common RPC-style requests, such as writing an eight bit command |
366 | and reading a sixteen bit response -- spi_w8r16() being one of its | 366 | and reading a sixteen bit response -- spi_w8r16() being one of its |
367 | wrappers, doing exactly that. | 367 | wrappers, doing exactly that. |
368 | 368 | ||
369 | Some drivers may need to modify spi_device characteristics like the | 369 | Some drivers may need to modify spi_device characteristics like the |
370 | transfer mode, wordsize, or clock rate. This is done with spi_setup(), | 370 | transfer mode, wordsize, or clock rate. This is done with spi_setup(), |
371 | which would normally be called from probe() before the first I/O is | 371 | which would normally be called from probe() before the first I/O is |
372 | done to the device. | 372 | done to the device. |
373 | 373 | ||
374 | While "spi_device" would be the bottom boundary of the driver, the | 374 | While "spi_device" would be the bottom boundary of the driver, the |
375 | upper boundaries might include sysfs (especially for sensor readings), | 375 | upper boundaries might include sysfs (especially for sensor readings), |
376 | the input layer, ALSA, networking, MTD, the character device framework, | 376 | the input layer, ALSA, networking, MTD, the character device framework, |
377 | or other Linux subsystems. | 377 | or other Linux subsystems. |
378 | 378 | ||
379 | Note that there are two types of memory your driver must manage as part | 379 | Note that there are two types of memory your driver must manage as part |
380 | of interacting with SPI devices. | 380 | of interacting with SPI devices. |
381 | 381 | ||
382 | - I/O buffers use the usual Linux rules, and must be DMA-safe. | 382 | - I/O buffers use the usual Linux rules, and must be DMA-safe. |
383 | You'd normally allocate them from the heap or free page pool. | 383 | You'd normally allocate them from the heap or free page pool. |
384 | Don't use the stack, or anything that's declared "static". | 384 | Don't use the stack, or anything that's declared "static". |
385 | 385 | ||
386 | - The spi_message and spi_transfer metadata used to glue those | 386 | - The spi_message and spi_transfer metadata used to glue those |
387 | I/O buffers into a group of protocol transactions. These can | 387 | I/O buffers into a group of protocol transactions. These can |
388 | be allocated anywhere it's convenient, including as part of | 388 | be allocated anywhere it's convenient, including as part of |
389 | other allocate-once driver data structures. Zero-init these. | 389 | other allocate-once driver data structures. Zero-init these. |
390 | 390 | ||
391 | If you like, spi_message_alloc() and spi_message_free() convenience | 391 | If you like, spi_message_alloc() and spi_message_free() convenience |
392 | routines are available to allocate and zero-initialize an spi_message | 392 | routines are available to allocate and zero-initialize an spi_message |
393 | with several transfers. | 393 | with several transfers. |
394 | 394 | ||
395 | 395 | ||
396 | How do I write an "SPI Master Controller Driver"? | 396 | How do I write an "SPI Master Controller Driver"? |
397 | ------------------------------------------------- | 397 | ------------------------------------------------- |
398 | An SPI controller will probably be registered on the platform_bus; write | 398 | An SPI controller will probably be registered on the platform_bus; write |
399 | a driver to bind to the device, whichever bus is involved. | 399 | a driver to bind to the device, whichever bus is involved. |
400 | 400 | ||
401 | The main task of this type of driver is to provide an "spi_master". | 401 | The main task of this type of driver is to provide an "spi_master". |
402 | Use spi_alloc_master() to allocate the master, and class_get_devdata() | 402 | Use spi_alloc_master() to allocate the master, and class_get_devdata() |
403 | to get the driver-private data allocated for that device. | 403 | to get the driver-private data allocated for that device. |
404 | 404 | ||
405 | struct spi_master *master; | 405 | struct spi_master *master; |
406 | struct CONTROLLER *c; | 406 | struct CONTROLLER *c; |
407 | 407 | ||
408 | master = spi_alloc_master(dev, sizeof *c); | 408 | master = spi_alloc_master(dev, sizeof *c); |
409 | if (!master) | 409 | if (!master) |
410 | return -ENODEV; | 410 | return -ENODEV; |
411 | 411 | ||
412 | c = class_get_devdata(&master->cdev); | 412 | c = class_get_devdata(&master->cdev); |
413 | 413 | ||
414 | The driver will initialize the fields of that spi_master, including the | 414 | The driver will initialize the fields of that spi_master, including the |
415 | bus number (maybe the same as the platform device ID) and three methods | 415 | bus number (maybe the same as the platform device ID) and three methods |
416 | used to interact with the SPI core and SPI protocol drivers. It will | 416 | used to interact with the SPI core and SPI protocol drivers. It will |
417 | also initialize its own internal state. (See below about bus numbering | 417 | also initialize its own internal state. (See below about bus numbering |
418 | and those methods.) | 418 | and those methods.) |
419 | 419 | ||
420 | After you initialize the spi_master, use spi_register_master() to | 420 | After you initialize the spi_master, use spi_register_master() to |
421 | publish it to the rest of the system. At that time, device nodes for | 421 | publish it to the rest of the system. At that time, device nodes for |
422 | the controller and any predeclared spi devices will be made available, | 422 | the controller and any predeclared spi devices will be made available, |
423 | and the driver model core will take care of binding them to drivers. | 423 | and the driver model core will take care of binding them to drivers. |
424 | 424 | ||
425 | If you need to remove your SPI controller driver, spi_unregister_master() | 425 | If you need to remove your SPI controller driver, spi_unregister_master() |
426 | will reverse the effect of spi_register_master(). | 426 | will reverse the effect of spi_register_master(). |
427 | 427 | ||
428 | 428 | ||
429 | BUS NUMBERING | 429 | BUS NUMBERING |
430 | 430 | ||
431 | Bus numbering is important, since that's how Linux identifies a given | 431 | Bus numbering is important, since that's how Linux identifies a given |
432 | SPI bus (shared SCK, MOSI, MISO). Valid bus numbers start at zero. On | 432 | SPI bus (shared SCK, MOSI, MISO). Valid bus numbers start at zero. On |
433 | SOC systems, the bus numbers should match the numbers defined by the chip | 433 | SOC systems, the bus numbers should match the numbers defined by the chip |
434 | manufacturer. For example, hardware controller SPI2 would be bus number 2, | 434 | manufacturer. For example, hardware controller SPI2 would be bus number 2, |
435 | and spi_board_info for devices connected to it would use that number. | 435 | and spi_board_info for devices connected to it would use that number. |
436 | 436 | ||
437 | If you don't have such a hardware-assigned bus number, and for some reason | 437 | If you don't have such a hardware-assigned bus number, and for some reason |
438 | you can't just assign one, then provide a negative bus number. That will | 438 | you can't just assign one, then provide a negative bus number. That will |
439 | then be replaced by a dynamically assigned number. You'd then need to treat | 439 | then be replaced by a dynamically assigned number. You'd then need to treat |
440 | this as a non-static configuration (see above). | 440 | this as a non-static configuration (see above). |
441 | 441 | ||
442 | 442 | ||
443 | SPI MASTER METHODS | 443 | SPI MASTER METHODS |
444 | 444 | ||
445 | master->setup(struct spi_device *spi) | 445 | master->setup(struct spi_device *spi) |
446 | This sets up the device clock rate, SPI mode, and word sizes. | 446 | This sets up the device clock rate, SPI mode, and word sizes. |
447 | Drivers may change the defaults provided by board_info, and then | 447 | Drivers may change the defaults provided by board_info, and then |
448 | call spi_setup(spi) to invoke this routine. It may sleep. | 448 | call spi_setup(spi) to invoke this routine. It may sleep. |
449 | 449 | ||
450 | master->transfer(struct spi_device *spi, struct spi_message *message) | 450 | master->transfer(struct spi_device *spi, struct spi_message *message) |
451 | This must not sleep. Its responsibility is to arrange that the | 451 | This must not sleep. Its responsibility is to arrange that the |
452 | transfer happens and its complete() callback is issued; the two | 452 | transfer happens and its complete() callback is issued; the two |
453 | will normally happen later, after other transfers complete. | 453 | will normally happen later, after other transfers complete. |
454 | 454 | ||
455 | master->cleanup(struct spi_device *spi) | 455 | master->cleanup(struct spi_device *spi) |
456 | Your controller driver may use spi_device.controller_state to hold | 456 | Your controller driver may use spi_device.controller_state to hold |
457 | state it dynamically associates with that device. If you do that, | 457 | state it dynamically associates with that device. If you do that, |
458 | be sure to provide the cleanup() method to free that state. | 458 | be sure to provide the cleanup() method to free that state. |
459 | 459 | ||
460 | 460 | ||
461 | SPI MESSAGE QUEUE | 461 | SPI MESSAGE QUEUE |
462 | 462 | ||
463 | The bulk of the driver will be managing the I/O queue fed by transfer(). | 463 | The bulk of the driver will be managing the I/O queue fed by transfer(). |
464 | 464 | ||
465 | That queue could be purely conceptual. For example, a driver used only | 465 | That queue could be purely conceptual. For example, a driver used only |
466 | for low-frequency sensor access might be fine using synchronous PIO. | 466 | for low-frequency sensor access might be fine using synchronous PIO. |
467 | 467 | ||
468 | But the queue will probably be very real, using message->queue, PIO, | 468 | But the queue will probably be very real, using message->queue, PIO, |
469 | often DMA (especially if the root filesystem is in SPI flash), and | 469 | often DMA (especially if the root filesystem is in SPI flash), and |
470 | execution contexts like IRQ handlers, tasklets, or workqueues (such | 470 | execution contexts like IRQ handlers, tasklets, or workqueues (such |
471 | as keventd). Your driver can be as fancy, or as simple, as you need. | 471 | as keventd). Your driver can be as fancy, or as simple, as you need. |
472 | Such a transfer() method would normally just add the message to a | 472 | Such a transfer() method would normally just add the message to a |
473 | queue, and then start some asynchronous transfer engine (unless it's | 473 | queue, and then start some asynchronous transfer engine (unless it's |
474 | already running). | 474 | already running). |
475 | 475 | ||
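The queueing pattern described under SPI MESSAGE QUEUE can be sketched in
plain userspace C. This is only a model, not kernel code: the names
model_msg, model_ctlr, model_transfer(), model_engine_run() and
model_count_done() are all hypothetical, and a real controller driver
would protect the queue with a spinlock and run the engine from an IRQ
handler, tasklet, or workqueue.

```c
/* Userspace model of the queueing pattern; this is not kernel code and
 * every name in it (model_msg, model_ctlr, ...) is hypothetical. */
#include <errno.h>
#include <stddef.h>

struct model_msg {
	struct model_msg *next;            /* queue link, in the role of message->queue */
	void (*complete)(void *context);   /* completion callback */
	void *context;                     /* argument passed to complete() */
	int status;                        /* 0 once the transfer is done */
};

struct model_ctlr {
	struct model_msg *head, *tail;     /* pending messages in FIFO order */
	int engine_running;                /* nonzero while the engine is active */
	int engine_starts;                 /* counts how often the engine was kicked */
};

/* In the role of master->transfer(): only enqueue, never wait for I/O. */
static int model_transfer(struct model_ctlr *c, struct model_msg *m)
{
	if (!c || !m)
		return -EINVAL;

	m->next = NULL;
	m->status = -EINPROGRESS;
	if (c->tail)
		c->tail->next = m;
	else
		c->head = m;
	c->tail = m;

	/* Start the transfer engine unless it's already running. */
	if (!c->engine_running) {
		c->engine_running = 1;
		c->engine_starts++;
	}
	return 0;
}

/* In the role of the asynchronous engine: drain the queue in submission
 * order and issue each message's complete() callback. */
static void model_engine_run(struct model_ctlr *c)
{
	struct model_msg *m;

	while ((m = c->head) != NULL) {
		c->head = m->next;
		if (!c->head)
			c->tail = NULL;
		m->status = 0;             /* pretend the I/O succeeded */
		if (m->complete)
			m->complete(m->context);
	}
	c->engine_running = 0;
}

/* Example completion callback: counts finished messages. */
static void model_count_done(void *context)
{
	++*(int *)context;
}
```

Queueing two messages before the engine runs kicks the engine only once;
both callbacks then fire, in order, when model_engine_run() drains the
queue.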
476 | 476 | ||
477 | THANKS TO | 477 | THANKS TO |
478 | --------- | 478 | --------- |
479 | Contributors to Linux-SPI discussions include (in alphabetical order, | 479 | Contributors to Linux-SPI discussions include (in alphabetical order, |
480 | by last name): | 480 | by last name): |
481 | 481 | ||
482 | David Brownell | 482 | David Brownell |
483 | Russell King | 483 | Russell King |
484 | Dmitry Pervushin | 484 | Dmitry Pervushin |
485 | Stephen Street | 485 | Stephen Street |
486 | Mark Underwood | 486 | Mark Underwood |
487 | Andrew Victor | 487 | Andrew Victor |
488 | Vitaly Wool | 488 | Vitaly Wool |
489 | 489 | ||
490 | 490 |
Documentation/unshare.txt
1 | 1 | ||
2 | unshare system call: | 2 | unshare system call: |
3 | -------------------- | 3 | -------------------- |
4 | This document describes the new system call, unshare. The document | 4 | This document describes the new system call, unshare. The document |
5 | provides an overview of the feature, why it is needed, how it can | 5 | provides an overview of the feature, why it is needed, how it can |
6 | be used, its interface specification, design, implementation and | 6 | be used, its interface specification, design, implementation and |
7 | how it can be tested. | 7 | how it can be tested. |
8 | 8 | ||
9 | Change Log: | 9 | Change Log: |
10 | ----------- | 10 | ----------- |
11 | version 0.1 Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006 | 11 | version 0.1 Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006 |
12 | 12 | ||
13 | Contents: | 13 | Contents: |
14 | --------- | 14 | --------- |
15 | 1) Overview | 15 | 1) Overview |
16 | 2) Benefits | 16 | 2) Benefits |
17 | 3) Cost | 17 | 3) Cost |
18 | 4) Requirements | 18 | 4) Requirements |
19 | 5) Functional Specification | 19 | 5) Functional Specification |
20 | 6) High Level Design | 20 | 6) High Level Design |
21 | 7) Low Level Design | 21 | 7) Low Level Design |
22 | 8) Test Specification | 22 | 8) Test Specification |
23 | 9) Future Work | 23 | 9) Future Work |
24 | 24 | ||
25 | 1) Overview | 25 | 1) Overview |
26 | ----------- | 26 | ----------- |
27 | Most legacy operating system kernels support an abstraction of threads | 27 | Most legacy operating system kernels support an abstraction of threads |
28 | as multiple execution contexts within a process. These kernels provide | 28 | as multiple execution contexts within a process. These kernels provide |
29 | special resources and mechanisms to maintain these "threads". The Linux | 29 | special resources and mechanisms to maintain these "threads". The Linux |
30 | kernel, in a clever and simple manner, does not make distinction | 30 | kernel, in a clever and simple manner, does not make distinction |
31 | between processes and "threads". The kernel allows processes to share | 31 | between processes and "threads". The kernel allows processes to share |
32 | resources and thus they can achieve legacy "threads" behavior without | 32 | resources and thus they can achieve legacy "threads" behavior without |
33 | requiring additional data structures and mechanisms in the kernel. The | 33 | requiring additional data structures and mechanisms in the kernel. The |
34 | power of implementing threads in this manner comes not only from | 34 | power of implementing threads in this manner comes not only from |
35 | its simplicity but also from allowing application programmers to work | 35 | its simplicity but also from allowing application programmers to work |
36 | outside the confinement of all-or-nothing shared resources of legacy | 36 | outside the confinement of all-or-nothing shared resources of legacy |
37 | threads. On Linux, at the time of thread creation using the clone system | 37 | threads. On Linux, at the time of thread creation using the clone system |
38 | call, applications can selectively choose which resources to share | 38 | call, applications can selectively choose which resources to share |
39 | between threads. | 39 | between threads. |
40 | 40 | ||
41 | unshare system call adds a primitive to the Linux thread model that | 41 | unshare system call adds a primitive to the Linux thread model that |
42 | allows threads to selectively 'unshare' any resources that were being | 42 | allows threads to selectively 'unshare' any resources that were being |
43 | shared at the time of their creation. unshare was conceptualized by | 43 | shared at the time of their creation. unshare was conceptualized by |
44 | Al Viro in August of 2000, on the Linux-Kernel mailing list, as part | 44 | Al Viro in August of 2000, on the Linux-Kernel mailing list, as part |
45 | of the discussion on POSIX threads on Linux. unshare augments the | 45 | of the discussion on POSIX threads on Linux. unshare augments the |
46 | usefulness of Linux threads for applications that would like to control | 46 | usefulness of Linux threads for applications that would like to control |
47 | shared resources without creating a new process. unshare is a natural | 47 | shared resources without creating a new process. unshare is a natural |
48 | addition to the set of available primitives on Linux that implement | 48 | addition to the set of available primitives on Linux that implement |
49 | the concept of process/thread as a virtual machine. | 49 | the concept of process/thread as a virtual machine. |
50 | 50 | ||
51 | 2) Benefits | 51 | 2) Benefits |
52 | ----------- | 52 | ----------- |
53 | unshare would be useful to large application frameworks such as PAM | 53 | unshare would be useful to large application frameworks such as PAM |
54 | where creating a new process to control sharing/unsharing of process | 54 | where creating a new process to control sharing/unsharing of process |
55 | resources is not possible. Since namespaces are shared by default | 55 | resources is not possible. Since namespaces are shared by default |
56 | when creating a new process using fork or clone, unshare can benefit | 56 | when creating a new process using fork or clone, unshare can benefit |
57 | even non-threaded applications if they have a need to disassociate | 57 | even non-threaded applications if they have a need to disassociate |
58 | from default shared namespace. The following lists two use-cases | 58 | from default shared namespace. The following lists two use-cases |
59 | where unshare can be used. | 59 | where unshare can be used. |
60 | 60 | ||
61 | 2.1 Per-security context namespaces | 61 | 2.1 Per-security context namespaces |
62 | ----------------------------------- | 62 | ----------------------------------- |
63 | unshare can be used to implement polyinstantiated directories using | 63 | unshare can be used to implement polyinstantiated directories using |
64 | the kernel's per-process namespace mechanism. Polyinstantiated directories, | 64 | the kernel's per-process namespace mechanism. Polyinstantiated directories, |
65 | such as per-user and/or per-security context instance of /tmp, /var/tmp or | 65 | such as per-user and/or per-security context instance of /tmp, /var/tmp or |
66 | per-security context instance of a user's home directory, isolate user | 66 | per-security context instance of a user's home directory, isolate user |
67 | processes when working with these directories. Using unshare, a PAM | 67 | processes when working with these directories. Using unshare, a PAM |
68 | module can easily set up a private namespace for a user at login. | 68 | module can easily set up a private namespace for a user at login. |
69 | Polyinstantiated directories are required for Common Criteria certification | 69 | Polyinstantiated directories are required for Common Criteria certification |
70 | with Labeled System Protection Profile; however, with the availability | 70 | with Labeled System Protection Profile; however, with the availability |
71 | of shared-tree feature in the Linux kernel, even regular Linux systems | 71 | of shared-tree feature in the Linux kernel, even regular Linux systems |
72 | can benefit from setting up private namespaces at login and | 72 | can benefit from setting up private namespaces at login and |
73 | polyinstantiating /tmp, /var/tmp and other directories deemed | 73 | polyinstantiating /tmp, /var/tmp and other directories deemed |
74 | appropriate by system administrators. | 74 | appropriate by system administrators. |
75 | 75 | ||
76 | 2.2 unsharing of virtual memory and/or open files | 76 | 2.2 unsharing of virtual memory and/or open files |
77 | ------------------------------------------------- | 77 | ------------------------------------------------- |
78 | Consider a client/server application where the server is processing | 78 | Consider a client/server application where the server is processing |
79 | client requests by creating processes that share resources such as | 79 | client requests by creating processes that share resources such as |
80 | virtual memory and open files. Without unshare, the server has to | 80 | virtual memory and open files. Without unshare, the server has to |
81 | decide what needs to be shared at the time of creating the process | 81 | decide what needs to be shared at the time of creating the process |
82 | which services the request. unshare gives the server the ability to | 82 | which services the request. unshare gives the server the ability to |
83 | disassociate parts of the context during the servicing of the | 83 | disassociate parts of the context during the servicing of the |
84 | request. For large and complex middleware application frameworks, this | 84 | request. For large and complex middleware application frameworks, this |
85 | ability to unshare after the process was created can be very | 85 | ability to unshare after the process was created can be very |
86 | useful. | 86 | useful. |
87 | 87 | ||
88 | 3) Cost | 88 | 3) Cost |
89 | ------- | 89 | ------- |
90 | In order to not duplicate code and to handle the fact that unshare | 90 | In order to not duplicate code and to handle the fact that unshare |
91 | works on an active task (as opposed to clone/fork working on a newly | 91 | works on an active task (as opposed to clone/fork working on a newly |
92 | allocated inactive task) unshare had to make minor reorganizational | 92 | allocated inactive task) unshare had to make minor reorganizational |
93 | changes to copy_* functions utilized by clone/fork system call. | 93 | changes to copy_* functions utilized by clone/fork system call. |
94 | There is a cost associated with altering existing, well tested and | 94 | There is a cost associated with altering existing, well tested and |
95 | stable code to implement a new feature that may not get exercised | 95 | stable code to implement a new feature that may not get exercised |
96 | extensively in the beginning. However, with proper design and code | 96 | extensively in the beginning. However, with proper design and code |
97 | review of the changes and creation of an unshare test for the LTP | 97 | review of the changes and creation of an unshare test for the LTP |
98 | the benefits of this new feature can exceed its cost. | 98 | the benefits of this new feature can exceed its cost. |
99 | 99 | ||
100 | 4) Requirements | 100 | 4) Requirements |
101 | --------------- | 101 | --------------- |
102 | unshare reverses sharing that was done using clone(2) system call, | 102 | unshare reverses sharing that was done using clone(2) system call, |
103 | so unshare should have a similar interface as clone(2). That is, | 103 | so unshare should have a similar interface as clone(2). That is, |
104 | since flags in clone(int flags, void *stack) specify what should | 104 | since flags in clone(int flags, void *stack) specify what should |
105 | be shared, similar flags in unshare(int flags) should specify | 105 | be shared, similar flags in unshare(int flags) should specify |
106 | what should be unshared. Unfortunately, this may appear to invert | 106 | what should be unshared. Unfortunately, this may appear to invert |
107 | the meaning of the flags from the way they are used in clone(2). | 107 | the meaning of the flags from the way they are used in clone(2). |
108 | However, there was no easy solution that was less confusing and that | 108 | However, there was no easy solution that was less confusing and that |
109 | allowed incremental context unsharing in future without an ABI change. | 109 | allowed incremental context unsharing in future without an ABI change. |
110 | 110 | ||
111 | unshare interface should accommodate possible future addition of | 111 | unshare interface should accommodate possible future addition of |
112 | new context flags without requiring a rebuild of old applications. | 112 | new context flags without requiring a rebuild of old applications. |
113 | If and when new context flags are added, unshare design should allow | 113 | If and when new context flags are added, unshare design should allow |
114 | incremental unsharing of those resources on an as needed basis. | 114 | incremental unsharing of those resources on an as needed basis. |
115 | 115 | ||
116 | 5) Functional Specification | 116 | 5) Functional Specification |
117 | --------------------------- | 117 | --------------------------- |
118 | NAME | 118 | NAME |
119 | unshare - disassociate parts of the process execution context | 119 | unshare - disassociate parts of the process execution context |
120 | 120 | ||
121 | SYNOPSIS | 121 | SYNOPSIS |
122 | #include <sched.h> | 122 | #include <sched.h> |
123 | 123 | ||
124 | int unshare(int flags); | 124 | int unshare(int flags); |
125 | 125 | ||
126 | DESCRIPTION | 126 | DESCRIPTION |
127 | unshare allows a process to disassociate parts of its execution | 127 | unshare allows a process to disassociate parts of its execution |
128 | context that are currently being shared with other processes. Part | 128 | context that are currently being shared with other processes. Part |
129 | of execution context, such as the namespace, is shared by default | 129 | of execution context, such as the namespace, is shared by default |
130 | when a new process is created using fork(2), while other parts, | 130 | when a new process is created using fork(2), while other parts, |
131 | such as the virtual memory, open file descriptors, etc, may be | 131 | such as the virtual memory, open file descriptors, etc, may be |
132 | shared by explicit request to share them when creating a process | 132 | shared by explicit request to share them when creating a process |
133 | using clone(2). | 133 | using clone(2). |
134 | 134 | ||
135 | The main use of unshare is to allow a process to control its | 135 | The main use of unshare is to allow a process to control its |
136 | shared execution context without creating a new process. | 136 | shared execution context without creating a new process. |
137 | 137 | ||
138 | The flags argument specifies one of, or the bitwise OR of several of, | 138 | The flags argument specifies one of, or the bitwise OR of several of, |
139 | the following constants. | 139 | the following constants. |
140 | 140 | ||
141 | CLONE_FS | 141 | CLONE_FS |
142 | If CLONE_FS is set, file system information of the caller | 142 | If CLONE_FS is set, file system information of the caller |
143 | is disassociated from the shared file system information. | 143 | is disassociated from the shared file system information. |
144 | 144 | ||
145 | CLONE_FILES | 145 | CLONE_FILES |
146 | If CLONE_FILES is set, the file descriptor table of the | 146 | If CLONE_FILES is set, the file descriptor table of the |
147 | caller is disassociated from the shared file descriptor | 147 | caller is disassociated from the shared file descriptor |
148 | table. | 148 | table. |
149 | 149 | ||
150 | CLONE_NEWNS | 150 | CLONE_NEWNS |
151 | If CLONE_NEWNS is set, the namespace of the caller is | 151 | If CLONE_NEWNS is set, the namespace of the caller is |
152 | disassociated from the shared namespace. | 152 | disassociated from the shared namespace. |
153 | 153 | ||
154 | CLONE_VM | 154 | CLONE_VM |
155 | If CLONE_VM is set, the virtual memory of the caller is | 155 | If CLONE_VM is set, the virtual memory of the caller is |
156 | disassociated from the shared virtual memory. | 156 | disassociated from the shared virtual memory. |
157 | 157 | ||
158 | RETURN VALUE | 158 | RETURN VALUE |
159 | On success, zero is returned. On failure, -1 is returned and errno is set. | 159 | On success, zero is returned. On failure, -1 is returned and errno is set. |
160 | 160 | ||
161 | ERRORS | 161 | ERRORS |
162 | EPERM CLONE_NEWNS was specified by a non-root process (process | 162 | EPERM CLONE_NEWNS was specified by a non-root process (process |
163 | without CAP_SYS_ADMIN). | 163 | without CAP_SYS_ADMIN). |
164 | 164 | ||
165 | ENOMEM Cannot allocate sufficient memory to copy parts of caller's | 165 | ENOMEM Cannot allocate sufficient memory to copy parts of caller's |
166 | context that need to be unshared. | 166 | context that need to be unshared. |
167 | 167 | ||
168 | EINVAL Invalid flag was specified as an argument. | 168 | EINVAL Invalid flag was specified as an argument. |
169 | 169 | ||
170 | CONFORMING TO | 170 | CONFORMING TO |
171 | The unshare() call is Linux-specific and should not be used | 171 | The unshare() call is Linux-specific and should not be used |
172 | in programs intended to be portable. | 172 | in programs intended to be portable. |
173 | 173 | ||
174 | SEE ALSO | 174 | SEE ALSO |
175 | clone(2), fork(2) | 175 | clone(2), fork(2) |
176 | 176 | ||
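A minimal sketch of calling the interface specified above. The helper
name try_unshare() is hypothetical. The raw syscall() form is used here
only so the sketch does not depend on feature-test macros; a C library
that provides the unshare() wrapper declared in <sched.h> could be
called directly instead.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_FS, CLONE_FILES, CLONE_NEWNS, CLONE_VM */
#include <sys/syscall.h>   /* SYS_unshare */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: ask the kernel to unshare the given parts of the
 * caller's context. Returns 0 on success or a negative errno value
 * (for example -EPERM, -ENOMEM or -EINVAL) on failure. */
static int try_unshare(int flags)
{
	if (syscall(SYS_unshare, flags) == -1)
		return -errno;
	return 0;
}
```

A caller that wants a private file descriptor table would use
try_unshare(CLONE_FILES) and treat any negative return as the error
code. Whether the call succeeds depends on the caller's privileges and
on any sandboxing applied to the process.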
177 | 6) High Level Design | 177 | 6) High Level Design |
178 | -------------------- | 178 | -------------------- |
179 | Depending on the flags argument, the unshare system call allocates | 179 | Depending on the flags argument, the unshare system call allocates |
180 | appropriate process context structures, populates it with values from | 180 | appropriate process context structures, populates it with values from |
181 | the current shared version, associates newly duplicated structures | 181 | the current shared version, associates newly duplicated structures |
182 | with the current task structure and releases corresponding shared | 182 | with the current task structure and releases corresponding shared |
183 | versions. Helper functions of clone (copy_*) could not be used | 183 | versions. Helper functions of clone (copy_*) could not be used |
184 | directly by unshare because of the following two reasons. | 184 | directly by unshare because of the following two reasons. |
185 | 1) clone operates on a newly allocated not-yet-active task | 185 | 1) clone operates on a newly allocated not-yet-active task |
186 | structure, whereas unshare operates on the current active | 186 | structure, whereas unshare operates on the current active |
187 | task. Therefore unshare has to take appropriate task_lock() | 187 | task. Therefore unshare has to take appropriate task_lock() |
188 | before associating newly duplicated context structures. | 188 | before associating newly duplicated context structures. |
189 | 2) unshare has to allocate and duplicate all context structures | 189 | 2) unshare has to allocate and duplicate all context structures |
190 | that are being unshared, before associating them with the | 190 | that are being unshared, before associating them with the |
191 | current task and releasing older shared structures. Failure to | 191 | current task and releasing older shared structures. Failure to |
192 | do so will create race conditions and/or oops when trying | 192 | do so will create race conditions and/or oops when trying |
193 | to back out due to an error. Consider the case of unsharing | 193 | to back out due to an error. Consider the case of unsharing |
194 | both virtual memory and namespace. After successfully unsharing | 194 | both virtual memory and namespace. After successfully unsharing |
195 | vm, if the system call encounters an error while allocating | 195 | vm, if the system call encounters an error while allocating |
196 | new namespace structure, the error return code will have to | 196 | new namespace structure, the error return code will have to |
197 | reverse the unsharing of vm. As part of the reversal the | 197 | reverse the unsharing of vm. As part of the reversal the |
198 | system call will have to go back to older, shared, vm | 198 | system call will have to go back to older, shared, vm |
199 | structure, which may not exist anymore. | 199 | structure, which may not exist anymore. |
200 | 200 | ||
201 | Therefore code from copy_* functions that allocated and duplicated | 201 | Therefore code from copy_* functions that allocated and duplicated |
202 | current context structure was moved into new dup_* functions. Now, | 202 | current context structure was moved into new dup_* functions. Now, |
203 | copy_* functions call dup_* functions to allocate and duplicate | 203 | copy_* functions call dup_* functions to allocate and duplicate |
204 | appropriate context structures and then associate them with the | 204 | appropriate context structures and then associate them with the |
205 | task structure that is being constructed. unshare system call on | 205 | task structure that is being constructed. unshare system call on |
206 | the other hand performs the following: | 206 | the other hand performs the following: |
207 | 1) Check flags to force missing, but implied, flags | 207 | 1) Check flags to force missing, but implied, flags |
208 | 2) For each context structure, call the corresponding unshare | 208 | 2) For each context structure, call the corresponding unshare |
209 | helper function to allocate and duplicate a new context | 209 | helper function to allocate and duplicate a new context |
210 | structure, if the appropriate bit is set in the flags argument. | 210 | structure, if the appropriate bit is set in the flags argument. |
211 | 3) If there is no error in allocation and duplication and there | 211 | 3) If there is no error in allocation and duplication and there |
212 | are new context structures then lock the current task structure, | 212 | are new context structures then lock the current task structure, |
213 | associate new context structures with the current task structure, | 213 | associate new context structures with the current task structure, |
214 | and release the lock on the current task structure. | 214 | and release the lock on the current task structure. |
215 | 4) Appropriately release older, shared, context structures. | 215 | 4) Appropriately release older, shared, context structures. |
216 | 216 | ||
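The allocate-everything-first, then commit, then release ordering of the
high level design can be illustrated with a small userspace model. It is
a sketch under stated assumptions, not the kernel implementation: the
types model_ctx and model_task, the flag bits MODEL_FS and MODEL_FILES,
and the functions model_dup(), model_put() and model_unshare() are all
hypothetical, and the comment marks where the kernel would hold
task_lock().

```c
#include <errno.h>
#include <stdlib.h>

#define MODEL_FS    0x1   /* hypothetical flag bits, not the CLONE_* values */
#define MODEL_FILES 0x2

struct model_ctx {
	int refcount;     /* number of tasks sharing this structure */
	int payload;      /* stands in for the real contents */
};

struct model_task {
	struct model_ctx *fs;
	struct model_ctx *files;
};

/* Like a dup_* function: allocate and copy, but attach to no task. */
static struct model_ctx *model_dup(const struct model_ctx *old)
{
	struct model_ctx *n = malloc(sizeof(*n));

	if (n) {
		n->refcount = 1;
		n->payload = old->payload;
	}
	return n;
}

/* Drop one reference; free the structure when nobody uses it. */
static void model_put(struct model_ctx *c)
{
	if (c && --c->refcount == 0)
		free(c);
}

static int model_unshare(struct model_task *t, int flags)
{
	struct model_ctx *new_fs = NULL, *new_files = NULL, *old;

	/* Step 1: duplicate every requested structure that is really shared. */
	if ((flags & MODEL_FS) && t->fs->refcount > 1) {
		new_fs = model_dup(t->fs);
		if (!new_fs)
			goto nomem;
	}
	if ((flags & MODEL_FILES) && t->files->refcount > 1) {
		new_files = model_dup(t->files);
		if (!new_files)
			goto nomem;
	}

	/* Step 2: commit. The kernel would hold task_lock() around the swaps.
	 * Step 3: release the older, shared, structures. */
	if (new_fs) {
		old = t->fs;
		t->fs = new_fs;
		model_put(old);
	}
	if (new_files) {
		old = t->files;
		t->files = new_files;
		model_put(old);
	}
	return 0;

nomem:
	/* Nothing was attached yet, so backing out only frees the copies. */
	free(new_fs);
	free(new_files);
	return -ENOMEM;
}
```

Because no task pointer changes until every duplicate exists, the error
path never has to go back to an older shared structure that may already
be gone.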
217 | 7) Low Level Design | 217 | 7) Low Level Design |
218 | ------------------- | 218 | ------------------- |
219 | Implementation of unshare can be grouped in the following 4 different | 219 | Implementation of unshare can be grouped in the following 4 different |
220 | items: | 220 | items: |
221 | a) Reorganization of existing copy_* functions | 221 | a) Reorganization of existing copy_* functions |
222 | b) unshare system call service function | 222 | b) unshare system call service function |
223 | c) unshare helper functions for each different process context | 223 | c) unshare helper functions for each different process context |
224 | d) Registration of system call number for different architectures | 224 | d) Registration of system call number for different architectures |
225 | 225 | ||
226 | 7.1) Reorganization of copy_* functions | 226 | 7.1) Reorganization of copy_* functions |
227 | Each copy function such as copy_mm, copy_namespace, copy_files, | 227 | Each copy function such as copy_mm, copy_namespace, copy_files, |
228 | etc, had roughly two components. The first component allocated | 228 | etc, had roughly two components. The first component allocated |
229 | and duplicated the appropriate structure and the second component | 229 | and duplicated the appropriate structure and the second component |
230 | linked it to the task structure passed in as an argument to the copy | 230 | linked it to the task structure passed in as an argument to the copy |
231 | function. The first component was split into its own function. | 231 | function. The first component was split into its own function. |
232 | These dup_* functions allocated and duplicated the appropriate | 232 | These dup_* functions allocated and duplicated the appropriate |
233 | context structure. The reorganized copy_* functions invoked | 233 | context structure. The reorganized copy_* functions invoked |
234 | their corresponding dup_* functions and then linked the newly | 234 | their corresponding dup_* functions and then linked the newly |
235 | duplicated structures to the task structure with which the | 235 | duplicated structures to the task structure with which the |
236 | copy function was called. | 236 | copy function was called. |
237 | 237 | ||
238 | 7.2) unshare system call service function | 238 | 7.2) unshare system call service function |
239 | * Check flags | 239 | * Check flags |
240 | Force implied flags. If CLONE_THREAD is set force CLONE_VM. | 240 | Force implied flags. If CLONE_THREAD is set force CLONE_VM. |
241 | If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is | 241 | If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is |
242 | set and signals are also being shared, force CLONE_THREAD. If | 242 | set and signals are also being shared, force CLONE_THREAD. If |
243 | CLONE_NEWNS is set, force CLONE_FS. | 243 | CLONE_NEWNS is set, force CLONE_FS. |
244 | * For each context flag, invoke the corresponding unshare_* | 244 | * For each context flag, invoke the corresponding unshare_* |
245 | helper routine with flags passed into the system call and a | 245 | helper routine with flags passed into the system call and a |
246 | reference to a pointer to the new unshared structure | 246 | reference to a pointer to the new unshared structure |
247 | * If any new structures are created by unshare_* helper | 247 | * If any new structures are created by unshare_* helper |
248 | functions, take the task_lock() on the current task, | 248 | functions, take the task_lock() on the current task, |
249 | modify appropriate context pointers, and release the | 249 | modify appropriate context pointers, and release the |
250 | task lock. | 250 | task lock. |
251 | * For all newly unshared structures, release the corresponding | 251 | * For all newly unshared structures, release the corresponding |
252 | older, shared, structures. | 252 | older, shared, structures. |
253 | 253 | ||
254 | 7.3) unshare_* helper functions | 254 | 7.3) unshare_* helper functions |
255 | For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND, | 255 | For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND, |
256 | and CLONE_THREAD, return -EINVAL since they are not implemented yet. | 256 | and CLONE_THREAD, return -EINVAL since they are not implemented yet. |
257 | For others, check the flag value to see if the unsharing is | 257 | For others, check the flag value to see if the unsharing is |
258 | required for that structure. If it is, invoke the corresponding | 258 | required for that structure. If it is, invoke the corresponding |
259 | dup_* function to allocate and duplicate the structure and return | 259 | dup_* function to allocate and duplicate the structure and return |
260 | a pointer to it. | 260 | a pointer to it. |
261 | 261 | ||
262 | 7.4) Appropriately modify architecture specific code to register the | 262 | 7.4) Appropriately modify architecture specific code to register the |
263 | the new system call. | 263 | new system call. |
264 | 264 | ||
265 | 8) Test Specification | 265 | 8) Test Specification |
266 | --------------------- | 266 | --------------------- |
267 | The test for unshare should test the following: | 267 | The test for unshare should test the following: |
268 | 1) Valid flags: Test to check that clone flags for signal and | 268 | 1) Valid flags: Test to check that clone flags for signal and |
269 | signal handlers, for which unsharing is not implemented | 269 | signal handlers, for which unsharing is not implemented |
270 | yet, return -EINVAL. | 270 | yet, return -EINVAL. |
271 | 2) Missing/implied flags: Test to make sure that if unsharing | 271 | 2) Missing/implied flags: Test to make sure that if unsharing |
272 | namespace without specifying unsharing of filesystem, correctly | 272 | namespace without specifying unsharing of filesystem, correctly |
273 | unshares both namespace and filesystem information. | 273 | unshares both namespace and filesystem information. |
274 | 3) For each of the four (namespace, filesystem, files and vm) | 274 | 3) For each of the four (namespace, filesystem, files and vm) |
275 | supported unsharing, verify that the system call correctly | 275 | supported unsharing, verify that the system call correctly |
276 | unshares the appropriate structure. Verify that unsharing | 276 | unshares the appropriate structure. Verify that unsharing |
277 | them individually as well as in combination with each | 277 | them individually as well as in combination with each |
278 | other works as expected. | 278 | other works as expected. |
279 | 4) Concurrent execution: Use shared memory segments and futex on | 279 | 4) Concurrent execution: Use shared memory segments and futex on |
280 | an address in the shm segment to synchronize execution of | 280 | an address in the shm segment to synchronize execution of |
281 | about 10 threads. Have a couple of threads execute execve, | 281 | about 10 threads. Have a couple of threads execute execve, |
282 | a couple _exit and the rest unshare with different combination | 282 | a couple _exit and the rest unshare with different combination |
283 | of flags. Verify that unsharing is performed as expected and | 283 | of flags. Verify that unsharing is performed as expected and |
284 | that there are no oops or hangs. | 284 | that there are no oops or hangs. |
285 | 285 | ||
286 | 9) Future Work | 286 | 9) Future Work |
287 | -------------- | 287 | -------------- |
288 | The current implementation of unshare does not allow unsharing of | 288 | The current implementation of unshare does not allow unsharing of |
289 | signals and signal handlers. Signals are complex to begin with and | 289 | signals and signal handlers. Signals are complex to begin with and |
290 | to unshare signals and/or signal handlers of a currently running | 290 | to unshare signals and/or signal handlers of a currently running |
291 | process is even more complex. If in the future there is a specific | 291 | process is even more complex. If in the future there is a specific |
292 | need to allow unsharing of signals and/or signal handlers, it can | 292 | need to allow unsharing of signals and/or signal handlers, it can |
293 | be incrementally added to unshare without affecting legacy | 293 | be incrementally added to unshare without affecting legacy |
294 | applications using unshare. | 294 | applications using unshare. |
295 | 295 | ||
296 | 296 |
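The implied-flag rules listed under 7.2 ("Check flags") can be sketched in plain C as follows. This is a userspace illustration only, not the kernel's code: the SK_ names and the sk_force_implied_flags() helper are hypothetical, and the signals_shared parameter is an assumed stand-in for the "signals are also being shared" condition, which the text does not define further. The rules are applied once, in the order the text gives them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative stand-ins for the CLONE_* bits named in section 7.2.
 * The SK_ names are hypothetical; real code would use the kernel's
 * own CLONE_* definitions.
 */
enum {
	SK_CLONE_VM      = 0x00000100,
	SK_CLONE_FS      = 0x00000200,
	SK_CLONE_FILES   = 0x00000400,
	SK_CLONE_SIGHAND = 0x00000800,
	SK_CLONE_THREAD  = 0x00010000,
	SK_CLONE_NEWNS   = 0x00020000
};

/*
 * Apply the implied-flag rules once, in the order the text lists them.
 * signals_shared is an assumed input; how the kernel decides that
 * signals are shared is not shown here.
 */
unsigned long sk_force_implied_flags(unsigned long flags, bool signals_shared)
{
	if (flags & SK_CLONE_THREAD)
		flags |= SK_CLONE_VM;
	if (flags & SK_CLONE_VM)
		flags |= SK_CLONE_SIGHAND;
	if ((flags & SK_CLONE_SIGHAND) && signals_shared)
		flags |= SK_CLONE_THREAD;
	if (flags & SK_CLONE_NEWNS)
		flags |= SK_CLONE_FS;
	return flags;
}

/* Small self-check: unsharing the namespace also unshares fs info. */
void sk_self_check(void)
{
	unsigned long out = sk_force_implied_flags(SK_CLONE_NEWNS, false);

	assert(out == (unsigned long)(SK_CLONE_NEWNS | SK_CLONE_FS));
}
```

A flag that implies nothing, such as the files flag on its own, passes through unchanged, which is the case test 3 in section 8 relies on when it unshares structures individually.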
Documentation/usb/error-codes.txt
1 | Revised: 2004-Oct-21 | 1 | Revised: 2004-Oct-21 |
2 | 2 | ||
3 | This is the documentation of (hopefully) all possible error codes (and | 3 | This is the documentation of (hopefully) all possible error codes (and |
4 | their interpretation) that can be returned from usbcore. | 4 | their interpretation) that can be returned from usbcore. |
5 | 5 | ||
6 | Some of them are returned by the Host Controller Drivers (HCDs), which | 6 | Some of them are returned by the Host Controller Drivers (HCDs), which |
7 | device drivers only see through usbcore. As a rule, all the HCDs should | 7 | device drivers only see through usbcore. As a rule, all the HCDs should |
8 | behave the same except for transfer speed dependent behaviors and the | 8 | behave the same except for transfer speed dependent behaviors and the |
9 | way certain faults are reported. | 9 | way certain faults are reported. |
10 | 10 | ||
11 | 11 | ||
12 | ************************************************************************** | 12 | ************************************************************************** |
13 | * Error codes returned by usb_submit_urb * | 13 | * Error codes returned by usb_submit_urb * |
14 | ************************************************************************** | 14 | ************************************************************************** |
15 | 15 | ||
16 | Non-USB-specific: | 16 | Non-USB-specific: |
17 | 17 | ||
18 | 0 URB submission went fine | 18 | 0 URB submission went fine |
19 | 19 | ||
20 | -ENOMEM no memory for allocation of internal structures | 20 | -ENOMEM no memory for allocation of internal structures |
21 | 21 | ||
22 | USB-specific: | 22 | USB-specific: |
23 | 23 | ||
24 | -ENODEV specified USB-device or bus doesn't exist | 24 | -ENODEV specified USB-device or bus doesn't exist |
25 | 25 | ||
26 | -ENOENT specified interface or endpoint does not exist or | 26 | -ENOENT specified interface or endpoint does not exist or |
27 | is not enabled | 27 | is not enabled |
28 | 28 | ||
29 | -ENXIO host controller driver does not support queuing of this type | 29 | -ENXIO host controller driver does not support queuing of this type |
30 | of urb. (treat as a host controller bug.) | 30 | of urb. (treat as a host controller bug.) |
31 | 31 | ||
32 | -EINVAL a) Invalid transfer type specified (or not supported) | 32 | -EINVAL a) Invalid transfer type specified (or not supported) |
33 | b) Invalid or unsupported periodic transfer interval | 33 | b) Invalid or unsupported periodic transfer interval |
34 | c) ISO: attempted to change transfer interval | 34 | c) ISO: attempted to change transfer interval |
35 | d) ISO: number_of_packets is < 0 | 35 | d) ISO: number_of_packets is < 0 |
36 | e) various other cases | 36 | e) various other cases |
37 | 37 | ||
38 | -EAGAIN a) specified ISO start frame too early | 38 | -EAGAIN a) specified ISO start frame too early |
39 | b) (using ISO-ASAP) too much scheduled for the future | 39 | b) (using ISO-ASAP) too much scheduled for the future |
40 | wait some time and try again. | 40 | wait some time and try again. |
41 | 41 | ||
42 | -EFBIG Host controller driver can't schedule that many ISO frames. | 42 | -EFBIG Host controller driver can't schedule that many ISO frames. |
43 | 43 | ||
44 | -EPIPE Specified endpoint is stalled. For non-control endpoints, | 44 | -EPIPE Specified endpoint is stalled. For non-control endpoints, |
45 | reset this status with usb_clear_halt(). | 45 | reset this status with usb_clear_halt(). |
46 | 46 | ||
47 | -EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable | 47 | -EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable |
48 | in the current interface altsetting. | 48 | in the current interface altsetting. |
49 | (b) ISO packet is larger than the endpoint maxpacket. | 49 | (b) ISO packet is larger than the endpoint maxpacket. |
50 | (c) requested data transfer length is invalid: negative | 50 | (c) requested data transfer length is invalid: negative |
51 | or too large for the host controller. | 51 | or too large for the host controller. |
52 | 52 | ||
53 | -ENOSPC This request would overcommit the usb bandwidth reserved | 53 | -ENOSPC This request would overcommit the usb bandwidth reserved |
54 | for periodic transfers (interrupt, isochronous). | 54 | for periodic transfers (interrupt, isochronous). |
55 | 55 | ||
56 | -ESHUTDOWN The device or host controller has been disabled due to some | 56 | -ESHUTDOWN The device or host controller has been disabled due to some |
57 | problem that could not be worked around. | 57 | problem that could not be worked around. |
58 | 58 | ||
59 | -EPERM Submission failed because urb->reject was set. | 59 | -EPERM Submission failed because urb->reject was set. |
60 | 60 | ||
61 | -EHOSTUNREACH URB was rejected because the device is suspended. | 61 | -EHOSTUNREACH URB was rejected because the device is suspended. |
62 | 62 | ||
63 | 63 | ||
64 | ************************************************************************** | 64 | ************************************************************************** |
65 | * Error codes returned in urb->status * | 65 | * Error codes returned in urb->status * |
66 | * or in iso_frame_desc[n].status (for ISO) * | 66 | * or in iso_frame_desc[n].status (for ISO) * |
67 | ************************************************************************** | 67 | ************************************************************************** |
68 | 68 | ||
69 | USB device drivers may only test urb status values in completion handlers. | 69 | USB device drivers may only test urb status values in completion handlers. |
70 | This is because otherwise there would be a race between HCDs updating | 70 | This is because otherwise there would be a race between HCDs updating |
71 | these values on one CPU, and device drivers testing them on another CPU. | 71 | these values on one CPU, and device drivers testing them on another CPU. |
72 | 72 | ||
73 | A transfer's actual_length may be positive even when an error has been | 73 | A transfer's actual_length may be positive even when an error has been |
74 | reported. That's because transfers often involve several packets, so that | 74 | reported. That's because transfers often involve several packets, so that |
75 | one or more packets could finish before an error stops further endpoint I/O. | 75 | one or more packets could finish before an error stops further endpoint I/O. |
76 | 76 | ||
77 | 77 | ||
78 | 0 Transfer completed successfully | 78 | 0 Transfer completed successfully |
79 | 79 | ||
80 | -ENOENT URB was synchronously unlinked by usb_unlink_urb | 80 | -ENOENT URB was synchronously unlinked by usb_unlink_urb |
81 | 81 | ||
82 | -EINPROGRESS URB still pending, no results yet | 82 | -EINPROGRESS URB still pending, no results yet |
83 | (That is, if drivers see this it's a bug.) | 83 | (That is, if drivers see this it's a bug.) |
84 | 84 | ||
85 | -EPROTO (*, **) a) bitstuff error | 85 | -EPROTO (*, **) a) bitstuff error |
86 | b) no response packet received within the | 86 | b) no response packet received within the |
87 | prescribed bus turn-around time | 87 | prescribed bus turn-around time |
88 | c) unknown USB error | 88 | c) unknown USB error |
89 | 89 | ||
90 | -EILSEQ (*, **) a) CRC mismatch | 90 | -EILSEQ (*, **) a) CRC mismatch |
91 | b) no response packet received within the | 91 | b) no response packet received within the |
92 | prescribed bus turn-around time | 92 | prescribed bus turn-around time |
93 | c) unknown USB error | 93 | c) unknown USB error |
94 | 94 | ||
95 | Note that often the controller hardware does not | 95 | Note that often the controller hardware does not |
96 | distinguish among cases a), b), and c), so a | 96 | distinguish among cases a), b), and c), so a |
97 | driver cannot tell whether there was a protocol | 97 | driver cannot tell whether there was a protocol |
98 | error, a failure to respond (often caused by | 98 | error, a failure to respond (often caused by |
99 | device disconnect), or some other fault. | 99 | device disconnect), or some other fault. |
100 | 100 | ||
101 | -ETIME (**) No response packet received within the prescribed | 101 | -ETIME (**) No response packet received within the prescribed |
102 | bus turn-around time. This error may instead be | 102 | bus turn-around time. This error may instead be |
103 | reported as -EPROTO or -EILSEQ. | 103 | reported as -EPROTO or -EILSEQ. |
104 | 104 | ||
105 | -ETIMEDOUT Synchronous USB message functions use this code | 105 | -ETIMEDOUT Synchronous USB message functions use this code |
106 | to indicate timeout expired before the transfer | 106 | to indicate timeout expired before the transfer |
107 | completed, and no other error was reported by HC. | 107 | completed, and no other error was reported by HC. |
108 | 108 | ||
109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, | 109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, |
110 | reset this status with usb_clear_halt(). | 110 | reset this status with usb_clear_halt(). |
111 | 111 | ||
112 | -ECOMM During an IN transfer, the host controller | 112 | -ECOMM During an IN transfer, the host controller |
113 | received data from an endpoint faster than it | 113 | received data from an endpoint faster than it |
114 | could be written to system memory | 114 | could be written to system memory |
115 | 115 | ||
116 | -ENOSR During an OUT transfer, the host controller | 116 | -ENOSR During an OUT transfer, the host controller |
117 | could not retrieve data from system memory fast | 117 | could not retrieve data from system memory fast |
118 | enough to keep up with the USB data rate | 118 | enough to keep up with the USB data rate |
119 | 119 | ||
120 | -EOVERFLOW (*) The amount of data returned by the endpoint was | 120 | -EOVERFLOW (*) The amount of data returned by the endpoint was |
121 | greater than either the max packet size of the | 121 | greater than either the max packet size of the |
122 | endpoint or the remaining buffer size. "Babble". | 122 | endpoint or the remaining buffer size. "Babble". |
123 | 123 | ||
124 | -EREMOTEIO The data read from the endpoint did not fill the | 124 | -EREMOTEIO The data read from the endpoint did not fill the |
125 | specified buffer, and URB_SHORT_NOT_OK was set in | 125 | specified buffer, and URB_SHORT_NOT_OK was set in |
126 | urb->transfer_flags. | 126 | urb->transfer_flags. |
127 | 127 | ||
128 | -ENODEV Device was removed. Often preceded by a burst of | 128 | -ENODEV Device was removed. Often preceded by a burst of |
129 | other errors, since the hub driver doesn't detect | 129 | other errors, since the hub driver doesn't detect |
130 | device removal events immediately. | 130 | device removal events immediately. |
131 | 131 | ||
132 | -EXDEV ISO transfer only partially completed | 132 | -EXDEV ISO transfer only partially completed |
133 | look at individual frame status for details | 133 | look at individual frame status for details |
134 | 134 | ||
135 | -EINVAL ISO madness, if this happens: Log off and go home | 135 | -EINVAL ISO madness, if this happens: Log off and go home |
136 | 136 | ||
137 | -ECONNRESET URB was asynchronously unlinked by usb_unlink_urb | 137 | -ECONNRESET URB was asynchronously unlinked by usb_unlink_urb |
138 | 138 | ||
139 | -ESHUTDOWN The device or host controller has been disabled due | 139 | -ESHUTDOWN The device or host controller has been disabled due |
140 | to some problem that could not be worked around, | 140 | to some problem that could not be worked around, |
141 | such as a physical disconnect. | 141 | such as a physical disconnect. |
142 | 142 | ||
143 | 143 | ||
144 | (*) Error codes like -EPROTO, -EILSEQ and -EOVERFLOW normally indicate | 144 | (*) Error codes like -EPROTO, -EILSEQ and -EOVERFLOW normally indicate |
145 | hardware problems such as bad devices (including firmware) or cables. | 145 | hardware problems such as bad devices (including firmware) or cables. |
146 | 146 | ||
147 | (**) This is also one of several codes that different kinds of host | 147 | (**) This is also one of several codes that different kinds of host |
148 | controller use to to indicate a transfer has failed because of device | 148 | controller use to indicate a transfer has failed because of device |
149 | disconnect. In the interval before the hub driver starts disconnect | 149 | disconnect. In the interval before the hub driver starts disconnect |
150 | processing, devices may receive such fault reports for every request. | 150 | processing, devices may receive such fault reports for every request. |
151 | 151 | ||
152 | 152 | ||
153 | 153 | ||
154 | ************************************************************************** | 154 | ************************************************************************** |
155 | * Error codes returned by usbcore-functions * | 155 | * Error codes returned by usbcore-functions * |
156 | * (expect also other submit and transfer status codes) * | 156 | * (expect also other submit and transfer status codes) * |
157 | ************************************************************************** | 157 | ************************************************************************** |
158 | 158 | ||
159 | usb_register(): | 159 | usb_register(): |
160 | -EINVAL error during registering new driver | 160 | -EINVAL error during registering new driver |
161 | 161 | ||
162 | usb_get_*/usb_set_*(): | 162 | usb_get_*/usb_set_*(): |
163 | usb_control_msg(): | 163 | usb_control_msg(): |
164 | usb_bulk_msg(): | 164 | usb_bulk_msg(): |
165 | -ETIMEDOUT Timeout expired before the transfer completed. | 165 | -ETIMEDOUT Timeout expired before the transfer completed. |
166 | 166 |
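The (*) and (**) footnotes in the urb->status table above group several codes by what they usually indicate. A minimal sketch of that grouping in C, assuming a Linux <errno.h> that defines all of the codes involved; the sk_ helper names are hypothetical and are not part of usbcore. It only restates the footnotes and the two unlink entries, and does not suggest a recovery policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Codes marked (*): normally a bad device, firmware, or cable. */
bool sk_status_suggests_hw_fault(int status)
{
	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -EOVERFLOW:
		return true;
	default:
		return false;
	}
}

/* Codes marked (**): may also be reported because of a disconnect. */
bool sk_status_may_mean_disconnect(int status)
{
	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
	case -EPIPE:
		return true;
	default:
		return false;
	}
}

/* URB was unlinked, synchronously (-ENOENT) or asynchronously (-ECONNRESET). */
bool sk_status_is_unlink(int status)
{
	return status == -ENOENT || status == -ECONNRESET;
}

/* Small self-check against the table: 0 means success, so no fault. */
void sk_status_self_check(void)
{
	assert(!sk_status_suggests_hw_fault(0));
	assert(!sk_status_may_mean_disconnect(0));
	assert(!sk_status_is_unlink(0));
}
```

As the table notes, a driver would only evaluate such checks on urb->status inside its completion handler, and -EPROTO or -EILSEQ fall in both groups because the controller hardware often cannot tell the cases apart.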
Documentation/usb/hiddev.txt
1 | Care and feeding of your Human Interface Devices | 1 | Care and feeding of your Human Interface Devices |
2 | 2 | ||
3 | INTRODUCTION | 3 | INTRODUCTION |
4 | 4 | ||
5 | In addition to the normal input type HID devices, USB also uses the | 5 | In addition to the normal input type HID devices, USB also uses the |
6 | human interface device protocols for things that are not really human | 6 | human interface device protocols for things that are not really human |
7 | interfaces, but have similar sorts of communication needs. The two big | 7 | interfaces, but have similar sorts of communication needs. The two big |
8 | examples for this are power devices (especially uninterruptible power | 8 | examples for this are power devices (especially uninterruptible power |
9 | supplies) and monitor control on higher end monitors. | 9 | supplies) and monitor control on higher end monitors. |
10 | 10 | ||
11 | To support these disparate requirements, the Linux USB system provides | 11 | To support these disparate requirements, the Linux USB system provides |
12 | HID events to two separate interfaces: | 12 | HID events to two separate interfaces: |
13 | * the input subsystem, which converts HID events into normal input | 13 | * the input subsystem, which converts HID events into normal input |
14 | device interfaces (such as keyboard, mouse and joystick) and a | 14 | device interfaces (such as keyboard, mouse and joystick) and a |
15 | normalised event interface - see Documentation/input/input.txt | 15 | normalised event interface - see Documentation/input/input.txt |
16 | * the hiddev interface, which provides fairly raw HID events | 16 | * the hiddev interface, which provides fairly raw HID events |
17 | 17 | ||
18 | The data flow for a HID event produced by a device is something like | 18 | The data flow for a HID event produced by a device is something like |
19 | the following : | 19 | the following : |
20 | 20 | ||
21 | usb.c ---> hid-core.c ----> hid-input.c ----> [keyboard/mouse/joystick/event] | 21 | usb.c ---> hid-core.c ----> hid-input.c ----> [keyboard/mouse/joystick/event] |
22 | | | 22 | | |
23 | | | 23 | | |
24 | --> hiddev.c ----> POWER / MONITOR CONTROL | 24 | --> hiddev.c ----> POWER / MONITOR CONTROL |
25 | 25 | ||
26 | In addition, other subsystems (apart from USB) can potentially feed | 26 | In addition, other subsystems (apart from USB) can potentially feed |
27 | events into the input subsystem, but these have no effect on the hid | 27 | events into the input subsystem, but these have no effect on the hid |
28 | device interface. | 28 | device interface. |
29 | 29 | ||
30 | USING THE HID DEVICE INTERFACE | 30 | USING THE HID DEVICE INTERFACE |
31 | 31 | ||
32 | The hiddev interface is a char interface using the normal USB major, | 32 | The hiddev interface is a char interface using the normal USB major, |
33 | with the minor numbers starting at 96 and finishing at 111. Therefore, | 33 | with the minor numbers starting at 96 and finishing at 111. Therefore, |
34 | you need the following commands: | 34 | you need the following commands: |
35 | mknod /dev/usb/hiddev0 c 180 96 | 35 | mknod /dev/usb/hiddev0 c 180 96 |
36 | mknod /dev/usb/hiddev1 c 180 97 | 36 | mknod /dev/usb/hiddev1 c 180 97 |
37 | mknod /dev/usb/hiddev2 c 180 98 | 37 | mknod /dev/usb/hiddev2 c 180 98 |
38 | mknod /dev/usb/hiddev3 c 180 99 | 38 | mknod /dev/usb/hiddev3 c 180 99 |
39 | mknod /dev/usb/hiddev4 c 180 100 | 39 | mknod /dev/usb/hiddev4 c 180 100 |
40 | mknod /dev/usb/hiddev5 c 180 101 | 40 | mknod /dev/usb/hiddev5 c 180 101 |
41 | mknod /dev/usb/hiddev6 c 180 102 | 41 | mknod /dev/usb/hiddev6 c 180 102 |
42 | mknod /dev/usb/hiddev7 c 180 103 | 42 | mknod /dev/usb/hiddev7 c 180 103 |
43 | mknod /dev/usb/hiddev8 c 180 104 | 43 | mknod /dev/usb/hiddev8 c 180 104 |
44 | mknod /dev/usb/hiddev9 c 180 105 | 44 | mknod /dev/usb/hiddev9 c 180 105 |
45 | mknod /dev/usb/hiddev10 c 180 106 | 45 | mknod /dev/usb/hiddev10 c 180 106 |
46 | mknod /dev/usb/hiddev11 c 180 107 | 46 | mknod /dev/usb/hiddev11 c 180 107 |
47 | mknod /dev/usb/hiddev12 c 180 108 | 47 | mknod /dev/usb/hiddev12 c 180 108 |
48 | mknod /dev/usb/hiddev13 c 180 109 | 48 | mknod /dev/usb/hiddev13 c 180 109 |
49 | mknod /dev/usb/hiddev14 c 180 110 | 49 | mknod /dev/usb/hiddev14 c 180 110 |
50 | mknod /dev/usb/hiddev15 c 180 111 | 50 | mknod /dev/usb/hiddev15 c 180 111 |
51 | 51 | ||
52 | So you point your hiddev compliant user-space program at the correct | 52 | So you point your hiddev compliant user-space program at the correct |
53 | interface for your device, and it all just works. | 53 | interface for your device, and it all just works. |
54 | 54 | ||
55 | Assuming that you have a hiddev compliant user-space program, of | 55 | Assuming that you have a hiddev compliant user-space program, of |
56 | course. If you need to write one, read on. | 56 | course. If you need to write one, read on. |
57 | 57 | ||
58 | 58 | ||
59 | THE HIDDEV API | 59 | THE HIDDEV API |
60 | This description should be read in conjunction with the HID | 60 | This description should be read in conjunction with the HID |
61 | specification, freely available from http://www.usb.org, and | 61 | specification, freely available from http://www.usb.org, and |
62 | conveniently linked from http://www.linux-usb.org. | 62 | conveniently linked from http://www.linux-usb.org. |
63 | 63 | ||
64 | The hiddev API uses a read() interface, and a set of ioctl() calls. | 64 | The hiddev API uses a read() interface, and a set of ioctl() calls. |
65 | 65 | ||
66 | HID devices exchange data with the host computer using data | 66 | HID devices exchange data with the host computer using data |
67 | bundles called "reports". Each report is divided into "fields", | 67 | bundles called "reports". Each report is divided into "fields", |
68 | each of which can have one or more "usages". In the hid-core, | 68 | each of which can have one or more "usages". In the hid-core, |
69 | each one of these usages has a single signed 32 bit value. | 69 | each one of these usages has a single signed 32 bit value. |
70 | 70 | ||
71 | read(): | 71 | read(): |
72 | This is the event interface. When the HID device's state changes, | 72 | This is the event interface. When the HID device's state changes, |
73 | it performs an interrupt transfer containing a report which contains | 73 | it performs an interrupt transfer containing a report which contains |
74 | the changed value. The hid-core.c module parses the report, and | 74 | the changed value. The hid-core.c module parses the report, and |
75 | returns to hiddev.c the individual usages that have changed within | 75 | returns to hiddev.c the individual usages that have changed within |
76 | the report. In its basic mode, the hiddev will make these individual | 76 | the report. In its basic mode, the hiddev will make these individual |
77 | usage changes available to the reader using a struct hiddev_event: | 77 | usage changes available to the reader using a struct hiddev_event: |
78 | 78 | ||
79 | struct hiddev_event { | 79 | struct hiddev_event { |
80 | unsigned hid; | 80 | unsigned hid; |
81 | signed int value; | 81 | signed int value; |
82 | }; | 82 | }; |
83 | 83 | ||
84 | containing the HID usage identifier for the status that changed, and | 84 | containing the HID usage identifier for the status that changed, and |
85 | the value that it was changed to. Note that the structure is defined | 85 | the value that it was changed to. Note that the structure is defined |
86 | within <linux/hiddev.h>, along with some other useful #defines and | 86 | within <linux/hiddev.h>, along with some other useful #defines and |
87 | structures. The HID usage identifier is a composite of the HID usage | 87 | structures. The HID usage identifier is a composite of the HID usage |
88 | page shifted to the 16 high order bits ORed with the usage code. The | 88 | page shifted to the 16 high order bits ORed with the usage code. The |
89 | behavior of the read() function can be modified using the HIDIOCSFLAG | 89 | behavior of the read() function can be modified using the HIDIOCSFLAG |
90 | ioctl() described below. | 90 | ioctl() described below. |
91 | 91 | ||
92 | 92 | ||
93 | ioctl(): | 93 | ioctl(): |
94 | This is the control interface. There are a number of controls: | 94 | This is the control interface. There are a number of controls: |
95 | 95 | ||
96 | HIDIOCGVERSION - int (read) | 96 | HIDIOCGVERSION - int (read) |
97 | Gets the version code out of the hiddev driver. | 97 | Gets the version code out of the hiddev driver. |
98 | 98 | ||
99 | HIDIOCAPPLICATION - (none) | 99 | HIDIOCAPPLICATION - (none) |
100 | This ioctl call returns the HID application usage associated with the | 100 | This ioctl call returns the HID application usage associated with the |
101 | hid device. The third argument to ioctl() specifies which application | 101 | hid device. The third argument to ioctl() specifies which application |
102 | index to get. This is useful when the device has more than one | 102 | index to get. This is useful when the device has more than one |
103 | application collection. If the index is invalid (greater or equal to | 103 | application collection. If the index is invalid (greater or equal to |
104 | the number of application collections this device has) the ioctl | 104 | the number of application collections this device has) the ioctl |
105 | returns -1. You can find out beforehand how many application | 105 | returns -1. You can find out beforehand how many application |
106 | collections the device has from the num_applications field from the | 106 | collections the device has from the num_applications field from the |
107 | hiddev_devinfo structure. | 107 | hiddev_devinfo structure. |
108 | 108 | ||
109 | HIDIOCGCOLLECTIONINFO - struct hiddev_collection_info (read/write) | 109 | HIDIOCGCOLLECTIONINFO - struct hiddev_collection_info (read/write) |
110 | This returns a superset of the information above, providing not only | 110 | This returns a superset of the information above, providing not only |
111 | application collections, but all the collections the device has. It | 111 | application collections, but all the collections the device has. It |
112 | also returns the level the collection lives in the hierarchy. | 112 | also returns the level the collection lives in the hierarchy. |
113 | The user passes in a hiddev_collection_info struct with the index | 113 | The user passes in a hiddev_collection_info struct with the index |
114 | field set to the index that should be returned. The ioctl fills in | 114 | field set to the index that should be returned. The ioctl fills in |
115 | the other fields. If the index is larger than the last collection | 115 | the other fields. If the index is larger than the last collection |
116 | index, the ioctl returns -1 and sets errno to EINVAL. | 116 | index, the ioctl returns -1 and sets errno to EINVAL. |
117 | 117 | ||
118 | HIDIOCGDEVINFO - struct hiddev_devinfo (read) | 118 | HIDIOCGDEVINFO - struct hiddev_devinfo (read) |
119 | Gets a hiddev_devinfo structure which describes the device. | 119 | Gets a hiddev_devinfo structure which describes the device. |
120 | 120 | ||
121 | HIDIOCGSTRING - struct struct hiddev_string_descriptor (read/write) | 121 | HIDIOCGSTRING - struct hiddev_string_descriptor (read/write) |
122 | Gets a string descriptor from the device. The caller must fill in the | 122 | Gets a string descriptor from the device. The caller must fill in the |
123 | "index" field to indicate which descriptor should be returned. | 123 | "index" field to indicate which descriptor should be returned. |
124 | 124 | ||
125 | HIDIOCINITREPORT - (none) | 125 | HIDIOCINITREPORT - (none) |
126 | Instructs the kernel to retrieve all input and feature report values | 126 | Instructs the kernel to retrieve all input and feature report values |
127 | from the device. At this point, all the usage structures will contain | 127 | from the device. At this point, all the usage structures will contain |
128 | current values for the device, and will maintain it as the device | 128 | current values for the device, and will maintain it as the device |
129 | changes. Note that the use of this ioctl is unnecessary in general, | 129 | changes. Note that the use of this ioctl is unnecessary in general, |
130 | since later kernels automatically initialize the reports from the | 130 | since later kernels automatically initialize the reports from the |
131 | device at attach time. | 131 | device at attach time. |
132 | 132 | ||
133 | HIDIOCGNAME - string (variable length) | 133 | HIDIOCGNAME - string (variable length) |
134 | Gets the device name | 134 | Gets the device name |
135 | 135 | ||
136 | HIDIOCGREPORT - struct hiddev_report_info (write) | 136 | HIDIOCGREPORT - struct hiddev_report_info (write) |
137 | Instructs the kernel to get a feature or input report from the device, | 137 | Instructs the kernel to get a feature or input report from the device, |
138 | in order to selectively update the usage structures (in contrast to | 138 | in order to selectively update the usage structures (in contrast to |
139 | INITREPORT). | 139 | INITREPORT). |
140 | 140 | ||
141 | HIDIOCSREPORT - struct hiddev_report_info (write) | 141 | HIDIOCSREPORT - struct hiddev_report_info (write) |
142 | Instructs the kernel to send a report to the device. This report can | 142 | Instructs the kernel to send a report to the device. This report can |
143 | be filled in by the user through HIDIOCSUSAGE calls (below) to fill in | 143 | be filled in by the user through HIDIOCSUSAGE calls (below) to fill in |
144 | individual usage values in the report before sending the report in full | 144 | individual usage values in the report before sending the report in full |
145 | to the device. | 145 | to the device. |
146 | 146 | ||
147 | HIDIOCGREPORTINFO - struct hiddev_report_info (read/write) | 147 | HIDIOCGREPORTINFO - struct hiddev_report_info (read/write) |
148 | Fills in a hiddev_report_info structure for the user. The report is | 148 | Fills in a hiddev_report_info structure for the user. The report is |
149 | looked up by type (input, output or feature) and id, so these fields | 149 | looked up by type (input, output or feature) and id, so these fields |
150 | must be filled in by the user. The ID can be absolute -- the actual | 150 | must be filled in by the user. The ID can be absolute -- the actual |
151 | report id as reported by the device -- or relative -- | 151 | report id as reported by the device -- or relative -- |
152 | HID_REPORT_ID_FIRST for the first report, and (HID_REPORT_ID_NEXT | | 152 | HID_REPORT_ID_FIRST for the first report, and (HID_REPORT_ID_NEXT | |
153 | report_id) for the next report after report_id. Without a-priori | 153 | report_id) for the next report after report_id. Without a-priori |
154 | information about report ids, the right way to use this ioctl is to | 154 | information about report ids, the right way to use this ioctl is to |
155 | use the relative IDs above to enumerate the valid IDs. The ioctl | 155 | use the relative IDs above to enumerate the valid IDs. The ioctl |
156 | returns non-zero when there is no more next ID. The real report ID is | 156 | returns non-zero when there is no more next ID. The real report ID is |
157 | filled into the returned hiddev_report_info structure. | 157 | filled into the returned hiddev_report_info structure. |
158 | 158 | ||
159 | HIDIOCGFIELDINFO - struct hiddev_field_info (read/write) | 159 | HIDIOCGFIELDINFO - struct hiddev_field_info (read/write) |
160 | Returns the field information associated with a report in a | 160 | Returns the field information associated with a report in a |
161 | hiddev_field_info structure. The user must fill in report_id and | 161 | hiddev_field_info structure. The user must fill in report_id and |
162 | report_type in this structure, as above. The field_index should also | 162 | report_type in this structure, as above. The field_index should also |
163 | be filled in, which should be a number from 0 and maxfield-1, as | 163 | be filled in, which should be a number from 0 and maxfield-1, as |
164 | returned from a previous HIDIOCGREPORTINFO call. | 164 | returned from a previous HIDIOCGREPORTINFO call. |
165 | 165 | ||
166 | HIDIOCGUCODE - struct hiddev_usage_ref (read/write) | 166 | HIDIOCGUCODE - struct hiddev_usage_ref (read/write) |
167 | Returns the usage_code in a hiddev_usage_ref structure, given that | 167 | Returns the usage_code in a hiddev_usage_ref structure, given that |
168 | given its report type, report id, field index, and index within the | 168 | given its report type, report id, field index, and index within the |
169 | field have already been filled into the structure. | 169 | field have already been filled into the structure. |
170 | 170 | ||
171 | HIDIOCGUSAGE - struct hiddev_usage_ref (read/write) | 171 | HIDIOCGUSAGE - struct hiddev_usage_ref (read/write) |
172 | Returns the value of a usage in a hiddev_usage_ref structure. The | 172 | Returns the value of a usage in a hiddev_usage_ref structure. The |
173 | usage to be retrieved can be specified as above, or the user can | 173 | usage to be retrieved can be specified as above, or the user can |
174 | choose to fill in the report_type field and specify the report_id as | 174 | choose to fill in the report_type field and specify the report_id as |
175 | HID_REPORT_ID_UNKNOWN. In this case, the hiddev_usage_ref will be | 175 | HID_REPORT_ID_UNKNOWN. In this case, the hiddev_usage_ref will be |
176 | filled in with the report and field information associated with this | 176 | filled in with the report and field information associated with this |
177 | usage if it is found. | 177 | usage if it is found. |
178 | 178 | ||
179 | HIDIOCSUSAGE - struct hiddev_usage_ref (write) | 179 | HIDIOCSUSAGE - struct hiddev_usage_ref (write) |
180 | Sets the value of a usage in an output report. The user fills in | 180 | Sets the value of a usage in an output report. The user fills in |
181 | the hiddev_usage_ref structure as above, but additionally fills in | 181 | the hiddev_usage_ref structure as above, but additionally fills in |
182 | the value field. | 182 | the value field. |
183 | 183 | ||
184 | HIDIOGCOLLECTIONINDEX - struct hiddev_usage_ref (write) | 184 | HIDIOGCOLLECTIONINDEX - struct hiddev_usage_ref (write) |
185 | Returns the collection index associated with this usage. This | 185 | Returns the collection index associated with this usage. This |
186 | indicates where in the collection hierarchy this usage sits. | 186 | indicates where in the collection hierarchy this usage sits. |
187 | 187 | ||
188 | HIDIOCGFLAG - int (read) | 188 | HIDIOCGFLAG - int (read) |
189 | HIDIOCSFLAG - int (write) | 189 | HIDIOCSFLAG - int (write) |
190 | These operations respectively inspect and replace the mode flags | 190 | These operations respectively inspect and replace the mode flags |
191 | that influence the read() call above. The flags are as follows: | 191 | that influence the read() call above. The flags are as follows: |
192 | 192 | ||
193 | HIDDEV_FLAG_UREF - read() calls will now return | 193 | HIDDEV_FLAG_UREF - read() calls will now return |
194 | struct hiddev_usage_ref instead of struct hiddev_event. | 194 | struct hiddev_usage_ref instead of struct hiddev_event. |
195 | This is a larger structure, but in situations where the | 195 | This is a larger structure, but in situations where the |
196 | device has more than one usage in its reports with the | 196 | device has more than one usage in its reports with the |
197 | same usage code, this mode serves to resolve such | 197 | same usage code, this mode serves to resolve such |
198 | ambiguity. | 198 | ambiguity. |
199 | 199 | ||
200 | HIDDEV_FLAG_REPORT - This flag can only be used in conjunction | 200 | HIDDEV_FLAG_REPORT - This flag can only be used in conjunction |
201 | with HIDDEV_FLAG_UREF. With this flag set, when the device | 201 | with HIDDEV_FLAG_UREF. With this flag set, when the device |
202 | sends a report, a struct hiddev_usage_ref will be returned | 202 | sends a report, a struct hiddev_usage_ref will be returned |
203 | to read() filled in with the report_type and report_id, but | 203 | to read() filled in with the report_type and report_id, but |
204 | with field_index set to FIELD_INDEX_NONE. This serves as | 204 | with field_index set to FIELD_INDEX_NONE. This serves as |
205 | additional notification when the device has sent a report. | 205 | additional notification when the device has sent a report. |
206 | 206 |
Documentation/usb/usb-serial.txt
1 | INTRODUCTION | 1 | INTRODUCTION |
2 | 2 | ||
3 | The USB serial driver currently supports a number of different USB to | 3 | The USB serial driver currently supports a number of different USB to |
4 | serial converter products, as well as some devices that use a serial | 4 | serial converter products, as well as some devices that use a serial |
5 | interface from userspace to talk to the device. | 5 | interface from userspace to talk to the device. |
6 | 6 | ||
7 | See the individual product section below for specific information about | 7 | See the individual product section below for specific information about |
8 | the different devices. | 8 | the different devices. |
9 | 9 | ||
10 | 10 | ||
11 | CONFIGURATION | 11 | CONFIGURATION |
12 | 12 | ||
13 | Currently the driver can handle up to 256 different serial interfaces at | 13 | Currently the driver can handle up to 256 different serial interfaces at |
14 | one time. | 14 | one time. |
15 | 15 | ||
16 | The major number that the driver uses is 188 so to use the driver, | 16 | The major number that the driver uses is 188 so to use the driver, |
17 | create the following nodes: | 17 | create the following nodes: |
18 | mknod /dev/ttyUSB0 c 188 0 | 18 | mknod /dev/ttyUSB0 c 188 0 |
19 | mknod /dev/ttyUSB1 c 188 1 | 19 | mknod /dev/ttyUSB1 c 188 1 |
20 | mknod /dev/ttyUSB2 c 188 2 | 20 | mknod /dev/ttyUSB2 c 188 2 |
21 | mknod /dev/ttyUSB3 c 188 3 | 21 | mknod /dev/ttyUSB3 c 188 3 |
22 | . | 22 | . |
23 | . | 23 | . |
24 | . | 24 | . |
25 | mknod /dev/ttyUSB254 c 188 254 | 25 | mknod /dev/ttyUSB254 c 188 254 |
26 | mknod /dev/ttyUSB255 c 188 255 | 26 | mknod /dev/ttyUSB255 c 188 255 |
27 | 27 | ||
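The node list above follows a simple pattern: minor number N maps to /dev/ttyUSBN under major 188. As a minimal sketch (not part of the kernel documentation; the function name and constants are illustrative assumptions), the full set of commands could be generated rather than typed by hand:

```python
# Sketch only: prints the mknod commands described in the text.
# It does not create any device nodes itself.
USB_SERIAL_MAJOR = 188  # major number stated in the text
MAX_PORTS = 256         # "up to 256 different serial interfaces"


def mknod_commands(count=MAX_PORTS, major=USB_SERIAL_MAJOR):
    """Return mknod command lines for /dev/ttyUSB0 .. /dev/ttyUSB<count-1>."""
    return [
        f"mknod /dev/ttyUSB{minor} c {major} {minor}"
        for minor in range(count)
    ]


if __name__ == "__main__":
    print("\n".join(mknod_commands()))
```

The output can be reviewed and then piped to a root shell. On systems where device nodes are created automatically, this step is presumably unnecessary.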
28 | When the device is connected and recognized by the driver, the driver | 28 | When the device is connected and recognized by the driver, the driver |
29 | will print to the system log, which node(s) the device has been bound | 29 | will print to the system log, which node(s) the device has been bound |
30 | to. | 30 | to. |
31 | 31 | ||
32 | 32 | ||
33 | SPECIFIC DEVICES SUPPORTED | 33 | SPECIFIC DEVICES SUPPORTED |
34 | 34 | ||
35 | 35 | ||
36 | ConnectTech WhiteHEAT 4 port converter | 36 | ConnectTech WhiteHEAT 4 port converter |
37 | 37 | ||
38 | ConnectTech has been very forthcoming with information about their | 38 | ConnectTech has been very forthcoming with information about their |
39 | device, including providing a unit to test with. | 39 | device, including providing a unit to test with. |
40 | 40 | ||
41 | The driver is officially supported by Connect Tech Inc. | 41 | The driver is officially supported by Connect Tech Inc. |
42 | http://www.connecttech.com | 42 | http://www.connecttech.com |
43 | 43 | ||
44 | For any questions or problems with this driver, please contact | 44 | For any questions or problems with this driver, please contact |
45 | Stuart MacDonald at stuartm@connecttech.com | 45 | Stuart MacDonald at stuartm@connecttech.com |
46 | 46 | ||
47 | 47 | ||
48 | HandSpring Visor, Palm USB, and Clié USB driver | 48 | HandSpring Visor, Palm USB, and Clié USB driver |
49 | 49 | ||
50 | This driver works with all HandSpring USB, Palm USB, and Sony Clié USB | 50 | This driver works with all HandSpring USB, Palm USB, and Sony Clié USB |
51 | devices. | 51 | devices. |
52 | 52 | ||
53 | Only when the device tries to connect to the host, will the device show | 53 | Only when the device tries to connect to the host, will the device show |
54 | up to the host as a valid USB device. When this happens, the device is | 54 | up to the host as a valid USB device. When this happens, the device is |
55 | properly enumerated, assigned a port, and then communication _should_ be | 55 | properly enumerated, assigned a port, and then communication _should_ be |
56 | possible. The driver cleans up properly when the device is removed, or | 56 | possible. The driver cleans up properly when the device is removed, or |
57 | the connection is canceled on the device. | 57 | the connection is canceled on the device. |
58 | 58 | ||
59 | NOTE: | 59 | NOTE: |
60 | This means that in order to talk to the device, the sync button must be | 60 | This means that in order to talk to the device, the sync button must be |
61 | pressed BEFORE trying to get any program to communicate to the device. | 61 | pressed BEFORE trying to get any program to communicate to the device. |
62 | This goes against the current documentation for pilot-xfer and other | 62 | This goes against the current documentation for pilot-xfer and other |
63 | packages, but is the only way that it will work due to the hardware | 63 | packages, but is the only way that it will work due to the hardware |
64 | in the device. | 64 | in the device. |
65 | 65 | ||
66 | When the device is connected, try talking to it on the second port | 66 | When the device is connected, try talking to it on the second port |
67 | (this is usually /dev/ttyUSB1 if you do not have any other usb-serial | 67 | (this is usually /dev/ttyUSB1 if you do not have any other usb-serial |
68 | devices in the system.) The system log should tell you which port is | 68 | devices in the system.) The system log should tell you which port is |
69 | the port to use for the HotSync transfer. The "Generic" port can be used | 69 | the port to use for the HotSync transfer. The "Generic" port can be used |
70 | for other device communication, such as a PPP link. | 70 | for other device communication, such as a PPP link. |
71 | 71 | ||
72 | For some Sony Clié devices, /dev/ttyUSB0 must be used to talk to the | 72 | For some Sony Clié devices, /dev/ttyUSB0 must be used to talk to the |
73 | device. This is true for all OS version 3.5 devices, and most devices | 73 | device. This is true for all OS version 3.5 devices, and most devices |
74 | that have had a flash upgrade to a newer version of the OS. See the | 74 | that have had a flash upgrade to a newer version of the OS. See the |
75 | kernel system log for information on which is the correct port to use. | 75 | kernel system log for information on which is the correct port to use. |
76 | 76 | ||
77 | If after pressing the sync button, nothing shows up in the system log, | 77 | If after pressing the sync button, nothing shows up in the system log, |
78 | try resetting the device, first a hot reset, and then a cold reset if | 78 | try resetting the device, first a hot reset, and then a cold reset if |
79 | necessary. Some devices need this before they can talk to the USB port | 79 | necessary. Some devices need this before they can talk to the USB port |
80 | properly. | 80 | properly. |
81 | 81 | ||
82 | Devices that are not compiled into the kernel can be specified with module | 82 | Devices that are not compiled into the kernel can be specified with module |
83 | parameters. e.g. modprobe visor vendor=0x54c product=0x66 | 83 | parameters. e.g. modprobe visor vendor=0x54c product=0x66 |
84 | 84 | ||
85 | There is a webpage and mailing lists for this portion of the driver at: | 85 | There is a webpage and mailing lists for this portion of the driver at: |
86 | http://usbvisor.sourceforge.net/ | 86 | http://usbvisor.sourceforge.net/ |
87 | 87 | ||
88 | For any questions or problems with this driver, please contact Greg | 88 | For any questions or problems with this driver, please contact Greg |
89 | Kroah-Hartman at greg@kroah.com | 89 | Kroah-Hartman at greg@kroah.com |
90 | 90 | ||
91 | 91 | ||
92 | PocketPC PDA Driver | 92 | PocketPC PDA Driver |
93 | 93 | ||
94 | This driver can be used to connect to Compaq iPAQ, HP Jornada, Casio EM500 | 94 | This driver can be used to connect to Compaq iPAQ, HP Jornada, Casio EM500 |
95 | and other PDAs running Windows CE 3.0 or PocketPC 2002 using a USB | 95 | and other PDAs running Windows CE 3.0 or PocketPC 2002 using a USB |
96 | cable/cradle. | 96 | cable/cradle. |
97 | Most devices supported by ActiveSync are supported out of the box. | 97 | Most devices supported by ActiveSync are supported out of the box. |
98 | For others, please use module parameters to specify the product and vendor | 98 | For others, please use module parameters to specify the product and vendor |
99 | id. e.g. modprobe ipaq vendor=0x3f0 product=0x1125 | 99 | id. e.g. modprobe ipaq vendor=0x3f0 product=0x1125 |
100 | 100 | ||
101 | The driver presents a serial interface (usually on /dev/ttyUSB0) over | 101 | The driver presents a serial interface (usually on /dev/ttyUSB0) over |
102 | which one may run ppp and establish a TCP/IP link to the PDA. Once this | 102 | which one may run ppp and establish a TCP/IP link to the PDA. Once this |
103 | is done, you can transfer files, backup, download email etc. The most | 103 | is done, you can transfer files, backup, download email etc. The most |
104 | significant advantage of using USB is speed - I can get 73 to 113 | 104 | significant advantage of using USB is speed - I can get 73 to 113 |
105 | kbytes/sec for download/upload to my iPAQ. | 105 | kbytes/sec for download/upload to my iPAQ. |
106 | 106 | ||
107 | This driver is only one of a set of components required to utilize | 107 | This driver is only one of a set of components required to utilize |
108 | the USB connection. Please visit http://synce.sourceforge.net which | 108 | the USB connection. Please visit http://synce.sourceforge.net which |
109 | contains the necessary packages and a simple step-by-step howto. | 109 | contains the necessary packages and a simple step-by-step howto. |
110 | 110 | ||
111 | Once connected, you can use Win CE programs like ftpView, Pocket Outlook | 111 | Once connected, you can use Win CE programs like ftpView, Pocket Outlook |
112 | from the PDA and xcerdisp, synce utilities from the Linux side. | 112 | from the PDA and xcerdisp, synce utilities from the Linux side. |
113 | 113 | ||
114 | To use Pocket IE, follow the instructions given at | 114 | To use Pocket IE, follow the instructions given at |
115 | http://www.tekguru.co.uk/EM500/usbtonet.htm to achieve the same thing | 115 | http://www.tekguru.co.uk/EM500/usbtonet.htm to achieve the same thing |
116 | on Win98. Omit the proxy server part; Linux is quite capable of forwarding | 116 | on Win98. Omit the proxy server part; Linux is quite capable of forwarding |
117 | packets unlike Win98. Another modification is required at least for the | 117 | packets unlike Win98. Another modification is required at least for the |
118 | iPAQ - disable autosync by going to the Start/Settings/Connections menu | 118 | iPAQ - disable autosync by going to the Start/Settings/Connections menu |
119 | and unchecking the "Automatically synchronize ..." box. Go to | 119 | and unchecking the "Automatically synchronize ..." box. Go to |
120 | Start/Programs/Connections, connect the cable and select "usbdial" (or | 120 | Start/Programs/Connections, connect the cable and select "usbdial" (or |
121 | whatever you named your new USB connection). You should finally wind | 121 | whatever you named your new USB connection). You should finally wind |
122 | up with a "Connected to usbdial" window with status shown as connected. | 122 | up with a "Connected to usbdial" window with status shown as connected. |
123 | Now start up PIE and browse away. | 123 | Now start up PIE and browse away. |
124 | 124 | ||
125 | If it doesn't work for some reason, load both the usbserial and ipaq module | 125 | If it doesn't work for some reason, load both the usbserial and ipaq module |
126 | with the module parameter "debug" set to 1 and examine the system log. | 126 | with the module parameter "debug" set to 1 and examine the system log. |
127 | You can also try soft-resetting your PDA before attempting a connection. | 127 | You can also try soft-resetting your PDA before attempting a connection. |
128 | 128 | ||
129 | Other functionality may be possible depending on your PDA. According to | 129 | Other functionality may be possible depending on your PDA. According to |
130 | Wes Cilldhaire <billybobjoehenrybob@hotmail.com>, with the Toshiba E570, | 130 | Wes Cilldhaire <billybobjoehenrybob@hotmail.com>, with the Toshiba E570, |
131 | ...if you boot into the bootloader (hold down the power when hitting the | 131 | ...if you boot into the bootloader (hold down the power when hitting the |
132 | reset button, continuing to hold onto the power until the bootloader screen | 132 | reset button, continuing to hold onto the power until the bootloader screen |
133 | is displayed), then put it in the cradle with the ipaq driver loaded, open | 133 | is displayed), then put it in the cradle with the ipaq driver loaded, open |
134 | a terminal on /dev/ttyUSB0, it gives you a "USB Reflash" terminal, which can | 134 | a terminal on /dev/ttyUSB0, it gives you a "USB Reflash" terminal, which can |
135 | be used to flash the ROM, as well as the microP code.. so much for needing | 135 | be used to flash the ROM, as well as the microP code.. so much for needing |
136 | Toshiba's $350 serial cable for flashing!! :D | 136 | Toshiba's $350 serial cable for flashing!! :D |
137 | NOTE: This has NOT been tested. Use at your own risk. | 137 | NOTE: This has NOT been tested. Use at your own risk. |
138 | 138 | ||
139 | For any questions or problems with the driver, please contact Ganesh | 139 | For any questions or problems with the driver, please contact Ganesh |
140 | Varadarajan <ganesh@veritas.com> | 140 | Varadarajan <ganesh@veritas.com> |
141 | 141 | ||
142 | 142 | ||
143 | Keyspan PDA Serial Adapter | 143 | Keyspan PDA Serial Adapter |
144 | 144 | ||
145 | Single port DB-9 serial adapter, pushed as a PDA adapter for iMacs (mostly | 145 | Single port DB-9 serial adapter, pushed as a PDA adapter for iMacs (mostly |
146 | sold in Macintosh catalogs, comes in a translucent white/green dongle). | 146 | sold in Macintosh catalogs, comes in a translucent white/green dongle). |
147 | Fairly simple device. Firmware is homebrew. | 147 | Fairly simple device. Firmware is homebrew. |
148 | This driver also works for the Xircom/Entrgra single port serial adapter. | 148 | This driver also works for the Xircom/Entrgra single port serial adapter. |
149 | 149 | ||
150 | Current status: | 150 | Current status: |
151 | Things that work: | 151 | Things that work: |
152 | basic input/output (tested with 'cu') | 152 | basic input/output (tested with 'cu') |
153 | blocking write when serial line can't keep up | 153 | blocking write when serial line can't keep up |
154 | changing baud rates (up to 115200) | 154 | changing baud rates (up to 115200) |
155 | getting/setting modem control pins (TIOCM{GET,SET,BIS,BIC}) | 155 | getting/setting modem control pins (TIOCM{GET,SET,BIS,BIC}) |
156 | sending break (although duration looks suspect) | 156 | sending break (although duration looks suspect) |
157 | Things that don't: | 157 | Things that don't: |
158 | device strings (as logged by kernel) have trailing binary garbage | 158 | device strings (as logged by kernel) have trailing binary garbage |
159 | device ID isn't right, might collide with other Keyspan products | 159 | device ID isn't right, might collide with other Keyspan products |
160 | changing baud rates ought to flush tx/rx to avoid mangled half characters | 160 | changing baud rates ought to flush tx/rx to avoid mangled half characters |
161 | Big Things on the todo list: | 161 | Big Things on the todo list: |
162 | parity, 7 vs 8 bits per char, 1 or 2 stop bits | 162 | parity, 7 vs 8 bits per char, 1 or 2 stop bits |
163 | HW flow control | 163 | HW flow control |
164 | not all of the standard USB descriptors are handled: Get_Status, Set_Feature | 164 | not all of the standard USB descriptors are handled: Get_Status, Set_Feature |
165 | O_NONBLOCK, select() | 165 | O_NONBLOCK, select() |
166 | 166 | ||
167 | For any questions or problems with this driver, please contact Brian | 167 | For any questions or problems with this driver, please contact Brian |
168 | Warner at warner@lothar.com | 168 | Warner at warner@lothar.com |
169 | 169 | ||
170 | 170 | ||
171 | Keyspan USA-series Serial Adapters | 171 | Keyspan USA-series Serial Adapters |
172 | 172 | ||
173 | Single, Dual and Quad port adapters - driver uses Keyspan supplied | 173 | Single, Dual and Quad port adapters - driver uses Keyspan supplied |
174 | firmware and is being developed with their support. | 174 | firmware and is being developed with their support. |
175 | 175 | ||
176 | Current status: | 176 | Current status: |
177 | The USA-18X, USA-28X, USA-19, USA-19W and USA-49W are supported and | 177 | The USA-18X, USA-28X, USA-19, USA-19W and USA-49W are supported and |
178 | have been pretty throughly tested at various baud rates with 8-N-1 | 178 | have been pretty throughly tested at various baud rates with 8-N-1 |
179 | character settings. Other character lengths and parity setups are | 179 | character settings. Other character lengths and parity setups are |
180 | presently untested. | 180 | presently untested. |
181 | 181 | ||
182 | The USA-28 isn't yet supported though doing so should be pretty | 182 | The USA-28 isn't yet supported though doing so should be pretty |
183 | straightforward. Contact the maintainer if you require this | 183 | straightforward. Contact the maintainer if you require this |
184 | functionality. | 184 | functionality. |
185 | 185 | ||
186 | More information is available at: | 186 | More information is available at: |
187 | http://misc.nu/hugh/keyspan.html | 187 | http://misc.nu/hugh/keyspan.html |
188 | 188 | ||
189 | For any questions or problems with this driver, please contact Hugh | 189 | For any questions or problems with this driver, please contact Hugh |
190 | Blemings at hugh@misc.nu | 190 | Blemings at hugh@misc.nu |
191 | 191 | ||
192 | 192 | ||
193 | FTDI Single Port Serial Driver | 193 | FTDI Single Port Serial Driver |
194 | 194 | ||
195 | This is a single port DB-25 serial adapter. More information about this | 195 | This is a single port DB-25 serial adapter. More information about this |
196 | device and the Linux driver can be found at: | 196 | device and the Linux driver can be found at: |
197 | http://reality.sgi.com/bryder_wellington/ftdi_sio/ | 197 | http://reality.sgi.com/bryder_wellington/ftdi_sio/ |
198 | 198 | ||
199 | For any questions or problems with this driver, please contact Bill Ryder | 199 | For any questions or problems with this driver, please contact Bill Ryder |
200 | at bryder@sgi.com | 200 | at bryder@sgi.com |
201 | 201 | ||
202 | 202 | ||
203 | ZyXEL omni.net lcd plus ISDN TA | 203 | ZyXEL omni.net lcd plus ISDN TA |
204 | 204 | ||
205 | This is an ISDN TA. Please report both successes and troubles to | 205 | This is an ISDN TA. Please report both successes and troubles to |
206 | azummo@towertech.it | 206 | azummo@towertech.it |
207 | 207 | ||
208 | 208 | ||
209 | Cypress M8 CY4601 Family Serial Driver | 209 | Cypress M8 CY4601 Family Serial Driver |
210 | 210 | ||
211 | This driver was in most part developed by Neil "koyama" Whelchel. It | 211 | This driver was in most part developed by Neil "koyama" Whelchel. It |
212 | has been improved since that previous form to support dynamic serial | 212 | has been improved since that previous form to support dynamic serial |
213 | line settings and improved line handling. The driver is for the most | 213 | line settings and improved line handling. The driver is for the most |
214 | part stable and has been tested on an smp machine. (dual p2) | 214 | part stable and has been tested on an smp machine. (dual p2) |
215 | 215 | ||
216 | Chipsets supported under CY4601 family: | 216 | Chipsets supported under CY4601 family: |
217 | 217 | ||
218 | CY7C63723, CY7C63742, CY7C63743, CY7C64013 | 218 | CY7C63723, CY7C63742, CY7C63743, CY7C64013 |
219 | 219 | ||
220 | Devices supported: | 220 | Devices supported: |
221 | 221 | ||
222 | -DeLorme's USB Earthmate (SiRF Star II lp arch) | 222 | -DeLorme's USB Earthmate (SiRF Star II lp arch) |
223 | -Cypress HID->COM RS232 adapter | 223 | -Cypress HID->COM RS232 adapter |
224 | 224 | ||
225 | Note: Cypress Semiconductor claims no affiliation with the | 225 | Note: Cypress Semiconductor claims no affiliation with the |
226 | the hid->com device. | 226 | hid->com device. |
227 | 227 | ||
228 | Most devices using chipsets under the CY4601 family should | 228 | Most devices using chipsets under the CY4601 family should |
229 | work with the driver. As long as they stay true to the CY4601 | 229 | work with the driver. As long as they stay true to the CY4601 |
230 | usbserial specification. | 230 | usbserial specification. |
231 | 231 | ||
232 | Technical notes: | 232 | Technical notes: |
233 | 233 | ||
234 | The Earthmate starts out at 4800 8N1 by default... the driver will | 234 | The Earthmate starts out at 4800 8N1 by default... the driver will |
235 | upon start init to this setting. usbserial core provides the rest | 235 | upon start init to this setting. usbserial core provides the rest |
236 | of the termios settings, along with some custom termios so that the | 236 | of the termios settings, along with some custom termios so that the |
237 | output is in proper format and parsable. | 237 | output is in proper format and parsable. |
238 | 238 | ||
239 | The device can be put into sirf mode by issuing NMEA command: | 239 | The device can be put into sirf mode by issuing NMEA command: |
240 | $PSRF100,<protocol>,<baud>,<databits>,<stopbits>,<parity>*CHECKSUM | 240 | $PSRF100,<protocol>,<baud>,<databits>,<stopbits>,<parity>*CHECKSUM |
241 | $PSRF100,0,9600,8,1,0*0C | 241 | $PSRF100,0,9600,8,1,0*0C |
242 | 242 | ||
243 | It should then be sufficient to change the port termios to match this | 243 | It should then be sufficient to change the port termios to match this |
244 | to begin communicating. | 244 | to begin communicating. |
245 | 245 | ||
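The CHECKSUM field in the $PSRF100 command above is the standard NMEA checksum: the XOR of every character between "$" and "*", written as two hexadecimal digits. A minimal sketch, assuming only that rule (the helper names are illustrative and not part of the driver or its documentation):

```python
from functools import reduce


def nmea_checksum(payload: str) -> str:
    """XOR all characters of the payload (the text between '$' and '*')."""
    value = reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)
    return f"{value:02X}"


def psrf100(protocol: int, baud: int, databits: int,
            stopbits: int, parity: int) -> str:
    """Build a $PSRF100 sentence with the field order shown in the text."""
    payload = f"PSRF100,{protocol},{baud},{databits},{stopbits},{parity}"
    return f"${payload}*{nmea_checksum(payload)}"


if __name__ == "__main__":
    # Reproduces the example sentence from the text.
    print(psrf100(0, 9600, 8, 1, 0))
```

Called with the values from the text, this yields the same sentence, including the 0C checksum. NMEA sentences are normally terminated with CR/LF when written to the port; that terminator is left to the caller here.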
246 | As far as I can tell it supports pretty much every sirf command as | 246 | As far as I can tell it supports pretty much every sirf command as |
247 | documented online available with firmware 2.31, with some unknown | 247 | documented online available with firmware 2.31, with some unknown |
248 | message ids. | 248 | message ids. |
249 | 249 | ||
250 | The hid->com adapter can run at a maximum baud of 115200bps. Please note | 250 | The hid->com adapter can run at a maximum baud of 115200bps. Please note |
251 | that the device has trouble or is incapable of raising line voltage properly. | 251 | that the device has trouble or is incapable of raising line voltage properly. |
252 | It will be fine with null modem links, as long as you do not try to link two | 252 | It will be fine with null modem links, as long as you do not try to link two |
253 | together without hacking the adapter to set the line high. | 253 | together without hacking the adapter to set the line high. |
254 | 254 | ||
255 | The driver is smp safe. Performance with the driver is rather low when using | 255 | The driver is smp safe. Performance with the driver is rather low when using |
256 | it for transfering files. This is being worked on, but I would be willing to | 256 | it for transfering files. This is being worked on, but I would be willing to |
257 | accept patches. An urb queue or packet buffer would likely fit the bill here. | 257 | accept patches. An urb queue or packet buffer would likely fit the bill here. |
258 | 258 | ||
259 | If you have any questions, problems, patches, feature requests, etc. you can | 259 | If you have any questions, problems, patches, feature requests, etc. you can |
260 | contact me here via email: | 260 | contact me here via email: |
261 | dignome@gmail.com | 261 | dignome@gmail.com |
262 | (your problems/patches can alternately be submitted to usb-devel) | 262 | (your problems/patches can alternately be submitted to usb-devel) |
263 | 263 | ||
264 | 264 | ||
265 | Digi AccelePort Driver | 265 | Digi AccelePort Driver |
266 | 266 | ||
267 | This driver supports the Digi AccelePort USB 2 and 4 devices, 2 port | 267 | This driver supports the Digi AccelePort USB 2 and 4 devices, 2 port |
268 | (plus a parallel port) and 4 port USB serial converters. The driver | 268 | (plus a parallel port) and 4 port USB serial converters. The driver |
269 | does NOT yet support the Digi AccelePort USB 8. | 269 | does NOT yet support the Digi AccelePort USB 8. |
270 | 270 | ||
271 | This driver works under SMP with the usb-uhci driver. It does not | 271 | This driver works under SMP with the usb-uhci driver. It does not |
272 | work under SMP with the uhci driver. | 272 | work under SMP with the uhci driver. |
273 | 273 | ||
274 | The driver is generally working, though we still have a few more ioctls | 274 | The driver is generally working, though we still have a few more ioctls |
275 | to implement and final testing and debugging to do. The parallel port | 275 | to implement and final testing and debugging to do. The parallel port |
276 | on the USB 2 is supported as a serial to parallel converter; in other | 276 | on the USB 2 is supported as a serial to parallel converter; in other |
277 | words, it appears as another USB serial port on Linux, even though | 277 | words, it appears as another USB serial port on Linux, even though |
278 | physically it is really a parallel port. The Digi Acceleport USB 8 | 278 | physically it is really a parallel port. The Digi Acceleport USB 8 |
279 | is not yet supported. | 279 | is not yet supported. |
280 | 280 | ||
281 | Please contact Peter Berger (pberger@brimson.com) or Al Borchers | 281 | Please contact Peter Berger (pberger@brimson.com) or Al Borchers |
282 | (alborchers@steinerpoint.com) for questions or problems with this | 282 | (alborchers@steinerpoint.com) for questions or problems with this |
283 | driver. | 283 | driver. |
284 | 284 | ||
285 | 285 | ||
286 | Belkin USB Serial Adapter F5U103 | 286 | Belkin USB Serial Adapter F5U103 |
287 | 287 | ||
288 | Single port DB-9/PS-2 serial adapter from Belkin with firmware by eTEK Labs. | 288 | Single port DB-9/PS-2 serial adapter from Belkin with firmware by eTEK Labs. |
289 | The Peracom single port serial adapter also works with this driver, as | 289 | The Peracom single port serial adapter also works with this driver, as |
290 | well as the GoHubs adapter. | 290 | well as the GoHubs adapter. |
291 | 291 | ||
292 | Current status: | 292 | Current status: |
293 | The following have been tested and work: | 293 | The following have been tested and work: |
294 | Baud rate 300-230400 | 294 | Baud rate 300-230400 |
295 | Data bits 5-8 | 295 | Data bits 5-8 |
296 | Stop bits 1-2 | 296 | Stop bits 1-2 |
297 | Parity N,E,O,M,S | 297 | Parity N,E,O,M,S |
298 | Handshake None, Software (XON/XOFF), Hardware (CTSRTS,CTSDTR)* | 298 | Handshake None, Software (XON/XOFF), Hardware (CTSRTS,CTSDTR)* |
299 | Break Set and clear | 299 | Break Set and clear |
300 | Line contrl Input/Output query and control ** | 300 | Line contrl Input/Output query and control ** |
301 | 301 | ||
302 | * Hardware input flow control is only enabled for firmware | 302 | * Hardware input flow control is only enabled for firmware |
303 | levels above 2.06. Read source code comments describing Belkin | 303 | levels above 2.06. Read source code comments describing Belkin |
304 | firmware errata. Hardware output flow control is working for all | 304 | firmware errata. Hardware output flow control is working for all |
305 | firmware versions. | 305 | firmware versions. |
306 | ** Queries of inputs (CTS,DSR,CD,RI) show the last | 306 | ** Queries of inputs (CTS,DSR,CD,RI) show the last |
307 | reported state. Queries of outputs (DTR,RTS) show the last | 307 | reported state. Queries of outputs (DTR,RTS) show the last |
308 | requested state and may not reflect current state as set by | 308 | requested state and may not reflect current state as set by |
309 | automatic hardware flow control. | 309 | automatic hardware flow control. |
310 | 310 | ||
311 | TO DO List: | 311 | TO DO List: |
312 | -- Add true modem contol line query capability. Currently tracks the | 312 | -- Add true modem contol line query capability. Currently tracks the |
313 | states reported by the interrupt and the states requested. | 313 | states reported by the interrupt and the states requested. |
314 | -- Add error reporting back to application for UART error conditions. | 314 | -- Add error reporting back to application for UART error conditions. |
315 | -- Add support for flush ioctls. | 315 | -- Add support for flush ioctls. |
316 | -- Add everything else that is missing :) | 316 | -- Add everything else that is missing :) |
317 | 317 | ||
318 | For any questions or problems with this driver, please contact William | 318 | For any questions or problems with this driver, please contact William |
319 | Greathouse at wgreathouse@smva.com | 319 | Greathouse at wgreathouse@smva.com |
320 | 320 | ||
321 | 321 | ||
322 | Empeg empeg-car Mark I/II Driver | 322 | Empeg empeg-car Mark I/II Driver |
323 | 323 | ||
324 | This is an experimental driver to provide connectivity support for the | 324 | This is an experimental driver to provide connectivity support for the |
325 | client synchronization tools for an Empeg empeg-car mp3 player. | 325 | client synchronization tools for an Empeg empeg-car mp3 player. |
326 | 326 | ||
327 | Tips: | 327 | Tips: |
328 | * Don't forget to create the device nodes for ttyUSB{0,1,2,...} | 328 | * Don't forget to create the device nodes for ttyUSB{0,1,2,...} |
329 | * modprobe empeg (modprobe is your friend) | 329 | * modprobe empeg (modprobe is your friend) |
330 | * emptool --usb /dev/ttyUSB0 (or whatever you named your device node) | 330 | * emptool --usb /dev/ttyUSB0 (or whatever you named your device node) |
331 | 331 | ||
332 | For any questions or problems with this driver, please contact Gary | 332 | For any questions or problems with this driver, please contact Gary |
333 | Brubaker at xavyer@ix.netcom.com | 333 | Brubaker at xavyer@ix.netcom.com |
334 | 334 | ||
335 | 335 | ||
336 | MCT USB Single Port Serial Adapter U232 | 336 | MCT USB Single Port Serial Adapter U232 |
337 | 337 | ||
338 | This driver is for the MCT USB-RS232 Converter (25 pin, Model No. | 338 | This driver is for the MCT USB-RS232 Converter (25 pin, Model No. |
339 | U232-P25) from Magic Control Technology Corp. (there is also a 9 pin | 339 | U232-P25) from Magic Control Technology Corp. (there is also a 9 pin |
340 | Model No. U232-P9). More information about this device can be found at | 340 | Model No. U232-P9). More information about this device can be found at |
341 | the manufacture's web-site: http://www.mct.com.tw. | 341 | the manufacture's web-site: http://www.mct.com.tw. |
342 | 342 | ||
343 | The driver is generally working, though it still needs some more testing. | 343 | The driver is generally working, though it still needs some more testing. |
344 | It is derived from the Belkin USB Serial Adapter F5U103 driver and its | 344 | It is derived from the Belkin USB Serial Adapter F5U103 driver and its |
345 | TODO list is valid for this driver as well. | 345 | TODO list is valid for this driver as well. |
346 | 346 | ||
347 | This driver has also been found to work for other products, which have | 347 | This driver has also been found to work for other products, which have |
348 | the same Vendor ID but different Product IDs. Sitecom's U232-P25 serial | 348 | the same Vendor ID but different Product IDs. Sitecom's U232-P25 serial |
349 | converter uses Product ID 0x230 and Vendor ID 0x711 and works with this | 349 | converter uses Product ID 0x230 and Vendor ID 0x711 and works with this |
350 | driver. Also, D-Link's DU-H3SP USB BAY also works with this driver. | 350 | driver. Also, D-Link's DU-H3SP USB BAY also works with this driver. |
351 | 351 | ||
352 | For any questions or problems with this driver, please contact Wolfgang | 352 | For any questions or problems with this driver, please contact Wolfgang |
353 | Grandegger at wolfgang@ces.ch | 353 | Grandegger at wolfgang@ces.ch |
354 | 354 | ||
355 | 355 | ||
356 | Inside Out Networks Edgeport Driver | 356 | Inside Out Networks Edgeport Driver |
357 | 357 | ||
358 | This driver supports all devices made by Inside Out Networks, specifically | 358 | This driver supports all devices made by Inside Out Networks, specifically |
359 | the following models: | 359 | the following models: |
360 | Edgeport/4 | 360 | Edgeport/4 |
361 | Rapidport/4 | 361 | Rapidport/4 |
362 | Edgeport/4t | 362 | Edgeport/4t |
363 | Edgeport/2 | 363 | Edgeport/2 |
364 | Edgeport/4i | 364 | Edgeport/4i |
365 | Edgeport/2i | 365 | Edgeport/2i |
366 | Edgeport/421 | 366 | Edgeport/421 |
367 | Edgeport/21 | 367 | Edgeport/21 |
368 | Edgeport/8 | 368 | Edgeport/8 |
369 | Edgeport/8 Dual | 369 | Edgeport/8 Dual |
370 | Edgeport/2D8 | 370 | Edgeport/2D8 |
371 | Edgeport/4D8 | 371 | Edgeport/4D8 |
372 | Edgeport/8i | 372 | Edgeport/8i |
373 | Edgeport/2 DIN | 373 | Edgeport/2 DIN |
374 | Edgeport/4 DIN | 374 | Edgeport/4 DIN |
375 | Edgeport/16 Dual | 375 | Edgeport/16 Dual |
376 | 376 | ||
377 | For any questions or problems with this driver, please contact Greg | 377 | For any questions or problems with this driver, please contact Greg |
378 | Kroah-Hartman at greg@kroah.com | 378 | Kroah-Hartman at greg@kroah.com |
379 | 379 | ||
380 | 380 | ||
381 | REINER SCT cyberJack pinpad/e-com USB chipcard reader | 381 | REINER SCT cyberJack pinpad/e-com USB chipcard reader |
382 | 382 | ||
383 | Interface to ISO 7816 compatible contactbased chipcards, e.g. GSM SIMs. | 383 | Interface to ISO 7816 compatible contactbased chipcards, e.g. GSM SIMs. |
384 | 384 | ||
385 | Current status: | 385 | Current status: |
386 | This is the kernel part of the driver for this USB card reader. | 386 | This is the kernel part of the driver for this USB card reader. |
387 | There is also a user part for a CT-API driver available. A site | 387 | There is also a user part for a CT-API driver available. A site |
388 | for downloading is TBA. For now, you can request it from the | 388 | for downloading is TBA. For now, you can request it from the |
389 | maintainer (linux-usb@sii.li). | 389 | maintainer (linux-usb@sii.li). |
390 | 390 | ||
391 | For any questions or problems with this driver, please contact | 391 | For any questions or problems with this driver, please contact |
392 | linux-usb@sii.li | 392 | linux-usb@sii.li |
393 | 393 | ||
394 | 394 | ||
395 | Prolific PL2303 Driver | 395 | Prolific PL2303 Driver |
396 | 396 | ||
397 | This driver supports any device that has the PL2303 chip from Prolific | 397 | This driver supports any device that has the PL2303 chip from Prolific |
398 | in it. This includes a number of single port USB to serial | 398 | in it. This includes a number of single port USB to serial |
399 | converters and USB GPS devices. Devices from Aten (the UC-232) and | 399 | converters and USB GPS devices. Devices from Aten (the UC-232) and |
400 | IO-Data work with this driver, as does the DCU-11 mobile-phone cable. | 400 | IO-Data work with this driver, as does the DCU-11 mobile-phone cable. |
401 | 401 | ||
402 | For any questions or problems with this driver, please contact Greg | 402 | For any questions or problems with this driver, please contact Greg |
403 | Kroah-Hartman at greg@kroah.com | 403 | Kroah-Hartman at greg@kroah.com |
404 | 404 | ||
405 | 405 | ||
406 | KL5KUSB105 chipset / PalmConnect USB single-port adapter | 406 | KL5KUSB105 chipset / PalmConnect USB single-port adapter |
407 | 407 | ||
408 | Current status: | 408 | Current status: |
409 | The driver was put together by looking at the usb bus transactions | 409 | The driver was put together by looking at the usb bus transactions |
410 | done by Palm's driver under Windows, so a lot of functionality is | 410 | done by Palm's driver under Windows, so a lot of functionality is |
411 | still missing. Notably, serial ioctls are sometimes faked or not yet | 411 | still missing. Notably, serial ioctls are sometimes faked or not yet |
412 | implemented. Support for finding out about DSR and CTS line status is | 412 | implemented. Support for finding out about DSR and CTS line status is |
413 | however implemented (though not nicely), so your favorite autopilot(1) | 413 | however implemented (though not nicely), so your favorite autopilot(1) |
414 | and pilot-manager -daemon calls will work. Baud rates up to 115200 | 414 | and pilot-manager -daemon calls will work. Baud rates up to 115200 |
415 | are supported, but handshaking (software or hardware) is not, which is | 415 | are supported, but handshaking (software or hardware) is not, which is |
416 | why it is wise to cut down on the rate used is wise for large | 416 | why it is wise to cut down on the rate used is wise for large |
417 | transfers until this is settled. | 417 | transfers until this is settled. |
418 | 418 | ||
419 | Options supported: | 419 | Options supported: |
420 | If this driver is compiled as a module you can pass the following | 420 | If this driver is compiled as a module you can pass the following |
421 | options to it: | 421 | options to it: |
422 | debug - extra verbose debugging info | 422 | debug - extra verbose debugging info |
423 | (default: 0; nonzero enables) | 423 | (default: 0; nonzero enables) |
424 | use_lowlatency - use low_latency flag to speed up tty layer | 424 | use_lowlatency - use low_latency flag to speed up tty layer |
425 | when reading from from the device. | 425 | when reading from the device. |
426 | (default: 0; nonzero enables) | 426 | (default: 0; nonzero enables) |
427 | 427 | ||
428 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date | 428 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date |
429 | information on this driver. | 429 | information on this driver. |
430 | 430 | ||
431 | AIRcable USB Dongle Bluetooth driver | 431 | AIRcable USB Dongle Bluetooth driver |
432 | If there is the cdc_acm driver loaded in the system, you will find that the | 432 | If there is the cdc_acm driver loaded in the system, you will find that the |
433 | cdc_acm claims the device before AIRcable can. This is simply corrected | 433 | cdc_acm claims the device before AIRcable can. This is simply corrected |
434 | by unloading both modules and then loading the aircable module before | 434 | by unloading both modules and then loading the aircable module before |
435 | cdc_acm module | 435 | cdc_acm module |
436 | 436 | ||
437 | Generic Serial driver | 437 | Generic Serial driver |
438 | 438 | ||
439 | If your device is not one of the above listed devices, compatible with | 439 | If your device is not one of the above listed devices, compatible with |
440 | the above models, you can try out the "generic" interface. This | 440 | the above models, you can try out the "generic" interface. This |
441 | interface does not provide any type of control messages sent to the | 441 | interface does not provide any type of control messages sent to the |
442 | device, and does not support any kind of device flow control. All that | 442 | device, and does not support any kind of device flow control. All that |
443 | is required of your device is that it has at least one bulk in endpoint, | 443 | is required of your device is that it has at least one bulk in endpoint, |
444 | or one bulk out endpoint. | 444 | or one bulk out endpoint. |
445 | 445 | ||
446 | To enable the generic driver to recognize your device, build the driver | 446 | To enable the generic driver to recognize your device, build the driver |
447 | as a module and load it by the following invocation: | 447 | as a module and load it by the following invocation: |
448 | insmod usbserial vendor=0x#### product=0x#### | 448 | insmod usbserial vendor=0x#### product=0x#### |
449 | where the #### is replaced with the hex representation of your device's | 449 | where the #### is replaced with the hex representation of your device's |
450 | vendor id and product id. | 450 | vendor id and product id. |
451 | 451 | ||
452 | This driver has been successfully used to connect to the NetChip USB | 452 | This driver has been successfully used to connect to the NetChip USB |
453 | development board, providing a way to develop USB firmware without | 453 | development board, providing a way to develop USB firmware without |
454 | having to write a custom driver. | 454 | having to write a custom driver. |
455 | 455 | ||
456 | For any questions or problems with this driver, please contact Greg | 456 | For any questions or problems with this driver, please contact Greg |
457 | Kroah-Hartman at greg@kroah.com | 457 | Kroah-Hartman at greg@kroah.com |
458 | 458 | ||
459 | 459 | ||
460 | CONTACT: | 460 | CONTACT: |
461 | 461 | ||
462 | If anyone has any problems using these drivers, with any of the above | 462 | If anyone has any problems using these drivers, with any of the above |
463 | specified products, please contact the specific driver's author listed | 463 | specified products, please contact the specific driver's author listed |
464 | above, or join the Linux-USB mailing list (information on joining the | 464 | above, or join the Linux-USB mailing list (information on joining the |
465 | mailing list, as well as a link to its searchable archive is at | 465 | mailing list, as well as a link to its searchable archive is at |
466 | http://www.linux-usb.org/ ) | 466 | http://www.linux-usb.org/ ) |
467 | 467 | ||
468 | 468 | ||
469 | Greg Kroah-Hartman | 469 | Greg Kroah-Hartman |
470 | greg@kroah.com | 470 | greg@kroah.com |
471 | 471 |
Documentation/video4linux/README.pvrusb2
1 | 1 | ||
2 | $Id$ | 2 | $Id$ |
3 | Mike Isely <isely@pobox.com> | 3 | Mike Isely <isely@pobox.com> |
4 | 4 | ||
5 | pvrusb2 driver | 5 | pvrusb2 driver |
6 | 6 | ||
7 | Background: | 7 | Background: |
8 | 8 | ||
9 | This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which | 9 | This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which |
10 | is a USB 2.0 hosted TV Tuner. This driver is a work in progress. | 10 | is a USB 2.0 hosted TV Tuner. This driver is a work in progress. |
11 | Its history started with the reverse-engineering effort by Björn | 11 | Its history started with the reverse-engineering effort by Björn |
12 | Danielsson <pvrusb2@dax.nu> whose web page can be found here: | 12 | Danielsson <pvrusb2@dax.nu> whose web page can be found here: |
13 | 13 | ||
14 | http://pvrusb2.dax.nu/ | 14 | http://pvrusb2.dax.nu/ |
15 | 15 | ||
16 | From there Aurelien Alleaume <slts@free.fr> began an effort to | 16 | From there Aurelien Alleaume <slts@free.fr> began an effort to |
17 | create a video4linux compatible driver. I began with Aurelien's | 17 | create a video4linux compatible driver. I began with Aurelien's |
18 | last known snapshot and evolved the driver to the state it is in | 18 | last known snapshot and evolved the driver to the state it is in |
19 | here. | 19 | here. |
20 | 20 | ||
21 | More information on this driver can be found at: | 21 | More information on this driver can be found at: |
22 | 22 | ||
23 | http://www.isely.net/pvrusb2.html | 23 | http://www.isely.net/pvrusb2.html |
24 | 24 | ||
25 | 25 | ||
26 | This driver has a strong separation of layers. They are very | 26 | This driver has a strong separation of layers. They are very |
27 | roughly: | 27 | roughly: |
28 | 28 | ||
29 | 1a. Low level wire-protocol implementation with the device. | 29 | 1a. Low level wire-protocol implementation with the device. |
30 | 30 | ||
31 | 1b. I2C adaptor implementation and corresponding I2C client drivers | 31 | 1b. I2C adaptor implementation and corresponding I2C client drivers |
32 | implemented elsewhere in V4L. | 32 | implemented elsewhere in V4L. |
33 | 33 | ||
34 | 1c. High level hardware driver implementation which coordinates all | 34 | 1c. High level hardware driver implementation which coordinates all |
35 | activities that ensure correct operation of the device. | 35 | activities that ensure correct operation of the device. |
36 | 36 | ||
37 | 2. A "context" layer which manages instancing of driver, setup, | 37 | 2. A "context" layer which manages instancing of driver, setup, |
38 | tear-down, arbitration, and interaction with high level | 38 | tear-down, arbitration, and interaction with high level |
39 | interfaces appropriately as devices are hotplugged in the | 39 | interfaces appropriately as devices are hotplugged in the |
40 | system. | 40 | system. |
41 | 41 | ||
42 | 3. High level interfaces which glue the driver to various published | 42 | 3. High level interfaces which glue the driver to various published |
43 | Linux APIs (V4L, sysfs, maybe DVB in the future). | 43 | Linux APIs (V4L, sysfs, maybe DVB in the future). |
44 | 44 | ||
45 | The most important shearing layer is between the top 2 layers. A | 45 | The most important shearing layer is between the top 2 layers. A |
46 | lot of work went into the driver to ensure that any kind of | 46 | lot of work went into the driver to ensure that any kind of |
47 | conceivable API can be laid on top of the core driver. (Yes, the | 47 | conceivable API can be laid on top of the core driver. (Yes, the |
48 | driver internally leverages V4L to do its work but that really has | 48 | driver internally leverages V4L to do its work but that really has |
49 | nothing to do with the API published by the driver to the outside | 49 | nothing to do with the API published by the driver to the outside |
50 | world.) The architecture allows for different APIs to | 50 | world.) The architecture allows for different APIs to |
51 | simultaneously access the driver. I have a strong sense of fairness | 51 | simultaneously access the driver. I have a strong sense of fairness |
52 | about APIs and also feel that it is a good design principle to keep | 52 | about APIs and also feel that it is a good design principle to keep |
53 | implementation and interface isolated from each other. Thus while | 53 | implementation and interface isolated from each other. Thus while |
54 | right now the V4L high level interface is the most complete, the | 54 | right now the V4L high level interface is the most complete, the |
55 | sysfs high level interface will work equally well for similar | 55 | sysfs high level interface will work equally well for similar |
56 | functions, and there's no reason I see right now why it shouldn't be | 56 | functions, and there's no reason I see right now why it shouldn't be |
57 | possible to produce a DVB high level interface that can sit right | 57 | possible to produce a DVB high level interface that can sit right |
58 | alongside V4L. | 58 | alongside V4L. |
59 | 59 | ||
60 | NOTE: Complete documentation on the pvrusb2 driver is contained in | 60 | NOTE: Complete documentation on the pvrusb2 driver is contained in |
61 | the html files within the doc directory; these are exactly the same | 61 | the html files within the doc directory; these are exactly the same |
62 | as what is on the web site at the time. Browse those files | 62 | as what is on the web site at the time. Browse those files |
63 | (especially the FAQ) before asking questions. | 63 | (especially the FAQ) before asking questions. |
64 | 64 | ||
65 | 65 | ||
66 | Building | 66 | Building |
67 | 67 | ||
68 | To build these modules essentially amounts to just running "Make", | 68 | To build these modules essentially amounts to just running "Make", |
69 | but you need the kernel source tree nearby and you will likely also | 69 | but you need the kernel source tree nearby and you will likely also |
70 | want to set a few controlling environment variables first in order | 70 | want to set a few controlling environment variables first in order |
71 | to link things up with that source tree. Please see the Makefile | 71 | to link things up with that source tree. Please see the Makefile |
72 | here for comments that explain how to do that. | 72 | here for comments that explain how to do that. |
73 | 73 | ||
74 | 74 | ||
75 | Source file list / functional overview: | 75 | Source file list / functional overview: |
76 | 76 | ||
77 | (Note: The term "module" used below generally refers to loosely | 77 | (Note: The term "module" used below generally refers to loosely |
78 | defined functional units within the pvrusb2 driver and bears no | 78 | defined functional units within the pvrusb2 driver and bears no |
79 | relation to the Linux kernel's concept of a loadable module.) | 79 | relation to the Linux kernel's concept of a loadable module.) |
80 | 80 | ||
81 | pvrusb2-audio.[ch] - This is glue logic that resides between this | 81 | pvrusb2-audio.[ch] - This is glue logic that resides between this |
82 | driver and the msp3400.ko I2C client driver (which is found | 82 | driver and the msp3400.ko I2C client driver (which is found |
83 | elsewhere in V4L). | 83 | elsewhere in V4L). |
84 | 84 | ||
85 | pvrusb2-context.[ch] - This module implements the context for an | 85 | pvrusb2-context.[ch] - This module implements the context for an |
86 | instance of the driver. Everything else eventually ties back to | 86 | instance of the driver. Everything else eventually ties back to |
87 | or is otherwise instanced within the data structures implemented | 87 | or is otherwise instanced within the data structures implemented |
88 | here. Hotplugging is ultimately coordinated here. All high level | 88 | here. Hotplugging is ultimately coordinated here. All high level |
89 | interfaces tie into the driver through this module. This module | 89 | interfaces tie into the driver through this module. This module |
90 | helps arbitrate each interface's access to the actual driver core, | 90 | helps arbitrate each interface's access to the actual driver core, |
91 | and is designed to allow concurrent access through multiple | 91 | and is designed to allow concurrent access through multiple |
92 | instances of multiple interfaces (thus you can for example change | 92 | instances of multiple interfaces (thus you can for example change |
93 | the tuner's frequency through sysfs while simultaneously streaming | 93 | the tuner's frequency through sysfs while simultaneously streaming |
94 | video through V4L out to an instance of mplayer). | 94 | video through V4L out to an instance of mplayer). |
95 | 95 | ||
96 | pvrusb2-debug.h - This header defines a printk() wrapper and a mask | 96 | pvrusb2-debug.h - This header defines a printk() wrapper and a mask |
97 | of debugging bit definitions for the various kinds of debug | 97 | of debugging bit definitions for the various kinds of debug |
98 | messages that can be enabled within the driver. | 98 | messages that can be enabled within the driver. |
99 | 99 | ||
100 | pvrusb2-debugifc.[ch] - This module implements a crude command line | 100 | pvrusb2-debugifc.[ch] - This module implements a crude command line |
101 | oriented debug interface into the driver. Aside from being part | 101 | oriented debug interface into the driver. Aside from being part |
102 | of the process for implementing manual firmware extraction (see | 102 | of the process for implementing manual firmware extraction (see |
103 | the pvrusb2 web site mentioned earlier), probably I'm the only one | 103 | the pvrusb2 web site mentioned earlier), probably I'm the only one |
104 | who has ever used this. It is mainly a debugging aid. | 104 | who has ever used this. It is mainly a debugging aid. |
105 | 105 | ||
106 | pvrusb2-eeprom.[ch] - This is glue logic that resides between this | 106 | pvrusb2-eeprom.[ch] - This is glue logic that resides between this |
107 | driver the tveeprom.ko module, which is itself implemented | 107 | driver the tveeprom.ko module, which is itself implemented |
108 | elsewhere in V4L. | 108 | elsewhere in V4L. |
109 | 109 | ||
110 | pvrusb2-encoder.[ch] - This module implements all protocol needed to | 110 | pvrusb2-encoder.[ch] - This module implements all protocol needed to |
111 | interact with the Conexant mpeg2 encoder chip within the pvrusb2 | 111 | interact with the Conexant mpeg2 encoder chip within the pvrusb2 |
112 | device. It is a crude echo of corresponding logic in ivtv, | 112 | device. It is a crude echo of corresponding logic in ivtv, |
113 | however the design goals (strict isolation) and physical layer | 113 | however the design goals (strict isolation) and physical layer |
114 | (proxy through USB instead of PCI) are enough different that this | 114 | (proxy through USB instead of PCI) are enough different that this |
115 | implementation had to be completely different. | 115 | implementation had to be completely different. |
116 | 116 | ||
117 | pvrusb2-hdw-internal.h - This header defines the core data structure | 117 | pvrusb2-hdw-internal.h - This header defines the core data structure |
118 | in the driver used to track ALL internal state related to control | 118 | in the driver used to track ALL internal state related to control |
119 | of the hardware. Nobody outside of the core hardware-handling | 119 | of the hardware. Nobody outside of the core hardware-handling |
120 | modules should have any business using this header. All external | 120 | modules should have any business using this header. All external |
121 | access to the driver should be through one of the high level | 121 | access to the driver should be through one of the high level |
122 | interfaces (e.g. V4L, sysfs, etc), and in fact even those high | 122 | interfaces (e.g. V4L, sysfs, etc), and in fact even those high |
123 | level interfaces are restricted to the API defined in | 123 | level interfaces are restricted to the API defined in |
124 | pvrusb2-hdw.h and NOT this header. | 124 | pvrusb2-hdw.h and NOT this header. |
125 | 125 | ||
126 | pvrusb2-hdw.h - This header defines the full internal API for | 126 | pvrusb2-hdw.h - This header defines the full internal API for |
127 | controlling the hardware. High level interfaces (e.g. V4L, sysfs) | 127 | controlling the hardware. High level interfaces (e.g. V4L, sysfs) |
128 | will work through here. | 128 | will work through here. |
129 | 129 | ||
130 | pvrusb2-hdw.c - This module implements all the various bits of logic | 130 | pvrusb2-hdw.c - This module implements all the various bits of logic |
131 | that handle overall control of a specific pvrusb2 device. | 131 | that handle overall control of a specific pvrusb2 device. |
132 | (Policy, instantiation, and arbitration of pvrusb2 devices fall | 132 | (Policy, instantiation, and arbitration of pvrusb2 devices fall |
133 | within the jurisdiction of pvrusb-context not here). | 133 | within the jurisdiction of pvrusb-context not here). |
134 | 134 | ||
135 | pvrusb2-i2c-chips-*.c - These modules implement the glue logic to | 135 | pvrusb2-i2c-chips-*.c - These modules implement the glue logic to |
136 | tie together and configure various I2C modules as they attach to | 136 | tie together and configure various I2C modules as they attach to |
137 | the I2C bus. There are two versions of this file. The "v4l2" | 137 | the I2C bus. There are two versions of this file. The "v4l2" |
138 | version is intended to be used in-tree alongside V4L, where we | 138 | version is intended to be used in-tree alongside V4L, where we |
139 | implement just the logic that makes sense for a pure V4L | 139 | implement just the logic that makes sense for a pure V4L |
140 | environment. The "all" version is intended for use outside of | 140 | environment. The "all" version is intended for use outside of |
141 | V4L, where we might encounter other possibly "challenging" modules | 141 | V4L, where we might encounter other possibly "challenging" modules |
142 | from ivtv or older kernel snapshots (or even the support modules | 142 | from ivtv or older kernel snapshots (or even the support modules |
143 | in the standalone snapshot). | 143 | in the standalone snapshot). |
144 | 144 | ||
145 | pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1 | 145 | pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1 |
146 | compatible commands to the I2C modules. It is here where state | 146 | compatible commands to the I2C modules. It is here where state |
147 | changes inside the pvrusb2 driver are translated into V4L1 | 147 | changes inside the pvrusb2 driver are translated into V4L1 |
148 | commands that are in turn send to the various I2C modules. | 148 | commands that are in turn send to the various I2C modules. |
149 | 149 | ||
150 | pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2 | 150 | pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2 |
151 | compatible commands to the I2C modules. It is here where state | 151 | compatible commands to the I2C modules. It is here where state |
152 | changes inside the pvrusb2 driver are translated into V4L2 | 152 | changes inside the pvrusb2 driver are translated into V4L2 |
153 | commands that are in turn send to the various I2C modules. | 153 | commands that are in turn send to the various I2C modules. |
154 | 154 | ||
155 | pvrusb2-i2c-core.[ch] - This module provides an implementation of a | 155 | pvrusb2-i2c-core.[ch] - This module provides an implementation of a |
156 | kernel-friendly I2C adaptor driver, through which other external | 156 | kernel-friendly I2C adaptor driver, through which other external |
157 | I2C client drivers (e.g. msp3400, tuner, lirc) may connect and | 157 | I2C client drivers (e.g. msp3400, tuner, lirc) may connect and |
158 | operate corresponding chips within the the pvrusb2 device. It is | 158 | operate corresponding chips within the pvrusb2 device. It is |
159 | through here that other V4L modules can reach into this driver to | 159 | through here that other V4L modules can reach into this driver to |
160 | operate specific pieces (and those modules are in turn driven by | 160 | operate specific pieces (and those modules are in turn driven by |
161 | glue logic which is coordinated by pvrusb2-hdw, doled out by | 161 | glue logic which is coordinated by pvrusb2-hdw, doled out by |
162 | pvrusb2-context, and then ultimately made available to users | 162 | pvrusb2-context, and then ultimately made available to users |
163 | through one of the high level interfaces). | 163 | through one of the high level interfaces). |
164 | 164 | ||
165 | pvrusb2-io.[ch] - This module implements a very low level ring of | 165 | pvrusb2-io.[ch] - This module implements a very low level ring of |
166 | transfer buffers, required in order to stream data from the | 166 | transfer buffers, required in order to stream data from the |
167 | device. This module is *very* low level. It only operates the | 167 | device. This module is *very* low level. It only operates the |
168 | buffers and makes no attempt to define any policy or mechanism for | 168 | buffers and makes no attempt to define any policy or mechanism for |
169 | how such buffers might be used. | 169 | how such buffers might be used. |
170 | 170 | ||
171 | pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch] | 171 | pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch] |
172 | to provide a streaming API usable by a read() system call style of | 172 | to provide a streaming API usable by a read() system call style of |
173 | I/O. Right now this is the only layer on top of pvrusb2-io.[ch], | 173 | I/O. Right now this is the only layer on top of pvrusb2-io.[ch], |
174 | however the underlying architecture here was intended to allow for | 174 | however the underlying architecture here was intended to allow for |
175 | other styles of I/O to be implemented with additonal modules, like | 175 | other styles of I/O to be implemented with additonal modules, like |
176 | mmap()'ed buffers or something even more exotic. | 176 | mmap()'ed buffers or something even more exotic. |
177 | 177 | ||
178 | pvrusb2-main.c - This is the top level of the driver. Module level | 178 | pvrusb2-main.c - This is the top level of the driver. Module level |
179 | and USB core entry points are here. This is our "main". | 179 | and USB core entry points are here. This is our "main". |
180 | 180 | ||
181 | pvrusb2-sysfs.[ch] - This is the high level interface which ties the | 181 | pvrusb2-sysfs.[ch] - This is the high level interface which ties the |
182 | pvrusb2 driver into sysfs. Through this interface you can do | 182 | pvrusb2 driver into sysfs. Through this interface you can do |
183 | everything with the driver except actually stream data. | 183 | everything with the driver except actually stream data. |
184 | 184 | ||
185 | pvrusb2-tuner.[ch] - This is glue logic that resides between this | 185 | pvrusb2-tuner.[ch] - This is glue logic that resides between this |
186 | driver and the tuner.ko I2C client driver (which is found | 186 | driver and the tuner.ko I2C client driver (which is found |
187 | elsewhere in V4L). | 187 | elsewhere in V4L). |
188 | 188 | ||
189 | pvrusb2-util.h - This header defines some common macros used | 189 | pvrusb2-util.h - This header defines some common macros used |
190 | throughout the driver. These macros are not really specific to | 190 | throughout the driver. These macros are not really specific to |
191 | the driver, but they had to go somewhere. | 191 | the driver, but they had to go somewhere. |
192 | 192 | ||
193 | pvrusb2-v4l2.[ch] - This is the high level interface which ties the | 193 | pvrusb2-v4l2.[ch] - This is the high level interface which ties the |
194 | pvrusb2 driver into video4linux. It is through here that V4L | 194 | pvrusb2 driver into video4linux. It is through here that V4L |
195 | applications can open and operate the driver in the usual V4L | 195 | applications can open and operate the driver in the usual V4L |
196 | ways. Note that **ALL** V4L functionality is published only | 196 | ways. Note that **ALL** V4L functionality is published only |
197 | through here and nowhere else. | 197 | through here and nowhere else. |
198 | 198 | ||
199 | pvrusb2-video-*.[ch] - This is glue logic that resides between this | 199 | pvrusb2-video-*.[ch] - This is glue logic that resides between this |
200 | driver and the saa711x.ko I2C client driver (which is found | 200 | driver and the saa711x.ko I2C client driver (which is found |
201 | elsewhere in V4L). Note that saa711x.ko used to be known as | 201 | elsewhere in V4L). Note that saa711x.ko used to be known as |
202 | saa7115.ko in ivtv. There are two versions of this; one is | 202 | saa7115.ko in ivtv. There are two versions of this; one is |
203 | selected depending on the particular saa711[5x].ko that is found. | 203 | selected depending on the particular saa711[5x].ko that is found. |
204 | 204 | ||
205 | pvrusb2.h - This header contains compile time tunable parameters | 205 | pvrusb2.h - This header contains compile time tunable parameters |
206 | (and at the moment the driver has very little that needs to be | 206 | (and at the moment the driver has very little that needs to be |
207 | tuned). | 207 | tuned). |
208 | 208 | ||
209 | 209 | ||
210 | -Mike Isely | 210 | -Mike Isely |
211 | isely@pobox.com | 211 | isely@pobox.com |
212 | 212 | ||
213 | 213 |
Documentation/video4linux/Zoran
1 | Frequently Asked Questions: | 1 | Frequently Asked Questions: |
2 | =========================== | 2 | =========================== |
3 | subject: unified zoran driver (zr360x7, zoran, buz, dc10(+), dc30(+), lml33) | 3 | subject: unified zoran driver (zr360x7, zoran, buz, dc10(+), dc30(+), lml33) |
4 | website: http://mjpeg.sourceforge.net/driver-zoran/ | 4 | website: http://mjpeg.sourceforge.net/driver-zoran/ |
5 | 5 | ||
6 | 1. What cards are supported | 6 | 1. What cards are supported |
7 | 1.1 What the TV decoder can do an what not | 7 | 1.1 What the TV decoder can do an what not |
8 | 1.2 What the TV encoder can do an what not | 8 | 1.2 What the TV encoder can do an what not |
9 | 2. How do I get this damn thing to work | 9 | 2. How do I get this damn thing to work |
10 | 3. What mainboard should I use (or why doesn't my card work) | 10 | 3. What mainboard should I use (or why doesn't my card work) |
11 | 4. Programming interface | 11 | 4. Programming interface |
12 | 5. Applications | 12 | 5. Applications |
13 | 6. Concerning buffer sizes, quality, output size etc. | 13 | 6. Concerning buffer sizes, quality, output size etc. |
14 | 7. It hangs/crashes/fails/whatevers! Help! | 14 | 7. It hangs/crashes/fails/whatevers! Help! |
15 | 8. Maintainers/Contacting | 15 | 8. Maintainers/Contacting |
16 | 9. License | 16 | 9. License |
17 | 17 | ||
18 | =========================== | 18 | =========================== |
19 | 19 | ||
20 | 1. What cards are supported | 20 | 1. What cards are supported |
21 | 21 | ||
22 | Iomega Buz, Linux Media Labs LML33/LML33R10, Pinnacle/Miro | 22 | Iomega Buz, Linux Media Labs LML33/LML33R10, Pinnacle/Miro |
23 | DC10/DC10+/DC30/DC30+ and related boards (available under various names). | 23 | DC10/DC10+/DC30/DC30+ and related boards (available under various names). |
24 | 24 | ||
25 | Iomega Buz: | 25 | Iomega Buz: |
26 | * Zoran zr36067 PCI controller | 26 | * Zoran zr36067 PCI controller |
27 | * Zoran zr36060 MJPEG codec | 27 | * Zoran zr36060 MJPEG codec |
28 | * Philips saa7111 TV decoder | 28 | * Philips saa7111 TV decoder |
29 | * Philips saa7185 TV encoder | 29 | * Philips saa7185 TV encoder |
30 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 30 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
31 | videocodec, saa7111, saa7185, zr36060, zr36067 | 31 | videocodec, saa7111, saa7185, zr36060, zr36067 |
32 | Inputs/outputs: Composite and S-video | 32 | Inputs/outputs: Composite and S-video |
33 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | 33 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) |
34 | Card number: 7 | 34 | Card number: 7 |
35 | 35 | ||
36 | AverMedia 6 Eyes AVS6EYES: | 36 | AverMedia 6 Eyes AVS6EYES: |
37 | * Zoran zr36067 PCI controller | 37 | * Zoran zr36067 PCI controller |
38 | * Zoran zr36060 MJPEG codec | 38 | * Zoran zr36060 MJPEG codec |
39 | * Samsung ks0127 TV decoder | 39 | * Samsung ks0127 TV decoder |
40 | * Conexant bt866 TV encoder | 40 | * Conexant bt866 TV encoder |
41 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 41 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
42 | videocodec, ks0127, bt866, zr36060, zr36067 | 42 | videocodec, ks0127, bt866, zr36060, zr36067 |
43 | Inputs/outputs: Six physical inputs. 1-6 are composite, | 43 | Inputs/outputs: Six physical inputs. 1-6 are composite, |
44 | 1-2, 3-4, 5-6 doubles as S-video, | 44 | 1-2, 3-4, 5-6 doubles as S-video, |
45 | 1-3 triples as component. | 45 | 1-3 triples as component. |
46 | One composite output. | 46 | One composite output. |
47 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | 47 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) |
48 | Card number: 8 | 48 | Card number: 8 |
49 | Not autodetected, card=8 is necessary. | 49 | Not autodetected, card=8 is necessary. |
50 | 50 | ||
51 | Linux Media Labs LML33: | 51 | Linux Media Labs LML33: |
52 | * Zoran zr36067 PCI controller | 52 | * Zoran zr36067 PCI controller |
53 | * Zoran zr36060 MJPEG codec | 53 | * Zoran zr36060 MJPEG codec |
54 | * Brooktree bt819 TV decoder | 54 | * Brooktree bt819 TV decoder |
55 | * Brooktree bt856 TV encoder | 55 | * Brooktree bt856 TV encoder |
56 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 56 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
57 | videocodec, bt819, bt856, zr36060, zr36067 | 57 | videocodec, bt819, bt856, zr36060, zr36067 |
58 | Inputs/outputs: Composite and S-video | 58 | Inputs/outputs: Composite and S-video |
59 | Norms: PAL (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | 59 | Norms: PAL (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) |
60 | Card number: 5 | 60 | Card number: 5 |
61 | 61 | ||
62 | Linux Media Labs LML33R10: | 62 | Linux Media Labs LML33R10: |
63 | * Zoran zr36067 PCI controller | 63 | * Zoran zr36067 PCI controller |
64 | * Zoran zr36060 MJPEG codec | 64 | * Zoran zr36060 MJPEG codec |
65 | * Philips saa7114 TV decoder | 65 | * Philips saa7114 TV decoder |
66 | * Analog Devices adv7170 TV encoder | 66 | * Analog Devices adv7170 TV encoder |
67 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 67 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
68 | videocodec, saa7114, adv7170, zr36060, zr36067 | 68 | videocodec, saa7114, adv7170, zr36060, zr36067 |
69 | Inputs/outputs: Composite and S-video | 69 | Inputs/outputs: Composite and S-video |
70 | Norms: PAL (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | 70 | Norms: PAL (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) |
71 | Card number: 6 | 71 | Card number: 6 |
72 | 72 | ||
73 | Pinnacle/Miro DC10(new): | 73 | Pinnacle/Miro DC10(new): |
74 | * Zoran zr36057 PCI controller | 74 | * Zoran zr36057 PCI controller |
75 | * Zoran zr36060 MJPEG codec | 75 | * Zoran zr36060 MJPEG codec |
76 | * Philips saa7110a TV decoder | 76 | * Philips saa7110a TV decoder |
77 | * Analog Devices adv7176 TV encoder | 77 | * Analog Devices adv7176 TV encoder |
78 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 78 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
79 | videocodec, saa7110, adv7175, zr36060, zr36067 | 79 | videocodec, saa7110, adv7175, zr36060, zr36067 |
80 | Inputs/outputs: Composite, S-video and Internal | 80 | Inputs/outputs: Composite, S-video and Internal |
81 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) | 81 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) |
82 | Card number: 1 | 82 | Card number: 1 |
83 | 83 | ||
84 | Pinnacle/Miro DC10+: | 84 | Pinnacle/Miro DC10+: |
85 | * Zoran zr36067 PCI controller | 85 | * Zoran zr36067 PCI controller |
86 | * Zoran zr36060 MJPEG codec | 86 | * Zoran zr36060 MJPEG codec |
87 | * Philips saa7110a TV decoder | 87 | * Philips saa7110a TV decoder |
88 | * Analog Devices adv7176 TV encoder | 88 | * Analog Devices adv7176 TV encoder |
89 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 89 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
90 | videocodec, sa7110, adv7175, zr36060, zr36067 | 90 | videocodec, sa7110, adv7175, zr36060, zr36067 |
91 | Inputs/outputs: Composite, S-video and Internal | 91 | Inputs/outputs: Composite, S-video and Internal |
92 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) | 92 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) |
93 | Card number: 2 | 93 | Card number: 2 |
94 | 94 | ||
95 | Pinnacle/Miro DC10(old): * | 95 | Pinnacle/Miro DC10(old): * |
96 | * Zoran zr36057 PCI controller | 96 | * Zoran zr36057 PCI controller |
97 | * Zoran zr36050 MJPEG codec | 97 | * Zoran zr36050 MJPEG codec |
98 | * Zoran zr36016 Video Front End or Fuji md0211 Video Front End (clone?) | 98 | * Zoran zr36016 Video Front End or Fuji md0211 Video Front End (clone?) |
99 | * Micronas vpx3220a TV decoder | 99 | * Micronas vpx3220a TV decoder |
100 | * mse3000 TV encoder or Analog Devices adv7176 TV encoder * | 100 | * mse3000 TV encoder or Analog Devices adv7176 TV encoder * |
101 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 101 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
102 | videocodec, vpx3220, mse3000/adv7175, zr36050, zr36016, zr36067 | 102 | videocodec, vpx3220, mse3000/adv7175, zr36050, zr36016, zr36067 |
103 | Inputs/outputs: Composite, S-video and Internal | 103 | Inputs/outputs: Composite, S-video and Internal |
104 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) | 104 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) |
105 | Card number: 0 | 105 | Card number: 0 |
106 | 106 | ||
107 | Pinnacle/Miro DC30: * | 107 | Pinnacle/Miro DC30: * |
108 | * Zoran zr36057 PCI controller | 108 | * Zoran zr36057 PCI controller |
109 | * Zoran zr36050 MJPEG codec | 109 | * Zoran zr36050 MJPEG codec |
110 | * Zoran zr36016 Video Front End | 110 | * Zoran zr36016 Video Front End |
111 | * Micronas vpx3225d/vpx3220a/vpx3216b TV decoder | 111 | * Micronas vpx3225d/vpx3220a/vpx3216b TV decoder |
112 | * Analog Devices adv7176 TV encoder | 112 | * Analog Devices adv7176 TV encoder |
113 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 113 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
114 | videocodec, vpx3220/vpx3224, adv7175, zr36050, zr36016, zr36067 | 114 | videocodec, vpx3220/vpx3224, adv7175, zr36050, zr36016, zr36067 |
115 | Inputs/outputs: Composite, S-video and Internal | 115 | Inputs/outputs: Composite, S-video and Internal |
116 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) | 116 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) |
117 | Card number: 3 | 117 | Card number: 3 |
118 | 118 | ||
119 | Pinnacle/Miro DC30+: * | 119 | Pinnacle/Miro DC30+: * |
120 | * Zoran zr36067 PCI controller | 120 | * Zoran zr36067 PCI controller |
121 | * Zoran zr36050 MJPEG codec | 121 | * Zoran zr36050 MJPEG codec |
122 | * Zoran zr36016 Video Front End | 122 | * Zoran zr36016 Video Front End |
123 | * Micronas vpx3225d/vpx3220a/vpx3216b TV decoder | 123 | * Micronas vpx3225d/vpx3220a/vpx3216b TV decoder |
124 | * Analog Devices adv7176 TV encoder | 124 | * Analog Devices adv7176 TV encoder |
125 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | 125 | Drivers to use: videodev, i2c-core, i2c-algo-bit, |
126 | videocodec, vpx3220/vpx3224, adv7175, zr36050, zr36015, zr36067 | 126 | videocodec, vpx3220/vpx3224, adv7175, zr36050, zr36015, zr36067 |
127 | Inputs/outputs: Composite, S-video and Internal | 127 | Inputs/outputs: Composite, S-video and Internal |
128 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) | 128 | Norms: PAL, SECAM (768x576 @ 25 fps), NTSC (640x480 @ 29.97 fps) |
129 | Card number: 4 | 129 | Card number: 4 |
130 | 130 | ||
131 | Note: No module for the mse3000 is available yet | 131 | Note: No module for the mse3000 is available yet |
132 | Note: No module for the vpx3224 is available yet | 132 | Note: No module for the vpx3224 is available yet |
133 | Note: use encoder=X or decoder=X for non-default i2c chips (see i2c-id.h) | 133 | Note: use encoder=X or decoder=X for non-default i2c chips (see i2c-id.h) |
134 | 134 | ||
135 | =========================== | 135 | =========================== |
136 | 136 | ||
137 | 1.1 What the TV decoder can do an what not | 137 | 1.1 What the TV decoder can do an what not |
138 | 138 | ||
139 | The best know TV standards are NTSC/PAL/SECAM. but for decoding a frame that | 139 | The best know TV standards are NTSC/PAL/SECAM. but for decoding a frame that |
140 | information is not enough. There are several formats of the TV standards. | 140 | information is not enough. There are several formats of the TV standards. |
141 | And not every TV decoder is able to handle every format. Also the every | 141 | And not every TV decoder is able to handle every format. Also the every |
142 | combination is supported by the driver. There are currently 11 different | 142 | combination is supported by the driver. There are currently 11 different |
143 | tv broadcast formats all aver the world. | 143 | tv broadcast formats all aver the world. |
144 | 144 | ||
145 | The CCIR defines parameters needed for broadcasting the signal. | 145 | The CCIR defines parameters needed for broadcasting the signal. |
146 | The CCIR has defined different standards: A,B,D,E,F,G,D,H,I,K,K1,L,M,N,... | 146 | The CCIR has defined different standards: A,B,D,E,F,G,D,H,I,K,K1,L,M,N,... |
147 | The CCIR says not much about about the colorsystem used !!! | 147 | The CCIR says not much about the colorsystem used !!! |
148 | And talking about a colorsystem says not to much about how it is broadcast. | 148 | And talking about a colorsystem says not to much about how it is broadcast. |
149 | 149 | ||
150 | The CCIR standards A,E,F are not used any more. | 150 | The CCIR standards A,E,F are not used any more. |
151 | 151 | ||
152 | When you speak about NTSC, you usually mean the standard: CCIR - M using | 152 | When you speak about NTSC, you usually mean the standard: CCIR - M using |
153 | the NTSC colorsystem which is used in the USA, Japan, Mexico, Canada | 153 | the NTSC colorsystem which is used in the USA, Japan, Mexico, Canada |
154 | and a few others. | 154 | and a few others. |
155 | 155 | ||
156 | When you talk about PAL, you usually mean: CCIR - B/G using the PAL | 156 | When you talk about PAL, you usually mean: CCIR - B/G using the PAL |
157 | colorsystem which is used in many Countries. | 157 | colorsystem which is used in many Countries. |
158 | 158 | ||
159 | When you talk about SECAM, you mean: CCIR - L using the SECAM Colorsystem | 159 | When you talk about SECAM, you mean: CCIR - L using the SECAM Colorsystem |
160 | which is used in France, and a few others. | 160 | which is used in France, and a few others. |
161 | 161 | ||
162 | There the other version of SECAM, CCIR - D/K is used in Bulgaria, China, | 162 | There the other version of SECAM, CCIR - D/K is used in Bulgaria, China, |
163 | Slovakai, Hungary, Korea (Rep.), Poland, Rumania and a others. | 163 | Slovakai, Hungary, Korea (Rep.), Poland, Rumania and a others. |
164 | 164 | ||
165 | The CCIR - H uses the PAL colorsystem (sometimes SECAM) and is used in | 165 | The CCIR - H uses the PAL colorsystem (sometimes SECAM) and is used in |
166 | Egypt, Libya, Sri Lanka, Syrain Arab. Rep. | 166 | Egypt, Libya, Sri Lanka, Syrain Arab. Rep. |
167 | 167 | ||
168 | The CCIR - I uses the PAL colorsystem, and is used in Great Britain, Hong Kong, | 168 | The CCIR - I uses the PAL colorsystem, and is used in Great Britain, Hong Kong, |
169 | Ireland, Nigeria, South Africa. | 169 | Ireland, Nigeria, South Africa. |
170 | 170 | ||
171 | The CCIR - N uses the PAL colorsystem and PAL frame size but the NTSC framerate, | 171 | The CCIR - N uses the PAL colorsystem and PAL frame size but the NTSC framerate, |
172 | and is used in Argentinia, Uruguay, an a few others | 172 | and is used in Argentinia, Uruguay, an a few others |
173 | 173 | ||
174 | We do not talk about how the audio is broadcast ! | 174 | We do not talk about how the audio is broadcast ! |
175 | 175 | ||
176 | A rather good sites about the TV standards are: | 176 | A rather good sites about the TV standards are: |
177 | http://www.sony.jp/ServiceArea/Voltage_map/ | 177 | http://www.sony.jp/ServiceArea/Voltage_map/ |
178 | http://info.electronicwerkstatt.de/bereiche/fernsehtechnik/frequenzen_und_normen/Fernsehnormen/ | 178 | http://info.electronicwerkstatt.de/bereiche/fernsehtechnik/frequenzen_und_normen/Fernsehnormen/ |
179 | and http://www.cabl.com/restaurant/channel.html | 179 | and http://www.cabl.com/restaurant/channel.html |
180 | 180 | ||
181 | Other weird things around: NTSC 4.43 is a modificated NTSC, which is mainly | 181 | Other weird things around: NTSC 4.43 is a modificated NTSC, which is mainly |
182 | used in PAL VCR's that are able to play back NTSC. PAL 60 seems to be the same | 182 | used in PAL VCR's that are able to play back NTSC. PAL 60 seems to be the same |
183 | as NTSC 4.43 . The Datasheets also talk about NTSC 44, It seems as if it would | 183 | as NTSC 4.43 . The Datasheets also talk about NTSC 44, It seems as if it would |
184 | be the same as NTSC 4.43. | 184 | be the same as NTSC 4.43. |
185 | NTSC Combs seems to be a decoder mode where the decoder uses a comb filter | 185 | NTSC Combs seems to be a decoder mode where the decoder uses a comb filter |
186 | to split coma and luma instead of a Delay line. | 186 | to split coma and luma instead of a Delay line. |
187 | 187 | ||
188 | But I did not defiantly find out what NTSC Comb is. | 188 | But I did not defiantly find out what NTSC Comb is. |
189 | 189 | ||
190 | Philips saa7111 TV decoder | 190 | Philips saa7111 TV decoder |
191 | was introduced in 1997, is used in the BUZ and | 191 | was introduced in 1997, is used in the BUZ and |
192 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC N, NTSC 4.43 and SECAM | 192 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC N, NTSC 4.43 and SECAM |
193 | 193 | ||
194 | Philips saa7110a TV decoder | 194 | Philips saa7110a TV decoder |
195 | was introduced in 1995, is used in the Pinnacle/Miro DC10(new), DC10+ and | 195 | was introduced in 1995, is used in the Pinnacle/Miro DC10(new), DC10+ and |
196 | can handle: PAL B/G, NTSC M and SECAM | 196 | can handle: PAL B/G, NTSC M and SECAM |
197 | 197 | ||
198 | Philips saa7114 TV decoder | 198 | Philips saa7114 TV decoder |
199 | was introduced in 2000, is used in the LML33R10 and | 199 | was introduced in 2000, is used in the LML33R10 and |
200 | can handle: PAL B/G/D/H/I/N, PAL N, PAL M, NTSC M, NTSC 4.43 and SECAM | 200 | can handle: PAL B/G/D/H/I/N, PAL N, PAL M, NTSC M, NTSC 4.43 and SECAM |
201 | 201 | ||
202 | Brooktree bt819 TV decoder | 202 | Brooktree bt819 TV decoder |
203 | was introduced in 1996, and is used in the LML33 and | 203 | was introduced in 1996, and is used in the LML33 and |
204 | can handle: PAL B/D/G/H/I, NTSC M | 204 | can handle: PAL B/D/G/H/I, NTSC M |
205 | 205 | ||
206 | Micronas vpx3220a TV decoder | 206 | Micronas vpx3220a TV decoder |
207 | was introduced in 1996, is used in the DC30 and DC30+ and | 207 | was introduced in 1996, is used in the DC30 and DC30+ and |
208 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb | 208 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb |
209 | 209 | ||
210 | Samsung ks0127 TV decoder | 210 | Samsung ks0127 TV decoder |
211 | is used in the AVS6EYES card and | 211 | is used in the AVS6EYES card and |
212 | can handle: NTSC-M/N/44, PAL-M/N/B/G/H/I/D/K/L and SECAM | 212 | can handle: NTSC-M/N/44, PAL-M/N/B/G/H/I/D/K/L and SECAM |
213 | 213 | ||
214 | =========================== | 214 | =========================== |
215 | 215 | ||
216 | 1.2 What the TV encoder can do an what not | 216 | 1.2 What the TV encoder can do an what not |
217 | 217 | ||
218 | The TV encoder are doing the "same" as the decoder, but in the oder direction. | 218 | The TV encoder are doing the "same" as the decoder, but in the oder direction. |
219 | You feed them digital data and the generate a Composite or SVHS signal. | 219 | You feed them digital data and the generate a Composite or SVHS signal. |
220 | For information about the colorsystems and TV norm take a look in the | 220 | For information about the colorsystems and TV norm take a look in the |
221 | TV decoder section. | 221 | TV decoder section. |
222 | 222 | ||
223 | Philips saa7185 TV Encoder | 223 | Philips saa7185 TV Encoder |
224 | was introduced in 1996, is used in the BUZ | 224 | was introduced in 1996, is used in the BUZ |
225 | can generate: PAL B/G, NTSC M | 225 | can generate: PAL B/G, NTSC M |
226 | 226 | ||
227 | Brooktree bt856 TV Encoder | 227 | Brooktree bt856 TV Encoder |
228 | was introduced in 1994, is used in the LML33 | 228 | was introduced in 1994, is used in the LML33 |
229 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M, PAL-N (Argentina) | 229 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M, PAL-N (Argentina) |
230 | 230 | ||
231 | Analog Devices adv7170 TV Encoder | 231 | Analog Devices adv7170 TV Encoder |
232 | was introduced in 2000, is used in the LML300R10 | 232 | was introduced in 2000, is used in the LML300R10 |
233 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M, PAL 60 | 233 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M, PAL 60 |
234 | 234 | ||
235 | Analog Devices adv7175 TV Encoder | 235 | Analog Devices adv7175 TV Encoder |
236 | was introduced in 1996, is used in the DC10, DC10+, DC10 old, DC30, DC30+ | 236 | was introduced in 1996, is used in the DC10, DC10+, DC10 old, DC30, DC30+ |
237 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M | 237 | can generate: PAL B/D/G/H/I/N, PAL M, NTSC M |
238 | 238 | ||
239 | ITT mse3000 TV encoder | 239 | ITT mse3000 TV encoder |
240 | was introduced in 1991, is used in the DC10 old | 240 | was introduced in 1991, is used in the DC10 old |
241 | can generate: PAL , NTSC , SECAM | 241 | can generate: PAL , NTSC , SECAM |
242 | 242 | ||
243 | Conexant bt866 TV encoder | 243 | Conexant bt866 TV encoder |
244 | is used in AVS6EYES, and | 244 | is used in AVS6EYES, and |
245 | can generate: NTSC/PAL, PAL-M, PAL-N | 245 | can generate: NTSC/PAL, PAL-M, PAL-N |
246 | 246 | ||
247 | The adv717x, should be able to produce PAL N. But you find nothing PAL N | 247 | The adv717x, should be able to produce PAL N. But you find nothing PAL N |
248 | specific in the registers. Seem that you have to reuse a other standard | 248 | specific in the registers. Seem that you have to reuse a other standard |
249 | to generate PAL N, maybe it would work if you use the PAL M settings. | 249 | to generate PAL N, maybe it would work if you use the PAL M settings. |
250 | 250 | ||
251 | ========================== | 251 | ========================== |
252 | 252 | ||
253 | 2. How do I get this damn thing to work | 253 | 2. How do I get this damn thing to work |
254 | 254 | ||
255 | Load zr36067.o. If it can't autodetect your card, use the card=X insmod | 255 | Load zr36067.o. If it can't autodetect your card, use the card=X insmod |
256 | option with X being the card number as given in the previous section. | 256 | option with X being the card number as given in the previous section. |
257 | To have more than one card, use card=X1[,X2[,X3,[X4[..]]]] | 257 | To have more than one card, use card=X1[,X2[,X3,[X4[..]]]] |
258 | 258 | ||
259 | To automate this, add the following to your /etc/modprobe.conf: | 259 | To automate this, add the following to your /etc/modprobe.conf: |
260 | 260 | ||
261 | options zr36067 card=X1[,X2[,X3[,X4[..]]]] | 261 | options zr36067 card=X1[,X2[,X3[,X4[..]]]] |
262 | alias char-major-81-0 zr36067 | 262 | alias char-major-81-0 zr36067 |
263 | 263 | ||
264 | One thing to keep in mind is that this doesn't load zr36067.o itself yet. It | 264 | One thing to keep in mind is that this doesn't load zr36067.o itself yet. It |
265 | just automates loading. If you start using xawtv, the device won't load on | 265 | just automates loading. If you start using xawtv, the device won't load on |
266 | some systems, since you're trying to load modules as a user, which is not | 266 | some systems, since you're trying to load modules as a user, which is not |
267 | allowed ("permission denied"). A quick workaround is to add 'Load "v4l"' to | 267 | allowed ("permission denied"). A quick workaround is to add 'Load "v4l"' to |
268 | XF86Config-4 when you use X by default, or to run 'v4l-conf -c <device>' in | 268 | XF86Config-4 when you use X by default, or to run 'v4l-conf -c <device>' in |
269 | one of your startup scripts (normally rc.local) if you don't use X. Both | 269 | one of your startup scripts (normally rc.local) if you don't use X. Both |
270 | make sure that the modules are loaded on startup, under the root account. | 270 | make sure that the modules are loaded on startup, under the root account. |
271 | 271 | ||
272 | =========================== | 272 | =========================== |
273 | 273 | ||
274 | 3. What mainboard should I use (or why doesn't my card work) | 274 | 3. What mainboard should I use (or why doesn't my card work) |
275 | 275 | ||
276 | <insert lousy disclaimer here>. In short: good=SiS/Intel, bad=VIA. | 276 | <insert lousy disclaimer here>. In short: good=SiS/Intel, bad=VIA. |
277 | 277 | ||
278 | Experience tells us that people with a Buz, on average, have more problems | 278 | Experience tells us that people with a Buz, on average, have more problems |
279 | than users with a DC10+/LML33. Also, it tells us that people owning a VIA- | 279 | than users with a DC10+/LML33. Also, it tells us that people owning a VIA- |
280 | based mainboard (ktXXX, MVP3) have more problems than users with a mainboard | 280 | based mainboard (ktXXX, MVP3) have more problems than users with a mainboard |
281 | based on a different chipset. Here's some notes from Andrew Stevens: | 281 | based on a different chipset. Here's some notes from Andrew Stevens: |
282 | -- | 282 | -- |
283 | Here's my experience of using LML33 and Buz on various motherboards: | 283 | Here's my experience of using LML33 and Buz on various motherboards: |
284 | 284 | ||
285 | VIA MVP3 | 285 | VIA MVP3 |
286 | Forget it. Pointless. Doesn't work. | 286 | Forget it. Pointless. Doesn't work. |
287 | Intel 430FX (Pentium 200) | 287 | Intel 430FX (Pentium 200) |
288 | LML33 perfect, Buz tolerable (3 or 4 frames dropped per movie) | 288 | LML33 perfect, Buz tolerable (3 or 4 frames dropped per movie) |
289 | Intel 440BX (early stepping) | 289 | Intel 440BX (early stepping) |
290 | LML33 tolerable. Buz starting to get annoying (6-10 frames/hour) | 290 | LML33 tolerable. Buz starting to get annoying (6-10 frames/hour) |
291 | Intel 440BX (late stepping) | 291 | Intel 440BX (late stepping) |
292 | Buz tolerable, LML3 almost perfect (occasional single frame drops) | 292 | Buz tolerable, LML3 almost perfect (occasional single frame drops) |
293 | SiS735 | 293 | SiS735 |
294 | LML33 perfect, Buz tolerable. | 294 | LML33 perfect, Buz tolerable. |
295 | VIA KT133(*) | 295 | VIA KT133(*) |
296 | LML33 starting to get annoying, Buz poor enough that I gave up. | 296 | LML33 starting to get annoying, Buz poor enough that I gave up. |
297 | 297 | ||
298 | Both 440BX boards were dual CPU versions. | 298 | Both 440BX boards were dual CPU versions. |
299 | -- | 299 | -- |
300 | Bernhard Praschinger later added: | 300 | Bernhard Praschinger later added: |
301 | -- | 301 | -- |
302 | AMD 751 | 302 | AMD 751 |
303 | Buz perfect-tolerable | 303 | Buz perfect-tolerable |
304 | AMD 760 | 304 | AMD 760 |
305 | Buz perfect-tolerable | 305 | Buz perfect-tolerable |
306 | -- | 306 | -- |
307 | In general, people on the user mailinglist won't give you much of a chance | 307 | In general, people on the user mailinglist won't give you much of a chance |
308 | if you have a VIA-based motherboard. They may be cheap, but sometimes, you'd | 308 | if you have a VIA-based motherboard. They may be cheap, but sometimes, you'd |
309 | rather want to spend some more money on better boards. In general, VIA | 309 | rather want to spend some more money on better boards. In general, VIA |
310 | mainboards' IDE/PCI performance will also suck badly compared to others. | 310 | mainboards' IDE/PCI performance will also suck badly compared to others. |
311 | You'll notice the DC10+/DC30+ aren't mentioned anywhere in the overview. | 311 | You'll notice the DC10+/DC30+ aren't mentioned anywhere in the overview. |
312 | Basically, you can assume that if the Buz works, the LML33 will work too. If | 312 | Basically, you can assume that if the Buz works, the LML33 will work too. If |
313 | the LML33 works, the DC10+/DC30+ will work too. They're most tolerant to | 313 | the LML33 works, the DC10+/DC30+ will work too. They're most tolerant to |
314 | different mainboard chipsets from all of the supported cards. | 314 | different mainboard chipsets from all of the supported cards. |
315 | 315 | ||
316 | If you experience timeouts during capture, buy a better mainboard or lower | 316 | If you experience timeouts during capture, buy a better mainboard or lower |
317 | the quality/buffersize during capture (see 'Concerning buffer sizes, quality, | 317 | the quality/buffersize during capture (see 'Concerning buffer sizes, quality, |
318 | output size etc.'). If it hangs, there's little we can do as of now. Check | 318 | output size etc.'). If it hangs, there's little we can do as of now. Check |
319 | your IRQs and make sure the card has its own interrupts. | 319 | your IRQs and make sure the card has its own interrupts. |
320 | 320 | ||
321 | =========================== | 321 | =========================== |
322 | 322 | ||
323 | 4. Programming interface | 323 | 4. Programming interface |
324 | 324 | ||
325 | This driver conforms to video4linux and video4linux2; either API can be used to | 325 | This driver conforms to video4linux and video4linux2; either API can be used to |
326 | access the driver. Since video4linux didn't provide adequate calls to fully | 326 | access the driver. Since video4linux didn't provide adequate calls to fully |
327 | use the cards' features, we've introduced several programming extensions, | 327 | use the cards' features, we've introduced several programming extensions, |
328 | which are currently officially accepted in the 2.4.x branch of the kernel. | 328 | which are currently officially accepted in the 2.4.x branch of the kernel. |
329 | These extensions are known as the v4l/mjpeg extensions. See zoran.h for | 329 | These extensions are known as the v4l/mjpeg extensions. See zoran.h for |
330 | details (structs/ioctls). | 330 | details (structs/ioctls). |
331 | 331 | ||
332 | Information - video4linux: | 332 | Information - video4linux: |
333 | http://roadrunner.swansea.linux.org.uk/v4lapi.shtml | 333 | http://roadrunner.swansea.linux.org.uk/v4lapi.shtml |
334 | Documentation/video4linux/API.html | 334 | Documentation/video4linux/API.html |
335 | /usr/include/linux/videodev.h | 335 | /usr/include/linux/videodev.h |
336 | 336 | ||
337 | Information - video4linux/mjpeg extensions: | 337 | Information - video4linux/mjpeg extensions: |
338 | ./zoran.h | 338 | ./zoran.h |
339 | (also see below) | 339 | (also see below) |
340 | 340 | ||
341 | Information - video4linux2: | 341 | Information - video4linux2: |
342 | http://www.thedirks.org/v4l2/ | 342 | http://www.thedirks.org/v4l2/ |
343 | /usr/include/linux/videodev2.h | 343 | /usr/include/linux/videodev2.h |
344 | http://www.bytesex.org/v4l/ | 344 | http://www.bytesex.org/v4l/ |
345 | 345 | ||
346 | More information on the video4linux/mjpeg extensions, by Serguei | 346 | More information on the video4linux/mjpeg extensions, by Serguei |
347 | Miridonov and Rainer Johanni: | 347 | Miridonov and Rainer Johanni: |
348 | -- | 348 | -- |
349 | The ioctls for that interface are as follows: | 349 | The ioctls for that interface are as follows: |
350 | 350 | ||
351 | BUZIOC_G_PARAMS | 351 | BUZIOC_G_PARAMS |
352 | BUZIOC_S_PARAMS | 352 | BUZIOC_S_PARAMS |
353 | 353 | ||
354 | Get and set the parameters of the buz. The user should always do a | 354 | Get and set the parameters of the buz. The user should always do a |
355 | BUZIOC_G_PARAMS (with a struct buz_params) to obtain the default | 355 | BUZIOC_G_PARAMS (with a struct buz_params) to obtain the default |
356 | settings, change what he likes and then make a BUZIOC_S_PARAMS call. | 356 | settings, change what he likes and then make a BUZIOC_S_PARAMS call. |
357 | 357 | ||
358 | BUZIOC_REQBUFS | 358 | BUZIOC_REQBUFS |
359 | 359 | ||
360 | Before being able to capture/playback, the user has to request | 360 | Before being able to capture/playback, the user has to request |
361 | the buffers he is wanting to use. Fill the structure | 361 | the buffers he is wanting to use. Fill the structure |
362 | zoran_requestbuffers with the size (recommended: 256*1024) and | 362 | zoran_requestbuffers with the size (recommended: 256*1024) and |
363 | the number (recommended 32 up to 256). There are no such restrictions | 363 | the number (recommended 32 up to 256). There are no such restrictions |
364 | as for the Video for Linux buffers, you should LEAVE SUFFICIENT | 364 | as for the Video for Linux buffers, you should LEAVE SUFFICIENT |
365 | MEMORY for your system however, else strange things will happen .... | 365 | MEMORY for your system however, else strange things will happen .... |
366 | On return, the zoran_requestbuffers structure contains number and | 366 | On return, the zoran_requestbuffers structure contains number and |
367 | size of the actually allocated buffers. | 367 | size of the actually allocated buffers. |
368 | You should use these numbers for doing a mmap of the buffers | 368 | You should use these numbers for doing a mmap of the buffers |
369 | into the user space. | 369 | into the user space. |
370 | The BUZIOC_REQBUFS ioctl also makes it happen, that the next mmap | 370 | The BUZIOC_REQBUFS ioctl also makes it happen, that the next mmap |
371 | maps the MJPEG buffer instead of the V4L buffers. | 371 | maps the MJPEG buffer instead of the V4L buffers. |
372 | 372 | ||
373 | BUZIOC_QBUF_CAPT | 373 | BUZIOC_QBUF_CAPT |
374 | BUZIOC_QBUF_PLAY | 374 | BUZIOC_QBUF_PLAY |
375 | 375 | ||
376 | Queue a buffer for capture or playback. The first call also starts | 376 | Queue a buffer for capture or playback. The first call also starts |
377 | streaming capture. When streaming capture is going on, you may | 377 | streaming capture. When streaming capture is going on, you may |
378 | only queue further buffers or issue syncs until streaming | 378 | only queue further buffers or issue syncs until streaming |
379 | capture is switched off again with an argument of -1 to | 379 | capture is switched off again with an argument of -1 to |
380 | a BUZIOC_QBUF_CAPT/BUZIOC_QBUF_PLAY ioctl. | 380 | a BUZIOC_QBUF_CAPT/BUZIOC_QBUF_PLAY ioctl. |
381 | 381 | ||
382 | BUZIOC_SYNC | 382 | BUZIOC_SYNC |
383 | 383 | ||
384 | Issue this ioctl when all buffers are queued. This ioctl will | 384 | Issue this ioctl when all buffers are queued. This ioctl will |
385 | block until the first buffer becomes free for saving its | 385 | block until the first buffer becomes free for saving its |
386 | data to disk (after BUZIOC_QBUF_CAPT) or for reuse (after BUZIOC_QBUF_PLAY). | 386 | data to disk (after BUZIOC_QBUF_CAPT) or for reuse (after BUZIOC_QBUF_PLAY). |
387 | 387 | ||
388 | BUZIOC_G_STATUS | 388 | BUZIOC_G_STATUS |
389 | 389 | ||
390 | Get the status of the input lines (video source connected/norm). | 390 | Get the status of the input lines (video source connected/norm). |
391 | 391 | ||
392 | For programming example, please, look at lavrec.c and lavplay.c code in | 392 | For programming example, please, look at lavrec.c and lavplay.c code in |
393 | lavtools-1.2p2 package (URL: http://www.cicese.mx/~mirsev/DC10plus/) | 393 | lavtools-1.2p2 package (URL: http://www.cicese.mx/~mirsev/DC10plus/) |
394 | and the 'examples' directory in the original Buz driver distribution. | 394 | and the 'examples' directory in the original Buz driver distribution. |
395 | 395 | ||
396 | Additional notes for software developers: | 396 | Additional notes for software developers: |
397 | 397 | ||
398 | The driver returns maxwidth and maxheight parameters according to | 398 | The driver returns maxwidth and maxheight parameters according to |
399 | the current TV standard (norm). Therefore, the software which | 399 | the current TV standard (norm). Therefore, the software which |
400 | communicates with the driver and "asks" for these parameters should | 400 | communicates with the driver and "asks" for these parameters should |
401 | first set the correct norm. Well, it seems logically correct: TV | 401 | first set the correct norm. Well, it seems logically correct: TV |
402 | standard is "more constant" for current country than geometry | 402 | standard is "more constant" for current country than geometry |
403 | settings of a variety of TV capture cards which may work in ITU or | 403 | settings of a variety of TV capture cards which may work in ITU or |
404 | square pixel format. Remember that users now can lock the norm to | 404 | square pixel format. Remember that users now can lock the norm to |
405 | avoid any ambiguity. | 405 | avoid any ambiguity. |
406 | -- | 406 | -- |
407 | Please note that lavplay/lavrec are also included in the MJPEG-tools | 407 | Please note that lavplay/lavrec are also included in the MJPEG-tools |
408 | (http://mjpeg.sf.net/). | 408 | (http://mjpeg.sf.net/). |
409 | 409 | ||
410 | =========================== | 410 | =========================== |
411 | 411 | ||
412 | 5. Applications | 412 | 5. Applications |
413 | 413 | ||
414 | Applications known to work with this driver: | 414 | Applications known to work with this driver: |
415 | 415 | ||
416 | TV viewing: | 416 | TV viewing: |
417 | * xawtv | 417 | * xawtv |
418 | * kwintv | 418 | * kwintv |
419 | * probably any TV application that supports video4linux or video4linux2. | 419 | * probably any TV application that supports video4linux or video4linux2. |
420 | 420 | ||
421 | MJPEG capture/playback: | 421 | MJPEG capture/playback: |
422 | * mjpegtools/lavtools (or Linux Video Studio) | 422 | * mjpegtools/lavtools (or Linux Video Studio) |
423 | * gstreamer | 423 | * gstreamer |
424 | * mplayer | 424 | * mplayer |
425 | 425 | ||
426 | General raw capture: | 426 | General raw capture: |
427 | * xawtv | 427 | * xawtv |
428 | * gstreamer | 428 | * gstreamer |
429 | * probably any application that supports video4linux or video4linux2 | 429 | * probably any application that supports video4linux or video4linux2 |
430 | 430 | ||
431 | Video editing: | 431 | Video editing: |
432 | * Cinelerra | 432 | * Cinelerra |
433 | * MainActor | 433 | * MainActor |
434 | * mjpegtools (or Linux Video Studio) | 434 | * mjpegtools (or Linux Video Studio) |
435 | 435 | ||
436 | =========================== | 436 | =========================== |
437 | 437 | ||
438 | 6. Concerning buffer sizes, quality, output size etc. | 438 | 6. Concerning buffer sizes, quality, output size etc. |
439 | 439 | ||
440 | The zr36060 can do 1:2 JPEG compression. This is really the theoretical | 440 | The zr36060 can do 1:2 JPEG compression. This is really the theoretical |
441 | maximum that the chipset can reach. The driver can, however, limit compression | 441 | maximum that the chipset can reach. The driver can, however, limit compression |
442 | to a maximum (size) of 1:4. The reason for this is that some cards (e.g. Buz) | 442 | to a maximum (size) of 1:4. The reason for this is that some cards (e.g. Buz) |
443 | can't handle 1:2 compression without stopping capture after only a few minutes. | 443 | can't handle 1:2 compression without stopping capture after only a few minutes. |
444 | With 1:4, it'll mostly work. If you have a Buz, use 'low_bitrate=1' to go into | 444 | With 1:4, it'll mostly work. If you have a Buz, use 'low_bitrate=1' to go into |
445 | 1:4 max. compression mode. | 445 | 1:4 max. compression mode. |
446 | 446 | ||
447 | 100% JPEG quality is thus 1:2 compression in practice. Take a full PAL frame | 447 | 100% JPEG quality is thus 1:2 compression in practice. Take a full PAL frame |
448 | (size 720x576). The JPEG fields are stored in YUY2 format, so the size of the | 448 | (size 720x576). The JPEG fields are stored in YUY2 format, so the size of the |
449 | fields is 720x288x16/2 bits/field (2 fields/frame) = 207360 bytes/field x 2 = | 449 | fields is 720x288x16/2 bits/field (2 fields/frame) = 207360 bytes/field x 2 = |
450 | 414720 bytes/frame. Add some more bytes for headers and DHT (huffman)/DQT | 450 | 414720 bytes/frame. Add some more bytes for headers and DHT (huffman)/DQT |
451 | (quantization) tables, and you'll get to something like 512kB per frame for | 451 | (quantization) tables, and you'll get to something like 512kB per frame for |
452 | 1:2 compression. For 1:4 compression, you'd have frames of half this size. | 452 | 1:2 compression. For 1:4 compression, you'd have frames of half this size. |
453 | 453 | ||
454 | Some additional explanation by Martin Samuelsson, which also explains the | 454 | Some additional explanation by Martin Samuelsson, which also explains the |
455 | importance of buffer sizes: | 455 | importance of buffer sizes: |
456 | -- | 456 | -- |
457 | > Hmm, I do not think it is really that way. With the current (downloaded | 457 | > Hmm, I do not think it is really that way. With the current (downloaded |
458 | > at 18:00 Monday) driver I get that output sizes for 10 sec: | 458 | > at 18:00 Monday) driver I get that output sizes for 10 sec: |
459 | > -q 50 -b 128 : 24.283.332 Bytes | 459 | > -q 50 -b 128 : 24.283.332 Bytes |
460 | > -q 50 -b 256 : 48.442.368 | 460 | > -q 50 -b 256 : 48.442.368 |
461 | > -q 25 -b 128 : 24.655.992 | 461 | > -q 25 -b 128 : 24.655.992 |
462 | > -q 25 -b 256 : 25.859.820 | 462 | > -q 25 -b 256 : 25.859.820 |
463 | 463 | ||
464 | I woke up, and can't go to sleep again. I'll kill some time explaining why | 464 | I woke up, and can't go to sleep again. I'll kill some time explaining why |
465 | this doesn't look strange to me. | 465 | this doesn't look strange to me. |
466 | 466 | ||
467 | Let's do some math using a width of 704 pixels. I'm not sure whether the Buz | 467 | Let's do some math using a width of 704 pixels. I'm not sure whether the Buz |
468 | actually use that number or not, but that's not too important right now. | 468 | actually use that number or not, but that's not too important right now. |
469 | 469 | ||
470 | 704x288 pixels, one field, is 202752 pixels. Divided by 64 pixels per block; | 470 | 704x288 pixels, one field, is 202752 pixels. Divided by 64 pixels per block; |
471 | 3168 blocks per field. Each pixel consist of two bytes; 128 bytes per block; | 471 | 3168 blocks per field. Each pixel consist of two bytes; 128 bytes per block; |
472 | 1024 bits per block. 100% in the new driver mean 1:2 compression; the maximum | 472 | 1024 bits per block. 100% in the new driver mean 1:2 compression; the maximum |
473 | output becomes 512 bits per block. Actually 510, but 512 is simpler to use | 473 | output becomes 512 bits per block. Actually 510, but 512 is simpler to use |
474 | for calculations. | 474 | for calculations. |
475 | 475 | ||
476 | Let's say that we specify d1q50. We thus want 256 bits per block; times 3168 | 476 | Let's say that we specify d1q50. We thus want 256 bits per block; times 3168 |
477 | becomes 811008 bits; 101376 bytes per field. We're talking raw bits and bytes | 477 | becomes 811008 bits; 101376 bytes per field. We're talking raw bits and bytes |
478 | here, so we don't need to do any fancy corrections for bits-per-pixel or such | 478 | here, so we don't need to do any fancy corrections for bits-per-pixel or such |
479 | things. 101376 bytes per field. | 479 | things. 101376 bytes per field. |
480 | 480 | ||
481 | d1 video contains two fields per frame. Those sum up to 202752 bytes per | 481 | d1 video contains two fields per frame. Those sum up to 202752 bytes per |
482 | frame, and one of those frames goes into each buffer. | 482 | frame, and one of those frames goes into each buffer. |
483 | 483 | ||
484 | But wait a second! -b128 gives 128kB buffers! It's not possible to cram | 484 | But wait a second! -b128 gives 128kB buffers! It's not possible to cram |
485 | 202752 bytes of JPEG data into 128kB! | 485 | 202752 bytes of JPEG data into 128kB! |
486 | 486 | ||
487 | This is what the driver notice and automatically compensate for in your | 487 | This is what the driver notice and automatically compensate for in your |
488 | examples. Let's do some math using this information: | 488 | examples. Let's do some math using this information: |
489 | 489 | ||
490 | 128kB is 131072 bytes. In this buffer, we want to store two fields, which | 490 | 128kB is 131072 bytes. In this buffer, we want to store two fields, which |
491 | leaves 65536 bytes for each field. Using 3168 blocks per field, we get | 491 | leaves 65536 bytes for each field. Using 3168 blocks per field, we get |
492 | 20.68686868... available bytes per block; 165 bits. We can't allow the | 492 | 20.68686868... available bytes per block; 165 bits. We can't allow the |
493 | request for 256 bits per block when there's only 165 bits available! The -q50 | 493 | request for 256 bits per block when there's only 165 bits available! The -q50 |
494 | option is silently overridden, and the -b128 option takes precedence, leaving | 494 | option is silently overridden, and the -b128 option takes precedence, leaving |
495 | us with the equivalence of -q32. | 495 | us with the equivalence of -q32. |
496 | 496 | ||
497 | This gives us a data rate of 165 bits per block, which, times 3168, sums up | 497 | This gives us a data rate of 165 bits per block, which, times 3168, sums up |
498 | to 65340 bytes per field, out of the allowed 65536. The current driver has | 498 | to 65340 bytes per field, out of the allowed 65536. The current driver has |
499 | another level of rate limiting; it won't accept -q values that fill more than | 499 | another level of rate limiting; it won't accept -q values that fill more than |
500 | 6/8 of the specified buffers. (I'm not sure why. "Playing it safe" seem to be | 500 | 6/8 of the specified buffers. (I'm not sure why. "Playing it safe" seem to be |
501 | a safe bet. Personally, I think I would have lowered requested-bits-per-block | 501 | a safe bet. Personally, I think I would have lowered requested-bits-per-block |
502 | by one, or something like that.) We can't use 165 bits per block, but have to | 502 | by one, or something like that.) We can't use 165 bits per block, but have to |
503 | lower it again, to 6/8 of the available buffer space: We end up with 124 bits | 503 | lower it again, to 6/8 of the available buffer space: We end up with 124 bits |
504 | per block, the equivalence of -q24. With 128kB buffers, you can't use greater | 504 | per block, the equivalence of -q24. With 128kB buffers, you can't use greater |
505 | than -q24 at -d1. (And PAL, and 704 pixels width...) | 505 | than -q24 at -d1. (And PAL, and 704 pixels width...) |
506 | 506 | ||
507 | The third example is limited to -q24 through the same process. The second | 507 | The third example is limited to -q24 through the same process. The second |
508 | example, using very similar calculations, is limited to -q48. The only | 508 | example, using very similar calculations, is limited to -q48. The only |
509 | example that actually grab at the specified -q value is the last one, which | 509 | example that actually grab at the specified -q value is the last one, which |
510 | is clearly visible, looking at the file size. | 510 | is clearly visible, looking at the file size. |
511 | -- | 511 | -- |
512 | 512 | ||
513 | Conclusion: the quality of the resulting movie depends on buffer size, quality, | 513 | Conclusion: the quality of the resulting movie depends on buffer size, quality, |
514 | whether or not you use 'low_bitrate=1' as insmod option for the zr36060.c | 514 | whether or not you use 'low_bitrate=1' as insmod option for the zr36060.c |
515 | module to do 1:4 instead of 1:2 compression, etc. | 515 | module to do 1:4 instead of 1:2 compression, etc. |
516 | 516 | ||
517 | If you experience timeouts, lowering the quality/buffersize or using | 517 | If you experience timeouts, lowering the quality/buffersize or using |
518 | 'low_bitrate=1' as insmod option for zr36060.o might actually help, as is | 518 | 'low_bitrate=1' as insmod option for zr36060.o might actually help, as is |
519 | proven by the Buz. | 519 | proven by the Buz. |
520 | 520 | ||
521 | =========================== | 521 | =========================== |
522 | 522 | ||
523 | 7. It hangs/crashes/fails/whatevers! Help! | 523 | 7. It hangs/crashes/fails/whatevers! Help! |
524 | 524 | ||
525 | Make sure that the card has its own interrupts (see /proc/interrupts), check | 525 | Make sure that the card has its own interrupts (see /proc/interrupts), check |
526 | the output of dmesg at high verbosity (load zr36067.o with debug=2, | 526 | the output of dmesg at high verbosity (load zr36067.o with debug=2, |
527 | load all other modules with debug=1). Check that your mainboard is favorable | 527 | load all other modules with debug=1). Check that your mainboard is favorable |
528 | (see question 2) and if not, test the card in another computer. Also see the | 528 | (see question 2) and if not, test the card in another computer. Also see the |
529 | notes given in question 3 and try lowering quality/buffersize/capturesize | 529 | notes given in question 3 and try lowering quality/buffersize/capturesize |
530 | if recording fails after a period of time. | 530 | if recording fails after a period of time. |
531 | 531 | ||
532 | If all this doesn't help, give a clear description of the problem including | 532 | If all this doesn't help, give a clear description of the problem including |
533 | detailed hardware information (memory+brand, mainboard+chipset+brand, which | 533 | detailed hardware information (memory+brand, mainboard+chipset+brand, which |
534 | MJPEG card, processor, other PCI cards that might be of interest), give the | 534 | MJPEG card, processor, other PCI cards that might be of interest), give the |
535 | system PnP information (/proc/interrupts, /proc/dma, /proc/devices), and give | 535 | system PnP information (/proc/interrupts, /proc/dma, /proc/devices), and give |
536 | the kernel version, driver version, glibc version, gcc version and any other | 536 | the kernel version, driver version, glibc version, gcc version and any other |
537 | information that might possibly be of interest. Also provide the dmesg output | 537 | information that might possibly be of interest. Also provide the dmesg output |
538 | at high verbosity. See 'Contacting' on how to contact the developers. | 538 | at high verbosity. See 'Contacting' on how to contact the developers. |
539 | 539 | ||
540 | =========================== | 540 | =========================== |
541 | 541 | ||
542 | 8. Maintainers/Contacting | 542 | 8. Maintainers/Contacting |
543 | 543 | ||
544 | The driver is currently maintained by Laurent Pinchart and Ronald Bultje | 544 | The driver is currently maintained by Laurent Pinchart and Ronald Bultje |
545 | (<laurent.pinchart@skynet.be> and <rbultje@ronald.bitfreak.net>). For bug | 545 | (<laurent.pinchart@skynet.be> and <rbultje@ronald.bitfreak.net>). For bug |
546 | reports or questions, please contact the mailinglist instead of the developers | 546 | reports or questions, please contact the mailinglist instead of the developers |
547 | individually. For user questions (i.e. bug reports or how-to questions), send | 547 | individually. For user questions (i.e. bug reports or how-to questions), send |
548 | an email to <mjpeg-users@lists.sf.net>, for developers (i.e. if you want to | 548 | an email to <mjpeg-users@lists.sf.net>, for developers (i.e. if you want to |
549 | help programming), send an email to <mjpeg-developer@lists.sf.net>. See | 549 | help programming), send an email to <mjpeg-developer@lists.sf.net>. See |
550 | http://www.sf.net/projects/mjpeg/ for subscription information. | 550 | http://www.sf.net/projects/mjpeg/ for subscription information. |
551 | 551 | ||
552 | For bug reports, be sure to include all the information as described in | 552 | For bug reports, be sure to include all the information as described in |
553 | the section 'It hangs/crashes/fails/whatevers! Help!'. Please make sure | 553 | the section 'It hangs/crashes/fails/whatevers! Help!'. Please make sure |
554 | you're using the latest version (http://mjpeg.sf.net/driver-zoran/). | 554 | you're using the latest version (http://mjpeg.sf.net/driver-zoran/). |
555 | 555 | ||
556 | Previous maintainers/developers of this driver include Serguei Miridonov | 556 | Previous maintainers/developers of this driver include Serguei Miridonov |
557 | <mirsev@cicese.mx>, Wolfgang Scherr <scherr@net4you.net>, Dave Perks | 557 | <mirsev@cicese.mx>, Wolfgang Scherr <scherr@net4you.net>, Dave Perks |
558 | <dperks@ibm.net> and Rainer Johanni <Rainer@Johanni.de>. | 558 | <dperks@ibm.net> and Rainer Johanni <Rainer@Johanni.de>. |
559 | 559 | ||
560 | =========================== | 560 | =========================== |
561 | 561 | ||
562 | 9. License | 562 | 9. License |
563 | 563 | ||
564 | This driver is distributed under the terms of the General Public License. | 564 | This driver is distributed under the terms of the General Public License. |
565 | 565 | ||
566 | This program is free software; you can redistribute it and/or modify | 566 | This program is free software; you can redistribute it and/or modify |
567 | it under the terms of the GNU General Public License as published by | 567 | it under the terms of the GNU General Public License as published by |
568 | the Free Software Foundation; either version 2 of the License, or | 568 | the Free Software Foundation; either version 2 of the License, or |
569 | (at your option) any later version. | 569 | (at your option) any later version. |
570 | 570 | ||
571 | This program is distributed in the hope that it will be useful, | 571 | This program is distributed in the hope that it will be useful, |
572 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 572 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
573 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 573 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
574 | GNU General Public License for more details. | 574 | GNU General Public License for more details. |
575 | 575 | ||
576 | You should have received a copy of the GNU General Public License | 576 | You should have received a copy of the GNU General Public License |
577 | along with this program; if not, write to the Free Software | 577 | along with this program; if not, write to the Free Software |
578 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | 578 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |
579 | 579 | ||
580 | See http://www.gnu.org/ for more information. | 580 | See http://www.gnu.org/ for more information. |
581 | 581 |
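The buffer-size arithmetic quoted in section 6 of the Zoran documentation above (Martin Samuelsson's explanation) can be reproduced with a short Python sketch. This is not the driver's code: the function name, the constants, and the rounding of the 6/8 margin to the nearest bit are assumptions chosen so that the numbers match the quoted text (165 and 124 bits per block, -q24 and -q48).

```python
# Illustrative only: reproduces the arithmetic from the quoted explanation,
# not the actual logic of the zoran driver.
BLOCK_PIXELS = 64      # pixels per block, as stated in the text
BITS_PER_PIXEL = 16    # "each pixel consists of two bytes"
# 100% quality means 1:2 compression: 1024 raw bits -> 512 bits per block.
MAX_BITS_PER_BLOCK = BLOCK_PIXELS * BITS_PER_PIXEL // 2


def effective_quality(quality, buffer_kb, width=704, field_height=288,
                      fields_per_frame=2):
    """Return (bits_per_block, effective_quality_percent).

    quality   -- requested -q value in percent
    buffer_kb -- requested -b value in kB (one frame per buffer)
    """
    blocks_per_field = width * field_height // BLOCK_PIXELS   # 3168 for PAL d1
    requested_bits = MAX_BITS_PER_BLOCK * quality // 100

    # The buffer has to hold all fields of one frame.
    bytes_per_field = buffer_kb * 1024 // fields_per_frame
    available_bits = bytes_per_field * 8 // blocks_per_field

    # Second limit from the text: use at most 6/8 of the buffer.
    # Rounding to the nearest bit is an assumption (123.75 -> 124).
    capped_bits = (available_bits * 6 + 4) // 8

    bits = min(requested_bits, capped_bits)
    return bits, bits * 100 // MAX_BITS_PER_BLOCK


if __name__ == "__main__":
    for q, b in [(50, 128), (50, 256), (25, 128), (25, 256)]:
        bits, q_eff = effective_quality(q, b)
        print(f"-q {q} -b {b}: {bits} bits/block, effective -q{q_eff}")
```

Running it for the four cases in the quoted mail shows that only the last one (-q 25 -b 256) keeps its requested quality; the other three are capped by the buffer size.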
Documentation/vm/numa
1 | Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com> | 1 | Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com> |
2 | 2 | ||
3 | The intent of this file is to have an up-to-date, running commentary | 3 | The intent of this file is to have an up-to-date, running commentary |
4 | from different people about NUMA specific code in the Linux vm. | 4 | from different people about NUMA specific code in the Linux vm. |
5 | 5 | ||
6 | What is NUMA? It is an architecture where the memory access times | 6 | What is NUMA? It is an architecture where the memory access times |
7 | for different regions of memory from a given processor varies | 7 | for different regions of memory from a given processor varies |
8 | according to the "distance" of the memory region from the processor. | 8 | according to the "distance" of the memory region from the processor. |
9 | Each region of memory to which access times are the same from any | 9 | Each region of memory to which access times are the same from any |
10 | cpu, is called a node. On such architectures, it is beneficial if | 10 | cpu, is called a node. On such architectures, it is beneficial if |
11 | the kernel tries to minimize inter node communications. Schemes | 11 | the kernel tries to minimize inter node communications. Schemes |
12 | for this range from kernel text and read-only data replication | 12 | for this range from kernel text and read-only data replication |
13 | across nodes, and trying to house all the data structures that | 13 | across nodes, and trying to house all the data structures that |
14 | key components of the kernel need on memory on that node. | 14 | key components of the kernel need on memory on that node. |
15 | 15 | ||
16 | Currently, all the numa support is to provide efficient handling | 16 | Currently, all the numa support is to provide efficient handling |
17 | of widely discontiguous physical memory, so architectures which | 17 | of widely discontiguous physical memory, so architectures which |
18 | are not NUMA but can have huge holes in the physical address space | 18 | are not NUMA but can have huge holes in the physical address space |
19 | can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM. | 19 | can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM. |
20 | 20 | ||
21 | The initial port includes NUMAizing the bootmem allocator code by | 21 | The initial port includes NUMAizing the bootmem allocator code by |
22 | encapsulating all the pieces of information into a bootmem_data_t | 22 | encapsulating all the pieces of information into a bootmem_data_t |
23 | structure. Node specific calls have been added to the allocator. | 23 | structure. Node specific calls have been added to the allocator. |
24 | In theory, any platform which uses the bootmem allocator should | 24 | In theory, any platform which uses the bootmem allocator should |
25 | be able to to put the bootmem and mem_map data structures anywhere | 25 | be able to put the bootmem and mem_map data structures anywhere |
26 | it deems best. | 26 | it deems best. |
27 | 27 | ||
28 | Each node's page allocation data structures have also been encapsulated | 28 | Each node's page allocation data structures have also been encapsulated |
29 | into a pg_data_t. The bootmem_data_t is just one part of this. To | 29 | into a pg_data_t. The bootmem_data_t is just one part of this. To |
30 | make the code look uniform between NUMA and regular UMA platforms, | 30 | make the code look uniform between NUMA and regular UMA platforms, |
31 | UMA platforms have a statically allocated pg_data_t too (contig_page_data). | 31 | UMA platforms have a statically allocated pg_data_t too (contig_page_data). |
32 | For the sake of uniformity, the function num_online_nodes() is also defined | 32 | For the sake of uniformity, the function num_online_nodes() is also defined |
33 | for all platforms. As we run benchmarks, we might decide to NUMAize | 33 | for all platforms. As we run benchmarks, we might decide to NUMAize |
34 | more variables like low_on_memory, nr_free_pages etc into the pg_data_t. | 34 | more variables like low_on_memory, nr_free_pages etc into the pg_data_t. |
35 | 35 | ||
36 | The NUMA aware page allocation code currently tries to allocate pages | 36 | The NUMA aware page allocation code currently tries to allocate pages |
37 | from different nodes in a round robin manner. This will be changed to | 37 | from different nodes in a round robin manner. This will be changed to |
38 | do concentric circle search, starting from current node, once the | 38 | do concentric circle search, starting from current node, once the |
39 | NUMA port achieves more maturity. The call alloc_pages_node has been | 39 | NUMA port achieves more maturity. The call alloc_pages_node has been |
40 | added, so that drivers can make the call and not worry about whether | 40 | added, so that drivers can make the call and not worry about whether |
41 | it is running on a NUMA or UMA platform. | 41 | it is running on a NUMA or UMA platform. |
42 | 42 |
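The two node-selection policies mentioned in the NUMA notes above (round robin now, a search outward from the current node later) can be contrasted with a toy Python model. This is not kernel code: the function names and the distance matrix are hypothetical, and "concentric search" is interpreted here simply as trying nodes in order of increasing distance from the current node.

```python
from itertools import cycle


def round_robin_nodes(num_nodes):
    """Yield node ids 0, 1, ..., num_nodes - 1 repeatedly, ignoring locality."""
    return cycle(range(num_nodes))


def nearest_first(current, distance):
    """Return all node ids ordered by distance from `current`, nearest first.

    distance[a][b] is a hypothetical relative access cost from node a to
    memory on node b; ties are broken by node id.
    """
    return sorted(range(len(distance)), key=lambda n: (distance[current][n], n))


if __name__ == "__main__":
    # Hypothetical 3-node machine laid out in a line: 0 - 1 - 2.
    distance = [
        [10, 20, 40],
        [20, 10, 20],
        [40, 20, 10],
    ]
    rr = round_robin_nodes(3)
    print([next(rr) for _ in range(5)])   # round robin order
    print(nearest_first(2, distance))     # search order starting at node 2
```

The round-robin policy spreads allocations evenly but pays remote-access cost on most of them, while the nearest-first order only falls back to remote nodes when closer ones cannot satisfy the request.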