  Last reviewed: 05/20/2016

                       HPE iLO NMI Watchdog Driver
                NMI sourcing for iLO based ProLiant Servers
                       Documentation and Driver by
                           Thomas Mingarelli

   The HPE iLO NMI Watchdog driver is a kernel module that provides basic
   watchdog functionality and the added benefit of NMI sourcing. Both the
   watchdog functionality and the NMI sourcing capability need to be enabled
   by the user. Note that the two modes are independent of one another.
   A user can enable NMI sourcing without the watchdog timer and vice versa.
   All references to iLO in this document also apply to iLO2 and all
   subsequent generations.
  
   Watchdog functionality is enabled like that of any other common watchdog
   driver: an application must be started that opens the watchdog device,
   which starts the timer, and then pings it periodically. A basic
   application called watchdog-test.c exists in the Documentation/watchdog/src
   directory; simply compile it and run it. If the system
   gets into a bad state and hangs, the HPE ProLiant iLO timer register will
   not be updated in a timely fashion, and a hardware system reset, also known
   as an Automatic Server Recovery (ASR) event, will occur.
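
   The open/ping/close cycle described above can be sketched in C with the
   generic watchdog ioctl interface: open the device, ping it with
   WDIOC_KEEPALIVE, and write the magic character 'V' before closing. This is
   a minimal illustration, not the contents of watchdog-test.c. The helper
   names (wdt_open, wdt_keepalive, wdt_magic_close, wdt_run) are hypothetical,
   and the sketch assumes the driver honors the magic close, which has no
   effect when nowayout is set.

```c
/* Hypothetical helpers illustrating the generic /dev/watchdog interface. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Opening the device arms the timer, so open it only when the caller is
 * ready to keep pinging it. Returns a file descriptor or -1. */
int wdt_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Ping the watchdog so the timer is reloaded before it expires.
 * Returns 0 on success, -1 on failure (errno set by ioctl). */
int wdt_keepalive(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}

/* Magic close (assumed supported by the driver): writing 'V' just before
 * close() asks the driver to disarm the timer. With nowayout set, the
 * timer keeps running regardless. Returns 0 if write and close succeed. */
int wdt_magic_close(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);
    return (n == 1 && rc == 0) ? 0 : -1;
}

/* Ping `pings` times, `interval` seconds apart, then magic-close.
 * `interval` must stay well below soft_margin (default 30 seconds). */
int wdt_run(const char *path, int pings, unsigned int interval)
{
    int failed = 0;
    int fd = wdt_open(path);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    for (int i = 0; i < pings; i++) {
        if (wdt_keepalive(fd) < 0) {
            perror("WDIOC_KEEPALIVE");
            failed = 1;
            break;
        }
        sleep(interval);
    }
    if (wdt_magic_close(fd) < 0)
        failed = 1;
    return failed ? -1 : 0;
}
```

   For example, wdt_run("/dev/watchdog", 10, 10) would ping ten times at
   10-second intervals, well inside the default 30-second soft_margin. Run
   it only on a test system: opening the device arms a real hardware reset.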

   The hpwdt driver also has three module parameters:

   soft_margin - allows the user to set the watchdog timer value.
                 Default value is 30 seconds.
   allow_kdump - allows the user to save a kernel dump image after an NMI.
                 Default value is 1 (ON).
   nowayout    - basic watchdog parameter that does not allow the timer to
                 be stopped or an impending ASR to be escaped.
                 Default value is set when compiling the kernel
                 (CONFIG_WATCHDOG_NOWAYOUT). If it is set to "Y", there is
                 no way of disabling the watchdog once it has been started.
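
   soft_margin sets the timeout when the module is loaded. The generic
   watchdog API also lets an application query or change the timeout on an
   open device through the WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT ioctls. The
   sketch below assumes the driver advertises WDIOF_SETTIMEOUT; the helper
   names are hypothetical, and the half-timeout ping interval is a common
   rule of thumb, not a requirement of the driver.

```c
/* Hypothetical helpers for the runtime timeout ioctls. */
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Ask the driver for a new timeout in seconds. On success the driver
 * writes the timeout it actually applied back into *timeout, which may
 * differ from the requested value. Returns 0 on success, -1 on error. */
int wdt_set_timeout(int fd, int *timeout)
{
    return ioctl(fd, WDIOC_SETTIMEOUT, timeout);
}

/* Read the currently active timeout, in seconds, into *timeout. */
int wdt_get_timeout(int fd, int *timeout)
{
    return ioctl(fd, WDIOC_GETTIMEOUT, timeout);
}

/* Rule of thumb (an assumption): ping at about half the timeout so a
 * single late ping does not trigger an ASR. Never returns less than 1. */
unsigned int wdt_ping_interval(int timeout)
{
    if (timeout <= 2)
        return 1;
    return (unsigned int)timeout / 2;
}
```

   With the default soft_margin of 30 seconds, wdt_ping_interval(30) gives a
   15-second ping interval.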
  
   NOTE: More information about watchdog drivers in general, including the ioctl
         interface to /dev/watchdog, can be found in
         Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
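
   One part of that ioctl interface, WDIOC_GETSUPPORT, reports the driver's
   identity string and capability flags, which lets an application check,
   rather than assume, whether features such as the magic close are
   available. A minimal sketch, assuming an already-open watchdog file
   descriptor; the helper names are hypothetical.

```c
/* Hypothetical helpers around WDIOC_GETSUPPORT. */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Fill *info with the driver's identity and WDIOF_* capability flags.
 * The struct is zeroed first so it is well defined even if the ioctl
 * fails. Returns 0 on success, -1 on error. */
int wdt_get_info(int fd, struct watchdog_info *info)
{
    memset(info, 0, sizeof(*info));
    return ioctl(fd, WDIOC_GETSUPPORT, info);
}

/* Return 1 if the driver advertises the given WDIOF_* capability. */
int wdt_supports(const struct watchdog_info *info, unsigned int flag)
{
    return (info->options & flag) != 0;
}

/* Print the identity and a few capabilities relevant to this document. */
void wdt_print_info(const struct watchdog_info *info)
{
    /* identity is a fixed-size array and may not be NUL-terminated. */
    printf("identity:   %.*s\n", (int)sizeof(info->identity),
           (const char *)info->identity);
    printf("settimeout: %s\n",
           wdt_supports(info, WDIOF_SETTIMEOUT) ? "yes" : "no");
    printf("magicclose: %s\n",
           wdt_supports(info, WDIOF_MAGICCLOSE) ? "yes" : "no");
    printf("keepalive:  %s\n",
           wdt_supports(info, WDIOF_KEEPALIVEPING) ? "yes" : "no");
}
```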

   The NMI sourcing capability is disabled by default due to the inability to
   distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
   Linux kernel. As a result, the hpwdt NMI handler is called each time the NMI
   signal fires, which could amount to several thousand NMIs in a matter of
   seconds. If a user sees the Linux kernel's "dazed and confused" message in
   the logs, or if the system hangs, then
   the hpwdt driver can be reloaded as follows:
  
   1. If the kernel has not been booted with nmi_watchdog turned off, then
      add nmi_watchdog=0 to the end of the currently booting kernel line in
      the boot loader configuration file. Depending on your Linux distribution
      and platform setup, this file is:
      For non-UEFI systems:
         /boot/grub/grub.conf   or
         /boot/grub/menu.lst
      For UEFI systems:
         /boot/efi/EFI/distroname/grub.conf   or
         /boot/efi/efi/distroname/elilo.conf
   2. Reboot the server.
   3. Once the system comes up, unload the driver: modprobe -r hpwdt
   4. Reload the driver with modprobe hpwdt, or load the module file directly:
      insmod /lib/modules/`uname -r`/kernel/drivers/watchdog/hpwdt.ko
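
   Step 1 applies only if the running kernel was not booted with
   nmi_watchdog=0. One way to check is to look for that exact token in
   /proc/cmdline. The sketch below does so; cmdline_has_token and
   booted_with_nmi_watchdog_off are hypothetical helpers, and they match only
   the literal token nmi_watchdog=0, not other spellings of the option.

```c
/* Hypothetical helpers for checking the kernel command line. */
#include <stdio.h>
#include <string.h>

/* Return 1 if `token` occurs in `cmdline` as a whole whitespace-separated
 * word, 0 otherwise. "nmi_watchdog=01" does not match "nmi_watchdog=0". */
int cmdline_has_token(const char *cmdline, const char *token)
{
    size_t len = strlen(token);
    const char *p = cmdline;

    if (len == 0)
        return 0;
    while ((p = strstr(p, token)) != NULL) {
        int starts = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');

        if (starts && ends)
            return 1;
        p++; /* partial match only; keep searching after it */
    }
    return 0;
}

/* Check the running kernel's command line for nmi_watchdog=0.
 * Returns 1 if present, 0 if absent, -1 if /proc/cmdline is unreadable.
 * Command lines longer than the buffer are truncated by fgets(). */
int booted_with_nmi_watchdog_off(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cmdline_has_token(buf, "nmi_watchdog=0");
}
```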
  
   Now the hpwdt driver can receive and source the NMI and provide a log
   message that details the reason for the NMI (as determined by the HPE BIOS).

   Below is a list of NMIs the HPE BIOS understands along with the associated
   code (reason):
  
	NMI source                     Code
	--------------------------     ----
	No source found                00h
	Uncorrectable Memory Error     01h
	ASR NMI                        1Bh
	PCI Parity Error               20h
	NMI Button Press               27h
	SB_BUS_NMI                     28h
	ILO Doorbell NMI               29h
	ILO IOP NMI                    2Ah
	ILO Watchdog NMI               2Bh
	Proc Throt NMI                 2Ch
	Front Side Bus NMI             2Dh
	PCI Express Error              2Fh
	DMA controller NMI             30h
	Hypertransport/CSI Error       31h
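
   When a reason code from this table needs to be decoded, for example in a
   log-processing tool, a simple lookup table is enough. This is an
   illustrative sketch built only from the table above: nmi_reason_name is a
   hypothetical helper, not a function exported by the hpwdt driver, and
   codes missing from the table map to a generic string.

```c
/* Hypothetical lookup built from the HPE BIOS NMI reason table above. */
#include <stddef.h>

struct nmi_reason {
    unsigned char code;
    const char *name;
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Map a reason code to its description; unknown codes get a fallback. */
const char *nmi_reason_name(unsigned char code)
{
    for (size_t i = 0; i < sizeof(nmi_reasons) / sizeof(nmi_reasons[0]); i++)
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].name;
    return "Unknown NMI source";
}
```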
  
  
  
   -- Tom Mingarelli