Monitoring the VMware ESXi with Nagios (2)

In Monitoring the VMware ESXi with Nagios (1), I installed VMware-vSphere-SDK-for-Perl-4.0.0-161974.i386.tar.gz and check_vmware_api.pl, but the Nagios::Plugin Perl module was not installed, so this post covers installing it.

How do I use the Nagios::Plugin perl module?
http://nagiosplugins.org/faq/development/nagios-plugin-perl

There are two ways to install it:

■from the nagiosplug tarball with an extra configure option
■from CPAN

I am not very familiar with CPAN (^^;, so I will install from the tarball.

Official Nagios Plugins
http://www.nagios.org/download/plugins

Extract nagios-plugins-1.4.15.tar.gz

[root@host1 src]# tar xvfz nagios-plugins-1.4.15.tar.gz
[root@host1 src]# cd nagios-plugins-1.4.15

configure

Run configure with the --enable-perl-modules option.

[root@host1 nagios-plugins-1.4.15]# ./configure --enable-perl-modules
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
(snip)
               --with-trusted-path: /bin:/sbin:/usr/bin:/usr/sbin
                   --enable-libtap: no

make all

[root@host1 nagios-plugins-1.4.15]# make all
make  all-recursive
make[1]: Entering directory `/usr/local/src/nagios-plugins-1.4.15'
Making all in gl
make[2]: Entering directory `/usr/local/src/nagios-plugins-1.4.15/gl'
(snip)
make[2]: Leaving directory `/usr/local/src/nagios-plugins-1.4.15'
make[1]: Leaving directory `/usr/local/src/nagios-plugins-1.4.15'

make install

The Nagios::Plugin Perl module is installed under /usr/local/nagios/perl.

[root@host1 nagios-plugins-1.4.15]# make install
Making install in gl
make[1]: Entering directory `/usr/local/src/nagios-plugins-1.4.15/gl'
make  install-recursive
make[2]: Entering directory `/usr/local/src/nagios-plugins-1.4.15/gl'
(snip)
make[2]: Leaving directory `/usr/local/src/nagios-plugins-1.4.15'
make[1]: Leaving directory `/usr/local/src/nagios-plugins-1.4.15'
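
Before running the plugin, it may be worth confirming that the module files actually landed where expected. The following is a minimal sketch, assuming the default /usr/local/nagios prefix and a perl/lib subdirectory (the directory used later in this post); the function name has_nagios_plugin is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if Nagios/Plugin/Functions.pm exists
# under the given Perl library directory.
has_nagios_plugin() {
    [ -f "$1/Nagios/Plugin/Functions.pm" ]
}

# Assumed install location (default prefix plus perl/lib).
PERL_LIB_DIR=/usr/local/nagios/perl/lib

if has_nagios_plugin "$PERL_LIB_DIR"; then
    echo "Nagios::Plugin found under $PERL_LIB_DIR"
else
    echo "Nagios::Plugin not found under $PERL_LIB_DIR"
fi
```

Note that the file being present only shows that the install step worked; Perl still has to be told to look in that directory.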

Running check_vmware_api.pl

Now that Nagios::Plugin is installed, I ran check_vmware_api.pl again, but it still fails with the same "Can't locate Nagios/Plugin/Functions.pm" error.

[root@host1 ~]# /usr/local/src/libexec/check_vmware_api.pl
Can't locate Nagios/Plugin/Functions.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 .) at libexec/check_vmware_api.pl line 31.
BEGIN failed--compilation aborted at libexec/check_vmware_api.pl line 31.

Modifying check_vmware_api.pl

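After some digging, it turned out that the module directory simply was not on Perl's search path. Adding it with export PERL5LIB=XXXXX would probably work, but as a quick fix I edited check_vmware_api.pl directly.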

Basic – Perl5lib
http://www.perlmonks.org/?node_id=867860

[root@host1 nagios]# diff -c ~root/op5plugins/check_vmware_api.pl /usr/local/nagios/libexec/check_vmware_api.pl
*** /root/op5plugins/check_vmware_api.pl        2013-08-18 14:52:22.000000000 +0900
--- /usr/local/nagios/libexec/check_vmware_api.pl       2013-08-20 13:14:32.000000000 +0900
***************
*** 25,30 ****
--- 25,40 ----
  # along with this program.  If not, see <http://www.gnu.org/licenses/>.
  #

+ BEGIN {
+   $my_module_dir = '/usr/local/nagios/perl/lib';
+   push(@INC, ('.',"$my_module_dir"));
+ }
+
+ BEGIN {
+   $my_module_dir = '/usr/local/nagios/perl/lib/i386-linux-thread-multi';
+   push(@INC, ('.',"$my_module_dir"));
+ }
+
  use strict;
  use warnings;
  use vars qw($PROGNAME $VERSION $output $values $result $defperfargs);
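
For comparison, the PERL5LIB route mentioned above would leave the plugin file untouched. This is a minimal sketch, assuming the same two directories that the diff adds to @INC; it was not tried on this host, and the plugin path is only an assumption about where an unmodified copy would live.

```shell
#!/bin/sh
# Directories taken from the diff above.
NAGIOS_PERL_LIB=/usr/local/nagios/perl/lib
NAGIOS_PERL_ARCH=$NAGIOS_PERL_LIB/i386-linux-thread-multi

# Prepend both to PERL5LIB, keeping any value that is already set.
PERL5LIB="$NAGIOS_PERL_LIB:$NAGIOS_PERL_ARCH${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# Run the plugin only if it is present on this machine (assumed path).
PLUGIN=/usr/local/nagios/libexec/check_vmware_api.pl
if [ -x "$PLUGIN" ]; then
    "$PLUGIN" -h
fi
```

One likely drawback is that the variable has to be set in the environment of whatever process runs the check, not just in an interactive shell, which is part of why editing the script is the quicker fix.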

Running check_vmware_api.pl again

The error is gone and the usage message is displayed (^_^V

[root@host1 ~]# /usr/local/nagios/libexec/check_vmware_api.pl
Usage: check_vmware_api.pl -D <data_center> | -H <host_name> [ -C <cluster_name> ] [ -N <vm_name> ]
    -u <user> -p <pass> | -f <authfile>
    -l <command> [ -s <subcommand> ] [ -T <timeshift> ] [ -i <interval> ]
    [ -x <black_list> ] [ -o <additional_options> ]
    [ -t <timeout> ] [ -w <warn_range> ] [ -c <crit_range> ]
    [ -V ] [ -h ]

Checking VMware ESXi

I ran check_vmware_api.pl against the VMware ESXi 5.1U1 host at my office.
CPU and memory information was retrieved successfully, so next time I will actually integrate the plugin into Nagios.

CPU information

[root@host1 ~]# /usr/local/nagios/libexec/check_vmware_api.pl -H 192.168.1.10 -u root -p password -l cpu
CHECK_VMWARE_API.PL OK - cpu usage=798.00 MHz (7.50%) | cpu_usagemhz=798.00Mhz;; cpu_usage=7.50%;;
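
The part after the | character is Nagios performance data, written as label=value followed by an optional unit and semicolon-separated thresholds. As a small sketch of how a single value can be picked out of such a line, the function below (perf_value is a hypothetical name) takes the plugin output and a label; the sample line is the output shown above. It assumes labels without spaces, which holds for this output.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of one perfdata label.
# $1 = full plugin output line, $2 = label (for example cpu_usage)
perf_value() {
    printf '%s\n' "${1#*|}" |   # keep only the text after the first '|'
        tr ' ' '\n' |           # put each label=value entry on its own line
        sed -n "s/^$2=\([0-9.]*\).*/\1/p"
}

OUTPUT='CHECK_VMWARE_API.PL OK - cpu usage=798.00 MHz (7.50%) | cpu_usagemhz=798.00Mhz;; cpu_usage=7.50%;;'

perf_value "$OUTPUT" cpu_usage      # prints 7.50
perf_value "$OUTPUT" cpu_usagemhz   # prints 798.00
```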

Memory information

[root@host1 ~]# /usr/local/nagios/libexec/check_vmware_api.pl -H 192.168.1.10 -u root -p password -l mem
CHECK_VMWARE_API.PL OK - mem usage=6352.55 MB (78.79%), overhead=658.84 MB, swapped=0.00 MB, memctl=0.00 MB | mem_usagemb=6352.55MB;; mem_usage=78.79%;; mem_overhead=658.84MB;; mem_swap=0.00MB;; mem_memctl=0.00MB;;
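
Both commands above pass the password on the command line. The plugin also accepts -f with an authentication file whose syntax is username=<login> and password=<password>, as its help text describes. The sketch below assumes a file location of /usr/local/nagios/etc/vmware.auth, which is made up for illustration, and reuses the placeholder credentials from the commands above; write_authfile is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write an authfile in the syntax the plugin documents.
# $1 = file path, $2 = user name, $3 = password
write_authfile() {
    (
        umask 077   # new file is readable and writable by its owner only
        printf 'username=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
}

AUTHFILE=/usr/local/nagios/etc/vmware.auth   # assumed path
PLUGIN=/usr/local/nagios/libexec/check_vmware_api.pl

# Only touch the system if the plugin is actually installed here.
if [ -x "$PLUGIN" ]; then
    write_authfile "$AUTHFILE" root password
    "$PLUGIN" -H 192.168.1.10 -f "$AUTHFILE" -l cpu
fi
```

If Nagios itself runs the check later, the file would presumably need to be readable by the user Nagios runs as.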

The help output shows just how feature-rich the plugin is.

[root@host1 ~]# /usr/local/nagios/libexec/check_vmware_api.pl -h
check_vmware_api.pl 0.7.0

This nagios plugin is free software, and comes with ABSOLUTELY NO WARRANTY.
It may be used, redistributed and/or modified under the terms of the GNU
General Public Licence (see http://www.fsf.org/licensing/licenses/gpl.txt).

VMWare Infrastructure plugin

Usage: check_vmware_api.pl -D <data_center> | -H <host_name> [ -C <cluster_name> ] [ -N <vm_name> ]
    -u <user> -p <pass> | -f <authfile>
    -l <command> [ -s <subcommand> ] [ -T <timeshift> ] [ -i <interval> ]
    [ -x <black_list> ] [ -o <additional_options> ]
    [ -t <timeout> ] [ -w <warn_range> ] [ -c <crit_range> ]
    [ -V ] [ -h ]

 -?, --usage
   Print usage information
 -h, --help
   Print detailed help screen
 -V, --version
   Print version information
 --extra-opts=[section][@file]
   Read options from an ini file. See http://nagiosplugins.org/extra-opts for usage
 -H, --host=<hostname>
   ESX or ESXi hostname.
 -C, --cluster=<clustername>
   ESX or ESXi clustername.
 -D, --datacenter=<DCname>
   Datacenter hostname.
 -N, --name=<vmname>
   Virtual machine name.
 -u, --username=<username>
   Username to connect with.
 -p, --password=<password>
   Password to use with the username.
 -f, --authfile=<path>
   Authentication file with login and password. File syntax :
   username=<login>
   password=<password>
 -w, --warning=THRESHOLD
   Warning threshold. See
   http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
   for the threshold format.
 -c, --critical=THRESHOLD
   Critical threshold. See
   http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
   for the threshold format.
 -l, --command=COMMAND
   Specify command type (CPU, MEM, NET, IO, VMFS, RUNTIME, ...)
 -s, --subcommand=SUBCOMMAND
   Specify subcommand
 -S, --sessionfile=SESSIONFILE
   Specify a filename to store sessions for faster authentication
 -x, --exclude=<black_list>
   Specify black list
 -o, --options=<additional_options>
   Specify additional command options (quickstats, ...)
 -T, --timestamp=<timeshift>
   Timeshift in seconds that could fix issues with "Unknown error". Use values like 5, 10, 20, etc
 -i, --interval=<sampling period>
   Sampling Period in seconds. Basic historic intervals: 300, 1800, 7200 or 86400. See config for any changes.
   Supports literval values to autonegotiate interval value: r - realtime interval, h<number> - historical interval specified by position.
   Default value is 20 (realtime). Since cluster does not have realtime stats interval other than 20(default realtime) is mandatory.
 -M, --maxsamples=<max sample count>
   Maximum number of samples to retrieve. Max sample number is ignored for historic intervals.
   Default value is 1 (latest available sample).
 --trace=<level>
   Set verbosity level of vSphere API request/respond trace
 -t, --timeout=INTEGER
   Seconds before plugin times out (default: 30)
 -v, --verbose
   Show details for command-line debugging (can repeat up to 3 times)
Supported commands(^ - blank or not specified parameter, o - options, T - timeshift value, x - blacklist) :
    VM specific :
        * cpu - shows cpu info
            + usage - CPU usage in percentage
            + usagemhz - CPU usage in MHz
            + wait - CPU wait time in ms
            + ready - CPU ready time in ms
            ^ all cpu info(no thresholds)
        * mem - shows mem info
            + usage - mem usage in percentage
            + usagemb - mem usage in MB
            + swap - swap mem usage in MB
            + swapin - swapin mem usage in MB
            + swapout - swapout mem usage in MB
            + overhead - additional mem used by VM Server in MB
            + overall - overall mem used by VM Server in MB
            + active - active mem usage in MB
            + memctl - mem used by VM memory control driver(vmmemctl) that controls ballooning
            ^ all mem info(except overall and no thresholds)
        * net - shows net info
            + usage - overall network usage in KBps(Kilobytes per Second)
            + receive - receive in KBps(Kilobytes per Second)
            + send - send in KBps(Kilobytes per Second)
            ^ all net info(except usage and no thresholds)
        * io - shows disk I/O info
            + usage - overall disk usage in MB/s
            + read - read latency in ms (totalReadLatency.average)
            + write - write latency in ms (totalWriteLatency.average)
            ^ all disk io info(no thresholds)
        * runtime - shows runtime info
            + con - connection state
            + cpu - allocated CPU in MHz
            + mem - allocated mem in MB
            + state - virtual machine state (UP, DOWN, SUSPENDED)
            + status - overall object status (gray/green/red/yellow)
            + consoleconnections - console connections to VM
            + guest - guest OS status, needs VMware Tools
            + tools - VMWare Tools status
            + issues - all issues for the host
            ^ all runtime info(except con and no thresholds)
    Host specific :
        * cpu - shows cpu info
            + usage - CPU usage in percentage
                o quickstats - switch for query either PerfCounter values or Runtime info
            + usagemhz - CPU usage in MHz
                o quickstats - switch for query either PerfCounter values or Runtime info
            ^ all cpu info
                o quickstats - switch for query either PerfCounter values or Runtime info
        * mem - shows mem info
            + usage - mem usage in percentage
                o quickstats - switch for query either PerfCounter values or Runtime info
            + usagemb - mem usage in MB
                o quickstats - switch for query either PerfCounter values or Runtime info
            + swap - swap mem usage in MB
                o listvm - turn on/off output list of swapping VM's
            + overhead - additional mem used by VM Server in MB
            + overall - overall mem used by VM Server in MB
            + memctl - mem used by VM memory control driver(vmmemctl) that controls ballooning
                o listvm - turn on/off output list of ballooning VM's
            ^ all mem info(except overall and no thresholds)
        * net - shows net info
            + usage - overall network usage in KBps(Kilobytes per Second)
            + receive - receive in KBps(Kilobytes per Second)
            + send - send in KBps(Kilobytes per Second)
            + nic - makes sure all active NICs are plugged in
            ^ all net info(except usage and no thresholds)
        * io - shows disk io info
            + aborted - aborted commands count
            + resets - bus resets count
            + read - read latency in ms (totalReadLatency.average)
            + write - write latency in ms (totalWriteLatency.average)
            + kernel - kernel latency in ms
            + device - device latency in ms
            + queue - queue latency in ms
            ^ all disk io info
        * vmfs - shows Datastore info
            + (name) - free space info for datastore with name (name)
                o used - output used space instead of free
                o breif - list only alerting volumes
                o regexp - whether to treat name as regexp
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh
            ^ all datastore info
                o used - output used space instead of free
                o breif - list only alerting volumes
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh
        * runtime - shows runtime info
            + con - connection state
            + health - checks cpu/storage/memory/sensor status
                o listitems - list all available sensors(use for listing purpose only)
                o blackregexpflag - whether to treat blacklist as regexp
                x - blacklist status objects
            + storagehealth - storage status check
                o blackregexpflag - whether to treat blacklist as regexp
                x - blacklist status objects
            + temperature - temperature sensors
                o blackregexpflag - whether to treat blacklist as regexp
                x - blacklist status objects
            + sensor - threshold specified sensor
            + maintenance - shows whether host is in maintenance mode
            + list(vm) - list of VMWare machines and their statuses
            + status - overall object status (gray/green/red/yellow)
            + issues - all issues for the host
                x - blacklist issues
            ^ all runtime info(health, storagehealth, temperature and sensor are represented as one value and no thresholds)
        * service - shows Host service info
            + (names) - check the state of one or several services specified by (names), syntax for (names):<service1>,<service2>,...,<serviceN>
            ^ show all services
        * storage - shows Host storage info
            + adapter - list bus adapters
                x - blacklist adapters
            + lun - list SCSI logical units
                x - blacklist LUN's
            + path - list logical unit paths
                x - blacklist paths
            ^ show all storage info
        * uptime - shows Host uptime
                o quickstats - switch for query either PerfCounter values or Runtime info
        * device - shows Host specific device info
            + cd/dvd - list vm's with attached cd/dvd drives
                o listall - list all available devices(use for listing purpose only)
    DC specific :
        * cpu - shows cpu info
            + usage - CPU usage in percentage
                o quickstats - switch for query either PerfCounter values or Runtime info
            + usagemhz - CPU usage in MHz
                o quickstats - switch for query either PerfCounter values or Runtime info
            ^ all cpu info
                o quickstats - switch for query either PerfCounter values or Runtime info
        * mem - shows mem info
            + usage - mem usage in percentage
                o quickstats - switch for query either PerfCounter values or Runtime info
            + usagemb - mem usage in MB
                o quickstats - switch for query either PerfCounter values or Runtime info
            + swap - swap mem usage in MB
            + overhead - additional mem used by VM Server in MB
            + overall - overall mem used by VM Server in MB
            + memctl - mem used by VM memory control driver(vmmemctl) that controls ballooning
            ^ all mem info(except overall and no thresholds)
        * net - shows net info
            + usage - overall network usage in KBps(Kilobytes per Second)
            + receive - receive in KBps(Kilobytes per Second)
            + send - send in KBps(Kilobytes per Second)
            ^ all net info(except usage and no thresholds)
        * io - shows disk io info
            + aborted - aborted commands count
            + resets - bus resets count
            + read - read latency in ms (totalReadLatency.average)
            + write - write latency in ms (totalWriteLatency.average)
            + kernel - kernel latency in ms
            + device - device latency in ms
            + queue - queue latency in ms
            ^ all disk io info
        * vmfs - shows Datastore info
            + (name) - free space info for datastore with name (name)
                o used - output used space instead of free
                o breif - list only alerting volumes
                o regexp - whether to treat name as regexp
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh
            ^ all datastore info
                o used - output used space instead of free
                o breif - list only alerting volumes
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh
        * runtime - shows runtime info
            + list(vm) - list of VMWare machines and their statuses
            + listhost - list of VMWare esx host servers and their statuses
            + listcluster - list of VMWare clusters and their statuses
            + tools - VMWare Tools status
                x - blacklist VM's
            + status - overall object status (gray/green/red/yellow)
            + issues - all issues for the host
                x - blacklist issues
            ^ all runtime info(except cluster and tools and no thresholds)
        * recommendations - shows recommendations for cluster
            + (name) - recommendations for cluster with name (name)
            ^ all clusters recommendations
    Cluster specific :
        * cpu - shows cpu info
            + usage - CPU usage in percentage
            + usagemhz - CPU usage in MHz
            ^ all cpu info
        * mem - shows mem info
            + usage - mem usage in percentage
            + usagemb - mem usage in MB
            + swap - swap mem usage in MB
                o listvm - turn on/off output list of swapping VM's
            + memctl - mem used by VM memory control driver(vmmemctl) that controls ballooning
                o listvm - turn on/off output list of ballooning VM's
            ^ all mem info(plus overhead and no thresholds)
        * cluster - shows cluster services info
            + effectivecpu - total available cpu resources of all hosts within cluster
            + effectivemem - total amount of machine memory of all hosts in the cluster
            + failover - VMWare HA number of failures that can be tolerated
            + cpufainess - fairness of distributed cpu resource allocation
            + memfainess - fairness of distributed mem resource allocation
            ^ only effectivecpu and effectivemem values for cluster services
        * runtime - shows runtime info
            + list(vm) - list of VMWare machines in cluster and their statuses
            + listhost - list of VMWare esx host servers in cluster and their statuses
            + status - overall cluster status (gray/green/red/yellow)
            + issues - all issues for the cluster
                x - blacklist issues
            ^ all cluster runtime info
        * vmfs - shows Datastore info
            + (name) - free space info for datastore with name (name)
                o used - output used space instead of free
                o breif - list only alerting volumes
                o regexp - whether to treat name as regexp
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh
            ^ all datastore info
                o used - output used space instead of free
                o breif - list only alerting volumes
                o blacklistregexp - whether to treat blacklist as regexp
                x - blacklist VMFS's
                T (value) - timeshift to detemine if we need to refresh

Copyright (c) 2008-2013 op5 AB
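
As the help shows, thresholds are given with -w and -c, and a specific value is selected with -s. Nagios plugins conventionally report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below uses thresholds of 80 and 90 percent, chosen only for illustration, and a hypothetical helper named status_name to turn the exit code into text; it was not run against the host in this post.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios plugin exit code to its status name.
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

PLUGIN=/usr/local/nagios/libexec/check_vmware_api.pl
if [ -x "$PLUGIN" ]; then
    # Host CPU usage in percent; warn at 80, critical at 90 (example values).
    "$PLUGIN" -H 192.168.1.10 -u root -p password -l cpu -s usage -w 80 -c 90
    rc=$?
    echo "status: $(status_name "$rc")"
fi
```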


+ write - write latency in ms (totalWriteLatency.average)

+ kernel - kernel latency in ms

+ device - device latency in ms

+ queue - queue latency in ms

^ all disk io info

* vmfs - shows Datastore info

+ (name) - free space info for datastore with name (name)

o used - output used space instead of free

o breif - list only alerting volumes

o regexp - whether to treat name as regexp

o blacklistregexp - whether to treat blacklist as regexp

x - blacklist VMFS's

T (value) - timeshift to determine if we need to refresh

^ all datastore info

o used - output used space instead of free

o breif - list only alerting volumes

o blacklistregexp - whether to treat blacklist as regexp

x - blacklist VMFS's

T (value) - timeshift to determine if we need to refresh

* runtime - shows runtime info

+ list(vm) - list of VMWare machines and their statuses

+ listhost - list of VMWare esx host servers and their statuses

+ listcluster - list of VMWare clusters and their statuses

+ tools - VMWare Tools status

x - blacklist VM's

+ status - overall object status (gray/green/red/yellow)

+ issues - all issues for the host

x - blacklist issues

^ all runtime info(except cluster and tools and no thresholds)

* recommendations - shows recommendations for cluster

+ (name) - recommendations for cluster with name (name)

^ all clusters recommendations

Cluster specific :

* cpu - shows cpu info

+ usage - CPU usage in percentage

+ usagemhz - CPU usage in MHz

^ all cpu info

* mem - shows mem info

+ usage - mem usage in percentage

+ usagemb - mem usage in MB

+ swap - swap mem usage in MB

o listvm - turn on/off output list of swapping VM's

+ memctl - mem used by VM memory control driver(vmmemctl) that controls ballooning

o listvm - turn on/off output list of ballooning VM's

^ all mem info(plus overhead and no thresholds)

* cluster - shows cluster services info

+ effectivecpu - total available cpu resources of all hosts within cluster

+ effectivemem - total amount of machine memory of all hosts in the cluster

+ failover - VMWare HA number of failures that can be tolerated

+ cpufainess - fairness of distributed cpu resource allocation

+ memfainess - fairness of distributed mem resource allocation

^ only effectivecpu and effectivemem values for cluster services

* runtime - shows runtime info

+ list(vm) - list of VMWare machines in cluster and their statuses

+ listhost - list of VMWare esx host servers in cluster and their statuses

+ status - overall cluster status (gray/green/red/yellow)

+ issues - all issues for the cluster

x - blacklist issues

^ all cluster runtime info

* vmfs - shows Datastore info

+ (name) - free space info for datastore with name (name)

o used - output used space instead of free

o breif - list only alerting volumes

o regexp - whether to treat name as regexp

o blacklistregexp - whether to treat blacklist as regexp

x - blacklist VMFS's

T (value) - timeshift to determine if we need to refresh

^ all datastore info

o used - output used space instead of free

o breif - list only alerting volumes

o blacklistregexp - whether to treat blacklist as regexp

x - blacklist VMFS's

T (value) - timeshift to determine if we need to refresh
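
The listing above is easier to use once the markers are mapped to command-line flags. As far as I can tell, `*` is a command (`-l`), `+` is a subcommand (`-s`), `o` is an additional option (`-o`), `x` is a blacklist (`-x`), and `^` means the subcommand is left out. The following is only a dry-run sketch under that reading: it prints the command line and never contacts a host. The plugin path, host name, auth file, datastore name, and service names are hypothetical, and the flag letters should be checked against the usage line of your own copy of the plugin. Note that keywords such as `breif` are spelled exactly as the plugin prints them.

```shell
#!/bin/sh
# Dry-run sketch: prints a check_vmware_api.pl command line, does not run it.
# Assumptions (not taken from the listing): the plugin path, host name,
# auth file path, and the flag letters -H / -f / -l / -s / -o / -x.
PLUGIN=/usr/local/nagios/libexec/check_vmware_api.pl   # hypothetical path
ESX_HOST=esxi1.example.com                             # hypothetical host
AUTHFILE=/usr/local/nagios/etc/vmware.auth             # hypothetical auth file

# build_check COMMAND [SUBCOMMAND] [OPTIONS] [BLACKLIST]
# Empty arguments are skipped; an empty SUBCOMMAND corresponds to the "^" rows.
build_check() {
    cmd="$PLUGIN -H $ESX_HOST -f $AUTHFILE -l $1"
    if [ -n "${2:-}" ]; then cmd="$cmd -s $2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd -o $3"; fi
    if [ -n "${4:-}" ]; then cmd="$cmd -x $4"; fi
    echo "$cmd"
}

# "* vmfs" / "+ (name)" / "o used", "o breif": one datastore, used space, alerts only
build_check vmfs datastore1 used,breif

# "* vmfs" / "^": all datastores, except a blacklisted one
build_check vmfs "" breif scratch

# "* service" / "+ (names)": comma-separated service names
build_check service ntpd,sshd
```

Printing the command first makes it easy to paste the result into a Nagios command definition once it looks right. Thresholds are left out here on purpose, since the listing marks several rows as having no thresholds.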

RootLinks Co., Ltd.
