1. Check the status of system services:
[root@vm1 ebao_uat2]# service --status-all
crond (pid 9701) is running...
Chain INPUT (policy ACCEPT 9169 packets, 4606K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 10249 packets, 7082K bytes)
pkts bytes target prot opt in out source destination
Incoming and outgoing ports blocked by default.
Enabled services: CIMSLP VCB swISCSIClient CIMHttpsServer vpxHeartbeats LicenseClient sshServer
Opened ports:
gpm (pid 9623) is running...
ipmi_msghandler module loaded.
ipmi_si_drv module loaded.
ipmi_devintf module loaded.
/dev/ipmi0 exists.
Table: filter
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
irqbalance is stopped
iSCSI driver is not loaded
Usage: /etc/init.d/megaraid_sas_ioctl {start|stop|restart}
vmware-hostd (pid 7750) is running...
/etc/init.d/microcode_ctl: reading microcode status is not yet supported
Usage: /etc/init.d/mptctlnode {start|stop|restart}
Configured devices:
lo eth1 eth2 vswif0
Currently active devices:
lo vmnic1 vmnic0 vmnic4 vmnic3 vswif0
rpc.mountd is stopped
nfsd is stopped
rpc.statd is stopped
nscd is stopped
ntpd is stopped
cimserver (pid 9748) is running...
portmap is stopped
The random data source exists
rdisc is stopped
saslauthd is stopped
smartd is stopped
snmpd is stopped
snmptrapd is stopped
sshd (pid 1967 1923) is running...
syslogd (pid 1717) is running...
klogd (pid 1721) is running...
Usage: (halt|reboot|start) {start}
The vmnixmod kernel module is loaded.
The VMkernel is loaded.
vmware-aam is not running [ OK ]
Usage: vmware-autostart {start|stop|restart}
none
VMware VMkernel authorization daemon is running (pid 9712).
vmware-vpxa is running
webAccess (pid 9650) is running...
winbindd is stopped
openwsmand (pid 9788) is running...
xinetd (pid 9614) is running...
ypbind is stopped
Nightly yum update is disabled.
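The listing above is long, so it can help to reduce it to just the services that are running. The following is a minimal sketch, not part of the original procedure; the function name running_services is hypothetical, and it assumes the output keeps the "is running" wording shown above.

```shell
# Print the names of services reported as running.
# Reads `service --status-all` output on standard input.
running_services() {
    sed -n \
        -e '/ is not running/d' \
        -e 's/ (pid [0-9 ]*) is running.*$//p' \
        -e 's/ is running.*$//p'
    # 1st rule: drop negatives such as "vmware-aam is not running".
    # 2nd rule: "sshd (pid 1967 1923) is running..." -> "sshd".
    # 3rd rule: "vmware-vpxa is running" -> "vmware-vpxa".
}
```

Typical use would be: service --status-all 2>&1 | running_services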
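A. Check software compatibility
The vCenter Server version must be equal to or higher than the version of the ESX hosts it manages;
The vSphere Client version must be equal to or higher than the version of the vCenter Server it connects to;
The vSphere Client version must be equal to or higher than the version of the ESX host it connects to;
When Update Manager is used, vCenter, vSphere Client, and Update Manager must all be at the same version.
For the compatibility of other VI components, see the product documentation and the following compatibility guides:
vSphere Compatibility Matrix
http://www.vmware.com/resources/compatibility/docs/vSphere_Comp_Matrix.pdf
VMware Infrastructure Compatibility Matrixes
http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_compat_matrix.pdf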
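The "equal to or higher" rules above can be checked mechanically once the version numbers have been collected. This is a rough sketch under stated assumptions: version_ge is a hypothetical helper, it relies on GNU sort with the -V option (which an old service console may not have, so it is better run on an admin workstation), and the version strings in the usage line are made-up examples. The compatibility matrixes remain the authoritative source.

```shell
# Return success (0) if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings numerically per component;
    # if $2 comes out first (or both are equal), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: version_ge "4.0.0" "3.5.0" && echo "vCenter version is high enough for this host"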
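B. Check the network:
1. Test network connectivity with ping. If ping fails, check:
the VLANs that vCenter and the ESX host are on
the gateway and network configuration of vCenter and the ESX host
the physical links
2. Check the virtual switch configuration
esxcfg-vswitch -l
esxcfg-vswif -l
esxcfg-nics -l
3. Check which ports are open on the server
View the ports opened in the ESX firewall: esxcfg-firewall -q
View port usage on the server: netstat -na
4. Check the QoS configuration on the physical switch
5. Check whether traffic from vCenter to the ESX host passes through NAT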
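For the port check in step 3, the netstat output can be filtered for a single listening TCP port. The sketch below is illustrative only: port_listening is a hypothetical helper, it assumes the Linux-style "netstat -na" layout (local address in the fourth column, state in the last), and the port number in the usage line is just an example value.

```shell
# Succeed if a TCP socket is listening on the given port.
# Reads `netstat -na` output on standard input.
port_listening() {
    awk -v port="$1" '
        # Match e.g. "tcp 0 0 0.0.0.0:902 0.0.0.0:* LISTEN"
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For example: netstat -na | port_listening 902 && echo "port is listening"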
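C. Check storage status, and whether ESX boots from SAN
Check the storage connections
vSphere Client - Configuration - Storage Adapters
vSphere Client - Configuration - Storage
Run "esxcfg-mpath -l" to list the connected storage
Run "esxcfg-scsidevs -l" to list the SCSI devices the host has recognized
Confirm storage space
On the ESX host, run "vdf -h" to check the usage of each partition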
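To spot nearly full partitions quickly, the usage report can be filtered by percentage. This is a sketch under an assumption: that "vdf -h" prints df-style rows with a "Use%" value and the mount point as the last field. The helper name full_partitions and the default threshold of 90 are hypothetical choices, not values from the original text.

```shell
# Print "<mount point> <use%>" for every row at or above the threshold.
# Reads df-style output on standard input; threshold defaults to 90.
full_partitions() {
    awk -v limit="${1:-90}" '
        {
            # Look for the percentage field anywhere in the row, so rows
            # whose device name wrapped onto its own line still work.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= limit + 0) print $NF, $i
                }
            }
        }
    '
}
```

For example: vdf -h | full_partitions 85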
Service console file system in vSphere 4
View the virtual disk that holds the Service Console
# vsd -l
vsa0:0:0 /dev/sda
Find the path of the Service Console VMDK
# vsd -g
/vmfs/volumesesxconsole.vmdk
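D. Check the vCenter and ESX configuration:
1. The system configuration of vCenter and ESX affects whether the programs run correctly
Check that the default record in /etc/hosts is present (the built-in 127.0.0.1 record in the hosts file must not be deleted)
127.0.0.1 localhost.localdomain localhost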
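The hosts-file check above can be scripted. The following is a minimal sketch; has_localhost_record is a hypothetical helper name, and it only verifies that an uncommented 127.0.0.1 line maps to the name localhost.

```shell
# Succeed if the hosts file has an active "127.0.0.1 ... localhost" record.
# Defaults to /etc/hosts; another path can be passed as the first argument.
has_localhost_record() {
    awk '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # rest of the line is a comment
                if ($i == "localhost") found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "${1:-/etc/hosts}"
}
```

For example: has_localhost_record || echo "127.0.0.1 localhost record is missing from /etc/hosts"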
2. Check whether the hostd and vpxa processes are running
# ps -ef | grep hostd
# ps -ef | grep vpxa
(If hostd and vpxa are not running, they can be started directly with the following commands:
Start hostd:
# /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/vmware-hostd-support /usr/sbin/vmware-hostd /etc/vmware/hostd/config.xml -u
Start vpxa:
# /bin/sh /opt/vmware/vpxa/bin/vmware-watchdog -s vpxa -u 30 -q 5 /opt/vmware/vpxa/sbin/vpxa)
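One pitfall with "ps -ef | grep NAME" is that the grep process itself appears in the results, so the output is never empty. The sketch below filters that out; proc_running is a hypothetical helper, and it assumes the usual "ps -ef" layout in which the command starts at the eighth column.

```shell
# Succeed if some process other than grep/awk has NAME in its command line.
# Reads `ps -ef` output on standard input.
proc_running() {
    awk -v name="$1" '
        # Skip the filtering tools themselves (column 8 is the executable).
        $8 !~ /(^|\/)(grep|awk)$/ {
            cmd = ""
            for (i = 8; i <= NF; i++) cmd = cmd " " $i
            if (index(cmd, name)) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

For example: ps -ef | proc_running vpxa || echo "vpxa is not running"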
3. Restart hostd and vpxa and check for errors
# service mgmt-vmware restart
# service vmware-vpxa restart
(If the restart prints output like the following, the services restarted normally:
[root@vsphere1 ~]# service mgmt-vmware restart
Stopping VMware ESX Management services:
VMware ESX Host Agent Watchdog [ OK ]
VMware ESX Host Agent [ OK ]
Starting VMware ESX Management services:
VMware ESX Host Agent (background) [ OK ]
Availability report startup (background) [ OK ]
[root@vsphere1 ~]# service vmware-vpxa restart
Stopping vmware-vpxa: [ OK ]
Starting vmware-vpxa: [ OK ])
The init scripts for both services are in /etc/init.d/. If the services fail to start, see KB: http://kb.vmware.com/kb/1003490
4. Review the vpxa agent configuration file:
File location: /etc/opt/vmware/vpxa/vpxa.cfg
5. Review the hostd configuration file:
File location: /etc/vmware/esx.conf
6. Specify in vCenter the address used to manage ESX
VI Client -> log in to vCenter -> Administration -> vCenter Server Settings -> Runtime Settings -> Managed IP Address -> vCenter Server Managed IP:
7. Check vCenter SSL verification
VI Client -> log in to vCenter -> Administration -> vCenter Server Settings -> SSL Settings -> clear the "vCenter requires verified host SSL certificates" check box.
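In summary: if you have carefully checked and carried out every item above without finding a problem, but the ESX host's status in vCenter is still abnormal, do the following:
1. Connect to vCenter with the VI Client
2. Disable HA and DRS on the cluster
3. Right-click the faulty ESX host -> Disconnect -> right-click it again and remove the host from vCenter
4. Log in to the ESX host as root and do the following
i. Stop the management services
service mgmt-vmware stop
service vmware-vpxa stop
ii. Remove the vpxa agent package
rpm -qa | grep vpxa
rpm -e <result returned by the previous command>
iii. Rename the old vpxa configuration file
Configuration file path on ESX 3.5
mv /etc/opt/vmware/vpxa/vpxa.cfg /etc/opt/vmware/vpxa/vpxa.cfg.old
Configuration file path on ESX 4.0
mv /etc/opt/vmware/vpxa/vpxa.cfg /etc/opt/vmware/vpxa/vpxa.cfg.old
5. Log in to the ESX command-line console:
# service iptables stop
# chkconfig --level 35 iptables off
6. service mgmt-vmware start
7. Add the ESX host back to vCenter
8. Re-enable HA and DRS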
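The host-side part of this procedure (steps 4 to 6) can be collected into one script. What follows is only a sketch under stated assumptions: the names run and reset_vpxa_agent are hypothetical, the package name must be supplied by hand from the output of "rpm -qa | grep vpxa", and dry-run mode is on by default so that nothing is changed until the printed commands have been reviewed. The VI Client steps (1 to 3, 7 and 8) still have to be done manually.

```shell
# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        # Report a failing step but carry on, so that the management
        # service is still started again at the end.
        "$@" || echo "warning: command failed: $*" >&2
    fi
}

# Usage: reset_vpxa_agent <vpxa package name from: rpm -qa | grep vpxa>
reset_vpxa_agent() {
    pkg="$1"
    cfg=/etc/opt/vmware/vpxa/vpxa.cfg

    run service mgmt-vmware stop                # step 4.i
    run service vmware-vpxa stop
    run rpm -e "$pkg"                           # step 4.ii
    run mv "$cfg" "$cfg.old"                    # step 4.iii
    run service iptables stop                   # step 5
    run chkconfig --level 35 iptables off
    run service mgmt-vmware start               # step 6
}
```

Calling reset_vpxa_agent with the package name prints the seven commands in order; setting DRY_RUN=0 beforehand runs them for real.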