
CloudStack HA

sunshout 2012. 3. 16. 16:03
Abstract
- How CloudStack migrates VMs off a hypervisor host that has broken down, traced here on an OVM host where the HA process fails.

1. Detect that the compute node (cnode) is unreachable or broken

2012-03-16 11:44:21,952 ERROR [agent.manager.AgentManagerImpl] (AgentTaskPool-13:null) Host is down: 94-cnode04-m.pod1.kr-0.dmz.xxx.com.  Starting HA on the VMs


2. Call scheduleRestartForVmsOnHost(HostVO, boolean)
   - file: server/src/com/cloud/ha/HighAvailabilityManagerImpl.java

2012-03-16 11:44:21,963 WARN  [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-13:null) Scheduling restart for VMs on host 94


3. Find the VMs on that host
    - send an alert email
    - call scheduleRestart(vm, investigate) for each VM

2012-03-16 11:44:21,987 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-13:null) Notifying HA Mgr of to restart vm 88-i-9-88-VM
2012-03-16 11:44:21,998 INFO  [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-13:null) Schedule vm for HA:  VM[User|i-9-88-VM]
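Steps 2 and 3 can be sketched as follows. This is a minimal, hypothetical model, not CloudStack's code: the Vm and HaWork records, the in-memory alerts and workQueue lists, and the "Scheduled" step name are assumptions made for illustration. Sending a single alert per failed host is also an assumption. Only the method names scheduleRestartForVmsOnHost and scheduleRestart come from the trace above. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of steps 2-3. Every type and field here is a hypothetical
// stand-in, not CloudStack's real HostVO / VMInstanceVO / HaWorkVO code.
final class HostDownHaSketch {

    // A VM and the host it was running on.
    record Vm(long id, String instanceName, long hostId) {}

    // One HA work item per VM, loosely modelled on the "HAWork[...]" log entries.
    record HaWork(long vmId, String type, String step) {}

    final List<String> alerts = new ArrayList<>();
    final List<HaWork> workQueue = new ArrayList<>();

    // Hypothetical counterpart of scheduleRestartForVmsOnHost(HostVO, boolean).
    void scheduleRestartForVmsOnHost(long hostId, List<Vm> allVms, boolean investigate) {
        // Assumption: one alert per failed host; the real alert logic may differ.
        alerts.add("Host is down: " + hostId);
        for (Vm vm : allVms) {
            if (vm.hostId() != hostId) {
                continue; // only VMs that were running on the failed host
            }
            scheduleRestart(vm, investigate);
        }
    }

    // Hypothetical counterpart of scheduleRestart(vm, investigate): queue one work item.
    void scheduleRestart(Vm vm, boolean investigate) {
        String step = investigate ? "Investigating" : "Scheduled";
        workQueue.add(new HaWork(vm.id(), "HA", step));
    }

    public static void main(String[] args) {
        HostDownHaSketch ha = new HostDownHaSketch();
        List<Vm> vms = List.of(
                new Vm(88, "i-9-88-VM", 94),
                new Vm(89, "i-9-89-VM", 95));
        ha.scheduleRestartForVmsOnHost(94, vms, true);
        System.out.println(ha.alerts);
        System.out.println(ha.workQueue);
    }
}
```

With investigate set to true, each work item starts in the Investigating step, which matches the "HAWork[40-HA-88-Running-Investigating]" entry in the next log line.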

 
4. scheduleRestart()
 - create a new HaWorkVO (the HA work item)
 - call wakeupWorkers() to wake up the HA worker threads

5. An HA worker thread runs the work item

2012-03-16 11:44:22,012 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) Processing HAWork[40-HA-88-Running-Investigating]
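Steps 4 and 5 form a producer/consumer pair: scheduleRestart() stores a work item and wakes the workers, and a thread such as HA-Worker-4 picks it up. The sketch below models that with plain wait/notifyAll. HaWorkQueue, schedule(), take(), poll() and startWorker() are hypothetical names, and the in-memory deque stands in for however CloudStack actually persists HaWorkVO items.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Minimal sketch of steps 4-5 as a producer/consumer pair built on
// wait/notifyAll. HaWorkQueue and its methods are hypothetical.
final class HaWorkQueue {

    record HaWork(long id, long vmId, String step) {}

    private final Deque<HaWork> pending = new ArrayDeque<>();
    final List<String> processed = Collections.synchronizedList(new ArrayList<>());

    // Producer side (scheduleRestart): store the work item, then wake the workers.
    synchronized void schedule(HaWork work) {
        pending.addLast(work);
        wakeupWorkers();
    }

    // Must be called while holding this object's monitor.
    private void wakeupWorkers() {
        notifyAll();
    }

    // Consumer side: block until a work item is available.
    synchronized HaWork take() throws InterruptedException {
        while (pending.isEmpty()) {
            wait();
        }
        return pending.removeFirst();
    }

    // Non-blocking variant: returns null when nothing is queued.
    synchronized HaWork poll() {
        return pending.pollFirst();
    }

    // One HA worker thread: take and "process" items until interrupted.
    Thread startWorker(String name) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    HaWork work = take();
                    processed.add(name + ":" + work.id() + "-" + work.step());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        HaWorkQueue queue = new HaWorkQueue();
        Thread worker = queue.startWorker("HA-Worker-4");
        queue.schedule(new HaWork(40, 88, "Investigating"));
        long deadline = System.currentTimeMillis() + 2000;
        while (queue.processed.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(10); // wait for the worker to pick up the item
        }
        worker.interrupt();
        System.out.println(queue.processed);
    }
}
```

The while loop around wait() guards against spurious wakeups, and notifyAll() lets any idle worker claim the new item.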


6. restart()

2012-03-16 11:44:22,027 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) HA on VM[User|i-9-88-VM]

 
 - Find an Investigator that can tell whether the VM is still alive
 - On OVM this fails, because there is no OvmInvestigator implementation

2012-03-16 11:44:22,052 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) SimpleInvestigator found VM[User|i-9-88-VM]to be alive? null

2012-03-16 11:44:22,053 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) VmwareInvestigator found VM[User|i-9-88-VM]to be alive? null
2012-03-16 11:44:22,053 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) XenServerInvestigator found VM[User|i-9-88-VM]to be alive? null
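The log suggests a chain of investigators, each answering "alive?" with true, false or null, where null means it cannot judge this VM. A minimal sketch under that reading follows. The Investigator interface, its isVmAlive(Vm) signature and the TypedInvestigator class are hypothetical simplifications. The investigator names mirror the log, but their logic here is invented: each one answers only for its own hypervisor type.

```java
import java.util.List;

// Minimal sketch of the investigator chain in restart(). The interface is a
// hypothetical simplification: TRUE = alive, FALSE = dead, null = "can't tell".
final class InvestigatorChainSketch {

    record Vm(String name, String hypervisorType) {}

    interface Investigator {
        String name();
        Boolean isVmAlive(Vm vm);
    }

    // Stand-in investigator: answers only for VMs on its own hypervisor type
    // and returns null for everything else.
    record TypedInvestigator(String name, String handles, boolean answer) implements Investigator {
        public Boolean isVmAlive(Vm vm) {
            return vm.hypervisorType().equals(handles) ? answer : null;
        }
    }

    // Ask each investigator in order; the first non-null answer wins.
    // A null result means nobody could tell, so HA falls back to fencing.
    static Boolean investigate(Vm vm, List<Investigator> investigators) {
        for (Investigator inv : investigators) {
            Boolean alive = inv.isVmAlive(vm);
            System.out.println(inv.name() + " found " + vm.name() + " to be alive? " + alive);
            if (alive != null) {
                return alive;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Investigator> chain = List.of(
                new TypedInvestigator("VmwareInvestigator", "VMware", false),
                new TypedInvestigator("XenServerInvestigator", "XenServer", false));
        Boolean result = investigate(new Vm("i-9-88-VM", "Ovm"), chain);
        System.out.println(result == null ? "state unknown, fence off the VM" : "alive=" + result);
    }
}
```

For a VM whose hypervisor type is "Ovm", every investigator in this chain returns null, which reproduces the pattern in the log above.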


- Since no Investigator could determine the VM's state (all of them returned null),
- HA tries to fence off the VM
- but there is no OvmFenceBuilder, so fencing fails

2012-03-16 11:46:28,712 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) Fencing off VM that we don't know the state of
2012-03-16 11:46:28,712 DEBUG [cloud.ha.XenServerFencer] (HA-Worker-4:work-40) Don't know how to fence non XenServer hosts Ovm
2012-03-16 11:46:28,712 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) Fencer XenServerFenceBuilder returned null
2012-03-16 11:46:28,713 DEBUG [cloud.ha.KVMFencer] (HA-Worker-4:work-40) Don't know how to fence non kvm hosts Ovm
2012-03-16 11:46:28,713 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) Fencer KVMFenceBuilder returned null

Finally, HA is unable to restart the VM

2012-03-16 11:56:28,923 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-40) We were unable to fence off the VM VM[User|i-9-88-VM]
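The fencing step follows the same pattern: each fence builder either fences the VM's host or returns null when it does not know the hypervisor type, and if none succeeds the VM is not restarted. The sketch below is hypothetical: FenceBuilder, fenceOff(Vm) and TypedFencer are simplified stand-ins, the builder names are copied from the log, and a "successful" fence is only simulated.

```java
import java.util.List;

// Minimal sketch of the fencing fallback. FenceBuilder and fenceOff(Vm) are
// hypothetical simplifications; no real host is fenced here.
final class FencerChainSketch {

    record Vm(String name, String hypervisorType) {}

    interface FenceBuilder {
        String name();
        Boolean fenceOff(Vm vm);
    }

    // Stand-in fencer that only knows one hypervisor type.
    record TypedFencer(String name, String handles) implements FenceBuilder {
        public Boolean fenceOff(Vm vm) {
            if (!vm.hypervisorType().equals(handles)) {
                return null; // "Don't know how to fence non ... hosts"
            }
            return Boolean.TRUE; // simulated successful fence
        }
    }

    // Try each fencer in order; the VM may only be restarted once one succeeds.
    static boolean fence(Vm vm, List<FenceBuilder> fencers) {
        for (FenceBuilder f : fencers) {
            Boolean fenced = f.fenceOff(vm);
            System.out.println("Fencer " + f.name() + " returned " + fenced);
            if (Boolean.TRUE.equals(fenced)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<FenceBuilder> fencers = List.of(
                new TypedFencer("XenServerFenceBuilder", "XenServer"),
                new TypedFencer("KVMFenceBuilder", "KVM"));
        Vm vm = new Vm("i-9-88-VM", "Ovm");
        if (fence(vm, fencers)) {
            System.out.println("Fenced, restarting " + vm.name());
        } else {
            System.out.println("We were unable to fence off the VM " + vm.name());
        }
    }
}
```

In this model, adding a fencer that handles "Ovm" to the list is what would let fence() succeed, which corresponds to the missing OvmFenceBuilder noted above.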