﻿ 分层检查点的近似最优周期计算模型
 计算机应用   2017, Vol. 37 Issue (1): 103-107  DOI: 10.11772/j.issn.1001-9081.2017.01.0103 0

### 引用本文

LYU Hongwu, GU Lei, WANG Huiqiang, ZOU Shichen, FENG Guangsheng. Quasi-optimal period computation model for hierarchical checkpoint protocol[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 103-107. DOI: 10.11772/j.issn.1001-9081.2017.01.0103.

### 文章历史

Quasi-optimal period computation model for hierarchical checkpoint protocol
LYU Hongwu, GU Lei, WANG Huiqiang, ZOU Shichen, FENG Guangsheng
College of Computer Science and Technology, Harbin Engineering University, Harbin Heilongjiang 150001, China
Abstract: With the increase of High Performance Computation (HPC) system scale, it's very important to increase the efficiency of the checkpoint. A model to compute the quasi-optimal period for hierarchical checkpoint protocol was proposed. First, the execution of an application in HPC system was assessed, and checkpoint period optimization problem was abstracted as the nonlinear checkpoint cost model. Second, the hierarchical checkpoint cost formula was derived by simulating the possible fault location; two deceleration parameters and an acceleration parameter were introduced to reflect the impact of message logging on the hierarchical checkpoint. The simulation results show that, compared with the quasi-optimal period checkpoint cost, the average error value of the proposed model is below 5%, which is 20% less than that of the traditional model based on Markov chain. The proposed model can signally increase the efficiency of the hierarchical checkpoint protocol; meanwhile enhance the availability of the HPC system.
Key words: High Performance Computation (HPC)    fault tolerance    hierarchical checkpoint    checkpoint period    quasi-optimal solution
0 引言

1 相关工作

2 分层检查点近似最优周期计算模型 2.1 检查点成本模型

1) TIMEbase是应用程序的规模，即应用程序无开销(不设置检查点、无故障)的基础运行时间。

2) μ是运行平台的平均无故障时间。

3) T是检查点周期。

4) Delay是检查点延迟的集合，Delay={Delay1,Delay2,…,DelayN}。其中：N是设置检查点的次数，Delayi是存储检查点而消耗的时间，1≤i≤N。

5) LOST是故障损失时间，LOST={LOST1,LOST2,…,LOSTM}，其中，M是故障发生次数。特别的，LOSTi=Fi+Di+Ri，Fi是故障发生到最近检查点的任务丢失时间，Di 是故障后停机时间，Ri是恢复花费的时间，0≤i≤M。

 $TIME=TIM{{E}_{\text{base}}}+\sum\limits_{i=1}^{N}{Dela{{y}_{i}}}+\sum\limits_{j=0}^{M}{({{F}_{i}}+{{D}_{i}}+{{R}_{i}})}$ (1)

 \begin{align} & TIME=TIM{{E}_{\text{base}}}+TIM{{E}_{\text{base}}}\left( T-C \right)\times C+ \\ & +M(T/2+D+R) \end{align} (2)

 $COST(T)=TIME(T)-TIM{{E}_{\text{base}}}/TIME(T)$ (3)
 \begin{align} & COST(T)=COS{{T}_{\text{Delay}}}(T)+COS{{T}_{\text{Fault}}}(T)- \\ & COS{{T}_{\text{Delay}}}(T)COS{{T}_{\text{Fault}}}(T) \end{align} (4)

2.2 检查点成本模型优化

2.2.1 故障位置对故障损失时间的影响

 $WORK=T-(1-\alpha )C$ (5)

 图 1 分层检查点可能故障位置 Figure 1 Possible fault position of hierarchical checkpoint

 \begin{align} & {{F}_{w}}(q)=\left[ T-GC(q) \right]/T. \\ & \frac{1}{G}\sum\limits_{g=1}^{G}{\left[ (G-g+1)\alpha C(q)+\frac{T-GC(q)}{2} \right]} \end{align} (6)
 \begin{align} & {{F}_{s}}(q)=\frac{GC(q)}{T}\frac{1}{{{G}^{2}}}\sum\limits_{g=1}^{G}{\left[ \sum\limits_{s=0}^{g-2}{(G-g+s+2)\alpha C(q)+} \right.} \\ & T-GC(q)+G\alpha C(q)+T-\alpha C(q)+C(q)/2+ \\ & \left. \sum\limits_{s=1}^{G-g}{(s+1)\alpha C(q)} \right] \end{align} (7)

 $COS{{T}_{\text{hier}}}=\frac{T-WORK}{T}+\frac{1}{\mu }(D(q)+R(q)+F(q))$ (8)
2.2.2 消息日志对检查点成本的影响因子

 $COS{{T}_{\text{hier}}}=\frac{T-\beta \lambda WORK}{T}+\frac{1}{\mu }(D(q)+R(q)+\frac{F(q)}{\rho })$ (9)
3 仿真实验 3.1 仿真环境

 图 2 不同周期的检查点成本 Figure 2 Checkpoint costs for different periods
 图 3 μ=120 s时的两种算法对比结果 Figure 3 Analysis of two methods (μ=120 s)
 图 4 μ=240 s时的两种算法对比结果 Figure 4 Analysis results of two methods (μ=240 s)
3.2 仿真结果分析

4 结语

 [1] DONGARRA J, BECKMAN P, MOORE T, et al. The international exascale software project roadmap[J]. International Journal of High Performance Computing Applications, 2011, 25 (1) : 3-60. doi: 10.1177/1094342010391989 [2] SCHROEDER B, GIBSON G A. A large-scale study of failures in high-performance computing systems[J]. IEEE Transactions on Dependable and Secure Computing, 2010, 7 (4) : 337-350. doi: 10.1109/TDSC.2009.4 [3] YOUNG J W. A first order approximation to the optimum checkpoint interval[J]. Communications of the ACM, 1974, 17 (9) : 530-531. doi: 10.1145/361147.361115 [4] DALY J T. A higher order estimate of the optimum checkpoint interval for restart dumps[J]. Future Generation Computer Systems, 2006, 22 (3) : 303-312. doi: 10.1016/j.future.2004.11.016 [5] 鄢喜爱, 杨金民, 田华. 双机容错系统中最佳检查点间隔的分析[J]. 计算机工程, 2007, 33 (5) : 283-285. ( YAN X A, YANG J M, TIAN H. Analysis of best checkpoint interval of duplicated fault tolerance system[J]. Computer Engineering, 2007, 33 (5) : 283-285. ) [6] GE Y, YANG Y, ZHU C. Study of the best checkpoint interval in the distributed simulation system based on virtualization technology[C]//AMCCE 2015:Proceedings of 2015 International Conference on Automation, Mechanical Control and Computational Engineering. Amsterdam:Atlantis Press, 2015:193-197. [7] 黄琼, 尚利宏, 周密, 等. 一种面向大规模并行系统的分组协同检查点算法[J]. 计算机研究与发展, 2010, 47 (S1) : 158-163. ( HUANG Q, SHANG L H, ZHOU M, et al. A group-based coordinated checkpointing algorithm for large-scale parallel system[J]. Journal of Computer Research and Development, 2010, 47 (S1) : 158-163. ) [8] JIN H, CHEN Y, ZHU H, et al. Optimizing HPC fault-tolerant environment:An analytical approach[C]//ICPP 2010:Proceedings of the 201039th International Conference on Parallel Processing. Piscataway, NJ:IEEE, 2010:525-534. [9] ZHENG Z, LAN Z. Reliability-aware scalability models for high performance computing[C]//CLUSTER' 09:Proceedings of 2009 IEEE International Conference on Cluster Computing and Workshops. Piscataway, NJ:IEEE, 2009:1-9. [10] WANG L, PATTABIRAMAN K, KALBARCZYK Z, et al. Modeling coordinated checkpointing for large-scale supercomputers[C]//DSN 2005:Proceedings of the 2005 International Conference on Dependable Systems and Networks. Piscataway, NJ:IEEE, 2005:812-821. [11] GUERMOUCHE A, ROPARS T, SNIR M, et al. HydEE:failure containment without event logging for large scale send-deterministic mpi applications[C]//IPDPS 2012:Proceedings of 2012 IEEE 26th International Conference on Parallel & Distributed Processing Symposium. Piscataway, NJ:IEEE, 2012:1216-1227. [12] BOUTEILLER A, HERAULT T, BOSILCA G, et al. Correlated set coordination in fault tolerant message logging protocols[C]//Euro-Par' 11:Proceedings of the 17th International Conference on Parallel Processing. Berlin:Springer, 2011:51-64. [13] BOUTEILLER A, BOSILCA G, DONGARRA J. Redesigning the message logging model for high performance[J]. Concurrency and Computation:Practice and Experience, 2010, 22 (16) : 2196-2211. doi: 10.1002/cpe.v22:16