Date of Award
9-30-2014
Document Type
Thesis
Degree Name
Computer Science, MS
First Advisor
Hai Jiang
Committee Members
Hung-Chi Su; Jeff Jenness; Xiuzhen Huang
Call Number
LD 251 .A566t 2014 G74
Abstract
The graphics processing unit (GPU) plays a critical role in high-performance parallel computing. NVIDIA introduced CUDA as a parallel computing platform and programming model in 2006. With CUDA, programmers can use the GPU directly from C and C++ code compiled with the NVCC compiler, and from Fortran, Java, or Python through other compilers and language bindings. We introduce a checkpoint/restart scheme and a computation state migration strategy for fault tolerance. The checkpoint/restart scheme saves the full computation state at run time so that it can be restored later if necessary. Computation state migration moves computation states from a heavily loaded host to a lightly loaded host for load balancing and load sharing. This thesis focuses on three implementation tasks: constructing computation states on the GPU, including local variables, an execution counter, and application-level stack structures; achieving communication between the GPU and the CPU; and migrating computation state from one machine to another with the support of a run-time module.
Rights Management
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Guo, Xinyuan, "GPU Computation Checkpoint/Restart Scheme with Application-Level Stacks" (2014). Student Theses and Dissertations. 784.
https://arch.astate.edu/all-etd/784