Lec12-Hardware Acceleration + GPUs
Contents
- GPU Acceleration
- GPU Programming
GPUs offer a high degree of parallelism
a single CUDA example
Note that the values being computed are independent of each other, so they can be computed in parallel
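A minimal sketch of what such a kernel looks like, assuming element-wise vector addition as the example (the names and launch configuration are illustrative, not the lecture's exact code): each thread owns one independent output element.

```cuda
// Each thread computes one output element; the elements are independent,
// so all threads can run in parallel.
__global__ void VecAddKernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Host-side launch: one thread per element, grouped into blocks of 256.
void VecAdd(const float* d_a, const float* d_b, float* d_out, int n) {
    int threads_per_block = 256;
    int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    VecAddKernel<<<num_blocks, threads_per_block>>>(d_a, d_b, d_out, n);
}
```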
Data I/O (host-device transfer) is the bottleneck
keep data in GPU memory as long as possible -> call .numpy() less frequently
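At the raw CUDA level the same advice looks roughly like the sketch below (the kernels here are hypothetical placeholders): intermediate results stay in device memory across kernel launches, and only the final result is copied back to the host, which is what calling .numpy() sparingly achieves at the framework level.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernels, just to have something to chain.
__global__ void SquareKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

__global__ void ScaleKernel(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * s;
}

void Compute(float* h_x, int n) {
    float* d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    // One upload at the start.
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    // Intermediate results never leave device memory.
    SquareKernel<<<blocks, threads>>>(d_x, n);
    ScaleKernel<<<blocks, threads>>>(d_x, 2.0f, n);

    // One download at the end -- the analogue of calling .numpy() only once.
    cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
}
```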
GPU memory hierarchy
leverage shared memory
- launch thread grid and blocks
- cooperatively fetch common data into shared memory to increase reuse (see the sketch after this list)
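A sketch of cooperative fetching, assuming a simple 1D window-sum kernel (RADIUS, BLOCK_SIZE, and the kernel name are illustrative): every thread in a block loads part of a shared tile, and then all threads in the block reuse that tile instead of re-reading global memory.

```cuda
#define RADIUS 2
#define BLOCK_SIZE 128  // assumed equal to blockDim.x

// Each block cooperatively loads BLOCK_SIZE + 2*RADIUS inputs into shared
// memory once; each thread then reads its whole window from shared memory.
__global__ void WindowSumKernel(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK_SIZE + 2 * RADIUS];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int lid = threadIdx.x + RADIUS;                   // index into the tile

    // Cooperative fetch: every thread loads one "center" element...
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    // ...and the first RADIUS threads also load the left/right halo regions.
    if (threadIdx.x < RADIUS) {
        int left = gid - RADIUS;
        tile[lid - RADIUS] = (left >= 0) ? in[left] : 0.0f;
        int right = gid + BLOCK_SIZE;
        tile[lid + BLOCK_SIZE] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // wait until the whole tile is loaded

    if (gid < n) {
        float sum = 0.0f;
        for (int d = -RADIUS; d <= RADIUS; ++d) {
            sum += tile[lid + d];  // all reads hit fast shared memory
        }
        out[gid] = sum;
    }
}
```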
case study: matrix multiplication on GPU
- thread level
- block level: shared memory tiling (see the tiled-matmul sketch after this list)
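A sketch of block-level shared-memory tiling for C = A * B, assuming square matrices whose side is a multiple of the tile size (thread-level register tiling is left out to keep the sketch short; names are illustrative).

```cuda
#define TILE 16

// Each block computes a TILE x TILE tile of C. In every iteration the block
// cooperatively loads one tile of A and one tile of B into shared memory,
// then every thread reuses those tiles TILE times from fast shared memory.
__global__ void MatmulTiledKernel(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread owns
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread owns
    float acc = 0.0f;  // accumulated in a register

    for (int k0 = 0; k0 < n; k0 += TILE) {
        // Cooperative fetch: each thread loads one element of each tile.
        As[threadIdx.y][threadIdx.x] = A[row * n + (k0 + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * n + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // wait before overwriting the tiles
    }
    C[row * n + col] = acc;
}

// Launch for n x n matrices, n assumed to be a multiple of TILE:
// dim3 threads(TILE, TILE);
// dim3 blocks(n / TILE, n / TILE);
// MatmulTiledKernel<<<blocks, threads>>>(d_A, d_B, d_C, n);
```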
🤯 Paid the price for not having properly studied computer architecture!
Running many threads lets computation and data loading overlap
Cooperative fetching is pretty interesting too