AI-X Cluster Architecture
- Management Network (10G)
- Data Network (100G RoCE)
- Internal Network (100G IB)
- Internal Network (200G IB)
Computing
DGX A100
- 8x NVIDIA A100 GPUs
- 320GB GPU Memory
- 1TB System Memory
- Dual AMD Rome 2.25GHz (64-core) CPUs
- 9x 200Gb/s NIC
DGX-1V
- 8x NVIDIA V100 GPUs
- 256GB GPU Memory
- 512GB System Memory
- Dual Intel Xeon E5-2698 v4 2.2GHz (20-core) CPUs
- 4x 100Gb/s NIC
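The two node specifications above can be compared with a little arithmetic. The following is a minimal sketch, not part of the original slides: the `NodeSpec` class and its field names are hypothetical, 1TB is assumed to mean 1024GB, and the aggregate NIC figure is only the nominal sum of the listed port speeds, which may be spread across different networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical container for the per-node figures listed above."""
    name: str
    gpus: int
    gpu_memory_gb: int      # total GPU memory across all GPUs in the node
    system_memory_gb: int
    nics: int
    nic_gbps: int           # nominal speed of each NIC in Gb/s

    @property
    def memory_per_gpu_gb(self) -> float:
        # Total GPU memory divided evenly across the GPUs.
        return self.gpu_memory_gb / self.gpus

    @property
    def aggregate_nic_gbps(self) -> int:
        # Nominal sum of all NIC speeds; not a measured throughput.
        return self.nics * self.nic_gbps


# Values copied from the lists above (1TB assumed to be 1024GB).
DGX_A100 = NodeSpec("DGX A100", gpus=8, gpu_memory_gb=320,
                    system_memory_gb=1024, nics=9, nic_gbps=200)
DGX_1V = NodeSpec("DGX-1V", gpus=8, gpu_memory_gb=256,
                  system_memory_gb=512, nics=4, nic_gbps=100)

for node in (DGX_A100, DGX_1V):
    print(f"{node.name}: {node.memory_per_gpu_gb:.0f}GB per GPU, "
          f"{node.aggregate_nic_gbps}Gb/s nominal NIC bandwidth")
```

Under these assumptions, the totals work out to 40GB per GPU on the DGX A100 and 32GB per GPU on the DGX-1V.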
Storage
Network File System (NFSv3)
Ceph
- Software-defined storage (object storage, block storage, file storage)
Networking
AI-X Computing Cluster
- Data network: Mellanox SN2100 (100G RoCE)
- Internal network: Mellanox QM8700 (200G InfiniBand)
Commonly used in HPC
- High throughput and low latency
- Connects supercomputers and storage systems
- RoCE: RDMA over Converged Ethernet
- InfiniBand (IB): low latency and high bandwidth for System Area Networks (SAN)
Link speed
- Enhanced Data Rate (EDR) - 25Gb/s per lane (100Gb/s for 4x)
- High Data Rate (HDR) - 50Gb/s per lane (200Gb/s for 4x)
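The link speeds above follow from multiplying the per-lane rate by the lane count. A minimal sketch of that arithmetic, using only the nominal figures listed here (the `LANE_RATE_GBPS` table and the `link_speed_gbps` function are illustrative names, not part of any InfiniBand tooling):

```python
# Nominal per-lane data rates in Gb/s, as listed above.
LANE_RATE_GBPS = {"EDR": 25, "HDR": 50}


def link_speed_gbps(generation: str, lanes: int = 4) -> int:
    """Return the nominal link speed for a generation and lane count."""
    return LANE_RATE_GBPS[generation] * lanes


for gen in ("EDR", "HDR"):
    print(f"{gen} 4x: {link_speed_gbps(gen)}Gb/s")
```

With the default of four lanes, this gives the 100Gb/s (EDR) and 200Gb/s (HDR) figures used for the cluster's internal networks.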