PIER Journals

Abstract

This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it offers both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests show that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared with an optimized implementation on a six-core Central Processing Unit (CPU) (Intel Xeon 5680), this corresponds to a speedup by a factor of six. In terms of speed, the new format is as fast as the best format published so far, and at the same time it does not introduce redundant zero elements that have to be stored to ensure fast memory access. Compared with previously published solutions, significantly larger problems can be handled using low-cost commodity GPUs with a limited amount of on-board memory.
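The abstract does not spell out the storage layout, so the following is only a minimal CPU-side sketch of a sliced, ELLPACK-R-style format under stated assumptions, not the authors' Sliced ELLR-T implementation. It assumes rows are grouped into slices of `slice` consecutive rows, that each slice is padded only to the longest row within that slice, that entries are stored column-major inside a slice so that neighbouring rows are adjacent in memory (the property a GPU kernel relies on for coalesced loads), and that a per-row length array (the "R" in ELLR) lets the product skip padding. The names `SlicedEll`, `build_sliced_ell`, `spmv`, and `slice` are hypothetical. The GPU kernel that assigns T threads to each row is not shown; a plain loop stands in for it.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;  // complex single precision

// Hypothetical sliced ELLPACK-R-style container (illustrative only).
struct SlicedEll {
    int n = 0;                       // number of rows (square matrix assumed)
    int slice = 0;                   // rows per slice
    std::vector<int> row_len;        // true nonzero count of each row
    std::vector<std::size_t> start;  // offset of each slice in val/col (num_slices + 1 entries)
    std::vector<cf> val;             // values, column-major within each slice, zero-padded
    std::vector<int> col;            // column index for each stored value
};

// Convert CSR input (row_ptr, col_idx, values) into the sliced layout.
SlicedEll build_sliced_ell(int n, int slice,
                           const std::vector<int>& row_ptr,
                           const std::vector<int>& col_idx,
                           const std::vector<cf>& values) {
    assert(static_cast<int>(row_ptr.size()) == n + 1 && slice > 0);
    SlicedEll m;
    m.n = n;
    m.slice = slice;
    m.row_len.resize(n);
    for (int r = 0; r < n; ++r) m.row_len[r] = row_ptr[r + 1] - row_ptr[r];

    // Each slice is padded only to its own longest row; the stride is always
    // `slice`, so a partial last slice still reserves `slice` lanes per column.
    int num_slices = (n + slice - 1) / slice;
    m.start.assign(num_slices + 1, 0);
    for (int s = 0; s < num_slices; ++s) {
        int first = s * slice, last = std::min(n, first + slice);
        int width = 0;
        for (int r = first; r < last; ++r) width = std::max(width, m.row_len[r]);
        m.start[s + 1] = m.start[s] + static_cast<std::size_t>(width) * slice;
    }
    m.val.assign(m.start[num_slices], cf(0.0f, 0.0f));
    m.col.assign(m.start[num_slices], 0);

    // Entry k of row r goes to column k of its slice, at lane r % slice.
    for (int r = 0; r < n; ++r) {
        int s = r / slice, lane = r % slice;
        for (int k = 0; k < m.row_len[r]; ++k) {
            std::size_t pos = m.start[s] + static_cast<std::size_t>(k) * slice + lane;
            m.val[pos] = values[row_ptr[r] + k];
            m.col[pos] = col_idx[row_ptr[r] + k];
        }
    }
    return m;
}

// y = A * x. Each row stops at row_len[r], so padded slots are never read.
std::vector<cf> spmv(const SlicedEll& m, const std::vector<cf>& x) {
    std::vector<cf> y(m.n, cf(0.0f, 0.0f));
    for (int r = 0; r < m.n; ++r) {
        int s = r / m.slice, lane = r % m.slice;
        cf sum(0.0f, 0.0f);
        for (int k = 0; k < m.row_len[r]; ++k) {
            std::size_t pos = m.start[s] + static_cast<std::size_t>(k) * m.slice + lane;
            sum += m.val[pos] * x[m.col[pos]];
        }
        y[r] = sum;
    }
    return y;
}
```

In this sketch the amount of zero padding depends only on how much row lengths vary inside each slice, rather than across the whole matrix as in plain ELLPACK; the row-length array is what allows the product loop to ignore that padding entirely.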

Mr. Adam Dziekonski

Dr. Adam Lamecki

Professor Michal Mrozowski