This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The implementation uses the central processing unit (CPU) and the graphics processing unit (GPU) simultaneously, assigning to each the computational tasks best suited to its architecture. A closed-form expression for this discrete Green's function (DGF) was recently derived, which facilitates its application in FDTD simulations of radiation and scattering problems. Unfortunately, a software implementation of the new DGF formula requires multiple-precision arithmetic and may result in long runtimes. Therefore, the DGF computations were accelerated on a CPU-GPU heterogeneous parallel processing system using multiple-precision arithmetic together with the OpenMP and CUDA parallel programming interfaces. The method avoids the drawbacks of the CPU-only and GPU-only accelerated implementations of the DGF, namely the long runtime on the CPU for long DGF waveforms and the significant GPU initialization overhead for short ones. As a result, a sevenfold speedup was obtained relative to the reference DGF implementation on a multicore CPU, which significantly improves the applicability of the DGF in FDTD simulations.
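The need for multiple-precision arithmetic can be illustrated with a minimal sketch. It does not reproduce the DGF formula. It assumes, as an illustration only, that the difficulty comes from cancellation among very large terms of alternating sign, and it uses a generic binomial identity whose exact value is known, namely sum_{k=0}^{n} (-1)^k C(n,k) C(n+k,k) = (-1)^n. The function names are hypothetical.

```python
from math import comb


def alternating_sum_exact(n: int) -> int:
    """Evaluate the sum with Python's arbitrary-precision integers."""
    return sum((-1) ** k * comb(n, k) * comb(n + k, k) for k in range(n + 1))


def alternating_sum_float(n: int) -> float:
    """Evaluate the same sum with each term rounded to double precision."""
    total = 0.0
    for k in range(n + 1):
        # Each term is computed exactly, then rounded to a 64-bit float,
        # so only about 16 significant digits survive.
        term = float(comb(n, k) * comb(n + k, k))
        total += term if k % 2 == 0 else -term
    return total


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, alternating_sum_exact(n), alternating_sum_float(n))
```

For small n the two evaluations agree. For n = 40 the individual terms far exceed the range that double precision represents exactly, so the floating-point result no longer resembles the exact value of 1, whereas the arbitrary-precision evaluation remains correct at the cost of slower arithmetic. That cost is the kind of runtime penalty that motivates parallel acceleration.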