CROSSTALK AND DISCONTINUITIES REDUCTION ON MULTI-MODULE MEMORY BUS BY PARTICLE SWARM OPTIMIZATION

D.-B. Lin, F.-N. Wu*, W.-S. Liu, C.-K. Wang, and H.-Y. Shih

Graduate Institute of Computer and Communication Engineering, National Taipei University of Technology, Taiwan, R.O.C.

Abstract—Due to high-density routing under the CPU and DIMM areas, the even- and odd-mode characteristic impedances deviate from their original design values. The multi-drop topology between the CPU and the memory chips causes over- and under-driving, which reduces the eye opening. Furthermore, the different phase velocities of the even and odd modes cause timing jitter at the receiver end. This paper proposes a two-step solution to the complex signal integrity problem of the multi-module memory bus. First, particle swarm optimization (PSO) is used to tune the characteristic impedances of the transmission lines and the on-die termination (ODT) values, which compensates for the transmission line impedance changes and maximizes power delivery. The fitness function of the algorithm minimizes the reflection coefficient at the driver side and maximizes the transmission coefficient at the receiver side to reduce over- and under-driving. Second, the timing jitter is reduced by placing a capacitor that compensates for the velocity difference between the propagation modes. Finally, the signal integrity enhancements for DDR3 are verified by measuring $S$ parameters in the frequency domain and by post-processed eye diagrams in the time domain.

1. INTRODUCTION

With the rapid development of computer hardware and software in recent years, data communication systems increasingly demand high speed, high capacity, and high complexity. Users not only aspire to high system performance but also require high-speed multi-core processors. A high-performance system must be able to access memory at high speed; thus, increasing memory access speed to improve system performance is more important than ever. Double data rate three synchronous dynamic random access memory (DDR3 SDRAM) [1] is currently the most popular memory bus. Additional memory can be added by using a multi-module memory bus, which forms a multi-drop topology. DDR3 works at a faster speed, a higher data rate, and a lower operating voltage than DDR2: its data rate reaches up to 1.6 Gb/s, while its 1.5 V supply is lower than the 1.8 V of DDR2 systems, so it consumes less power. The trend of device miniaturization leads to more compact routing on printed circuit boards (PCBs), which in turn causes signal integrity issues such as crosstalk and waveform distortion. Therefore, electromagnetic effects on the PCB cannot be ignored [2–6]. CPUs and memories are high-speed digital switching devices and are therefore affected by coupling and crosstalk to an even greater extent. Low operating voltages also reduce the noise margin of the signal and easily cause errors at the receiver. To increase signal speed and reduce signal distortion, terminating resistors are typically added at the receiver ends to suppress multiple reflections [7, 8]. Termination placed inside the chip is called on-die termination (ODT) [9], and improper termination values degrade signal integrity (SI). Another critical SI issue of DDR3 systems is crosstalk on the channel when multiple bits are transmitted simultaneously. Crosstalk is of particular concern in high-density, high-speed, parallel data communications. Several studies on broadband impedance matching of multi-module memory buses have been carried out; in these studies, one data line connects to multiple ports [10–12]. The PCB routing area for the memory bus is compact because CPUs with more cores require more memory channels.
The tight routing area significantly limits trace spacing and changes the characteristic impedance of the traces, which results in discontinuities and increased crosstalk. Extensive literature is available on crosstalk related to the memory bus.

The topologies used in the previously mentioned studies concerned only single modules and did not consider the discontinuities at multiple ports; these coupling effects were studied in [13–16]. Analyzing only the coupling between adjacent transmission lines is not sufficient: in practice, the impedance changes and discontinuities of a multi-port bus and the coupling between adjacent lines should be considered simultaneously. Identifying interference and applying a solution to minimize disturbances as early as possible in the design phase provides many advantages, because it reduces the time and money wasted on repeated debugging and redesign. An algorithm is proposed here to make the design methodology more efficient for solving electromagnetic problems in complex circuits. In this study, a model of three microstrip lines of a DDR3 memory bus is considered. Particle swarm optimization (PSO) is used to calculate the characteristic impedance of each transmission line segment and the optimized ODT values for the write- and read-states. After PSO optimizes the geometric parameters of the transmission lines, the compensating capacitance between adjacent traces is calculated. The SI of the multi-module memory bus is thus further improved by compensating for jitter.

2. PARTICLE SWARM OPTIMIZATION

Optimization problems arise in a wide variety of scientific and engineering applications, including signal processing, system identification, filter design, function approximation, and regression analysis, and many of these applications require real-time solutions. The genetic algorithm (GA) and PSO methods are used in many optimization problems. In [17], GA-based approaches for obtaining optimal design solutions were compared. PSO, first introduced by Eberhart and Kennedy in 1995, is a relatively new optimization algorithm. For about a decade, PSO has been successfully applied in many research applications, such as the PSO-based method for obtaining optimal design solutions in [18]. Although both PSO and GA are optimization methods, PSO required less computation time than GA in this study. PSO emulates cooperation between the individuals of a group through the exchange of information and experience from one generation to the next [19]. Unlike GA, PSO does not apply evolution operators such as crossover and mutation. PSO offers advantages in locating the global optimum, especially in convergence speed. In PSO, each particle adjusts its direction and distance of movement according to the information of the best-fitness particle, relying on cooperation rather than competition. PSO has been found to be robust and fast in obtaining the optimal value. The flowchart of PSO is shown in Fig. 1, and its pseudo code is given in the Appendix.

![Flowchart of PSO](image-url)

**Figure 1.** Flowchart of PSO.

3. THE MULTI-MODULE MEMORY BUS STRUCTURE

Electromagnetic coupling decreases as the spacing between adjacent traces increases. A simplified DDR3 architecture is shown in Fig. 2. The CPU connects to two dual in-line memory modules (DIMMs), DIMM$_0$ and DIMM$_1$, through three microstrip lines. This study considers the interference of the two adjacent lines with the center one. The three microstrip lines have the same trace width and spacing. The topology is simplified by ignoring the effects of the DIMM connectors; the equivalent circuit is shown in Fig. 3.

![Figure 2. The DDR3 architecture.](image-url)
The transmission line between the DIMM connector and the memory chip on the DIMM is denoted by $L_1$, the transmission line between the DIMM$_0$ and DIMM$_1$ connectors by $L_2$, and the transmission line between the DIMM$_0$ connector and the CPU by $L_3$. The mutual capacitance and inductance of section $n$ are denoted by $C_{mn}$ and $L_{mn}$, respectively.

The system operates in four states: (a) write to DIMM$_1$, (b) write to DIMM$_0$, (c) read from DIMM$_1$, and (d) read from DIMM$_0$. The specifications for DDR3 SDRAM are defined by the Joint Electron Device Engineering Council (JEDEC) [20]. The write-state values of the ODTs ($R_0$ and $R_1$) and of the driver output resistor ($R_c$) follow the JEDEC standard and are listed in Table 1.

**Figure 3.** The equivalent circuit of DDR3.

**Table 1.** The values of the driver resistor and ODTs for the memory module.

<table>
<thead>
<tr>
<th>$R_c$ ($\Omega$)</th>
<th>$R_0/ R_1$ ($\Omega$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>34</td>
<td>60</td>
</tr>
<tr>
<td>40</td>
<td>120</td>
</tr>
<tr>
<td></td>
<td>40</td>
</tr>
<tr>
<td></td>
<td>20</td>
</tr>
<tr>
<td></td>
<td>30</td>
</tr>
</tbody>
</table>
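Table 1 shows that the driver resistance and the ODTs can take only a few discrete values, whereas the PSO listing in the Appendix treats $R_c$, $R_0$, and $R_1$ as continuous coordinates. The paper does not state how the two are reconciled. A minimal C sketch of one possible approach, assuming each continuous coordinate is rounded to the nearest value of Table 1, is shown below; the helper `snap_to_allowed` and the array names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Allowed ODT values from Table 1, in ohms (hypothetical array name). */
static const double odt_values[] = { 20.0, 30.0, 40.0, 60.0, 120.0 };
/* Allowed driver output resistances from Table 1, in ohms. */
static const double driver_values[] = { 34.0, 40.0 };

/* Returns the entry of allowed[0..n-1] nearest to x; on a tie, the entry
   listed first wins. */
double snap_to_allowed(double x, const double *allowed, size_t n)
{
    assert(allowed != NULL && n > 0);
    double best = allowed[0];
    for (size_t k = 1; k < n; k++) {
        if (fabs(allowed[k] - x) < fabs(best - x))
            best = allowed[k];
    }
    return best;
}
```

For example, a continuous coordinate of 47.3 Ω maps to the 40 Ω ODT setting, and 36.9 Ω maps to the 34 Ω driver setting.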
4. SIGNAL ANALYSIS OF DDR3

4.1. Signal Spectrum

The data rate of DDR3 reaches up to 1.6 Gb/s, the rise/fall time is 50 ps, and the signal amplitude is only 1.5 V. The spectrum of a random DDR3 signal, simulated with the Advanced Design System (ADS), is shown in Fig. 4. The signal power is concentrated between 0 and 1.6 GHz, with less power between 2.4 and 4 GHz. The spectrum of the 1.6 Gb/s signal is distributed at the odd harmonics of 800 MHz, and the energy decreases significantly at higher frequencies. To achieve a large bandwidth, the impedances of all ports should be matched. In practice, however, the lengths and line-to-line spacing of the transmission lines are limited by the circuit architecture and the compact routing area, and JEDEC defines only a limited set of ODT values, which limits the impedance matching among the driver, receivers, and terminators. The channel therefore cannot achieve a large bandwidth because of impedance mismatches and discontinuities. The proposed algorithm optimizes the impedance matching of the ports according to the signal spectrum, applying different weightings to balance the frequency response of the channel.

4.2. Single-Line Equivalent Model (SLEM)

Different modes of signal propagation change the electromagnetic fields between adjacent lines, which in turn changes the characteristic impedance of the transmission lines. In this paper, the two worst cases shown in Fig. 5 are considered: the signal on the center line is either in phase or out of phase with the signals on the two adjacent lines.

![Figure 4. The signal spectrum.](image-url)
The variations of characteristic impedance and velocity can be determined by applying Kirchhoff's current and voltage laws to the inductance and capacitance matrices of the three coupled lines:

\[
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix} =
\begin{bmatrix}
C_{11} & -C_{12} & -C_{13} \\
-C_{21} & C_{22} & -C_{23} \\
-C_{31} & -C_{32} & C_{33}
\end{bmatrix}
\frac{d}{dt}
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix}, \tag{1}
\]

\[
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} =
\begin{bmatrix}
L_{11} & L_{12} & L_{13} \\
L_{21} & L_{22} & L_{23} \\
L_{31} & L_{32} & L_{33}
\end{bmatrix}
\frac{d}{dt}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}, \tag{2}
\]

where \(C_{ii}\) is the total capacitance of line \(i\) and \(C_{ij}\) (\(i \neq j\)) is the mutual capacitance between lines \(i\) and \(j\); for the center line, \(C_{22} = C_{2g} + C_{21} + C_{23}\), where \(C_{2g}\) is its capacitance to ground. Due to the symmetry of the circuit structure, \(C_{12} = C_{21} = C_{23} = C_{32} = C_m\) and \(L_{12} = L_{21} = L_{23} = L_{32} = L_m\). When the two adjacent lines carry signals in phase with the center line, \(I_1 = I_2 = I_3\) and \(V_1 = V_2 = V_3\), and the voltage of the center line becomes

\[
V_2 = (L_{21} + L_{22} + L_{23}) \frac{dI_2}{dt} = (L_{22} + 2L_m) \frac{dI_2}{dt}.
\]

(3)

The equivalent inductance and capacitance of the center trace in the in-phase mode are therefore

\[
L_{\text{eff}}^{\text{inphase}} = L_{22} + 2L_m,
\]

(4)

\[
C_{\text{eff}}^{\text{inphase}} = C_{2g}.
\]

(5)
Furthermore, the characteristic impedance and velocity of the in-phase mode can be obtained as follows:

\[
Z_{\text{eff}}^{\text{inphase}} = \sqrt{\frac{L_{\text{eff}}^{\text{inphase}}}{C_{\text{eff}}^{\text{inphase}}}} = \sqrt{\frac{L_{22} + 2L_m}{C_{2g}}} , \tag{6}
\]

\[
V_{\text{eff}}^{\text{inphase}} = \frac{1}{\sqrt{L_{\text{eff}}^{\text{inphase}} \cdot C_{\text{eff}}^{\text{inphase}}}} = \frac{1}{\sqrt{(L_{22} + 2L_m) \cdot C_{2g}}} . \tag{7}
\]

When the two adjacent lines carry signals out of phase with the center line, \(-I_1 = I_2 = -I_3\) and \(-V_1 = V_2 = -V_3\). The voltage and current of the center trace of the SLEM are then

\[
V_2 = (L_{22} - L_{21} - L_{23}) \frac{dI_2}{dt} = (L_{22} - 2L_m) \frac{dI_2}{dt} , \tag{8}
\]

\[
I_2 = (2C_{21} + C_{2g} + 2C_{23}) \frac{dV_2}{dt} = (C_{2g} + 4C_m) \frac{dV_2}{dt} . \tag{9}
\]

The equivalent capacitance, inductance and characteristic impedance of the center trace can be expressed as

\[
L_{\text{eff}}^{\text{out of phase}} = L_{22} - 2L_m , \tag{10}
\]

\[
C_{\text{eff}}^{\text{out of phase}} = C_{2g} + 4C_m , \tag{11}
\]

\[
Z_{\text{eff}}^{\text{out of phase}} = \sqrt{\frac{L_{\text{eff}}^{\text{out of phase}}}{C_{\text{eff}}^{\text{out of phase}}}} = \sqrt{\frac{L_{22} - 2L_m}{C_{2g} + 4C_m}} , \tag{12}
\]

\[
V_{\text{eff}}^{\text{out of phase}} = \frac{1}{\sqrt{L_{\text{eff}}^{\text{out of phase}} \cdot C_{\text{eff}}^{\text{out of phase}}}} = \frac{1}{\sqrt{(L_{22} - 2L_m) \cdot (C_{2g} + 4C_m)}} . \tag{13}
\]

The values of the \(C\) and \(L\) matrices can be extracted by using the geometric information of the SLEM [21].

4.3. \(S\) Parameters

When the system writes to DIMM\(_1\), the CPU/MCH transmits the signal through \(L_3\), \(L_2\), and \(L_1\), and an \(L_1\) stub branches off to DIMM\(_0\) at the DIMM\(_0\) connector, as shown in Figs. 3 and 6. The ABCD matrix of the SLEM of DDR3 for this path can be expressed as

\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} =
\begin{bmatrix}
\cos\theta_{3,\text{mode}}^{\text{eff}} & jZ_{3,\text{mode}}^{\text{eff}} \sin\theta_{3,\text{mode}}^{\text{eff}} \\
\dfrac{j \sin\theta_{3,\text{mode}}^{\text{eff}}}{Z_{3,\text{mode}}^{\text{eff}}} & \cos\theta_{3,\text{mode}}^{\text{eff}}
\end{bmatrix}
\begin{bmatrix} 1 & 0 \\ Y & 1 \end{bmatrix}
\begin{bmatrix}
\cos\theta_{2,\text{mode}}^{\text{eff}} & jZ_{2,\text{mode}}^{\text{eff}} \sin\theta_{2,\text{mode}}^{\text{eff}} \\
\dfrac{j \sin\theta_{2,\text{mode}}^{\text{eff}}}{Z_{2,\text{mode}}^{\text{eff}}} & \cos\theta_{2,\text{mode}}^{\text{eff}}
\end{bmatrix}
\begin{bmatrix}
\cos\theta_{1,\text{mode}}^{\text{eff}} & jZ_{1,\text{mode}}^{\text{eff}} \sin\theta_{1,\text{mode}}^{\text{eff}} \\
\dfrac{j \sin\theta_{1,\text{mode}}^{\text{eff}}}{Z_{1,\text{mode}}^{\text{eff}}} & \cos\theta_{1,\text{mode}}^{\text{eff}}
\end{bmatrix}. \tag{14}
\]

\(Y\) is the input admittance of the \(L_1\) stub terminated by \(R_0\). \(Z_{n,\text{mode}}^{\text{eff}}\) is the effective impedance of section \(n\) of the center line for in-phase or out-of-phase propagation, and \(\theta_{n,\text{mode}}^{\text{eff}} = 2\pi f \cdot \text{length}_n / V_{n,\text{mode}}^{\text{eff}}\) is its electrical length. The \(S\) parameters are obtained by setting the reference impedances of the input and output ports, \(Z_{01}\) and \(Z_{02}\), to \(R_c\) and \(R_1\), respectively:

\[
S_{11} = \frac{A\sqrt{Z_{02}/Z_{01}} + \dfrac{B}{\sqrt{Z_{01} Z_{02}}} - C\sqrt{Z_{01} Z_{02}} - D\sqrt{Z_{01}/Z_{02}}}{\Delta}, \qquad S_{21} = \frac{2}{\Delta}, \tag{15}
\]

where

\[
\Delta = A\sqrt{Z_{02}/Z_{01}} + \frac{B}{\sqrt{Z_{01} Z_{02}}} + C\sqrt{Z_{01} Z_{02}} + D\sqrt{Z_{01}/Z_{02}}.
\]

Similarly, in the read-state, the \(S\) parameters are obtained from (15) after the propagation path of the signal in (14) is changed accordingly.
To focus on the discontinuities of the multi-drop memory bus, the transmission lines are assumed to be lossless. To keep the discontinuities small, so that the circuit delivers maximum power from the driver to the different receivers in the read- and write-states, the fitness function evaluates the $S$ parameters at 81 frequency points, using the capacitances, inductances, impedances, and velocities of the in-phase and out-of-phase modes. The reflection at the driver should be minimized in both the read- and write-states, and the delivered power should be maximized when the CPU/MCH writes data to DIMM$_{0}$/DIMM$_{1}$ and reads data from DIMM$_{0}$/DIMM$_{1}$. Four fitness functions are therefore defined for the different working configurations of the system.

When the CPU/MCH transmits data to DIMM$_{0}$, the system works in the write-state. The driver end is port 1, DIMM$_{0}$ is port 2, and DIMM$_{1}$ is port 3. The power delivered to port 2 should be maximized, while port 3 acts as a load that reduces multiple reflections. If the network is lossless, $|S_{11}|^2 + |S_{21}|^2 + |S_{31}|^2 = 1$. The fitness function can be defined as

$$\text{fit}_{\text{write to DIMM}_0} = \sum_{f} \left( |S_{11}(f)|^2 + 1 - |S_{21}(f)|^2 \right), \tag{16}$$

where $f$ runs over the sampled frequency points of the DDR3 signal spectrum. In the ideal case, $|S_{11}|^2 = 0$ and $|S_{21}|^2 = 1$ at every frequency: all of the power is delivered to DIMM$_{0}$, none is delivered to DIMM$_{1}$, and the fitness value equals 0. The other cases are defined similarly:

$$\text{fit}_{\text{write to DIMM}_1} = \sum_{f} \left( |S_{11}(f)|^2 + 1 - |S_{31}(f)|^2 \right), \tag{17}$$

$$\text{fit}_{\text{read from DIMM}_0} = \sum_{f} \left( |S_{22}(f)|^2 + 1 - |S_{12}(f)|^2 \right), \tag{18}$$

$$\text{fit}_{\text{read from DIMM}_1} = \sum_{f} \left( |S_{33}(f)|^2 + 1 - |S_{13}(f)|^2 \right). \tag{19}$$

By combining (16)–(19) for both the in-phase and out-of-phase modes of the center line, the overall fitness function is defined as

\[
\begin{aligned}
\text{fitness} ={}& \left(\text{fit}_{\text{in phase}}^{\text{write to DIMM}_0}\right)^2 + \left(\text{fit}_{\text{in phase}}^{\text{write to DIMM}_1}\right)^2 + \left(\text{fit}_{\text{in phase}}^{\text{read from DIMM}_0}\right)^2 + \left(\text{fit}_{\text{in phase}}^{\text{read from DIMM}_1}\right)^2 \\
&+ \left(\text{fit}_{\text{out of phase}}^{\text{write to DIMM}_0}\right)^2 + \left(\text{fit}_{\text{out of phase}}^{\text{write to DIMM}_1}\right)^2 + \left(\text{fit}_{\text{out of phase}}^{\text{read from DIMM}_0}\right)^2 + \left(\text{fit}_{\text{out of phase}}^{\text{read from DIMM}_1}\right)^2,
\end{aligned} \tag{20}
\]

where the subscripts "in phase" and "out of phase" denote the signal propagation mode on the center line. Summing the squared fitness values balances SI across all system configurations and helps the search avoid converging to local instead of global optimal solutions.
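Equations (16)–(20) reduce to two short loops. The C sketch below assumes the squared magnitudes $|S_{ii}(f)|^2$ and $|S_{ji}(f)|^2$ have already been computed at each sampled frequency; it is unweighted, exactly as (16)–(19) are written, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Eqs. (16)-(19): sum over the sampled frequencies of
   |S_ii(f)|^2 + 1 - |S_ji(f)|^2, where refl_pow[k] = |S_ii(f_k)|^2 is the
   reflected power at the driver port and trans_pow[k] = |S_ji(f_k)|^2 is
   the power delivered to the target receiver. Zero is the ideal value. */
double fit_config(const double *refl_pow, const double *trans_pow, size_t n)
{
    assert(n == 0 || (refl_pow != NULL && trans_pow != NULL));
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += refl_pow[k] + 1.0 - trans_pow[k];
    return sum;
}

/* Eq. (20): sum of squares of the eight fit values
   (four system configurations x two propagation modes). */
double fitness_total(const double fit[8])
{
    double total = 0.0;
    for (int k = 0; k < 8; k++)
        total += fit[k] * fit[k];
    return total;
}
```

Because each term is squared, one poorly matched configuration raises the total much more than several slightly mismatched ones, which is how the sum pushes PSO toward a balanced design.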

6. IMPROVING TIMING JITTER

The fitness function of the algorithm addresses the discontinuities of the multi-module memory bus, but it cannot remove the timing jitter caused by the different velocities of the propagation modes, which arise from the coupling between adjacent traces. The different propagation velocities of the odd and even modes increase the jitter and decrease the eye width. According to [13–16], reducing the difference between the capacitive and inductive coupling ratios of the routing structure reduces the difference between the propagation velocities of the two modes, and thus the propagation-time difference ($\Delta TD$). The same studies also suggest placing a compensation capacitor between adjacent traces close to the DIMM connector.

The propagation-time difference $\Delta TD$ of the center trace between the in-phase and out-of-phase modes, caused by the coupling from the two adjacent traces, can be expressed as

\[
\Delta TD = \frac{\text{length}_1}{V_{\text{eff}}^{\text{in phase}}} - \frac{\text{length}_1}{V_{\text{eff}}^{\text{out of phase}}}
\]

\[
= \text{length}_1 \left( \sqrt{(L_{22} + 2L_m)C_{2g}} - \sqrt{(L_{22} - 2L_m)(C_{2g} + 4C_m)} \right)
\]

(21)

In [13], a two-trace model was proposed in which a compensation capacitor $C_C$ is placed between the two adjacent traces. Adding $C_C$ to the out-of-phase capacitance and setting $\Delta TD = 0$ gives

\[
\Delta TD = 0 = \text{length}_1 \left( \sqrt{(L_{22} + 2L_m)C_{2g}} - \sqrt{(L_{22} - 2L_m)(C_{2g} + 4C_m + C_C)} \right),
\]

(22)

and

\[
C_C = \frac{1}{(L_{22} - 2L_m)} \cdot \left( \frac{\text{length}_1 \sqrt{(L_{22} + 2L_m)C_{2g}}}{\text{length}_1} \right)^2 - (C_{2g} + 4C_m).
\]

(23)
Using (4), (5), (10), and (11), (23) can be rewritten as

\[ C_C = \frac{1}{L_{\text{eff}}^{\text{out of phase}}} \left( \frac{\text{length}_1 \sqrt{L_{\text{eff}}^{\text{in phase}} \cdot C_{\text{eff}}^{\text{in phase}}}}{\text{length}_1} \right)^2 - C_{\text{eff}}^{\text{out of phase}}. \] (24)

Equation (24) can be extended to multiple transmission-line segments as follows:

\[ C_C = \frac{1}{L_{\text{eff}}^{\text{out of phase}}} \left( \frac{\text{length}_1 \sqrt{L_{\text{eff}}^{\text{in phase}} \cdot C_{\text{eff}}^{\text{in phase}}} + \Delta T D_{\text{other lines}}}{\text{length}_1} \right)^2 - C_{\text{eff}}^{\text{out of phase}}, \] (25)

where \( \Delta T D_{\text{other lines}} \) is the propagation-delay difference between the two modes accumulated on the other transmission-line segments. After PSO produces the optimized results, the geometric information is substituted into (25) to obtain the compensating capacitance, which further reduces the jitter caused by the coupling from adjacent traces on the multi-module memory bus.

7. SIMULATION AND EXPERIMENTATION RESULTS

7.1. Simulation Results

This section verifies the signal integrity of the DDR3 bus in both the time [22] and frequency domains: the waveform at the receiver end is examined in the time domain, and the insertion loss is used to examine the channel performance in the frequency domain. The simulation parameters are listed in Table 2.

**Table 2.** Signal, geometric and PSO parameters used in the simulation.

<table>
<thead>
<tr>
<th>Category</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Signal</td>
<td>amplitude</td>
<td>1.5 V</td>
</tr>
<tr>
<td>Signal</td>
<td>data rate</td>
<td>1.6 Gb/s</td>
</tr>
<tr>
<td>Signal</td>
<td>rise time</td>
<td>50 ps</td>
</tr>
<tr>
<td>PSO</td>
<td>swarm size</td>
<td>50</td>
</tr>
<tr>
<td>PSO</td>
<td>iterations</td>
<td>80</td>
</tr>
<tr>
<td>Geometric</td>
<td>\( \varepsilon_r \)</td>
<td>4.4</td>
</tr>
<tr>
<td>Geometric</td>
<td>FR4 thickness</td>
<td>0.8 mm</td>
</tr>
<tr>
<td>Geometric</td>
<td>\( L_1 \)</td>
<td>30 mm</td>
</tr>
<tr>
<td>Geometric</td>
<td>\( L_3 \)</td>
<td>90 mm</td>
</tr>
<tr>
<td>Geometric</td>
<td>line spacing</td>
<td>5 mm</td>
</tr>
</tbody>
</table>
Table 3. Output results of the driver resistor, ODTs, capacitor and length of each transmission section.

<table>
<thead>
<tr>
<th>Section</th>
<th>Length (mm)</th>
<th>$Z_0$ ($\Omega$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$L_1$</td>
<td>30</td>
<td>48</td>
</tr>
<tr>
<td>$L_2$</td>
<td>10</td>
<td>52</td>
</tr>
<tr>
<td>$L_{31}$</td>
<td>25</td>
<td>45</td>
</tr>
<tr>
<td>$L_{32}$</td>
<td>18</td>
<td>39</td>
</tr>
<tr>
<td>$L_{33}$</td>
<td>47</td>
<td>37</td>
</tr>
<tr>
<td>$C_C$</td>
<td colspan="2">0.582 pF</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Resistor</th>
<th>Write to / read from DIMM$_1$</th>
<th>Write to / read from DIMM$_0$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_C$ ($\Omega$)</td>
<td>34</td>
<td>34</td>
</tr>
<tr>
<td>$R_0$ ($\Omega$)</td>
<td>40</td>
<td>60</td>
</tr>
<tr>
<td>$R_1$ ($\Omega$)</td>
<td>60</td>
<td>40</td>
</tr>
</tbody>
</table>

To improve performance and obtain a more realizable design, the trace $L_3$ between the CPU/MCH and the DIMM$_0$ connector is divided into three segments ($L_{31}$, $L_{32}$, $L_{33}$), where $L_{31} + L_{32} + L_{33} = L_3$. Table 3 lists the output results of PSO. The geometric information and the input and output resistors were set up in ADS to examine the performance in the time and frequency domains. Four circuits, one for each system configuration, were simulated in ADS using the parameters in Table 3. The lengths and impedances of $L_1$, $L_2$, $L_{31}$, $L_{32}$, and $L_{33}$ and the value of $C_C$ are the same in all four circuits; only the locations and values of the terminators change with the system configuration. For example, when the CPU/MCH writes data to DIMM$_1$ (the write-state shown in Fig. 3), $R_C = 34\ \Omega$, $R_0 = 40\ \Omega$, and $R_1 = 60\ \Omega$.

7.2. Frequency Domain Analysis

The frequency-domain simulation results are shown in Fig. 7. Because of the weights set in PSO, the frequency response is consistent with the signal spectrum, so most of the signal power is transmitted to the receiver end. In addition, the channel exhibits nearly equal loss over the 0–4 GHz band; since the digital signal contains wideband components, this results in less distortion at the receiver. The variation of insertion loss was approximately 8 dB before PSO optimization and approximately 3 dB after it, and the insertion losses were similar in all four system configurations.
Figure 7. Simulated transmission coefficient of the four system configurations. (a) Before PSO. (b) After PSO.

Figure 8. Simulated eye diagrams without the compensating capacitor. (a) Write to DIMM$_1$. (b) Write to DIMM$_0$. (c) Read from DIMM$_1$. (d) Read from DIMM$_0$. The received signal and the crosstalk at the receiver end are denoted by red solid lines and blue dotted lines, respectively.
7.3. Time Domain Analysis

The data rate of the driving signal was set to 1.6 Gb/s with an amplitude of 1.5 V. Three drivers drove the three traces with independent random patterns, so the waveforms at the receiver end of the center trace contain the coupled signals, as shown in Fig. 8. The jitter is large because the signals of the adjacent traces couple into the center trace. Placing a capacitor between the adjacent traces, with its value calculated from the PSO results, significantly decreases the jitter by compensating for the velocity difference between the propagation modes, and the eye diagram is further improved, as shown in Fig. 9.

![Eye-pattern diagrams](image)

**Figure 9.** Simulated eye diagrams with the compensating capacitor. (a) Write to DIMM$_1$. (b) Write to DIMM$_0$. (c) Read from DIMM$_1$. (d) Read from DIMM$_0$. The received signal and the crosstalk at the receiver end are denoted by red solid lines and blue dotted lines, respectively.
7.4. Validations

The ODT settings of each state listed in Table 3 are implemented on different test boards. To simplify the implementation, the SMA ports representing the driver and receiver are placed on the same test board, as shown in Fig. 10. Another DIMM containing the driver and receiver is raised and soldered perpendicularly onto the test board. This approach avoids damaging the test boards and over-bending the cables that connect the VNA to the boards during measurements.

The transmission coefficients of the write configurations are compared in Figs. 11(a) and 12(a), and those of the read configurations in Figs. 11(b) and 12(b). The variation of insertion loss was approximately 9 dB before PSO optimization and approximately 6 dB after it. In Fig. 12, simulation and measurement agree well below 1.5 GHz, and their trends at higher frequencies are close. However, because the driver test board and the raised test board are joined without a connector, some errors were introduced: the edge of the perpendicular PCB overlapped the trace of the test board and the compensating capacitor, adding extra discontinuities to the measurements. Figure 13 shows the eye diagrams post-processed from the measured insertion loss (Figs. 12(a) and (b)) and crosstalk, using the same DDR3 signal settings as the simulation. Comparing Fig. 13 with Fig. 9 shows that the magnitudes of the signal and noise are similar.

![Test boards for insertion loss measurements](image)

**Figure 10.** Test boards for insertion loss measurements. (a) Write to DIMM$_0$, read from DIMM$_0$. (b) Zoom in.

![Comparison of simulation and measurement results](image)

**Figure 11.** Comparison of simulation and measurement results (before PSO). (a) Write state: write to DIMM$_1$ and write to DIMM$_0$. (b) Read state: read from DIMM$_1$ and read from DIMM$_0$.

![Comparison of simulation and measurement results](image)

**Figure 12.** Comparison of simulation and measurement results (after PSO). (a) Write state: write to DIMM$_1$ and write to DIMM$_0$. (b) Read state: read from DIMM$_1$ and read from DIMM$_0$.

Figure 13. The post-processed eye diagrams with the compensating capacitor based on the frequency-domain measurement. (a) Write to DIMM$_1$. (b) Write to DIMM$_0$. (c) Read from DIMM$_1$. (d) Read from DIMM$_0$. The received signal and the crosstalk at the receiver end are denoted by red solid lines and blue dotted lines, respectively.

Table 4. Comparison of the post-processed eye-diagram data.

<table>
<thead>
<tr>
<th>Structure</th>
<th>Parameter</th>
<th>Write to DIMM₁</th>
<th>Write to DIMM₀</th>
<th>Read from DIMM₁</th>
<th>Read from DIMM₀</th>
</tr>
</thead>
<tbody>
<tr>
<td>N=3 (165 sec)</td>
<td>Eye Height (mV)</td>
<td>514.8</td>
<td>491.2</td>
<td>512.5</td>
<td>490.0</td>
</tr>
<tr>
<td></td>
<td>Eye Width (ps)</td>
<td>589.5</td>
<td>589.5</td>
<td>603.5</td>
<td>586.7</td>
</tr>
<tr>
<td></td>
<td>Jitter RMS (ps)</td>
<td>10.9</td>
<td>11.4</td>
<td>7.56</td>
<td>10.6</td>
</tr>
</tbody>
</table>

Because the JEDEC specification restricts the ODT choices, PSO cannot select ODT values outside the specified set, and some ringing therefore remains on the improved DDR3 memory bus. The eye-diagram parameters are listed in Table 4. The eye diagram of the circuit with the compensating capacitor shows significantly reduced jitter because the FEXT of the circuit is reduced.

In this study, the fitness function is defined as the sum of the squares of the fitness values of the four system configurations in both propagation modes. PSO ranks these fitness values and weights the channel performance by the signal spectrum. The observed results indicate that this fitness function works well for this application.

8. CONCLUSION

In this paper, an optimization algorithm is proposed that adjusts the transmission line widths to match the specified ODT values of a DDR3 design for maximum power delivery. We propose an effective PSO-based method to enhance the SI of two-module memory buses. Because the fitness function is defined in terms of $S$ parameters, it can easily be extended to buses with more memory modules, and it compensates for the impedance changes caused by compact routing. The compensating capacitance, calculated with the derived formula, further improves SI by compensating for the velocity difference between even- and odd-mode propagation and by reducing FEXT.
APPENDIX A.

Particle swarm optimization (PSO), introduced by Eberhart and Kennedy in 1995, simulates social behavior. Each particle not only tracks its own best position but also shares information with the other particles to locate the optimum. The velocity of each particle is updated by

\[
v_{i}^{k+1} = v_{i}^{k} + c_{1}r_{1} (P_{\text{best}_i} - x_{i}^{k}) + c_{2}r_{2} (G_{\text{best}} - x_{i}^{k}) .
\]

(A1)

and its position by

\[
x_{i}^{k+1} = x_{i}^{k} + v_{i}^{k+1},
\]

(A2)

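where \( v_{i}^{k+1} \) is the velocity of particle \( i \) at generation \( k+1 \), \( x_{i}^{k} \) is its position at generation \( k \), \( c_{1} \) and \( c_{2} \) are the weights of the particle's own best position and the global best position, and \( r_{1} \) and \( r_{2} \) are random numbers between 0 and 1. \( P_{\text{best}_i} \) and \( G_{\text{best}} \) are the particle's and the swarm's best known positions, so the terms \( c_{1}r_{1}(P_{\text{best}_i} - x_{i}^{k}) \) and \( c_{2}r_{2}(G_{\text{best}} - x_{i}^{k}) \) pull the particle toward them. The pseudo code below additionally scales \( v_{i}^{k} \) by a linearly decreasing inertia weight, \( 1.2 - k/\text{Iteration} \).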

A.1. PSO Pseudo Code

```c
#include <float.h>
#include <stdlib.h>
#include <string.h>

#define PARTICLE_NUMBERS 50 /* swarm size (Table 2) */
#define ITERATIONS 80       /* number of iterations (Table 2) */
/* 13-dimensional particle: L1 length, L1 impedance, L2 length, L2 impedance,
   L31 length, L31 impedance, L32 length, L32 impedance,
   L33 length, L33 impedance, R_c, R_0 and R_1. */
#define DIM 13

/* Fitness of one particle, Eq. (20); lower is better. */
typedef double (*fitness_fn)(const double x[DIM]);

/* Uniform random number in [0, 1] (called Round() in the original listing). */
static double rand01(void)
{
    return (double)rand() / (double)RAND_MAX;
}

/* Runs PSO, writes the best position found into gbest and returns its
   fitness. xl[k] and xh[k] are the lower and upper bounds of dimension k
   (the original listing used Xl = 20 and Xh = 120 for every dimension). */
double pso_optimize(fitness_fn fitness, const double xl[DIM],
                    const double xh[DIM], double gbest[DIM])
{
    const double c1 = 1.0, c2 = 1.4; /* weights of particle and global best */
    double swarm[PARTICLE_NUMBERS][DIM];
    double v[PARTICLE_NUMBERS][DIM];
    double pbest[PARTICLE_NUMBERS][DIM];
    double pfitness[PARTICLE_NUMBERS];
    /* DBL_MAX instead of 999: Eq. (20) can exceed 999 for poor designs. */
    double global_fitness = DBL_MAX;

    /* Initialize all particles randomly inside the bounds, with zero velocity. */
    for (int i = 0; i < PARTICLE_NUMBERS; i++) {
        for (int k = 0; k < DIM; k++) {
            swarm[i][k] = xl[k] + (xh[k] - xl[k]) * rand01();
            v[i][k] = 0.0;
        }
        pfitness[i] = fitness(swarm[i]);
        memcpy(pbest[i], swarm[i], sizeof swarm[i]);
        if (pfitness[i] < global_fitness) {
            global_fitness = pfitness[i];
            memcpy(gbest, swarm[i], sizeof swarm[i]);
        }
    }

    for (int j = 0; j < ITERATIONS; j++) {
        /* Linearly decreasing inertia weight, as in the original listing. */
        const double w = 1.2 - (double)j / ITERATIONS;
        for (int i = 0; i < PARTICLE_NUMBERS; i++) {
            const double r1 = rand01(), r2 = rand01();
            for (int k = 0; k < DIM; k++) {
                v[i][k] = w * v[i][k]
                        + c1 * r1 * (pbest[i][k] - swarm[i][k])
                        + c2 * r2 * (gbest[k] - swarm[i][k]);  /* Eq. (A1) */
                swarm[i][k] += v[i][k];                         /* Eq. (A2) */
                /* Damping: reflect a particle that left the boundary back
                   inside by a random fraction of its overshoot. */
                if (swarm[i][k] > xh[k])
                    swarm[i][k] = xh[k] - (swarm[i][k] - xh[k]) * rand01();
                if (swarm[i][k] < xl[k])
                    swarm[i][k] = xl[k] + (xl[k] - swarm[i][k]) * rand01();
                /* Clamp in case a large overshoot was reflected too far. */
                if (swarm[i][k] > xh[k]) swarm[i][k] = xh[k];
                if (swarm[i][k] < xl[k]) swarm[i][k] = xl[k];
            }
        }
        /* Evaluate the new positions and update the particle and global
           bests in every iteration (not only once after the loop). */
        for (int i = 0; i < PARTICLE_NUMBERS; i++) {
            const double f = fitness(swarm[i]);
            if (f < pfitness[i]) {
                pfitness[i] = f;
                memcpy(pbest[i], swarm[i], sizeof swarm[i]);
            }
            if (f < global_fitness) {
                global_fitness = f;
                memcpy(gbest, swarm[i], sizeof swarm[i]);
            }
        }
    }
    /* The compensating capacitor is then calculated from gbest with Eq. (25). */
    return global_fitness;
}
```

REFERENCES


20. JEDEC, *DDR3 SDRAM SPECIFICATION*.
