A Spatial RAKE Receiver for Real-Time UWB-IR Applications

Master thesis

Claus Limbodal

1st July 2005
Acknowledgements

First and foremost, I would like to thank my supervisor Tor Sverre Lande for accepting me as his student, for guidance, inspiration and encouragement.

I would also like to thank Dag T. Wisland for all assistance and guidance through my time working on this project.

Further, I wish to pay tribute to Johannes Goplen Lomsdalen and Håvard Kolle Riis for their patient assistance with computer software and support.

Next I would like to thank Kjetil Meisal for all the hours of useful discussions through the course of this project. In addition I would like to thank Olav Stanly Kyrvestad for his contribution to this thesis, especially with regard to chip and PCB preparations.

Furthermore, I would like to thank Hans Kristian Otnes Berge, Rene Jensen, Snorre Aunet, Vidar Strønstad Øverås, Lena Mariann Garder, Espen Torstensen, Karianne Øysted, Omid Mirmotahari, Håkon Hjortland, Håvard Moen and Jens Petter Koren for their contribution to this thesis in many different ways.

A special thanks to all my fellow students at the MES research laboratory for providing a great environment both scientifically and socially.

Finally, I would like to thank my parents for their support and help throughout my time as a student, and Siri, for her patience and understanding. Your support has been invaluable.
Preface

This thesis is submitted in partial fulfillment of the degree of Master of Science in Microelectronic Systems within the Master programme in Electronics and Computer Science at the University of Oslo. The thesis project was initiated in January 2004 and concluded in June 2005.

The project has been a challenge in many ways, as the thesis covers a wide range of issues regarding the design and testing of a prototype chip as well as the design of a printed circuit board (PCB) for measurement of the chip. However, I have learned a lot during the course of the project. I believe that I have benefited from the work and gained valuable experience in the process of implementing an ASIC.

The thesis presents a RAKE receiver for Ultra-Wideband applications. The prototype chip was manufactured in a standard 0.12\(\mu\)m CMOS process from ST Microelectronics and tested on an FR-4 PCB manufactured at Elprint.

Oslo, 1. July 2005

Claus Limbodal
Abstract

The concept of ultra wideband impulse radio has interesting properties. The wide transmission band gives better penetration through different materials than narrowband transmission. The absence of a carrier may be traded for low-power solutions, provided a power-efficient receiver can be implemented. Unlike narrowband radio, however, impulse radio requires demanding statistical computation in the receiver, which is often carried out in a parallel architecture. Although several portable applications are striving for higher bandwidth, there is an increasing demand for short-range, low-bandwidth mobile communication units, and in several of these applications ultra low power is important. In addition, other properties of impulse radio transmission, such as interference immunity and penetration, may be appreciated.

The purpose of this thesis is to explore a low-power solution for correlator-based impulse radio receivers. A mixed-mode parallel RAKE structure is realized in a standard 0.12\(\mu\)m CMOS technology. The receiver is implemented as a RAKE structure combining digital shift registers with analog computation in a series of parallel taps of a synchronizing delay line. In each parallel bit stream the incoming signal is cross-correlated with a stored template. By combining a delay line and a mixed-mode correlator we can explore multipath reflections in a time domain statistical computation for symbol recovery. Simulations are presented showing promising results with regard to power consumption and overall functionality. Measurements are performed confirming the basic functionality of the circuit.
Contents

1 Introduction .... 1
   1.1 Introduction and overview of the thesis .... 1

2 Characterization of a UWB-IR transmission channel .... 5
   2.1 Basic propagation mechanisms in short range communications .... 6
      2.1.1 Free space propagation .... 7
      2.1.2 Reflection and refraction .... 8
      2.1.3 Diffraction .... 10
      2.1.4 Scattering .... 11
      2.1.5 Log-distance path loss model .... 11
         2.1.5.1 Saleh-Valenzuela indoor models .... 12
      2.1.6 Fading .... 13
         2.1.6.1 Fading due to multipath time delay spread .... 14
         2.1.6.2 Fading due to Doppler spread .... 16
   2.2 Multipath environments .... 16
   2.3 Summary .... 17

3 Modulation techniques in baseband and carrier-based radio .... 19
   3.1 Modulation .... 20
      3.1.1 Baseband modulation techniques .... 22
         3.1.1.1 Pulse Position Modulation .... 22
         3.1.1.2 Pulse Amplitude Modulation .... 23
         3.1.1.3 On-Off Keying .... 24
         3.1.1.4 M-ary Bi-Orthogonal Keying .... 25
      3.1.2 Carrier-based modulation techniques .... 25
         3.1.2.1 Phase Shift Keying .... 26
         3.1.2.2 Frequency Shift Keying .... 27
   3.2 Spread spectrum and pseudo-noise coding .... 28
      3.2.1 PN-coding .... 29
      3.2.2 Direct Sequence .... 29
      3.2.3 Frequency Hopping .... 30
      3.2.4 Time Hopping .... 31
      3.2.5 Multiple access .... 31
   3.3 Synchronization .... 32
   3.4 Symbol detection .... 33
   3.5 Summary .... 34

4 Receiver structures in UWB-IR systems .... 37
   4.1 Multiple access interference and packet collision .... 37
   4.2 The correlator receiver .... 39
   4.3 The matched filter approach .... 40
   4.4 The RAKE receiver .... 41
   4.5 The orthogonal RAKE architecture .... 44
      4.5.1 Sampled delay line .... 46
      4.5.2 Synchronization .... 47
      4.5.3 Analog correlation .... 48
   4.6 Summary .... 49

5 CMOS implementation .... 51
   5.1 Delay line .... 52
   5.2 Mixed-mode correlator .... 53
   5.3 Comparator .... 58
   5.4 RAKE finger .... 60
      5.4.1 Combiner line and pre-charging .... 61
   5.5 System implementation .... 64
   5.6 Summary .... 66
Chapter 1

Introduction

1.1 Introduction and overview of the thesis

Wireless communication has, in one way or another, become a natural part of the lives of millions of people today. Over the past century, radio and television have made people familiar with this way of transferring information. In the past couple of decades the use of wireless communication has grown with the number of applications, as well-known technologies such as cell phones, GPS, WLAN, WiFi and Bluetooth have emerged in response to an increasing demand for high-speed mobile information handling units. While this development has mainly been driven toward higher speed, the focus today is shifting toward lower power consumption.

The term “radio” is usually associated with narrowband communication systems, where the information is encoded, or modulated, in some fashion by the transmitter. This modulation is usually achieved by imposing the information onto a carrier, whose frequency is given by the allocated channel in the frequency spectrum. However, emitting a carrier wastes a lot of power, as the energy of the carrier itself is not utilized. Baseband communication systems like Ultra Wideband (UWB) therefore have a distinct advantage, as no energy is wasted on continuous emission of a carrier.

UWB technology uses extremely large bandwidths compared to conventional radio systems, which implies that other radio communication systems within the range of a UWB transmitter could be jammed. Until recently, UWB was therefore used only for military communications, radar and sensing. A substantial change occurred in February 2002, when the Federal Communications Commission (FCC) issued a ruling that UWB could be used for data communications as well [1]. This was recently followed by a similar proposal from the European Telecommunications Standards Institute (ETSI), expected to be approved by the end of 2005 [2]. The FCC and proposed ETSI emission masks are depicted in figure 1.1. The band allocated for this use was 3.1 GHz to 10.6 GHz, which is by far the largest allocation of bandwidth to any commercial system. There are, however, strict limits on emission levels, giving a transmitter a maximum available power of approximately 0.5 mW [3, 4]. This confines UWB to indoor, short-range communications, potentially at very high data rates. The Institute of Electrical and Electronics Engineers (IEEE) followed by defining a standard for high data rate transmission, IEEE 802.15.3a. More recently, IEEE 802.15.4a was formed as a task group working on a specification for an ultra low-power, low data rate standard.
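A short calculation can check the figure of approximately 0.5 mW. The sketch below rests on an assumption that is not stated above: the transmitter radiates a flat spectrum at the FCC indoor limit of −41.3 dBm/MHz EIRP across the whole 3.1–10.6 GHz band. Real pulse spectra are not flat, so the result is an upper bound. The helper name is hypothetical.

```python
import math

def flat_psd_total_power_mw(psd_dbm_per_mhz, f_low_hz, f_high_hz):
    """Hypothetical helper: total power (mW) of a flat PSD over [f_low, f_high]."""
    bandwidth_mhz = (f_high_hz - f_low_hz) / 1e6
    total_dbm = psd_dbm_per_mhz + 10 * math.log10(bandwidth_mhz)  # dBm/MHz + dB(MHz)
    return 10 ** (total_dbm / 10)                                  # dBm -> mW

# Assumption: flat PSD at the FCC indoor limit of -41.3 dBm/MHz over 3.1-10.6 GHz
p_max = flat_psd_total_power_mw(-41.3, 3.1e9, 10.6e9)
print(f"maximum total power = {p_max:.2f} mW")
```

Because 10 log10(7500) ≈ 38.75 dB, the total is about −2.55 dBm, or roughly 0.56 mW. This agrees with the approximate value quoted above.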

UWB radio has been defined as radiation of waveforms with an instantaneous fractional bandwidth of at least 0.25. The fractional bandwidth is expressed as

$$ B_f = \frac{f_H - f_L}{\frac{f_H + f_L}{2}} , \quad (1.1) $$

where

- $f_H$ is the higher frequency of the total frequency band, in this case 10.6 GHz.

- $f_L$ is the lower frequency of the total frequency band, in this case 3.1 GHz.
- $\frac{f_H + f_L}{2}$ is the center frequency of the total frequency band.
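The following lines evaluate equation 1.1 for the 3.1–10.6 GHz allocation. The function name is a hypothetical helper.

```python
def fractional_bandwidth(f_high, f_low):
    """Hypothetical helper: fractional bandwidth B_f from equation 1.1."""
    f_center = (f_high + f_low) / 2  # center frequency of the band
    return (f_high - f_low) / f_center

bf = fractional_bandwidth(10.6e9, 3.1e9)
print(f"B_f = {bf:.3f}, exceeds 0.25: {bf > 0.25}")  # B_f = 1.095, exceeds 0.25: True
```

The full band is 7.5 GHz wide around a 6.85 GHz center, so its fractional bandwidth is about 1.1. That is well above the 0.25 threshold.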

This definition has given rise to two different approaches to UWB. One is the Multi-Band Orthogonal Frequency Division Multiplexing (MB-OFDM) approach, which divides the frequency band into a small number of wide sub-bands, each operating with its own carrier frequency. This approach, often referred to as MB-UWB, typically addresses the IEEE 802.15.3a high data rate standard.

The second approach is impulse radio, or UWB-IR, which typically addresses low-power, low data rate applications. UWB-IR relies on time-domain processing, with the information encoded in the time between pulses. The transmitted pulses are typically Gaussian-shaped, and they are shaped so that their spectrum lies within the 3.1 GHz - 10.6 GHz band. UWB-IR has the further advantage that both transmitter and receiver can be built from simple structures, which makes it a potentially low-complexity, low-cost technology.
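The sketch below shows how the pulse width places the spectrum inside the band. It assumes the pulse is the n-th time derivative of a Gaussian g(t) = exp(−t²/(2σ²)). This is a common choice in the UWB-IR literature, but the thesis does not specify it. The spectrum magnitude of such a pulse is proportional to fⁿ exp(−2π²σ²f²), which peaks at f_peak = √n / (2πσ). All function names are hypothetical.

```python
import math

# Hypothetical helpers; the Gaussian-derivative pulse model is an assumption.
def peak_frequency(sigma, n):
    """Peak of |FT| for the n-th derivative of exp(-t^2 / (2 sigma^2))."""
    return math.sqrt(n) / (2 * math.pi * sigma)

def sigma_for_peak(f_peak, n):
    """Gaussian width sigma that places the spectral peak at f_peak."""
    return math.sqrt(n) / (2 * math.pi * f_peak)

def spectrum_magnitude(f, sigma, n):
    """Unnormalized |FT| of the n-th Gaussian derivative at frequency f."""
    return f ** n * math.exp(-2 * (math.pi * sigma * f) ** 2)

# Place the peak of a second-derivative pulse at the band center, 6.85 GHz
sigma = sigma_for_peak(6.85e9, n=2)
print(f"sigma = {sigma * 1e12:.1f} ps")

# Cross-check the closed form with a 1 MHz grid search over 1-20 GHz
grid = [1e9 + k * 1e6 for k in range(19001)]
f_best = max(grid, key=lambda f: spectrum_magnitude(f, sigma, 2))
print(f"grid peak = {f_best / 1e9:.3f} GHz")
```

A narrower pulse (smaller σ) moves the spectral peak upward. Higher derivatives (larger n) also shift the peak up and suppress low frequencies, which is one way a pulse can be fitted to an emission mask.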

The severe restrictions on transmitted power have limited UWB to short-range applications. The low emission levels make reception of UWB-IR pulses difficult as the transmission of the pulses has to be carried out in a noise-like, or pseudo-noise, fashion. The shaping of the frequency spectrum is achieved by utilizing pseudo-noise (PN) sequences as a symbol representation of each information bit. Low data rate applications allow for the symbol sequence to be quite long, without impeding the system performance.
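Here is a minimal sketch of the PN idea. Each information bit is represented by a chip sequence, and the receiver recovers the bit by correlating against a stored template. The sketch makes several assumptions that do not come from the thesis: a length-7 maximal-length sequence from a 3-stage linear feedback shift register (feedback taps at stages 3 and 2), antipodal ±1 chips, and additive Gaussian noise. The function names are hypothetical.

```python
import random

# Hypothetical helpers; sequence length, taps and noise level are assumptions.
def m_sequence(taps, state):
    """Output bits of a Fibonacci LFSR; taps are 1-based stage indices."""
    reg = list(state)
    out = []
    for _ in range(2 ** len(reg) - 1):
        out.append(reg[-1])          # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]   # XOR of the tapped stages
        reg = [feedback] + reg[:-1]  # shift and insert the feedback bit
    return out

def to_chips(bits):
    """Map bits {0, 1} to antipodal chips {+1, -1}."""
    return [1 - 2 * b for b in bits]

def spread(bit, template):
    """Represent one information bit by the template or its negation."""
    sign = 1 if bit else -1
    return [sign * c for c in template]

def correlate(received, template):
    """Correlation of one received symbol with the stored template."""
    return sum(r * c for r, c in zip(received, template))

template = to_chips(m_sequence(taps=(3, 2), state=[1, 0, 0]))
rng = random.Random(1)
data = [1, 0, 1, 1, 0]
decisions = []
for bit in data:
    noisy = [x + rng.gauss(0.0, 0.5) for x in spread(bit, template)]
    decisions.append(1 if correlate(noisy, template) > 0 else 0)
print(data, decisions)
```

The correlation adds the seven chips coherently, while the noise adds only in power. This is why longer symbol sequences, which low data rate applications can afford, make reception at low emission levels easier.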

The motivation for this thesis is to implement a symbol correlator in CMOS technology for a low-power, low data rate UWB-IR receiver. The main challenge in implementing a symbol correlator for UWB-IR is the high frequencies involved, which make conventional DSP approaches quite power-consuming. A different approach must therefore be considered. The solution proposed in this thesis is a spatial RAKE receiver for UWB-IR applications [5,6].

The basic theory described in chapters 2 and 3 is mostly based on the book “Wireless Communications: Principles and Practice” by Theodore S. Rappaport. The theory presented has been selected for its relevance to UWB-IR.

The outline of the thesis:

- Chapter 2 characterizes the main effects influencing electromagnetic waves in an indoor environment.

- Chapter 3 describes some of the most popular modulation schemes used in wireless communications today, for both carrier-based and baseband systems.

- Chapter 4 introduces some known receiver structures, including the presented RAKE receiver.

- Chapter 5 describes the implementation of the main building blocks of the RAKE receiver.

- Chapter 6 contains the simulation results and the measurement results for the main components of the RAKE receiver.

- Chapter 7 discusses the project with respect to implementation, simulation results and measurement results.

- Chapter 8 presents the conclusion of the thesis, including proposals for future work.
Chapter 2

Characterization of a UWB-IR transmission channel

Transferring information through the air can be very difficult, as the transmission path may vary from simple line-of-sight (LOS) to one severely obstructed by various kinds of obstacles. Wireless transmission channels are in general random and unpredictable, and they do not lend themselves to easy analysis. Modeling a transmission channel to achieve the best signal recovery at the receiver end is therefore difficult, and it is usually done statistically on the basis of measurements. The term transmission channel refers to the environment around and between the transmitter and receiver antennas of a communication system, through which the electromagnetic waves travel.

Propagation of electromagnetic waves is influenced by the transmission channel. The effects of the channel on the propagating waves are generally divided into reflection, refraction, diffraction and scattering. Because of the influence of multiple objects, the waves travel along different paths of varying lengths and are referred to as multipath waves. Interference between these multipath waves may cause fading, called multipath fading. The strength of the waves also decreases as the distance between transmitter and receiver increases.

UWB-IR is a time-domain communication system that exploits the high time resolution of the UWB channel. These effects are therefore fundamental to the functionality of the technology and serve as a basis for the design of a UWB-IR system.

In order to predict the average received signal power at a given distance from the transmitter, different propagation models are used. Propagation models that predict the mean signal strength for an arbitrary separ-
2.1 Basic propagation mechanisms in short range communications

Propagation of electromagnetic waves is influenced by the surroundings of the transmitter and receiver in different ways. As mentioned earlier, for indoor short range transmissions like UWB-IR these effects can be divided into four groups: reflection, refraction, diffraction and scattering. The large- and small-scale propagation models are based on the physics of these mechanisms. Several factors must be considered when predicting these effects, the most important being the wavelength, the angle of incidence, the material properties of the obstacle and the wave polarization. Polarization describes the instantaneous electric field components of an electromagnetic wave in orthogonal directions in space. A polarized wave can be represented as the sum of two spatially orthogonal components. Typical examples are vertical, horizontal, and left-hand or right-hand circular polarization.

This section describes the basic propagation mechanisms that determine the conditions in a typical short range transmission channel.

2.1.1 Free space propagation

For LOS signal paths, propagation can be considered as geometrical expansion in three dimensions. For signals emitted from an isotropic antenna, i.e. a theoretical point source, the expansion is spherical, and the total energy remains constant over the surface area $4\pi d^2$ of a sphere with radius $d$. The power density therefore decreases as $\frac{1}{4\pi d^2}$, where $d$ corresponds to the distance between the transmitter and the receiver. Note that this spreading loss is not frequency dependent.

The free space propagation model used to predict received signal strength of such LOS signals is based on the Friis free space equation

$$ P_r(d) = \frac{P_t G_t G_r \lambda^2}{(4\pi d)^2 L} , \quad (2.1) $$

where

- $P_r(d)$ is the received power at a given distance $d$.
- $P_t$ is the transmitted power.
- $\lambda$ is the wavelength in meters.
- $d$ is the distance between the transmitter and the receiver in meters.
- $L$ is a system loss factor ($L \geq 1$) due to transmission line attenuation, filter losses and antenna losses.
- $G_t$ and $G_r$ are the gain of the transmitter and receiver antennas, respectively.
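The Friis equation 2.1 can be evaluated directly, as in the sketch below. The link parameters (0.5 mW, unity-gain antennas, no system losses, 1 m separation) are hypothetical, and so is the function name.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def friis_received_power(p_t, g_t, g_r, freq_hz, d, loss=1.0):
    """Hypothetical helper: received power from equation 2.1, in the unit of p_t."""
    lam = C / freq_hz
    return p_t * g_t * g_r * lam ** 2 / ((4 * math.pi * d) ** 2 * loss)

# Hypothetical link: 0.5 mW, unity-gain antennas, L = 1, 1 m apart
for f in (3.1e9, 6.85e9, 10.6e9):
    p_r = friis_received_power(0.5e-3, 1.0, 1.0, f, 1.0)
    print(f"{f / 1e9:5.2f} GHz: P_r = {10 * math.log10(p_r / 1e-3):6.1f} dBm")
```

With fixed antenna gains, the power received at 10.6 GHz is 20 log10(10.6/3.1) ≈ 10.7 dB lower than at 3.1 GHz. This difference comes entirely from the λ² term, which the following paragraph attributes to the antenna aperture. The 1/d² spreading term is the same at every frequency.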

Antenna gain is obtained by designing an antenna to radiate or receive more power in one or more preferred directions rather than omnidirectionally. The ratio of the radiated power in the direction of maximum gain to the radiated power of an isotropic source is called the gain of the antenna. It is given in units of dBi (gain with respect to an isotropic antenna). The frequency dependence of equation 2.1 enters through the effective aperture of an isotropic antenna, given by the term

$$ A_e = \frac{\lambda^2}{4\pi} , \quad (2.2) $$

which can be recognized in equation 2.1. Assuming no system losses ($L = 1$), the path loss in free space, defined as the ratio of transmitted to received power, is

$$ PL(dB) = 10 \log \frac{P_t}{P_r} = -10 \log \left( \frac{G_t G_r \lambda^2}{(4\pi d)^2} \right) . \quad (2.3) $$

Observe that the argument of the logarithm in equation 2.3 is simply the product of the energy density $\frac{1}{4\pi d^2}$, the antenna aperture $\frac{\lambda^2}{4\pi}$ and the gains of the transmitter and receiver antennas. The emitted power cancels in the ratio. Using unity gain antennas, a simpler form is obtained:

$$ PL(dB) = -20 \log \left( \frac{\lambda}{4\pi d} \right) . \quad (2.4) $$

The free space model is only valid in the far field of the transmitter antenna, called the Fraunhofer region, which is defined as the region beyond the far-field distance \( d_f \) given by

$$ d_f = \frac{2D^2}{\lambda} , \quad (2.5) $$

where \( D \) is the largest physical linear dimension of the transmitter antenna [7,8].
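Because λ changes by more than a factor of three across the UWB band, the far-field distance of equation 2.5 also varies with frequency. The sketch below assumes a hypothetical antenna with a largest dimension of 3 cm. The helper name is also hypothetical.

```python
C = 299_792_458.0  # speed of light (m/s)

def fraunhofer_distance(D, freq_hz):
    """Hypothetical helper: far-field distance d_f = 2 D^2 / lambda (equation 2.5)."""
    lam = C / freq_hz
    return 2 * D ** 2 / lam

D = 0.03  # hypothetical largest antenna dimension: 3 cm
for f in (3.1e9, 10.6e9):
    print(f"{f / 1e9:4.1f} GHz: d_f = {fraunhofer_distance(D, f) * 100:.1f} cm")
```

Since $d_f$ grows with frequency, the highest frequency in the pulse spectrum sets the distance beyond which the free space model applies. The far-field distance is usually also required to be much larger than both $D$ and $\lambda$.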

2.1.2 Reflection and refraction

Reflection and refraction are two closely related propagation effects. Suppose a radio wave propagating in one medium impinges upon another medium with different electrical properties and a surface much larger than the wavelength. Part of the wave is then reflected, and part is refracted into the second medium, see figure 2.1. Ideally, no energy is lost to absorption at the point of incidence. In theory, a ray could be reflected completely without any refraction if the second medium were an ideal conductor. In practice this never happens, as no such material exists.

As shown in figure 2.1, a radio wave incident at an angle \( \theta_i \) gives rise to two rays leaving the surface. One part is reflected from the surface at an angle \( \theta_r = \theta_i \), and one part is refracted into the second medium at an angle \( \theta_2 \). The phase difference between a LOS direct path signal and the reflected signal is given by

$$ \phi = \frac{2\pi \Delta l}{\lambda} , \quad (2.6) $$

Figure 2.1: Reflection and refraction of an incident radio wave

where \( \Delta l \) is the difference in path length between the direct path and the reflected path, and \( \lambda \) is the wavelength. The time delay between a direct path signal and a reflected path signal is the extra distance divided by the propagation speed:

$$ \tau = \frac{l_{mc}}{c} , \quad (2.7) $$

where

- \( \tau \) is the time delay.
- \( c \) is the speed of light.
- \( l_{mc} \) is the excess path length of each multipath component relative to the direct path.
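Equations 2.6 and 2.7 can be evaluated for a single reflection, as in the sketch below. It assumes a hypothetical reflected path that is 0.3 m longer than the direct path, and the helper names are hypothetical. The delay is the excess path length divided by c.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def excess_delay(delta_l):
    """Hypothetical helper: delay (s) of a path delta_l metres longer than the direct path."""
    return delta_l / C

def phase_difference(delta_l, freq_hz):
    """Hypothetical helper: phase difference from equation 2.6, wrapped to [0, 2*pi)."""
    lam = C / freq_hz
    return (2 * math.pi * delta_l / lam) % (2 * math.pi)

delta_l = 0.3  # hypothetical excess path length of a wall reflection (m)
print(f"delay = {excess_delay(delta_l) * 1e9:.3f} ns")
for f in (3.1e9, 6.85e9, 10.6e9):
    print(f"{f / 1e9:5.2f} GHz: phase = {phase_difference(delta_l, f):.2f} rad")
```

The delay of about 1 ns does not depend on frequency, whereas the phase difference changes across the band. If the transmitted pulse is shorter than this delay, the reflection reaches the receiver as a separate, resolvable copy of the pulse.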

The angle of the refracted part obeys Snell’s law, which states

$$ n_1 \sin \theta_1 = n_2 \sin \theta_2 , \quad (2.8) $$

where, still referring to figure 2.1, \( \theta_1 = \theta_i \) is the angle of incidence, \( n_1 \) is the index of refraction of the medium of the incident wave and \( n_2 \) is the index of refraction of the second medium. The index of refraction for a given medium is

$$ n = \sqrt{\mu_r \varepsilon_r} , \quad (2.9) $$

where \( \mu_r \) is the relative permeability of the medium and \( \varepsilon_r \) is the relative permittivity of the medium [7,9].
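The sketch below applies equations 2.8 and 2.9 to a hypothetical wall. It assumes the wall is non-magnetic ($\mu_r = 1$) and lossless with $\varepsilon_r = 4$. Real building materials have complex, lossy permittivities, so this is a simplification, and the function names are hypothetical.

```python
import math

def refractive_index(mu_r, eps_r):
    """Hypothetical helper: index of refraction, equation 2.9 (lossless medium)."""
    return math.sqrt(mu_r * eps_r)

def refraction_angle(theta_1, n_1, n_2):
    """Hypothetical helper: solve Snell's law (2.8) for theta_2, in radians.

    Returns None when no real solution exists (total internal reflection).
    """
    s = n_1 * math.sin(theta_1) / n_2
    if abs(s) > 1:
        return None
    return math.asin(s)

n_air = refractive_index(1.0, 1.0)
n_wall = refractive_index(1.0, 4.0)  # hypothetical wall: mu_r = 1, eps_r = 4
theta_2 = refraction_angle(math.radians(30), n_air, n_wall)
print(f"theta_2 = {math.degrees(theta_2):.1f} deg")  # theta_2 = 14.5 deg
```

The wave is bent toward the normal when it enters the denser medium. Going the other way, from the wall into air at 60°, has no real solution, so the helper returns None.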

2.1.3 Diffraction

Diffraction is a phenomenon that allows radio waves to propagate behind obstructions. The effect is enhanced if the obstruction has sharp edges, that is, if its dimensions are small compared to the wavelength. Diffraction can be explained by Huygens' principle. It states that all points on a wavefront can be considered as point sources for the generation of secondary waves, and that these waves combine to produce a new wavefront expanding spherically from the point of generation, as shown in figure 2.2 [7].

Diffraction loss as a function of the difference in path length around an obstruction can be explained by Fresnel zones. Fresnel zones are successive regions in which secondary waves have a path length from the transmitter to the receiver that exceeds the length of the LOS path by $\frac{n\lambda}{2}$ [7].
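A standard approximation gives the radius of the n-th Fresnel zone at a point $d_1$ from the transmitter and $d_2$ from the receiver as $r_n = \sqrt{n\lambda d_1 d_2/(d_1 + d_2)}$, valid when $d_1, d_2 \gg r_n$. The sketch below evaluates this radius for a hypothetical 5 m link with the obstruction midway, using hypothetical helper names. It also checks numerically that the path through the edge of the first zone is about $\lambda/2$ longer than the LOS path.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def fresnel_radius(n, freq_hz, d1, d2):
    """Hypothetical helper: approximate radius of the n-th Fresnel zone."""
    lam = C / freq_hz
    return math.sqrt(n * lam * d1 * d2 / (d1 + d2))

def excess_path(r, d1, d2):
    """Hypothetical helper: exact excess path via a point offset r from the LOS line."""
    return math.hypot(d1, r) + math.hypot(d2, r) - (d1 + d2)

d1 = d2 = 2.5  # hypothetical obstruction midway on a 5 m link
for f in (3.1e9, 10.6e9):
    r1 = fresnel_radius(1, f, d1, d2)
    print(f"{f / 1e9:4.1f} GHz: r_1 = {r1 * 100:.1f} cm")
```

The first Fresnel zone shrinks as frequency increases. An obstruction can therefore block a large part of the first zone at the upper end of the UWB band and a smaller part at the lower end.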

Diffraction yields a phase difference in the received signal compared to a possible LOS signal. It is caused by the difference in path length between the direct path and the diffracted path, called the excess path length. This phase difference can be calculated from equation 2.6 with $\Delta l$ as the excess path length. Similarly, the time delay is given by equation 2.7 with the excess path length as $l_{mc}$ [7, 9].

2.1.4 Scattering

Scattering occurs when the medium in which the wave travels contains objects with dimensions that are small compared to the wavelength, or surfaces with structures of such small dimensions. Scattered waves are produced by rough surfaces, small objects and other irregularities in the channel. Scattering is in reality a special case of reflection. However, because it involves small objects, it often induces effects different from the ones described in section 2.1.2.

Surface roughness can be tested using the Rayleigh criterion which defines a surface as smooth on the basis of a critical height $h_c$ of surface protuberances for a given angle of incidence $\theta_i$, expressed as

$$ h_c = \frac{\lambda}{8 \sin \theta_i} . \quad (2.10) $$

A surface is considered rough if the height of its protuberances exceeds the critical height at that angle [7].
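Because $h_c$ in equation 2.10 scales with λ, the same surface can be smooth at one end of the UWB band and rough at the other. The sketch below assumes a hypothetical angle of incidence of 30° and a hypothetical 1 cm surface texture, and the helper names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def critical_height(freq_hz, theta_i):
    """Hypothetical helper: Rayleigh criterion, equation 2.10 (theta_i in radians)."""
    lam = C / freq_hz
    return lam / (8 * math.sin(theta_i))

def is_rough(h, freq_hz, theta_i):
    """Hypothetical helper: True if protuberance height h exceeds h_c."""
    return h > critical_height(freq_hz, theta_i)

theta = math.radians(30)  # hypothetical angle of incidence
for f in (3.1e9, 10.6e9):
    print(f"{f / 1e9:4.1f} GHz: h_c = {critical_height(f, theta) * 1e3:.1f} mm")
print(is_rough(0.01, 3.1e9, theta), is_rough(0.01, 10.6e9, theta))  # False True
```

At this angle, h_c is about 24 mm at 3.1 GHz but only about 7 mm at 10.6 GHz. A 1 cm texture therefore reflects the lower frequencies specularly and scatters the upper ones.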

2.1.5 Log-distance path loss model

Over the years, quite a few radio wave propagation models have been derived from theory or measurements. These models show that the average received signal power decreases logarithmically with distance [7, 8, 3]. The average large-scale path loss is given as

$$ PL(d)[dB] = PL(d_0) + 10 m \log \left( \frac{d}{d_0} \right) , \quad (2.11) $$

where

- $m$ is a path loss exponent which indicates the rate at which the path loss increases with distance.
- $d_0$ is a close-in reference distance near the transmitter, at which measurements are made.
- $d$ is the distance between transmitter and receiver.

However, this model does not take into account that the surrounding environmental clutter may be vastly different at different locations, and may even change over time at one particular location. Measurements show that the path loss at a particular location is random and log-normally distributed about the theoretical average path loss [7]. This yields

$$ PL(d)[dB] = PL(d_0) + 10 m \log \left( \frac{d}{d_0} \right) + X_\sigma , \quad (2.12) $$

where $X_\sigma$ is a zero-mean Gaussian distributed random variable (in dB) with standard deviation $\sigma$ (also in dB). This model describes log-normal shadowing. It means that signal levels measured at a given distance from a transmitter have a Gaussian distribution (in dB) about the distance-dependent mean value.
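The sketch below draws samples from equation 2.12 to show what log-normal shadowing looks like in practice. The parameters $PL(d_0) = 47$ dB at $d_0 = 1$ m, $m = 3.5$ and $\sigma = 4$ dB are hypothetical and chosen only for illustration. The function names are also hypothetical.

```python
import math
import random
import statistics

# Hypothetical helpers; all numeric parameters are illustrative assumptions.
def mean_path_loss_db(d, pl_d0, m, d0=1.0):
    """Average large-scale path loss, equation 2.11."""
    return pl_d0 + 10 * m * math.log10(d / d0)

def shadowed_path_loss_db(d, pl_d0, m, sigma_db, rng, d0=1.0):
    """Equation 2.12: mean path loss plus a zero-mean Gaussian X_sigma (dB)."""
    return mean_path_loss_db(d, pl_d0, m, d0) + rng.gauss(0.0, sigma_db)

rng = random.Random(42)
samples = [shadowed_path_loss_db(10.0, 47.0, 3.5, 4.0, rng) for _ in range(10_000)]
print(f"mean = {statistics.mean(samples):.1f} dB, "
      f"std = {statistics.stdev(samples):.1f} dB")
```

At 10 m the deterministic part is 47 + 35 = 82 dB. The samples scatter around this value with a standard deviation close to the assumed 4 dB, which is exactly the log-normal spread that equation 2.12 adds to equation 2.11.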

2.1.5.1 Saleh-Valenzuela indoor models

The Saleh-Valenzuela indoor models are a series of indoor propagation models used as a reference UWB channel model in the IEEE 802.15.3 standard.

In 1987, Saleh and Valenzuela carried out extensive indoor propagation measurements between two vertically polarized omnidirectional antennas located on the same floor of an office building, using 10 ns, 1.5 GHz radar-like pulses. The measurements involved averaging the detected pulse response while sweeping the frequency of the transmitted pulse. This method made it possible to resolve multipath components separated by as little as 5 ns [7].

The results of these measurements show that the indoor channel varies very slowly in time. They also show that the channel impulse response is statistically independent of antenna polarization, as long as there is no LOS path between the antennas. The maximum multipath delay spread, which is explained in section 2.1.6, was found to be 100 - 200 ns within the rooms of a building, whereas in hallways it was reported to be up to 300 ns. The rms delay spread within rooms had a maximum of 50 ns and a median of 25 ns. With no LOS path, the large-scale path loss was reported to vary over a 60 dB range and to obey the log-distance path loss model with a path loss exponent between three and four [7].

From these measurements, Saleh and Valenzuela developed a simple indoor multipath model, called the Saleh-Valenzuela (SV) model. This model differs distinctly from previously developed models in that it assumes the multipath components, or rays, to arrive in clusters. Each cluster, and each component within a cluster, has independent fading and time delay. The multipath gain magnitudes are independent Rayleigh random variables with variances that decay with time delay. The arrival times of the clusters and of the rays within each cluster are modeled independently using Poisson processes with different rates, so the interarrival times of the clusters and of the multipath components within a cluster are exponentially distributed. The model contains no imaginary part, as the phase of the channel impulse response can be either 0 or \( \pi \). The clusters result from how the surroundings affect each multipath component, since they are formed by reflections from objects in the vicinity of the communication units [7, 4, 3].
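The clustering described above can be illustrated by generating one random channel realization. The sketch below follows the structure in the text:

- Cluster and ray arrivals come from Poisson processes, so the interarrival times are exponential.
- The mean power decays exponentially with both cluster delay and ray delay.
- The magnitudes are Rayleigh distributed.
- The phase is 0 or π.

The arrival rates, decay constants and time span are hypothetical round numbers, not parameters from [7] or from the IEEE model. The function name is also hypothetical.

```python
import math
import random

def sv_channel(rng, cluster_rate, ray_rate, cluster_decay, ray_decay, t_max):
    """Hypothetical helper: one SV-style realization as sorted (delay, gain) pairs.

    Delays and decay constants are in ns, arrival rates in 1/ns. The first
    cluster, and the first ray of every cluster, arrive at zero relative delay.
    """
    paths = []
    t_cluster = 0.0
    while t_cluster < t_max:
        tau = 0.0
        while t_cluster + tau < t_max:
            # Mean power decays exponentially with cluster delay and ray delay
            mean_power = math.exp(-t_cluster / cluster_decay) * math.exp(-tau / ray_decay)
            # Rayleigh magnitude with E[a^2] = mean_power
            x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            magnitude = math.sqrt(mean_power * (x * x + y * y) / 2)
            sign = rng.choice((1.0, -1.0))  # phase of 0 or pi
            paths.append((t_cluster + tau, sign * magnitude))
            tau += rng.expovariate(ray_rate)        # Poisson ray arrivals
        t_cluster += rng.expovariate(cluster_rate)  # Poisson cluster arrivals
    return sorted(paths)

# Hypothetical parameters: clusters every ~30 ns, rays every ~0.5 ns
rng = random.Random(7)
h = sv_channel(rng, cluster_rate=1 / 30, ray_rate=2.0,
               cluster_decay=20.0, ray_decay=5.0, t_max=100.0)
early = sum(g * g for t, g in h if t < 20.0)
late = sum(g * g for t, g in h if t >= 80.0)
print(f"{len(h)} paths, energy before 20 ns = {early:.2f}, after 80 ns = {late:.4f}")
```

Each call gives a different realization. Averaging many of them would approximate the double-exponential power decay that characterizes the SV model.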

During 2002 and 2003, the IEEE 802.15.3 Working Group for Wireless Personal Area Networks and its channel modeling subcommittee decided to use a so-called modified SV model as a reference UWB channel model. The model proposed by IEEE 802.15.3 is derived from the SV model because of the clustering observed in the measured indoor channel data. It uses a log-normal distribution rather than the original Rayleigh distribution for the amplitude of the received components [3].

The modified SV model contains four different channel scenarios. They are distinguished by the average separation distance between transmitter and receiver and by whether a LOS component is present.

2.1.6 Fading

Any communication system must deal with the fact that a transmitted signal undergoes fading in different forms, and short range communication systems are no exception. This section describes the different types of fading, which depend on the relation between signal and channel parameters. Two independent types of fading effects may occur: those due to delay spread and those due to Doppler spread.

Delay spread describes the time-dispersive properties of a communication channel. It is determined from a power delay profile (PDP), which is found by measuring the spatial average of the channel impulse response over a local area. Making several such local area measurements builds up a collection of PDPs, each representing a possible state of the multipath channel. Fading due to delay spread is divided into two categories: flat fading and frequency selective fading.

Doppler spread is a measure of the spectral broadening caused by Doppler shift in the transmission channel. A Doppler shift occurs when one wireless communication unit, i.e. the transmitter or the receiver, moves with respect to the other. Consider a communication unit that travels from A to B with relative velocity \( v \) in a time \( \Delta t \), and let \( \theta \) be the angle of incidence, assumed to be the same at A and B. The difference in path length from the source is then \( \Delta l = v \Delta t \cos \theta \), and the phase change due to this movement is

$$ \Delta \phi = \frac{2\pi \Delta l}{\lambda} = \frac{2\pi v \Delta t}{\lambda} \cos \theta , \quad (2.13) $$

and hence to the receiver there seems to be a change in frequency, called a Doppler shift \( f_D \) given by

$$ f_D = \frac{\Delta \phi}{2\pi \Delta t} = \frac{v}{\lambda} \cos \theta . \quad (2.14) $$

The Doppler spread then causes the received spectrum to contain frequency components within a Doppler spectrum in the range \( \pm f_D \) around the original transmitted frequency spectrum. The effect of this spectral broadening is negligible if the baseband signal bandwidth is much larger than the Doppler spread [7].
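Equation 2.14 shows how small the Doppler shift is for indoor movement. The sketch below assumes a hypothetical walking speed of 1.5 m/s directly toward the source (θ = 0). It evaluates the shift at 10.6 GHz, the worst case in the band, using a hypothetical helper name.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def doppler_shift(v, freq_hz, theta):
    """Hypothetical helper: Doppler shift f_D = (v / lambda) cos(theta), equation 2.14."""
    lam = C / freq_hz
    return v / lam * math.cos(theta)

# Hypothetical walking speed of 1.5 m/s straight toward the source, at 10.6 GHz
f_d = doppler_shift(1.5, 10.6e9, 0.0)
print(f"f_D = {f_d:.1f} Hz")
print(f"f_D relative to a 7.5 GHz bandwidth: {f_d / 7.5e9:.1e}")
```

A shift of about 53 Hz is many orders of magnitude smaller than the occupied UWB bandwidth. At the pulse level, the spectral broadening is therefore negligible.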

2.1.6.1 Fading due to multipath time delay spread

Flat fading can be defined as an effect of delay spread in which the bandwidth of the channel is larger than the bandwidth of the transmitted signal. Consequently, the rms delay spread is much shorter than the symbol period.

This means that the received signal undergoes flat fading if the channel has a constant gain and a linear phase response over a bandwidth larger than the bandwidth of the transmitted signal. In flat fading, the multipath structure of the channel preserves the spectral characteristics of the transmitted signal at the receiver end. However, multipath propagation makes the gain of the channel fluctuate, which changes the strength of the received signal over time. The result is amplitude variations in the received signal. The characteristics of flat fading are shown in figure 2.3.

Frequency selective fading occurs when the channel has a constant gain and a linear phase over a bandwidth that is smaller than the bandwidth of the transmitted signal. Under these conditions, the rms delay spread of the channel impulse response approaches or exceeds the transmitted symbol period [7].
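The rms delay spread that separates flat from frequency selective fading can be computed from a discrete power delay profile. It is the square root of the power-weighted second central moment of the excess delay. In the sketch below, the example PDP and the threshold of 0.1 × symbol period (standing in for "approaches") are assumptions for illustration, not values from this thesis. The function names are hypothetical.

```python
import math

def delay_spread(delays, powers):
    """Hypothetical helper: mean excess delay and rms delay spread of a discrete PDP.

    Powers must be in linear units (not dB).
    """
    total = sum(powers)
    mean = sum(p * t for p, t in zip(powers, delays)) / total
    second = sum(p * t * t for p, t in zip(powers, delays)) / total
    return mean, math.sqrt(second - mean ** 2)

def classify(rms_spread, symbol_period, ratio=0.1):
    """Hypothetical rule of thumb: frequency selective if the rms spread
    exceeds `ratio` times the symbol period (0.1 is an assumed threshold)."""
    return "frequency selective" if rms_spread > ratio * symbol_period else "flat"

# Hypothetical PDP: delays in ns, linear relative powers
mean, rms = delay_spread([0.0, 10.0, 20.0, 40.0], [1.0, 0.5, 0.25, 0.1])
print(f"mean excess delay = {mean:.2f} ns, rms delay spread = {rms:.2f} ns")
print(classify(rms, symbol_period=1000.0))  # 1 us symbols
print(classify(rms, symbol_period=1.0))     # 1 ns pulses
```

The same channel appears flat to microsecond symbols and strongly frequency selective to nanosecond pulses. This illustrates why a UWB channel looks extremely frequency selective at the pulse level.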

Figure 2.4 shows the effect frequency selective fading has on a transmitted signal, where some of the frequency components in the received spectrum have greater gain than others.

![Diagram of channel impulse response](image)

Figure 2.3: Flat fading in a communication channel. The received signal is shown as a result of the channel impulse response influence on the transmitted signal.

![Diagram of channel impulse response](image)

Figure 2.4: Frequency selective fading in a communication channel. The received signal is shown as a result of the channel impulse response influence on the transmitted signal.

Frequency selective fading occurs due to time dispersion of the transmitted symbol within the channel and thus induces inter-symbol interference (ISI). ISI can, however, be eliminated. Nyquist observed that the effect of ISI could be canceled if the overall response of the communication system is designed so that the response due to all symbols except the current one is equal to zero at every sampling instant at the receiver. From this, Nyquist derived his criterion for ISI cancellation, which implies that the channel can be modeled as a filter. From this filter response a transfer function can be derived in order to design shaping filters at both the transmitter and the receiver end that completely eliminate the ISI [7].
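The zero-ISI condition can be checked numerically. The sketch below assumes an ideal sinc pulse, \( p(t) = \mathrm{sinc}(t/T) \), as the overall system response. This is one of the pulses satisfying Nyquist's criterion, chosen here for simplicity rather than taken from [7]. The sketch superposes one pulse per symbol and samples at the symbol instants.

```python
import math

def sinc_pulse(t, T):
    """Ideal Nyquist pulse: 1 at t = 0 and zero at every other multiple of T."""
    if t == 0:
        return 1.0
    x = math.pi * t / T
    return math.sin(x) / x

def received_samples(symbols, T):
    """Superpose one pulse per symbol and sample at t = kT."""
    return [sum(a * sinc_pulse((k - n) * T, T) for n, a in enumerate(symbols))
            for k in range(len(symbols))]

symbols = [1, -1, -1, 1, 1]
samples = received_samples(symbols, T=1.0)
# Apart from floating-point rounding, each sample equals its own symbol,
# because all other pulses cross zero at that sampling instant.
print([round(s, 9) for s in samples])
```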

Frequency selective fading is of special interest in UWB communication because of the wide frequency spectrum of the transmitted signal. The large bandwidth causes the channel to appear extremely frequency selective [10].
2.1.6.2 Fading due to Doppler spread

Fading due to Doppler spread is divided into fast and slow fading, depending on how rapidly the transmitted baseband signal changes compared with the rate of change of the channel.

Fast fading is signal distortion caused by frequency dispersion in the channel due to Doppler spreading. This means that the channel impulse response changes rapidly within the symbol duration. In the frequency domain the distortion due to fast fading increases as the Doppler spread increases relative to the bandwidth of the transmitted signal. In practice fast fading only occurs for very low data rate transmissions [7].

In slow fading the channel impulse response changes at a rate much slower than the transmitted baseband signal, and the channel can even be assumed to be static over one or several symbol periods. In the frequency domain, this implies that the Doppler spread is much less than the baseband signal bandwidth [7].

It should be noted that the relative velocity between the transmitter and the receiver together with the baseband signaling determines whether the signal undergoes fast or slow fading.

It should also be noted that if a channel is characterized as a fast or slow fading channel, it does not specify whether the channel is flat fading or frequency selective.
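The dependence on both velocity and signaling rate can be sketched with a coherence time estimate. The sketch uses the rule of thumb \( T_c \approx 0.423/f_m \), where \( f_m \) is the maximum Doppler shift. This approximation is an assumption, and so is the margin of 0.1 used to interpret "much shorter than" the coherence time.

```python
def coherence_time(f_m):
    """Assumed rule-of-thumb coherence time T_c ~ 0.423 / f_m (seconds)."""
    return 0.423 / f_m

def doppler_fading_type(symbol_period, f_m, margin=0.1):
    """'slow' if T_s is well below T_c (margin is an assumed threshold), else 'fast'."""
    if symbol_period < margin * coherence_time(f_m):
        return "slow"
    return "fast"

# A maximum Doppler shift of 20 Hz (pedestrian speed at 4 GHz, assumed values)
print(coherence_time(20.0))                  # about 21 ms
print(doppler_fading_type(1e-6, 20.0))       # 1 Mbit/s-class symbols
print(doppler_fading_type(0.1, 20.0))        # very low data rate
```

Only the very slow signaling is classified as fast fading, which agrees with the remark that fast fading in practice occurs only for very low data rates.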

2.2 Multipath environments

A summary of all the effects mentioned in this chapter yields a general description of the diverse mechanisms behind electromagnetic wave propagation in short range communication. The combination of these effects results in a channel that places fundamental limitations on the performance of wireless communication systems. Interaction between multipath components reflected from various objects, where different electromagnetic waves travel along different paths with different gains, creates a multipath environment, see figure 2.5.

Figure 2.5: A multipath environment as a result of reflections from multiple objects. The black path is a LOS path.

For UWB-IR communication systems the propagation mechanisms will cause some change in the received pulse shape compared with the transmitted pulse shape. Because of the large bandwidth of the UWB-IR pulse, these effects behave differently for the lower and the upper frequencies of the signal spectrum. The size of the Fresnel zone differs between the lower and the upper frequencies, and the reflection coefficients and the surface roughness may also be significantly different because of the difference in wavelength. The large bandwidths associated with UWB systems cause a significant increase in the number of resolvable multipath components, resulting in an extremely multipath rich communication channel. As far as frequency diversity is concerned, a UWB pulse has such a wide spectral content that the signal is not very sensitive to notches in the frequency spectrum, a property that can be exploited.

2.3 Summary

In this chapter the basic theory behind electromagnetic wave propagation for short range communication has been presented. The mechanisms behind wave propagation in surroundings characterized by severely obstructed propagation paths are diverse and form the basis for a multipath environment that impedes reception of RF signals. Under such conditions reception can be difficult, which makes the design of a communication system challenging. This applies particularly to indoor transmissions and creates the need for modulation and coding schemes that make information recovery possible without substantial losses.
Chapter 3

Modulation techniques in baseband and carrier-based radio

The nature of the UWB-IR technology, with its emission of pulses over a frequency spectrum up to 7.5 GHz wide, necessitates strict regulations on the emission of such pulses. Periodic transmission of UWB pulses with sufficient power could jam other communication systems within its coverage area. The regulations concerning emitted power in UWB therefore do not allow the power spectral density of a transmission to exceed -41.3 dBm/MHz, as shown in figure 1.1 [1, 2]. Modulation and coding are thus important issues in UWB-IR, as detection of a received signal depends on a good modulation scheme.
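As a worked example of what the emission limit implies, the sketch below integrates a flat -41.3 dBm/MHz power spectral density over a 7.5 GHz band. As a simplifying assumption, the spectrum is taken to sit exactly at the limit across the whole band, which real pulse spectra do not achieve.

```python
import math

def total_power_dbm(psd_dbm_per_mhz, bandwidth_hz):
    """Total power of a flat PSD given in dBm/MHz over bandwidth_hz."""
    return psd_dbm_per_mhz + 10 * math.log10(bandwidth_hz / 1e6)

p_dbm = total_power_dbm(-41.3, 7.5e9)
p_mw = 10 ** (p_dbm / 10)
print(f"{p_dbm:.2f} dBm = {p_mw:.3f} mW")
```

Even in this best case the total emitted power is only about -2.5 dBm, or roughly half a milliwatt.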

The performance of modulation and symbol detection schemes is often measured by looking at the Bit Error Rate (BER) as a function of the Signal-to-Noise Ratio (SNR) and comparing it with the theoretical Shannon limit. This limit is derived from the channel capacity theorem, also referred to as the Shannon-Hartley theorem, which gives the maximum channel capacity of an Additive White Gaussian Noise (AWGN) channel [11, 12]:

\[ C = B \log_2 \left( 1 + \frac{S}{N} \right), \quad (3.1) \]

where

- \( C \) is the channel capacity in bits per second.
- \( B \) is the transmission bandwidth in Hz.
- \( S \) is the average received signal power in W.
- \( N \) is the noise power in W, given by

\[ N = kTB, \quad k = 1.3806505 \times 10^{-23} \text{J/K}, \]

where \( k \) is Boltzmann's constant, \( T \) is the absolute temperature in K and \( B \) is the same as above.
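Equation 3.1 and the thermal noise expression can be evaluated directly. The sketch below assumes a 500 MHz bandwidth at 290 K and a received signal power of 1 nW. These values are only for illustration.

```python
import math

BOLTZMANN = 1.3806505e-23  # J/K, the value used in the text

def thermal_noise_power(temperature_k, bandwidth_hz):
    """N = k T B in watts."""
    return BOLTZMANN * temperature_k * bandwidth_hz

def shannon_capacity(bandwidth_hz, snr_linear):
    """C = B log2(1 + S/N) in bit/s, eq. (3.1)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed example: 500 MHz bandwidth at 290 K, 1 nW received signal power
B = 500e6
N = thermal_noise_power(290, B)
S = 1e-9
print(f"N = {N:.3e} W, SNR = {S / N:.1f}, C = {shannon_capacity(B, S / N):.3e} bit/s")
```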

In this chapter some different modulation and symbol recovery schemes will be presented, both for baseband and carrierbased systems.

The main focus of the modulation techniques covered in the first section is on power or bandwidth efficiency in an AWGN channel, whereas the second section presents modulation techniques that take into account interference from multiple users.

When predicting the quality of a transmission link, the Q-function and the complementary error function are useful tools. The probability of error in an AWGN channel can be derived by using

\[ Q(z) = \frac{1}{\sqrt{2\pi}} \int_{z}^{\infty} \exp\left(-\frac{x^2}{2}\right) dx = \frac{1}{2} \text{erfc}\left(\frac{z}{\sqrt{2}}\right), \quad (3.2) \]

where the Q-function satisfies the symmetry relation

\[ Q(z) = 1 - Q(-z). \quad (3.3) \]
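Since equation 3.2 relates the Q-function to the complementary error function, a minimal sketch only needs the standard library `math.erfc`.

```python
import math

def q_function(z):
    """Gaussian tail probability Q(z) = 0.5 * erfc(z / sqrt(2)), eq. (3.2)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(q_function(0.0), q_function(1.0), q_function(3.0))
```

The symmetry relation in equation 3.3 can be checked by verifying that `q_function(z) + q_function(-z)` equals 1.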

3.1 Modulation

Modulation can be defined as the process of encoding information from a message source in a manner suitable for transmission. A large variety of modulation schemes has been developed through the years, and the need for these different schemes generally arises from the transmission requirements of the different applications.

Conventional narrowband radio modulation generally involves mixing a baseband signal, the modulating signal, with a high frequency carrier; the result is called the modulated signal. Narrowband communication systems are thus commonly classified as carrier-based radio. A typical block schematic view of a narrowband radio system is shown in figure 3.1.

Signal recovery in a narrowband system is commonly achieved through demodulation of the modulated signal (see figure 3.1). More specifically, the purpose of demodulation is to extract the baseband information from the high frequency carrier in order to make the information available for further processing.

![Diagram of a typical narrowband communication system](image)

**Figure 3.1**: A typical narrowband communication system. A typical structure consists of High Pass Filtering (HPF), Band Pass Filtering (BPF), Low Pass Filtering (LPF) and multiplication of the signal with Radio Frequency (RF) and Intermediate Frequency (IF) oscillators as shown.

Carrier-based communication covers a wide range of product groups, from commercial radio and TV broadcasting and satellite communication to short range applications like Bluetooth, WLAN and ZigBee.

Whereas narrowband radio is based on modulating a baseband signal onto a carrier frequency, and thereby consumes energy in the generation and emission of a carrier, UWB-IR is a carrierless alternative for short range applications. The concept of UWB-IR is based on transmission of a baseband signal without a carrier, and UWB-IR is thus defined as a carrierless, or baseband, communication system, shown in figure 3.2.

![Diagram of a typical UWB communication system](image)

**Figure 3.2**: A typical UWB communication system

For UWB-IR there are a number of different data modulation techniques that may be used. These well known techniques are not unique to UWB communication, but used in other modulation schemes as well [9].

The performance figures given for the different modulation techniques in this section assume coherent detection unless otherwise stated. Coherent detection refers to detection in which the phase of the RF carrier is incorporated into the demodulation or signal recovery process, which generally leads to the optimum performance attainable for a given modulation scheme. This is, however, not true in all cases [13].

3.1.1 Baseband modulation techniques

The baseband modulation schemes covered in this section are the most popular techniques suitable for UWB-IR transmission.

3.1.1.1 Pulse Position Modulation

In pulse position modulation (PPM) the value of a transmitted bit is represented by the position of a pulse within a time frame. If a bit with the value '0' is represented by a pulse originating at a certain time $t$, a bit with the value '1' is represented by a pulse shifted in time by the amount $\delta$ from $t$. PPM is illustrated in figure 3.3.

An analytical expression of PPM is given in equation 3.4:

$$x(t) = w_{tr} \left( t - \delta d_j \right), \quad (3.4)$$

where

- $w_{tr}$ represents the UWB pulse waveform.
- $d_j$ represents the transmitted bit.
- $d_j$ assumes the values

$$d_j = \begin{cases} 0, & j = 0 \\ 1, & j = 1 \end{cases} \quad (3.5)$$

In figure 3.3 the Gaussian first derivative pulse is shown, defined analytically as

$$w_{G'} = -\frac{t}{\sqrt{2\pi}\sigma^2} \exp \left( -\frac{t^2}{2\sigma^2} \right), \quad (3.6)$$

where the standard deviation $\sigma$ is related to the pulse length $T_p$ by $\sigma = \frac{T_p}{2\pi}$ [3].
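A sampled PPM signal built from the pulse in equation 3.6 can be sketched as follows. The frame offset `j * t_frame` is an addition not present in equation 3.4, which describes a single bit. It places consecutive bits in consecutive frames. The offset `t0`, the pulse length, the frame length, the shift and the sampling step are assumed values.

```python
import math

def gaussian_first_derivative(t, t_p):
    """Pulse shape of eq. (3.6), with sigma = T_p / (2*pi)."""
    sigma = t_p / (2 * math.pi)
    return -t / (math.sqrt(2 * math.pi) * sigma**2) * math.exp(-t**2 / (2 * sigma**2))

def ppm_waveform(bits, t_frame, delta, t_p, dt, t0):
    """Sampled PPM signal: one pulse per frame, delayed by delta when d_j = 1.

    t0 is an assumed offset that places the pulse for a '0' inside its frame.
    """
    n_samples = int(round(len(bits) * t_frame / dt))
    signal = []
    for i in range(n_samples):
        t = i * dt
        signal.append(sum(gaussian_first_derivative(t - j * t_frame - t0 - delta * d, t_p)
                          for j, d in enumerate(bits)))
    return signal

# Assumed values: 0.5 ns pulses, 5 ns frames, 1 ns PPM shift, 1 ps sampling
x = ppm_waveform([0, 1, 1, 0], t_frame=5e-9, delta=1e-9, t_p=0.5e-9, dt=1e-12, t0=1e-9)
print(len(x))
```

The pulse for a '1' appears `delta` later in its frame than the pulse for a '0', which is the information the PPM receiver has to resolve.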

The probability of error of binary PPM in an AWGN channel, assuming coherent detection and orthogonal pulse positions, is then

$$P_{ePPM} = \frac{1}{2} \text{erfc} \left( \sqrt{\frac{E_b}{2N_0}} \right). \quad (3.7)$$

3.1.1.2 Pulse Amplitude Modulation

In impulse radio, the amplitude of individual pulses may be modulated in a variation of pulse amplitude modulation (PAM). The classic Binary PAM (BPAM) scheme is the most interesting modulation technique as far as BER performance is concerned [8,14]. The symbol error probability of M-ary PAM at a given average SNR per bit \( \gamma_b \) is

\[ P_{epam} = \frac{M - 1}{M} \text{erfc} \left( \sqrt{\frac{3k\gamma_b}{M^2 - 1}} \right), \quad (3.8) \]

and the corresponding BER, assuming Gray coding, is approximately

\[ P_{ePAM} \approx \frac{1}{k} P_{epam}, \quad (3.9) \]

where

- \( M = 2^k \) is the number of amplitude levels.
- \( k \) is the number of bits per symbol.
- \( \gamma_b = E_b/N_0 \) is the SNR per bit, so that the symbol SNR is \( E_s/N_0 = k\gamma_b \).

As \( M \) increases, the modulation efficiency decreases and a higher SNR per bit is required to maintain the same BER [8,11].
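Equations 3.8 and 3.9 can be evaluated to show how the BER at a fixed \( E_b/N_0 \) degrades as \( M \) grows. The choice of 10 dB below is an assumed example value.

```python
import math

def pam_symbol_error(m, gamma_b):
    """Symbol error probability of M-PAM, eq. (3.8); gamma_b = Eb/N0 (linear)."""
    k = math.log2(m)
    return (m - 1) / m * math.erfc(math.sqrt(3 * k * gamma_b / (m**2 - 1)))

def pam_bit_error(m, gamma_b):
    """Approximate BER, eq. (3.9): P_b = P_s / k."""
    return pam_symbol_error(m, gamma_b) / math.log2(m)

gamma_b = 10 ** (10 / 10)  # Eb/N0 = 10 dB (assumed)
for m in (2, 4, 8):
    print(m, pam_bit_error(m, gamma_b))
```

For \( M = 2 \) the expression reduces to \( \frac{1}{2}\text{erfc}(\sqrt{\gamma_b}) \), the antipodal BPAM case.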
A BPAM representation using two antipodal Gaussian pulses is shown in figure 3.4. A mathematical expression of the transmitted BPAM signal is

\[ x(t) = d_j \cdot w_{tr}(t), \quad (3.10) \]

where

\[ d_j = \begin{cases} 
-1, & j = 0 \\
1, & j = 1 
\end{cases} \quad (3.11) \]

### 3.1.1.3 On-Off Keying

On-Off Keying (OOK) is essentially also an amplitude modulation scheme, in which nothing is transmitted in the case of a '0', using the definition

\[ d_j = \begin{cases} 
0, & j = 0 \\
1, & j = 1 
\end{cases} \quad (3.12) \]

As the signal is modulated using two amplitude levels, the probability of error is the same as for BPAM presented in equations 3.8 and 3.9 for both coherent and envelope detection [13].

Figure 3.5 shows an OOK modulation scheme using the definition in equation 3.12 [3].

3.1.1.4 M-ary Bi-Orthogonal Keying

M-ary Bi-Orthogonal Keying (MBOK) is not among the most popular modulation techniques today. It is mentioned here because of its high modulation efficiency, which approaches the Shannon limit given by equation 3.1.

An MBOK coder/modulator uses a set of \( M \) moderate length ternary codes from an assigned code set to represent \( M \) symbols. In the code set, half of the codes are the complements of the other half. For a set of two codes the performance is equivalent to BPAM, while longer sequences behave like a direct-sequence spread spectrum (DS-SS) code, covered in section 3.2.2. The modulation efficiency increases with the length of the code sequence, with the required \( E_b/N_0 \) approaching the Shannon limit of -1.59 dB as \( M \to \infty \) for any BER [8,15].
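The -1.59 dB figure follows from equation 3.1. Setting \( S = E_b C \) and \( N = N_0 B \), which are standard definitions not stated explicitly in the text, gives \( E_b/N_0 = (2^{\eta} - 1)/\eta \) for a spectral efficiency \( \eta = C/B \). This tends to \( \ln 2 \) as \( \eta \to 0 \). The sketch evaluates this bound.

```python
import math

def min_ebn0(eta):
    """Minimum Eb/N0 (linear) for spectral efficiency eta = C/B in bit/s/Hz.

    Follows from eq. (3.1) with S = Eb*C and N = N0*B.
    """
    return (2 ** eta - 1) / eta

for eta in (2.0, 1.0, 0.1, 1e-6):
    print(eta, 10 * math.log10(min_ebn0(eta)))
print("limit:", 10 * math.log10(math.log(2)))  # about -1.59 dB
```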

3.1.2 Carrierbased modulation techniques

Digital modulation techniques in carrier-based narrowband systems are generally classified as linear or nonlinear. As the name indicates, in linear modulation schemes the amplitude of the modulated signal varies linearly with the modulating signal. Linear modulation schemes generally do not have a constant envelope, but they offer good bandwidth efficiency. This makes them popular in mobile radio systems, where there is an increasing demand to accommodate a growing number of users within a limited spectrum. The most popular linear modulation schemes are the different types of phase shift keying (PSK).

Nonlinear modulation techniques with a constant envelope are used in many wireless radio systems due to advantages over linear modulation schemes, such as simplified design and good power efficiency. Among the nonlinear modulation schemes, frequency shift keying (FSK) is the most popular.

3.1.2.1 Phase Shift Keying

Phase Shift Keying is a modulation method where one or several bits are represented by switching the phase of a carrier. The simplest form of PSK is binary PSK (BPSK), which usually implies switching the phase of a constant amplitude carrier signal between two values separated by 180°, corresponding to the two binary levels '0' and '1', shown in figure 3.6.

The most popular PSK modulation is probably Quadrature PSK (QPSK), where the phase of the carrier takes one of four values separated by 90°, thus transmitting two bits per symbol. Its popularity is due to the fact that its bandwidth efficiency is twice that of BPSK, since two bits are transmitted in a single symbol, while its bit error probability remains the same as for BPSK assuming coherent detection, as shown in equation 3.13. This is also the same error probability as for BPAM, see equations 3.8 and 3.9 with \( M = 2 \).

\[ P_{e_{BPSK}} = P_{e_{QPSK}} = Q \left( \sqrt{\frac{2E_b}{N_0}} \right) \quad (3.13) \]

Figure 3.7: The waveform in a) shows a typical continuous-phase FSK signal with no phase discontinuity at the transition between '1' and '0', whereas the waveform in b) shows a discontinuous-phase signal with a phase jump at the time of switching.

3.1.2.2 Frequency Shift Keying

In Frequency Shift Keying (FSK) a constant amplitude carrier is switched between a number of frequencies corresponding to the number of desired message states. Depending on how the frequency variations are modulated on the transmitted signal, the FSK signal will have either continuous phase or discontinuous phase between bits.

The most common way of modulating FSK onto a carrier is to frequency modulate a single oscillator with the message signal, which yields a continuous phase. In principle this is similar to analog frequency modulation (FM) with a binary message waveform (see figure 3.7) [7].

Discontinuous FSK normally occurs when the FSK is imparted on the transmitted signal by switching between two independent oscillators. As a result of this switching, phase discontinuities in the modulated signal cause problems like spectral spreading and spurious transmissions. This form of FSK is therefore generally not used in wireless systems, due to the strict regulations on wireless transmissions (see figure 3.7).

In FSK it is also possible to detect a signal in the presence of noise without a coherent carrier reference using matched filters followed by envelope detectors.

The probability of error in FSK systems employing coherent and non-coherent detection is expressed in equations 3.14 and 3.15, respectively.
\[ P_{e_{FSK,C}} = Q\left(\sqrt{\frac{E_b}{N_0}}\right) \quad (3.14) \]

\[ P_{e_{FSK,NC}} = \frac{1}{2} \exp\left(-\frac{E_b}{2N_0}\right) \quad (3.15) \]
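The cost of dropping the coherent carrier reference can be sketched by evaluating equations 3.14 and 3.15 side by side. The \( E_b/N_0 \) values are assumed examples.

```python
import math

def ber_fsk_coherent(ebn0):
    """Coherent FSK, eq. (3.14): Q(sqrt(Eb/N0)) written via erfc."""
    return 0.5 * math.erfc(math.sqrt(ebn0) / math.sqrt(2))

def ber_fsk_noncoherent(ebn0):
    """Non-coherent FSK, eq. (3.15)."""
    return 0.5 * math.exp(-ebn0 / 2)

for ebn0_db in (6, 10, 14):
    g = 10 ** (ebn0_db / 10)
    print(ebn0_db, ber_fsk_coherent(g), ber_fsk_noncoherent(g))
```

Coherent detection gives the lower BER at every SNR, and the simpler non-coherent receiver pays for its simplicity with a somewhat higher required \( E_b/N_0 \).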

A comparison of the performance of PSK and FSK is provided in [16].

3.2 Spread spectrum and pseudo-noise coding

Spread spectrum modulation is a family of modulation techniques that focus on performance in a multiple-user, or multiple access interference (MAI), environment. The modulation methods described previously strive to achieve greater power or bandwidth efficiency in an AWGN channel. Spread spectrum systems, in contrast, are bandwidth inefficient in a single-user environment but very bandwidth efficient in an MAI environment, since many users can share a transmission bandwidth several orders of magnitude larger than that of conventional narrowband techniques.

Spread spectrum signals have pseudo-random, noise-like properties compared with the information signal. The spreading of the transmitted signal is controlled by a pseudo-noise (PN) code. The PN-code is a binary sequence that appears random but can be reproduced deterministically by the receiver. Demodulation of the spread spectrum signal is performed by the receiver through cross-correlation with a locally generated template, which despreads the spread spectrum signal and restores the original message signal. A cross-correlation with the signal of an undesired user, however, only results in a small amount of noise at the output of the receiver.

Spread spectrum has several interesting properties, such as its inherent interference rejection capability and its ability to exploit delayed multipath components to improve the performance of the system. The ability to reject interference is important in an MAI environment and is a consequence of the use of PN-sequences. Since each user is assigned a unique PN-code that is close to orthogonal to the codes of the other users, a receiver can distinguish between users through the PN-code, although all the users occupy the same frequency spectrum. This implies that, up to a certain number of users, the interference between users is negligible.

3.2.1 PN-coding

A PN-sequence is a binary sequence whose autocorrelation has certain well-defined properties, and such sequences are of great importance in most spread spectrum systems. PN-sequences are generally classified into two groups, periodic and aperiodic. As the name suggests, an aperiodic sequence does not repeat itself, and it is normally assumed to have a value of zero outside its stated interval. A periodic sequence, however, repeats itself exactly with a specific period. In spread spectrum systems mainly the latter are used.

A periodic sequence is considered pseudo-random if it satisfies the following conditions [12,13]:

- The numbers of ones and zeros in a period, $N_1$ and $N_0$, differ by exactly one. This means that the length of the period $N$ is an odd number, expressed as

$$|N_{1} - N_{0}| = 1 \quad (3.16)$$

- In every period, half of the runs, where a run is a sequence of consecutive bits of the same value, have length one, i.e. a single one bounded by zeros or a single zero bounded by ones. Further, one fourth of the runs have length two, one eighth have length three, and so forth. The number of runs of ones is equal to the number of runs of zeros.

- The autocorrelation over a period is two-valued, described as

$$C(k) = \sum_{n=1}^{N} a_n a_{n+k} = \begin{cases} N, & k = l \cdot N, \quad l \in \mathbb{Z} \\ -1, & \text{otherwise} \end{cases} \quad (3.17)$$

where $a_n \in \{-1, +1\}$ is the bipolar representation of the sequence, which is periodic with $a_{n+N} = a_n$.
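Maximal-length sequences from a linear feedback shift register (LFSR) are a common way to obtain periodic PN-sequences with these properties. The sketch below is a Fibonacci LFSR with 4 stages. The tap choice (4, 3), which realizes the primitive recurrence \( x_t = x_{t-3} \oplus x_{t-4} \), and the seed are assumptions chosen for illustration. The sketch checks the balance and the two-valued autocorrelation of equation 3.17.

```python
def lfsr_sequence(n_stages, taps, seed=1):
    """One period of a Fibonacci LFSR output.

    taps are 1-indexed stages XORed into the feedback; (4, 3) with 4 stages
    gives a maximal-length sequence of period 2**4 - 1 = 15.
    """
    state = [(seed >> i) & 1 for i in range(n_stages)]
    out = []
    for _ in range(2 ** n_stages - 1):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out

def periodic_autocorrelation(bits):
    """C(k) of eq. (3.17) with bits mapped 0 -> +1, 1 -> -1."""
    a = [1 - 2 * b for b in bits]
    n = len(a)
    return [sum(a[i] * a[(i + k) % n] for i in range(n)) for k in range(n)]

seq = lfsr_sequence(4, (4, 3))
print(seq)
print("ones:", sum(seq), "zeros:", len(seq) - sum(seq))
print(periodic_autocorrelation(seq))  # 15 at k = 0, -1 elsewhere
```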

3.2.2 Direct Sequence

Direct Sequence Spread Spectrum (DS-SS) is a system that spreads the baseband data by directly multiplying the baseband information data with a PN-sequence produced by a PN-code generator. Each bit of the PN-sequence is called a chip, and a group of chips represents each information bit. Typically, synchronized data symbols are added modulo-2 to the chips prior to phase modulation. At the receiver, coherent demodulation is typically applied, usually of a PSK signal, see section 3.1.2.1.

The spreading of baseband data over a bandwidth much larger than the bandwidth of the original message signal, followed by despreading at the receiver, yields a system gain. The filtering in the demodulator removes most of the energy of interfering signals. This introduces the term processing gain (PG). The processing gain expresses a system's ability to suppress interference, and can be expressed as the ratio of the chip bandwidth to the message data bandwidth

$$PG = \frac{B_{\text{chip}}}{B_{\text{data}}}. \quad (3.18)$$
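Spreading and despreading can be sketched at chip level. The sketch assumes the 15-chip m-sequence from a 4-stage LFSR as the PN code. It also assumes that, for rectangular chips, the bandwidth ratio in equation 3.18 is approximately the number of chips per bit. Despreading correlates each code-length block with the code, so a few corrupted chips do not change the decision.

```python
import math

# Assumed example PN code: one period of a 15-chip m-sequence
PN_CODE = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]

def spread(bits, code):
    """Modulo-2 add each data bit to every chip of the PN code."""
    return [b ^ c for b in bits for c in code]

def despread(chips, code):
    """Correlate each code-length block with the (bipolar) code and decide."""
    n = len(code)
    ref = [1 - 2 * c for c in code]  # 0 -> +1, 1 -> -1
    bits = []
    for i in range(0, len(chips), n):
        block = [1 - 2 * x for x in chips[i:i + n]]
        corr = sum(r * x for r, x in zip(ref, block))
        bits.append(0 if corr > 0 else 1)
    return bits

data = [1, 0, 1, 1]
chips = spread(data, PN_CODE)
for i in (2, 20, 40):        # one chip error in each of the first three bits
    chips[i] ^= 1
print(despread(chips, PN_CODE) == data)
print("PG =", 10 * math.log10(len(PN_CODE)), "dB")
```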

The performance of DS-SS in terms of probability of error can be expressed through the Q-function, assuming that the contribution from each interfering user is independent,

$$P_{e_{\text{DSSS}}} = Q \left( \frac{1}{\sqrt{\dfrac{K-1}{3N} + \dfrac{N_0}{2E_b}}} \right) \quad (3.19)$$

where

- $N$ is the number of chips per bit, assumed large enough for the random chips from each interferer to approximate a Gaussian distribution.

- $K - 1$ is the number of other users, which act as identically distributed interferers.

For a single user ($K = 1$) the expression reduces to the BER expression for BPSK, see equation 3.13. It should be noted that this is a convenient approximation, as the contributions from the interfering users are not independent in reality.
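The effect of the number of users can be sketched by evaluating the Gaussian approximation in the form \( Q\left(1/\sqrt{(K-1)/(3N) + N_0/(2E_b)}\right) \), where the argument of \( Q \) shrinks as interference and noise grow. The code length \( N = 31 \) and \( E_b/N_0 = 10 \) dB are assumed example values.

```python
import math

def q_function(z):
    return 0.5 * math.erfc(z / math.sqrt(2))

def ber_dsss(k_users, n_chips, ebn0):
    """Gaussian-approximation BER of DS-SS with K users and N chips per bit."""
    return q_function(1 / math.sqrt((k_users - 1) / (3 * n_chips) + 1 / (2 * ebn0)))

ebn0 = 10 ** (10 / 10)  # Eb/N0 = 10 dB (assumed)
for k in (1, 5, 10, 20):
    print(k, ber_dsss(k, n_chips=31, ebn0=ebn0))
```

With \( K = 1 \) the result equals \( Q(\sqrt{2E_b/N_0}) \), and the BER grows steadily as more users share the channel.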

3.2.3 Frequency Hopping

Frequency Hopping Spread Spectrum (FH-SS) can be defined as a sequence of modulated data bursts with a time-varying, pseudo-random carrier frequency drawn from a set of frequencies called the hopset. The hopping occurs over a large frequency band that includes a number of channels. Each channel has a bandwidth large enough to cover the span of a narrowband modulation burst, typically FSK, see section 3.1.2.2. The data are transmitted by hopping the carrier frequency between channels in a pseudo-random fashion known only to the intended receiver. Small bursts of data are sent on each channel using a conventional modulation scheme before the transmitter hops again. FH-SS systems can use several carriers simultaneously, but usually one or two are used [13].

The fact that FH-SS is based on the use of carriers makes it in principle inapplicable to UWB-IR.

3.2.4 Time Hopping

Time Hopping spread spectrum (TH-SS) is a technique that divides the transmission time into time frames, which in turn are divided into time slots. During each frame only one time slot is modulated with a message signal, and the slot to be modulated in a given frame is selected by a PN-code generator. The information bits are transmitted in a burst during the selected time slot. At the receiver the data therefore arrive in bursts at a rate much higher than the original message rate, which requires the signal to be stored and retimed to the original message rate [13].
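The slot selection can be sketched as follows. As an assumption, Python's `random.Random` with a per-user seed stands in for the PN-code generator; it is a Mersenne Twister, not a PN sequence generator. The sketch shows that a receiver with the same seed reproduces the hopping pattern and counts how often two uncoordinated users pick the same slot.

```python
import random

def th_slots(seed, n_frames, n_slots):
    """Pseudo-random slot index for each frame.

    random.Random stands in for the PN-code generator (an assumption); a
    receiver using the same seed reproduces exactly the same pattern.
    """
    rng = random.Random(seed)
    return [rng.randrange(n_slots) for _ in range(n_frames)]

def collisions(pattern_a, pattern_b):
    """Number of frames in which two users pick the same slot."""
    return sum(a == b for a, b in zip(pattern_a, pattern_b))

user1 = th_slots(seed=1, n_frames=1000, n_slots=16)
user2 = th_slots(seed=2, n_frames=1000, n_slots=16)
print(th_slots(1, 1000, 16) == user1)    # the receiver reproduces the pattern
print(collisions(user1, user2) / 1000)   # roughly 1/16 for independent patterns
```

Without coordination, the two users collide in roughly one frame out of sixteen, which is why the collision handling discussed below is needed.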

The processing gain of a TH-SS system is simply twice the number of time slots \( k \) in a frame,

\[ PG = 2k. \quad (3.20) \]

Interference between users is normally minimized by coordinating the times at which each user can transmit. Colliding transmissions cause message errors, which creates the need for forward error correcting codes [13].

It should be noted that the acquisition time is similar to that of DS-SS and the implementation is simpler than for FH-SS [13].

3.2.5 Multiple access

Since the emitted energy is spread out in the frequency domain, it is fair to say that UWB systems can be characterized as an extension of traditional spread spectrum schemes. UWB-IR provides multiple access, as different users are allowed to share the same physical medium in the communication process. Separation between users is possible when the transmission medium is shared in a coordinated manner by assigning each user a channel. In the multiple access schemes for carrier-based communication systems, a channel usually corresponds to a time slot, a frequency band or a code, giving time, frequency or code division multiple access (TDMA, FDMA and CDMA, respectively). UWB-based systems may adopt any of these methods; for the IEEE 802.15.3a standard a TDMA-based approach seems to be the current trend [4].

As impulse radio employs PN-sequences to shape the spectrum according to the emission mask, two approaches have gained the most interest. The most common methods of encoding data are the time hopping and direct sequence schemes treated in sections 3.2.4 and 3.2.2. For time hopping, the sequences can serve as user signatures that give multiple users access to the medium. This multiple access scheme is thus called time hopping multiple access (THMA). A second approach is a pulsed version of DS-CDMA, where each user is assigned a unique code. The THMA and DS-CDMA approaches are usually referred to as TH-UWB and DS-UWB.

Each code modifies the transmitted signal in such a way that the receiver is capable of separating the desired signal from the signals of other users, which are seen as interference by the receiver. The possibility of removing these unwanted contributions depends mainly on the characteristics of the codes used for separating the transmitted data flows. Under ideal conditions, the receiver is not affected by the presence of multiple transmissions. In a realistic scenario, however, ideal synchronization and code orthogonality are lost due to different propagation delays on different propagation paths. As a result, the receiver might not be able to separate the different data flows, and system performance is degraded by MAI.

### 3.3 Synchronization

Synchronization is a major issue in symbol recovery in UWB-IR, as it is in spread spectrum techniques, and it depends to a certain degree on the properties of the propagation channel. A single path environment would impede synchronization; however, such conditions rarely occur, as some reflections are likely to appear, especially in an indoor environment.

Synchronization may be performed at several levels: carrier, code, symbol, word, frame and network. In spread spectrum techniques, code synchronization is normally performed; when the receiver is synchronized, the received spreading code and the reference spreading code are aligned in phase.

Synchronization can be split into two phases, acquisition and tracking, and communication using any spread spectrum scheme is only possible if the necessary synchronization is performed with sufficient accuracy. For DS systems an uncertainty region corresponding to a multiple of the length of the code sequence occurs. For systems applying TH modulation this region is divided into a number of cells, depending on the number of possible pulse position combinations in a bit interval [3].
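Code acquisition over the uncertainty region can be sketched as a serial search. The receiver correlates the received chips with every cyclic shift of its local code and keeps the shift with the largest correlation. The 15-chip m-sequence and the delay of 6 chips are assumed example values. Noise and the tracking phase are left out.

```python
def bipolar(bits):
    return [1 - 2 * b for b in bits]  # 0 -> +1, 1 -> -1

def acquire(received, code):
    """Serial search over all code phases; returns (best_shift, best_correlation)."""
    n = len(code)
    ref = bipolar(code)
    best_shift, best_corr = 0, float("-inf")
    for shift in range(n):
        corr = sum(received[i] * ref[(i - shift) % n] for i in range(n))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift, best_corr

# Assumed example: 15-chip m-sequence received with an unknown delay of 6 chips
code = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
ref = bipolar(code)
received = [ref[(i - 6) % len(code)] for i in range(len(code))]
print(acquire(received, code))  # (6, 15)
```

The two-valued autocorrelation of the m-sequence makes the correct phase stand out clearly (15 against -1), which is why PN-codes with good autocorrelation properties ease acquisition.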

UWB-IR systems use very large bandwidths, much larger than those of traditional spread spectrum methods. Nevertheless, UWB-IR can in many ways be considered a spread spectrum technique. This applies to synchronization as well, as code synchronization should be performed in UWB-IR systems as it is in conventional spread spectrum techniques. This will be treated further in section 4.5.2.

3.4 Symbol detection

There are some differences between UWB-IR and spread spectrum schemes, such as the properties of the communication channel and the duty cycle used in the transmission. Impulse radio may operate with an extremely low duty cycle compared with spread spectrum systems. Compared with such conventional schemes, this leads to a processing gain defined as

$$PG_{dc}(dB) = 10 \log_{10} \left( \frac{T_f}{T_p} \right) \quad (3.21)$$

where $T_f$ is the symbol or frame time and $T_p$ is the total pulse width within the symbol or frame time.

As discussed in chapter 2, the UWB channel is extremely multipath rich, also compared with a spread spectrum channel. The way the multipath components combine affects the total received power. The received power increases if the multipath components are combined, whereas interference results if these components are not combined.

Transmission of information may be characterized as single-pulse or multi-pulse transmission. As the names suggest, a single-pulse transmission uses one pulse per symbol, whereas in multi-pulse transmission the transmitter introduces redundancy by increasing the number of pulses per symbol in order to improve performance. The improvement in performance is referred to as processing gain, defined as

$$PG_{ss}(dB) = 10 \log_{10} (N_s) \quad (3.22)$$

where $N_s$ is the number of pulses per symbol.
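Equations 3.21 and 3.22 are straightforward to evaluate. The example assumes 1 ns pulses in a 100 ns frame and four pulses per symbol.

```python
import math

def pg_duty_cycle_db(t_frame, t_pulse):
    """Eq. (3.21): 10 log10(T_f / T_p)."""
    return 10 * math.log10(t_frame / t_pulse)

def pg_pulses_db(n_s):
    """Eq. (3.22): 10 log10(N_s)."""
    return 10 * math.log10(n_s)

# Assumed example: 1 ns pulses in a 100 ns frame, 4 pulses per symbol
print(pg_duty_cycle_db(100e-9, 1e-9), pg_pulses_db(4))  # about 20 dB and 6 dB
```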
The presence of multiple pulses per symbol yields two possible strategies for the construction of a receiver: soft decision detection and hard decision detection. In soft decision detection the signal formed by the $N_s$ pulses is considered as a single multi-pulse signal by the receiver. The received signal is cross-correlated using a correlation mask matched to the train of pulses representing the symbol. The received energy in this case is increased by a factor equal to $N_s$. The BER is consequently decreased without increasing the transmitted power; however, the bit rate is also reduced by the same factor [4].

Hard decision detection implies that the receiver performs a number of independent decisions equal to $N_s$ over the pulses representing the information bit, and the final decision is made on the basis of a majority criterion. The number of pulses exceeding a threshold is compared with the number of pulses falling below the same threshold, and the estimated bit corresponds to the larger of the two. An error occurs when half or more of the pulses are misinterpreted.
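The two strategies can be contrasted on a few hypothetical per-pulse correlator outputs for a transmitted '1' in BPAM, where each ideal output is +1. An odd \( N_s = 3 \) is assumed to avoid ties in the majority vote. The sample values are invented for illustration.

```python
def soft_decision(samples):
    """Sum the correlator outputs of all N_s pulses, then decide once."""
    return 1 if sum(samples) > 0 else 0

def hard_decision(samples, threshold=0.0):
    """Decide each pulse separately, then take a majority vote (odd N_s assumed)."""
    above = sum(1 for s in samples if s > threshold)
    return 1 if above > len(samples) - above else 0

# Hypothetical outputs for a transmitted '1' (ideal value +1 per pulse)
mild_noise = [0.9, -0.2, -0.3]   # two pulses weakened below the threshold
collision = [1.0, 1.0, -5.0]     # one pulse hit by a strong interferer
print(soft_decision(mild_noise), hard_decision(mild_noise))   # soft correct
print(soft_decision(collision), hard_decision(collision))     # hard correct
```

Averaging helps when every pulse is mildly corrupted, while the majority vote limits the damage from a single strong collision. This mirrors the comparison that follows.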

Comparisons of the performance of the two methods generally show that soft decision detection outperforms hard decision detection in an AWGN channel, whereas the opposite is the case in the presence of interference. With reference to UWB-IR, hard decision detection generally performs better than soft decision detection when several interfering UWB signals are present, although this is not necessarily true in all cases [4]. The performance of hard decision detection is mainly affected by the number of interferers leading to collisions when deciding on a single pulse, while the performance of soft decision detection is mainly affected by the average interfering power received over a symbol period.

### 3.5 Summary

Wireless communications in general trade mobility and the need for cable connection for increased system loss. In general the system loss in short range communication systems is due to the effects treated in chapter 2 which yields the need of some modulation scheme, usually one of the techniques, or a combination of the techniques.

In this chapter an introduction to different modulation techniques has been given, specifically techniques relevant to short range communication systems, in particular UWB-IR. As the different modulation schemes have different properties, the requirements of the application decide which technique, or combination of techniques, is the best. Among these properties are the probability of error and, for the spread spectrum techniques, the processing gain, which have been given for the different modulation methods, making comparison of the different schemes possible.

For UWB-IR several modulation schemes are feasible, typically TH-UWB and DS-UWB, which combine TH-SS and DS-SS with multiple access. DS schemes in general present better results than TH schemes [17, 18]; however, in an AWGN channel with a low duty cycle the performance of the two schemes is very similar [3]. The choice of baseband modulation scheme may also cause some difference in performance with respect to BER. Antipodal PAM typically outperforms PPM in combination with either THMA or DS-CDMA [4, 14], whereas PPM typically outperforms OOK [19, 20]. The performance of M-BOK is, however, better than that of the other schemes [8, 15].

Which modulation scheme to use depends on the expected operating conditions, such as loss and reflections, and on the desired system complexity. M-BOK involves the most complex implementation, whereas antipodal PAM is a simpler approach performing quite well compared to PPM and OOK. The conclusion for the approach presented in this thesis is therefore a DS-UWB approach employing BPAM with hard decision detection.

Chapter 4

Receiver structures in UWB-IR systems

Impulse radio is a UWB communication system for low-power, short range applications, typically transmitting PN-sequences using the modulation schemes presented in the previous chapter. Because UWB systems use very large bandwidths, many multipath components can be resolved in the received signal. This causes problems, as the transmitted energy is spread over a large number of multipath components, making the energy of each path very low. Detection of symbols in UWB-IR is therefore usually obtained by some sort of correlation process. Correlation is defined as a process that compares an interval of a signal with a template waveform and produces an output proportional to the integral of the product of the signal and the template over that interval [21]. Correlating receivers are devices that detect weak signals in noise by averaging the product of the received signal and a locally generated waveform possessing some known wave characteristics.

This chapter deals with detection of UWB-IR signals. First, an introduction to MAI is provided, including a derivation of the probability of successful packet reception. The subsequent sections treat several receiver approaches, focusing on the RAKE structure.

### 4.1 Multiple access interference and packet collision

As described in the previous chapter, a UWB-IR communication system is characterized by system loss resulting from the low emission level in multipath environments, from other radio communication systems operating within the same frequency spectrum and from multiple users. The fundamental problem faced by UWB-IR, as by all communication systems operating over a shared link, is how to efficiently mediate access to the shared medium. Typically this is addressed by organizing the data into information units called packets, which are coded before being sent over the air. Usually some redundancy is added to these packets in order to enable error detection and correction, for example through Forward Error Correction (FEC) schemes and Cyclic Redundancy Checks (CRC).

Multiple access interference may be analyzed from this perspective by observing that interference corresponds to packet collision. The term packet collision may be defined as interference provoked by collisions occurring between pulses belonging to different transmissions. For a network consisting of asynchronous users transmitting data in an uncoordinated manner, it is reasonable to assume a packet inter-arrival process following a Poisson distribution [4]. A packet contains a number of pulses, grouped in sets of \( N_s \) pulses that each carry the information of one bit. Predicting the pulse inter-arrival process is a rather complex matter depending on modulation, channel and code properties. In order to obtain an expression for the probability of pulse collision, it is assumed that the pulse inter-arrival process also follows a Poisson process. The probability that one or more pulses will collide with the transmitted pulse can then be expressed as

\[
P_{\text{pulsecollision}} = 1 - \exp \left( -2 \left( N_u - 1 \right) \frac{T_M}{T_s} \right)
\]  

where

- \( N_u \) is the number of users.
- \( T_M \) is the time occupied by a single pulse.
- \( T_s \) is the time occupied by a symbol containing \( N_s \) pulses.

An expression for the probability of pulse error can be derived from this [4], assuming that a collision causes a random decision at the receiver:

\[
P_{\text{pulseerror}} = \frac{1}{2} P_{\text{pulsecollision}}.
\]  

Further, a bit is assumed to be in error when more than half of the pulses in a symbol are corrupted, which incidentally corresponds to hard decision detection as described in section 3.4. An expression for the BER is then obtained:

\[ P_b = \sum_{i=\lfloor N_s/2 \rfloor + 1}^{N_s} \binom{N_s}{i} P_{\text{pulseerror}}^{\,i} \left( 1 - P_{\text{pulseerror}} \right)^{N_s - i}. \] (4.3)

The probability of correct bit detection then follows:

\[ P_{\text{correct bit}} = 1 - P_b \] (4.4)

The derivation of this equation is simplified and does not take into account dithering or the use of either THMA or DS-CDMA.

Assuming that a packet error occurs if one or more bits are corrupted, the probability of a successfully received packet is

\[ P_{\text{correct packet}} = (1 - P_b)^L \] (4.5)

where \( L \) is the packet length.

It should be noted that for low data rates, the probability of packet error is extremely low (below \(10^{-4}\)) with up to 50 packets simultaneously in the air [4].
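The expressions above can be evaluated numerically. The Python sketch below is a direct transcription of the pulse-collision, pulse-error, bit-error and packet-success probabilities, with the bit-error sum taken over more than half of the $N_s$ pulses, as stated in the text. The function names and the example parameters are hypothetical and are not taken from [4]; the same simplifications apply (no dithering, no THMA or DS-CDMA).

```python
import math


def p_pulse_collision(n_users, t_m, t_s):
    """P_pulsecollision = 1 - exp(-2 (N_u - 1) T_M / T_s), Poisson pulse arrivals."""
    return 1.0 - math.exp(-2.0 * (n_users - 1) * t_m / t_s)


def p_pulse_error(n_users, t_m, t_s):
    """A collision is assumed to give a random decision, i.e. an error half of the time."""
    return 0.5 * p_pulse_collision(n_users, t_m, t_s)


def p_bit_error(n_s, p):
    """Bit error when more than half of the N_s pulses are in error (binomial sum)."""
    first = n_s // 2 + 1
    return sum(math.comb(n_s, i) * p**i * (1.0 - p) ** (n_s - i)
               for i in range(first, n_s + 1))


def p_correct_packet(p_b, length):
    """A packet of `length` bits is correct only if every bit is correct."""
    return (1.0 - p_b) ** length


# Hypothetical example: 10 users, T_M = 1 ns, T_s = 1 us, N_s = 5, L = 1000 bits.
p = p_pulse_error(n_users=10, t_m=1e-9, t_s=1e-6)
p_b = p_bit_error(n_s=5, p=p)
print(f"P_pulseerror     = {p:.3e}")
print(f"P_b              = {p_b:.3e}")
print(f"P_correct_packet = {p_correct_packet(p_b, 1000):.4f}")
```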

### 4.2 The correlator receiver

The correlator, or autocorrelator, receiver is perhaps the simplest approach to a UWB receiver and is often referred to as the optimum receiver for an AWGN channel. The principle behind this receiver is that the received pulse is mixed with a template pulse waveform and the product is integrated; the result of the integration is sampled as output data. The integration interval is somewhat longer than the pulse length in order to pick up some of the multipath components.

In order to optimize the processing gain and SNR, the template waveform should be the same as the received waveform. This is, however, not easily achieved, as the received signal will typically be distorted by the transmission channel, and generating a template similar to the received pulse is difficult without severely increasing the circuit complexity. The complexity of template generation can be avoided by approximating the template with a delayed version of the transmitted pulse or with a rectangular pulse. The correlator receiver is shown in figure 4.1 [8, 3, 4].
The main challenges in this approach are providing accurate synchronization and performing the necessary operations at the required speed. The synchronization of the template generator is sensitive to timing jitter in the pulse generation at both the transmitter and the receiver, which degrades the performance of the receiver. In addition, the need for high processing speed leads to high power consumption.
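To illustrate why the choice of template matters, the sketch below approximates the correlator integral by a sum over 1 ps steps and normalizes the output so that a perfectly matched template gives 1. The pulse shapes are assumptions chosen only for illustration: a Gaussian pulse with 200 ps standard deviation is transmitted and, as a stand-in for channel distortion, arrives broadened to 300 ps; the integration window is ±1.5 ns. The delayed-transmitted-pulse and rectangular templates correspond to the two approximations mentioned above.

```python
import math

DT = 1e-12  # integration step of 1 ps (hypothetical resolution)


def gaussian_pulse(t, width):
    return math.exp(-0.5 * (t / width) ** 2)


def correlate(received, template, dt=DT):
    """Correlator output: integral of received(t) * template(t) over the window."""
    return sum(r * m for r, m in zip(received, template)) * dt


def normalized_output(received, template, dt=DT):
    """Correlator output scaled by both energies; equals 1 only for a perfect match."""
    energy_r = correlate(received, received, dt)
    energy_m = correlate(template, template, dt)
    return correlate(received, template, dt) / math.sqrt(energy_r * energy_m)


# Integration window slightly longer than the pulse: -1.5 ns .. +1.5 ns.
times = [k * DT for k in range(-1500, 1501)]
transmitted = [gaussian_pulse(t, 200e-12) for t in times]   # assumed transmitted shape
received = [gaussian_pulse(t, 300e-12) for t in times]      # assumed channel broadening
rectangular = [1.0 if abs(t) <= 0.5e-9 else 0.0 for t in times]

for name, template in [("exact received shape", received),
                       ("transmitted pulse", transmitted),
                       ("rectangular 1 ns", rectangular)]:
    print(f"{name:22s} {normalized_output(received, template):.3f}")
```

Both approximations give a normalized output below 1: part of the achievable correlation is traded for a simpler template generator.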

The optimum receiver for an AWGN channel may not be an appropriate scheme in a multipath environment with interfering transmissions from other users, as the structure relies on a correlator that is matched to one single waveform, whether single- or multi-pulse. The performance of this receiver improves with $\frac{E_b}{N_0}$, implying that the average transmitted power has to be increased in order to improve performance.

### 4.3 The matched filter approach

The matched filter receiver, typically implemented as a Finite Impulse Response (FIR) filter [21], is a structure used in both DS and TH modulated systems, typically employing soft decision detection through cross-correlation with a correlation mask. The matched filter is matched to a correlation mask containing a multi-pulse signal corresponding to a PN-sequence, which gives it an impulse response equal to the PN-sequence reversed in time. A matched filter generally receives data from a front-end consisting of an LNA and a filter followed by sampling. The received signal is then despread in the matched filter, whose output peaks when the data match a specific PN-sequence, indicating that the data and the PN-sequence are synchronized.
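A discrete-time version of this idea can be sketched as an FIR filter whose taps are the PN-sequence reversed in time. In the Python example below, a length-7 Barker code stands in for the PN-sequence (chosen only for its sharp autocorrelation peak), and the received chips are assumed to be noise-free, already sampled and chip-aligned; the function name is hypothetical.

```python
def matched_filter(received, pn):
    """FIR matched filter with impulse response h[k] = pn[L-1-k] (PN-sequence reversed).

    y[n] = sum_k h[k] * x[n-k], which equals the correlation of the last L received
    chips with the PN-sequence, so the output peaks when the two are aligned.
    """
    h = pn[::-1]
    y = []
    for n in range(len(received)):
        acc = 0
        for k, h_k in enumerate(h):
            if n - k >= 0:  # samples before the start of the record are taken as zero
                acc += h_k * received[n - k]
        y.append(acc)
    return y


barker7 = [1, 1, 1, -1, -1, 1, -1]        # stand-in for the PN-sequence
received = [0] * 5 + barker7 + [0] * 5    # the sequence starts at sample 5
y = matched_filter(received, barker7)
print(y.index(max(y)), max(y))            # 11 7: peak once all 7 chips have entered
```

The output peaks at the sample where the whole sequence has entered the filter, which is the synchronization event described above.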

As mentioned, the front-end of the matched filter receiver typically samples the incoming signal from the LNA. In order to reduce the sampling frequency required of each converter, this is usually done with a number of parallel Analog-to-Digital Converters (ADC). The signal is then passed on
### 4.4 The RAKE receiver

A RAKE receiver is a homodyne cross-correlator utilizing a direct RF-to-baseband conversion. Intermediate frequency conversion is not needed, which makes the implementation simpler than in conventional heterodyne systems. The RAKE receiver takes advantage of multipath propagation by combining a large number of different and independent replicas of the same transmitted pulse, in order to exploit the time diversity of the multipath channel. In general, different approaches to the RAKE structure can support both TH and DS modulated systems, applying soft- or hard-decision detection.

There exist a number of different schemes for exploiting diversity, all of which require some sort of weighting function in the

Prior to the correlators, the signal passes through time delay elements, whose function is to align the multipath components in time. In this case the same correlator mask may be applied to all the correlator banks. The correlator banks are followed by a combiner that determines the variable used for deciding on the transmitted symbol, employing a set of weighting functions to combine the outputs of the correlators at each finger. In general, these weighting functions amplify the strongest components and attenuate the weak ones [4]. Different weighting schemes include [4,3]:

- Maximum Ratio Combining (MRC), where the weights are determined to maximize the SNR.
- Selection Diversity (SD), which selects the strongest multipath components.
- Absolute Combining (AC), which adds the absolute values of all the outputs before feeding the detector.
- Equal Gain Combining (EGC), which adds all the outputs without any weighting.

The choice of weighting scheme typically depends on the type of receiver and the desired structural complexity.
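The four weighting schemes can be compared on a single set of finger outputs. The Python sketch below assumes that the path amplitudes are already known (for example from pilot-symbol channel estimation), that all fingers see the same noise power (so that MRC reduces to weighting by amplitude) and that the path amplitudes are positive, so no sign correction is applied in SD. All function names and numbers are hypothetical.

```python
def mrc(outputs, amplitudes):
    """Maximum ratio combining: weight each finger by its estimated path amplitude."""
    return sum(a * y for a, y in zip(amplitudes, outputs))


def selection_diversity(outputs, amplitudes, k=1):
    """Keep only the k fingers with the strongest estimated paths."""
    order = sorted(range(len(outputs)), key=lambda i: abs(amplitudes[i]), reverse=True)
    return sum(outputs[i] for i in order[:k])


def absolute_combining(outputs):
    """Add the absolute values of all finger outputs."""
    return sum(abs(y) for y in outputs)


def equal_gain_combining(outputs):
    """Add all finger outputs without any weighting."""
    return sum(outputs)


amplitudes = [1.0, 0.6, 0.3, 0.1]      # estimated path gains per finger
outputs = [0.95, 0.7, 0.25, -0.2]      # noisy correlator outputs per finger
print("MRC ", mrc(outputs, amplitudes))
print("SD  ", selection_diversity(outputs, amplitudes, k=2))
print("AC  ", absolute_combining(outputs))
print("EGC ", equal_gain_combining(outputs))
```

MRC needs amplitude estimates for every finger, SD only needs a ranking, while AC and EGC need no channel knowledge at all, which reflects the complexity trade-off mentioned above.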

A RAKE receiver like the one in figure 4.3 must know the time distribution of all the received multipath components. Usually this task is performed by giving the receiver the capability of scanning the channel impulse response and adjusting the delay of the delay elements according to the delays of a certain number of multipath components. Time delay synchronization is performed based on correlation measurements on the received waveform. Knowledge of the amplitudes of the received signal is also required in order to adjust the weighting functions, which is achieved by using pilot symbols for channel estimation [4].

The receiver in figure 4.3 can be greatly simplified if the channel is modeled with a discrete-time impulse response. Figure 4.4 shows a simplified structure of the RAKE receiver, where the different contributions are spaced in time by multiples of the time span of detectable multipath components, called a pulse window or bin. This structure allows a single correlator employing the correlator mask \( m(t) \).

The receiver in figure 4.4 integrates the product between \( m(t) \) and \( r(t) \), and the output of the correlator is sampled at the instants \( t = k\tau \) before passing through the delay units and a combiner implementing a weighting function.
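The discrete-time structure can be sketched as a single stream of correlator samples $z[k]$, taken at $t = k\tau$, fed into a tapped delay line whose taps are weighted and summed. In the sketch below, the bin spacing $\tau$ is one sample, the channel is assumed to have three paths in consecutive bins with gains 1, 0.5 and 0.25, and the weights are set equal to these gains; these values and the function name are hypothetical.

```python
from collections import deque


def simplified_rake(samples, weights):
    """Single correlator followed by a tapped delay line and a weighted combiner.

    samples[k] is the correlator output sampled at t = k * tau. The delay line
    keeps the last len(weights) samples; taps[0] is the oldest, so once the last
    path has arrived, taps[l] holds path l and is multiplied by weights[l].
    """
    taps = deque([0.0] * len(weights), maxlen=len(weights))
    outputs = []
    for z in samples:
        taps.append(z)  # the oldest sample drops out on the left
        outputs.append(sum(w * s for w, s in zip(weights, taps)))
    return outputs


path_gains = [1.0, 0.5, 0.25]                  # hypothetical three-path channel
samples = [0, 0, 1.0, 0.5, 0.25, 0, 0]         # one symbol arriving at k = 2
y = simplified_rake(samples, weights=path_gains)
print(y)
```

With weights equal to the path gains, the combiner output peaks once all three paths are in the delay line, and the peak value equals the summed path energy $1 + 0.5^2 + 0.25^2 = 1.3125$.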

Implementing a RAKE increases the complexity of a receiver in proportion to the number of multipath components analyzed before the decision. However, RAKE receivers profit from a gain obtained simply due to their structure, called the maximum RAKE gain, defined as the ratio of the total energy density to the single-impulse energy density [8]. The maximum RAKE gain can be expressed as

\[
RG_{\text{max}}\,(\text{dB}) = -10 \log_{10} \left( 1 - \exp \left( -\frac{t_0}{\tau_{rms}} \right) \right)
\] (4.6)

where

- \( t_0 \) is the average impulse arrival interval.
- \( \tau_{rms} \) is the rms delay spread.
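One way to read eq. (4.6) is as the energy of an exponentially decaying train of paths, spaced $t_0$ apart with decay constant $\tau_{rms}$, normalized to the energy of the first path: the geometric series $\sum_n e^{-n t_0/\tau_{rms}}$ sums to $1/(1 - e^{-t_0/\tau_{rms}})$. The Python sketch below evaluates the closed form and checks it against the truncated series; the exponential power delay profile is an assumption of the sketch, and the function names are hypothetical.

```python
import math


def max_rake_gain_db(t0, tau_rms):
    """Maximum RAKE gain, eq. (4.6): -10*log10(1 - exp(-t0/tau_rms))."""
    return -10.0 * math.log10(1.0 - math.exp(-t0 / tau_rms))


def rake_gain_from_series(t0, tau_rms, n_paths=10_000):
    """Energy of paths spaced t0 apart with exponential decay, relative to the first path, in dB."""
    total = sum(math.exp(-n * t0 / tau_rms) for n in range(n_paths))
    return 10.0 * math.log10(total)


for ratio in (0.1, 0.5, 1.0):  # hypothetical values of t0 / tau_rms
    print(f"t0/tau_rms = {ratio}: {max_rake_gain_db(ratio, 1.0):.2f} dB "
          f"(series: {rake_gain_from_series(ratio, 1.0):.2f} dB)")
```

Denser multipath relative to the delay spread (a smaller ratio $t_0/\tau_{rms}$) gives a larger maximum RAKE gain.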

The number of resolvable paths in UWB systems is generally considered to exceed 100, because of the fine path resolution. Reducing the number of processed multipath components reduces the energy captured by the receiver. Measurements done in office buildings show that a RAKE receiver requires 50 branches, or fingers, in order to capture 60% of the total energy of the received signal [4].

### 4.5 The orthogonal RAKE architecture

The orthogonal RAKE receiver presented in this thesis is based on a structure called a full RAKE structure. The full RAKE receiver is in reality an approximation of the all-RAKE (ARAKE) receiver. The ARAKE is a theoretically ideal concept, as it captures all of the received signal power by having a finger for each multipath component. Since a signal transmitted in a multipath environment in theory consists of an infinite number of multipath components, the ARAKE requires an infinite number of RAKE branches and an infinite number of correlators. Consequently, implementation of the ARAKE receiver is not possible [3]. The full RAKE receiver, however, has a number of fingers reflecting the time span of a bin. A typical time span of one bin is 50 ns, but it may vary significantly with environmental conditions as a result of different rms delay spreads, as mentioned in section 2.1.5.1.

The RAKE receiver presented is based on an orthogonal structure that has a lot in common with the RAKE receivers discussed in the previous section, especially the receiver structure in figure 4.4. The structure presented in this section, however, is even simpler, based on the assumption that by capturing the energy of the first 50 ns following a detected pulse, enough energy is collected to make a qualified decision on whether a symbol has arrived or not. The correlated signals are combined by a simple EGC that sums all outputs from the correlators on one line without any weighting, which reduces complexity at the cost of performance. Apart from the absence of weighting, the main difference between this RAKE receiver and the conventional RAKE receivers presented in section 4.4 is the implementation of the correlator using a stored template PN-sequence.

In order to detect and shape the incoming signals, a front-end containing an LNA and some coarse bandpass filtering was included in the total receiver structure [24]. The RAKE receiver is only concerned with correlation between incoming pulse trains and the template PN-sequence, and is designed under the assumption that the front-end detects the arriving pulse train and turns each pulse into a rectangular pulse with a duration of 1 ns. Figure 4.5 shows the orthogonal RAKE topology.

The shaped pulse train from the front-end output enters a delay line with a number of taps corresponding to the number of RAKE fingers. As the pulse train propagates through the delay line, the signal is sampled into the RAKE fingers at a suitable rate, preferably 20 MHz. The sampled signal is clocked through the fingers, which are made up of shift registers, one for each finger. A continuous convolution is then performed in each finger, comparing the incoming sequence with the stored template and returning the result of the correlation. When the number of matching bits in a finger reaches a user-specified threshold, the finger outputs '1', indicating a match.

A conclusion on the choice of modulation scheme for this approach was made in chapter 3: the suggested scheme for the proposed receiver structure was a DS-UWB approach employing BPAM with hard decision detection. The orthogonal RAKE receiver presented employs a coded DS-UWB approach, but does not include any processing of the biphase information in BPAM coding. The lack of biphase processing implies that the receiver only detects whether a pulse is present or not, which corresponds to an OOK coding approach.

#### 4.5.1 Sampled delay line

The incoming pulse stream enters a tapped delay line made up of a series of delay elements corresponding to the number of required taps. Since the incoming pulse has a width of 1 ns, it was necessary to use fast elements to achieve the desired functionality. As the inverter is the fastest element in today's CMOS technology, it seemed natural to build the delay elements from inverters.

The delay line "stores" the sequence while the sampling takes place. The total delay of the line spans one bin and is distributed over a number of elements related to the number of RAKE fingers. The delay line is sampled by a clock whose rate reflects the relationship between the number of fingers and the total delay of the line, in this case 20 MHz. This implies a considerable down-conversion of the operating frequency: compared with the 1 GHz rate corresponding to the 1 ns pulse width, a sampling frequency of 20 MHz is a down-conversion by a factor of 50, which leads to lower power consumption.

A sampling clock of 20 MHz means that the sampling operation does not satisfy Nyquist's theorem, which states that a signal has to be sampled at a frequency of at least twice its highest frequency component in order to avoid aliasing. This delay line, however, is not meant to measure frequency, but rather to detect the presence or absence of a bit. One transmitted pulse will appear as a pulse train with a high duty cycle propagating through the delay line, as a result of several multipath components arriving at the receiver, see figure 4.6. The principle behind the sampled delay line is that each multipath component is sampled once on its way through the line. However, it is almost impossible to design a delay of exactly 50 ns, and if the unit delay is longer than 1 ns, the worst case is that the pulse is hidden inside a delay element at the time of sampling. The line was therefore designed for a unit delay of less than 1 ns. This can cause the pulse to overlap two taps, so that in this worst case it is detected at two taps instead of being missed. The possibility of this double detection has to be considered in the detection process; however, it concerns only a small number of fingers and should not introduce major errors.
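The shadowing argument can be checked with an idealized timing model: tap $i$ reproduces the input delayed by $i$ times the unit delay, the pulse is an ideal 1 ns rectangle, and the sampling instant is fixed while the pulse arrival time is swept. The Python sketch below counts the taps that are high at the sampling instant for two hypothetical unit delays, 1.2 ns and 0.9 ns, on either side of the pulse width (all times in ns; function names hypothetical).

```python
def taps_seeing_pulse(arrival, unit_delay, pulse_width=1.0, n_taps=50, t_sample=60.0):
    """Indices of taps whose output is high at the sampling instant.

    Tap i reproduces the input delayed by i * unit_delay, so it is high when
    arrival <= t_sample - i * unit_delay < arrival + pulse_width.
    """
    return [i for i in range(n_taps)
            if arrival <= t_sample - i * unit_delay < arrival + pulse_width]


def tap_count_range(unit_delay, steps=1000):
    """(min, max) number of taps seeing the pulse as its arrival is swept over one unit delay."""
    counts = [len(taps_seeing_pulse(30.0 + k * unit_delay / steps, unit_delay))
              for k in range(steps)]
    return min(counts), max(counts)


for d in (1.2, 0.9):
    print(f"unit delay {d} ns: pulse seen by (min, max) = {tap_count_range(d)} taps")
```

With a unit delay longer than the pulse, some arrival times leave no tap high and the pulse is lost; with a shorter unit delay, the pulse is always seen by at least one tap and occasionally by two, which is the double detection mentioned above.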

The sampled delay line principle requires some accuracy of the sampling clock oscillators in order to provide accurate relative synchronization. In this context, the oscillators providing the 20 MHz clock at both the transmitter and the receiver side have to operate with low frequency drift. Drift in the clock frequency at either side results in errors at the receiver: the sampled pulse trains drift in the delay line from one sampling instant to the next, so that pulses with the same position in the pulse train are detected in different fingers within the duration of a symbol. This could cause misinterpretation of a symbol. It is therefore desirable to use fairly accurate oscillators at both the receiver and the transmitter side in order to minimize errors caused by frequency drift.
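The effect of frequency drift can be estimated with simple arithmetic: if the two 20 MHz clocks differ by a relative error $\Delta f/f$, the sampling instants slip relative to the received pulses by $T_{clk}\,\Delta f/f$ per clock period, so after a symbol of $N$ samples the slip is $N\,T_{clk}\,\Delta f/f$. The sketch below assumes that a multipath component moves to a neighbouring finger once this slip reaches one unit delay (taken as 1 ns), and uses a symbol length of 50 samples purely as an example; the function names are hypothetical.

```python
def slip_per_symbol_ns(symbol_length, drift_ppm, f_clk_hz=20e6):
    """Accumulated slip (ns) between received pulses and sampling instants over one symbol."""
    t_clk_ns = 1e9 / f_clk_hz  # 50 ns at 20 MHz
    return symbol_length * t_clk_ns * drift_ppm * 1e-6


def max_drift_ppm(symbol_length, unit_delay_ns=1.0, f_clk_hz=20e6):
    """Largest relative drift (ppm) keeping the slip within one unit delay per symbol."""
    t_clk_ns = 1e9 / f_clk_hz
    return unit_delay_ns / (symbol_length * t_clk_ns) * 1e6


print(f"slip at 100 ppm over 50 samples: {slip_per_symbol_ns(50, 100):.2f} ns")
print(f"drift limit for 50 samples:      {max_drift_ppm(50):.0f} ppm")
```

Under these assumptions a drift of 100 ppm shifts a component by 0.25 ns over one symbol, and the tolerable drift scales inversely with the symbol length.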

This RAKE receiver does not require any absolute synchronization between the two clock oscillators, as symbol synchronization is obtained in the case of a symbol match.

#### 4.5.2 Synchronization

The incoming pulse is detected and quantized in real time without any synchronization or clocking [24]. Since the incoming sequence is just a continuous stream of pulses, some kind of synchronization is needed. This synchronization is provided by the delay line and a sampling clock. All transmitted data bits arrive as pulse trains occupying the delay line before they are sampled out at a rate of 20 MHz. The synchronization scheme employed is called code synchronization, as the signal is synchronized upon a match. This is also the case for the matched filter and RAKE receiver approaches that use similar techniques to achieve synchronization.

#### 4.5.3 Analog correlation

The samples from the delay line are clocked into a series of shift registers making up the RAKE fingers. The length of each shift register corresponds to the length of the transmitted symbol. This symbol length is the result of a trade-off between channelization and BER: for a given symbol length, a system requiring a higher number of channels gets a higher BER than a system requiring fewer channels.

As a result of the parallel structure of the architecture, several fingers can contain parts of the transmitted symbol simultaneously, depending on the number of multipath components. As described in section 4.5, the sequence held in the shift registers is cross-correlated with a stored template PN-sequence. The actual comparison of voltage levels takes place in each RAKE cell, made up of one D-latch and one correlator. As bits are shifted through the shift register, a running cross-correlation against the stored PN-sequence is performed. This results in a mixed-mode solution, as the correlation combines digital shift registers with analog computation in the correlator.

Each sample from the delay line is clocked in parallel through the receiver. The outputs of all the correlators belonging to a finger are summed on one combiner line. No weighting functions are applied to the outputs of the correlators, which corresponds to a simple version of EGC. The degree of match can then easily be determined by simple hard decision detection, as each finger performs a number of independent decisions equal to the number of bits in a symbol. The number of matching decisions is compared to a threshold, and a match is declared if it exceeds the threshold. The decision in hard decision detection is usually based on a majority criterion; the presented solution differs from this definition in that it includes an adjustable threshold level for the final decision.

The combiner line constitutes a load capacitance, made up of the parasitic capacitance of the metal line itself together with the small drain capacitances of all the transistors connected to the line. The voltage on the line increases at a rate given by the total charging current of the contributing correlators divided by the load capacitance. The decision is carried out by continuously monitoring the voltage level of the combiner line. When this voltage exceeds a threshold voltage set by the desired level of match, the correlator output indicates the presence of a symbol by a binary ‘1’.

There are some statistical considerations to take into account when calculating the probability of a match, but by applying a simple current limitation to each correlator, the steepness of the voltage curve can be adjusted over a range. This makes it possible to adjust the number of correlators required for a symbol match. The result is a simple analog computation that exploits one of the most fundamental laws of electrical engineering, Kirchhoff’s current law. At this point it should be noted that correlation is an energy manipulation process: the receiver must receive enough energy to exceed some threshold, and the correlator integrates the signal energy of a long, low-power signal into a short, high-power signal.
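Under idealized assumptions, the combiner line behaves as a capacitor charged by constant currents, which makes the role of the current limitation explicit. The Python sketch below assumes that each matching correlator sources a fixed current $I_{cell}$ set by the bias voltage, that the line is evaluated after a fixed time $t_{eval}$, and that leakage, saturation and comparator offset can be ignored. The load capacitance (1 pF), threshold (0.6 V), evaluation time (25 ns), the currents and the function names are all hypothetical.

```python
import math


def combiner_voltage(n_matching, i_cell, t, c_load):
    """Ideal combiner line: by Kirchhoff's current law the cell currents add,
    and the summed current charges the load capacitance linearly in time."""
    return n_matching * i_cell * t / c_load


def min_matches(v_threshold, i_cell, t_eval, c_load):
    """Smallest number of matching cells that lifts the line to v_threshold within t_eval."""
    ratio = v_threshold * c_load / (i_cell * t_eval)
    return math.ceil(ratio - 1e-9)  # small guard against round-off at integer ratios


C_LOAD = 1e-12   # 1 pF combiner line (hypothetical)
V_TH = 0.6       # comparator threshold in volts (hypothetical)
T_EVAL = 25e-9   # evaluation time in seconds (hypothetical)

for i_cell in (0.7e-6, 1.3e-6, 2.9e-6):
    n = min_matches(V_TH, i_cell, T_EVAL, C_LOAD)
    v = combiner_voltage(n, i_cell, T_EVAL, C_LOAD)
    print(f"I_cell = {i_cell * 1e6:.1f} uA -> at least {n} matching cells ({v:.3f} V)")
```

Raising the cell current lowers the number of matching correlators needed to trip the comparator, which is the adjustable match threshold described above.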

If more transmission channels are required within the range of a device, the number of symbols can easily be increased to meet that requirement. In that case the correlators in each cell, together with the combiner lines and the code registers, must be duplicated, so that the number of correlators per cell, combiner lines and code registers corresponds to the number of symbols.

### 4.6 Summary

In this chapter, the most popular receiver topologies currently used in UWB-IR applications have been presented. Common receiver structures include correlator receivers, matched filter receivers and RAKE receivers.

A spatial RAKE receiver was proposed as a possible approach for a low-power UWB-IR receiver, using inverter delay lines for synchronization and mixed-mode correlation for symbol recovery. By combining a delay line and a mixed-mode correlator, we can exploit multipath reflections by performing a convolution of the parallel bit streams with a stored template, while at the same time eliminating the need for absolute synchronization. This approach does, however, require some accuracy with regard to relative synchronization. The computation involved is simple, consisting of continuous correlation of discrete values followed by thresholding of a combiner line carrying the sum of all the correlation operations performed. By implementing the receiver with delay lines as described, we exploit parallelism, trading parallel correlation for a lower clock frequency. As a result, lower power consumption is achieved.

Chapter 5

CMOS implementation

In this chapter the design of the RAKE receiver and its building blocks is described. The issues treated are the considerations made during the design process in order to achieve the desired functionality.

The receiver itself is a 50 by 50 matrix of cells, where each cell consists of one D-flipflop and one correlator. The total number of cells in the structure is then 2500, which amounts to a considerable number of transistors. The pulse is distributed through a delay line which is sampled by a clock running at 20 MHz. The sampled signal is then clocked through 50 parallel shift registers and correlated with a stored template. The result of this correlation is a voltage level on a combiner line. When this level reaches an adjustable threshold, a comparator at the end of each line is triggered and gives a binary '1' at its output. In this implementation, no combiner is included to process the data from each of the 50 fingers.

It was desired to keep the devices within the different components as small as possible, due to the number of cells in the structure. Any increase in the area of a cell is multiplied by 2500 in the overall structure, while for the delay elements and the comparators it is multiplied by 50, corresponding to the number of fingers. The basis for the design of each component was therefore to make it compact and small, and for most of the devices minimum size was a natural starting point in the design process.

The choice of technology for the implementation of the RAKE receiver was based on the goal of making a low-cost, low-power circuit in CMOS. The circuit was realized in a 0.12\(\mu m\) standard CMOS process from ST Microelectronics. The tools used for the schematic and layout implementation were Cadence Composer and Cadence Virtuoso, respectively, in a Cadence 4.4.6 environment.
The circuit is designed under the assumption that the incoming pulse is shaped and quantized by a front-end and arrives at the RAKE receiver with a pulse width of 1 ns [24].

The total system is divided into blocks, which are described in the following sections.

### 5.1 Delay line

Synchronization of the received signal is provided by a delay line. The function of this delay line is to store the history of the received pulse, with its corresponding multipath components, within its structure for a certain time. This time corresponds to the relationship between the desired sampling frequency and the chosen time span of one bin, which in this case is 50 ns. As the name suggests, the delay line is simply a chain of delay elements built from inverters.

The size of the transistors used in the inverters was decided as a compromise between the high frequencies involved and the demand for low power consumption. Based on this trade-off and on simulations in Cadence, it was decided that minimum sized transistors are sufficiently capable of handling the present conditions. The simulations were performed on one delay element consisting of a chain of inverters, and indicated that a delay line made up of minimum sized inverters could handle frequencies up to 5 GHz. The simulation is shown in appendix C. Based on this, it seemed likely that these elements could handle a worst case scenario of continuously occurring pulses. Using minimum size, however, implies that the width of the pMOS transistors should be increased to compensate for the difference in mobility. Dc-simulations showed that this corresponds to a pMOS width of approximately 0.5μm for a minimum nMOS width of 0.15μm. The ratio of nMOS to pMOS mobility in the 0.12μm process was thereby calculated to be \( \frac{0.5}{0.15} = 3.33 \). The pMOS transistors were therefore designed with a width of 0.5μm.

The distributed pulse is assumed to have a pulse width of 1 ns. According to simulations, the number of inverters giving a typical delay closest to 1 ns was 34. There are, however, various factors causing inaccuracy in the simulations, such as mismatch, process variations and how accurately the model represents the device implemented in silicon. Another factor is the effect of parasitic capacitances on the performance of the implemented structure. These are important issues that may influence the performance of structures like the delay line, especially with respect to the fine-pitch process applied and the frequencies involved. Thus both the pulse and the delay line had to be designed with this in mind.

One issue that had to be addressed was the shadowing effect that could occur if the delay of the delay elements was too long compared to the pulse width. This would result in a worst case scenario where a pulse might be hidden inside a delay element at the time of sampling, which is clearly unwanted as data could be lost. With a slight overlap between the outputs of two subsequent delay elements, this problem should be avoided. Consequently, the delay elements were designed with 30 inverters instead of the 34 inverters which, according to simulations, had a typical delay closest to 1 ns. Reducing the number of inverters from 34 to 30 reduces the delay, which should provide the overlap necessary to avoid shadowing. The layout of a delay element is shown in appendix B.

The total number of delay elements required for this type of application relates directly to the number of RAKE fingers implemented. As no delay is needed for the finger at the input, the required number of delay elements is one less than the number of fingers. The receiver at hand has a total of 50 fingers, which requires 49 delay elements. With each element consisting of 30 inverters, this adds up to a total of 1470 inverters. The total delay line structure is pictured in appendix B.

Simulations of the delay element and the delay line are shown in figures 5.1 and 5.2. The delay from an event at the input of a delay element to the corresponding change at its output is 852 ps.
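The delay line figures can be combined in a short calculation. The sketch below uses the simulated 852 ps per element, 30 inverters per element and 50 fingers to derive the number of elements and inverters, the total typical delay of the line and the average delay per inverter. It is plain arithmetic on the simulated typical values and ignores mismatch, process variation and parasitics.

```python
ELEMENT_DELAY_PS = 852       # simulated delay of one 30-inverter element (typical)
INVERTERS_PER_ELEMENT = 30
FINGERS = 50

elements = FINGERS - 1                            # no delay element for the first finger
inverters = elements * INVERTERS_PER_ELEMENT
total_delay_ns = elements * ELEMENT_DELAY_PS / 1000
delay_per_inverter_ps = ELEMENT_DELAY_PS / INVERTERS_PER_ELEMENT

print(elements, inverters, round(total_delay_ns, 3), round(delay_per_inverter_ps, 1))
# 49 1470 41.748 28.4
```

Under these typical-case numbers the line spans roughly 41.7 ns, i.e. somewhat less than the 50 ns sampling period.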

Further simulations are shown in appendix C.

### 5.2 Mixed-mode correlator

In order to achieve symbol recovery, some correlation has to take place at the receiver side. As low power consumption remains a major issue, analog computation is exploited: a correlator may be implemented simply using three transistors acting as a current-mode AND-gate, as shown in figure 5.3. The two transistors at the top perform the actual correlation, while the bottom one is a current limiter controlled by a bias voltage. This is a likely solution for a future receiver.

However, it was interesting to explore the property of channelization, which is one of the great advantages of PN-coding. Some measurement of the difference between detecting one or both logic levels was therefore desired, and to achieve this, some functions had to be added to the correlator. By including simple digital logic, two detection modes are available: matching of binary '1' only, and matching of both '1' and '0'. These two modes correspond to the AND and XNOR functions, respectively. Since the logic output drives pMOS transistors, which conduct when their gate is low, the gate itself provides the inverted functions, that is, a combined NAND- and XOR-gate. An extra input to select the mode of operation was therefore required.

The correlator circuit consists of two parts: one part controls the correlation current, and the other decides whether there is a match or not. Since the comparison is performed by the additional logic, the current-mode AND-gate of figure 5.3 is reduced to two transistors, in this case pMOS transistors. The top pMOS is a current limiter controlling the current flow in case of a match; the level of this current is set by an external bias voltage ($V_{bias}$) at the transistor's gate. The bottom pMOS switches the current on and off in response to the output voltage of the digital logic. Placing the switching transistor close to the combiner line was originally intended to shield the power rail from switching noise. Switching noise on the power rails is generally unwanted, as the performance of other devices connected to the same rail could be affected by high-frequency spikes on the supply. This of course implies that the switching noise is imposed on the combiner line instead, which in turn might cause the comparator to trigger an output signal prematurely. This subject is discussed in chapter 7.

A secondary effect of placing the switching transistor close to the combiner line is a reduction of the total load capacitance of the line, since the drain and source capacitances of the biasing transistor and the source capacitance of the switching transistor are not connected to it.

Figure 5.2: A simulation of the delay line, showing the signal after one element, 10 elements and 50 elements. The black waveform is the input clock with a frequency of 500 MHz, which corresponds to a pulse width of 1 ns. The blue, red and green waveforms correspond to the input signal after one, 10 and 50 delay elements, respectively.

The digital logic that compares the two input voltages is organized as differential cascode voltage switch logic (DCVSL), consisting of two pMOS transistors and a pull-down network (PDN). This circuit topology was chosen mainly for its simplicity, as the emphasis at this point was on achieving the desired functionality. However, this logic style completely eliminates static currents and provides a rail-to-rail swing by applying two concepts: differential logic and positive feedback. The differential logic in turn requires each input in complementary format, so three inverters were embedded in the PDN, one for each input. In return, the output signal is also provided in complementary format. When the drain of one pMOS transistor is high, the other pMOS is “closed” as a result of the positive feedback, and vice versa. The PDN is divided into two mutually exclusive parts, and the logic function […]

In the basic principle of the correlator, the drain of the “closed” pMOS is always pulled down to the lower rail [25].

As mentioned, the differential gate provides its output in complementary format. With the topology shown in figure 5.4, one output therefore realizes a combined XOR- and NAND-function, depending on the control input, while the other realizes the complementary combined XNOR- and AND-function. The output used for comparing the two input voltages under the control input is the XOR/NAND output, since it drives the gate of a pMOS transistor: the pMOS conducts when this output is low, so correlation current flows on an XNOR match or an AND match. A truth table of the logic function provided by the differential gate is shown in table 5.1.

<table>
<thead>
<tr>
<th>Control</th>
<th>In1</th>
<th>In2</th>
<th>Out</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 5.1: The truth table of the logic function of the differential gate. When the Control input is '0', the combinatoric function of In1 and In2 corresponds to a NAND-gate, whereas if the Control input is '1', it corresponds to an XOR-gate.

Figure 5.4: The differential gate correlator with its pull-down network

When the Control input is high, the output response to the voltage levels on the other two inputs corresponds to the logical function of an XOR-gate, whereas if the Control input is low, it corresponds to the logical function of a NAND-gate.
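A behavioural model is a convenient way to check table 5.1 against the two detection modes. The sketch below is not a transistor-level description of the DCVSL gate; it assumes, as described above, that the gate output drives a pMOS current switch that conducts when its gate is low. Under that assumption, Control '1' gives correlation current whenever In1 and In2 agree (XNOR), and Control '0' only when both are '1' (AND).

```python
from itertools import product

# Table 5.1 as (Control, In1, In2) -> Out
TABLE_5_1 = {
    (0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 0,
}


def differential_gate(control: int, in1: int, in2: int) -> int:
    """Logic level at the gate output (behavioural model of table 5.1)."""
    if control:
        return in1 ^ in2           # XOR: low when the inputs agree
    return 1 - (in1 & in2)         # NAND: low only when both inputs are '1'


def correlation_current_on(control: int, in1: int, in2: int) -> bool:
    """Assumed pMOS switch: it conducts when its gate is driven low."""
    return differential_gate(control, in1, in2) == 0


for c, a, b in product((0, 1), repeat=3):
    assert differential_gate(c, a, b) == TABLE_5_1[(c, a, b)]
    state = "current on" if correlation_current_on(c, a, b) else "current off"
    print(c, a, b, "->", TABLE_5_1[(c, a, b)], state)
```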

Simulation of the correlator indicated that minimum-sized transistors met the functional requirements. The logic should have no problem performing satisfactorily under the present conditions, as its output is connected to only one pMOS gate and the minimum speed requirement is only 20 MHz. There was some uncertainty regarding the two pMOS transistors controlling the current to the combiner line, since the total load this line represents is unknown due to its parasitics and the drain capacitances of all the interconnected correlators. Despite this uncertainty, it was decided to use minimum-sized transistors here as well as in the rest of the correlator circuit, for two reasons.

- First, there is a relatively high number of correlators on one line, and the length of the combiner line relates directly to the number of correlators on one finger. Even if the number of matching correlators is limited by some statistical factor, a certain percentage of these current sources, probably between 60% and 70%, should be able to handle the load. Adding up the widths of all correlators contributing to the charging of the line yields a considerable total width.

- Second, according to simulations the circuit performed quite well, and a match was achieved with a threshold of 600 mV. Simulations show that the circuit is capable of driving the combiner line even when a heavy load is applied.

Furthermore, the simulations show that by choking the current, the number of clock pulses required for a symbol match can be adjusted over the whole range from 1 to 50. Aiming at a match at 60-70% of the total symbol size, this is more than sufficient.

Simulations of the functionality of the correlator are shown in figure 5.5. The layout of the whole correlator circuit is shown in appendix B.

![Figure 5.5: Simulation showing the functionality of the correlator. The input signals In1 and In2 are shown together with the Control input. The bottom waveform is the output response from the logic, corresponding to the truth table in table 5.1.](image)

5.3 Comparator

The essential function of the RAKE receiver is symbol recovery. Whether or not there is a match in each finger is decided by the combiner line and a comparator. The comparator is placed at the end of each combiner line and continuously compares the voltage level of the combiner line with an externally set trigger level. A symbol match means that a sufficient percentage of the correlators on one finger contribute current to raise the combiner line high enough to trigger the comparator. The comparator then outputs a pulse indicating a match.

The length of the output pulse reflects, to a certain degree, the number of matching correlators, as the combiner line acts like a capacitor and its voltage follows a charging curve. Since the steepness of this curve depends on the current charging the line, the voltage rises faster when the number of contributing correlators is large, and the comparator threshold is crossed earlier.
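The relation between the number of matching correlators and the output pulse length can be illustrated with a simplified model. The sketch below assumes that each matching correlator acts as an ideal constant current source, so the line voltage ramps linearly instead of following the saturating charging curve described above, and that the line is charged during one half of the clock period and pulled down during the other. The load capacitance and unit current in the example are hypothetical values chosen only for illustration, since the actual load is unknown.

```python
def output_pulse_width(n_match: int, i_unit: float, c_load: float,
                       v_threshold: float, f_clk: float) -> float:
    """Comparator output pulse width in seconds (simplified model).

    The line starts at 0 V after the pull-down phase and is charged by
    n_match * i_unit for half a clock period; the comparator output is
    high from the threshold crossing until the next pull-down phase.
    """
    if n_match <= 0:
        return 0.0
    t_charge = 0.5 / f_clk                                 # evaluation half-period
    t_cross = c_load * v_threshold / (n_match * i_unit)    # from C dV/dt = I
    return max(0.0, t_charge - t_cross)


# Hypothetical values: 200 fF load, 0.2 uA per correlator, 600 mV threshold, 20 MHz.
for n in (20, 35, 50):
    width = output_pulse_width(n, 0.2e-6, 200e-15, 0.6, 20e6)
    print(f"{n:2d} matching correlators -> {width * 1e9:.1f} ns")
```

With these hypothetical numbers, 20 matching correlators never reach the threshold within the half period, while 35 and 50 give pulses of roughly 7.9 ns and 13 ns.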

The comparator topology was chosen on the basis of a few preferences, such as structural simplicity, low power consumption and range of adjustment. A schematic of the chosen comparator is shown in figure 5.6.

The comparator is a complementary differential amplifier in which each nMOS operates in push-pull fashion with a corresponding pMOS. As the amplifier is also self-biased through negative feedback, its performance is improved compared to conventional differential amplifiers. The main enhancements are a reduced sensitivity of the active-region biasing to variations in process, temperature and supply, and a capability of supplying switching currents significantly greater than the quiescent bias current. The result is a differential amplifier with a simple structure, capable of operating at high speed with good precision [26].

The circuit is biased in a stable fashion through self-biasing of devices $M_1$ and $M_2$, obtained by connecting their bias-voltage inputs to the internal node $V_{bias}$. This self-biasing creates a negative feedback loop that stabilizes the bias voltages: any variation in operating conditions results in a shift of $V_{bias}$ that corrects the bias voltages [26].

In this case the differential amplifier performance should be sufficient with regard to speed and range of adjustment. With some additional pulse shaping and driving capability provided by the two output inverters, the complete comparator should provide the desired functionality. Simulations of the comparator are shown in figure 5.7.

The comparator was implemented with minimum-sized transistors at the two inputs, with the pMOS transistors sized according to the difference in mobility explained in section 5.1. The top and bottom transistors were given twice the width of the input inverters, as they have to carry the current of both input inverters. The first inverter in the output buffer is minimum-sized, while the second inverter has twice the minimum width in order to add some driving capability at the output. The layout of the comparator is shown in appendix B.

5.4 RAKE finger

The RAKE finger is a serial structure of RAKE elements, or cells, as shown in figure 5.8. The RAKE cell is the main building block of the RAKE receiver, which consists of no fewer than 2500 RAKE cells. The cell has two main tasks: to act as an element in the shift register of the finger it belongs to, and to compare the present state of its latch output with a template state. The RAKE cell consists of two main parts, a D-latch and a correlator. A block schematic of the RAKE cell is depicted in figure 5.9.

A D-latch with clock pulses on its control input is transparent as long as the pulse remains at the active level, so the latch output follows its input during the whole active period of the clock. In a chain of such latches, data would propagate through several stages within one clock pulse instead of advancing one stage per clock. A master-slave function was therefore required, and a master-slave flip-flop was implemented to achieve the desired functionality.
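The difference between a chain of transparent latches and a master-slave shift register can be shown with a behavioural sketch. The model below ignores gate delays entirely, which exaggerates the effect (in silicon the data would ripple only as far as the gate delays allow during the active clock phase), but it shows why a plain D-latch chain does not shift one stage per clock.

```python
from typing import List


def clock_transparent_latches(stages: List[int], data_in: int) -> List[int]:
    """One active clock phase on a chain of latches sharing the same clock.

    Every latch is transparent at the same time, so each stage copies the
    already-updated output of its predecessor: the input ripples through
    the whole chain (gate delays ignored).
    """
    result = list(stages)
    for i in range(len(result)):
        result[i] = data_in if i == 0 else result[i - 1]
    return result


def clock_master_slave(stages: List[int], data_in: int) -> List[int]:
    """One clock period on a chain of master-slave flip-flops.

    All masters sample their inputs while the slaves still hold the old
    values, so every bit moves exactly one stage.
    """
    return [data_in] + stages[:-1]


register = [0] * 5
print(clock_transparent_latches(register, 1))   # the whole chain is overwritten
print(clock_master_slave(register, 1))          # only the first stage changes
```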

The D flip-flop and the extra inverter were implemented with minimum-sized transistors only, in order to minimize power consumption and because the main focus for this part of the circuit was on functionality. At the required operating speed of 20 MHz, the logic should be able to provide the desired functionality with minimum-sized transistors. Simulations showed that the shift register had no problem handling frequencies above 100 MHz.

Figure 5.7: The functionality of the comparator is shown with a 20 MHz input signal and a range of adjustment from 320 mV to 690 mV. The solid line is the input signal, a voltage rising linearly from zero to one volt in 50 ns and falling back to zero at 100 ns. The dashed line shows how the output responds to the changes in input voltage when the threshold voltage is set to 310 mV. The dotted line shows the output voltage response when the threshold is set to 690 mV. These two voltages determine the range of operation of this comparator. Note the hysteresis between the switching points at the positive and negative edges.

The layout of the RAKE cell is shown in appendix B. The RAKE cell was designed to make the construction of the RAKE fingers, and eventually the whole matrix, easy.

Simulations of a finger consisting of 50 RAKE cells are shown in figures 5.11 and 5.12.

5.4.1 Combiner line and pre-charging

The output currents from the correlators are summed on a single wire by interconnection. This combiner line is a fairly large structure, as it has to be long enough to connect all 50 correlators. The line therefore acts as a load capacitance, made up of the parasitic capacitance of the metal line itself and the drain capacitances of the connected correlator and pull-down transistors. An estimate of the total load is not easily made, as the parameters of the applied process have not been made available by the supplier.

The concept of the combiner line configuration is shown in figure 5.10. The circuit works like a wired, current-limited AND-gate. When the inputs are low, a unit current, set by the bias voltage, is delivered by the correlator. The total current from one finger is then

$$I_{\text{finger}} = \sum_{j=1}^{N} c_j I_j, \qquad c_j \in \{0, 1\} \tag{5.1}$$

where

• $N$ is the length of the pseudo random sequence.
• $I_j$ is the unit current from each correlator.
• $c_j$ is 1 if correlator $j$ detects a match and 0 otherwise.

The finger current, $I_{\text{finger}}$, is directly proportional to the degree of match between the stored pseudo random pattern and the received bit sequence. By balancing the finger current against an appropriate pull-down current, the output will be high if and only if a sufficient degree of match is present. Instead of a constantly flowing pull-down current, the combiner line is pulled down between samples, controlled by an inverted version of the sample clock. This pull-down provides a simple pre-charging of the combiner line and lowers the power consumption, as a constant current flow is avoided.
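Equation (5.1) and the threshold decision can be sketched as follows. This is a behavioural model under stated assumptions: all unit currents $I_j$ are taken as equal (ignoring mismatch), the threshold is expressed as a hypothetical fraction of the maximum finger current within the 60-70% range mentioned earlier, and the two detection modes follow the description of the correlator in section 5.2.

```python
from typing import Sequence


def finger_current(template: Sequence[int], received: Sequence[int],
                   i_unit: float, both_levels: bool = True) -> float:
    """Eq. (5.1): I_finger = sum_j c_j * I_j with equal unit currents I_j."""
    if len(template) != len(received):
        raise ValueError("template and received sequence must have the same length")
    total = 0.0
    for t, r in zip(template, received):
        # c_j = 1 for a matching chip: equal levels, or both '1' in '1'-only mode
        c_j = (t == r) if both_levels else (t == 1 and r == 1)
        total += i_unit * int(c_j)
    return total


def finger_match(template: Sequence[int], received: Sequence[int], i_unit: float,
                 threshold_fraction: float = 0.65, both_levels: bool = True) -> bool:
    """Hypothetical threshold: a match if the current reaches a fraction of N * I_j."""
    i_threshold = threshold_fraction * len(template) * i_unit
    return finger_current(template, received, i_unit, both_levels) >= i_threshold


template = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
received = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # two chips differ
print(finger_current(template, received, 1.0))                      # both levels
print(finger_current(template, received, 1.0, both_levels=False))   # '1' only
print(finger_match(template, received, 1.0))
```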

![Diagram of the combiner line with pre-charging](image)

Figure 5.10: The principle of the combiner line with pre-charging. $V_{\text{match}}$ corresponds to the connection to the output threshold line.

The pre-charging is performed by two nMOS transistors in series which, like the correlator output transistors, function as a current-limited AND-gate. The top transistor is driven by an inverted version of the sampling clock in order to pull down the combiner line between samples, while the bottom transistor limits the current. As for the correlator, placing the switching transistor close to the combiner line is a noise precaution: high-frequency variations on the power supply were undesired, and consequently the combiner line is exposed to the noise instead.

The pre-charge transistors had to be relatively large, since they must remove all the charge from the combiner line within half a clock period. To ensure the functionality of the system, they were designed with a width of $5\mu m$. According to simulations a width of $0.5\mu m$ was enough, but due to the uncertainty regarding the load on the combiner line, this width was increased by a factor of ten. The width of these transistors has no direct effect on the power consumption, as the pull-down circuit always pulls the combiner line down to zero regardless of its voltage level. A width of $5\mu m$ should be more than sufficient to provide the desired function.
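The timing requirement on the pre-charge transistors can be expressed as an average current: the charge $Q = CV$ on the combiner line must be removed within half a clock period. The sketch below computes that average current; the load capacitance is a hypothetical value, since the actual load is unknown. The transistor width does not enter the charge moved each cycle, which is consistent with the statement that the width has no direct effect on power consumption.

```python
def required_pulldown_current(c_load: float, v_line: float, f_clk: float) -> float:
    """Average current (A) needed to discharge the combiner line in half a period.

    The charge Q = C * V is set by the load and the line voltage only; the
    pull-down width affects how fast it is removed, not how much is removed.
    """
    charge = c_load * v_line          # Q = C * V stored on the line
    half_period = 0.5 / f_clk         # the pull-down phase lasts T/2
    return charge / half_period


# Hypothetical worst case: 200 fF load fully charged to a 1.2 V supply at 20 MHz.
i_avg = required_pulldown_current(200e-15, 1.2, 20e6)
print(f"average pull-down current: {i_avg * 1e6:.1f} uA")
```

The required current scales linearly with the load, so, to first order, a tenfold width margin covers a load roughly ten times larger than the one assumed in simulation.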

Simulations of the circuit are shown in figures 5.11 and 5.12. The figures show the output of a simulation of a RAKE finger with the same setup, but with different bias voltages applied to the pMOS transistors in the correlators. The comparator threshold voltage is set to 600 mV. The top curve shows the clock, the middle curve shows the voltage level of the output line, and the bottom curve shows the output of the comparator. The template shift register has been filled with ones only, and the correlator is set to correlate on ones only. The input of the finger is connected to the positive power rail, so that a signal of only ones travels one step along the RAKE finger shift register each clock period. For each period, the number of correlators contributing to pulling up the output line therefore increases by one. As expected, the mixed-mode circuits handle the modest frequency requirement of the spatial RAKE receiver. The figures show that the number of matching correlators required to obtain a response from the comparator can be adjusted over a range. Corner simulations performed on the RAKE finger confirm its functionality under process variations; the results are shown in figure 5.13. The different corners clearly affect the bias voltage necessary for the desired percentage of match. The simulations also show that the circuit performance depends mostly on the properties of the pMOS: a slow or fast pMOS has a larger effect on the performance than a slow or fast nMOS.

Figure 5.11: *This is a simulation showing the functionality of one RAKE finger. In this simulation the correlator bias voltage was set to 650mV.*

It was not possible to perform Monte Carlo simulations on the RAKE finger due to the time it took to simulate one iteration. A full Monte Carlo simulation would take months. Additional simulations are shown in appendix C.

5.5 System implementation

The overall structure is fairly complex, consisting of a matrix of 2500 cell elements and a delay line. The cell elements are organized in a finger structure of 50 fingers. The fingers were constructed so that they could be folded together two by two, giving 25 “double” fingers and simplifying the power supply of the structure. The Vdd power rails were supplied from the right side, while the ground rails were supplied from the left side. In addition, the design had to include a shift register for storing the PN-sequence used for correlation.

Figure 5.12: This simulation shows the result of the same setup with the correlator bias voltage set to 700 mV.

There were also concerns regarding the routing of the different components and of the total system. All routing was done so as to minimize the area of overlap between the different metal layers, thus minimizing coupling noise. The power rails were routed according to the finger structure, with Vdd and ground connections entering the circuit from opposite sides.

In order to reduce the number of pads required, 48 of the outputs were multiplexed by six 8-to-1 multiplexers (MUXes). The MUXes therefore had to handle a shifting frequency of 20 MHz × 8 = 160 MHz. Pre-designed MUXes were available in a library provided by the supplier of the design kit; however, no performance specifications were provided.

Buffers capable of driving a load of 1 pF were designed to drive the output pads, which according to the documentation present a load of up to 0.85 pF for some of the pads. The layout of these buffers is shown in appendix B.

The RAKE receiver contains over 103000 transistors and performs a significant number of operations per second. The sampled signal enters the matrix at a rate of 20 MHz, resulting in 50 correlation operations per finger per clock period. As there are 50 fingers, this results in a total of 2500 operations per clock period, or 2500 × 20 MHz = 50 Gops (giga-operations per second). The main specifications of the circuit are presented in table 5.2.
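The throughput figure and the multiplexer requirements follow from a few counts given in this section. The sketch below simply repeats that arithmetic; it counts one correlation per RAKE cell per sampling clock period as one operation, as the text does, and uses the approximate 50 μW matrix power from table 5.2 for the operations-per-watt figure.

```python
FINGERS = 50
CELLS_PER_FINGER = 50
F_SAMPLE_HZ = 20e6
MUXED_OUTPUTS = 48
MUX_INPUTS = 8              # 8-to-1 multiplexers
MATRIX_POWER_W = 50e-6      # approximate value from table 5.2

ops_per_clock = FINGERS * CELLS_PER_FINGER         # one correlation per RAKE cell
ops_per_second = ops_per_clock * F_SAMPLE_HZ
mux_count = MUXED_OUTPUTS // MUX_INPUTS
mux_shift_rate_hz = F_SAMPLE_HZ * MUX_INPUTS
ops_per_watt = ops_per_second / MATRIX_POWER_W

print(f"{ops_per_clock} operations per clock period")
print(f"{ops_per_second / 1e9:.0f} Gops")
print(f"{mux_count} multiplexers shifting at {mux_shift_rate_hz / 1e6:.0f} MHz")
print(f"about {ops_per_watt:.1e} operations per watt")
```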
Figure 5.13: Corner simulations of the RAKE finger. The different corners clearly affect the bias voltage necessary for the desired percentage of match. The simulations also show that the circuit performance depends mostly on the properties of the pMOS: a slow or fast pMOS has a larger effect on the performance than a slow or fast nMOS.

The size of the complete receiver structure is 993 µm by 682 µm. Figure 5.14 shows a picture of the RAKE receiver layout. The delay line can be seen on the left side, while the six multiplexers and the outputs from the comparators are easily observed on the right side.

Some additional circuits were added for verification purposes, such as buffered outputs from the delay line, the template shift register and the combiner line of the top finger.

5.6 Summary

There are many concerns to be considered in designing a structure like the RAKE receiver. At this point, the main concern was the overall functionality at an operating speed of 20 MHz, with the possibility of increasing the sampling frequency. The components were designed with this in mind. The layout blocks were also designed with respect to area and power consumption: as the devices are replicated in large numbers, any increase in cell size beyond the minimum necessary is multiplied in the overall size. The implementation of the correlator matrix made the design of a cell structure necessary, and putting these cells together to make up a RAKE finger made the design of the overall structure easier. The correlator matrix performs 50 Gops while consuming approximately 50μW. This is a large number of operations per watt compared to circuits implementing Digital Signal Processing (DSP) solutions.

<table>
<thead>
<tr>
<th>Characteristic</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Layout area</td>
<td>993μm × 682μm</td>
</tr>
<tr>
<td>Chip area</td>
<td>1.8mm × 1.3mm</td>
</tr>
<tr>
<td>Technology</td>
<td>0.12μm standard CMOS process</td>
</tr>
<tr>
<td>Number of transistors</td>
<td>over 103000</td>
</tr>
<tr>
<td>Supply voltage</td>
<td>1V to 1.2V</td>
</tr>
<tr>
<td>Operations per second</td>
<td>50Gops</td>
</tr>
<tr>
<td>Power consumption of 50×50 matrix</td>
<td>~ 50μW</td>
</tr>
<tr>
<td>Power consumption of delay line</td>
<td>~ 1.5mW</td>
</tr>
<tr>
<td>Sampling clock frequency</td>
<td>20MHz</td>
</tr>
</tbody>
</table>

Table 5.2: Table of the circuit specifications. The maximum power consumption figures correspond to a match at every correlator and a delay line running at 1 GHz.
Figure 5.14: *Picture of the RAKE-receiver layout.*
Chapter 6

Measurements

This chapter presents the measured results of the implemented and fabricated chip, together with the Printed Circuit Board (PCB) designed and manufactured for testing and measuring the prototype chip.

6.1 PCB design

In order to perform measurements on the chip, a PCB had to be designed and manufactured. In addition components capable of handling the frequencies going in and coming out of the chip had to be purchased.

The components required for testing the RAKE receiver included crystal oscillators, frequency multipliers and dividers, level shifters, shift registers and latches, voltage regulators, and a Matched Impedance Connector (MICTOR) making it possible to connect a Field Programmable Gate Array (FPGA) card. The external components used on the PCB are briefly described in table A.1.

The frequency multipliers and dividers provided the correct frequencies at the clock and address inputs of the MUXes. The level shifters were required to translate between the different logic-high levels of the components. Shift registers and latches provided the demultiplexing required to read out the data from the correlator matrix. The MICTOR connects the PCB to an FPGA Peripheral Component Interconnect (PCI) co-processing card, making it possible to read out the data on a computer.

It was desired to test the functionality at sample clock frequencies up to 60 MHz. Using frequency multipliers and dividers, sample clocks of 20 MHz, 40 MHz and 60 MHz were obtained. As the RAKE receiver is sensitive to frequency drift of the sample clock, the two crystal oscillators providing the sample clock frequencies were chosen with this in mind. Their specified frequency drift is ±0.5 ppm for the 40 MHz oscillator and ±50 ppm for the 60 MHz oscillator, which is quite accurate, especially for the former.
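The ppm figures translate into absolute frequency deviations as shown below. The sketch only converts the oscillator specifications quoted above; it does not model how much drift the receiver can tolerate.

```python
def drift_hz(f_nominal_hz: float, ppm: float) -> float:
    """Maximum absolute frequency deviation for a +/- ppm specification."""
    return f_nominal_hz * ppm * 1e-6


for f_nominal, ppm in ((40e6, 0.5), (60e6, 50.0)):
    print(f"{f_nominal / 1e6:.0f} MHz, +/-{ppm} ppm -> +/-{drift_hz(f_nominal, ppm):.0f} Hz")
```

This corresponds to at most ±20 Hz for the 40 MHz oscillator and ±3 kHz for the 60 MHz oscillator.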

6.2 Measurement results

The measurements presented in this section show the basic functionality of the correlation process.

The measurements were affected by noise, both high-frequency noise and periodic 50 Hz noise. This limited the performance, especially the ability to apply signals to the receiver input. The presented results were therefore obtained with a setup where the data input was controlled by the template shift register. The correlators were set to correlate on ones. The result is a waveform containing 20 MHz output pulses in the event of a match, which in this setting occurs when the input data is high.
Figure 6.1: Measurement of the shift register operating at 60 MHz. This was the maximum frequency the shift register could handle without introducing errors. The input clock is shown at the top and the input data at the bottom; the output from the shift register is shown in the middle.
Figure 6.2: Two measurements of single match pulses. The top waveform shows the pulses at the output of a RAKE finger in the case of a match; the sampling frequency of 20 MHz is clearly visible. In the bottom waveform, the bias voltage is adjusted to a slightly lower level than in the top waveform. This makes the contribution from each correlator larger, so the voltage level of the combiner line increases faster. The result is a wider pulse at the output.

Figure 6.3: The function of the correlators. In this measurement, the template shift register is fed with the signal shown in the bottom waveform, and the correlator shift register is filled with ones. The output from the correlators is shown in the top waveform, indicating a match when the template shift register contains ones. The square pulse shapes observed in the bottom waveform are composed of a large number of single match pulses, as shown in figure.
Figure 6.4: The result of a measurement performed with the correlator sampling frequency at 40 MHz. The bottom waveform is the result of a simulation performed with the same setup for comparison. The data rate in the simulation is higher, but the simulation confirms the appearance of the measured pulses in this setting. The square pulse shapes observed in the bottom waveform are composed of a large number of single match pulses, as shown in figure.

Figure 6.5: The output response with a clock rate of 1 MHz at the template shift register; in this case the data rate is 100 kHz. The bottom waveform clearly shows the widening of the output pulse train caused by the time it takes to fill and empty the shift register. As a match is obtained at the output with only a few ones in the shift register, the output indicates a match after a few clock pulses. When the input falls to a low level, it takes time to shift the remaining ones through the shift register, resulting in an output pulse train that is wider than the input pulse.
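The widening in figure 6.5 can be reproduced with a simple count-based model. The sketch assumes a 50-stage register, correlators set to match on ones, and a finger output that is high whenever at least k ones are in the register; the threshold k and the input pulse length are hypothetical, and the analog charging of the combiner line is ignored.

```python
from typing import List, Sequence


def finger_output(input_bits: Sequence[int], length: int, k: int) -> List[int]:
    """Match output per clock for a shift register correlating on ones."""
    register = [0] * length
    output = []
    for bit in input_bits:
        register = [bit] + register[:-1]          # shift one stage per clock
        output.append(1 if sum(register) >= k else 0)
    return output


# Hypothetical input: ten clock periods high, then low long enough to empty the register.
bits = [1] * 10 + [0] * 60
out = finger_output(bits, length=50, k=3)
first, last = out.index(1), len(out) - 1 - out[::-1].index(1)
print(f"input high for {sum(bits)} clocks, output high from clock {first} to {last}")
```

With these values the output stays high for 55 clock periods although the input is high for only 10: it rises once k ones have entered the register and falls only when fewer than k ones remain.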

6.3 Summary

This chapter has presented measurements showing the basic functionality of the chip. The correlator function is clear; however, significant noise impeded the measurements, indicating that further noise precautions should be included in a future design. It could be interesting to move the measurement setup to another location, as the noise conditions at the present location could be one reason for the noise problems.
Chapter 7

Discussion

In this thesis a spatial RAKE receiver for UWB-IR applications has been presented. A prototype chip has been designed, implemented and fabricated. Publications of this work, including the front-end, are included in appendix E.

The thesis describes the theoretical background and some existing solutions for UWB-IR receivers, as well as the design and implementation of the presented RAKE receiver.

The theoretical background consists of a characterization of the channel effects that influence the reception of UWB-IR signals, together with a description of modulation techniques used to improve symbol detection. In chapter 4, some theory on multiple access interference was described along with common receiver architectures for UWB-IR, and the proposed RAKE receiver was presented with its main principles of operation. In chapter 5, the design and implementation of the different building blocks are described, with simulations showing the functionality of the circuit topologies. Chapter 6 provides the measured results from the circuit.

The progress of the project provided many challenges. The 0.12μm process was not available to me until September 2004. The initial design and simulation were therefore carried out in a 0.35μm process from Austria Microsystems (AMS). These simulations confirmed the need for a faster process with respect to the high frequencies involved in the signal processing. Some simulations of the RAKE receiver functionality were carried out, and some power consumption estimates were obtained. These simulations showed a higher power consumption than expected. However, better performance has been achieved with the same process, indicating that the models used in the AMS Hit-Kit were somewhat conservative and did not serve our purpose very well. The power estimates obtained through these simulations are provided in appendix C.

Once the 0.12μm process was installed, the design, simulation and eventually the layout of the RAKE receiver could begin. It soon turned out that the design kit provided by Circuits Multi-Project (CMP) was not compatible with the design tools; for example, extraction of parasitics was not supported with Diva. During the design process no parasitics were extracted, and no simulation was performed on the layout. In addition, design parameters and design rules were not provided, making the layout design a long and tedious process, as CMP demanded zero Design Rule Check (DRC) errors.

However, aiming at an operating frequency of 20 MHz for the correlator matrix, the influence of parasitic capacitances would not be critical for these parts of the circuit. Minimum-sized transistors were used wherever simulations showed that they provided the necessary functionality, reducing the total area of the circuit. As an increase in the area of one cell would be multiplied by 2500, it was desired to minimize the cell size. The major contributor to the cell size was the correlator, as can be seen in the layout of the RAKE cell in appendix B.

The correlator circuit was designed to hold additional functionality, resulting in a much larger and more complex structure than the three-transistor solution (see figure 5.3) proposed for a future receiver. This of course affected the area and power consumption. Simulations carried out on a RAKE finger implemented with the three-transistor correlator show that the power consumption was 7-8% lower for this solution. This is probably due to the dynamic power consumption of DCVSL logic: even though its static power consumption is eliminated, its dynamic power consumption is high. This is, however, not a major issue, as future versions of the RAKE receiver will not require the additional functionality of the applied correlator.

With respect to size, the correlator made up about 40% of the total RAKE cell area. This means that a simpler correlator approach can reduce the total receiver area significantly.
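A small calculation shows why the cell area matters so much. The Python sketch below is a minimal model that uses a normalized cell area and a hypothetical shrink factor for a simpler correlator. Only the 2500 cells and the roughly 40% correlator share come from the text above.

```python
def receiver_area(cell_area, correlator_fraction, correlator_scale, n_cells=2500):
    """Return (original_total, new_total) area of the RAKE cell matrix.

    cell_area: area of one RAKE cell (normalized to 1.0 in the example).
    correlator_fraction: share of the cell taken by the correlator (about 0.40).
    correlator_scale: hypothetical ratio of new to old correlator area.
    n_cells: number of RAKE cells (50 x 50 = 2500).
    """
    original_total = n_cells * cell_area
    rest_of_cell = cell_area * (1.0 - correlator_fraction)
    new_correlator = cell_area * correlator_fraction * correlator_scale
    return original_total, n_cells * (rest_of_cell + new_correlator)


# Assumed example: the simpler correlator is a quarter of the size of the current one.
orig, new = receiver_area(cell_area=1.0, correlator_fraction=0.40, correlator_scale=0.25)
print(f"total area reduced by {100 * (1 - new / orig):.0f}%")  # total area reduced by 30%
```

The model also shows that any fixed area saved in one cell is saved 2500 times across the matrix.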

The correlators and the pull-down transistors, which provide the pull-up and pull-down of the combiner line voltage level, are arranged with the switching transistors close to the combiner line and the bias transistor close to the power rail. This may, however, not be an optimal solution. The original reason for this arrangement was to shield the power rails from switching noise, exposing the combiner line to the noise instead. It is not known whether this causes some of the noise in the measured results. It is usually desirable to minimize noise on the power rails, since such noise may cause problems in other circuits connected to the same supply.

The pull-down transistors were designed with ten times the width required according to the simulations. This was a precaution, since an insufficient pull-down of the combiner line would cause the receiver to malfunction.

The comparator circuit proved to be a simple solution that served its purpose, and measurements confirmed its functionality. In a future implementation of a RAKE receiver it might be desirable to design a fixed threshold circuit, but the presented comparator is a good candidate as long as the threshold voltage needs some adjustment.

The presented delay line is not an optimal solution. The line of inverters consumes a lot of power compared to the rest of the circuit, and a solution with longer transistors inside each element would be preferable. Monte Carlo simulations were performed to see the effect of process, mismatch and temperature variations. This initial simulation showed that the time delay was within the 1ns limit. However, it later turned out that Monte Carlo simulations performed in the 0.12µm Hit-Kit provided by CMP only took mismatch into account. Monte Carlo simulations therefore had to be carried out in each process corner. These later simulations showed that the worst-case unit time delay was longer than 1ns. A future delay line design therefore has to take both Monte Carlo and corner simulation results into account. The limited time before the deadline for submitting the chip for fabrication meant that the design was insufficiently simulated before implementation.
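A small behavioral model illustrates the difference between a mismatch-only Monte Carlo run and a run repeated in every process corner. The Python sketch below is not a circuit simulation. The nominal per-corner delays and the mismatch sigma are hypothetical numbers, chosen only to show how a design can pass at the typical corner yet exceed the 1ns limit in a slow corner.

```python
import random

DELAY_LIMIT_NS = 1.0        # maximum allowed unit delay of one delay element
MISMATCH_SIGMA_NS = 0.02    # hypothetical standard deviation caused by mismatch

# Hypothetical nominal unit delays (ns) per process corner (nMOS/pMOS).
CORNER_NOMINAL_NS = {
    "typical": 0.85,
    "fast-fast": 0.70,
    "slow-slow": 1.02,
    "fast-slow": 0.84,
    "slow-fast": 0.88,
}


def worst_case_delay(nominal_ns, runs, rng):
    """Mismatch-only Monte Carlo around one corner: return the largest sampled delay."""
    return max(rng.gauss(nominal_ns, MISMATCH_SIGMA_NS) for _ in range(runs))


def corner_monte_carlo(runs=500, seed=1):
    """Repeat the mismatch Monte Carlo in every process corner."""
    rng = random.Random(seed)
    return {corner: worst_case_delay(nominal, runs, rng)
            for corner, nominal in CORNER_NOMINAL_NS.items()}


results = corner_monte_carlo()
for corner, worst in results.items():
    status = "ok" if worst <= DELAY_LIMIT_NS else "exceeds 1 ns"
    print(f"{corner:10s} worst case {worst:.3f} ns  {status}")
```

A mismatch-only run at the typical corner would report only the first line. The slow corner only shows up when the Monte Carlo analysis is repeated per corner.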

An improved delay element has been designed. Simulations show a decrease in power consumption of about 60%, obtained simply by placing two slow inverters, with minimum width and lengths of 0.5µm and 1µm respectively, between two faster minimum-sized inverters used for pulse shaping. Simulations performed on the improved delay element are shown in section C.1. Monte Carlo simulations performed in each corner show that the delay spread of the improved element under process and mismatch variations stays within the 1ns maximum time delay, as desired. The power consumption could be reduced further by using a poly line between the two inverters at the input and output of the delay element.

The complete system was implemented with the designed layout blocks arranged as neatly as possible. The routing was done with noise precautions in mind to minimize coupling noise. The chip was fabricated at CMP and returned in April.

The measurements performed on the circuit confirm the functionality of the RAKE fingers with their correlators, pull-down transistors and comparators. As expected, the mixed-mode circuits handle the modest frequency demands. However, simulations of the shift register showed that a receiver with this type of construction will have problems functioning at sampling frequencies higher than 60 MHz. The measurements were also influenced by noise, indicating that a future receiver should include further noise precautions. Shielding the circuit, both on-chip and externally, is one possible improvement.

It is also possible that the location at which the measurements were performed could have had some influence on the measurements due to noise conditions. It could therefore be interesting to perform measurements on the circuit in a less noisy environment.

The summary section of chapter 3 concluded that the modulation scheme for the approach presented in this thesis should be a DS-UWB approach employing BPAM with hard decision detection. The orthogonal RAKE receiver presented employs a DS-UWB code synchronization approach, but does not process the biphase information in BPAM coding. Without biphase processing, the receiver only detects whether a pulse is present or not, which corresponds to an OOK coding approach. Biphase detection should be pursued in a future receiver design.
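A simplified model shows what is lost without biphase processing. The Python sketch below assumes ideal, noiseless antipodal pulse amplitudes of ±0.9 and a detection threshold of 0.5, both hypothetical values. A presence detector, as in OOK, reports a pulse in every slot. A BPAM hard decision on the sign recovers the transmitted bits.

```python
def presence_detect(samples, threshold):
    """OOK-style detection: 1 if a pulse of either polarity exceeds the threshold."""
    return [1 if abs(s) > threshold else 0 for s in samples]


def bpam_hard_decision(samples):
    """Antipodal (BPAM) hard decision: the sign of each sample carries the bit."""
    return [1 if s > 0 else 0 for s in samples]


tx_bits = [1, 0, 1, 1, 0]
received = [0.9 if b else -0.9 for b in tx_bits]  # ideal antipodal pulses, no noise

print(presence_detect(received, 0.5))  # [1, 1, 1, 1, 1]
print(bpam_hard_decision(received))    # [1, 0, 1, 1, 0]
```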

As soon as the chip design was submitted, work on the PCB started. Components capable of handling the present conditions had to be found and purchased. Footprints of the components were designed, and a PCB was fabricated at Elprint. So far the oscillators and the frequency multipliers and dividers have proved their function.

The work on the PCB was more extensive than expected and therefore took longer to finish than planned. When the measurements were initiated, it turned out that the PCB needed some modification. The modification was performed by Nor-Team, as the available equipment was not sufficient for such fine-pitch operations.

Additional measurements should be performed to further confirm the performance and limitations of the RAKE receiver circuit. Data recovery should also be attempted using a PC-mounted FPGA PCI co-processing card.

The project has been challenging, but most of all it has been an informative and educational process working in an exciting research area.
Chapter 8

Conclusion and proposal for future work

8.1 Conclusion

In this thesis a low-power solution for time-domain processing UWB receivers has been presented. The receiver is a mixed-mode RAKE structure that combines digital delay lines with analog computation in a series of parallel taps of a synchronizing delay line. This delay line removes the need for absolute synchronization with a transmitter. In each parallel bit stream, a simple analog correlation operation performs a real-time convolution with a stored template. High-speed clocks are avoided, with an estimated maximum clock rate of 20MHz, which enables a power efficient implementation. By combining a delay line and a mixed-mode correlator, we exploit multipath reflections and obtain symbol probability matching using Kirchhoff's current law. The RAKE receiver is implemented and realized in a standard 0.12µm CMOS technology. Simulations show promising results, and measurements confirm the fundamental functionality.

The main contribution of this thesis is the proposed orthogonal RAKE architecture. The author's discoveries and ideas are presented in chapters 4 and 5. The proposed structure is a low-power solution that offers large real-time computational capacity using simple mixed-mode structures. DSP approaches performing the same number of operations per second would consume considerably more power than the presented solution. There are still issues to be addressed, but the overall structure should provide a power efficient alternative for UWB-IR receivers.
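The operations-per-second claim can be put in numbers. The 50x50 matrix holds 2500 correlators, each evaluated once per sample clock at up to 20MHz. The short Python calculation below assumes one 1-bit correlation per correlator per clock. For comparison, it also assumes a hypothetical processor that completes one such correlation per cycle.

```python
N_CORRELATORS = 2500      # 50 x 50 RAKE cells, one correlator each
F_SAMPLE_HZ = 20e6        # estimated maximum sample clock

correlations_per_s = N_CORRELATORS * F_SAMPLE_HZ
print(f"{correlations_per_s:.1e} correlations/s")  # 5.0e+10 correlations/s

# Hypothetical sequential processor doing one 1-bit correlation per clock cycle.
required_clock_ghz = correlations_per_s / 1e9
print(f"equivalent sequential clock: {required_clock_ghz:.0f} GHz")  # 50 GHz
```

A real DSP could pack many bits into one word, so the 50 GHz figure is an upper bound on the required clock rate, not a power estimate.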

A paper presenting this receiver has been accepted for publication at the International Conference on Ultra Wideband (ICU) this year. In addition, one paper presenting the principle behind power efficient UWB receivers was published at the Workshop on Wireless Circuits and Systems (WoWCAS) in 2004. A third paper, based on the front-end, is pending publication at Norchip later this year. The publications on this work are shown in appendix E. All in all, the presented approach should provide a substantial contribution to this field of research.

8.2 Future work

First and foremost, additional measurements should be performed on the presented RAKE receiver to obtain further information on its performance and limitations. A program for the FPGA co-processing card is under development, and it would be interesting to attempt data recovery of the whole matrix using the implemented multiplexers (MUXes).

A future RAKE receiver should be implemented with an improved delay line. The correlator circuit should also be replaced by an improved and simpler solution. The structure of such a circuit depends on whether it is desired to look at one or both binary levels, which in turn might depend on the application. It could also be interesting to include a weighting function in the correlation process, as simple weighting circuits exist that could be added without severely increasing the complexity of the circuit.

Another possible improvement would be to include biphase processing for the antipodal BPAM modulation scheme. BPAM could provide improved performance compared to OOK in a multi-user environment.

A possible approach for a future receiver is the Partial-RAKE (PRAKE) architecture. The PRAKE reduces the number of fingers by combining only the first propagation paths. It relies on the assumption that the first multipath components will typically be the strongest and contain most of the received signal power. This may be possible if the bits of interest occur repeatedly in the same position of the delay line. However, this approach introduces synchronization issues.

Provided synchronization is possible, this approach has distinct advantages. The number of fingers may be reduced to taps at the lower part of the delay line, thus the earlier taps are not used and may simply be removed. Consequently, the power consumption is reduced. In addition we may reduce the transmitted data rate by increasing bin length and still use the same receiver topology.
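The energy a PRAKE captures depends on the channel's power delay profile. The Python sketch below assumes a hypothetical exponentially decaying profile, in which path k carries power exp(-k/τ), over 50 resolvable paths to match the 50 taps of the delay line. Real indoor profiles may differ considerably.

```python
import math


def prake_energy_fraction(n_paths, n_fingers, decay_paths):
    """Fraction of the total multipath energy captured by the first n_fingers paths.

    Assumes a hypothetical exponential power delay profile in which path k
    carries power exp(-k / decay_paths).
    """
    powers = [math.exp(-k / decay_paths) for k in range(n_paths)]
    return sum(powers[:n_fingers]) / sum(powers)


for fingers in (5, 10, 20):
    frac = prake_energy_fraction(n_paths=50, n_fingers=fingers, decay_paths=5.0)
    print(f"{fingers:2d} fingers capture {100 * frac:.1f}% of the energy")
```

Under this assumed profile, most of the energy lies in the first few paths. This is the premise that justifies removing the unused taps.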
Appendix A

Measurement setup

This appendix describes the measurement setup. It covers the instruments used for the measurements and the components used on the PCB to provide the necessary clock frequencies and signal processing.

The prototype chip was tested using a PCB designed specifically for this chip. We encountered a few problems during testing, and some modifications to the PCB were necessary to obtain its full functionality. A picture of the PCB is shown in figure A.1.

![Picture of the PCB](image)

Figure A.1: Picture of the PCB

The instruments used for obtaining measurements are:

- **Agilent Infiniium 54855A DSO** 6GHz/20GSa/s Oscilloscope
- **Agilent 54624A** 100MHz/200MSa/s Oscilloscope
- **Agilent 33250A** 80MHz Function/Arbitrary Waveform Generator
- **Agilent 33220A** 20MHz Function/Arbitrary Waveform Generator
- **Agilent E3631A** Triple output DC power supply
- **Hewlett Packard 85024A** 300kHz - 3GHz High Frequency Probe
- **Fluke 189** Multimeter

The components used on the PCB are listed in table A.1.

<table>
<thead>
<tr>
<th>Part</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CDCM1804</td>
<td>Clock frequency divider</td>
</tr>
<tr>
<td>CDC5801</td>
<td>Clock multiplier and divider</td>
</tr>
<tr>
<td>74LVC4245A</td>
<td>Translating transceiver</td>
</tr>
<tr>
<td>AD8170</td>
<td>Multiplexer</td>
</tr>
<tr>
<td>FXL4245</td>
<td>Signal translator</td>
</tr>
<tr>
<td>MICTOR</td>
<td>152-pin Matched Impedance Connector</td>
</tr>
<tr>
<td>MC100E241</td>
<td>8-bit Scannable register</td>
</tr>
<tr>
<td>74ABT16373B</td>
<td>16-bit transparent latch</td>
</tr>
<tr>
<td>LM117</td>
<td>Voltage regulator</td>
</tr>
<tr>
<td>CDC5801</td>
<td>Clock frequency multiplier</td>
</tr>
<tr>
<td>CFPT-9006</td>
<td>TCXO crystal oscillator 40MHz</td>
</tr>
<tr>
<td>CFPS-72</td>
<td>SPXO crystal oscillator 60MHz</td>
</tr>
</tbody>
</table>

Table A.1: *Components used on the PCB*

The following software was used throughout the project:

- **Cadence 5.0.0** for initial design and simulation in the AMS 0.35μm process.
- **Cadence 4.4.6** for simulation, plotting and schematic and layout design of the chip.
- **Cadence Concept** for schematic design of the PCB.
- **Cadence Allegro** for layout design of the PCB.
- **LyX** for documentation
- **Illustrator 10** for drawing illustrations.
- **Illustrator CS** for drawing illustrations.
- **Matlab 7.0** for measurements and plotting.

The manufacturers involved in the production and modification of the chip and PCB are:

- **CMP** for chip production and Hit-Kit supplier.
- **ST Microelectronics** involved through the use of their 0.12\(\mu\)m 8-metal layer standard CMOS process.
- **Elprint** used for production of FR-4 four layer PCB.
- **Nor-Team** used for modification of PCB.

Due to the limited time available for measurements, only the basic functionality has been verified. The measurement setup is pictured in figure A.2.

![Figure A.2: Pictures of the measurement setup](image)
Appendix B

Layout blocks

In this appendix the layout of the main building blocks of the RAKE receiver is shown.

Figure B.1: *Layout of one delay element. The element consists of two rows of 15 inverters each, giving a delay of 1 ns per element. The total delay line consists of 49 of these elements.*
Figure B.2: *Layout of the whole delay line*
Figure B.3: *Layout of the correlator*
Figure B.4: Layout of the comparator circuit.

Figure B.5: Layout of the RAKE cell. To the left the D-latch with its NAND-gates and two inverters is seen, and to the right the correlator circuit can be observed.
Figure B.6: These figures show buffers used in the circuit to provide driving capability. The largest one, shown in the upper right figure, supplements the buffer in the bottom figure. It has a width of 64μm and was designed to drive a load of 1pF at a frequency of 1 GHz.
Figure B.7: Picture of the chip with the RAKE receiver and front-end. The RAKE receiver is clearly visible, sharing the silicon with a front-end circuit. The front-end is seen in the upper left corner, with its decoupling capacitors as black squares. The total size of the chip is 1792µm by 1333µm.
Appendix C

Additional simulations

In this appendix some additional simulations are presented including simulations on an improved delay element.

Figure C.1: This simulation shows the response of the delay element with an input frequency of 5GHz.
Figure C.2: This simulation shows a result of a Monte Carlo simulation of the delay time of one delay element. This simulation is performed at the Fast nMOS, Fast pMOS corner at temperatures of 27°C, shown in the left figure, and -50°C, shown in the right figure.

Figure C.3: This simulation shows a result of a Monte Carlo simulation of the delay time of one delay element. The simulation in the left figure is performed at the Slow nMOS, Fast pMOS corner, and the simulation in the right figure at the Fast nMOS, Slow pMOS corner.
C.1 Improved delay element

When it became clear that the designed delay element was not optimal, an improved delay element was designed. This element contained four inverters. The first and the last were minimum-sized inverters, with the pMOS transistors given a width of $0.5\mu m$ to compensate for the difference in mobility, as in the original delay element. The two middle inverters had the same width but a larger length: the inverter next to the input inverter had a length of $0.5\mu m$, whereas the inverter next to the output inverter had a length of $1\mu m$. Monte Carlo simulations performed on this configuration showed that the time delay was within the 1ns limit, as desired. The simulations are shown in figure C.5. Only the speed-related corners were simulated, since these are the critical corners for the delay line. The delay element performs within the limit in all of these corners.

The power consumption of this delay element was approximately $9.3\mu W$, which is less than 30% of that of the original delay element.
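Scaling the simulated element power to the whole delay line gives a rough budget for the line. The Python calculation below assumes that the improved element directly replaces each of the 49 delay elements. It reads "less than 30%" as a lower bound on the original element's power and ignores the load of the sampling latches.

```python
P_IMPROVED_W = 9.3e-6    # simulated power of one improved delay element
N_ELEMENTS = 49          # delay elements in the delay line (appendix B)
RATIO_BOUND = 0.30       # improved element uses less than 30% of the original

improved_line_w = N_ELEMENTS * P_IMPROVED_W
original_element_min_w = P_IMPROVED_W / RATIO_BOUND  # original element > 31 uW

print(f"improved line: {improved_line_w * 1e6:.0f} uW")  # improved line: 456 uW
print(f"original line: > {N_ELEMENTS * original_element_min_w * 1e3:.2f} mW")  # original line: > 1.52 mW
```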

Figure C.4: This simulation shows a result of a Monte Carlo simulation of the delay time of one delay element. This simulation is performed at the Slow nMOS, Slow pMOS corner at temperatures of 27°C, shown in the left figure, and 100°C, shown in the right figure.
Figure C.5: This simulation shows the Monte Carlo simulation of the delay time of one delay element. This simulation is performed at the Fast nMOS, Fast pMOS corner at temperatures of 27°C, shown in the left figure, and -50°C, shown in the right figure.
C.2 Simulations performed in the AMS 0.35\(\mu\text{m}\) process

This section shows simulations of inverters in the AMS 0.35\(\mu\text{m}\) process. These simulations made it clear that the models in this design kit were conservative and did not serve our purpose very well.

Figure C.6: A simulation of the response time of inverters. Inverters with two different sizes are shown, with one inside a delay line constellation.

Figure C.7: A simulation of the power consumption of inverters. Inverters with two different sizes are shown, with one inside a delay line constellation.
Appendix D

Chip pin overview

<table>
<thead>
<tr>
<th>Description</th>
<th>Pin #</th>
</tr>
</thead>
<tbody>
<tr>
<td>Analog line out finger 50</td>
<td>8</td>
</tr>
<tr>
<td>Template shift register output</td>
<td>9</td>
</tr>
<tr>
<td>Output finger 50</td>
<td>12</td>
</tr>
<tr>
<td>Output finger 49</td>
<td>15</td>
</tr>
<tr>
<td>MUX address A</td>
<td>18</td>
</tr>
<tr>
<td>MUX address B</td>
<td>19</td>
</tr>
<tr>
<td>MUX address C</td>
<td>20</td>
</tr>
<tr>
<td>Output MUX 6</td>
<td>21</td>
</tr>
<tr>
<td>Output MUX 5</td>
<td>22</td>
</tr>
<tr>
<td>Output MUX 4</td>
<td>23</td>
</tr>
<tr>
<td>Output MUX 3</td>
<td>24</td>
</tr>
<tr>
<td>Output MUX 2</td>
<td>25</td>
</tr>
<tr>
<td>Output MUX 1</td>
<td>28</td>
</tr>
<tr>
<td>Vdd RAKE</td>
<td>29</td>
</tr>
<tr>
<td>Threshold voltage comparator</td>
<td>32</td>
</tr>
<tr>
<td>Bias voltage pre-charge</td>
<td>33</td>
</tr>
<tr>
<td>Output finger 1</td>
<td>34</td>
</tr>
<tr>
<td>Bias voltage correlator</td>
<td>35</td>
</tr>
<tr>
<td>Control input voltage level</td>
<td>36</td>
</tr>
<tr>
<td>Sample clock input</td>
<td>37</td>
</tr>
<tr>
<td>Input RAKE</td>
<td>38</td>
</tr>
<tr>
<td>Ground</td>
<td>78</td>
</tr>
<tr>
<td>Output delay line</td>
<td>80</td>
</tr>
<tr>
<td>Input finger 49 and 50</td>
<td>81</td>
</tr>
<tr>
<td>Data input shift register</td>
<td>82</td>
</tr>
<tr>
<td>Clock input shift register</td>
<td>83</td>
</tr>
</tbody>
</table>
Appendix E

Publications

Tor Sverre Lande, Dag Wisland, Claus Limbodal and Kjetil Meisal
CMOS UWB receivers

Claus Limbodal, Kjetil Meisal, Tor Sverre Lande and Dag T. Wisland
A Spatial RAKE Receiver for Real-Time UWB-IR applications
Accepted for publication at IEEE International Conference on Ultra Wideband, ICU 2005

Kjetil Meisal, Claus Limbodal, Tor Sverre Lande and Dag T. Wisland
CMOS impulse radio front-end
Prepared for submission at Norchip 2005
CMOS UWB receivers

Tor Sverre Lande, Dag Wisland, Claus Limbodal and Kjetil Meisal

CMOS UWB receivers.

Tor Sverre Lande, Member, IEEE, Dag Wisland, Claus Limbodal, Kjetil Meisal
bassen@ifi.uio.no

Abstract—Ultra Wide Band systems are hard to implement in standard CMOS. In this paper we propose a spatial RAKE receiver using analog computation for symbol detection and inverter delay lines for synchronization.

I. INTRODUCTION

The purpose of this paper is to explore UWB techniques trading lower transmitted bandwidth for low power implementation. Mixed-mode practical structures are developed suitable for implementation in standard CMOS technology.

II. UWB POWER TRADE-OFFS

In current UWB transceivers, most of the power is consumed by symbol recovery in the receiver. A typical UWB receiver is shown in Figure 1. The transmitted monocycle is received in a broadband antenna. After some coarse bandpass filtering (not shown), the monocycle is matched with a template. Since the emitted energy is severely restricted, the UWB signal is virtually buried in white noise. Through integration, monocycles are recovered and quantized (ADC).

![Figure 1: Generic UWB receiver](image)

With significant noise present and interfering transmissions from other sources, several erroneous detections will occur. To transmit a symbol ("0" or "1"), a number of monocycles must be combined for one symbol. Periodic repetition of emitted monocycles is impossible due to FCC regulations, so pseudo random sequences are used for symbol encoding.

The faint monocycle is also reflected by the surroundings, giving delayed copies or reflections. In narrow band systems this kind of interference is destructive (fading). In UWB technology, multipath pulses are exploited constructively to reconstruct symbols. The pulse sequence is correlated in the time domain with the expected pseudo random pattern. In the presence of reflections, the pseudo random pattern will be repeated. If several parallel correlations are done on differently delayed versions of the received bit sequence, reflections may be used constructively to detect the transmitted symbol (RAKE receivers) [1].

Although a large number of multipath pulses are received for one emitted pulse [2], usually only 2-3 “fingers” are used in the RAKE receiver. The main reason is the increased complexity of DSP-based RAKE implementations.

At the finest resolution, a monocycle is typically integrated over a short timeslot such as 1ns (Figure 2). With high-energy pulses, several reflections or multipath components occur, and a pulse window or bin of typically 10-50ns is used to "collect" all the emitted energy. Finally, pulse detections from 50-100 bins are combined as a pseudorandom sequence to identify one symbol. A symbol encoding time is typically in the order of 500ns.

![Figure 2: UWB timescales](image)

Our aim is to explore mixed-mode circuits to implement a real-time RAKE correlation finger, and digital delay lines to reduce the clock frequency by two orders of magnitude.

III. THE SPATIAL RAKE ARCHITECTURE

Each bin may contain a number of quantized monocycles. The maximum number of monocycles varies significantly with environmental conditions. According to [2], the strongest component usually occurs early in the observed bin. Although a significant number of reflections may occur, they tend to carry attenuated energy.

The received signal is integrated with a short time constant, reducing white noise. A simple threshold element such as a comparator is used for pulse detection, leaving us with a bit sequence at a rate of approximately $f_{MAX} = 1$ GHz. Although GHz-rate hardware is feasible in CMOS, it requires great care and advanced (expensive) fine-pitch processes. When a DSP is used, even higher clock frequencies are required to keep up with the arriving data.

Aiming at GHz operation, we explore the simplest, fastest and most compact element in CMOS: the CMOS inverter. A cascade of 2-10 inverters should span the duration of a monocycle (depending on technology), and the bit sequence of an observation bin may be “stored” with approximately 500 cascaded inverters. By “trapping” or sampling the inverter delay line with a clock rate equal to the bit rate ($f_{bit}$), the detected monocycles within one bin become available simultaneously. The good news is that the bin rate is typically only 1/50 of the input bit rate, giving a sampling clock of only approximately 20MHz.
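The numbers in this paragraph follow from simple arithmetic. The Python sketch below assumes a 1 GHz bit rate and a 50 ns bin. It also assumes, hypothetically, 10 inverters per nanosecond, which is the upper end of the 2-10 range with roughly 1 ns per monocycle. With these inputs it reproduces the approximately 500 inverters, 50 taps and 20 MHz sampling clock quoted above.

```python
F_BIT_HZ = 1e9           # detected bit rate, approximately 1 GHz
BIN_S = 50e-9            # observation bin length
INVERTERS_PER_NS = 10    # hypothetical: upper end of the 2-10 inverter range

taps_per_bin = round(F_BIT_HZ * BIN_S)                # bit slots stored per bin
inverters = INVERTERS_PER_NS * round(BIN_S / 1e-9)    # length of the delay line
f_sample_hz = F_BIT_HZ / taps_per_bin                 # one sample per bin

print(taps_per_bin, inverters, f"{f_sample_hz / 1e6:.0f} MHz")  # 50 500 20 MHz
```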

The penalty of the reduced clock rate is a number of parallel bit streams, one for each tap of the delay line. These bit streams represent both the original monocycle and possible multipath reflections. Each bit stream should be inspected for occurrences of a pseudo random bit pattern encoding our symbols (usually only two, ‘0’ and ‘1’). The number of bits in a pseudo random sequence is typically 50-100. These bit patterns are stored in a shift register connected to each tap on the delay line.

As suggested in Figure 3, the bit sequence stored in the shift registers is cross-correlated with a stored pseudo random sequence. As bits are shifted through the shift register, a running cross-correlation is computed against the stored random sequence. Since we have two symbols, correlators and code registers must be duplicated. Since the fingers hold the bit sequences of the different time positions within the bin, the different delayed versions of the bit sequence are synchronized. We can now compute the degree of combined match by simultaneously combining the correlation outputs from all the correlators. However, we need one combiner for each symbol.
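The data flow of Figure 3 can be sketched behaviorally. The Python model below is a simplification that rests on several assumptions:

- each delay-line tap feeds one shift register (a RAKE finger);
- a correlator stage contributes only when both the received bit and the template bit are '1';
- a finger fires when its count reaches a chosen threshold;
- the combiner simply ORs the fingers.

The template, tap count and path delays are hypothetical, and only one symbol's correlator set is modeled.

```python
from collections import deque


def snapshots_for(chips, path_delays, n_taps):
    """Delay-line samples, one row per bin: a sent pulse appears at each path's tap."""
    rows = []
    for chip in chips:
        row = [0] * n_taps
        if chip:
            for d in path_delays:
                row[d] = 1
        rows.append(row)
    return rows


def spatial_rake(rows, template, threshold):
    """Shift each tap into its own finger register and OR the finger decisions per bin."""
    n = len(template)
    fingers = [deque([0] * n, maxlen=n) for _ in range(len(rows[0]))]
    decisions = []
    for row in rows:
        for finger, bit in zip(fingers, row):
            finger.append(bit)  # the oldest bit falls out of the register
        scores = [sum(b & t for b, t in zip(f, template)) for f in fingers]
        decisions.append(any(s >= threshold for s in scores))
    return decisions


template = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical PR sequence
rows = snapshots_for(template, path_delays=[1, 4], n_taps=6)
decisions = spatial_rake(rows, template, threshold=sum(template))
print(decisions.index(True))  # 7: detected once the whole sequence is in the registers
```

Because the two paths land on different taps, each reflection is correlated in its own finger at the same instant. A second symbol would need a duplicate set of templates and its own combiner.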

By exploiting spatial “maps” using delay lines we are able to trade parallel correlation for lower clock frequency. The question is how this again may be implemented to achieve overall lower power consumption.

Figure 3 the orthogonal RAKE topology

IV. IMPLEMENTATION

The full RAKE receiver structure proposed in Figure 3 is, after all, a considerable matrix of correlators and shift registers. A typical size could be 50x50x2, which is 5000 correlators. The D-latches are just a handful of gates each, and 2500 of them are manageable.

The computation involved is statistical in the sense of finding a probability of symbol occurrence. Striving for lower power, we may return to analog computation, where a current-mode correlator can be implemented very simply using 3 nMOS transistors.

![Figure 4 the correlator circuit](image)

The current drawn by one finger is then

$$I_{finger} = \sum_{j=1}^{N} c_j I_j$$

$N$ is the length of the pseudo random sequence, $c_j \in \{0,1\}$ and $I_j$ is the unit current from each correlator. The finger current, $I_{finger}$, is directly proportional to the degree of match between the stored pseudo random pattern and the received bit sequence. Simply by matching the finger current with an appropriate pull-up current, the output will be low if and only if an appropriate degree of match is present.
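The finger equation and the pull-up comparison can be checked numerically. The Python sketch below works in units of the unit current ($I_j = 1$), which is an idealization, since real correlator currents will mismatch. It assumes a hypothetical 8-stage finger and a pull-up current of 5.5 units, so the output goes low only when at least six stages match.

```python
def finger_current(c, unit_currents):
    """I_finger = sum_j c_j * I_j, with c_j in {0, 1}."""
    return sum(cj * ij for cj, ij in zip(c, unit_currents))


def finger_output_low(c, unit_currents, i_pullup):
    """The output is pulled low when the finger sinks more current than the pull-up sources."""
    return finger_current(c, unit_currents) > i_pullup


I_UNIT = [1.0] * 8   # ideal unit currents, N = 8 (hypothetical)
I_PULLUP = 5.5       # hypothetical pull-up current in units of I_j

print(finger_output_low([1, 1, 1, 1, 1, 1, 0, 0], I_UNIT, I_PULLUP))  # True  (6 matches)
print(finger_output_low([1, 1, 1, 1, 1, 0, 0, 0], I_UNIT, I_PULLUP))  # False (5 matches)
```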

The combiner may be implemented simply by OR-ing together the decisions from each finger. More elaborate procedures might also be implemented using some kind of weighting function. Some smoothing may also be necessary matching the maximum symbol rate.
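The two combining options mentioned above can be contrasted in a few lines. In the Python sketch below, the OR combiner follows the text directly. The weighted version is only one hypothetical form of a weighting function, and the scores, weights and threshold are illustrative values.

```python
def or_combiner(finger_decisions):
    """Symbol detected if any finger has detected it."""
    return any(finger_decisions)


def weighted_combiner(finger_scores, weights, threshold):
    """Hypothetical weighting: sum weighted finger scores and compare with a threshold."""
    return sum(w * s for w, s in zip(weights, finger_scores)) >= threshold


scores = [4, 5, 3, 5]                  # correlation counts from four fingers (illustrative)
decisions = [s >= 6 for s in scores]   # per-finger threshold of 6
print(or_combiner(decisions))                                 # False
print(weighted_combiner(scores, [1.0, 1.0, 0.5, 0.5], 10.0))  # True
```

With weighting, several weak fingers can jointly reach a decision that no single finger reaches alone. The cost is an analog weighting and summing stage.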

V. CONCLUSIONS

The UWB receiver architecture presented in this paper combines digital delay lines with analog computation to implement a full RAKE receiver in CMOS. High-speed clocks are avoided, with an estimated maximum clock rate of 20MHz, which enables a power efficient implementation in standard CMOS. Real-time correlation is implemented with a simple analog correlator circuit, and symbol probability matching is computed using Kirchhoff’s current law. Initial simulations are promising, and silicon implementations are underway.

REFERENCES


A Spatial RAKE Receiver for Real-Time UWB-IR applications

Claus Limbodal, Kjetil Meisal, Tor Sverre Lande and Dag T. Wisland

Accepted for publication at IEEE International Conference on Ultra Wideband, ICU, Sept 5-8 2005
A Spatial RAKE-Receiver for Real-Time UWB-IR Applications

Claus Limbodal, Kjetil Meisal, Tor Sverre Lande, Member IEEE, Dag Wisland, Member IEEE

Abstract—Ultra Wideband systems are hard to implement in standard CMOS technology. In this paper we present a novel spatial RAKE-receiver, exploring mixed-mode circuits for symbol detection and inverter delay lines for synchronization. The receiver is implemented as a RAKE structure combining digital shift registers with analog computation in a series of parallel taps of a synchronizing delay line. In each parallel bit stream the incoming signal is cross-correlated with a stored template. By combining a delay line and a mixed-mode correlator we can explore multipath reflections in a time domain statistical computation for symbol recovery.

Index Terms—UWB, Low-power, RAKE-receiver, Impulse radio

I. INTRODUCTION

The concept of impulse radio (IR) has interesting properties. The wide transmission band makes penetration through different materials better than narrow band transmission. The lack of carrier may be traded for low power solutions provided a power efficient receiver may be implemented. Unlike narrow band radio, demanding statistical computation must be carried out. This is often done in a parallel (RAKE) architecture.

Although several portable applications are striving for higher bandwidth, there is also demand for short-range, low-bandwidth communication links, as in wearable and implantable microelectronics. In several of these applications ultra low power is important. Other properties of impulse radio transmissions, such as interference immunity and penetration, may also be valuable.

The purpose of this paper is to explore low-power solutions for correlation-based impulse radio receivers. A mixed-mode RAKE-like structure is realized in a standard 0.12μm CMOS technology. Simulations show promising results with regard to power consumption and overall functionality. Measured results are expected shortly.

II. UWB POWER TRADEOFFS

In current impulse radio receivers the major power consumption is the symbol recovery. A typical receiver is shown in Figure 1. The transmitted pulse is received in a broadband antenna. After some crude bandpass filtering (not shown) the impulse is matched with a template. Since the emitted energy is severely restricted, the UWB signal is virtually buried in white noise. Through integration pulses are recovered and quantized (ADC).

With significant noise present and interfering transmissions from other sources, several erroneous detections will occur. To transmit a symbol ('0' or '1'), a number of pulses must be combined for one symbol. Periodic repetition of emitted pulses is impossible due to FCC regulations, so pseudo random (PR) sequences of pulses are used for symbol encoding. Typically 50-100 pulses are used for each symbol. An additional benefit of pseudo random coding is that it achieves both scrambling and channelization. Sparsely populated PR sequences may enable a large number of simultaneous transmissions with minor interference (a robust wireless link).

The faint monocycle buried in noise is also reflected by the surroundings, giving delayed copies or reflections. In narrow band systems this kind of interference is destructive (fading). In impulse radio technology, multipath pulses are exploited constructively to reconstruct symbols. The pulse sequence is correlated in the time domain with the expected pseudo random pattern. In the presence of reflections, the pseudo random pattern will be repeated. If several parallel correlations are done on differently delayed versions of the received bit sequence, reflections may be used constructively to detect the transmitted symbol. These receivers are called RAKE [1] receivers and are usually implemented with a number of "fingers", one finger for each correlation.

Although a large number of multipath pulses are reported for one emitted pulse [2], only 2-3 fingers are used in the RAKE receiver. The main reason is the increased computational demand when implemented on a DSP.

Aiming at lower power consumption in UWB systems, the DSP implemented RAKE receiver structure is certainly an obvious candidate for power reduction. Current solutions depend on DSP hardware running at several GHz.

Three different time scales help explain the trade-offs (Figure 2). At the finest resolution, a monocycle is typically in the order of 200ps. With high-energy pulses, several reflections or multipath components occur, and a pulse window or bin of typically 10-50ns is used to "collect" all the emitted energy. Finally, pulse detections in 50-100 bins are combined as a PR sequence to identify one symbol. A symbol encoding time is typically in the order of 500 ns.

![Impulse radio timescales](image)

Figure 2 Impulse radio timescales

With pulse durations below 1 ns, a DSP approach requires processing rates of several GHz in order to combine a number of RAKE fingers and estimate the probability of a transmitted symbol.

Our aim is to explore mixed-mode circuits to implement a real-time RAKE structure, reducing the clock frequency by two orders of magnitude [5].

III. THE ORTHOGONAL RAKE ARCHITECTURE

As shown in Figure 3, the incoming pulse is detected and quantized [4] in real time without any synchronization or clocking. The received and quantized pulse sequence is “stored” in a delay line built from standard inverters. The delay spans one bin (typically 50 ns), which may contain the received pulse with its reflections. The delay line is sampled with a clock matching the transmitted pulse repetition rate. For low bandwidth transmissions this may be in the order of a few MHz.

The samples from the delay line are clocked into shift registers making up the RAKE fingers. Some fingers will contain the PR sequence of the transmitted symbol; depending on the number of reflections, symbol recovery is possible in several fingers. As suggested in Figure 3, the PR sequence stored in the shift registers is cross-correlated with a stored PR sequence: as bits are shifted through a shift register, a running cross-correlation is computed against the stored PR sequence. With more than one symbol, correlators and code registers must be duplicated. The fingers are not only correlating the PR sequences; since all shift registers are clocked together, the PR sequences in the different fingers are also synchronized. We are therefore able to compute the degree of combined match by simultaneously combining the correlation outputs from all the correlators. However, one combiner is needed for each symbol.
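
A behavioural Python sketch of the orthogonal arrangement may help. It is not a model of the circuit. The following are all assumptions:

- the 1 ns tap spacing,
- the premise that a multipath component always appears at the same tap when the delay line is sampled,
- the ones-only correlation,
- the tap indices of the reflections.

Each delay-line tap feeds its own finger, and all fingers shift on the same sampling clock. The combiner for one symbol sums the running correlations of all fingers.

```python
from collections import deque
import random

N_TAPS = 50   # delay-line taps, assumed one per ~1 ns unit delay (50 ns bin)
N_CODE = 50   # shift-register length = PR sequence length (assumption)


class OrthogonalRake:
    """Behavioural sketch: one shift-register finger per delay-line tap."""

    def __init__(self, template: list[int], n_taps: int = N_TAPS):
        self.template = template
        self.fingers = [deque([0] * len(template), maxlen=len(template))
                        for _ in range(n_taps)]

    def clock(self, snapshot: list[int]) -> None:
        """Sample the delay line once per bin; every finger shifts on the same clock."""
        for finger, bit in zip(self.fingers, snapshot):
            finger.append(bit)  # with maxlen set, the oldest sample drops out

    def finger_scores(self) -> list[int]:
        """Running cross-correlation of each finger with the stored template (ones only)."""
        return [sum(b & t for b, t in zip(f, self.template)) for f in self.fingers]

    def combined(self) -> int:
        """Combiner for this symbol: sum of all finger correlations."""
        return sum(self.finger_scores())


if __name__ == "__main__":
    rng = random.Random(3)
    code = [1 if rng.random() < 0.3 else 0 for _ in range(N_CODE)]
    paths = {4, 9, 17}  # hypothetical multipath arrival taps
    rake = OrthogonalRake(code)
    for bit in code:  # one delay-line snapshot per transmitted bin
        rake.clock([bit if tap in paths else 0 for tap in range(N_TAPS)])
    print("fingers with full match:",
          [tap for tap, s in enumerate(rake.finger_scores()) if s == sum(code)])
    print("combined correlation:", rake.combined())
```

Fingers at taps without a reflection contribute nothing. The combined sum therefore grows with the number of resolvable paths.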

![RAKE topology](image)

Figure 3 the orthogonal RAKE topology

IV. IMPLEMENTATION

The full RAKE receiver structure in Figure 3 is, after all, a considerable matrix of correlators and shift registers. It is built up as a 50x50 matrix of cells in a finger structure, each cell consisting of one D-latch and one correlator, giving a total of 2500 correlators and D-latches.

The arriving pulses have been shaped by a front-end and are distributed as 1 ns pulses through the delay line [4]. No delay is needed for the first finger, so the delay line consists of 49 elements with 30 standard minimum-sized inverters in each element, which adds up to 49 × 30 = 1470 inverters. Aiming at a unit delay of approximately 1 ns, 30 inverters per element should meet this requirement in the process used.
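
The cell and inverter counts follow directly from the matrix dimensions, and the arithmetic is spelled out below. The implied average delay of about 33 ps per inverter is simply what the 1 ns target requires. It is not a characterized process figure.

```python
FINGERS = 50                # one finger per delay-line tap
CELLS_PER_FINGER = 50       # D-latch + correlator cells in each finger
INVERTERS_PER_ELEMENT = 30  # minimum-sized inverters per delay element
UNIT_DELAY_S = 1e-9         # target delay per element

cells = FINGERS * CELLS_PER_FINGER          # correlators (and D-latches)
delay_elements = FINGERS - 1                # the first finger needs no delay
inverters = delay_elements * INVERTERS_PER_ELEMENT
per_inverter_s = UNIT_DELAY_S / INVERTERS_PER_ELEMENT
total_delay_s = delay_elements * UNIT_DELAY_S

print(cells, delay_elements, inverters)     # 2500 49 1470
print(f"{per_inverter_s * 1e12:.1f} ps per inverter, {total_delay_s * 1e9:.0f} ns in total")
# 33.3 ps per inverter, 49 ns in total
```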

The computation involved is statistical in the sense of finding a probability of symbol occurrence. To keep power consumption low we return to analog computation, as a correlator may be implemented very simply using three transistors.

It is also interesting to investigate the property of channelization. For PR sequences of a given length there is a trade-off between channelization and bit error rate (BER): an increasing number of channels results in a higher BER, while increasing the length of the PR sequence improves both.

![Correlator circuit](image)

Figure 4 the implemented correlator circuit

By including simple digital logic, two different detection modes are available. Detecting only matches of "1" is effective in a noisy environment, while correlating both "1" and "0" may be more efficient, enabling shorter symbol sequences. We therefore needed the possibility to set the operation mode of the correlator to correlate either on ones only or on both ones and zeroes, which gives a combined XOR- and AND-gate. Extra logic had to be included in the correlator circuit to achieve the desired functionality, as depicted conceptually in Figure 4.
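
At logic level the two detection modes reduce to a simple per-cell function, sketched below in Python. The Boolean `ones_only` is a hypothetical stand-in for the Control input of the correlator. Matching on both levels is modelled as an equality test, that is, the complement of XOR. How this function maps onto the PDN and the current polarity of the actual circuit is not modelled.

```python
def correlator_match(template_bit: int, latch_bit: int, ones_only: bool) -> int:
    """Return 1 when the cell should source its unit current, else 0.

    ones_only=True : match only when both bits are '1' (AND)
    ones_only=False: match when the bits are equal, '1'/'1' or '0'/'0' (XNOR)
    """
    if ones_only:
        return template_bit & latch_bit
    return 1 - (template_bit ^ latch_bit)


def finger_match_count(template: list[int], received: list[int], ones_only: bool) -> int:
    """Number of cells in one finger that report a match."""
    return sum(correlator_match(t, r, ones_only) for t, r in zip(template, received))


if __name__ == "__main__":
    template = [1, 0, 1, 1, 0]
    received = [1, 1, 1, 0, 0]
    print(finger_match_count(template, received, ones_only=True))   # 2
    print(finger_match_count(template, received, ones_only=False))  # 3
```

In the ones-only mode a zero caused by a missed pulse or by noise never adds current. This is the robustness argument in the text. The both-levels mode extracts more information per bin.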

The correlator consists of two parts. Two pMOS transistors control the correlation current in case of a match, and a differential cascode voltage switch logic (DCVSL) circuit performs the actual comparison between the two inputs in a pull down network (PDN). The PDN consists of 13 nMOS transistors implementing the required combinational function. This kind of logic combines two concepts, differential logic and positive feedback, and requires that each input is provided in complementary format [6]. To meet this requirement, three inverters, one for each input, are embedded in the PDN. The Control input decides whether the correlator matches only ones or both binary levels, while the two other inputs are the stored template and the output of the D-latch in each cell. The Vbias input controls the current flowing to the output line, which contributes to pulling the voltage level on the output line up to the threshold level. Vmatch is the actual connection to the output line.

The circuit drawn in Figure 5 acts like a set of wired, current-limited AND-gates. When the inputs are low, a unit current \( I_{\text{sat}} \) is delivered, as set by the bias voltage. The correlation currents from all correlators of a finger are summed on a single wire simply by interconnection. The current drawn by one finger is then

\[ I_{\text{fig}} = \sum_{j=1}^{N} c_j f_j, \]

where \( N \) is the length of the pseudo random sequence, \( c_j \in \{0,1\} \) indicates whether correlator \( j \) reports a match, and \( f_j \) is the unit current from each correlator. The finger current, \( I_{\text{fig}} \), is thus directly proportional to the degree of match between the stored pseudo random pattern and the received bit sequence. Simply by matching the finger current against an appropriate pull-down current, the output will be high if and only if a sufficient degree of match is present. The pull-down current also provides a simple pre-charging of the output line. This pre-charging is performed by two nMOS transistors in series, acting like a current-mode AND-gate: one transistor is driven by the inverted clock to pull the output line down between samples, while the other limits the current.

The comparator at the output of each finger (Figure 6) is a simple structure, chosen for its low power consumption and self-biasing properties in addition to a fairly good range of adjustment [3]. With some additional pulse shaping and driving capability in the output inverters, this should be sufficient.

To reduce the number of output pads required for the output signals, six pre-designed 84×1 multiplexers were used. These operate at a clock frequency of 160 MHz in order to keep up with the streaming data from all outputs.

![Figure 6. The comparator circuit.](image)

The RAKE-receiver is realized in a standard 0.12 μm process on the same silicon as a front-end circuit [4]. The size of the complete receiver structure is 993 μm by 682 μm, and it contains just over 10,000 transistors. Figure 7 shows a picture of the RAKE-receiver: the delay line can be seen on the left side, while the six multiplexers and the outputs from the comparators are easily observed on the right side.

![Figure 7. A picture of the RAKE-receiver.](image)
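
The finger current relation \( I_{\text{fig}} = \sum_{j} c_j f_j \) and the comparator decision fit in a few lines of Python. This is a sketch at the level of ideal currents. Each \( c_j \) is treated as a 0/1 match flag, and currents are expressed in units of one correlator current. The pull-down level of 80% of the cells is a hypothetical choice, not the value used on the chip.

```python
def finger_current(matches: list[int], unit_currents: list[float]) -> float:
    """I_fig = sum_j c_j * f_j, with c_j a 0/1 match flag and f_j the unit current."""
    return sum(c * f for c, f in zip(matches, unit_currents))


def finger_decision(matches: list[int], unit_currents: list[float],
                    pull_down: float) -> int:
    """Comparator model: output high only if I_fig exceeds the pull-down current."""
    return 1 if finger_current(matches, unit_currents) > pull_down else 0


if __name__ == "__main__":
    n = 50
    unit = [1.0] * n             # currents in units of one correlator current (assumed equal)
    pull_down = 0.8 * n          # hypothetical threshold: 80% of the cells must match
    strong = [1] * 45 + [0] * 5  # 45 of 50 cells match
    weak = [1] * 30 + [0] * 20   # 30 of 50 cells match
    print(finger_decision(strong, unit, pull_down), finger_decision(weak, unit, pull_down))
    # prints: 1 0
```

Because the summation is done by Kirchhoff's current law on a shared wire, the only design choice in this model is where the pull-down current is set relative to \( N \) unit currents.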

Figure 8 shows the complete chip with pads. The RAKE-receiver is easily observed, and the front-end is seen in the upper left corner with its decoupling capacitors as black squares. The total size of the chip is 1792 μm by 1333 μm.

So far, simulations show promising results with regard to power consumption. Based on simulations, the estimated idle power consumption of the receiver is about 50 mW, and a fully correlating finger at a sampling frequency of 20 MHz is estimated to consume about 100 µW. Comparable hard-wired receiver topologies are hard to find.

![Figure 8 a picture of the chip with the RAKE receiver and front-end](image)

V. PARTIAL RAKE RECEIVER

The proposed structure in Figure 3 is often called a full RAKE receiver, with a finger for every available tap of the delay line. It is, however, possible to reduce the number of fingers if the interesting bits repeatedly occur in the same position of the delay line. This may be possible if the clock is locked to some property of the bit stream. It is reasonable to assume some clustering [2], since reflections are a consequence of an initially emitted pulse. A possible approach could be to measure the “energy” of the lower part of the delay line: provided some clustering around the emitted pulse exists, the clock could be tuned to achieve the highest energy at that location. We may again turn to a current-mode approach. If digitally controlled current sources are attached to the lower taps and summed on a wire, the total summed current should be proportional to the number of “1”s in the lower part of the delay line. This may in turn be used to synchronize the clock with a PLL. Due to the pseudo random occurrence of the clusters, clock adjustments must be slow.
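
A rough Python sketch of the energy measure follows. It makes several assumptions that the text leaves open:

- the "lower part" is taken to be the first eight taps;
- a clock phase is expressed as a whole number of taps;
- the reflection cluster always arrives at the same offsets within a bin;
- the PLL is replaced by an exhaustive scan over phases.

The averaging over the whole PR sequence reflects why the adjustment must be slow: bins without a pulse contribute no energy.

```python
import random

N_TAPS = 50      # delay-line taps (assumed 1 ns each)
LOWER_TAPS = 8   # hypothetical size of the "lower part" of the delay line


def snapshot(pulse_present: int, arrivals: list[int], phase: int) -> list[int]:
    """Delay-line contents at one sampling instant for a clock phase given in taps."""
    taps = [0] * N_TAPS
    if pulse_present:
        for a in arrivals:
            taps[(a - phase) % N_TAPS] = 1
    return taps


def lower_energy(taps: list[int]) -> int:
    """Summed unit currents from the lower taps, i.e. the number of '1's there."""
    return sum(taps[:LOWER_TAPS])


def average_energy(code: list[int], arrivals: list[int], phase: int) -> float:
    """Energy averaged over a whole PR sequence; bins without a pulse contribute 0."""
    return sum(lower_energy(snapshot(b, arrivals, phase)) for b in code) / len(code)


def acquire_phase(code: list[int], arrivals: list[int]) -> int:
    """Exhaustive scan (stand-in for the PLL) for the phase with the highest energy."""
    return max(range(N_TAPS), key=lambda p: average_energy(code, arrivals, p))


if __name__ == "__main__":
    rng = random.Random(5)
    code = [1 if rng.random() < 0.3 else 0 for _ in range(100)]
    cluster = [31, 33, 36, 40]  # hypothetical reflection cluster, in taps
    phase = acquire_phase(code, cluster)
    print("chosen phase:", phase, "energy:", average_energy(code, cluster, phase))
```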

The clock adjustment must, however, cope with the high-frequency bit stream, and significant current must be used to keep up. Based on simulations of one finger, an increase in sampling frequency from 20 MHz to 40 MHz causes an estimated increase in power consumption of about 30%.

The upside is that the number of fingers may be reduced to taps at the lower part of the delay line. When earlier taps are not used, we may simply remove them and reduce the power consumption.

Another important consequence is that we may reduce the transmitted data rate by increasing bin length and still use the same receiver topology.

VI. SIMULATIONS

Figure 9 shows the result of a simulation of a delay element. The delay is just a few picoseconds less than 1ns.

![Figure 9 the time delay of one delay element](image)

In Figure 10 the comparator functionality is shown. The dotted line is the input signal, which is a voltage going linearly up from zero to one volt in 50 ns, and then down to zero at 100 ns.

![Figure 10 range of operation of the comparator](image)

The dashed line shows how the output responds to the changes in input voltage when the threshold voltage is set to 310 mV. The solid line shows the output voltage response when the threshold is set to 690 mV. These two voltages determine the range of operation of this comparator.

![Figure 11 simulation results of one RAKE finger](image)
Figure 11 shows the output from a simulation of a RAKE finger. The threshold voltage of the comparator is here set to 600 mV. The top curve is the clock, the middle curve shows the voltage level on the output line, and the bottom curve shows the output from the comparator. The shift register holding the template has been filled with only ones, the correlator is set to correlate on ones only, and the input of the finger is tied to the positive power rail. This gives a signal of only ones travelling one step through the RAKE finger shift register each clock period, so each period adds one more correlator contributing to pulling up the output line. In the simulation presented in... the correlation threshold is reached after 14 clock pulses, corresponding to a match on 14 correlators. As expected, the mixed-mode circuits presented handle the modest frequency requirement of the spatial RAKE receiver.
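
The fill-up experiment is easy to mimic behaviourally. In the sketch below a constant '1' is shifted into an all-ones finger, and after every clock the summed match current is compared with a pull-down current. The pull-down value of 13.5 unit currents is chosen only so that the sketch reproduces the 14-clock result. The text does not give the relation between the 600 mV comparator threshold and the pull-down current.

```python
from typing import Optional


def simulate_fill(n_cells: int, unit_current: float, pull_down: float) -> Optional[int]:
    """Shift a constant '1' into a finger whose template is all ones (ones-only mode).

    Returns the clock count at which the summed match current first exceeds the
    pull-down current, or None if it never does within n_cells clocks.
    """
    template = [1] * n_cells
    register = [0] * n_cells
    for clock in range(1, n_cells + 1):
        register = [1] + register[:-1]  # finger input tied high: a '1' enters each clock
        matches = sum(t & r for t, r in zip(template, register))
        if matches * unit_current > pull_down:
            return clock
    return None


if __name__ == "__main__":
    print(simulate_fill(50, 1.0, 13.5))  # 14
```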

Measured results are expected shortly; if possible, they will be included in the final paper.

VII. CONCLUSIONS

The power efficient UWB receiver presented in this paper combines digital delay lines with analog computation to implement a full RAKE receiver. High-speed clocks are avoided, with an estimated maximum clock rate of 20 MHz, enabling a power efficient implementation. Real-time correlation is implemented with a simple analog correlator circuit, and symbol probability matching is computed using Kirchhoff’s current law. The receiver is realized in a standard 0.12 μm CMOS process, with simulations showing promising results.

REFERENCES
[4] Kjetil Meisal, Claus Limbodal, Tor Sverre Lande and Dag T. Wisland, "CMOS impulse radio transceiver", to be published.
CMOS impulse radio front-end

Kjetil Meisal, Claus Limbodal, Tor Sverre Lande and Dag T. Wisland

Prepared for submission at Norchip 2005
CMOS Impulse Radio Receiver Front-end

Kjetil Meisal, Claus Limbodal, Tor Sverre Lande, Member IEEE, Dag T. Wisland, Member IEEE

Abstract—Low power impulse radio receiver front-ends are hard to implement in standard CMOS. In this paper we present a simple thresholding solution exploiting simple inverter structures. The ultra wide band impulse radio receiver front-end consists of an LNA, an integrator and a thresholding pulse shaper, all in standard digital CMOS technology. The continuous time front-end is designed to work with a RAKE receiver for low power ultra wide band applications.

Index Terms—UWB, Low-power, Impulse radio

I. INTRODUCTION

The focus on short range wireless communication technology is increasing. Although standards like Bluetooth should address such demands, there seems to be intense development towards increased bandwidth at lower power. An interesting technology is Ultra Wide Band (UWB) or Impulse Radio. With the recent FCC approval of wide-band transmission between 3.1 GHz and 10.6 GHz at an EIRP emission level of -41.3 dBm/MHz, several efforts have emerged achieving data rates in excess of 100 Mbit/s for short range communication.

The concept of baseband radio using very short pulses has other interesting advantages. The wide transmission band gives better penetration through different materials than narrowband transmission [4][5]. The lack of a carrier may be traded for low power solutions: traditional narrowband transmissions waste energy on an HP carrier, which in itself transfers no information.

There is, however, another interesting development in short-range, low-bandwidth communication links, as in wearable and implantable microelectronics (Personal Area Networks) [8]. In several of these applications, such as implantable devices, ultra low power is important. Exploring impulse radio for robust wireless communication in medical applications is also interesting.

In this paper we propose a novel front-end for impulse radio receivers suitable for implementation in standard CMOS technology. The novel architecture explores continuous time delay lines, avoiding high-speed sampling clocks. The proposed solution is implemented in the STM 120 nm CMOS process.

II. SAMPLED DELAY-LINE ARCHITECTURE

There are several approaches to UWB or impulse radio receiver front-ends [9][10].

As shown in Figure 1, in a typical UWB receiver the incoming signal is multiplied with a template, integrated and quantized. Symbol recovery is then performed by signal processing algorithms in the DSP. However, this architecture is hard to optimize for low power due to very demanding timing constraints and the high sampling rate required to process GHz signals.

We are exploring a continuous time solution using early quantization and a sampled delay-line architecture that relaxes the timing constraints. The sampling rate is reduced to below 100 MHz.

Figure 2: Sampled delay-line architecture.

In Figure 2 the sampled delay-line architecture is shown. The incoming wideband signal is amplified in an LNA and then immediately quantized. The sequence of received impulses is delayed using cascaded inverters. The delay line may then be sampled with a 20-50 MHz clock and decoded using parallel correlators in a RAKE arrangement [12][13]. Although the sampling rate is reduced, the sampled delay-line architecture must handle the bandwidth of UWB signals (10 GHz), making a straight CMOS implementation quite demanding. Our UWB front-end is implemented using 120 nm technology from ST Microelectronics with an \( f_T \) around 90 GHz.

The number of fingers in the RAKE constellation is given by the length of the delay line relative to the duration of the shaped pulse, and the length of the delay line also sets the last reflection that can be received. The 1 ns duration of the shaped pulse and the 50 ns length of the delay line imply that the reflections used are located 30 cm to 15 m apart from the original pulse.
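
The distances follow from multiplying the delays by the propagation speed. The check below assumes free-space propagation at the speed of light. It interprets "apart from the original pulse" as the excess path length of a reflection relative to the direct pulse.

```python
C = 299_792_458.0     # speed of light in vacuum (m/s); air is close enough here
PULSE_S = 1e-9        # shaped pulse duration = one delay-line tap
DELAY_LINE_S = 50e-9  # total delay-line length


def excess_path_m(delay_s: float) -> float:
    """Extra distance travelled by a reflection arriving delay_s after the direct pulse."""
    return C * delay_s


print(f"shortest resolvable: {excess_path_m(PULSE_S):.2f} m")  # 0.30 m
print(f"longest captured:    {excess_path_m(DELAY_LINE_S):.2f} m")  # 14.99 m
print(f"fingers: {round(DELAY_LINE_S / PULSE_S)}")  # 50
```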
III. LNA DESIGN

The wideband LNA amplifies the input signal with a gain of 25-30 dB. With an extremely weak input signal, low noise structures are required. Our approach is to reduce the number of active elements to a minimum. As most fine-pitch processes are triple well, careful design may reduce substrate noise. The degree of isolation this provides, together with decoupling and shielding, should reduce the impact of noise on the amplifier. The proposed LNA is a simple cascode amplifier with a serial conductive load (Figure 3).

![LNA architecture diagram](image)

Figure 3: LNA architecture

The topology consists of an input transistor (Q3) and two biasing transistors (Q1 and Q2). The cascode transistor (Q2) improves the LNA gain, and transistor Q1 controls the load current. This design gives high-frequency and driving performance in the required UWB band (Figure 8).

Transistor mismatch and parasitics are hard to predict in advanced processes; especially the parasitic capacitances of the wires seem to have a major impact [7]. By using large transistors and capacitors, and by making the design compact with short interconnections, the impact of parasitic capacitance is reduced.

The input of the LNA is level shifted by a voltage follower of the same design as the one described in section V.

IV. COMPARATOR/QUANTIZER

The first question is to what extent the incoming, noisy signal can be quantized by simple thresholding. The emitted pulse is mixed with significant noise, so looking for a fixed threshold might not be feasible: it is either too noisy (low threshold) or too lossy (high threshold). We still find this approach feasible for two reasons: 1) symbol transmission is designed for lossy data, using pseudorandom sequences and cross-correlation for statistical symbol recovery; 2) we have added dual-slope detection. By assuming transmitted pulses to be a change of one polarity followed by a change of the opposite polarity (Figure 4), we expect to achieve sufficient discrimination.

The high frequency and low amplitude of the incoming signal make it difficult to use a standard differential comparator for thresholding. To achieve high-frequency performance, our topology is based mostly on inverters, sometimes combined with a capacitive feedback to improve high-frequency gain. The idea of pulse shape recognition is quite simple and is a combination of thresholding and logic gates. As shown in Figure 4, we threshold both slopes of an emitted impulse. By construction, the inter-slope time is fixed and set to τ. To detect both slope orders (positive followed by negative, and negative followed by positive), we feed the amplified signal through two different signal paths.

![Dual slope monocycle diagram](image)

Figure 4: Dual slope monocycle (positive followed by negative)

In both signal paths the signal is shifted to a suitable level depending on slope polarity. The detected pulses are shaped to pulses of the same duration (approx. 1 ns). After thresholding, the early pulse is delayed by τ and the final pulse is detected by simple digital logic.

The circuit implementation is shown in Figure 5 and has one path for each slope polarity. For details about the level shifting and thresholding circuit, see section V.

![Pulse shape detection schematic](image)

Figure 5: Pulse shape detection schematics

A following stage correlates the pulse with a delayed version of the pulse on the parallel path, where the delay reflects the preset, fixed τ between the slopes (Figure 4). A symmetric solution is used for the opposite phase detection. An additional noise improvement is obtained by using an XNOR gate to combine the two slope phases: by definition both phases cannot occur at the same time, so simultaneous detections must be due to noise.
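
The pulse shape recognition can be sketched as discrete-time logic in Python. The sketch makes these simplifying assumptions:

- The level shifting is folded into a symmetric threshold of ±vth.
- Time is counted in samples, and τ and the shaped pulse width are given in samples.
- The final combination is modelled as an XOR of active-high signals.

The text uses an XNOR gate. Whether that matches this model depends on on-chip signal polarities that are not given here. Only the stated intent, rejecting simultaneous detections in both phases, is modelled.

```python
def threshold(signal: list[float], vth: float, positive: bool) -> list[int]:
    """Per-sample threshold detector for one slope polarity."""
    if positive:
        return [int(v > vth) for v in signal]
    return [int(v < -vth) for v in signal]


def stretch(bits: list[int], width: int) -> list[int]:
    """Shape every detection into a pulse `width` samples long (one-shot behaviour)."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        if b:
            for k in range(i, min(i + width, len(bits))):
                out[k] = 1
    return out


def delay(bits: list[int], tau: int) -> list[int]:
    """Delay a bit stream by tau samples, padding the start with zeros."""
    tau = min(tau, len(bits))
    return [0] * tau + bits[:len(bits) - tau]


def dual_slope_detect(signal: list[float], vth: float, tau: int, width: int) -> list[int]:
    """Accept a pulse only if opposite slopes are seen tau samples apart."""
    pos = stretch(threshold(signal, vth, positive=True), width)
    neg = stretch(threshold(signal, vth, positive=False), width)
    pos_then_neg = [a & b for a, b in zip(delay(pos, tau), neg)]
    neg_then_pos = [a & b for a, b in zip(delay(neg, tau), pos)]
    # Both slope orders at the same instant cannot come from one pulse: treat as noise.
    return [a ^ b for a, b in zip(pos_then_neg, neg_then_pos)]


if __name__ == "__main__":
    sig = [0.0] * 12
    sig[2], sig[6] = 0.5, -0.5  # positive slope, then negative slope 4 samples later
    print(dual_slope_detect(sig, vth=0.2, tau=4, width=2))  # detection at samples 6 and 7
```

Stretching each detection to a fixed width before the AND makes the coincidence tolerant to small timing errors around τ. A pair of slopes with the wrong spacing produces no output.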

V. LEVEL-SHIFTING AND THRESHOLDING

The level shifter and thresholding scheme (Figure 6) consists of a "slow" tailing pair differential amplifier used as a voltage follower, combined with two inverters and a capacitive feedback. The thresholding level is fixed in the inverters, but by combining the inverter structure with a level shifter, it functions as an adjustable threshold circuit. Owing to the frequency response of the voltage follower, the tailing pair topology of the level shifter also acts as a high pass filter. This response can be adjusted by regulating the tail current, and simulations show an upper 3 dB limit just above 2 GHz.

![Figure 6: Levelshifting and thresholding circuitry](image)

VI. DELAY LINE & PULSE SHAPING

The duration of the shaped pulse must be matched to the unit delay of the delay line (see Figure 2). A longer pulse may allow a lower sampling clock for the delay line, and a lower sampling clock reduces power consumption. Then again, longer pulses may mask information-carrying reflections for parallel RAKE fingers. A reasonable pulse duration seems to be 1 ns [2][3], mostly avoiding masking of multipath components. The pulse must be short enough not to mask wanted reflections, while the total length of the delay line is set by the time span between the first arriving pulse and the last possible reflection.
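
The masking trade-off can be checked with a small occupancy model. The sketch assumes ideal rectangular shaped pulses and taps of exactly one unit delay. The arrival times are chosen for illustration. Two reflections that cover overlapping or adjacent taps merge into one run of '1's, so separate fingers can no longer tell them apart.

```python
def delay_line_contents(arrivals_ns: list[float], pulse_ns: float,
                        line_ns: float, unit_ns: float) -> list[int]:
    """Taps covered by each shaped pulse; tap k spans [k*unit, (k+1)*unit)."""
    n_taps = round(line_ns / unit_ns)
    taps = [0] * n_taps
    for t0 in arrivals_ns:
        for k in range(n_taps):
            start, end = k * unit_ns, (k + 1) * unit_ns
            if start < t0 + pulse_ns and t0 < end:  # pulse overlaps this tap
                taps[k] = 1
    return taps


def resolved_pulses(taps: list[int]) -> int:
    """Number of separate runs of '1's, i.e. reflections the fingers can tell apart."""
    return sum(1 for i, b in enumerate(taps) if b and (i == 0 or not taps[i - 1]))


if __name__ == "__main__":
    arrivals = [0.0, 3.0]  # hypothetical: direct pulse and one reflection 3 ns later
    for pulse_ns in (1.0, 4.0):
        taps = delay_line_contents(arrivals, pulse_ns, line_ns=50.0, unit_ns=1.0)
        print(f"{pulse_ns} ns pulse -> {resolved_pulses(taps)} resolved")
    # 1.0 ns pulse -> 2 resolved
    # 4.0 ns pulse -> 1 resolved
```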

VII. ISOLATION TECHNIQUES

Due to the high-frequency demands, low power supply and sensitive circuitry in this design, on-chip isolation is important [1]. The analog circuits are all isolated from the digital switching circuits by using P-wells for circuit separation and isolation. The different circuits and isolated wells have separate power rails and power pads to minimize cross-talk. The shielding provides isolation from electric fields generated by other circuitry on or off chip, and the on-chip decoupling provides an AC connection to isolated ground. A shield was added in the top metal layer for analog circuits and wires, together with decoupling to isolated ground on all analog signals and supplies. The isolated ground from decoupling and shielding is star-coupled into several separate pads for isolated ground, which are in turn star-coupled together close to the power supply rail on the PCB off chip. The separate supplies and grounds for the different circuits are all coupled to isolated pads. As mentioned earlier, the level shifting circuit also contributes to isolation and noise reduction by high pass filtering the signal. Since the final result is determined by the combination of all the isolation techniques used in a design, we are confident that the combined techniques will provide sufficient isolation and noise reduction.

VIII. IMPLEMENTATION

The front-end circuit topology is implemented on the same silicon as the RAKE receiver (Figure 7). The front-end radio receiver is located in the upper left corner of the RAKE receiver (the large square matrix), and the black squares around the front-end are the decoupling capacitors. The total silicon size is 1.8 mm × 1.35 mm, equaling 2.43 mm².

![Figure 7: Circuit layout](image)

IX. SIMULATIONS

The simulations of the circuit are divided into three parts: the LNA on its own; the complete system with the LNA, level shifters, triggered high-speed processing for pulse shape detection and pulse sharpeners; and the level shifter on its own.

The first simulation (Figure 8) shows the AC response of the LNA simulated between 10MHz and 50GHz.

![Figure 8: LNA AC response](image)

The model shows that the gain stays above 30 dB in the band from 3 GHz to 10 GHz.

The second simulation (Figure 9) shows the complete circuit, starting from the bottom trace. The lowest trace shows the input signals with complementary phases; the input dual monocycle has an amplitude of only 10 mV (modelled in MATLAB). The next trace is the output from the LNA. As the signal is split, the next two traces show the outputs from the complementary slope detectors, each detecting one slope phase. These signals are then combined by the XNOR gate and finally shaped to a 1 ns pulse.

Figure 9: Front end processing simulation results

Power consumption is estimated for the path from the dual-monocycle input up to the level shifter, assuming three reflections for each pulse and 25 pulses (50%) for each symbol containing 30 bits sampled in the delay line at 20 MHz. Under these assumptions the total power consumption adds up to 84 mW. Since we assume that 50% of the symbol contains pulses, each followed by three reflections, this is a fairly conservative estimate.

An AC simulation of the level shifter is shown in Figure 10 and a DC analysis in Figure 11 to indicate the performance. The AC response (Figure 10) shows that we have a cut-off frequency at 2GHz. The simulation was performed with a tail bias at 90%.

Figure 10: Voltage follower frequency response

The DC response (Figure 11) indicates the linearity of the voltage follower and the different offsets at tail bias voltages from 100-450mV.

Figure 11: Voltage follower linearity

X. MEASUREMENTS

The circuit will be tested with input pulses of different phase and amplitude, and with different levels of added noise. The chip is expected to arrive shortly, and measurements will start immediately.

Figures of different measurement results and comments are therefore pending and will be added for the final version.

XI. CONCLUSIONS

The ultra wide band impulse radio receiver front-end architecture presented in this paper is implemented in a 120 nm CMOS process. The front-end exploits dual-slope impulses and thresholding. The continuous time, pure CMOS front-end is a pre-conditioner for a sampled delay-line RAKE receiver, making up a novel low-power UWB impulse radio receiver.

REFERENCES

[3] Claus Limbodal, Kjetil Meisal, Tor Sverre Lande and Dag T. Wisland, "A spatial RAKE receiver for real time ultra wide band applications", to be published.
[4] Ian Oppermann, Matti Hämäläinen and Jari Iinatti, "UWB Theory and Applications", John Wiley & Sons, 2004.
[9] Maikyoung Ch. Hyngbjo Jung, Student Member, IEEE, Ramesh Harjani, Senior Member, IEEE, and Dongsoo Park, Member, IEEE, "A New Noncoherent UWB Impulse Radio Receiver", IEEE Communications Letters, VOL. 9, NO. 2, February 2005
### Appendix F

#### Glossary

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>AC</td>
<td>Absolute Combining</td>
</tr>
<tr>
<td>ADC</td>
<td>Analog - Digital Converter</td>
</tr>
<tr>
<td>AMS</td>
<td>Austria Micro Systems</td>
</tr>
<tr>
<td>ASIC</td>
<td>Application Specific Integrated Circuit</td>
</tr>
<tr>
<td>AWGN</td>
<td>Additive White Gaussian Noise</td>
</tr>
<tr>
<td>BER</td>
<td>Bit Error Rate</td>
</tr>
<tr>
<td>BPAM</td>
<td>Binary Pulse Amplitude Modulation</td>
</tr>
<tr>
<td>BPSK</td>
<td>Binary Phase Shift Keying</td>
</tr>
<tr>
<td>BPF</td>
<td>Band Pass Filtering</td>
</tr>
<tr>
<td>CDMA</td>
<td>Code Division Multiple Access</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>CMP</td>
<td>Circuits Multi-Project</td>
</tr>
<tr>
<td>CRC</td>
<td>Cyclic Redundancy Check</td>
</tr>
<tr>
<td>DCVSL</td>
<td>Differential Cascode Voltage Switch Logic</td>
</tr>
<tr>
<td>DRC</td>
<td>Design Rules Check</td>
</tr>
<tr>
<td>DS</td>
<td>Direct Sequence</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital Signal Processing</td>
</tr>
<tr>
<td>EGC</td>
<td>Equal Gain Combining</td>
</tr>
<tr>
<td>ETSI</td>
<td>European Telecommunications Standards Institute</td>
</tr>
<tr>
<td>FCC</td>
<td>Federal Communications Commission</td>
</tr>
<tr>
<td>FDMA</td>
<td>Frequency Division Multiple Access</td>
</tr>
<tr>
<td>FEC</td>
<td>Forward Error Correction</td>
</tr>
<tr>
<td>FH</td>
<td>Frequency Hopping</td>
</tr>
<tr>
<td>FIR</td>
<td>Finite Impulse Response</td>
</tr>
<tr>
<td>FPGA</td>
<td>Field Programmable Gate Array</td>
</tr>
<tr>
<td>FSK</td>
<td>Frequency Shift Keying</td>
</tr>
<tr>
<td>GPS</td>
<td>Global Positioning System</td>
</tr>
<tr>
<td>HPF</td>
<td>High Pass Filtering</td>
</tr>
<tr>
<td>IEEE</td>
<td>The Institute of Electrical and Electronics Engineers</td>
</tr>
<tr>
<td>IF</td>
<td>Intermediate Frequency</td>
</tr>
<tr>
<td>ISI</td>
<td>Inter Symbol Interference</td>
</tr>
<tr>
<td>LNA</td>
<td>Low Noise Amplifier</td>
</tr>
<tr>
<td>LOS</td>
<td>Line Of Sight</td>
</tr>
<tr>
<td>LPF</td>
<td>Low Pass Filtering</td>
</tr>
<tr>
<td>MA</td>
<td>Multiple Access</td>
</tr>
<tr>
<td>MAI</td>
<td>Multiple Access Interference</td>
</tr>
<tr>
<td>MB-OFDM</td>
<td>Multi Band Orthogonal Frequency Division Multiplexing</td>
</tr>
<tr>
<td>MBOK</td>
<td>M-ary Bi-Orthogonal Keying</td>
</tr>
<tr>
<td>MES</td>
<td>Micro Electronic Systems</td>
</tr>
<tr>
<td>MICTOR</td>
<td>Matched Impedance Connector</td>
</tr>
<tr>
<td>MRC</td>
<td>Maximum Ratio Combining</td>
</tr>
<tr>
<td>MUX</td>
<td>Multiplexer</td>
</tr>
<tr>
<td>NLOS</td>
<td>Non Line Of Sight</td>
</tr>
<tr>
<td>OOK</td>
<td>On-Off Keying</td>
</tr>
<tr>
<td>PAM</td>
<td>Pulse Amplitude Modulation</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed Circuit Board</td>
</tr>
<tr>
<td>PCI</td>
<td>Peripheral Component Interconnect</td>
</tr>
<tr>
<td>PDN</td>
<td>Pull Down Network</td>
</tr>
<tr>
<td>PDP</td>
<td>Power Delay Profile</td>
</tr>
<tr>
<td>PG</td>
<td>Processing Gain</td>
</tr>
<tr>
<td>PN</td>
<td>Pseudo Noise</td>
</tr>
<tr>
<td>PPM</td>
<td>Pulse Position Modulation</td>
</tr>
<tr>
<td>PSK</td>
<td>Phase Shift Keying</td>
</tr>
<tr>
<td>RF</td>
<td>Radio Frequency</td>
</tr>
<tr>
<td>RG</td>
<td>RAKE Gain</td>
</tr>
<tr>
<td>SD</td>
<td>Selection Diversity</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal to Noise Ratio</td>
</tr>
<tr>
<td>SS</td>
<td>Spread Spectrum</td>
</tr>
<tr>
<td>SV</td>
<td>Saleh - Valenzuela</td>
</tr>
<tr>
<td>TDMA</td>
<td>Time Division Multiple Access</td>
</tr>
<tr>
<td>TH</td>
<td>Time Hopping</td>
</tr>
<tr>
<td>UWB</td>
<td>Ultra Wide Band</td>
</tr>
<tr>
<td>UWB-IR</td>
<td>Ultra Wide Band Impulse Radio</td>
</tr>
<tr>
<td>WLAN</td>
<td>Wireless Local Area Network</td>
</tr>
</tbody>
</table>
BIBLIOGRAPHY

Bibliography




List of Figures

1.1 UWB emission masks for FCC and ETSI.

2.1 Reflection and refraction of an incident radio wave.
2.2 Diffraction.
2.3 Flat fading in a communication channel. The received signal is shown as the result of the influence of the channel impulse response on the transmitted signal.
2.4 Frequency selective fading in a communication channel. The received signal is shown as the result of the influence of the channel impulse response on the transmitted signal.
2.5 A multipath environment resulting from reflections from multiple objects. The black path is the LOS path.

3.1 A typical narrowband communication system. A typical structure consists of High Pass Filtering (HPF), Band Pass Filtering (BPF), Low Pass Filtering (LPF) and multiplication of the signal with Radio Frequency (RF) and Intermediate Frequency (IF) oscillators, as shown.
3.2 A typical UWB communication system.
3.3 Pulse position modulation.
3.4 BPAM pulses for '1' and '0'.
3.5 On-off keying pulses for '1' and '0'.
3.6 A BPSK sequence corresponding to the bit sequence '0101'. The black waveform corresponds to a bit '0' and the red waveform to a bit '1'.
3.7 The waveform in a) shows a typical continuous FSK signal with no phase error in the shift between '1' and '0', whereas the waveform in b) shows a discontinuous signal with a phase difference at the time of shifting.

4.1 The correlator receiver
4.2 A matched filter receiver
4.3 A RAKE receiver with $N$ parallel correlators
4.4 RAKE receiver for discrete-time channels
4.5 The orthogonal RAKE topology
4.6 Sampling of multipath components. The red pulse arrives at the antenna first and the blue pulse arrives at the antenna last. At the time of sampling, the red pulse has propagated farthest through the delay line but is hidden inside a delay element, so the sampled level on both sides of that delay element is '0'.

5.1 Simulation of one delay element. The typical delay for this element is 852 ps.
5.2 A simulation of the delay line, showing the signal after one element, 10 elements and 50 elements. The black waveform is the input clock with a frequency of 500 MHz, which corresponds to a pulse width of 1 ns. The blue, red and green waveforms correspond to the input signal after one, 10 and 50 delay elements, respectively.
5.3 The basic principle of the correlator
5.4 The differential gate correlator with its pull-down network
5.5 Simulation showing the functionality of the correlator. The input signals In1 and In2 are shown together with the Control input. The bottom waveform is the output response from the logic, corresponding to the truth table in Table 5.1.
5.6 The comparator circuit
5.7 The functionality of the comparator, shown with a 20 MHz input signal and a range of adjustment from 320 mV to 690 mV. The solid line is the input signal, a voltage rising linearly from zero to one volt in 50 ns and falling back to zero at 100 ns. The dashed line shows how the output responds to changes in the input voltage when the threshold voltage is set to 310 mV. The dotted line shows the output voltage response when the threshold is set to 690 mV. These two voltages determine the range of operation of the comparator. Note the hysteresis between the switching points at the positive and negative edges.
5.8 The RAKE finger
5.9 A schematic view of the RAKE cell
5.10 The principle of the combiner line with pre-charging. $V_{\text{match}}$ corresponds to the connection to the output threshold line.
5.11 Simulation showing the functionality of one RAKE finger. In this simulation the correlator bias voltage was set to 650 mV.
5.12 Simulation of the same setup with the correlator bias voltage set to 700 mV.
5.13 Corner simulations of the RAKE finger. The different process corners clearly affect the bias voltage required for the desired percentage of match. The simulations also show that the circuit performance depends mostly on the properties of the pMOS: a slow or fast pMOS has a larger effect on the performance than a slow or fast nMOS.
5.14 Picture of the RAKE receiver layout.

6.1 Measurement of the shift register operating at 60 MHz, the maximum frequency the shift register could handle without introducing errors. The input clock is shown at the top, the input data at the bottom and the output from the shift register in the middle.
6.2 Two measurements of single match pulses. The top waveform shows the pulses at the output of a RAKE finger in the case of a match; the sampling frequency of 20 MHz is clearly visible. In the bottom waveform the bias voltage is adjusted to a slightly lower level than in the top waveform. This makes the contribution from each correlator larger, causing the voltage level on the combiner line to increase faster. The result is a wider pulse at the output.
6.3 The function of the correlators. In this measurement the template shift register is fed with the frequency shown in the bottom waveform, and the correlator shift register is filled with ones. The output from the correlators is shown in the top waveform, indicating a match when the template shift register contains ones. The square pulse shapes observed in the bottom waveform are composed of a large number of single match pulses, as shown in figure.
6.4 Result of a measurement performed with the correlator sampling frequency at 40 MHz. For comparison, the bottom waveform is the result of a simulation with the same setup. The data rate in the simulation is higher, but it confirms the appearance of the measured pulses in this setting. The square pulse shapes observed in the bottom waveform are composed of a large number of single match pulses, as shown in figure.
6.5 The output response with a clock rate of 1 MHz at the template shift register; in this case the data rate is 100 kHz. The bottom waveform clearly shows the widening of the output pulse train caused by the time it takes to fill and empty the shift register. Since a match is obtained at the output with only a few ones in the shift register, the output indicates a match after a few clock pulses. When the input falls to a low level, it takes time to shift the remaining ones through the shift register, resulting in an output pulse train that is wider than the input pulse.

A.1 Picture of the PCB
A.2 Pictures of the measurement setup

B.1 Layout of one delay element. The element consists of two rows of 15 inverters each, giving a delay of 1 ns per element. The total delay line consists of 49 of these elements.
B.2 Layout of the whole delay line
B.3 Layout of the correlator
B.4 Layout of the comparator circuit.
B.5 Layout of the RAKE cell. The D-latch with its NAND gates and two inverters is seen to the left, and the correlator circuit to the right.
B.6 These figures show buffers that are used in the circuit to provide driving capability. The largest one, shown in the upper right figure, is used as a supplement to the buffer in the bottom figure. It has a width of 64 µm and was designed to drive a load of 1 pF at a frequency of 1 GHz.
B.7 Picture of the chip with the RAKE receiver and front-end. The RAKE receiver is easily observed on the same silicon as a front-end circuit. The front-end is seen in the upper left corner, with its decoupling capacitors as black squares. The total size of the chip is 1792 µm by 1333 µm.

C.1 Simulation of the response of the delay element with an input frequency of 5 GHz.
C.2 Monte Carlo simulation of the delay time of one delay element, performed at the Fast nMOS, Fast pMOS corner at temperatures of 27°C (left figure) and -50°C (right figure).
C.3 Monte Carlo simulation of the delay time of one delay element. The simulation in the left figure is performed at the Slow nMOS, Fast pMOS corner, and the simulation in the right figure at the Fast nMOS, Slow pMOS corner.
C.4 Monte Carlo simulation of the delay time of one delay element, performed at the Slow nMOS, Slow pMOS corner at temperatures of 27°C (left figure) and 100°C (right figure).
C.5 Monte Carlo simulation of the delay time of one delay element, performed at the Fast nMOS, Fast pMOS corner at temperatures of 27°C (left figure) and -50°C (right figure).
C.6 Simulation of the response time of inverters. Inverters of two different sizes are shown, one of them inside a delay line configuration.
C.7 Simulation of the power consumption of inverters. Inverters of two different sizes are shown, one of them inside a delay line configuration.
List of Tables

5.1 The truth table of the logic function of the differential gate. When the Control input is '1', the combinational function of In1 and In2 corresponds to a NAND gate, whereas when the Control input is '0', it corresponds to an XNOR gate.
5.2 Table of the circuit specifications. The maximum power consumption occurs in the case of a match at every correlator with the delay line running at 1 GHz.

A.1 Table of the different components used on the PCB