Media and Radio Signal Processing for Mobile Communications

Get to grips with the principles and practice of signal processing used in real mobile
communications systems. Focusing particularly on speech and video processing, pioneering experts employ a detailed, top-down analytical approach to outline the network
architectures and protocol structures of multiple generations of mobile communications
systems, identify the logical ranges where media and radio signal processing occur, and
analyze the procedures for capturing, compressing, transmitting and presenting media.
Chapters are uniquely structured to show the evolution of network architectures and
technical elements between generations up to and including 5G, with an emphasis on
maximizing service quality and network capacity through reusing existing infrastructure and technologies. Examples and data taken from commercial networks provide an
in-depth insight into the operation of a number of different systems, including GSM,
cdma2000, W-CDMA, LTE, and LTE-A, making this a practical, hands-on guide for
both practicing engineers and graduate students in wireless communications.
Kyunghun Jung is a Principal Engineer at Samsung Electronics, where he leads the
research and standardization for bringing immersive media services and vehicular
applications to 5G systems.
Russell M. Mersereau is Regents Professor Emeritus in the School of Electrical and
Computer Engineering at the Georgia Institute of Technology, and a Fellow of the IEEE.

This impressive book provides an excellent comprehensive explanation of the principles and
practices of media and radio signal processing in real mobile communications systems. It also
wonderfully explains the evolution of signal processing operations and thereby gives the reader a
deep insight into the challenges and how they were overcome.
Kari Järvinen, Nokia Technologies
With today’s mobile user experience so influenced by multimedia services, this book’s clear
background on the fundamentals of the entire protocol stack, from the physical layer to the
multimedia codecs, media handling, and immersive media, makes it invaluable for understanding
today’s mobile cellular systems. The authors’ experience with the development of the protocols
and standards of these systems provides insights into the reasons for their development that
allow the reader to better understand these technologies.
Nikolai Leung, Qualcomm

Media and Radio Signal Processing for Mobile Communications

KYUNGHUN JUNG
Samsung Electronics

RUSSELL M. MERSEREAU
Georgia Institute of Technology

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108421034
DOI: 10.1017/9781108363204
© Cambridge University Press 2018

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Jung, Kyunghun, 1970– author. | Mersereau, Russell M., author.
Title: Media and radio signal processing for mobile communications /
Kyunghun Jung, Samsung Electronics, Russell M. Mersereau, Georgia Institute of Technology.
Description: New York, NY : Cambridge University Press, 2018. |
Includes bibliographical references and index.
Identifiers: LCCN 2017054695 | ISBN 9781108421034 (alk. paper)
Subjects: LCSH: Multimedia communications. | Mobile communication systems. |
Signal processing–Digital techniques.
Classification: LCC TK5105.15 .J86 2018 | DDC 621.39/167–dc23
LC record available at https://lccn.loc.gov/2017054695
ISBN 978-1-108-42103-4 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
© 2001. 3GPP™ TSs and TRs are the property of ARIB, ATIS, CCSA, ETSI, TTA and TTC, who jointly
own the copyright in them. They are subject to further modifications and are therefore provided to you “as is”
for information purposes only. Further use is strictly prohibited.

To Bongho, Hyesook, and Hoonjung, and to Martha

Contents

Preface   xiii
Acknowledgments   xv
Glossary   xvi

1 Introduction   1
1.1 Historical Background   1
1.1.1 Problem Description   1
1.1.2 Performance Criteria   5
1.2 Analog Mobile Communications Systems   6
1.2.1 Network Architecture   7
1.2.2 Speech and Radio Signal Processing Operations   9
1.2.3 Cellular Operation   14
1.3 References   17

2 Signal Processing in TDMA Systems   18
2.1 Speech Signal Processing   18
2.1.1 Linear Predictive Coding   22
2.1.2 Fixed Bit-Rate versus Variable Bit-Rate Coding   30
2.2 AMPS Enhancements   31
2.2.1 Narrowband AMPS   31
2.2.2 Digital AMPS   32
2.2.3 Further Opportunities   42
2.3 Global System for Mobile Communications   42
2.3.1 Network Architecture   43
2.3.2 Channel Structure   44
2.3.3 Full-Rate Speech Codec   47
2.3.4 Uplink and Downlink Signal Processing   52
2.4 References   62

3 Evolution of TDMA Systems   64
3.1 Enhancements in Speech Compression   64
3.1.1 Enhanced Full-Rate Speech Codec   64
3.1.2 Half-Rate Speech Codec   66
3.2 Enhancements in Coordination of Compression and Transmission   71
3.2.1 Joint Source-Channel Coding Theory   71
3.2.2 Adaptive Multi-Rate Speech Codec   74
3.2.3 Link Adaptation   79
3.3 Enhancements in Wireless Transmission   84
3.3.1 Downlink Advanced Receiver Performance   84
3.3.2 8-PSK Half-Rate Channel   87
3.3.3 Voice Services over Adaptive Multi-User Channels on One Slot   90
3.3.4 Adaptive Pulse Shaping   95
3.4 Performance Evaluation   96
3.4.1 Speech Compression and Transmission Performance   96
3.4.2 Live Call Analysis   106
3.4.3 VAMOS Operation   106
3.5 References   112

4 Signal Processing in CDMA Systems   114
4.1 TDMA Limitations   114
4.1.1 Guard Time and Guard Band   114
4.1.2 Fixed Bit-Rate Speech Coding   115
4.1.3 Frequency Re-Use Factor   115
4.1.4 Wideband Multipath Fading   116
4.2 CDMA Principles   116
4.2.1 Spread Spectrum Theory   117
4.2.2 Pseudo Noise Sequence   119
4.2.3 Generation of PN Sequence   120
4.2.4 Phase Shift of PN Sequence   122
4.2.5 Decimation of PN Sequence   125
4.2.6 Rake Receiver Theory   126
4.3 Interim Standard 95   127
4.3.1 Network Architecture   129
4.3.2 QCELP Speech Codec   130
4.3.3 Reverse Link Signal Processing   134
4.3.4 Forward Link Signal Processing   145
4.4 References   149

5 Evolution of CDMA Systems   150
5.1 Enhancements in Speech Compression   150
5.1.1 QCELP-13 Speech Codec   151
5.1.2 Enhanced Variable Rate Codec   155
5.2 cdma2000   156
5.2.1 Reverse Link Signal Processing   157
5.2.2 Forward Link Signal Processing Procedures   161
5.3 Enhancements in Coordination of Compression and Transmission   164
5.3.1 Selectable Mode Vocoder   164
5.3.2 4th Generation Vocoder   167
5.3.3 Network Control and Voice Control of Speech Compression   174
5.4 Enhancements in Wireless Transmission   175
5.4.1 cdma2000 Revision E   175
5.4.2 Reverse Link Signal Processing   175
5.4.3 Forward Link Signal Processing   177
5.4.4 Blanked-Rate 1/8 Frames   178
5.4.5 Reduced Power Control Rate   179
5.4.6 Frame Early Termination   180
5.4.7 Interference Cancellation   181
5.5 Performance Evaluation   182
5.5.1 Speech Compression and Transmission Performance   183
5.5.2 Live Call Analysis   188
5.5.3 Derivation of CDMA Voice Capacity   189
5.6 References   193

6 Signal Processing in W-CDMA Systems   195
6.1 W-CDMA Release 99   196
6.1.1 Network Architecture   198
6.1.2 Protocol Stack Principles   199
6.2 Radio Signal Processing   202
6.2.1 Radio Link Control   202
6.2.2 Medium Access Control   204
6.2.3 Physical Layer   206
6.2.4 Link Management   224
6.2.5 Operational Strategy   228
6.3 Video Signal Processing   234
6.3.1 A/D Conversion   235
6.3.2 Motion Estimation and Compensation   237
6.3.3 Multi-Dimensional Signal Processing   239
6.3.4 D/A Conversion   242
6.3.5 Combined Distortion from Compression and Transmission   242
6.3.6 Rate Control   249
6.4 Video Codecs   253
6.4.1 H.263 Video Codec   253
6.4.2 MPEG-4 Video Codec   255
6.5 3G-324M   258
6.5.1 System Architecture   258
6.5.2 Media Adaptation and Multiplexing Procedures   259
6.5.3 Radio Signal Processing   267
6.6 References   269

7 Evolution of W-CDMA Systems   272
7.1 Enhancements in Wireless Transmission   272
7.1.1 Pilot-Free Slot Format   273
7.1.2 SRB Power Boost   273
7.1.3 Compressed DPDCH   274
7.1.4 Frame Early Termination   275
7.2 Enhancements in Media Negotiation   276
7.2.1 Media Configuration Delay   276
7.2.2 Accelerated Media Negotiation   281
7.3 Performance Evaluation   284
7.3.1 Video Compression and Transmission Performance   285
7.3.2 Live Call Analysis   287
7.3.3 Voice Capacity   290
7.4 References   291

8 Signal Processing in SC-FDMA/OFDMA Systems   292
8.1 Technical Background   292
8.1.1 New Problem Description   292
8.1.2 Packetization of Circuit-Switched Systems   294
8.2 Voice over Long Term Evolution   297
8.2.1 Network Architecture   297
8.2.2 Functional Split   299
8.3 Radio Signal Processing Procedures   303
8.3.1 Packet Data Convergence Protocol   304
8.3.2 Radio Link Control   313
8.3.3 Medium Access Control   314
8.3.4 Physical Layer   319
8.3.5 Link Management   335
8.3.6 Operational Strategy   340
8.4 Media Signal Processing Procedures   354
8.4.1 Adaptive Multi-Rate Wideband Speech Codec   356
8.4.2 H.264 Video Codec   358
8.4.3 RTP/UDP/IP Packetization   360
8.4.4 Jitter Buffer Management   363
8.5 Resource Reservation Procedures   364
8.5.1 IP Multimedia Subsystem   364
8.5.2 SDP Offer   365
8.5.3 SDP Answer   368
8.5.4 Quality of Service Representation   368
8.5.5 Session Negotiation   374
8.6 References   375

9 Evolution of SC-FDMA/OFDMA Systems   378
9.1 Enhancements in Media Compression   378
9.1.1 Enhanced Voice Services Speech Codec   379
9.1.2 High Efficiency Video Coding   393
9.1.3 Session Negotiation of Enhanced Media   394
9.2 Enhancements in Coordination of Compression and Transmission   395
9.2.1 Media Adaptation   395
9.2.2 Selective Intra-Refreshing   407
9.2.3 Coordination of Video Orientation   410
9.3 Enhancements in Session Negotiation   414
9.3.1 Reduction of Resizing-Induced Spectral and Computational Inefficiency   414
9.3.2 Asymmetric Media Configuration   418
9.4 Enhancements in Wireless Transmission   419
9.4.1 Spectrum Usage Analysis   419
9.4.2 Carrier Aggregation   421
9.4.3 Recommendation of Media Bit-Rates   425
9.5 Remote Management of Operation   426
9.5.1 Session Negotiation Management   427
9.5.2 Media Adaptation Management   429
9.6 Performance Evaluation   431
9.6.1 Speech Compression and Transmission Performance   435
9.6.2 Video Compression and Transmission Performance   440
9.6.3 Live Session Analysis   441
9.6.4 Voice Capacity   446
9.6.5 Derivation of LTE Voice Capacity   447
9.7 References   449

10 Signal Processing in 5G Systems   451
10.1 Technical Background   451
10.2 Network Architecture   453
10.3 New Radio Access   454
10.4 Immersive Media Service   457
10.4.1 Virtual Reality   457
10.4.2 Ambisonic Audio Signal Processing   458
10.4.3 Omnidirectional Video Signal Processing   461
10.4.4 Controlling Quality–Capacity Tradeoff of Immersive Media   463
10.5 References   463

Index   465

Preface

Advances in media and radio signal processing have been the driving forces behind the
industrial and social changes enabled by the widespread use of smartphones and mobile
multimedia communications. We started our research on these exciting topics in January
1999, as the expectations for 3G mobile communications systems and their multimedia
services were generating great excitement. Our research, initially from an academic
viewpoint for a doctoral dissertation, shifted to more practical concerns when Dr. Jung
joined Samsung Electronics and began working to design 3G and 4G mobile communications systems. We realized that many of the approaches and assumptions made in
the literature were not realistic in actual systems and we identified new opportunities for
improvements.
Some of these approaches, which were based on extensions of conventional joint
source-channel coding, were inadequate to reflect real situations, such as the high cost
of frequency spectrum or the need for a network entity to be responsible for controlling the tradeoff between media quality and network capacity. Books that analyzed real
mobile communications systems, on the other hand, focused on the radio signal processing and network architectures, while providing limited guidance on the needs of
the media signal processing. In light of the significant discrepancy between the work
in academia and industry that we observed, we prepared this book to explain the principles and practices of both media and radio signal processing used in actual mobile
communications systems.
We examine multiple generations of commercially deployed or standardized mobile
communications systems and analyze in detail the areas where the media and radio signal processing take place and interact. We trace the evolution of the signal processing
operations, as new technical elements were introduced to meet the challenges. We identify where elements were inherited from earlier systems for compatibility, and explain
how the media codecs, network architectures, and radio access technologies interact to
maximize quality and capacity in a consistent, top-down fashion. From Chapter 2 to
Chapter 9, each pair of chapters covers the basic construction and operating principles
of a mobile communications system and its evolved version in which the initial limitations are partially solved. Each pair is self-contained and can be read independently.
Proceeding to the next pair shows more radical approaches made when evolutionary
enhancements were not sufficient and completely new elements were required. We
would like to point out that the signal processing techniques in the early chapters are
no less important than those in the later chapters on more state-of-the-art systems, as
they often become critical design constraints when new systems are designed.
Several media compression and wireless transmission techniques that looked promising in theoretical analysis, and even made it into standardization and implementation, ultimately proved unsuccessful in attaining the envisioned performance in
real environments. Since managing complexity and stability is a key requirement in the
design of complex systems such as mobile communications, many procedures designed
for previous systems are re-used. We discuss examples of borrowed technical concepts
from earlier systems that are applied to different areas successfully. The simulations of
communications systems often produce varying results depending on the complexity of
the system models or configuration of their parameters. Moreover, evaluations of media
quality often require subjective testing. In this book, we present the highest-quality
simulation results recognized by the standardization organizations, official results of
subjective testing administered by expert agencies contracted for those services, and
field measurements from commercially operational GSM, cdma2000, W-CDMA, and
LTE and LTE-A handsets and networks that show the variation of key media and radio
parameters during compression and transmission in the time domain.
The trajectory of the technical evolution covered throughout the chapters shows that
each generation has introduced new technical elements or absorbed elements previously not included in mobile communications systems, such as video in 3G and IP in 4G.
These require new types of signal processing to meet the harsh mobile environment.
Historically, compression and transmission of media have been the focus of mobile
communications, but it is envisioned that other areas of signal processing, e.g., recognition and synthesis of media, will play key roles in next generation systems providing
immersive media and vehicular applications. We expect that this book will bridge the
gap between academia and industry, and provide its readers with insight for the design,
analysis, and implementation of mobile multimedia communications systems.

Acknowledgments

We started our signal processing careers through the books, teaching, and collaborations of A. V. Oppenheim, R. W. Schafer, T. P. Barnwell III, J. H. McClellan, and L.
R. Rabiner, whose influence can be seen in the early chapters of this book. It was the
DSP Leadership Universities Program of Texas Instruments, granted in April 1999 with
the consideration of Gene Franz, Bob Hewes, Panos Papamichalis, and Raj Talluri, that
enabled us to initiate our long-term research on the handling and interaction of media
over mobile communications. In the systems from GSM to 5G, we were advised by
many designers and developers of those systems. For GSM, Paolo Usai, Karl Hellwig, Stefan Bruhn, and Jongsoo Choi shared with us their experience and expertise
on this fundamental and still dominant mobile communications system. For IS-95 and
cdma2000, we were deeply influenced by the work of Jhongsam Lee, Vieri Vanghi, and
Yucheun Jou. For W-CDMA and 3G-324M, Kwangcheol Choi, Yonghyun Lim, and
Youngmin Jeong helped us with the real-time transmission of media over the system.
We enjoyed the development and deployment of EVS over 4G systems with Hosang
Sung, Kihyun Choo, Jonghoon Jeong, and Woojung Park. Thomas Belling contributed
advice and suggestions from which we learned about core network issues. Terence Betlehem helped
us understand a new signal processing area, ambisonic audio, and write the audio section on virtual reality. We also appreciate the ongoing efforts of Kyungmo Park and Imed
Bouazizi for the realization of 5G systems, as outlined in the last chapter. We would like
to thank especially Kari Järvinen, Tomas Frankkila, and Nikolai Leung for their decade-long service at the MTSI SWG during the historical transitions from circuit-switched
to packet-switched mobile multimedia communications systems and the introduction of
IMS. This small group of experienced and versatile experts adroitly handled complex
engineering problems in the last stage of standardization and development where many
technical issues are interwoven, and shared the thrill of stabilizing those systems just
before their worldwide launches. With Ingemar Johansson, we introduced the negotiation of video resolution to the Internet community, via RFC 6236. We would also like
to thank Byungoh Kim who managed the hosting of standardization meetings, in which
many important technical decisions were made, at exotic venues in Korea. Finally, we
appreciate the generous permission of NTT DOCOMO, Innowireless, Accuver, 3GPP,
3GPP2, and Samsung Electronics for the use of their images, experimental data, and
other precious information that constitute key features of this book.

Glossary

4GV 4th Generation Vocoder. 164
ACELP Algebraic Code Excited Linear Prediction. 64
ACI Adjacent Channel Interference. 91
ACK Acknowledgment. 328
ACS Active Codec-mode Set. 75
ADPCM Adaptive Differential Pulse Coded Modulation. 19
AES Advanced Encryption Standard. 313
AL Adaptation Layer. 263
AL2 Adaptation Layer Type 2. 198
AL3 Adaptation Layer Type 3. 263
AM Acknowledged Mode. 201
AMC Adaptive Modulation and Coding. 302
AMPS Advanced Mobile Phone System. 6
AMR Adaptive Multi-Rate. 71
AMR-WB Adaptive Multi-Rate Wideband. 354
AOP Anchor Operating Point. 169
APCM Adaptive Pulse Coded Modulation. 48
APS Adaptive Pulse Shaping. 96
AQPSK Adaptive Quadrature Phase Shift Keying. 92
ARFCN Absolute Radio Frequency Channel Number. 44
AS Application Specific. 366
AS Application Server. 364
ASN.1 Abstract Syntax Notation One. 278
ATM Asynchronous Transfer Mode. 198
AWGN Additive White Gaussian Noise. 86
BCH Bose–Chaudhuri–Hocquenghem. 12
BER Bit Error Rate. 34
BIC Blind Interference Cancellation. 87
BLER Block Error Rate. 228
BLP Bitmask of following Lost Packets. 408
BMC Broadcasting and Multicasting Control. 202
BPSK Binary Phase Shift Keying. 196
BS Base Station. 7
BSC Base Station Controller. 2
BSR Buffer Status Report. 316
BSS Base Station Subsystem. 58
BTS Base Transceiver Station. 2
BWE Bandwidth Extension. 389
BWM Bandwidth Multiplier. 427
CA Carrier Aggregation. 319
CABAC Context Adaptive Binary Arithmetic Coding. 360
CAZAC Constant Amplitude Zero Auto-Correlation. 337
CCI Co-Channel Interference. 91
CCSRL Control Channel Segmentation and Reassembly Layer. 259
CCTrCH Coded Composite Transport Channel. 206
CDMA Code Division Multiple Access. 116
CDVCC Coded Digital Verification Color Code. 34
CELP Code Excited Linear Prediction. 130
CFN Connection Frame Number. 295
CID Context Identifier. 311
CIR Carrier-to-Interference Ratio. 49
CLDFB Complex Modulated Low Delay Filter Bank. 386
CLTD Closed-Loop Transmit Diversity. 223
CMC Codec Mode Command. 74
CMI Codec Mode Indication. 74
CMOS Complementary Metal Oxide Semiconductor. 235
CMR Codec Mode Request. 74
CNG Comfort Noise Generation. 384
CoID Codec Identifier. 426
CP Control Plane. 303
CP Cyclic Prefix. 321
CPICH Common Pilot Channel. 230
CQI Channel Quality Indicator. 328
CRC Cyclic Redundancy Check. 39
CRS Cell-specific Reference Signal. 338
CSMA Carrier Sense Multiple Access. 13
CSoHS Circuit-Switched Voice Services over HSPA. 295
CT Channel Type. 204
CTU Coding Tree Unit. 393
CVO Coordination of Video Orientation. 410
CVSD Continuously Variable Slope Delta Modulation. 416
CVT Continuously Variable Transmission. 175
D-AMPS Digital Advanced Mobile Phone System. 32
DARP Downlink Advanced Receiver Performance. 85
DC Direct Conversion. 216
DCI Downlink Control Information. 333
DCT Discrete Cosine Transform. 240
DFT Discrete Fourier Transform. 21
DL-SCH Downlink Shared Channel. 316
DM Device Management. 427
DMRS Demodulation Reference Signal. 328
DN Data Network. 453
DP Data Partitioning. 257
DPCCH Dedicated Physical Control Channel. 208
DPDCH Dedicated Physical Data Channel. 208
DRX Discontinuous Reception. 274
DS Dynamic Scheduling. 341
DS Direct Source. 435
DS Direct Spread. 117
DSP Digital Signal Processor. 183
DST Discrete Sine Transform. 394
DTCH Dedicated Traffic Channel. 204
DTMF Dual-Tone Multi-Frequency. 366
DTS DARP Test Scenario. 85
DTX Discontinuous Transmission. 39
DU Digital Unit. 297
E-UTRAN Evolved Universal Terrestrial Radio Access Network. 297
ECN Explicit Congestion Notification. 302
EDGE Enhanced Data Rates for GSM Evolution. 90
EEP Equal Error Protection. 38
EFR Enhanced Full Rate. 64
EIB Erasure Indicator Bit. 151
EMR Enhanced Measurement Report. 59
Enhanced aacPlus Enhanced Advanced Audio Coding Plus. 385
EO End Office. 8
EPC Evolved Packet Core. 298
ERT Error Resilience Tool. 256
ESN Electronic Serial Number. 141
ESP Encapsulating Security Payload. 311
EV-DO Evolution Data Only. 294
EVRC Enhanced Variable Rate Codec. 155
EVS Enhanced Voice Services. 303
F-FCH Forward Fundamental Channel. 149
FBI Feedback Information. 212
FBR Fixed Bit-Rate. 30
FC Full Context. 308
FCELP Full-Rate Code Excited Linear Prediction. 169
FDD Frequency Division Duplex. 7
FDMA Frequency Division Multiple Access. 33
FET Frame Early Termination. 180
FFT Fast Fourier Transform. 240
FI Framing Information. 281
FIR Full Intra Request. 407
FM Frequency Modulation. 6
FO First-Order. 307
FOV Field of View. 461
FPPP Full-Rate Prototype Pitch Period. 169
FR Full Rate. 47
FSK Frequency Shift Keying. 11
GBR Guaranteed Bit-Rate. 370
GMSK Gaussian Minimum Shift Keying. 54
GOB Groups of Block. 239
GP Guard Period. 45
GPRS General Packet Radio Service. 43
GSC Generic Signal Audio Coder. 384
GSM Global System for Mobile Communications. 18
HARQ Hybrid Automatic Repeat Request. 214
HCELP Half-Rate Code Excited Linear Prediction. 169
HEC Header Extension Code. 257
HEC Header Error Control. 261
HEVC High Efficiency Video Coding. 303
HMD Head Mounted Display. 453
HNELP Half-Rate Noise Excited Linear Prediction. 173
HR Half Rate. 66
HRM Half-Rate Max. 173
HRTF Head Related Transfer Function. 461
HSDPA High Speed Downlink Packet Access. 295
HSPA High Speed Packet Access. 294
HSS Home Subscriber Server. 364
HSUPA High Speed Uplink Packet Access. 296
I-CSCF Interrogating Call Session Control Function. 364
IC Interference Cancellation. 84
ICI Inter-Channel Interference. 92
ICM Initial Codec Mode. 81
IDR Instantaneous Decoding Refresh. 234
IE Information Element. 369
IETF Internet Engineering Task Force. 305
IF Intermediate Frequency. 216
IMS IP Multimedia Subsystem. 298
IOT Internet of Things. 452
IP Internet Protocol. 292
IR Initialization and Refresh. 307
IS-54 Interim Standard 54. 32
IS-95 Interim Standard 95. 114
ISDN Integrated Services Digital Network. 195
ISF Immittance Spectral Frequency. 357
ISI Inter-Symbol Interference. 144
ISO International Organization for Standardization. 253
ITU-T International Telecommunication Union Telecommunication Standardization
Sector. 253
JBM Jitter Buffer Management. 363
JD Joint Demodulation. 87
JPEG Joint Photographic Experts Group. 258
LAR Log Area Ratio. 28
LCD Liquid Crystal Display. 242
LCG ID Logical Channel Group Identifier. 316
LCID Logical Channel Identifier. 315
LDPC Low Density Parity Check. 455
LEC Local Exchange Carrier. 8
LGP Linearized GMSK Pulse. 90
LOS Line-of-Sight. 44
LPC Linear Predictive Coding. 22
LS Last Segment. 281
LSB Least Significant Bit. 232
LSF Line Spectral Frequency. 28
LTE-A Long Term Evolution Advanced. 435
MAC Medium Access Control. 201
MBM Motion Boundary Marker. 257
MBMS Multimedia Broadcast Multicast Service. 202
MBR Maximum Bit-Rate. 370
MC Multiplex Control. 261
MCPTT Mission Critical Push To Talk. 439
MCS Modulation and Coding Scheme. 330
MD Music Detector. 165
MDCT Modified Discrete Cosine Transform. 383
MIB Master Information Block. 342
MIMO Multiple-Input and Multiple-Output. 330
MIPS Million Instructions Per Second. 96
MM Mixed Mode. 135
MME Mobility Management Entity. 298
MO Management Object. 428
MONA Media Oriented Negotiation Acceleration. 281
MOS Media Oriented Setup. 282
MOS Mean Opinion Score. 6
MPC Media Preconfigured Channels. 282
MPEG Motion Picture Expert Group. 253
MPL Multiplex Payload Length. 264
MRC Maximal Ratio Combining. 127
MS Mobile Station. 2
MSC Mobile Switching Center. 2
MSE Mean Square Error. 237
MSRG Modular Shift Register Generation. 120
MTSI Multimedia Telephony Service for IMS. 299
MTSIMA MTSI Media Adaptation. 430
MTSINP MTSI Network Preference. 428
MTSO Mobile Telephone Switching Office. 7
MTU Maximum Transfer Unit. 301
MUD Multi-User Detector. 215
MuMe Multi-Media. 426
MUROS Multi-User Re-using One Slot. 109
MUX Multiplexer. 260
N-AMPS Narrowband Advanced Mobile Phone System. 32
NACK Negative Acknowledgment. 407
NAL Network Adaptation Layer. 359
NAS Non-Access Stratum. 303
NB Narrow Band. 19
NC No Context. 308
NELP Noise Excited Linear Prediction. 169
NFV Network Function Virtualization. 454
NMT Nordic Mobile Telephone. 43
NR New Radio. 454
NRZ Non-Return-to-Zero. 12
NSRP Numbered Simple Re-transmission Protocol. 259
O-mode Bi-directional Optimistic Mode. 305
O-TCH/AHS Adaptive Multi-Rate Speech Channel at 8-PSK Half Rate. 87
OFDM Orthogonal Frequency Division Multiplexing. 319
OID Organization Identifier. 426
OLED Organic Light Emitting Diode. 242
OLTD Open-Loop Transmit Diversity. 223
OoBTC Out-of-Band Transcoder Control. 231
OQPSK Offset QPSK. 144
OSC Orthogonal Sub-Channel. 95
OSI Open Systems Interconnection. 202
OTD Orthogonal Transmit Diversity. 162
OTT Over The Top. 457
OVSF Orthogonal Variable Spreading Factor. 213
P-CSCF Proxy Call Session Control Function. 364
P-GW Packet Data Network Gateway. 298
PCC Primary Component Carrier. 422
PCCC Parallel Concatenated Convolutional Code. 210
PCEF Policy and Charging Enforcement Functionality. 365
PCell Primary Cell. 422
PCG Power Control Group. 140
PCM Pulse Coded Modulation. 2
PCRF Policy and Charging Rules Function. 364
PCS Personal Communications Service. 151
PDB Packet Delay Budget. 373
PDC Personal Digital Cellular. 74
PDCCH Physical Downlink Control Channel. 342
PDCP Packet Data Convergence Protocol. 201
PDSCH Physical Downlink Shared Channel. 332
PDU Protocol Data Unit. 200
pDVD Percentage Degraded Video Duration. 249
PELR Packet-Error Loss Rate. 373
PEMR Packet Enhanced Measurement Report. 59
PHICH Physical Hybrid-ARQ Indicator Channel. 327
PHR Power Headroom Report. 316
PHS Personal Handy-Phone System. 258
PHY Physical Layer. 201
PID Packet ID. 408
PIP Picture In Picture. 269
PLI Picture Loss Indication. 407
PLR Packet Loss Ratio. 405
PM Packet Marker. 261
PMI Precoding Matrix Indicator. 328
PMRM Power Measured Report Message. 152
PN Pseudo Noise. 117
PPI Pixels Per Inch. 452
PPP Prototype Pitch Period. 169
PRACK Provisional Response Acknowledgment. 375
PSD Power Spectral Density. 117
PSNR Peak Signal to Noise Ratio. 243
PSTN Public Switched Telephone Network. 2
PSVT Packet Switched Video Telephony. 404
PT Payload Type. 263
PUCCH Physical Uplink Control Channel. 316
PUSCH Physical Uplink Shared Channel. 323
QCELP Qualcomm Code Excited Linear Prediction. 130
QCELP-13 Qualcomm Code Excited Linear Prediction 13 kbps. 151
QCI QoS Class Identifier. 372
QNELP Quarter-rate Noise Excited Linear Prediction. 169
QOF Quasi-Orthogonal Function. 190
QoS Quality of Service. 199
QPP Quadratic Permutation Polynomial. 324
QPPP Quarter-rate Prototype Pitch Period. 169
QPSK Quadrature Phase Shift Keying. 196
R-FCH Reverse Fundamental Channel. 145
R-mode Bi-directional Reliable Mode. 305
RAB Radio Access Bearer. 201
RAT Radio Access Technology. 195
RATSCCH Robust AMR Traffic Synchronized Control Channel. 79
RB Resource Block. 320
RC Repeat Count. 262
RC Radio Configuration. 144
RCELP Relaxed Code Excited Linear Prediction. 155
Rev. E Revision E. 168
RF Radio Frequency. 3
RI Rank Indication. 328
RIV Resource Indication Value. 330
RLC Radio Link Control. 201
RM Resynchronization Marker. 256
RM Rate Matching. 221
RNC Radio Network Controller. 198
ROHC Robust Header Compression. 201
RoT Rise over Thermal. 191
RPE-LTP Regular Pulse Excitation-Long Term Prediction. 47
RRC Radio Resource Control. 303
RRC Root-Raised Cosine. 96
RRH Remote Radio Head. 297
RS Rate Set. 154
RSCP Received Signal Code Power. 288
RSRP Reference Signal Received Power. 423
RSRQ Reference Signal Received Quality. 423
RSSI Received Signal Strength Indicator. 34
RTCP Real-time Transport Control Protocol. 299
RTP Real-time Transport Protocol. 299
RTT Round Trip Time. 441
RV Redundancy Version. 325
RVLC Reversible Variable Length Code. 257
RXLEV Received Signal Level. 55
RXQUAL Received Signal Quality. 55
S-CSCF Serving Call Session Control Function. 364
S-GW Serving Gateway. 298
SACCH Slow Associated Control Channel. 34
SAD Sum of Absolute Difference. 237
SAIC Single Antenna Interference Cancellation. 86
SAO Sample Adaptive Offset. 394
SAT Supervisory Audio Tone. 14
SBC Sub-Band Codec. 416
SBR Spectral Band Replication. 390
SC Static Context. 308
SC-VBR Source Controlled Variable Bit-Rate. 379
SCC Secondary Component Carrier. 422
SCell Secondary Cell. 422
SCH Synchronization Channel. 219
SCPIR Sub-Channel Power Imbalance Ratio. 92
SCS Supported Codec-mode Set. 426
SDP Session Description Protocol. 365
SDU Service Data Unit. 200
SF Signaling Flag. 45
SFH Slow Frequency Hopping. 57
SFN System Frame Number. 342
SIB2 System Information Block 2. 348
SID Silence Descriptor Frame. 41
SIGCOMP Signaling Compression. 301
SIN System Identification Number. 12
SIP Session Initiation Protocol. 299
SIR Signal-to-Interference Ratio. 11
SLF Subscription Locator Function. 364
SMS Short Message Service. 202
SMV Selectable Mode Vocoder. 164
SNR Signal-to-Noise Ratio. 9
SO Second-Order. 307
SPC Signaling of Preconfigured Channels. 282
SPS Semi Persistent Scheduling. 341
SR Spreading Rate. 190
SR Scheduling Request. 316
SRB Signalling Radio Bearers. 201
SRP Simple Re-transmission Protocol. 259
SRS Sounding Reference Signal. 338
SRVCC Single Radio Voice Call Continuity. 391
SSAC Service Specific Access Control. 348
SSN Segment Sequence Number. 281
SSRG Simple Shift Register Generation. 121
ST Signaling Tone. 16
STS Space Time Spreading. 162
STTD Space-Time block coding based Transmit Diversity. 223
TACS Total Access Communications System. 43
TB Tail Bits. 45
TBS Transport Block Set. 204
TBS Transport Block Size. 316
TCH/AFS Full Rate Speech Traffic Channel for AMR. 74
TCH/AHS Half Rate Speech Traffic Channel for AMR. 74
TCH/EFS Full Rate Speech Traffic Channel for EFR. 65
TCH/FS Full Rate Speech Traffic Channel. 44
TCP Transmission Control Protocol. 301
TCTF Target Channel Type Field. 204
TCX Transform Coded Excitation. 384
TDMA Time Division Multiple Access. 7
TF Transport Format. 206
TFCI Transport-Format Combination Indicator. 208
TFI Transport-Format Indicator. 206
TFO Tandem Free Operation. 232
TM Traffic Mode. 136
TMMBR Temporary Maximum Media Bit-rate Request. 402
ToC Table of Contents. 361
TPC Transmit Power Control. 212
TRAU Transcoder and Rate Adaptation Unit. 3
TrFO Transcoder Free Operation. 231
TSC Training Sequence Code. 45
TT Traffic Type. 136
TTI Transmission Time Interval. 204
U-mode Uni-directional Mode. 305
UCF Until Closing Flag. 262
UDP User Datagram Protocol. 299
UE User Equipment. 198
UEP Unequal Error Protection. 37
UI User Interface. 268
UICC Universal Integrated Circuit Card. 378
UL-SCH Uplink Shared Channel. 316
UM Unacknowledged Mode. 201
UMB Ultra Mobile Broadband. 451
UMTS Universal Mobile Telecommunications System. 195
UP User Plane. 303
UPF User Plane Function. 453
USAC Unified Speech and Audio Coding. 386
VAD Voice Activity Detector. 49
VAMOS Voice Services over Adaptive Multi-User Channels on One Slot. 91
VBR Variable Bit-Rate. 30
VLC Variable Length Code. 235
VLSI Very Large Scale Integration. 18
VoIP Voice over Internet Protocol. 3
VoLTE Voice over Long Term Evolution. 297
VR Virtual Reality. 457
VSELP Vector Sum Excited Linear Prediction. 32
W-CDMA Wideband Code Division Multiple Access. 195
W-CDMA+ Wideband Code Division Multiple Access Plus. 272
WiMAX Worldwide Interoperability for Microwave Access. 451
WMOPS Weighted Million Operations Per Second. 183

1 Introduction

1.1 Historical Background
Mobile communications systems require a significant financial investment to obtain
radio spectrum, which consists of small, but expensive, frequency bands that are used
to extend the networks over wide geographic areas. There are additional costs to operate and maintain those networks. Roaming agreements made between service providers
can compensate for insufficient network coverage, but financial constraints still dictate that
existing assets such as the backbone networks be re-used whenever possible. As a result,
new mobile communications systems are rarely designed without incorporating some
elements of earlier systems.
Before discussing the signal processing procedures used by the second and later generations of digital mobile communications systems, it is appropriate to describe the goals
and define the performance criteria that were used to construct those procedures. Then
we outline the key features of their precursors, the first generation analog mobile communications systems, which introduced the cellular concept. It will become apparent
that the design of these early analog systems and the experience gained from operating
them had a profound impact on the design of later systems. A more detailed discussion of the technical and social background that drove the development of early mobile
communications systems can be found in [Lee (1995), Rappaport (2002)].

1.1.1 Problem Description
In the discussions in this text we have divided the signal processing operations in
the mobile communications system into two subsystems: the speech signal processing subsystem and the radio signal processing subsystem. The former incorporates
bandwidth-limiting, sampling, and encoding the speech waveform into as few bits as
possible while maintaining acceptable speech quality. The latter is concerned with protecting those bits, packaging them, and transmitting them through the network. In some
sense the distinction is artificial since the two subsystems interact and are typically
implemented on the same processors. On the other hand, they were typically developed by researchers with different technical backgrounds and in most cases are defined
by different standards or different parts of the same standard. Whenever we use the term
signal processing operations, this should be understood to mean both subsystems taken
together.

Fig. 1.1 Network architecture of circuit-switched mobile communications systems.

Figure 1.1 shows a generic network architecture that represents the signal processing
operations employed by the second generation digital circuit-switched mobile communications systems. In this architecture, once a call is established, the Mobile Station
(MS) transforms a short, typically 20 ms, segment of speech into an appropriate digital
format, and then transmits it to one or more Base Transceiver Stations (BTS). During the end-to-end transmission from the MS to the far-end device, which may be
another MS or a fixed telephone, the speech is represented in several digital formats
at different bit-rates, depending on the communications links over which the speech is
transported.
A set of BTSs is controlled by a Base Station Controller (BSC). The BSC sets up
and terminates calls to and from the BTSs and hands over ongoing calls among the
BTSs based on the quality of the wireless links between the MS and the BTSs or the
level of cell loading. The Mobile Switching Center (MSC) manages the operation of
the controllers and connects them with either the Public Switched Telephone Network
(PSTN) or other circuit-switched mobile communications networks. The 64 kbps Pulse
Coded Modulation (PCM) format is typically used from the BSC and upward, i.e., in
the direction of the MSC. The speech delivered to the MS undergoes the reverse signal
processing operations.
The link between the MS and the BTS is not the only wireless link in the end-to-end speech transmission paths. In addition, microwave links, consisting of one or more
T1 (1.544 Mbps) or E1 (2.048 Mbps) lines modulated onto high-frequency carriers,
are often used as backhaul between the base station and the switching center, or in
locations where fixed networks are not available or economical. Since the microwave
is operated as a high-powered, line-of-sight wireless link over a dedicated, low-cost
frequency spectrum, it does not suffer from many of the limitations inherent in mobile
communications. In this chapter, we confine our interest to the dynamic interactions of
the MS, BTS, BSC, and MSC, which must be carefully coordinated to maximize both
speech quality and network capacity. Similar approaches will be taken in the following
chapters but the network architectures or the node names will change as the mobile
communications systems evolve.

The term circuit-switched refers to the nature of communications links in which the
information, such as digitally formatted speech, is transmitted with negligible variation
in speed or delay, regardless of the link quality or network load. This definition does
not necessarily imply that the level of data loss or the bit-rate is uniformly maintained
over the end-to-end transmission paths, however. A circuit-switched network consists of
a series of such communications links, each of which transports the speech or data of
one or more users at a fixed bit-rate. The end-to-end paths meeting the required transport capabilities and channel conditions must be established before the transmission
begins.
The interface between two communications links where the bit-rate or the speech
format needs to be changed may require an additional processing delay but such a
delay is generally lower than that associated with packet-switched networks such as
the Internet or Ethernet, where the data packets can be transmitted without establishing
an end-to-end transmission path. Without an established path, data packets can be lost
or delivered in an order that is different from the order in which they were initially transmitted. The maximum allowed total delay, i.e., the mouth-to-ear delay, in commercial
voice telephony systems is required to be equal to or less than 280 ms for a satisfactory
call quality [ITU-T (2003)]. The wired telephone networks and contemporary circuit-switched mobile communications networks often complete the entire process in less
than 200 ms.
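As a rough illustration, such a delay budget can be checked by simply summing the per-stage delays. The following Python sketch is ours, not from the book, and its component names and values are illustrative assumptions rather than measurements.

    # Check a hypothetical mouth-to-ear delay budget against the 280 ms
    # limit for satisfactory call quality cited above.
    DELAY_BUDGET_MS = 280

    components_ms = {
        "speech encoding (one 20 ms frame plus lookahead)": 25,
        "radio transmission and interleaving": 40,
        "backhaul and core network transport": 60,
        "jitter buffering, decoding, and playout": 15,
    }

    total = sum(components_ms.values())
    print(f"total mouth-to-ear delay: {total} ms")
    print("within budget" if total <= DELAY_BUDGET_MS else "budget exceeded")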
In circuit-switched networks, the coded received speech first encounters error correction decoding, which is followed by error concealment when uncorrected errors corrupt
the speech. The decoded speech is then converted to an analog representation for play
out. Re-transmission of missing or corrupted frames, which would increase the delay
and its variability, is generally not used. In packet-switched networks, each network
node is allowed to retransmit lost data packets reported by a neighboring node, provided
that the total delay budget is met. As interim solutions that bridged the gap between these
two fundamentally different transmission techniques, hybrid approaches that combined
the benefits of circuit-switched wired networks and packet-switched wireless networks
were proposed and standardized [Ozturk et al. (2010)]. With these approaches, speech
handling in the wired portions of the network is identical to that in conventional circuit-switched networks, while re-transmission of lost speech data and scheduling of shared
channels are allowed in the wireless links between the MS and the BTS.
Figure 1.2 shows the signal processing operations employed when the speech is transmitted between two second generation digital circuit-switched networks, from GSM
to cdma2000. The digitized and compressed speech is wirelessly transmitted by the
MS and recovered from the Radio Frequency (RF) signal by the BTS. The compressed
speech is then reconstructed at the Transcoder and Rate Adaptation Unit (TRAU), which
can be located at either the BTS, BSC, or MSC. The farther the TRAU is separated from
the BTS, the farther the speech is transported at its lowest bit-rate. This saves the infrastructure cost since a 64 kbps channel can transport four speech channels encoded at
bit-rates lower than 16 kbps. Therefore it is advantageous to extend the distance between
the speech encoder and the speech decoder as far as possible, in some cases covering the
entire transmission path. Voice over IP (VoIP) is an example of such an extreme case.


Fig. 1.2 Speech and radio signal processing operations from GSM to cdma2000.

The wireless link between the MS and the BTS is unique in that the bit-rate of the
speech, which can change even during a call depending on the voice activity or the network control, is the lowest in the transmission path. Furthermore, because of the harsh
nature of the wireless channel and the limited signal processing and transmit power
of the MS, the speech is more likely to be damaged or lost in this short link than in
any other. The transmission cost is also the highest in this link, because of the large
investment for radio spectrum and network infrastructure.
The roaming capability, when extended globally, greatly increases the value of the
radio spectrum that is shared by many countries. As a result, each generation of mobile
communications systems has made more efficient use of the radio spectrum than its
predecessors while simultaneously improving the speech quality. The main objectives
of the signal processing operations in circuit-switched mobile communications networks
can be simply summarized as the maximization of the number of satisfied users through
efficient design of the network architectures and the procedures for all of the entities
between the MS and the MSC. Beyond this point the existing PSTN infrastructure allows
few opportunities for innovation.
Figure 1.3 illustrates the generic signal processing operations that occur between the
MS and the BTS that are applicable to most digital circuit-switched networks. To counter
the negative effects of the wireless channel including propagation loss and multipath
fading of the transmitted signal, the MS and the BTS continuously control the bit-rate
and transmit power, and report the channel status to the BSC so that the call can be
transferred to a neighboring BTS with better link quality when the current BTS cannot
support the necessary network services. The BTS, to which the call is handed over, may
belong to the same or a different network type. During the transfer process, some of the
speech signals en route to the destination BTS or MS can be lost, generating a small but
audible loss of quality.
A number of metrics and criteria have been established to measure how well these
performance objectives are met. This and the following chapters will show that these
objectives can be achieved using a variety of approaches. These range from efficient
speech compression algorithms that result in speech quality that is high enough for
commercial services at low bit-rates to wireless communications techniques that use
less bandwidth and/or less transmit power. In a restricted medium, such as the wireless
channel, higher signal quality and higher network capacity are conflicting objectives.
Thus, control mechanisms that trade one against the other play a key role in the overall
system operation. New techniques for speech compression or wireless transmission need

Fig. 1.3 Generic speech and radio signal processing operations in mobile communications systems.

to be incorporated carefully, however, to be compatible with the existing infrastructure.
Changes can be made to operational networks but for MSs, once manufactured and activated, it can be very difficult, if not impossible, to make substantial changes other than
software upgrades.

1.1.2 Performance Criteria
The speech and radio signal processing procedures used in mobile communications
systems are designed to meet well-established criteria for maintaining speech quality.
These fall into five types. The blocked call rate measures the capability of the network
to handle incoming service requests. A request for setting up a call might be rejected
because of insufficient radio resource or poor link quality. The blocked call rate does
not differentiate among the possible sources of call blocking. This measure can be
applied to a diversity of network types including fixed or mobile, analog or digital, and
circuit-switched or packet-switched.
A second quality criterion is the call drop rate, which evaluates the capability of
the network to maintain an established call. Conventional telephony systems such as
the PSTN maintain a negligible call drop rate, but mobile communications systems are
likely to exhibit rates as high as a few percent, regardless of the underlying radio access
technologies or speech compression algorithms.
A third group of factors that affect the speech quality includes those that measure the
reliability or link quality of the connection. These include the bit error rate, frame error
rate, or frame erasure rate, all of which measure the probability that encoded speech
frames are corrupted or lost in the channel during transmission. In the PSTN, the bit
error rate is typically as low as 10−6 whereas a 1–3% corruption rate for transmitted
speech frames is considered acceptable in mobile communications networks. When the
received speech frames contain bit errors, the error control coding may identify the
location of those errors and recover the speech. Error concealment methods, such as
those that replace corrupted frames by interpolating or extrapolating nearby correctly
decoded speech frames, can also be used to maintain an acceptable speech quality.
A fourth group of quality criteria relates to the operation of packet-switched networks,
especially those built with the Internet Protocol (IP). This group includes such measures
as the packet loss rate and jitter loss rate to evaluate the effects of different error types.
Finally, there are measures of network acceptability that quantify the end-to-end delay
of the voice services. These are often the most stringent to meet but they have a profound
influence on the overall design of the system.
These five groups of quality criteria are defined mainly to establish a set of minimum
requirements for toll quality or carrier grade services. In many cases these are objectively measurable but they cannot completely replace the important subjective criteria,
as measured by the Mean Opinion Score (MOS) derived from subjective evaluations
with human listeners. From the point of view of service providers, all of these criteria are used to maximize the number of simultaneous calls whose quality exceeds a set
of minimum requirements, rather than to maximize the quality of each call for a fixed
number of simultaneous calls.
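As an illustration of how the objective criteria above are computed in practice, the following Python sketch is ours, not from the book; the counter values are hypothetical.

    def blocked_call_rate(blocked: int, attempts: int) -> float:
        """Fraction of call setup requests rejected by the network."""
        return blocked / attempts

    def call_drop_rate(dropped: int, established: int) -> float:
        """Fraction of established calls that terminate abnormally."""
        return dropped / established

    def frame_erasure_rate(erased: int, transmitted: int) -> float:
        """Fraction of speech frames corrupted or lost during transmission."""
        return erased / transmitted

    # A 1-3% frame erasure rate is considered acceptable in mobile networks,
    # far above the ~1e-6 bit error rate typical of the PSTN.
    fer = frame_erasure_rate(erased=180, transmitted=12000)
    print(f"FER = {fer:.2%}, acceptable: {fer <= 0.03}")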

1.2 Analog Mobile Communications Systems
When it was developed in the 1970s and commercially launched in 1982, the Advanced
Mobile Phone System (AMPS) introduced many fundamental aspects of mobile communications systems, such as frequency re-use to increase network capacity and
the handover of ongoing calls between cells [Young (1979), MacDonald (1979)]. Some
aspects were necessary to cope with the regulatory limitations. One of these was the
restricted bandwidth that was allocated. The AMPS was initially assigned two 25 MHz
bands located above 800 MHz for the forward (BTS to MS), and reverse (MS to BTS)
channels. The AMPS was adopted by many countries and often operated in frequency
bands slightly different from the original ones.
For the Frequency Modulation (FM) techniques that were used in AMPS, radio
spectrum below 800 MHz would have been preferred but was not then available. The
800 MHz bands were from a part of the radio spectrum that had previously been occupied by television channels. This bandwidth had been freed after the channels were
relocated to cable. When the number of people using mobile communications continued
to increase, the need to accommodate additional customers, coupled with the difficulty
of obtaining additional spectrum, resulted in technical decisions made during the redesign of AMPS that influenced many key aspects of the next generation digital mobile
communications systems.
Before proceeding to more detailed descriptions of AMPS, it is important to distinguish between a band and a channel as these terms are used in this book. We follow
the definitions of [Razavi (2011)], in which a band refers to the entire radio spectrum
in which the MSs of a mobile communications system are allowed to communicate,
while a channel refers to the smaller bandwidth assigned to one or more MSs for services. These definitions match well with the spectrum allocation practices of both the


Table 1.1 Channel numbering system.

Channel number n    Reverse channel frequency (MHz)    Forward channel frequency (MHz)
1–799               825 + 0.03n                        870 + 0.03n
991–1023            825 + 0.03(n − 1023)               870 + 0.03(n − 1023)

Fig. 1.4 Spectrum allocation. (a) Reverse channels. (b) Forward channels.

first generation analog mobile communications systems and the second generation Time
Division Multiple Access (TDMA) systems. In these systems, signals from one or more
MSs are transmitted over a narrowly confined channel of 30–200 kHz. Each channel has
the center frequency of an RF carrier and an integer is typically assigned to label each
channel. Then, a set of contiguous channels constitutes a band. An MS in a mobile communications system requires at least one band for the reverse channels and another for
the forward channels, if it is operated in the Frequency Division Duplex (FDD) mode.

1.2.1 Network Architecture
With a 25 MHz band and a channel spacing of 30 kHz, AMPS provides 832 channels
that can be divided between one or more service providers in each area. A typical configuration might be that half of the total capacity, i.e., a combination of 395 channels
for voice service and 21 channels for call control, would be assigned to each service
provider in a market where two providers compete. Figure 1.4 shows the spectrum
allocation of AMPS in the US in the 1980s when two types of service providers, a nonwireline (A) operator and a wireline (B) operator, shared the band. Channels 313–333
and 334–354 are the control channels assigned for each operator. The channel numbers
and carrier frequencies of AMPS are related as shown in Table 1.1.
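The mapping of Table 1.1 is simple enough to express in a few lines of code. The following Python sketch is ours, not from the book.

    def amps_channel_frequencies(n: int) -> tuple[float, float]:
        """Return the (reverse, forward) carrier frequencies in MHz for
        AMPS channel number n, following Table 1.1."""
        if 1 <= n <= 799:
            offset = 0.03 * n
        elif 991 <= n <= 1023:
            offset = 0.03 * (n - 1023)
        else:
            raise ValueError(f"invalid AMPS channel number: {n}")
        return 825.0 + offset, 870.0 + offset

    # Control channel 313, the first control channel of the A operator,
    # maps to 834.39 MHz (reverse) and 879.39 MHz (forward).
    print(amps_channel_frequencies(313))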
The earlier architecture of a generic circuit-switched mobile communications network, shown in Fig. 1.1, is directly applicable to AMPS whose network architecture
is shown in Fig. 1.5; only the terminology is different. In AMPS, the Base Station
(BS) performs similar network operations as the BTS, and the Mobile Telephone
Switching Office (MTSO) performs management tasks similar to those of the BSC
and the MSC. It controls the call processing and manages the cellular operation.


Table 1.2 Control partitioning of AMPS.

MS: setup channel selection; channel tuning; message reception and transmission; failure sequence timing; tone switch-hook supervision; pre-origination dialing.
BS: radio control; location data collection; component calibration; MS control; message relaying and reformatting; switch-hook and fade supervision.
MTSO: standard local switching; radio channel management; remote unit maintenance; BS and MS control; message administration; MS location tracking; handover synchronization.

Fig. 1.5 Network architecture of AMPS.

The MTSO can be interconnected to a Local Exchange Carrier (LEC) End Office (EO)
with a Type 1 interconnection link. Table 1.2 outlines the technical responsibilities of
the MS, BS, and MTSO partitioned among these three entities [Fluhr and Porter
(1979)]. Although AMPS is often classified as an analog mobile communications system, many signal processing and control channel operations are represented in digital
formats.
Figure 1.5 also shows the interface types used between the network entities of AMPS.
We focus on the link between the MS and the BS, and the link between the BS and
the MTSO. The first link is of crucial importance since no further signal processing is
performed after the analog speech waveform from the MS is digitized and encoded into
a 64 kbps PCM format at the BS. This basic format for speech is maintained throughout
the transmission paths until the signal reaches either another BS or a fixed telephone.


Conversion between the two PCM formats, μ-law and A-law, may occur at intermediate locations, but this has little impact on the speech quality or the total delay since the two formats are similarly defined and the conversion requires a negligible amount of computation. A T1 carrier, microwave link, or Type 1 interconnection link carries large numbers of 64 kbps PCM channels. The second link, between the BS and the MTSO, is also important since the MTSO is responsible for controlling speech quality and network capacity by indirectly controlling the MS through the BS. The measures available to the MTSO include handover to another BS or to another channel in the same cell, and power control.

1.2.2 Speech and Radio Signal Processing Operations
Figure 1.6 shows the speech and radio signal processing operations in AMPS for the transmit and receive sides [Arredondo et al. (1979)]. In the first step, the sound pressure level of the speech is converted to voltage variations by the microphone, and band-pass filtering then limits the bandwidth of the signal to 300–3000 Hz. Because the waveform will be frequency modulated, the signal amplitude is also limited to control the amount of energy that would leak into adjacent channels. This is done by companding, i.e., nonlinearly compressing the amplitude at the transmitter and expanding it at the receiver. AMPS uses a 2:1 compander, through which a 2 dB change in the input voltage level is compressed to a 1 dB change. The compander also has the effect of improving the subjective speech quality in poor channel conditions. Figure 1.7 illustrates the companding and modulation procedures used.
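To make the 2:1 characteristic concrete, the following Python sketch approximates the compander with an instantaneous square-root law on normalized samples, so that a 2 dB change at the input becomes a 1 dB change at the output. The real AMPS compander is a syllabic (envelope-driven) device, so this is only an illustration of the level mapping.

```python
import numpy as np

def compress(x):
    """2:1 compression: halves the signal level in dB (|x| <= 1 assumed)."""
    return np.sign(x) * np.sqrt(np.abs(x))

def expand(y):
    """1:2 expansion at the receiver; exact inverse of compress()."""
    return np.sign(y) * y**2

x = 0.1                                  # -20 dB relative to full scale
y = compress(x)                          # ~0.316, i.e., -10 dB
print(20 * np.log10(y))                  # -10.0: a 2 dB input step -> 1 dB
print(20 * np.log10(expand(y)))          # -20.0: expansion restores the level
```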
The energy of the speech signal, after filtering and compression, is concentrated in the low frequency bands, and the Signal-to-Noise Ratio (SNR) in the high frequency bands is reduced; it is further degraded in the FM and carrier modulation process. Figures 1.7(a) and 1.7(b) show the input-output characteristics of the compander and the deviation limiter, respectively. With a channel width of 30 kHz, the frequency deviation of the speech signal is confined to approximately 24 kHz, around a center frequency f_c,

Fig. 1.6 Speech and radio signal processing operations of AMPS. (a) Transmitter side. (b) Receiver side.


Fig. 1.7 (a) Compander input-output characteristics. (b) Frequency deviation limiting. (c) ±8 kHz binary FSK.

Fig. 1.8 Frequency response. (a) Pre-emphasis. (b) De-emphasis.

to reduce the interference to and from the adjacent channels. Pre-emphasis boosts the high-frequency components of the speech signal at the transmitter, while de-emphasis compensates for this at the receiver. Figures 1.8(a) and 1.8(b) show the frequency responses of the pre-emphasis and de-emphasis filters, respectively, where the angular frequency ω = 2πf. These analog signal processing operations are common to both the forward and reverse channels of AMPS.
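The pre-emphasis/de-emphasis pair can be illustrated with a first-order digital approximation, as in the sketch below; the coefficient and the discrete-time form are assumptions chosen for illustration, not the analog responses of Fig. 1.8.

```python
import numpy as np
from scipy.signal import lfilter

beta = 0.95   # assumed coefficient; not taken from the AMPS specification

def pre_emphasis(x):
    # y[n] = x[n] - beta*x[n-1]: boosts high-frequency components
    return lfilter([1.0, -beta], [1.0], x)

def de_emphasis(y):
    # exact inverse filter: y[n] = x[n] + beta*y[n-1]
    return lfilter([1.0], [1.0, -beta], y)

x = np.random.randn(1000)
assert np.allclose(de_emphasis(pre_emphasis(x)), x)   # the pair is transparent
```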
There are no measures to prevent unauthorized eavesdropping on ongoing calls in AMPS, since providing reliable security with analog signal processing is very difficult. Anyone with the intention and capability of scanning channels can listen to or record the conversations. This fundamental limitation of AMPS was overcome in the next generation of digital systems, in which ciphering of digitally compressed speech became a basic feature.


It is not easy to track a speaker when speech signals to and from multiple MSs are transmitted over the same channel. Therefore it is important to allocate the channels carefully over the cells to ensure that such collisions do not happen. For frequency-modulated speech processed as in Fig. 1.6 to be of acceptable quality, a Signal-to-Interference Ratio (SIR) of at least 18 dB is required over 90% of the network coverage. The 7-cell (K = 7) frequency re-use pattern shown in Fig. 1.9 has been found to be the smallest re-use factor that meets this requirement while preserving channel efficiency with 120-degree directional antennas. Two MSs that use the same channel should be sufficiently separated to avoid mutual interference. With 7-cell re-use, two layers of cells provide enough propagation loss to insulate two cells that share the same set of channels. In Fig. 1.9, the cells marked with the same character can share the same set of channels. Note that in practice the cells are rarely regularly hexagonal in shape and may differ widely in size. The channels are assigned to each cell based on the amount of expected voice traffic.
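The 18 dB figure can be checked with the usual back-of-the-envelope co-channel interference model. The sketch below assumes the standard hexagonal-geometry relation D/R = √(3K), a path-loss exponent of 4, and six equidistant first-tier interferers; these assumptions are textbook simplifications, not taken from the text above.

```python
import math

def cochannel_sir_db(K, gamma=4.0, interferers=6):
    """First-tier co-channel SIR estimate for a K-cell re-use pattern.

    Uses SIR ~ (D/R)^gamma / N_I with D/R = sqrt(3K).
    """
    d_over_r = math.sqrt(3 * K)
    return 10 * math.log10(d_over_r ** gamma / interferers)

for K in (3, 4, 7, 12):
    print(K, round(cochannel_sir_db(K), 1))
# K = 7 yields about 18.7 dB, consistent with the 18 dB requirement,
# while K = 3 and K = 4 fall short of it.
```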
Referring to Fig. 1.4, if there are 395 voice channels and 21 control channels assigned to each service provider, then with the 7-cell re-use pattern approximately 56 voice channels can be assigned to each cell. One-third of the channels assigned to each cell, i.e., 17 or 18, can be allocated to each sector of a three-sector antenna. Note that using directional antennas reduces the interference but cannot increase the number of channels. The 42 control channels, channels 313–354, are located in the middle of the 25 MHz band to facilitate the operation of a channel-scanning frequency synthesizer, especially in MSs whose tuning capability is limited. Unlike the voice channels, the control channels apply a form of digital modulation, binary Frequency Shift Keying (FSK), to modulate Manchester-coded data. Figures 1.10(a) and 1.10(b) show the formats of the forward and reverse control channels, which transmit control data at 10 kbps.

Fig. 1.9 7-cell re-use pattern.


Fig. 1.10 Control channels. (a) Forward control channel. (b) Reverse control channel. (c) Voice control channel.

Fig. 1.11 Control signal processing procedures.

The binary control data is first converted to a Non-Return-to-Zero (NRZ) format, and further encoded into the Manchester (bi-phase) code, as shown in Figures 1.11 and 1.12. The benefit of Manchester coding is that it concentrates the signal energy within a 10 kHz band, enabling easy detection of the signal at the receiver. The Manchester-coded data is integrated and low-pass filtered. Finally, each symbol is represented with one of two possible frequency deviations and phase modulated with an RF carrier, as shown in Fig. 1.7(c).
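A minimal sketch of the Manchester (bi-phase) encoding step is shown below; the polarity convention (which half-symbol comes first) is an assumption, as the text does not specify it.

```python
def manchester_encode(bits):
    """Manchester (bi-phase) encoding of binary data.

    Each bit becomes a pair of half-symbols with a mid-bit transition;
    the convention 1 -> (+1, -1), 0 -> (-1, +1) is assumed here.
    """
    out = []
    for b in bits:
        out.extend((+1, -1) if b else (-1, +1))
    return out

print(manchester_encode([1, 0, 1, 1]))
# [1, -1, -1, 1, 1, -1, 1, -1]
```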
Figure 1.10 shows that the formats of the forward and reverse control channels are not identical. Each control channel is separated into the A and B messages so that MSs with even phone numbers read and write the A messages, and MSs with odd numbers use the B messages. On the forward control channels, key information to be used by the MS is periodically broadcast by the BS. This includes the System Identification Number (SIN) of the network and the power level for initial transmission. A burst-idle bit, marked in Fig. 1.10 with an ↑, is inserted after every ten message symbols, after a Bit Sync, and after a Word Sync, to turn the receiver off during an idle period.
In the forward control channels, ten messages follow the Bit Sync and Word Sync, with the A and B messages alternating. After (40,28) Bose–Chaudhuri–Hocquenghem (BCH) coding is applied, each message is repeated five times. Thus, in Fig. 1.10(a), A1 = A2 = · · · = A5, and B1 = B2 = · · · = B5. The receiver applies majority logic to recover the correct message. With a minimum distance of 5, the BCH code used in the forward control channel can correct one bit error or detect up to two bit errors in each message.
The 10-bit Bit Sync, 1010101010, and the 11-bit Word Sync, 11100010010, are unique


Fig. 1.12 Waveforms of control signal processing. (a) Binary data. (b) NRZ-coded. (c) 10 kHz clock. (d) Manchester-coded. (e) Integrator output.

bit patterns used by the receiver to facilitate detecting the message boundaries. The messages are not allowed to have these bit patterns. With a clock frequency of 10 kHz, the duration of a message and four burst-idle bits is 4.4 ms. Therefore, although the bit-rate at the phase (binary FSK) modulator is 10 kbps, with two MSs sharing a forward control channel, the actual bit-rate for an MS is (28 × 10)/(10 + 11 + 40 × 5 × 2 + 42) = 0.6 kbps.
The use of the forward control channels is managed by the network in a centralized fashion, but the use of the reverse control channels is left to the discretion of the MS. To avoid collisions of messages sent from multiple MSs, Carrier Sense Multiple Access (CSMA) is employed. An MS tries to detect the presence of transmissions by other MSs before attempting to transmit by checking the burst-idle bits. If one or more carriers are sensed, the MS waits until the end of any ongoing transmissions and then initiates its own transmission. Each message, consisting of one to five 36-bit words, is (48,36) BCH-coded, and repeated five times. This BCH code can correct up to one bit error. Bit Sync is made up of 30 bits of alternating ones and zeros, i.e., 1010...1010, and Word Sync is 11100010010. The Digital Color Code, marked in Fig. 1.10(b) with ∗, can be one of four 7-bit sequences that identify the target BS. Following a development similar to that for the forward control channel, and assuming N ≤ 5 distinct messages, the actual bit-rate of the reverse control channel is (36 × N × 10)/(48 + 48 × 5 × N) = 7.5N/(1 + 5N) kbps, which corresponds to 1.25 kbps for N = 1 and 1.44 kbps for N = 5, for an MS.
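The effective bit-rates just derived can be reproduced directly from the frame structures; the following Python sketch simply restates the overhead accounting from the text.

```python
def forward_control_rate_kbps():
    # 10-bit Bit Sync + 11-bit Word Sync + five repeats of the (40,28)
    # BCH-coded A and B message streams + 42 burst-idle bits deliver one
    # 28-bit message to each of the two MSs sharing the channel.
    return 28 * 10 / (10 + 11 + 40 * 5 * 2 + 42)

def reverse_control_rate_kbps(n):
    # 30-bit Bit Sync + 11-bit Word Sync + 7-bit Digital Color Code = 48
    # overhead bits, then n distinct (48,36) BCH words, each repeated
    # five times.
    return 36 * n * 10 / (48 + 48 * 5 * n)

print(round(forward_control_rate_kbps(), 2))    # 0.6
print(round(reverse_control_rate_kbps(1), 2))   # 1.25
print(round(reverse_control_rate_kbps(5), 2))   # 1.44
```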
The 21 pre-defined control channels may not be sufficient to manage the operation of AMPS when the network is overloaded, but defining additional control channels from the spectrum would reduce the capacity gained from frequency re-use. Since voice activity is typically absent for more than half of a call's duration, the gaps in the signals on the voice channels can also be used to transmit control information.


Table 1.3 Voice control channel parameters.

Parameter   Forward control channel   Reverse control channel
L1          100                       101
L2          40                        48
K           11                        5

Figure 1.10(c) shows the format of a voice control channel, whose key parameters are outlined in Table 1.3. Naturally, the bit-rate of a voice control channel is lower than that of the dedicated control channels. From the channel parameters, the actual bit-rate for an MS is (28 × 10)/(100 + 11 + 40 + (48 + 40) × 10) = 0.27 kbps in the forward voice control channel. In the reverse voice control channel, the actual bit-rate is (36 × 10)/(101 + 11 + 48 + (48 + 48) × 4) = 0.66 kbps. When there is an urgent need for control signaling, the voice can be interrupted for a period short enough not to be perceived, and the control information can be transmitted at 10 kbps during this interval. Majority logic is used, as in the dedicated control channels, to assist the reception of messages: the messages are repeated 11 times in the forward voice control channels and five times in the reverse voice control channels.
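Majority-logic recovery over repeated copies can be sketched as a bitwise vote, as below; the message framing and any BCH decoding that precedes the vote are omitted.

```python
def majority_decode(copies):
    """Bitwise majority vote over repeated copies of a message.

    copies: equal-length bit lists (five copies on the dedicated control
    channels, eleven on the forward voice control channel).
    """
    n = len(copies)
    return [1 if sum(c[i] for c in copies) > n // 2 else 0
            for i in range(len(copies[0]))]

received = [[1, 0, 1, 1]] * 4 + [[0, 0, 1, 1]]   # one corrupted copy
print(majority_decode(received))                  # [1, 0, 1, 1]
```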
The main use of the voice channels for control purposes is signaling the handover messages, which are usually transmitted at low SNR by MSs located at the cell edges. This strategy of temporarily transmitting control information instead of voice, called blank and burst, is also employed in many other digital mobile communications systems. Table 1.4 shows the control information exchanged between the MS and the BS. In the call setup using the dedicated control channels, more bits are exchanged than during the call using the voice control channels. Note that the 64-bit dialed digits do not have tight delay requirements as they are typically input manually. Note also that the transmit power is directly controlled only in the forward channels; mutual, and faster, control of the power would reduce the interference and increase network capacity.

1.2.3 Cellular Operation
In AMPS, the speech quality of each MS needs to be constantly monitored during conversations so that an acceptable quality, the toll quality, is maintained. The forward and reverse control channels, because of their limited capacity and the delay required to obtain an access opportunity, are not appropriate for this type of persistent measurement and immediate signaling. With the voice control channel it is possible to monitor the speech quality continuously, but the amount of information required to report such measurements is still excessive for its negligible bit-rates. As a compromise between the two conflicting requirements, a type of analog signaling that spectrally overlaps with the voice signal can be used, which does not noticeably interrupt the conversation.
To indicate that a channel of a BS is currently alive, a Supervisory Audio Tone (SAT), a single-frequency signal at either 5970, 6000, or 6030 Hz, located just above the speech band, is continuously transmitted with the speech signal. Only one of the SATs is


Table 1.4 Control information on dedicated control and voice control channels.

Channel                 Control information             Bits
Forward control         MS page                         24, 34
                        Channel designation             11
                        MS transmit power               2
                        Overhead (local parameters)     22–30
                        System control                  4
Reverse control         Identification                  56, 66
                        Dialed digits                   64
                        System control                  4
Forward voice control   Orders                          5
                        Channel designation             11
                        MS transmit power               2
                        System control                  4
Reverse voice control   Order confirmation              5
                        Dialed digits                   64
                        System control                  4

Fig. 1.13 Spatial allocation of SATs.

used by each BS. If the SAT is not detected for more than a pre-defined time, the call is disconnected by either the MS or the BS. When the MS does not detect the SAT from the serving BS but returns an SAT at another frequency, which may happen, for example, when the SAT from another BS using the same channel is stronger, the call is also disconnected. Therefore, like the voice channels, the SAT frequencies have to be carefully allocated over the cells. Figure 1.13 shows the allocation of SATs over neighboring cells,


Table 1.5 Supervision decisions for SAT and ST.

                   ST on        ST off
SAT received       MS on-hook   MS off-hook
SAT not received   MS in fade or transmitter turned off

Table 1.6 System parameters of analog mobile communications systems.

                                        NTT       TACS      NMT
Region                                  Japan     UK        Scandinavia
Reverse channel frequency (MHz)         870–885   917–950   463–467.5
Forward channel frequency (MHz)         925–940   872–905   453–457.5
Reverse/forward channel spacing (MHz)   55        45        10
Channel bandwidth (kHz)                 25/12.5   25        25
Number of channels                      600       1320      180
Modulation (voice)                      FM        FM        FM
Modulation (control)                    FSK       FSK       FSK

in which D_11 should be sufficiently larger than D_12 for the SAT to be re-used. Since the SAT is located above the spectrum of the speech, it can be combined with the speech after the band-pass filtering at the transmitter and removed from the speech by band-pass filtering at the receiver, as shown in Fig. 1.6.
In addition to detecting the presence of the SAT, its signal amplitude is used to monitor the health of a channel. If the power level of the SAT measured by the BS falls below a threshold, the MTSO first signals the MS to increase its transmit power, which can take one of nine pre-defined levels. If this is not effective, or not possible because the MS is already transmitting at its maximum power, the MTSO asks the neighboring BSs to measure the signal strength of the MS. If stronger measurements are reported, the MTSO initiates the handover to a new BS with a stronger signal. A 10 kHz Signaling Tone (ST) is sometimes transmitted with the speech signal for control purposes. Table 1.5 outlines the decisions to be made by the BS based on the combinations of the SAT and the ST.
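The supervision sequence described above can be summarized as a decision sketch. All names, units, and thresholds in the following Python fragment are illustrative; the actual MTSO logic is considerably more involved.

```python
def supervise(sat_dbm, threshold_dbm, power_level, neighbor_dbm):
    """Sketch of the MTSO reaction to a weak SAT; all values illustrative.

    power_level: 0..8 stands for the nine pre-defined MS transmit levels.
    neighbor_dbm: {BS name: measured MS signal strength} from neighbors.
    """
    if sat_dbm >= threshold_dbm:
        return "keep current channel"
    if power_level < 8:
        return "order MS to raise transmit power"
    # MS already at maximum power: look for a stronger neighboring BS
    best_bs, best = max(neighbor_dbm.items(), key=lambda kv: kv[1])
    if best > sat_dbm:
        return "handover to " + best_bs
    return "keep current channel (no better candidate)"

print(supervise(-115, -105, 8, {"BS2": -100, "BS3": -110}))
# handover to BS2
```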
Until now we have summarized the signal processing and quality control procedures of AMPS, a first generation analog mobile communications system. Table 1.6 outlines the key parameters of some mobile communications systems that were contemporary with AMPS, among which the NTT system was the first to be commercially deployed. These systems use similar analog signal processing operations. Many essential features that later digital mobile systems inherited were introduced in AMPS, including handover, power control, and in-band signaling. Although AMPS became a dominant mobile communications system in the first generation, technical opportunities for improving speech quality or network capacity can easily be identified. For example, when there is no voice activity, the transmitter could be turned off completely to reduce power consumption and the interference to other cells. Because of the fundamental limitations of analog signal processing, several of the basic procedures shown in Fig. 1.6 have to


be maintained, regardless of the cell loading level or link quality. This separation of
the speech signal processing from the radio signal processing gradually disappeared in
the next generation of digital mobile communications systems. Their interaction was
exploited for higher speech quality and network capacity.

1.3 References
Arredondo, G. A., Feggeler, J. C., and Smith, J. I. 1979. Advanced Mobile Phone Service: Voice and Data Transmission. The Bell System Technical Journal, 58(1).
Fluhr, Z. C., and Porter, P. T. 1979. Advanced Mobile Phone Service: Control Architecture. The Bell System Technical Journal, 58(1).
ITU-T. 2003. G.114 International Telephone Connections and Circuits – General Recommendations on the Transmission Quality for an Entire International Telephone Connection; One-Way Transmission Time. May.
Lee, W. C. Y. 1995. Mobile Cellular Telecommunications: Analog and Digital Systems. 2nd edn. McGraw-Hill Professional.
MacDonald, V. H. 1979. Advanced Mobile Phone Service: The Cellular Concept. The Bell System Technical Journal, 58(1).
Ozturk, O., Kapoor, R., Chande, V., Hou, J., and Mohanty, B. 2010. Circuit-Switched Voice Services over HSPA. IEEE Vehicular Technology Conference, May.
Rappaport, T. S. 2002. Wireless Communications: Principles and Practice. 2nd edn. Prentice Hall.
Razavi, B. 2011. RF Microelectronics. 2nd edn. Prentice Hall.
Young, W. R. 1979. Advanced Mobile Phone Service: Introduction, Background, and Objectives. The Bell System Technical Journal, 58(1).

2 Signal Processing in TDMA Systems

We begin this chapter with a short discussion of the digital compression of speech signals in mobile communications systems. In the fixed networks used by analog mobile communications systems, the speech is already represented in simple, but inefficient, digital formats such as 64 kbps PCM, but analog waveforms are still employed to represent the speech over the wireless channel itself. We outline the basic idea behind Linear Predictive Coding (LPC), which increases the compression efficiency to the levels required for the limited frequency spectrum of mobile communications by focusing on the types of input signals generated by the human vocal tract. Then we introduce approaches used to improve the analog mobile systems when their network capacity was challenged by a rapidly increasing number of users. After analyzing analog and digital extensions of AMPS, we discuss the speech and radio signal processing operations used by a new Time Division Multiple Access (TDMA) mobile communications system, the Global System for Mobile Communications (GSM).

2.1 Speech Signal Processing
In AMPS, once an FM channel has been assigned, it is monopolized by a single MS for the duration of the call. To transport the speech signals of multiple MSs on a single channel, the signal of each MS has to be digitally processed and kept separable from those of the other MSs, so that the BS can recover them if no error occurs during transmission. In AMPS, the data transmission rate of the control channel is only 10 kbps. However, advancements in RF and Very Large Scale Integration (VLSI) technologies since AMPS was developed now enable more spectrally efficient digital modulation and robust error correction coding schemes that can boost the data transmission rate by a factor of four to five.

When the speech signal is represented using a conventional digital format such as 64 kbps PCM, the bit-rate is too large to be transmitted over a 30 kHz channel even with digital modulation and coding techniques. Furthermore, the bit-rate increases further when the overhead to protect the digitally processed speech bit-stream and to maintain the connection of each MS with the BS is added. To enable transmission over a 30 kHz channel, it is necessary to compress the speech so that its bit-rate is reduced significantly, to around 10 kbps. This must be done in such a way that the quality is not noticeably compromised by the compression.


Classical waveform-based coding techniques [Jayant and Noll (1984)], such as PCM or Adaptive Differential Pulse Coded Modulation (ADPCM), attempt to model the waveform shape of the speech signal as faithfully as possible. These techniques have low computational complexity, but they cannot reach the required low bit-rates without significantly sacrificing the accuracy of the representation. As an alternative, model-based coding techniques, which simulate the speech generation process of the human vocal tract, can be used to compress and reconstruct speech signals that sound similar to the original but require lower bit-rates. Model-based speech compression techniques target perceptual similarity; they do not attempt to match the original waveform shape.
The frequency at which the raw speech is sampled prior to compression also has a profound effect on the achievable quality of the reconstructed speech. Figure 2.1 shows a series of speech waveforms sampled at 8000 Hz (upper) and their associated spectrograms (lower). A spectrogram is a visual representation of the instantaneous frequency spectrum as the speech signal evolves with time. The waveforms, which consist of speech signals in several languages, speech with music or noise, music, and the sound of a bugle, are low-pass filtered before and after the sampling to contain only those frequency components between 100 and 3500 Hz. Inspection of the spectrograms reveals periodic structures in the frequency domain that may be exploited to design efficient speech compression algorithms. When the speech is represented at 8000 samples/s, as in the PSTN, it is referred to as Narrowband (NB) speech, to distinguish it from representations that use higher sampling rates. As the spectrum of the human voice often extends to higher frequencies, representing speech at sampling rates higher than 8000 Hz can improve the quality and the resilience against acoustic noise, at the cost of additional bit-rate and complexity.
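Spectrograms such as those in Fig. 2.1 can be computed with standard tools. The following sketch uses scipy.signal.spectrogram on a synthetic tone in place of recorded speech; the window length and overlap are arbitrary choices.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000                                    # narrowband sampling rate
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 200 * t)              # synthetic stand-in for speech

f, times, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=192)
print(f[np.argmax(Sxx[:, 0])])               # frequency bin nearest 200 Hz
```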

Fig. 2.1 Spectrogram of narrowband speech waveforms.


Fig. 2.2 Speech production system (courtesy of Alan Hoofring and the National Cancer Institute). (a) Larynx and nearby structures. (b) Larynx (top view).

In model-based coding, the physical, biological human speech production anatomy is simulated to represent the speech at minimal bit-rates. Figure 2.2 depicts the anatomy of the speech production system, in which the force driving production is provided by a flow of air from the glottis. The vocal tract is the acoustical path from the larynx to the mouth, the nasal cavity, or both. The flow of air is modulated by the time-varying vocal tract shape. Detailed analyses of the interactions between the structures involved in speech production can be found in [Quatieri (2001), Rabiner and Schafer (2010)].
Figure 2.3(a) shows a 3.072 second speech waveform containing the phrase Add the sum to the product of these three. The waveform consists of 24576 samples digitized at 8000 samples per second. The proportion of silence is comparable to that of actual speech, which suggests another opportunity to reduce the bit-rate, e.g., by using lower bit-rates or transmitting no signals at all in such periods. The waveform shown in Fig. 2.3(b) corresponds to the beginning of the speech segment, called its onset, which consists of the first 1400 samples of the segment. The first 500 samples contain no voice activity but the following 800 samples show a weak noisy signal. Figure 2.3(c) expands the portion of the waveform represented by samples 4000–4300.
This part of the speech waveform includes some periodic components, which are
overlapped by non-periodic components. The periodic patterns originate from voiced
sound where the vocal tract is excited by quasi-periodic pulses created by adjusting the
tension in the vocal cords. The non-periodic waveforms are typically generated from
background noise or unvoiced sound, created when the vocal tract is excited by turbulence which has a wide, flat spectrum. The vocal tract excitations corresponding to these
two types of speech waveforms can be modelled reasonably well by a periodic impulse
train and white noise, respectively.


Fig. 2.3 (a) Speech waveforms. (b) Waveform for silence and unvoiced speech. (c) Waveform for voiced speech.

Fig. 2.4 Frequency domain representation of voiced speech waveform.

To picture the periodic structure of a voiced speech waveform in the frequency domain, samples 4000–4300 were transformed using a Discrete Fourier Transform (DFT), whose magnitude is shown in Fig. 2.4. The spectrum of this portion of the signal contains peaks that are equispaced along the frequency axis, reflecting the periodic structure of the underlying waveform. The duration of each cycle of voiced speech is called the pitch period length, τ, and the fundamental frequency, or pitch, is the reciprocal of the pitch period length, F0 = 1/τ. F0 is a measure of how high or low the voice sounds. From Fig. 2.4, the fundamental frequency F0 can be estimated to be 230 Hz. It should be noted that some of the spectral peaks have larger amplitudes than others. The local maxima of the envelope that connects the peaks are called formants.


They reflect the resonant frequencies of the vocal tract and are essential components for the intelligibility of speech. For this waveform the formants are located at F1 = 870 Hz and F2 = 2440 Hz, where the peaks occur. These estimated parameters and acoustic characteristics of speech can be exploited to reduce the bit-rate needed to represent the signal. For generic audio signals, however, approaches that make specific assumptions about the origin of the input signal yield limited quality when other types of source signals are presented.
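As a small illustration of pitch estimation, the following sketch recovers F0 of a synthetic 230 Hz tone from the peak of its short-time autocorrelation; real codecs refine such estimates with interpolation and voicing decisions, and the 80–400 Hz search range is an assumption.

```python
import numpy as np

def estimate_pitch(frame, fs=8000, fmin=80, fmax=400):
    """Estimate F0 from the strongest autocorrelation peak in [fmin, fmax]."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 8000
t = np.arange(300) / fs                  # a 300-sample frame, as in Fig. 2.3(c)
voiced = np.sin(2 * np.pi * 230 * t)     # synthetic 230 Hz "voiced" signal
print(round(estimate_pitch(voiced)))     # ~229, near the 230 Hz of Fig. 2.4
```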

2.1.1 Linear Predictive Coding
Linear Predictive Coding (LPC) exploits the structure of the speech production system and the spectro-temporal characteristics of the speech waveform to a greater extent than the waveform-based coding techniques. By analyzing the acoustic nature of a short speech segment, during which the statistical characteristics of the sound are assumed to remain stationary, LPC computes another segment of the same length in the time domain that can be represented at a lower bit-rate but sounds similar to the original. The bit-rate reduction is achieved by modeling the speech segment as the output of a linear time-varying digital filter whose coefficients and input signals are determined by the segment. The filter coefficients, and the gain and type of the input signal, constitute the bit-stream generated for each speech frame.
Figure 2.5∗ shows the modeling of the human vocal tract as a series of N concatenated tubes. In Fig. 2.5(a), the glottis is assumed to be located to the left of tube 1, and the lips are connected to tube N. A set of wave equations relating the air pressure and the volume velocity is derived at the boundaries of the tubes, following the development of [Quatieri (2001), Rabiner and Schafer (2010)]. In tube k, the volume velocity at location x and time t, denoted as u_k(x, t), is defined as the rate at which the air particles flow perpendicularly through an area A_k; within tube k, A_k is constant. u_k(x, t) is constructed from the forward- and backward-traveling wave components. p_k(x, t) represents the incremental pressure with respect to atmospheric pressure in tube k, and c represents the speed of
Fig. 2.5 (a) Concatenated tube model of vocal tract. (b) Forward- and backward-traveling sound waves.
∗ Rabiner, Lawrence R.; Schafer, Ronald W., Digital Processing of Speech Signals, 1st ed., © 1979. Reprinted by permission of Pearson Education, Inc., New York, New York.


sound. It is assumed that the energy is lost only at the end of the tubes, the lips, when
the sound waves propagate into free space.
The volume velocity and pressure for tube k are given by

  u_k(x, t) = u_k^+(t − x/c) − u_k^−(t + x/c),                      (2.1)

  p_k(x, t) = (ρc/A_k) [u_k^+(t − x/c) + u_k^−(t + x/c)],           (2.2)

where 0 ≤ x ≤ l_k, and l_k is the length of tube k. ρ is the density of the air particles, which is assumed to be constant in the atmosphere. u_k^+(t) and u_k^−(t) represent the velocities of the forward- and backward-traveling waves, respectively. To solve the wave equations, the boundary conditions at the edges of the tubes can be exploited. For the volume velocity and the incremental pressure to be continuous in both time and space between tubes k and k + 1, the following conditions must be met,

  u_k(l_k, t) = u_{k+1}(0, t),                                      (2.3)

  p_k(l_k, t) = p_{k+1}(0, t),                                      (2.4)

for k = 1, 2, . . . , N − 1.
Let τ_k = l_k/c be the time for the sound wave to propagate through tube k. Then, after some manipulation of the equations and the boundary conditions, the following relationships can be obtained for the volume velocity,

  u_{k+1}^+(t) = [2A_{k+1}/(A_{k+1} + A_k)] u_k^+(t − τ_k) + [(A_{k+1} − A_k)/(A_{k+1} + A_k)] u_{k+1}^−(t),   (2.5)

  u_k^−(t + τ_k) = −[(A_{k+1} − A_k)/(A_{k+1} + A_k)] u_k^+(t − τ_k) + [2A_k/(A_{k+1} + A_k)] u_{k+1}^−(t),    (2.6)

which implies that in each tube, a part of the traveling wave in each direction propagates to the next tube while a part is reflected back into the current tube. Likewise, a part of the reflected wave continues propagating to the previous tube while another part is re-reflected. The reflection coefficient at the kth junction can be derived as

  r_k = (A_{k+1} − A_k)/(A_{k+1} + A_k),                            (2.7)

which is the fraction of the wave at the junction between tubes k and k + 1 that propagates backward into tube k + 1.
In other words, r_k is the amount of u_{k+1}^−(t) reflected at the junction. Since A_k > 0, −1 ≤ r_k ≤ 1. The magnitude of the reflection coefficient is identical between the two tubes, regardless of the direction in which the wave approaches the boundary. The signal flow graph in Fig. 2.6∗ can model the wave propagation inside the vocal tract. The forward- and backward-traveling sound waves are abstracted with the signals, the gains on the branches, and the delay elements.
From the structure of the signal flow graph, it is seen that the length of the impulse response is infinite but, in practice, the waves decay rapidly as the impact of the reflection coefficients accumulates exponentially. The interconnection of the identical


Fig. 2.6 Signal flow graph for lossless tube model of vocal tract.

Fig. 2.7 Modeling boundary conditions. (a) Glottis. (b) Lips.

modular structures with different delays and reflection coefficients can be used to represent the wave propagation in tubes 2 through N − 1. However, different structures may be required in tubes 1 and N, to represent the unique roles of the glottis and the lips in the generation of speech. To derive the modular structures of the first and last tubes, we assume that no part of the backward-traveling wave in tube 1 proceeds beyond the glottis. Likewise, it is assumed that no part of the backward-traveling wave enters tube N from free space. To describe the relationships of the volume velocity and the incremental pressure at the glottis and the lips, the boundary conditions are modeled as electrical circuits, as shown in Fig. 2.7∗.
The input signal at the glottis, x = 0 in tube 1, can be visualized as an electrical circuit consisting of a source, u_G(t), and an impedance, Z_G, as shown in Fig. 2.7(a). The relationship between pressure and velocity is conceptually similar to that between voltage and current. Using the wave equations and the acoustic assumptions, r_G in Fig. 2.6 can be expressed in terms of Z_G, A_1, and the two constants ρ and c. It is assumed that the backward-traveling wave in tube 1 does not proceed farther. Under this assumption the following relationships can be derived from the boundary conditions,
  u_1(0, t) = u_G(t) − p_1(0, t)/Z_G,                               (2.8)

  u_1^+(0, t) − u_1^−(0, t) = u_G(t) − (ρc/(A_1 Z_G)) [u_1^+(0, t) + u_1^−(0, t)].   (2.9)

If we ignore the spatial parameter x, r_G is derived as

  u_1^+(t) = [(1 + r_G)/2] u_G(t) + r_G u_1^−(t),                   (2.10)

  r_G = (Z_G − ρc/A_1)/(Z_G + ρc/A_1).                              (2.11)


Likewise, the output signal at the lips, x = l_N in tube N, can be visualized as an electrical circuit loaded with an impedance, Z_L. Using the wave equations and the acoustic assumptions, the relationship between Z_L and r_L can be similarly derived. Assuming that no traveling wave enters tube N from free space, with the boundary conditions, the following relationships must be met,

  p_N(l_N, t) = Z_L u_N(l_N, t),                                    (2.12)

  (ρc/A_N) [u_N^+(t − τ_N) + u_N^−(t + τ_N)] = Z_L [u_N^+(t − τ_N) − u_N^−(t + τ_N)].   (2.13)

Then the backward-traveling wave in tube N can be combined with the corresponding forward-traveling wave such that

  u_N^−(t + τ_N) = −r_L u_N^+(t − τ_N),                             (2.14)

  r_L = (ρc/A_N − Z_L)/(ρc/A_N + Z_L).                              (2.15)

Let the lengths of the N tubes be identical, l = l_1 = l_2 = · · · = l_N, and let τ = l/c be the delay for propagating through a tube. Then the continuous signal flow graph of Fig. 2.6 can be converted into an equivalent discrete-time model as shown in Fig. 2.8∗. Its impulse response is

  h[n] = Σ_{k=0}^{∞} b_k δ[n − N − 2k] = b_0 δ[n − N] + Σ_{k=1}^{∞} b_k δ[n − N − 2k],   (2.16)

which implies that after the input signal is applied at time t = 0, the earliest arrival of
sound waves to the lips will be at t = Nτ , and that further arrivals will occur at multiples
of 2τ after the first one.
The half-sample delays in the discrete-time model, which cannot be implemented exactly, can be removed from the lower backward branches and replaced with one-sample delays in the upper forward branches. The gains in the branches remain the same, and a delay of −N/2 samples in the last branch toward the lips can compensate for the increased delay in the upper branches to make the two models equivalent.
Therefore, instead of a speech segment, the N reflection coefficients and another value related to the amplitude of the segment can be transmitted, provided they can be computed from the speech segment and the bit-rate for these N + 1 values is lower. Alternatively, it can be shown that the transfer function relating the volume velocity at the glottis to that at the lips is of the form,

Fig. 2.8 Discrete-time lossless tube model of vocal tract.


  H(z) = A z^{−N/2} / (1 − Σ_{k=1}^{N} a_k z^{−k}).                 (2.17)

Note that z^{−N/2} in the numerator of H(z) corresponds to a shift of N/2 samples in the time domain. Removing this factor does not influence the accuracy of the modeling significantly.
The N poles define the frequencies of the formants. More complex effects in the vocal
tract can be reflected in the transfer function by adding additional poles and zeros. In the
all-pole modeling of LPC, the process of human speech generation is simulated using a
time-varying digital filter with a steady-state transfer function of the form,
  H(z) = U_L(z)/U_G(z) = A / (1 − Σ_{k=1}^{N} a_k z^{−k}),          (2.18)

where A is the gain and the a_k are the filter coefficients, which vary slowly with time. U_G(z) and U_L(z) are the z-transforms of the discrete-time volume velocities at the glottis and the lips, u_G[n] and u_L[n], respectively. The bit-rate reduction is achieved if fewer bits are required to represent A, a_1, a_2, . . . , a_N than are required for the original speech segment. These N + 1 parameters, the N coefficients and a gain, constitute the essential information for an encoded speech frame representing the input signal for a short period.
This all-pole model is a reasonable representation for non-nasal voiced speech. Although more detailed acoustic models for nasal and fricative sounds require more complex transfer functions, the all-pole model is adequate for most types of speech provided that the order, N, is sufficiently high. For the all-pole model to be valid, the duration of the input signal needs to be limited so that each speech sample can be linearly predicted, i.e., estimated from a linear combination of the previous samples. Therefore in LPC, the typical duration of a speech segment represented by the all-pole model is 10–30 ms, and the gain in accuracy from using a larger N diminishes when the order reaches about 10.
Let s[n] be the output speech signal generated by applying u_G[n] to a vocal tract modeled by h[n]. Then their z-transforms are related by

  S(z) = H(z) U_G(z),                                               (2.19)

  S(z) = A U_G(z) + Σ_{k=1}^{N} a_k S(z) z^{−k}.                     (2.20)

When u_G[n] is zero, e.g., after n = 0 when the input to the glottis is an impulse, the output speech signal can be estimated from the previous samples and the error can be expressed as

  s[n] = A u_G[n] + Σ_{k=1}^{N} a_k s[n − k],                        (2.21)

  s[n] ≈ Σ_{k=1}^{N} a_k s[n − k] = s̃[n],                           (2.22)

  e[n] = s[n] − s̃[n] = s[n] − Σ_{k=1}^{N} a_k s[n − k].              (2.23)

These assumptions are also valid over the pitch period when the input signal is an impulse train. If Z_G and Z_L are real, all coefficients of the impulse response or the transfer function are also real. Then the a_k can be found by minimizing the mean-square error,

  ∂e²[n]/∂a_k = ∂/∂a_k (s[n] − s̃[n])² = 0,                          (2.24)

for k = 1, 2, . . . , N. However, inverting the resulting N × N matrix directly for each speech frame may incur excessive computation for the processors in the MS.
Define the autocorrelation R(i) as R(i) = Σ_{n=0}^{∞} s(n) s(n − i). Then the N equations can be rearranged into the form,

  Σ_{k=1}^{N} a_k R(|i − k|) = R(i),                                 (2.25)
for i = 1, 2, . . . , N, which in matrix form becomes

  [ R(0)      R(1)      · · ·  R(N − 1) ] [ a_1 ]   [ R(1) ]
  [ R(1)      R(0)      · · ·  R(N − 2) ] [ a_2 ]   [ R(2) ]
  [  ...       ...       ...     ...    ] [ ... ] = [ ...  ]        (2.26)
  [ R(N − 1)  R(N − 2)  · · ·  R(0)    ] [ a_N ]   [ R(N) ]

The a_k can be computed by inverting the matrix. The value of the gain A can be computed from the following relationship [Hayes (1996)],

  A² = R(0) − Σ_{k=1}^{N} a_k R(k) = ε_N,                            (2.27)

where ε_N is the minimum prediction error. However, direct inversion of a matrix of this size is again a computationally intensive operation that should be avoided if there are alternatives. The complexity can be reduced significantly if the Toeplitz structure of the matrix is exploited, since the elements along each of its diagonals are equal. With this structure, it is possible to derive the values of the a_k using recursion-based methods.
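A compact floating-point sketch of the Levinson–Durbin recursion for Eq. (2.26) is shown below. Fixed-point encoder implementations differ substantially; the AR(2) test signal at the end is only a sanity check.

```python
import numpy as np

def levinson_durbin(R):
    """Solve Eq. (2.26) for a_1..a_N given autocorrelations R(0)..R(N).

    Returns (filter coefficients a, reflection coefficients,
    final prediction error eps_N).
    """
    R = np.asarray(R, dtype=float)
    N = len(R) - 1
    a = np.zeros(N)
    refl = np.zeros(N)
    err = R[0]
    for i in range(N):
        k = (R[i + 1] - np.dot(a[:i], R[i:0:-1])) / err
        refl[i] = k
        a[:i] = a[:i] - k * a[:i][::-1]    # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                 # prediction error shrinks each order
    return a, refl, err

# Sanity check on a synthetic AR(2) signal with known coefficients
rng = np.random.default_rng(0)
x = np.zeros(10000)
e = rng.standard_normal(10000)
for n in range(2, 10000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
R = [np.dot(x[:x.size - k], x[k:]) for k in range(3)]
a, refl, err = levinson_durbin(R)
print(np.round(a, 2))                      # approximately [ 1.3 -0.6]
```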
In the MS of GSM, the digital mobile communications system whose Full-Rate (FR) speech codec is considered here, the reflection coefficients are computed and converted as follows. First, to remove the DC offset of the input signal sampled at 8000 samples/s, the speech samples, s_o(n), are notch-filtered by the operation

  s_of(n) = s_o(n) − s_o(n − 1) + α · s_of(n − 1),                   (2.28)

where α is 32735 × 2^{−15}. The filtered signal is then pre-emphasized using

  s(n) = s_of(n) − β · s_of(n − 1),                                  (2.29)


with β set to 28180 × 2^{−15}. For a 20 ms speech segment, the nine autocorrelation values for N = 8 are computed using

  R(k) = Σ_{i=k}^{159} s(i) s(i − k),                                (2.30)

for k = 0, 1, . . . , 8.
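Equations (2.28)–(2.30) translate almost directly into code. The following floating-point sketch omits the scaled 16-bit fixed-point arithmetic of the real FR encoder.

```python
import numpy as np

ALPHA = 32735 * 2.0 ** -15   # offset-compensation coefficient, Eq. (2.28)
BETA = 28180 * 2.0 ** -15    # pre-emphasis coefficient, Eq. (2.29)

def fr_front_end(so, N=8):
    """Offset compensation, pre-emphasis, and autocorrelation for one
    160-sample (20 ms) segment, after Eqs. (2.28)-(2.30)."""
    so = np.asarray(so, dtype=float)
    sof = np.zeros_like(so)
    prev_so = prev_sof = 0.0
    for n, x in enumerate(so):
        sof[n] = x - prev_so + ALPHA * prev_sof    # Eq. (2.28): DC notch
        prev_so, prev_sof = x, sof[n]
    s = sof - BETA * np.concatenate(([0.0], sof[:-1]))   # Eq. (2.29)
    R = [float(np.dot(s[k:], s[:s.size - k])) for k in range(N + 1)]  # Eq. (2.30)
    return s, R

segment = np.sin(2 * np.pi * 230 * np.arange(160) / 8000)
s, R = fr_front_end(segment)
print(len(R))    # 9 autocorrelation values, R(0)..R(8)
```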
From these values, the reflection coefficients can be computed using several recursions, which differ in the numbers of required multiplications and additions, and in the memory required to store the program and data. A straightforward approach to matrix inversion, such as Gaussian elimination, requires a number of multiplications and divisions proportional to N³. A recursion-based inversion technique, the Levinson–Durbin recursion, reduces this number to N². In addition, the required memory is reduced from N² to 2(N + 1). The Schur recursion is slightly more efficient than the Levinson–Durbin recursion, as well as more friendly to implementations that use parallel processing. Figure 2.9 illustrates the flow of the Schur recursion used in the FR speech encoder of GSM.
The Schur recursion computes the reflection coefficient r_i and the error ε_i of the ith-order filter for each i, while the Levinson–Durbin recursion generates the filter coefficients a_i and the gain A. ε_i and A are related by A = √ε_N. Figure 2.10 shows that from a segment of speech waveform, three equivalent sets of N + 1 parameters, the autocorrelation values, the reflection coefficients and the error, and the filter coefficients and the gain, can be computed and converted to each other using appropriate recursion techniques. The Inverse Levinson–Durbin, Step-Up, and Step-Down recursions complete the chain of parameter transformations. Notice that if the length of the speech signal, e.g., 160 samples for 20 ms, is much larger than the filter order, e.g., N = 8, the computational cost of obtaining the autocorrelation values dominates the complexity, but for the MS the differences among the recursions can still be significant.
In Fig. 1.2, it is shown that the analog speech signal is digitized into an intermediate 13- or 16-bit format, which is compressed further to around 10 kbps using a speech encoder. If two bytes are assigned to each value of a_k and A in an all-pole model with N = 8, the bit-rate is reduced from 104 or 128 kbps to 7.2 kbps. Since −1 ≤ r_i ≤ 1 for all i, it can be shown using the Levinson–Durbin recursion that the system is stable, with all poles located within the unit circle. However, the stability of the transfer function might be lost if the values of the reflection coefficients or filter coefficients are changed by errors occurring during transmission. A simple approach to preserving the stability is to convert the coefficients into another format that is more resilient to errors, such as the Line Spectral Frequencies (LSF), which are related to the pole locations [Itakura (1975)].
In the FR speech encoder, the reflection coefficients r_i are converted to the Log Area Ratios (LAR), which are defined as

  LAR(i) = log_10 [(1 + r_i)/(1 − r_i)],                             (2.31)

Fig. 2.9 LPC analysis using Schur recursion in GSM FR [3GPP (2000a)].



Fig. 2.10 Equivalence between autocorrelation sequence, reflection coefficients, and all-pole model parameters.

Fig. 2.11 Network control of speech bit-rate. (a) Fixed bit-rate. (b) Variable bit-rate.

for better quantization characteristics and higher error resilience. To avoid the complexity of implementing the logarithmic function, the following segmented piecewise approximation is used instead,

  LAR(i) = r_i                              for |r_i| < 0.675,
  LAR(i) = sign[r_i] (2|r_i| − 0.675)       for 0.675 ≤ |r_i| < 0.950,      (2.32)
  LAR(i) = sign[r_i] (8|r_i| − 6.375)       for 0.950 ≤ |r_i| ≤ 1.000,

which does not need the costly division and logarithm operations.
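Both the exact LAR of Eq. (2.31) and the piecewise approximation of Eq. (2.32) are easily expressed in code, as in the sketch below; note that the approximation tracks the shape of the companding curve rather than the exact logarithmic values.

```python
import math

def lar_exact(r):
    """Eq. (2.31): log area ratio for reflection coefficient r, |r| < 1."""
    return math.log10((1 + r) / (1 - r))

def lar_approx(r):
    """Eq. (2.32): piecewise-linear form without division or logarithm."""
    a = abs(r)
    if a < 0.675:
        return r
    if a < 0.950:
        return math.copysign(2 * a - 0.675, r)
    return math.copysign(8 * a - 6.375, r)

for r in (0.3, 0.8, 0.99):
    print(r, round(lar_exact(r), 3), round(lar_approx(r), 3))
```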

2.1.2 Fixed Bit-Rate versus Variable Bit-Rate Coding
Although all-pole modeling can be used for most types of speech signals, different models can also be considered when the simple model fails. Typical speech compression suites include several models designed to match the diverse acoustic nature of the input signals as closely as possible. Depending on whether or not the bit-rates required to represent the coefficients of the models are identical for each speech segment, speech compression schemes can be classified as either Fixed Bit-Rate (FBR) or Variable Bit-Rate (VBR).

Figure 2.11 illustrates the principles of FBR and VBR speech coding. Although drawn identically, the maximum bit-rates of the two schemes are not necessarily equal. Figure 2.11(a) illustrates the output bit-rate of an FBR speech encoder, where a fixed bit-rate is used whenever voice activity is present. Otherwise, the bit-rate falls to zero or to a low level that corresponds to a special frame type containing only the background noise.


Fig. 2.12 4-level VBR speech coding.

An FBR speech encoder exploits the observation made in Fig. 2.3 that the duration of silence can be comparable to that of actual speech, to reduce the average bit-rate. In contrast, a VBR speech encoder generates multiple types of output speech frames to match the time-varying nature of the input signal, as illustrated in Fig. 2.11(b). Depending on the algorithm, the lowest bit-rate of a VBR speech codec might be zero.
VBR can be considered a capacity-driven approach that is especially effective if the bit-rate saved by one MS can be used by other MSs. The more conventional FBR applies an identical bit-rate regardless of the nature of the input signal as long as voice activity is present. Therefore, in mobile communications systems using FBR, the network typically controls the maximum bit-rate of each MS to manage the tradeoff between speech quality and network capacity. It is generally possible to apply different bit-rates to each MS, depending on the channel conditions to and from the MS. In addition to the bit-rate, the network also controls the transmit power of the MSs and base stations, as another resource available for managing the quality–capacity tradeoff.
In contrast, in mobile communications systems that use VBR, the network typically controls the average bit-rate of each MS. Figure 2.12 illustrates the principles of a VBR speech encoder supporting four bit-rates, where different bit-rates are assigned to silence, unvoiced speech, the onset of speech, and voiced speech. Note that in the figure the ratio of the four bit-rates is eight, four, two, and one, but VBR itself does not constrain the ratios. From the differences in the variation pattern of the bit-rates, it can be seen that in VBR it is necessary to inform the receiver of the bit-rate used for speech, either periodically or whenever it changes. The LPC principles can be applied to either FBR or VBR, as the bit-rate can be differentiated in the coefficient quantization stages.
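As a toy illustration of network control of the average bit-rate in VBR, the sketch below assigns the four rates of Fig. 2.12 to pre-classified frame types; the classifier, the absolute rates, and their mapping to frame types are assumptions, not a specific codec.

```python
# Assumed mapping of frame types to the four rates of Fig. 2.12 (ratio 8:4:2:1)
RATE_KBPS = {"voiced": 8.0, "onset": 4.0, "unvoiced": 2.0, "silence": 1.0}

frames = ["silence", "onset", "voiced", "voiced", "unvoiced", "silence"]
average = sum(RATE_KBPS[f] for f in frames) / len(frames)
print(average)   # 4.0: the quantity a VBR network would steer, per the text
```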

2.2 AMPS Enhancements

2.2.1 Narrowband AMPS
The mobility and coverage of AMPS increased the number of mobile telephone users
to an unexpected degree. The straightforward approach of meeting this challenge to


Fig. 2.13 Frequency-, time-, and power-domain representation. (a) AMPS. (b) N-AMPS. (c) D-AMPS.

increase network capacity, adding more frequency spectrum, was not particularly attractive as