


EUROPEAN ORGANISATION FOR THE SAFETY OF AIR NAVIGATION

EUROCONTROL

EUROCONTROL EXPERIMENTAL CENTRE

CONTEXT-SENSITIVE SPEECH RECOGNITION IN THE AIR TRAFFIC CONTROL SIMULATION EEC Note No. 02/2001

Issued: February 2001

The information contained in this document is the property of the EUROCONTROL Agency and no part should be reproduced in any form without the Agency’s permission. The views expressed herein do not necessarily reflect the official views or policy of the Agency.

REPORT DOCUMENTATION PAGE

Reference: EEC Note No. 02/2001
Originator: EEC - ASC (Analytic and Scientific)
Security Classification: Unclassified

Originator (Corporate Author) Name/Location: EUROCONTROL Experimental Centre, B.P. 15, F-91222 Brétigny-sur-Orge CEDEX, France. Telephone: +33 (0)1 69 88 75 00

Sponsor (Contract Authority) Name/Location: EUROCONTROL Agency, Rue de la Fusée 96, B-1130 Bruxelles. Telephone: +32 2 729 9011

Sponsor: EATCHIP Development Directorate DED.4

TITLE: CONTEXT-SENSITIVE SPEECH RECOGNITION IN THE AIR TRAFFIC CONTROL SIMULATION

Author: D. Schaefer
EATCHIP Task Specification: -
Date: 2/01
Pages: xiii + 192
Figures: 57
Tables: 23
Appendix: 6
References: 91
Project: -
Task No.: -
Sponsor: -
Period: 1994 to 2001

Distribution Statement:
(a) Controlled by: Head of ASC
(b) Special Limitations: None
(c) Copy to NTIS: YES / NO

Descriptors (keywords): Air Traffic Control, ATC Simulation, Automatic Speech Recognition, Cognitive Models, Context Knowledge, Dynamic Syntax

Abstract: This report discusses the use of situation knowledge as a means to enhance the recognition performance of commercially available automatic speech recognizers. A cognitive model of the ATC controller is proposed that continuously observes the present situation and generates a prediction of the sentences the controller is most likely to say. The prediction is made available to the speech recognizer, which thus gains a basic form of situation knowledge. The results of an experimental evaluation of this approach in an ATC simulator are discussed.

This document has been collated by mechanical means. Should there be missing pages, please report to: EUROCONTROL Experimental Centre Publications Office Centre du Bois des Bordes B.P. 15 91222 - BRETIGNY-SUR-ORGE CEDEX France

Dirk Schäfer

Context-Sensitive Speech Recognition in the Air Traffic Control Simulation

This book has been submitted to the German Armed Forces University in Munich, Germany, as a doctoral thesis.


UNIVERSITÄT DER BUNDESWEHR MÜNCHEN
Fakultät für Luft- und Raumfahrttechnik

Topic of the dissertation: Context-Sensitive Speech Recognition in the Air Traffic Control Simulation

Author: Dirk Schäfer

Chairman of the doctoral committee: Prof. Dr.-Ing. Dipl.-Kfm. Bernhard Katzy
First reviewer: Prof. Dr.-Ing. Reiner Onken
Second reviewer: Prof. Dr.-Ing. Heinz Winter, Institut für Luft- und Raumfahrt, Technische Universität Berlin

Date of examination: 27 November 2000
Academic degree conferred: Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Paris, 25 January 2001


TABLE OF CONTENTS
ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
2 AIR TRAFFIC CONTROL AND AIR TRAFFIC CONTROL SIMULATION
  2.1 The History of Air Traffic Control
  2.2 Air Traffic Control Today
  2.3 Air Traffic Control Simulation
  2.4 The Application of Speech Recognition
  2.5 Summary
3 AUTOMATIC SPEECH RECOGNITION TECHNOLOGY
  3.1 Voice-based Man-Machine Communication
  3.2 Fundamentals of Automatic Speech Recognition
  3.3 Present State of the Art
  3.4 Summary
4 THE USE OF CONTEXT KNOWLEDGE FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
  4.1 Approaches to Improve the Performance of Speech Recognition Systems
  4.2 The Use of Static and Dynamic Knowledge for the Syntax Definition
  4.3 Cognitive Models of Human Performance
  4.4 Models of Air Traffic Control as a Cognitive Process
  4.5 Expert Knowledge Acquisition
  4.6 Summary
5 A COGNITIVE MODEL OF THE ATC CONTROLLER
  5.1 The Cognitive Controller Model (CCM)
  5.2 CCM Functional Architecture
  5.3 Air Traffic Control in Frankfurt Westradar 1
  5.4 The Observation Function
  5.5 The Decision Function
  5.6 The Phraseology Function
  5.7 Summary
6 IMPLEMENTATION OF THE COGNITIVE CONTROLLER MODEL
  6.1 Methods of Knowledge Representation
  6.2 Object Representation in the Observation Module
  6.3 Production Rules in the Decision Module
  6.4 The Phraseology Module
  6.5 Performance and Calibration
  6.6 Summary
7 THE SIMULATION AND SPEECH RECOGNITION ENVIRONMENT
  7.1 The Speech Recognition Engine
  7.2 The Simulation and Speech Recognition Environment
  7.3 The ATC Speech Recognizer
  7.4 The Syntax Compiler
  7.5 Summary
8 EXPERIMENTAL EVALUATION
  8.1 Objectives
  8.2 Experimental Design and Test Subjects
  8.3 Logged Data
  8.4 The NASA Task Load Index (TLX)
  8.5 Statistical Analysis
  8.6 Summary
9 RESULTS OF THE EXPERIMENTS
  9.1 Data Material
  9.2 Recognition Error Rate
  9.3 Subjective Workload
  9.4 Questionnaires
  9.5 The Recognition Confidence
  9.6 Discussion
  9.7 Summary
10 SUMMARY AND CONCLUSIONS
  10.1 Summary
  10.2 Conclusions
ACKNOWLEDGMENTS
REFERENCES
INDEX
APPENDIX A: ICAO STANDARD PHRASEOLOGY
APPENDIX B: EXTENDED PHRASEOLOGY
APPENDIX C: CLASS STRUCTURE
APPENDIX D: PRODUCTION RULES
APPENDIX D1: PRODUCTION RULES POSSIBLE CLEARANCES
APPENDIX D2: PRODUCTION RULES
APPENDIX E: EXCERPT FROM A DECODE LOG FILE
APPENDIX F: QUESTIONNAIRE

ABBREVIATIONS

A/D: Analogue / Digital
ACT: Adaptive Control of Thoughts
AIC: Air Intercept Control
AIP: Aeronautical Information Publication
API: Application Programmer's Interface
ASCII: American Standard Code for Information Interchange
ASR: Automatic Speech Recognition
ATC: Air Traffic Control
ATCSim: ATC Simulation
ATCSR: ATC Speech Recognizer
ATMOS: Air Traffic Management and Operations Simulator
ATN: Augmented Transition Network
BFS: Bundesanstalt für Flugsicherung (predecessor of DFS)
CAA: Civil Aeronautics Authority
CASSY: Cockpit Assistant System
CBT: Computer Based Training
CCM: Cognitive Controller Model
CE: Callsign Error
COMPAS: Computer Oriented Metering, Planning, and Advisory System
CPU: Central Processing Unit
C-VFR: Controlled Visual Flight Rules
DFS: Deutsche Flugsicherung GmbH (German Air Navigation Services)
DLL: Dynamic Link Library
DLR: Deutsches Zentrum für Luft- und Raumfahrt e.V. (German Aerospace Center)
DSP: Digital Signal Processing Chipset
EnCoRe: Enroute Controller's Representation
FAA: Federal Aviation Administration
FFM: Frankfurt (navigation aid)
FFT: Fast Fourier Transformation
FL: Flight Level
GUI: Graphical User Interface
Hdg: Heading
HMM: Hidden Markov Model
IAS: Indicated Airspeed
ICAN: International Commission for Air Navigation
ICAO: International Civil Aviation Organization
IE: Instruction Error
IFR: Instrument Flight Rules
ISA: Instantaneous Self Assessment
ISC: Interactive Speech Card
kts: Knots
LBU: Luburg (navigation aid)
MINDS: Multi-modal Interactive Dialogue System
MoFl: Model der Fluglotsentätigkeit (Model of Controller Behavior)
NASA: National Aeronautics and Space Administration
NDB: Non-Directional Radio Beacon
NM: Nautical Miles
NTM: Nattenheim (navigation aid)
PC: Personal Computer
PDI: Phonetic Decoder Interface
PE: Parameter Error
RADAR: Radio Detection And Ranging
ROC/D: Rate of Climb/Descent
RTCA: Radio Technical Commission for Aeronautics
RUD: Rüdesheim (navigation aid)
SWAT: Subjective Workload Assessment Technique
TAU: Taunus (navigation aid)
TE: Total Error
TLX: Task Load Index
UTC: Universal Time Coordinated
VFR: Visual Flight Rules
VHF: Very High Frequency
VOR: Very High Frequency Omni-directional Radio Beacon
WR1: Westradar 1
WR2: Westradar 2

LIST OF TABLES

Table 3-1 Recognition accuracy in the experiments of Matrouf et al.
Table 4-1 Syntactic, semantic and pragmatic knowledge
Table 5-1 Strategies applied for controlling arrival traffic in WR1
Table 5-2 Excerpt from Aeronautical Information Publication (AIP)
Table 5-3 Most frequent clearances recorded during simulations
Table 5-4 Heading clearances (ICAO standard phraseology)
Table 5-5 Heading clearances (extended phraseology)
Table 6-1 Script for dining in a restaurant
Table 6-2 Clearance weights
Table 6-3 Excerpt from a simulation data file
Table 6-4 Excerpt from a pseudo pilot protocol
Table 6-5 Mean pseudo pilot input delay
Table 8-1 Overview of the test subjects
Table 8-2 Sequence of experimental and traffic scenarios per test subject
Table 8-3 Definitions of the NASA TLX rating scales
Table 8-4 Categories of instructions
Table 8-5 Decode categories
Table 8-6 Categories of recognition errors
Table 9-1 Number of sentences per decode category
Table 9-2 Frequency of clearance types
Table 9-3 Recognition error rate per error category
Table 9-4 Error rate per clearance type - ICAO standard phraseology
Table 9-5 Error rate per clearance type - extended phraseology


LIST OF FIGURES

Figure 2-1 Air traffic control center in the early 1950s
Figure 2-2 Air traffic control center in the early 1970s
Figure 2-3 Vertical airspace structure in Germany
Figure 2-4 Lateral airspace structure, excerpt from aviation chart
Figure 2-5 Control tower at München airport
Figure 2-6 München enroute control center
Figure 2-7 Radar screen based on secondary radar
Figure 2-8 Paper flight strip
Figure 2-9 Flight strip board
Figure 2-10 Controller working position at DLR's ATMOS
Figure 2-11 The pseudo pilot concept
Figure 2-12 The six pseudo pilot stations of DLR's ATMOS
Figure 2-13 Automatic speech recognition in the ATC simulation
Figure 2-14 Number of misrecognitions among US line pilots
Figure 2-15 Error rate among pseudo pilots in DLR's ATMOS
Figure 3-1 The principles of automatic speech recognition
Figure 4-1 The principles of automatic speech recognition (2)
Figure 4-2 Two types of recognition errors (schematically)
Figure 4-3 Model of human information processing
Figure 4-4 Model of three levels of performance
Figure 4-5 Conceptual model of operator's tasks
Figure 4-6 Components of the controller model MoFl
Figure 5-1 CCM in the simulation and speech recognition environment
Figure 5-2 The structure of the Cognitive Controller Model
Figure 5-3 Structure of Frankfurt Westradar 1
Figure 5-4 Frequency of different clearance categories in WR1
Figure 5-5 The observation function
Figure 5-6 Possible control advisories for an isolated aircraft
Figure 5-7 "Lineal holding" in WR1
Figure 5-8 The decision function
Figure 5-9 Phrases associated with the clearance to contact Arrival sector
Figure 5-10 The phraseology function
Figure 6-1 Fuzzy membership functions for temperature
Figure 6-2 Statechart of driver behavior (lane change)
Figure 6-3 Semantic net description of automobiles
Figure 6-4 Sample Petri net
Figure 6-5 The classes Position, Waypoint and Flightplan
Figure 6-6 The class Aircraft
Figure 6-7 The class Clearance
Figure 6-8 Rule to reduce airspeed (possible clearances)
Figure 6-9 Rule to proceed direct to Rüdesheim (probable clearances)
Figure 6-10 Excerpt from a syntax possible.sts (extended phraseology)
Figure 7-1 Structure of the Phonetic Engine PE500
Figure 7-2 The ATC simulation and speech recognition environment
Figure 7-3 Excerpt from a syntax possible.sts (extended phraseology)
Figure 7-4 Excerpt from the file speech.log
Figure 7-5 Excerpt from the file aircraft.ini
Figure 9-1 Recognition error rates
Figure 9-2 Recognition error rate per error category
Figure 9-3 Error rate per clearance type - ICAO standard phraseology
Figure 9-4 Error rate per clearance type - extended phraseology
Figure 9-5 Error rate per test subject - ICAO standard phraseology
Figure 9-6 Error rate per test subject - extended phraseology
Figure 9-7 NASA TLX scores - ICAO standard phraseology
Figure 9-8 NASA TLX scores - extended phraseology
Figure 9-9 The recognition confidence of correct and incorrect decodes

1 Introduction

In the early days of aviation there seemed to be unlimited space for air traffic. Very soon, however, substantial growth rates in aviation revealed that uncontrolled air traffic would expose aircraft to the risk of mid-air collisions. To avert this risk, the first conventions on air navigation were implemented in the early 1920s. When radio communication between ground stations and aircraft became available in the early 1930s, the predecessors of today's air traffic control (ATC) services were installed, responsible for maintaining safe separation between all aircraft in the airspace of their responsibility. The pilots, in turn, were obliged to comply with the instructions of ATC controllers.

Although accidents in aviation can have catastrophic effects, traveling by air is very safe compared to other means of transportation.[1] However, annual growth rates of about seven percent in the number of commercial flights were reported in Germany during the last decades, and the increase is predicted to continue in the near future [DFS 98]. Air traffic control services have to cope with steadily growing traffic while maintaining and even increasing today's standards of safety and efficiency. Computer-based systems can facilitate this task to some degree. However, replacing the human operator with automated systems does not appear to be a viable approach: air traffic control is a fairly complex process, and automated systems, at least in the near future, seem incapable of dealing with other than standard situations sufficiently reliably. Moreover, complete automation would leave the human operator in the role of a passive supervisor. A more promising approach is to design systems that support the controller and free him from tedious routine tasks while helping him to maintain situation awareness.

[1] The National Safety Council has calculated that the rate of fatalities among scheduled US airline flights in 1994 was 0.04 per one hundred million passenger miles. The rate of fatalities among automobiles for the same period was 0.86 per one hundred million passenger miles, more than 21 times higher [FAA-1 97].

Whenever computer-based systems have been introduced in ATC in recent years, it has been emphasized that they should be designed as assistance systems, keeping the controller in the decision process and leaving it up to him or her whether to delegate tasks to the system. Assistant systems that support the human operator efficiently must be designed so that they exactly suit the operator's needs and provide the right functionality at the right time. A good understanding of the cognitive processes involved in air traffic control is necessary in order to achieve a good system design. Whether computer and human operator can cooperate efficiently becomes a question of crucial importance. Cognitive science approaches and ergonomics can contribute to some degree to answering this question. However, new systems cannot be introduced into operational service without extensive testing.

Air traffic control simulation facilities provide a realistic environment for testing and evaluating new systems without creating safety hazards. Integrating simulation studies in an early phase of the design process also provides a rich source of information for system design. In order to confront the test subjects with the same conditions they would encounter in reality, and thus to be able to transfer findings from simulations to reality, the simulation must be highly realistic. Most ATC simulation environments use the so-called pseudo pilot concept to simulate the communication between the controller and the pilots of aircraft in the control sector. Via a simulated radio connection, the controller transmits clearances to pseudo pilots who control the aircraft accordingly and confirm the instructions. Depending on the number of aircraft a pseudo pilot has to operate, entering the clearance parameters and reading back the clearances often results in very high workload. This sometimes causes delayed, erroneous, or even missing pilot responses. Obviously, the capacity of pseudo pilots to respond to clearances immediately and reliably is crucial for the quality of the simulation. Extensive training and a sufficient number of pseudo pilots are therefore required, which makes ATC simulations comparatively expensive.

At least some of the problems associated with the pseudo pilot concept can be solved by introducing automatic speech recognition (ASR) systems in ATC simulators. However, in order not to adversely affect the simulation, excellent performance of the speech recognizer is crucial.
The number of misrecognitions must not exceed that of pilots and pseudo pilots, and immediate responses to the clearances are required. Tremendous advances have been achieved in ASR hardware and software technology during the last two decades. Some 15 years ago, speech recognition was limited to systems that understood a dozen isolated words with marginal accuracy, and only after being trained by an individual speaker. Today, speaker-independent ASR systems recognize continuous speech and deal with large vocabularies. However, ASR technology still does not meet the requirements for a successful application in air traffic control and ATC simulation. Recent studies conclude that "today's speech recognition technology just isn't good enough for this domain" [Churcher 96].


One approach to enhancing the performance of existing ASR systems is the use of situation knowledge. A system that continuously observes the situation and assesses the likelihood of each possible instruction could provide a basic form of artificial situation awareness. The scope of this work is to investigate the benefits of this approach for speech recognition in the air traffic control simulation. Additional concerns, such as usability, user acceptance, and perceived workload, are also discussed.

Chapter 2 provides an overview of air traffic control and ATC simulation. The history of ATC and its present state are discussed, and an introduction to ATC simulators and the pseudo pilot concept is given. The problems associated with the pseudo pilot concept are outlined, and automatic speech recognition (ASR) is proposed as a solution to these problems.

The fundamentals of automatic speech recognition are described in chapter 3. Further, requirements for a successful application of speech recognition in an ATC simulation environment are discussed, and the performance of currently available systems is outlined. Related studies and their approaches are discussed in detail.

The use of situation knowledge as a means to increase recognition performance is presented in chapter 4. Using situation knowledge requires a dynamic assessment of the current situation and an estimate of which instructions are to be expected. A cognitive model of the ATC controller is proposed to support this dynamic situation assessment.

The fundamentals of a cognitive model of controller behavior are presented in chapter 5, and the Cognitive Controller Model (CCM) is proposed as a model of the mental processes involved in air traffic control.
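The core of this approach, a model that continuously evaluates the traffic situation and weights the instructions the controller is most likely to issue next, can be illustrated with a small sketch. This is a hypothetical toy, not the CCM implementation: the two production rules, all thresholds, phrasings, and data fields are invented for illustration only.

```python
# Illustrative sketch of a rule-based "decision function" in the spirit of
# the CCM: given an aircraft's state, emit weighted clearance predictions.
# All rules, thresholds, and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Aircraft:
    callsign: str       # e.g. "DLH123"
    altitude_ft: int    # current altitude
    speed_kts: int      # indicated airspeed
    distance_nm: float  # distance to a sector exit fix

def predict_clearances(ac: Aircraft) -> list[tuple[str, float]]:
    """Return (clearance phrase, weight) pairs; higher weight = more probable."""
    predictions = []
    # Rule 1: an arrival that is still high and close to the fix is likely
    # to receive a descent clearance.
    if ac.altitude_ft > 10000 and ac.distance_nm < 40:
        predictions.append((f"{ac.callsign} descend to flight level 100", 0.8))
    # Rule 2: a fast aircraft near the fix is likely to be slowed down.
    if ac.speed_kts > 250 and ac.distance_nm < 30:
        predictions.append((f"{ac.callsign} reduce speed 250 knots", 0.6))
    # Any aircraft may always be handed off to the next sector.
    predictions.append((f"{ac.callsign} contact arrival", 0.2))
    return sorted(predictions, key=lambda p: -p[1])

ac = Aircraft("DLH123", altitude_ft=12000, speed_kts=280, distance_nm=25)
for phrase, weight in predict_clearances(ac):
    print(f"{weight:.1f}  {phrase}")
```

In the thesis, predictions of this kind are produced by the production rules of the CCM's decision module; the weighted phrase list here merely stands in for that output.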
Following existing models of problem solving from cognitive psychology, the working process in air traffic control is structured, and three main processes are identified: observation of the traffic situation, decision making and planning, and controller-pilot communication. Each of these processes is discussed in detail.

Chapter 6 describes the implementation of the CCM as a computer program. The structure and architecture of the model are outlined, together with its calibration. The performance of the model is assessed by comparing its estimates with data recorded in ATC simulations.

The implementation of the speech recognizer and its interfaces to the controller model and the ATC simulation are presented in chapter 7. The speech recognizer identifies the clearances issued by the controller and transmits the corresponding parameters
to the simulation. It also simulates the response of a pilot with synthetic speech. Each of these functions is described in detail.

Chapter 8 describes the experiments carried out in an ATC simulator to evaluate the overall effect of the use of situation knowledge on recognition performance. Qualified ATC personnel participated in simulation sessions in which aircraft were controlled by voice. The effects of ASR technology on user acceptance and controller workload were also investigated.

The results of the experiments are presented in chapter 9. The recognition performance is discussed, as well as the usability of the speech recognition system as judged by the test subjects and as indicated by assessments of subjective workload. Finally, the potential of this approach for future systems is discussed.

Chapter 10 summarizes the findings and gives an estimate of the benefits and drawbacks of the Cognitive Controller Model and of the use of situation knowledge for automatic speech recognition in ATC simulation.
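How such situation-based predictions might be exploited at decode time can be sketched as follows. This is an illustrative toy, not the recognizer interface described in chapter 7: the idea is that restricting the candidate sentences to those the situation model currently expects shrinks the recognizer's hypothesis space, so a noisy decode is matched against fewer, less confusable alternatives. The sentences and the string-similarity measure are invented stand-ins.

```python
# Illustrative sketch: match a noisy decoded word string against a
# dynamically restricted set of candidate sentences. The similarity
# measure (difflib) stands in for a real recognizer's scoring.
import difflib

def best_match(decoded: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate sentence most similar to the decoded string."""
    scored = [(c, difflib.SequenceMatcher(None, decoded, c).ratio())
              for c in candidates]
    return max(scored, key=lambda s: s[1])

# Full (static) syntax: every sentence the phraseology allows.
full_syntax = [
    "DLH123 descend to flight level 100",
    "DLH123 descend to flight level 110",
    "BAW456 descend to flight level 100",
    "DLH123 reduce speed 250 knots",
]
# Dynamic syntax: only the subset the situation model currently expects.
predicted = [
    "DLH123 descend to flight level 100",
    "DLH123 reduce speed 250 knots",
]

noisy_decode = "DLH123 descend flight level 100"  # "to" was lost in decoding
sentence, confidence = best_match(noisy_decode, predicted)
print(sentence, round(confidence, 2))
```

With only two live candidates instead of four, near-miss alternatives such as the flight level 110 sentence are never even scored, which is the intended effect of a situation-driven dynamic syntax.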


2 Air Traffic Control and Air Traffic Control Simulation

Steady increases in air traffic have created a crowded sky where unlimited space seemed to exist when the first manned aircraft lifted off a century ago. It soon became apparent that uncoordinated air traffic would carry a great risk of accidents. Air traffic control services were implemented and entrusted with controlling the air traffic. As traffic density increased further, procedures and regulations became more complex and compliance with them more important, and air traffic control services were equipped with increasingly complex technical systems.

With the growing importance of air traffic control, the need to emulate the working environment of controllers in simulation facilities became apparent. Simulation permits the training of controllers and helps to test new systems before their introduction into operational service. To simulate the communication between controller and pilot, most simulation facilities use the pseudo pilot concept: simulator staff execute and respond to the controller's clearances. Replacing pseudo pilots with automatic speech recognition would make simulation facilities independent of this costly need for personnel. However, in order not to affect the simulation environment in an unwanted way, the recognizer's performance must be at least equal to that of human listeners. This leads to several requirements the speech recognizer must meet for an undisturbed simulation environment.

2.1 The History of Air Traffic Control

In the early days of manned aviation, airborne vehicles were permitted to fly with little or no restriction. Few aircraft existed, and they were operated only at moderate speeds and under good weather conditions, so that there seemed to be little danger of mid-air collisions. Navigation was based on compass courses between characteristic locations or relied entirely on vision, often using roads or railway tracks as navigational aids. The convention to fly to the right of such features is considered one of the first rules in air traffic [Colchester 61].

An increase in air traffic following World War I soon demonstrated the necessity of agreements on air traffic rules. Several European countries founded the International Commission for Air Navigation (ICAN) in 1919 and agreed on the International Convention for Air Navigation. The United States did not sign the ICAN convention but implemented some of its concepts when launching a program to establish the Federal Airways System in 1927.

The 1920s and particularly the 1930s saw a significant increase in passenger air traffic, and it soon became apparent that aviation would play a major role in a future transportation system if only navigation could be supported by means independent of the weather [Gilbert 73]. First steps towards instrument navigation were taken in 1927, when four-course radio beacons were installed across the United States, permitting aircraft equipped with a radio receiver to maneuver along one of the four beams. Air routes independent of landmarks were established, using the locations of the beacons and the crossing points of beams as fixes. By 1929, radio marker beacons were introduced that permitted navigation towards the beacon from any direction [RTCA 52]. Light beacons were also installed to support navigation at night. In 1936, the major airports in Germany were connected by ten night-flight air routes, constructed from 157 light beacons [BFS 90]. A severe disadvantage of light beacons, however, was that they permitted navigation under good weather conditions only.

In 1930, two-way radio telephone communication between aircraft and ground stations came into operation. By 1932, most US airlines had equipped their aircraft for radio telephone communication with airline ground stations, which allowed them to report flight progress and receive meteorological information. The first radio-equipped airport control tower entered service at Cleveland Municipal Airport in 1930, with 20 other airports following during the next five years [Gilbert 73]. Apart from airport towers, however, radio transmissions were mostly used for communication with airline radio operators and dispatchers. In 1934, the major American airlines installed flight tracking systems for their aircraft when approaching large airports. These systems were based on aircraft position reports to an airline radio operator. In case of a conflict, the radio operators of the airlines concerned communicated with each other and with the tower controller.
This soon demonstrated the benefits of a coordinated effort. The major airlines signed the Interline Agreements and consented to the establishment of joint Airway Traffic Control Centers at the airports of Newark, Chicago, and Cleveland, which went into operation in 1935 and 1936. In 1936, the Airway Traffic Control Centers were taken over by the US government, and five additional centers were established during the following months. The centers derived the traffic situation from aircraft position reports which they received via radio transmission and copied to a blackboard. The aircraft positions were then estimated and marked on a map table. In the early 1940s, blackboards for entering aircraft position reports were gradually replaced by paper flight progress strips. Each strip covered the flight of one aircraft between two reporting points, presenting the necessary information from the flight plan and allowing the controller to write down reports and advisories with a pencil. By 1942, 23 additional Airway Traffic Control Centers had gone into operation and complete coverage of the US airway system had been achieved [Gilbert 73].


Figure 2-1: Air traffic control center in the early 1950s [source: DFS].

In 1947 the International Civil Aviation Organization (ICAO) was founded, a date commonly marked as the beginning of modern air traffic control. The ICAO, with more than 50 signatory nations at the time, released standards and procedures for air traffic which were then incorporated by the local administrations of the member countries.

The second fundamental advance in air traffic control after the introduction of radio communication was the invention of RADAR (Radio Detection and Ranging) during World War II. Since radar provides a precise position for each aircraft in the vicinity of the radar antenna, it greatly contributed to the accuracy of air traffic control. Large separations had formerly been necessary because position estimates were troubled by reporting delays and inaccurate navigation; these separations could now be reduced considerably. Moreover, the visualization of aircraft positions on a radar screen greatly facilitated the work of ATC controllers, depicting an air traffic situation which before had to be constructed mentally from position reports. Beginning in the early 1950s, radar systems went into operation in US ATC facilities.

Figure 2-2: Air traffic control center in the early 1970s [source: DFS].

Radar screens display the positions of all targets in the range of a radar antenna. The antenna emits radio wave pulses while rotating around an axis, usually vertical, and receives the pulses reflected by obstacles. By measuring the time between emission and reception, the distance of a target can be calculated, while the bearing to the target is derived from the angle of the antenna during emission and reception. Thus, a two-dimensional image of the positions of all targets in the vicinity of an antenna can be produced. Radar systems based on this principle are called primary radar. A drawback of primary radar systems is that they do not provide information about altitude. Moreover, projecting targets from three-dimensional space onto a two-dimensional plane distorts their depicted positions.

Aircraft can be equipped with so-called transponders that receive the pulses emitted by the radar antenna and in turn send a response signal. Transponders transmit additional aircraft-specific information such as an aircraft identification code, altitude, and speed to the ground station. This information can then be displayed on the radar screen next to the aircraft position. Radar systems based on this technology are referred to as secondary radar. By the mid-1970s a semi-automated ATC system had been introduced in the United States, based on secondary radar technology. Although the new systems greatly facilitated the work of ATC controllers, continuous enhancements were required to successfully manage the rapidly increasing US air traffic of the late 1970s. The increase was at least partially due to the Airline Deregulation Act of 1978, which created a competitive environment by permitting newly founded airlines to operate on what had previously been a regulated market.
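The range-and-bearing computation described above can be sketched in a few lines. This is a minimal illustration of the geometric principle, with hypothetical pulse timings; it is not an operational radar algorithm:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def target_range_m(round_trip_s: float) -> float:
    """Range from the pulse round-trip time: the pulse travels out and back,
    so the one-way distance is c * t / 2."""
    return C * round_trip_s / 2.0

def target_position(round_trip_s: float, antenna_bearing_deg: float):
    """Two-dimensional target position from the range and the antenna bearing
    (north-referenced) at the moment of emission/reception."""
    r = target_range_m(round_trip_s)
    theta = math.radians(antenna_bearing_deg)
    return (r * math.sin(theta), r * math.cos(theta))  # (east, north) in metres

# An echo received 400 microseconds after emission lies at roughly 60 km range.
print(round(target_range_m(400e-6) / 1000))  # 60
```

Secondary radar adds no new geometry; it augments the same position with the altitude and identity the transponder reports.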


Although disrupted by two World Wars, the development of air traffic control in Germany followed a similar pattern. After World War II, the responsibility for air traffic within the German borders remained with the Allied forces. As aviation had played a major role in World War II, this step was deemed necessary in order to prevent Germany from regaining military strength too quickly. Air traffic control in the American zone was coordinated by the Federal Aviation Authority², and many American standards and procedures were implemented in West Germany. In 1953, the Allied forces decided that West Germany should regain air sovereignty. In consequence, the Bundesanstalt für Flugsicherung (BFS) was established and entrusted with coordinating and controlling air traffic in Germany. After regaining full sovereignty, West Germany became a member of ICAO.

In the late 1950s and early 1960s, faster, jet-powered aircraft with longer ranges came into service. It soon became obvious that efficiency gains in air traffic control would result from jointly coordinating areas larger than the individual European countries. In 1960, Germany, France, the United Kingdom and the Benelux countries signed the Eurocontrol Treaty to coordinate and harmonize air traffic control throughout Europe, and the Eurocontrol organization was established at the same time. Centralized air traffic control throughout Europe, however, would touch upon national sovereignty, and many Eurocontrol member countries were reluctant to implement it. Advances have been achieved slowly and are still in progress. Similar to the deregulation in the United States, air traffic deregulation took place in Europe in 1993, resulting in a variety of newly founded airlines and an increase in scheduled flights.
At the same time, the services of the BFS were transferred to DFS Deutsche Flugsicherung GmbH (German Air Navigation Services), newly founded in 1992 as a limited liability company. DFS presently employs about 4,900 personnel, including 3,600 ATC controllers. In 1997, DFS managed a total of 2.2 million commercial flights in the Federal Republic of Germany [DFS 98].

2.2 Air Traffic Control Today

Two categories of flights exist in controlled airspace: flights under visual flight rules (VFR) and flights under instrument flight rules (IFR). Pilots flying under visual flight rules are responsible for maintaining visual separation from other aircraft. To guarantee that other traffic can be detected early enough, minimum visibility requirements are binding for VFR flights. VFR pilots may navigate either visually or with the help of navigational aids. Aircraft flying under instrument flight rules navigate on air routes with the help of navigational aids and do not depend on visibility conditions. Air traffic control services are responsible for maintaining separation between all aircraft, and the pilots are obliged to comply with ATC instructions. The requirements for aircraft equipment and pilot qualifications are significantly higher for IFR flights.

The airspace in Germany is vertically divided into the lower airspace between ground and flight level 240 and the upper airspace from flight level 245 to flight level 460, one flight level corresponding to an altitude of 100 feet. The airspace is also classified into different categories, in each of which different procedures and requirements apply. As this classification has been derived from ICAO standards, it is common among most ICAO member countries. The classification of the lower airspace in Germany is depicted in Figure 2-3. In the uncontrolled airspace G, only VFR flights are permitted. The airspaces C, D, E, and F are controlled airspaces. Airspace C permits IFR flights and controlled VFR flights³ only; it applies above flight level 100 in all areas and on top of airport control zones in some areas. Airspace D comprises the control zones of controlled airports. The remaining airspace above airspace G is named airspace E. Airspaces D and E permit both IFR and VFR flights.

² The Civil Aeronautics Administration (CAA) was founded in 1940 and entrusted with the United States air traffic control services. It was succeeded by the Federal Aviation Agency (FAA) in 1958, which changed its name to Federal Aviation Administration when it was taken over by the newly founded Department of Transportation in 1967 [FAA-1 97].
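The flight-level arithmetic used above can be made explicit with a one-line conversion. This is a trivial sketch; strictly, a flight level is a pressure altitude referenced to the standard altimeter setting:

```python
def flight_level_to_feet(fl: int) -> int:
    """One flight level corresponds to 100 feet of (pressure) altitude."""
    return fl * 100

# The boundaries of the German airspace structure described above:
print(flight_level_to_feet(240))  # 24000 feet (top of the lower airspace)
print(flight_level_to_feet(460))  # 46000 feet (top of the upper airspace)
```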

Figure 2-3: Vertical airspace structure in Germany [AIP 97].

³ Flight under IFR conditions with VFR aircraft equipment. The pilot must hold a C-VFR license.

Higher altitudes permit more economical flights with regard to travel time and fuel consumption. For economic reasons and to avoid conflicts with VFR traffic, commercial aviation generally operates in airspace C. The remainder of this chapter therefore focuses on airspace C.

Figure 2-4: Lateral airspace structure, excerpt from an aviation chart [AIP 97].

Airspace C is divided into control sectors, in each of which one air traffic controller is responsible for ensuring separation between all aircraft. IFR traffic usually navigates on published air routes. The segments of these routes are the connections between two navigational aids, such as Non-Directional Radio Beacons (NDB) or Very High Frequency Omni-directional Radio Beacons (VOR). Figure 2-4 depicts an excerpt of an aviation chart for the vicinity of Frankfurt airport in Germany.

Air traffic control services may be divided into the following categories:

- tower, apron, and ground control
- approach and departure control
- enroute control.

Tower, apron, and ground services control traffic at and around airports, maintaining visual contact with the aircraft. Approach and departure control services handle approaching and departing traffic in the vicinity of airports, while enroute control is responsible for aircraft in controlled airspace above flight level 100. A tower controller working position is depicted in Figure 2-5, while Figure 2-6 shows the working position of an enroute controller.



Figure 2-5: Control tower at München airport [source: DFS].

Figure 2-6: München enroute control center [source: DFS].


For all approach, departure, and enroute control services, the major sources of information are:

- the radar screen
- paper flight strips
- radio/telephone communication with aircraft pilots and other ATC controllers.

Figure 2-7 depicts a radar screen based on secondary radar technology. Note, for example, the aircraft with the callsign SAB513 (Sabena flight 513) heading east. The second line in the aircraft label specifies the current flight level (92), the ground speed (29 = 290 knots), and the aircraft weight category (M = medium). Secondary radar displays have been in operational service since 1980, and all aircraft flying in controlled airspace in Germany must be equipped with transponders.

Figure 2-7: Radar screen based on secondary radar.

Pilots of aircraft flying under instrument flight rules must file a flight plan prior to departure, giving details about the aircraft equipment and the requested route to the destination airport. The flight plans are processed centrally in order to avoid conflicts and to provide the relevant information to all control sectors concerned. For each aircraft flying in a sector, a paper flight strip is printed, displaying the information necessary for that sector. The flight strips are handed to the controllers a few minutes before the aircraft enters the sector to inform them of the traffic they have to expect. Figure 2-8 depicts a paper flight strip. It contains (from left to right) the scheduled time overhead the fix Frankfurt (FFM) (10.15 UTC), the scheduled flight level (FL120), aircraft type and weight category (B747, heavy), callsign (DLH126), maximum speed (510 knots), and the routing within the sector (entering the sector overhead waypoint Luburg (LBU), proceeding via Frankfurt (FFM) and Taunus (TAU)).

Figure 2-8: Paper flight strip.
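The fields read off the strip above can be captured in a simple record. This is an illustrative sketch; the field names are chosen here for clarity and do not correspond to any operational data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FlightStrip:
    """Fields of a paper flight strip, left to right (illustrative names)."""
    time_over_fix: str   # scheduled time overhead the fix, UTC
    fix: str
    flight_level: int
    aircraft_type: str
    weight_category: str
    callsign: str
    max_speed_kt: int
    routing: List[str]   # waypoints within the sector

# The example strip of Figure 2-8:
strip = FlightStrip("10:15", "FFM", 120, "B747", "heavy",
                    "DLH126", 510, ["LBU", "FFM", "TAU"])
print(strip.callsign, strip.flight_level)  # DLH126 120
```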

The paper flight strips are generally stored on a strip board and sorted by the time at which an aircraft is expected to enter the sector. The ATC controller may write on the flight strip with a pencil to keep track of the control advisories he or she has issued to the aircraft. Figure 2-9 depicts a strip board with paper flight strips.

Usually a team of two controllers, consisting of a planning controller and an executive controller, takes responsibility for the traffic within an ATC sector. The planning controller reviews the flight plan information, identifies possible conflicts on a strategic basis, and develops solutions which he or she proposes to the executive controller. The executive controller solves conflicts on a short-term basis and issues clearances to the aircraft pilots. If traffic density permits, the executive controller alone coordinates the traffic in the sector.

The executive controller communicates with the pilots of the aircraft in his or her sector on a very high frequency (VHF) radio channel which is published in navigation charts and which all pilots flying in the sector have to monitor. A communication is typically initiated by the controller issuing an instruction, addressing the aircraft by its callsign. The pilot reads back the clearance to acknowledge it. Other pilots listen to the communication and derive a mental picture of the traffic situation in their vicinity, an effect referred to as the party line effect. In order to allow for fast and efficient communication and to minimize the risk of misunderstandings, the phraseology of pilot-controller communications has been standardized by ICAO.

Although the principles of air traffic control have remained mostly unchanged during the last four decades, continuous improvements have been achieved. Only in this way was it possible to handle the increasing traffic density efficiently and safely. Between 1986 and 1997, the number of IFR flights in the Federal Republic of Germany doubled from 1.08 million to 2.21 million.
During the same period, the number of hazardous situations was continuously reduced: in 1997, a total of only 15 aircraft proximities⁴ were reported in German airspace, only one of which was caused by ATC services. No accidents or incidents occurred in that year [DFS 98].

Figure 2-9: Flight strip board [source: DFS].

To deal safely and efficiently with the anticipated growth in air traffic, new concepts and controller assistance systems must be developed. These systems have to be optimized and tested extensively prior to their introduction into operational service. Air traffic control simulators serve as experimental testbeds, providing realistic conditions for controllers without causing hazards to real air traffic. The fundamentals of ATC simulators are discussed below.

2.3 Air Traffic Control Simulation

ATC training simulators provide an efficient supplement to theoretical training and training on the job. By gradually increasing the complexity of the training scenarios, students can be confronted with situations tailored to their growing skills. Emergency scenarios that will hopefully never be encountered in reality can be generated without imposing hazards on real aircraft. Training simulators are also used to maintain the skills of experienced controllers in critical situations. In Germany, the German Air Navigation Services operates a variety of ATC simulators for training purposes.

⁴ Violations of the minimum separation between two or more aircraft, or between an aircraft and ground obstacles, without damage to humans or the airframe are referred to as aircraft proximities. Incidents involve damage to airframes or injury to persons, while accidents involve fatalities.

Air traffic control simulators are also used for research purposes. They help to test and evaluate new ATC concepts and systems throughout the design phase and before introduction into operational service. The results of the simulations permit the assessment of system performance and usability and the identification of weak points, so that the system can be enhanced accordingly. Research simulation facilities are also used to scrutinize the mental processes involved in the work of ATC controllers. Simulations allow for the generation of scenarios according to the specific scope of the investigation and for the reproduction of these scenarios if necessary. All relevant data can be logged for analysis. ATC research simulators are mostly operated by ATC equipment manufacturers and research facilities. The German Aerospace Center (DLR) operates the Air Traffic Management and Operations Simulator (ATMOS) at its Braunschweig research center, with three radar controller working positions for enroute and approach/departure sectors. Figure 2-10 depicts one of the three controller working positions of DLR's ATMOS.

Figure 2-10: Controller working position at DLR's ATMOS.

Depending on the specific field of use, the fidelity of ATC simulators may vary considerably. Simple mouse-operated standalone systems serve to familiarize student controllers with basic control procedures, while generating realistic working environments for research purposes requires more complex systems. Common among most simulators, however, is that artificial aircraft symbols are presented on a display simulating the radar screen. The simulated aircraft correspond to computer-generated data with static and dynamic properties such as callsign, position, altitude, speed, and heading. Control advisories can be entered into the simulation computer via an interface so that the aircraft execute the desired maneuvers.

Most ATC simulation facilities use the pseudo pilot concept to simulate the communication with aircraft pilots. Each controller working position is equipped with a radio communication link to pseudo pilots in an adjacent room. The pseudo pilots listen to the clearances and enter the relevant parameters via a terminal connected to the simulation computer. They also read back the clearances, giving the controller the impression of having communicated with a real aircraft pilot. Figure 2-11 schematically depicts the pseudo pilot concept.
Figure 2-11: The pseudo pilot concept. The controller's voice communication is received by a pseudo pilot, who enters the control inputs at a terminal connected to the ATC simulation; the simulation, fed with flight plans and the airspace structure, drives the radar display with the aircraft parameters.
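The computer-generated aircraft described above can be sketched as a small state object whose dynamic properties are updated when a clearance parameter is entered. This is a minimal illustration under simplified assumptions: a real simulator models the transition dynamics of the airframe rather than a constant climb/descent rate:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimulatedAircraft:
    # Static property
    callsign: str
    # Dynamic properties
    altitude_ft: float
    speed_kt: float
    heading_deg: float
    cleared_altitude_ft: Optional[float] = None

    def enter_clearance(self, cleared_fl: int) -> None:
        """A pseudo pilot (or a speech recognizer) enters a cleared flight level."""
        self.cleared_altitude_ft = cleared_fl * 100.0

    def step(self, dt_s: float, rate_fpm: float = 2000.0) -> None:
        """Advance the simulation: move towards the cleared altitude at rate_fpm."""
        if self.cleared_altitude_ft is None:
            return
        max_change = rate_fpm * dt_s / 60.0
        delta = self.cleared_altitude_ft - self.altitude_ft
        if abs(delta) <= max_change:
            self.altitude_ft = self.cleared_altitude_ft  # level off
        else:
            self.altitude_ft += max_change if delta > 0 else -max_change

ac = SimulatedAircraft("DLH456", altitude_ft=13000.0, speed_kt=290.0, heading_deg=270.0)
ac.enter_clearance(110)      # "descend to flight level 110"
for _ in range(61):          # simulate 61 seconds at 1-second steps
    ac.step(1.0)
print(ac.altitude_ft)        # 11000.0 (levelled off at FL 110)
```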

Figure 2-12 depicts the six pseudo pilot stations of DLR's ATMOS, each pseudo pilot controlling up to six aircraft. A computer screen displays the relevant data for the aircraft operated by a pseudo pilot, and clearance parameters are entered via a customized keyboard. At least one pseudo pilot is required per controller working position. However, as pseudo pilots have to enter the control parameters and read back the instructions at the same time, their workload often becomes unacceptably high. In order to avoid distortions of the simulation fidelity caused by pseudo-pilot-induced delays or mistakes, more than one pseudo pilot per controller is often required. The need for many pseudo pilots makes simulations very expensive. A possible solution to this problem is the use of automatic speech recognition (ASR) to replace the pseudo pilots.

Figure 2-12: The six pseudo pilot stations of DLR's ATMOS.

2.4 The Application of Speech Recognition

The replacement of pseudo pilots with an automatic speech recognition (ASR) system could help to solve at least some of the problems associated with the pseudo pilot concept. Figure 2-13 depicts this concept. The controller speaks the clearance in the same way as in reality or in conventional simulators. Instead of a pseudo pilot, a speech recognizer listens to the instructions and identifies the aircraft concerned as well as the type and parameters of the clearance. This information is then transmitted to the simulation computer, and a speech synthesis system generates a synthetic pilot answer. The controller working environment itself remains unchanged.

The syntax contains all sentences the controller might use at any point during the simulation, as sequences of words in a text-based notation, while the dictionary contains parametric descriptions of the words used in the syntax. Because the speech recognizer requires a fully digitized and parametric search space, the syntax and the dictionary are compiled prior to use, which is typically done once as part of the system development process.

Figure 2-13: Automatic speech recognition in the ATC simulation. The automatic speech recognizer and the speech synthesis replace the pseudo pilot; a syntax compiler compiles the syntax and the dictionary into the recognizer's search space.

Simulations for research purposes often aim at quantifying the benefits of new systems in terms of performance, accuracy, and operator workload. Affecting the experimental conditions would mean affecting these measurements, and it would no longer be possible to transfer results and implications derived during simulation to operational practice. The process of communicating is highly routinized by daily practice, and its disruption could have adverse effects. A speech recognition system must therefore allow the controller to communicate and transmit control advisories in exactly the same way he or she communicates with aircraft pilots or pseudo pilots. The simulated aircraft must execute the instructions immediately, and the read-back must be performed in a natural way. The requirements crucial for a successful application of automatic speech recognition in the ATC simulation domain are listed below.

- Control Advisories: The speech recognizer must understand all clearances necessary to control the aircraft in the ATC sector.

- Continuous Speech Recognition: The speech recognizer must permit the controller to speak in a continuous and natural way. Isolated word recognition, requiring short breaks between words, would attract additional attention and thus reduce the amount of attention the controller could allocate to the control task.

- Speaker-independent Recognition: Speaker-dependent recognition requires adapting the ASR parameters to the speaking characteristics of individual subjects by speaking an extensive set of sample sentences. As the voice varies from day to day, re-training for the same speaker is sometimes required. Speaker-independent recognition systems need no speaker adaptation: the simulation would not have to be configured before starting and, if necessary, another controller could take over during the simulation.

- Synthetic Pilot Responses: Read-back of clearances and pilot inquiries must be simulated with a synthetic voice in a natural way. Visual feedback of the recognition results would be intrusive because the controller would either ignore it or, by scanning it, trouble his or her perception of the information presented on the radar screen and the paper flight strips. Moreover, controllers are used to monitoring the pilot's read-back in order to ensure that the pilot understood the clearance correctly; any modification of this pattern would disrupt the experimental conditions.

- Immediate Response and Execution: Especially in situations of high traffic density or conflicts, advisories are often issued in quick sequence and, as controllers wait for the read-back of one clearance before speaking the next, any delay would adversely affect the simulation. The time a speech recognition system needs to respond to and execute a clearance must not exceed the response time of pilots and pseudo pilots.

- Phraseology: The phraseology controllers use to speak the instructions often deviates slightly from the ATC phraseology as standardized by ICAO and published in the Aeronautical Information Publications (AIP).
While the deviations vary between controllers, each individual seems to employ his or her favorite wording quite consistently. Adapting to the wording of the ICAO standards would require additional concentration. A speech recognition system that also permits the most frequent deviations from the ICAO standards would therefore be highly desirable.

- Recognition Accuracy: Incorrect or incomplete recognition of clearances greatly delays the simulation because another control action is necessary to correct the mistake. Perfect recognition can be expected neither from pilots and pseudo pilots nor from automatic speech recognition. However, the recognition error rate of a speech recognition system should not exceed that of pilots or pseudo pilots.
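The syntax and dictionary described earlier in this section can be illustrated with a toy example. The notation below is invented for this illustration and is not the format of any particular recognizer; a real syntax covers the full clearance phraseology, including the deviations discussed above:

```python
from itertools import product

# A miniature clearance syntax: each list gives the admissible alternatives
# for one slot of the sentence.
CALLSIGNS = ["lufthansa four five six", "sabena five one three"]
COMMANDS = ["descend to flight level", "climb to flight level"]
LEVELS = ["one one zero", "two four zero"]

# The dictionary maps each word of the syntax to a parametric description,
# sketched here merely as pseudo-phonetic strings (invented values).
DICTIONARY = {"lufthansa": "l u f t a n z a", "descend": "d i s e n d"}

def enumerate_sentences():
    """Enumerate every word sequence this syntax admits."""
    for callsign, command, level in product(CALLSIGNS, COMMANDS, LEVELS):
        yield f"{callsign} {command} {level}"

sentences = list(enumerate_sentences())
print(len(sentences))  # 2 x 2 x 2 = 8 admissible sentences
```

Compiling syntax and dictionary into the recognizer's parametric search space corresponds, in this sketch, to materializing the admissible sentences; the later chapters' dynamic-syntax idea amounts to regenerating this set as the traffic situation changes.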


Figure 2-14: Mean error rate of misrecognitions among US line pilots, by clearance complexity [Cardosi 93].

  Complexity:       1       2       3       4       >4      Average
  Mean error rate:  0.41%   0.65%   0.76%   1.40%   3.36%   0.73%

The mean number of incorrectly or incompletely recognized control advisories in pilot controller transmissions was the scope of an investigation Cardosi conducted among US airline pilots in 1993 [Cardosi 93]. Cardosi analyzed forty-seven hours of ATC communication with 5,032 controller-pilot transmissions, which included 3,576 clearances. She then classified the clearances according to the number of pieces of information each instruction contained, which she named complexity level. For example, "Northwest 123, cross ten miles west of Pullman at and maintain flight level 200" was considered to contain four elements. The number and percentage of incorrectly or incompletely recognized clearances was then analyzed for each complexity level. The results are depicted in Figure 2-14. Not surprising, the percentage of misrecognitions increases with the complexity of clearances. In total, only 0.73 per cent of all clearances were recognized incorrectly or incompletely. Pseudo pilots have to control a number of aircraft listed in a table on the computer terminal, so that attention is distributed across several aircraft. Also, pseudo pilots are often less trained than airline pilots. The rate of incorrectly or incompletely understood control advisories among pseudo pilots may therefore differ from that of line pilots. To determine the average rate of misrecognitions among pseudo pilots in DLR’s ATMOS, twelve hours of voice transmission of simulations were analyzed containing 1,805 clearances. These clearances were then classified according to the number of instructions each clearance contained. For example, "Lufthansa 456, descend to flight level 110 with 2,500 feet per minute or more" was classified as
Chapter 2 - Air Traffic Control and Air Traffic Control Simulation 21

[Figure 2-15: bar chart of the mean error rate (percentage of misrecognitions) for transmissions containing 1, 2 and more than 2 instructions, together with the overall average of about 4.2 per cent.]

Figure 2-15    Error rate among pseudo pilots in DLR's ATMOS.

containing two instructions5. The percentage of misrecognitions was identified for each category. Figure 2-15 gives the results. The average percentage of incorrectly or incompletely understood clearances was 4.2 percent. Again, the mean percentage of misrecognitions increases with the complexity of the clearance. The error rate is highly correlated with the number of aircraft the pseudo pilots have to control and with their level of practice. The results of this investigation in DLR's ATMOS can therefore not easily be transferred to other simulation facilities. However, controllers participating in simulations in ATMOS consider the performance of pseudo pilots sufficient for untroubled experiments. It may be expected that a recognition rate of 95 percent or above would likewise be acceptable for a speech recognizer.
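The classification step used in both studies, counting the pieces of information in a clearance, can be sketched as follows. The element patterns and the tokenization are simplified assumptions for illustration, not the coding scheme actually used by Cardosi or by the ATMOS analysis:

```python
import re

# Hypothetical element patterns: each regular expression marks one piece of
# information in a clearance. This short list is an invented illustration.
ELEMENT_PATTERNS = [
    r"\bcross [\w ]+? (?:west|east|north|south) of \w+",  # crossing restriction
    r"\bmaintain flight level \d+",
    r"\bdescend to flight level \d+",
    r"\bclimb to flight level \d+",
    r"\bturn (?:left|right) heading \d+",
    r"\breduce speed to \d+",
]

def complexity_level(clearance: str) -> int:
    """Count the pieces of information (elements) in a clearance."""
    text = clearance.lower()
    return sum(len(re.findall(pattern, text)) for pattern in ELEMENT_PATTERNS)
```

A clearance such as "turn left heading 240 and reduce speed to 220" would be counted as containing two elements under these patterns.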

2.5  Summary

The pseudo pilot concept as commonly applied in air traffic control simulations has several drawbacks. The most important disadvantage is the high number of personnel required for simulation runs. A promising approach to mitigate these disadvantages is the replacement of pseudo pilots by automatic speech recognition technology. Realistic simulation environments require a speech recognizer that is able to understand and respond to the instructions given by the controller in the same way as pilots or pseudo pilots do. The speech recognizer must understand at least 95 percent of all instructions correctly and must respond to and execute the instructions with little delay. The answer of the aircraft pilot must be generated by a speech synthesizer in a natural-sounding voice. The system must further permit the controller to speak in a natural, continuous way and to use expressions other than the standardized ATC phraseology. A speaker-independent system is desirable as it does not have to be adapted to the speaker prior to simulations.

5 The number of instructions per transmission therefore does not equal the complexity level as determined by Cardosi [Cardosi 93].



3  Automatic Speech Recognition Technology

Natural, spoken language is the most convenient and efficient way of communicating with other people and it appears very desirable to communicate with machines in exactly the same way. However, the enormous complexity of natural language and the difficulties in constructing sensory systems capable of decoding spoken signals correctly are obstacles to this vision. At least limited success is possible in fields where well-structured and less complex languages are used. The ATC language has been standardized in order to permit fast, efficient and unambiguous communication in a noisy and safety-critical environment. Various efforts have been made to introduce automatic speech recognition in the air traffic control domain. Although promising approaches are documented, results obtained under laboratory conditions cannot be transferred to operational conditions and the requirements for successful application in the ATC or ATC training domain are not met by speech recognition systems currently available.

3.1  Voice-based Man-Machine Communication

Spoken language is considered the most natural and convenient way of communicating with other people. For a long time, people have sought ways to communicate with machines in exactly the same manner. Vocal communication is often more convenient and efficient than traditional ways of man-machine interaction. Besides, machines capable of understanding spoken language would gain a human-like appearance. Voice-based man-machine communication requires:

- a speech recognition system, in order to identify the words and sentences6 that have been spoken,
- a speech understanding system (language parsing system), in order to infer the meaning of the spoken sentences and to initiate adequate actions,
- a speech synthesis system, in order to generate an acoustic response.

Speech Recognition

The speech recognition system converts the spoken utterance into a sequence of words or phonemes. To permit natural communication with humans, the system must be able to recognize any spoken utterance in any context, independent of speaker and environmental conditions, with excellent accuracy, reliability, and speed. For a variety of reasons, today's automatic speech recognition systems are far from reaching the performance of their human counterparts:

6 In contrast to the traditional linguistic meaning, speech recognition technology uses the term 'sentence' as a synonym for 'sequence of words', and it will be used accordingly in this document.


- Humans possess a much more sophisticated sensory system for the interpretation of acoustic signals.
- Humans use additional sources of information, e.g. visual information such as gestures and speaker physiognomy.
- Humans generally keep track of the actual context as well as of the history of the spoken dialogue.
- Humans have access to much broader and more sophisticated sources of knowledge about the actual context, including common sense.

Speech Understanding

After the spoken words have been transcribed by the speech recognition system, the meaning of these words must be inferred by a speech understanding system. The difficulty of understanding language depends very much on the structure and complexity of the sentences and the context in which they have been spoken7. For many years, linguists have been trying to acquire and structure knowledge about languages in a standardized form and to implement it on computer systems. Although some very restricted fields can be described with moderate fidelity, science is far from achieving linguistic models of entire natural languages. Machine understanding of natural spoken language will therefore, at least in the near future, remain a vision.

Speech Synthesis

Once spoken language has been recognized by the ASR and adequate actions and answers have been inferred by the speech understanding system, an acoustic response must be synthesized. Although a synthetic voice may sound unnatural and is sometimes difficult to listen to, speech synthesis is considered a problem solved at least in principle. Also, the required responses can sometimes be recorded from live speech and then simply played back, so that the 'synthetic' voice sounds quite natural.

By restricting speech applications to a limited domain and by standardizing the spoken phrases, at least some of the problems associated with speech recognition and language understanding can be alleviated greatly. If the operator uses standardized phrases rather than expressing his wishes in spontaneous speech, only certain sequences of words have to be taken into consideration during the speech recognition process. The process of language understanding is also facilitated significantly, because it is reduced to identifying one out of a variety of possible sentences and initiating the action correlated to that sentence.
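In a sufficiently restricted domain, the understanding step thus collapses into a table lookup. The following minimal sketch maps each recognized standardized sentence to an action; the sentences, callsign and action strings are invented for illustration:

```python
# Minimal sketch: with standardized phrases, speech understanding reduces
# to identifying one sentence out of a known set and triggering the action
# associated with it. Sentences and actions are invented examples.

def make_actions():
    """Build the table of all valid sentences and their actions."""
    return {
        "lufthansa four five six descend to flight level one one zero":
            lambda: "DLH456 -> descend FL110",
        "lufthansa four five six turn left heading two four zero":
            lambda: "DLH456 -> turn left hdg 240",
    }

def understand(recognized_sentence: str, actions) -> str:
    """Dispatch the recognized sentence to its action, if it is valid."""
    action = actions.get(recognized_sentence.lower())
    return action() if action else "sentence not in search space"
```

Anything outside the predefined set is rejected, which is precisely what makes the understanding problem tractable in this setting.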

7 The complexity of natural languages appeared so immense to the linguist Chomsky that he claimed it was impossible to acquire any language without a priori, genetic knowledge about the structure of language [Anderson 93]. Whether or not that is the case, it illustrates how complex the process of language understanding actually is.


Recent improvements in speech recognition and computer hardware technology have enabled a variety of applications in which speech recognition is a valuable and efficient replacement for, or supplement to, traditional ways of man-machine communication. Rather than understanding natural language, these systems provide an interface to control machines and computers by speech in a standardized phraseology.

3.2  Fundamentals of Automatic Speech Recognition

As early as the fourth century BC, the Greek philosopher Aristotle possessed a surprising knowledge of the physics of sound. In 350 BC he wrote: "All sounds are produced because air is set in motion by expansion or compression or when it is clashed together by an impact from the breath or from the strings of musical instruments" [Aristotle in Taylor 89]. However, it was not before the middle of the 19th century that acoustics became a scientific branch of physics. Researchers such as Willis, Hermann and Helmholtz investigated the production of sound and particularly the production of speech. Thanks to their work and a number of subsequent investigations, we now have a fairly good understanding of the mechanics of speech.

A tone consists of pressure waves in a medium such as water or, most frequently, air. The pressure waves can be described in terms of their frequency, corresponding to the pitch, and their amplitude, corresponding to the loudness of the tone. Sounds generally involve a multitude of tones changing over time, so that they must be described by dynamically analyzing all frequencies. Speech may be analyzed like any other acoustic event, i.e. by dynamically analyzing the frequencies of the pressure waves. However, it was soon found that speech events are generally a conjunction of characteristic acoustic patterns, later referred to as phonemes, so that words can be described as sequences of phonemes. The majority of automatic speech recognizers are based on the analysis of the acoustic event of speech in order to identify the spoken phonemes and thus to infer which words and sentences have been spoken.

According to Rabiner & Juang, the beginning of serious speech recognition research dates back at least 45 years [Rabiner & Juang 93]. However, the first steps were taken as early as 1916, when Flowers proposed a voice-operated typewriter which he describes as [Flowers 16 in Taylor 89]: "a visible-writing machine recording our thoughts as rapidly as we speak."

His invention was based on mirrors which were attached to resonant circuits and reflected light beams onto a photo cell when a spoken phoneme caused resonance in the circuit corresponding to that specific phoneme. Flowers' invention could actually identify isolated phonemes according to their specific wave patterns. However, distortions of phonemes caused by the subsequent phoneme in a word, an effect referred to as co-articulation, made it impossible for Flowers' machine to identify entire words or even sentences. There is a striking similarity between the vision Flowers had in 1916 and the description of a current state-of-the-art speech recognizer designed for the same task: "You can dictate text and commands directly into your favorite applications, control your computer and express yourself in the easiest way of all ... by speaking. As you dictate, you'll see your words transformed directly on your screen" [IBM 97].

The analysis of pressure waves permits the identification of the phonemes correlated with a speech signal. The phoneme sequences are then transcribed into words and sentences. The spoken utterance is recorded and converted into a digital format by an analogue-digital converter, and means of frequency analysis such as the Fast Fourier Transformation (FFT) are then applied. Thus a dynamic frequency description of the spoken sentence is produced. This pattern is compared to the patterns in the search space, which contains a description of all valid sentences, i.e. the sequences of words the ASR has been configured to recognize. The principles of automatic speech recognition systems are depicted schematically in Figure 3-1.

The dictionary contains the vocabulary, i.e. all words the ASR has been designed to recognize. The words are stored as sequences of phonemes, and a phoneme database is used that contains statistical descriptions of all phonemes, including variations in the speaking characteristics among different speakers if the ASR has been designed to be operated by more than one speaker. The phoneme database is derived by statistical analysis of speech samples from a broad variety of speakers. The syntax contains a description of the valid combinations of words into sentences. Thus, a complete description of all valid sentences is available for the pattern matching process. Variations among different speakers and even dynamic variations in the speaking characteristics of individuals must be considered, as well as possible disturbances such as background noise. The speaker model describes how to interpret words as phoneme sequences and how to account for variations in the speech characteristics. Speaker models are typically based on Hidden Markov Models or Neural Nets.
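The dynamic frequency description mentioned above can be sketched as a short-time Fourier analysis: the signal is cut into overlapping frames and each frame is transformed with an FFT. The frame length and overlap below are illustrative choices, not values taken from any particular recognizer:

```python
import numpy as np

def frequency_description(signal, frame_len=256, hop=128):
    """Split a digitized speech signal into overlapping frames and compute
    the magnitude spectrum of each frame (a simple spectrogram)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # A Hamming window reduces spectral leakage at the frame edges.
        windowed = frame * np.hamming(frame_len)
        # rfft returns the spectrum of a real-valued input frame.
        frames.append(np.abs(np.fft.rfft(windowed)))
    return np.array(frames)  # shape: (n_frames, frame_len // 2 + 1)

# Example: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
spec = frequency_description(np.sin(2 * np.pi * 440 * t))
```

Each row of the result describes the frequency content of one short time slice; the sequence of rows is the dynamic pattern that is matched against the search space.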
Chapter 3 – Automatic Speech Recognition Technology 27

[Figure 3-1: block diagram of a speech recognizer. The acoustic processor converts the spoken utterance into a parametric representation; the pattern matching process, drawing on the speaker model, the dictionary and the syntax, produces the decoded utterance.]

Figure 3-1    The principles of automatic speech recognition.
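The pattern matching stage can be caricatured as choosing, among all sentences the syntax allows, the one whose phoneme sequence is closest to the observed sequence. Real recognizers score probabilistic hypotheses (e.g. with Hidden Markov Models) rather than computing a plain similarity, and the tiny dictionary and syntax below are invented for illustration:

```python
from difflib import SequenceMatcher

# Invented toy dictionary: words stored as phoneme sequences.
DICTIONARY = {
    "climb":   ["k", "l", "ay", "m"],
    "descend": ["d", "ih", "s", "eh", "n", "d"],
    "left":    ["l", "eh", "f", "t"],
    "turn":    ["t", "er", "n"],
}

# Invented toy syntax: the only valid sentences.
SYNTAX = [["turn", "left"], ["climb"], ["descend"]]

def phonemes(sentence):
    """Expand a sentence into its phoneme sequence via the dictionary."""
    return [p for word in sentence for p in DICTIONARY[word]]

def best_match(observed_phonemes):
    """Return the valid sentence whose phoneme sequence is most similar
    to the observed phoneme sequence."""
    def score(sentence):
        return SequenceMatcher(None, phonemes(sentence), observed_phonemes).ratio()
    return max(SYNTAX, key=score)
```

Even a noisy observation of "turn left" with one misheard phoneme, such as `["t", "er", "n", "l", "ah", "f", "t"]`, is still closer to that sentence than to any other entry in the search space.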

During the pattern matching process, hypotheses about the best match between the spoken sentence and all sentences specified in the syntax are dynamically constructed and assessed. After a hypothesis about the best match has been selected and confirmed with sufficient reliability, the corresponding sentence is returned as the result of the decoding process and transmitted to the user.

Automatic speech recognition systems differ very much in their specific properties. Characteristics of ASR systems include:

- Speaker dependency: Prior to use, speaker-dependent systems must be trained to the speaking characteristics of individual operators. Speaker-independent systems are designed to be operated by any user without training; a database covers the speaking characteristics of the intended speaker population in a statistical description. An intermediate implementation between speaker-dependent and speaker-independent systems are speaker-adaptive systems. These systems are speaker-independent but possess an additional module allowing them to adapt to the characteristics of individual users.
- Isolated word recognition vs. continuous speech recognition: Isolated word recognizers require a brief pause between two words in order to identify the word boundaries. Continuous speech recognizers permit the operator to speak in a natural way without pauses.
- Phoneme-based vs. word-based systems: The frequency descriptions of the words in the search space can be generated either from a phoneme database plus a dictionary or from a word database. Since phoneme-based systems make better use of system resources and provide greater flexibility in constructing new words, most ASR systems today use phoneme databases.
- Structure of the syntax: Deterministic syntaxes specify sequences of words in a deterministic way, e.g. in the form of transition diagrams. Probabilistic syntaxes use transition probabilities for pairs or trigrams of words, i.e. the probability of one word following one or two other words. The transition probabilities are derived by statistical analysis of many sentences in the domain in which the recognizer will be operated. As the effort to generate deterministic syntaxes increases dramatically with a growing number and complexity of sentences, ASR systems with large search spaces often use probabilistic syntaxes.
- Recognition performance: The recognition performance can be described in terms of recognition speed and recognition accuracy. While the recognition speed refers to the time the system needs to decode a spoken utterance, the recognition accuracy refers to the percentage of successfully decoded utterances8. The recognition performance depends greatly on some of the other parameters mentioned above. Speaker-dependent systems generally perform better because they only have to consider the speaking habits of individual speakers. Isolated word recognition is more reliable than continuous speech recognition, because the identification of word boundaries is a source of errors when decoding sentences. The greater the search space, i.e. the higher the number of valid sentences, the higher the likelihood of an erroneous recognition.

The speech recognition application typically determines the required system characteristics, including a minimum recognition performance. An ASR system with the corresponding properties is then chosen and adapted to the application.
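A probabilistic syntax of the kind described above can be estimated from a domain corpus by counting word pairs. The miniature corpus below is invented for illustration; real language models are trained on many thousands of sentences and smooth the probabilities of unseen pairs:

```python
from collections import Counter

def bigram_probabilities(corpus):
    """Estimate P(next word | word) from word-pair counts in a corpus."""
    pair_counts = Counter()
    word_counts = Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            word_counts[prev] += 1
    return {pair: n / word_counts[pair[0]] for pair, n in pair_counts.items()}

# Invented miniature ATC corpus.
corpus = [
    "descend to flight level one one zero",
    "climb to flight level two four zero",
    "descend to flight level two one zero",
]
probs = bigram_probabilities(corpus)
```

In this toy model, "to" always follows "descend" (probability 1.0), while two thirds of the sentences begin with "descend"; such probabilities steer the pattern matching towards likely word sequences.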
The resulting recognition performance is then judged sufficient or insufficient for successful application.

Some 20 years ago, commercially available speech recognition technology was limited to speaker-dependent systems capable of identifying a handful of isolated words with moderate accuracy. Computer and speech recognition technology have improved considerably in recent decades, and today speech recognition is commercially used in a variety of applications, including:

- Call centers: The majority of inquiries in a call center, where customers seek information or wish to place orders, follow standard patterns. Therefore, at least some of the conversations can often be standardized and handled by speech recognition and synthesis systems. ASR systems for call center applications must be speaker-independent but require only a very limited vocabulary and syntax.
- Voice-operated command and control: Voice control often provides advantages over traditional ways of interacting with technical systems, particularly if the operator cannot use his hands or focus his eyes on the system. Examples of these applications include car telephones and the control of welding robots. Command and control systems normally use very limited vocabularies and are often designed as speaker-dependent isolated word recognizers. A very high recognition performance and robustness against background noise are often crucial.
- Dictation: Dictation systems serve to enter text into a text processor directly via voice. These systems require vocabularies containing the most frequent words of a language, typically around 60,000 [IBM 97]. Dictation systems are typically speaker-adaptive isolated word recognizers that use probabilistic syntaxes. As an erroneous recognition can be corrected easily, the recognition rate does not necessarily have to be very high for efficient system use.

8 The recognition accuracy is further divided into word recognition accuracy and sentence recognition accuracy, the former referring to the percentage of words that have been decoded successfully, while the latter indicates the percentage of correctly identified sentences.
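The two accuracy measures, at word level and at sentence level, can be computed from pairs of reference and recognized transcripts. This sketch assumes equal word counts per pair (substitution errors only), which real scoring tools relax with an alignment step:

```python
def recognition_accuracies(pairs):
    """Compute word and sentence recognition accuracy from
    (reference, recognized) sentence pairs. For simplicity this sketch
    assumes equal-length word sequences (substitution errors only)."""
    words_total = words_ok = sentences_ok = 0
    for reference, recognized in pairs:
        ref, rec = reference.split(), recognized.split()
        words_total += len(ref)
        words_ok += sum(r == h for r, h in zip(ref, rec))
        sentences_ok += ref == rec  # a sentence counts only if every word matches
    return words_ok / words_total, sentences_ok / len(pairs)

# Invented example: one perfect sentence, one with a single word error.
pairs = [
    ("climb to flight level ninety", "climb to flight level ninety"),
    ("turn left heading two forty", "turn left heading two fifty"),
]
word_acc, sentence_acc = recognition_accuracies(pairs)
```

A single wrong word leaves the word accuracy at 90 percent here, yet halves the sentence accuracy, which is why the two figures can differ so sharply in the studies cited below.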

3.3  Present State of the Art

The complexity of natural language enables us to communicate our most sophisticated ideas efficiently, allowing for human endeavors such as poetry, humor and irony. Understanding the meaning of spontaneous speech, however, is a very complex process, far beyond the capabilities of any machine imaginable from the present point of view. Applications of today's speech recognition technology are more promising in areas where spoken communication takes place in a standardized form, with little or no ambiguity and a very limited vocabulary [Harrison et al. 86].

Communication in air traffic control shows moderate complexity and a limited vocabulary. Background noise and distortions during radio transmission deteriorate the speech quality, the cost of errors is extremely high, and communication takes place under time-critical conditions. To permit efficient communication under such adverse conditions, the ATC phraseology is almost completely standardized and ambiguities are eliminated as far as possible. Still, the introduction of speech recognition into operational air traffic control raises many unanswered questions about safety, system usability and reliability. Air traffic control simulation provides a non-safety-critical environment in which, apart from the distortions and background noise, the communication takes place in exactly the same way. For these reasons, ATC simulation has become a field of interest for many speech recognition researchers. Still, previous efforts to apply speech technology in ATC simulation have achieved only very limited success [Weinstein 91].

A study at the US Naval Training Systems Center in 1989 focused on the integration of computer technology for the advanced training of air intercept control (AIC)9 students, including simulation of the controller-pilot communication [Hamel et al. 89]. A subgoal of the study was to investigate the suitability of ASR for AIC training purposes. In a first system layout, pseudo pilot stations were installed allowing control of the simulated aircraft by keyboard input. A second layout replaced the pseudo pilot stations by a speech recognition system permitting the student to speak the commands directly to the simulation computer. Two off-the-shelf speaker-dependent ASR systems were chosen that provided continuous speech recognition and allowed the definition of a deterministic syntax. Speech synthesis systems were used to generate a pilot response. The vocabulary consisted of 19 words typical of simple control exercises. The syntax showed a very low complexity, comprising between 100 and 200 sentences. The initialization time in which the ASR acquired templates of an individual's speaking habits averaged about 25 minutes. The sentence misrecognition rate, i.e. the percentage of incorrectly identified messages, averaged 4 to 5 percent, while the sentence non-recognition rate, i.e. the percentage of rejected messages, averaged 15 to 17 percent. The response time averaged 5 seconds, which is greater than the time pseudo pilots require for data entry. It was found that the gender of the speaker did not influence the recognition, while the speaker's emotional state, induced by stress or frustration, affected the recognition, as did the time of day. Fully synthetic speech was considered of limited use because it sounded unnatural and was hard to listen to. Speech synthesis based on the playback of pre-recorded sentences was found to be convenient and acceptable.
The Intent Monitoring System developed by Magnavox combined speech recognition and reasoning techniques to increase recognition performance in the ATC domain [Dunkelberger et al. 95]. A system that possesses knowledge about the intentions of the ATC controller could greatly facilitate his or her work. As the intentions are mostly encoded in the speech signal, the key to intent recognition is speech recognition. Promising applications for intent-aware assistance systems include target tagging and compliance monitoring. Magnavox used a two-step process: In a first step, a speaker-independent continuous speech recognizer analyzed the spoken signal using a non-exact grammar including fuzzy sets and transcribed it into an N-best list of sentence matches. The second step applied phrase spotting and reasoning techniques using situational information to correct recognition errors and to infer the relevant phrases, i.e. the relevant items of information in the utterances. Experiments were carried out based on instructions transcribed from tower control communications, which were spoken in everyday ATC manner, so that the phraseology did not necessarily comply with ICAO standards. The results indicate 90 percent phrase recognition using the speech recognizer plus the phrase spotting and reasoning techniques. As a sentence usually consisted of several phrases, the sentence recognition rate was much lower.

9 Air intercept controllers control fighter aircraft during attack and landing maneuvers in the vicinity of ships.

A study at the School of Computer Studies at the University of Leeds, UK investigated the possibility of supporting controller-pilot communication with automatic speech recognition [Atwell et al. 95, Churcher 96]. The idea was to retrieve the relevant pieces of information from the ATC clearance and transmit them to the aircraft in a digital format, either for direct aircraft control or as additional information in case the pilot understood the clearance incorrectly or incompletely. The ASR would be located at the ATC center rather than on board the aircraft for two reasons. First, it was easier to install an ASR at the controller's working position than to equip each aircraft in the control zone with a recognizer certified for airborne operation. Second, the signal quality would be much better before the speech suffered the distortions caused by radio transmission. A state-of-the-art speaker-independent continuous speech recognizer was used and tested under laboratory conditions. The ASR allowed the user to specify the relevant sentences in a syntax. After transcription and analysis of ATC transmission samples at Leeds airport, a generic language model was generated. The active vocabulary covered the words relevant for ATC transmissions. Three syntaxes were designed:

- a base syntax, allowing for any combination of words in the vocabulary,
- a key-phrase syntax, defining relevant key phrases and allowing for any combination of words around the key phrases,
- a corpus-based syntax, defining phrases for each part of the transmission.
A sample selection of ATC transmissions at Leeds airport was transcribed and read aloud by test subjects. The recorded utterances were then decoded by the ASR using each of the three syntaxes. While the base grammar and the key-phrase grammar averaged about 20 percent and 22 percent word recognition rate respectively, the rate increased to 55 percent using the corpus-based grammar10. After a further restriction of the third syntax, the word recognition accuracy increased to 66 percent, the sentence recognition rate at that point being 23 percent. A system decoding three out of four clearances incorrectly would be a disturbance rather than an assistance to the pilot. The authors conclude:

10 The study used the word recognition rate, i.e. the percentage of correctly recognized words. As a complete sentence is only decoded correctly if all of its words are decoded correctly, the percentage of correct sentences would be much lower.


"Thus our most important conclusion is that we have demonstrated that current state-of-the-art speech recognition is just not good enough for this domain" [Churcher 96].

A study at the Eurocontrol Experimental Centre analyzed the performance of a speaker-dependent, word-based continuous speech recognition system provided with the TRACON/Pro ATC training simulator developed by Wesson Int. [Hering 95, David & Pledger 95]. The phraseology is limited to the ICAO standards as implemented by the FAA in the United States. The ASR must be trained to the speaker's voice in a session of between 35 and 40 minutes duration prior to use. Experiments were conducted with eight controllers, each completing five sessions of 30 minutes duration. Seven of the eight test subjects were native English speakers. The scenarios were adapted to the skills of the controllers in order to obtain comparable levels of workload. On average, a clearance was issued every 30 seconds during the simulations. The subjective workload was measured with different techniques in order to assess the intrusiveness of workload assessment techniques on the speech recognition performance. The measurements were taken using either the method of Instantaneous Self Assessment (ISA) or the Subjective Workload Assessment Technique (SWAT), and were either entered using a keyboard or prompted by an experimenter. The fifth scenario did not measure the workload. Subjective workload was again measured after each experiment using the NASA Task Load Index (TLX) technique. Although the system manufacturer claims an average recognition rate above 95 percent, the mean percentage of correctly identified clearances was only 81.2 percent during the Eurocontrol experiments. It was found that the online measurement of subjective workload was intrusive, i.e. the recognition performance decreased. The effect was stronger for the SWAT technique than for the ISA technique. Furthermore, the results indicate that an increased workload has a negative effect on the recognition performance.

A subsequent study compared three commercially available speech recognizers using recordings of the controller-pilot communication during simulations at the Eurocontrol ATC simulation facility [Hering 98]. As the spoken sentences included words yet unknown to the recognizers, plus errors and aborted or interrupted utterances, and even a few advisories in the French language, the speech samples form what Hering describes as "worst-case conditions" for the recognizers. As the study aimed at the installation of a central speech recognition system in a simulation network, microphone-independent ASR systems were chosen that used the limited frequency range of standard telecommunications facilities. The recognition rates, accordingly, were very poor, averaging between 26 percent and 39 percent word recognition rate. Female speakers exhibited slightly better results.

A parallel computational approach to increase the accuracy of combined speech recognition and language processing in the ATC domain was proposed by Chung, Moldovan and DeMara in 1993 [Chung et al. 93]. A commercial speech recognizer was used to understand phrases which were transcribed from ATC communications and which, in most cases, were not in compliance with the official phraseology. The system used semantic networks for knowledge representation and memory-based parsing techniques, applied in a close interaction between low-level phoneme sequences and higher-level knowledge sources. The syntax was comparatively simple, consisting of approximately 1,400 nodes with a vocabulary of 200 words. Tests indicated a strong increase in decoding speed and an average recognition rate of 80 percent.

The MINDS system (Multi-modal INteractive Dialogue System) developed at Carnegie Mellon University in 1989 combines natural language processing approaches with speech recognition for a problem solving dialogue [Young et al. 89]. The MINDS system was designed for information retrieval from a database describing vessels of the US Navy. During experiments, the users were confronted with a damaged ship and had to find out whether the ship could continue its mission or had better be replaced by another ship. To solve this problem they had to retrieve information from the database, interacting with the system via voice, keyboard, and mouse. The system displayed information on a computer terminal and gave acoustic feedback. For the speech recognition task the SPHINX system designed by Carnegie Mellon University was used, one of the most powerful speaker-independent systems for large-vocabulary continuous speech recognition.
MINDS uses a vocabulary of circa 1000 words and limits the recognition search space using syntactic and semantic restrictions. In addition, it uses higher-level knowledge sources to restrict the search space dynamically. This knowledge includes plans and goals, problem solving strategies, focus, ellipsis11, and user domain knowledge. The system possesses a knowledge base with domain concepts, hierarchical goal trees, and a user model. Based on the actual context and the dialogue history, MINDS uses its higher-level knowledge to generate a set of predictions about probable sentences. These predictions are then transcribed into a semantic network using grammar subsets.

11 The interpretation of elliptical, i.e. grammatically incomplete, sentences relies on the use of contextual information such as the dialogue history, e.g. the use of the word 'it' in an actual context.

Tests were carried out with sentences from a typical problem solving task as described above. The search space included 3.8 x 10^19 sentences when complete semantic grammars were used without contextual information and was reduced to 1.0 x 10^9 sentences with predictions about probable sentences. The word error rate decreased from 17.9 percent to 3.5 percent. The major innovations introduced by MINDS are the use of multiple knowledge sources and the predictive use of contextual knowledge to restrict the speech recognition search space.

Some of the ideas proposed by Young et al. and implemented in MINDS were transferred to the ATC domain during a prototype development at LIMSI/CNRS in France, aiming at the introduction of speech recognition in the ATC training domain in order to replace pseudo-pilots [Matrouf et al. 90a, Matrouf et al. 90b]. A speaker-dependent continuous speech recognizer was applied which used word transition probabilities for pairs of words. Static knowledge sources involved a hierarchical representation of clearance structures based on Minsky's frame theory (see chapter 6.1) plus a word confusion matrix for error correction during the recognition process. Dynamic knowledge sources involved the dialogue history and the context, i.e. the actual flight parameters of each aircraft in the simulation. The approach focused on dynamically limiting the search space of possible clearance categories, referred to as concepts, in order to improve the recognition accuracy. As proposed by Young et al., three levels of predictions were distinguished. Syntactic and semantic predictions limited the phraseology to the most commonly used sentences in the ATC domain and allowed only for meaningful instructions. The predictions were implemented by rules in order to dynamically estimate the word transition probabilities. Pragmatic knowledge focused on:

• Punctual exchange: When the system asks a question, the universe of responses is limited and the system can predict the response category.
• Dialogue history: Before speaking an instruction, the controller occasionally asks the pilot for the value of a parameter. If an utterance is a question concerning a category, there is a high probability that the next message belongs to the same category.
• Callsigns: Only aircraft presently in the sector are taken into consideration.
• Concepts: Non-expert controllers use only certain concepts.

Experiments were carried out with six speakers reading ATC instructions from a table and answering system inquiries in a simulation. Three ASR setups were tested. A traditional speech recognizer setup without transition probabilities served as the reference. The second setup used static word transition probabilities. The third setup

used dynamically updated transition probabilities. Both the acoustic accuracy (accuracy of literal transcription of the sentence) and the semantic accuracy (accuracy of correctly identifying the meaning of spoken instructions) are depicted in Table 3-1.
                                               Acoustic accuracy   Semantic accuracy
Base system                                          68%                 84%
System with transition probabilities                 74%                 96%
System with dynamic transition probabilities         78%                 96.5%

Table 3-1    Recognition accuracy in the experiments of Matrouf et al.
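The third setup, with dynamically updated word-pair transition probabilities, can be sketched as follows. The callsign encoding, the category boost factor, and the function shape are illustrative assumptions, not the actual implementation of Matrouf et al.:

```python
# Sketch: re-weight static bigram probabilities with pragmatic knowledge.
# Callsign words are assumed to carry a hypothetical "CS_" prefix.

def transition_probabilities(static_bigrams, aircraft_in_sector, last_category):
    """static_bigrams: {(word1, word2): probability}.
    aircraft_in_sector: set of callsign words currently valid.
    last_category: set of words of the category just asked about, or None."""
    dynamic = {}
    for (w1, w2), p in static_bigrams.items():
        # Callsign restriction: only aircraft presently in the sector.
        if w2.startswith("CS_") and w2 not in aircraft_in_sector:
            p = 0.0
        # Dialogue history: boost words of the category just discussed.
        elif last_category is not None and w2 in last_category:
            p *= 2.0  # illustrative boost factor
        dynamic[(w1, w2)] = p
    # Renormalize per left context so outgoing probabilities sum to one.
    totals = {}
    for (w1, _), p in dynamic.items():
        totals[w1] = totals.get(w1, 0.0) + p
    return {(w1, w2): (p / totals[w1] if totals[w1] else 0.0)
            for (w1, w2), p in dynamic.items()}
```

Zeroing out callsigns that are not in the sector and renormalizing concentrates the probability mass on transitions that are plausible in the present situation.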

Results indicate that the benefits of using dynamic knowledge are comparatively small: the semantic recognition rate increased from 96 percent to 96.5 percent. Moreover, the experimental design is representative of laboratory conditions rather than of operational service in air traffic control, as the instructions were read from a board rather than uttered spontaneously.

In a co-operation between LIMSI/CNRS, CENA, Sextant Avionique and Vecsys, the prototype developed at LIMSI was equipped with speech synthesis for pilot responses and connected to an air traffic control simulator [Marque et al. 93]. The syntax was extended to the French ATC phraseology, as it is common practice in France to use both the French and the English language in air traffic control. The effort aimed at the introduction of the system in ATC student training in France. According to personal conversations with the authors, the performance was judged insufficient after initial experiments at Roissy and Orly airports. In particular, the digit recognition was considered critical, and the speaker training required as a consequence of the speaker dependency was considered an obstacle to its introduction.

A successful application of speech recognition has been demonstrated for a knowledge-based pilot support system for IFR flights [Onken & Gerlach 93, Gerlach 97]. CASSY (Cockpit ASsistance SYstem) possesses several modules for autonomous flight planning, crew vigilance and crew support during flight plan execution. The automatic flight planner calculates and optimizes flight plans which are proposed to the crew. The piloting expert forecasts the pilot behavior expected for the execution of the flight plan. The pilot intent and error recognition module observes pilot behavior and tries to determine whether deviations from the forecasts are due to changes in the intentions of the crew or due to errors. In the first case the valid flight plan is adapted to the new intentions, while in the latter case a warning is issued. Additional modules are the execution aid, the monitoring module and the dialogue manager, which constitutes the human-machine interface of CASSY.


During flight planning and avionics tasks many eyes-busy and hands-busy situations occur. Therefore, speech recognition was identified as a valuable complement to the traditional ways of human-machine interaction. The dialogue manager of CASSY uses a speaker-independent, continuous speech recognizer for flight planning tasks, the control of avionics systems and for understanding ATC instructions read back by the pilot. To achieve the high recognition rates required in the aviation domain, situation knowledge is used to dynamically compose the ASR search space. The syntax of possible sentences is constructed as an augmented transition network (ATN) which gives the valid transitions between the words in the vocabulary, so that sentences are legal paths in the network12. The transition probabilities in the network are then estimated in a situation-dependent manner, using knowledge about

• valid flight plans,
• expected and deviating pilot behavior,
• flight state and aircraft system states,
• environmental parameters (such as ATC instructions),
• pilot inputs.

The speech recognizer uses situation knowledge generated by the other components of CASSY. Commands the pilot or ATC could issue are represented in frames and used to construct the syntax. The speech recognizer is used in a mode in which it returns a network of word hypotheses as the decode. The utterance is divided into segments so that phonemes are issued as possible decodes for sequences of segments. Additionally, for each decoded word a score is provided representing the quality of the match. The identification of the spoken sentence consists of finding the path in the syntax network whose combined score, calculated from the scores in the word hypothesis network and the syntax, is maximum.

Evaluation under experimental conditions proved the approach to be very powerful. The command recognition rate increased from 77.3 percent to 96.7 percent for pilot instructions and from 62.3 percent to 87.0 percent for ATC instructions repeated by the pilot. Experiments with line pilots during typical flight missions showed an overall recognition rate of 85.5 percent.
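The best-path identification described above can be sketched as a dynamic-programming (Viterbi-style) search over the word hypothesis network. The data structures, the additive log-scores, and the example words are illustrative assumptions, not CASSY's actual implementation:

```python
# Sketch: find the legal sentence with the maximum combined score, joining
# acoustic word scores with syntax transition scores. All names illustrative.

import math

def best_sentence(segments, syntax, start_words):
    """segments: list of {word: acoustic_log_score} dicts, one per segment.
    syntax: {word: {next_word: transition_log_prob}} defining legal paths.
    Returns (best_score, word_sequence)."""
    # best[word] = (score, path) for the best path ending in `word` so far
    best = {w: (s, [w]) for w, s in segments[0].items() if w in start_words}
    for seg in segments[1:]:
        nxt = {}
        for w, score in seg.items():
            for prev, (pscore, path) in best.items():
                trans = syntax.get(prev, {}).get(w)
                if trans is None:        # transition not legal in the syntax
                    continue
                cand = pscore + trans + score
                if w not in nxt or cand > nxt[w][0]:
                    nxt[w] = (cand, path + [w])
        best = nxt
    return max(best.values()) if best else (-math.inf, [])
```

Illegal word sequences are pruned by the syntax, so a phonetically attractive but meaningless hypothesis cannot win the search.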

3.4 Summary

Although the language in air traffic control is limited, standardized, and mostly unambiguous, the required recognition rate and the time pressure under which controllers work have so far impeded the successful implementation of speech recognition technology in ATC as well as in ATC simulation. Although promising approaches have been presented in recent years, no satisfactory solution has yet been found, and additional efforts besides refining the sensory system and the pattern matching algorithms of speech recognizers appear promising. One reason why human performance in understanding spoken sentences is as yet unparalleled by machines is the fact that humans, unlike machines, possess broad knowledge about whatever subject they are currently communicating about, including situation knowledge. Accordingly, some approaches to improving the performance of automatic speech recognizers include the use of situation knowledge, either to predict possible utterances or to interpret what has been said. Experiments have proved this approach to be quite powerful and promising for further investigation.

12 Augmented Transition Networks will be discussed in chapter 6.1.


4 The Use of Context Knowledge for Automatic Speech Recognition Systems

Among other approaches, the restriction of the pattern matching search space is a powerful means of improving speech recognition performance. The use of dynamic, situation-dependent knowledge to predict the sentences the user may speak requires a dynamic assessment of the situation, a costly though rewarding approach. User models of information processing and problem solving can be used to implement knowledge about how the user's actions depend on situational factors. Past approaches to designing cognitive models of air traffic controllers focus either on information processing or on raw task analyses. No existing model is capable of predicting what controllers will do in a certain situation; such a prediction, however, is required to support the speech recognition process. A variety of knowledge acquisition techniques are presented that can be used to identify controller expert knowledge, a prerequisite for its implementation in a model.

4.1 Approaches to Improve the Performance of Speech Recognition Systems

The application domain usually defines the requirements for an automatic speech recognition system. After an ASR system has been chosen and adapted to the application, initial tests are carried out in order to decide whether the recognition accuracy and response time are sufficient for successful use. As this is often not the case, continuous efforts are necessary to improve the recognition performance, both in the development of new systems and in order to make optimum use of existing systems. The principles of automatic speech recognition are depicted in Figure 4-1 (compare Figure 3-1 on page 28). Corresponding to the different modules of automatic speech recognizers, a variety of approaches are possible to improve the recognition performance. Manufacturers of speech recognizers mostly focus on approaches that are directly concerned with the hardware and recognition algorithms, such as signal preprocessing, improvements of the speaker model and the word/phoneme databases, and background noise suppression.

Refinements of Hardware and Preprocessing

The acoustic processor converts analogue signals from the microphone into a digital format which is then converted into a parametric description. Both the design of the acoustic processor and the algorithms for the parameter conversion can be enhanced to improve the recognition accuracy. However, technology has reached a level of saturation in these fields.

[Figure: spoken utterance → acoustic processor → parametric representation → pattern matching process → decoded utterance; the speech recognizer draws on a speaker model, dictionary and syntax, the syntax being produced from sentences by syntax generation tools and a word/phoneme database.]

Figure 4-1    The principles of automatic speech recognition (2).

Improved Speaker Models

Most modern ASR systems are designed to be used by many speakers, each possessing different speaking habits. Even the speaking habits of an individual speaker may vary over time, and attributes such as pitch, velocity, and loudness change even within one utterance. The speaker model, which describes the characteristics of the speech signal, must consider all these variations. Usually, the possible variations among a variety of parameters are specified by stochastic means. An efficient speaker model is crucial for the recognition performance: on the one hand, the possible variations among a multitude of speakers must be taken into account; on the other hand, the description must be limited to the most common variations in order to obtain an unambiguous specification of phonemes, so that confusions with other phonemes are avoided as far as possible. Many efforts in ASR research focus on improving the speaker models.

Refined Word/Phoneme Databases

Speaker-independent systems use databases containing acceptable pronunciations of the words or phonemes of a language13, based on speech samples of large populations of potential users. A representative and sufficiently large population of speakers is crucial for a good recognition performance. In some cases improvements can be obtained by collecting additional speech samples and adapting
13 ASR systems are usually based on phoneme or word databases (see chapter 3.2).

the database. For example, the pronunciation of Germans speaking English may vary from that of native English speakers in a specific way. If an English recognition system is used in a domain where German speakers communicate in the English language, its performance may be enhanced by collecting a new database or adapting the existing database with speech samples of German speakers. However, collecting and analyzing large quantities of speech samples is a costly process.

Background Noise Suppression

Most speech recognizers use noise-suppressing microphones in order to eliminate distortions of the speech signal caused by background noise. As the wave patterns of noise signals are generally different from those of speech signals, noise models can further be used to describe background noise mathematically and to filter it from the relevant acoustic signals.

For reasons of competition, the algorithms of the speech recognizer are generally not made available to the customers, who are therefore not able to modify the ASR itself and basically focus on adapting existing speech recognizers. Approaches such as decode post-processing, redundancy and syntax adaptation are mostly used by customers who wish to integrate an ASR system into a specific application.

Decode Post-Processing

Analyzing the decode after the recognition has been completed can help to identify at least some recognition errors. For example, some applications use word confusion matrices containing pairs of words that, due to their phonetic similarity, are frequently confused. If a decoded sentence is grammatically incorrect but contains a word that is frequently confused with another word, and if after exchanging one word for the other the sentence would be correct, it is likely that the words have actually been confused during the decoding process. Speech recognizers that return a detailed record of the decode facilitate post-processing. For example, the ASR may return a so-called N-best list containing a number of possible decodes that match the spoken utterance best, including the respective level of confidence for each decode. The possible decodes may even be returned as sequences of words or phonemes including word/phoneme decode confidences. A successful application of both the word confusion matrix and post-processing of weighted phoneme chains has been presented by Gerlach [Gerlach 96].

Redundancy

By using at least three different speech recognizers and comparing their decodes, the overall recognition performance can be improved. If at least two of the three systems return the same decode it is accepted; otherwise the decodes are rejected and the system prompts for the utterance again. A study described by Barry et al.

revealed that this approach holds the potential for improving the recognition accuracy when three ASR systems of approximately the same performance are used. However, at least three different recognition systems and an additional module to compare the decodes are required [Barry et al. 94].

Adaptation of the Syntax

During the pattern matching process the spoken utterance is compared to each sentence specified in the syntax. There are basically two kinds of errors that may occur during the pattern matching process:

• Lack of sentences: The search space does not contain the utterance. In this case the decode is refused, or a different but phonetically similar utterance is returned as the best match.
• Confusion of sentences: The spoken utterance is contained in the search space but confused with a phonetically similar utterance which is then returned as the best match.

Figure 4-2 schematically depicts the likelihood of occurrence of the two errors in relation to the size of the search space, i.e. the number of sentences in the syntax. The error due to a lack of sentences in the search space can be eliminated if the syntax contains every sentence the user may ever say. Unfortunately, this increases the number of phonetically similar sentences, resulting in a higher likelihood of confusions. The syntax should on the one hand contain all relevant sentences and on the other be restricted as far as possible, so that the sum of both errors is minimized.

[Figure: recognition error rate plotted against the size of the search space; the error due to lack of sentences decreases while the error due to confusion of sentences increases with search space size.]

Figure 4-2    Two types of recognition errors (schematically).
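The decode post-processing technique with a word confusion matrix, described above, can be sketched as follows. The confusion pairs and the grammar check are illustrative placeholders, not taken from any of the cited systems:

```python
# Sketch: if the decode is ungrammatical but swapping one frequently
# confused word makes it grammatical, prefer the repaired sentence.

# Illustrative confusion pairs (phonetically similar words).
CONFUSION_PAIRS = {"TWO": "TO", "TO": "TWO", "FOUR": "FOR", "FOR": "FOUR"}

def repair_decode(words, is_grammatical):
    """words: decoded word list; is_grammatical: callable on word lists."""
    if is_grammatical(words):
        return words
    for i, w in enumerate(words):
        alt = CONFUSION_PAIRS.get(w)
        if alt is not None:
            candidate = words[:i] + [alt] + words[i + 1:]
            if is_grammatical(candidate):
                return candidate
    return words  # no single-word repair found; keep original decode
```

In practice the grammar check would be the syntax of the application, and the candidate repairs could additionally be ranked by the decode confidences of an N-best list.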


4.2 The Use of Static and Dynamic Knowledge for the Syntax Definition

The SPHINX system developed at Carnegie Mellon University is considered to be one of the best speech recognition systems worldwide. But even the SPHINX system has a word recognition error rate of nearly 30 percent for speaker-independent, connected speech recognition when recognizing word sequences generated at random from a vocabulary of 1000 words, i.e. without using knowledge about the structure of correct sentences [Young et al. 89]. Higher-level knowledge sources are applied to reduce the number of valid sentences, i.e. legal sequences of words in the syntax. Young et al. propose a distinction between three categories of knowledge:

• syntactic knowledge: knowledge about the structure of grammatically correct sentences in a language,
• semantic knowledge: knowledge about sentences that are meaningful in a certain domain,
• pragmatic knowledge: knowledge about the context and the actual system state.

Although the basic ATC vocabulary consists of only 200 to 300 words, composing sentences from arbitrary sequences of these words would result in an enormous search space. Table 4-1 illustrates how the three types of knowledge may be applied to restrict the number of valid sentences in the air traffic control context.
Syntactic Knowledge (structure of grammatically correct sentences)
    Phraseology of a descend clearance:
    <Callsign>, DESCEND TO FLIGHT LEVEL <FL>

Semantic Knowledge (knowledge about meaningful sentences)
    Flight levels in the lower airspace: between 100 and 240 in steps of 10
    <Callsign>, DESCEND TO FLIGHT LEVEL <100|110|...|240>

Pragmatic Knowledge (knowledge about the actual state)
    The present flight level is 130; descents are valid to flight levels below
    <Callsign>, DESCEND TO FLIGHT LEVEL <100|110|120>

Table 4-1    Syntactic, semantic and pragmatic knowledge.
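The stepwise restriction illustrated in Table 4-1 can be sketched as the enumeration of a situation-tailored syntax. The callsigns, the sentence format, and the function name are illustrative assumptions:

```python
# Sketch: enumerate valid descend clearances by applying semantic knowledge
# (lower-airspace flight levels) and pragmatic knowledge (current level).

def descend_sentences(callsigns, current_fl):
    """callsigns: aircraft presently in the sector; current_fl: e.g. 130."""
    # Semantic knowledge: lower-airspace levels, FL100..FL240 in steps of 10.
    levels = range(100, 250, 10)
    # Pragmatic knowledge: descents are valid only to levels below the
    # aircraft's present flight level.
    valid = [fl for fl in levels if fl < current_fl]
    return [f"{cs} DESCEND TO FLIGHT LEVEL {fl}"
            for cs in callsigns for fl in valid]
```

For an aircraft at FL 130, this yields exactly the three sentences of the pragmatic row in Table 4-1, for each callsign currently in the sector.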

Syntactic and semantic knowledge is static and must be obtained and implemented into the syntax only once. As the structure and parameter ranges of ATC clearances are standardized by international conventions, most approaches to introducing speech recognition in air traffic control use syntactic and semantic knowledge. Pragmatic knowledge is dynamic, and its use requires analyzing the situation in order to

derive estimates about the sentences that are likely in the actual situation. As the situation analysis must be performed continuously and the results must be transmitted to the speech recognizer immediately, a considerable effort is required.

One approach to implementing pragmatic knowledge is the definition of syntax subsets. According to the actual situation, an appropriate subset is loaded and used for the decoding process. This approach is referred to as context-switching and is often used in applications with simple dialogue structures, such as call centers, where, according to the dialogue context, only a small set of words or sentences is relevant. Domains of higher complexity often do not permit the use of syntax subsets associated with distinct situations. Rather than classifying complex situations, it is often more promising to describe them by a number of parameters.

A first step in generating a situation-tailored syntax consists of ignoring clearances that are syntactically and semantically correct but not valid in the actual context. For example, clearances to aircraft which are currently not in the control sector may be excluded from the search space. Further, descend clearances to aircraft within the control sector may be limited to altitudes below the current altitude, etc. Whether an ATC instruction is realistic in the actual situation can generally be derived from the actual aircraft parameters and the airspace structure. The discrimination between realistic and less realistic instructions is a deterministic decision and can be formalized and implemented with moderate effort. Controller activities are not a random selection of all actions that are physically possible but an application of specific procedures in order to achieve specific goals, and therefore the search space can be reduced even further.
If it were possible to estimate the probability of occurrence of ATC instructions as a function of parameters describing the actual airspace situation, the search space could be limited to the most probable instructions. However, predicting which instructions the controller may apply in a certain situation requires a deep understanding of his or her work. One approach to obtaining and implementing knowledge about user behavior consists of the employment of statistical means. Provided that a sufficient quantity of data can be collected and the major determinants of the user's behavior can be identified, the observed frequency of a specific instruction approximates the probability that it will occur under similar conditions in the future. Statistical analysis is a powerful tool for knowledge acquisition, and a major advantage is that it can also be used if little or nothing is known about the domain under investigation. However, domains in which the system behavior is determined by a great number of parameters require the collection of huge amounts of data, and it is often not possible to identify which parameters are of


particular relevance. In the field of air traffic control, additional means apart from statistical analysis are required to predict user behavior with sufficient reliability. It may be rewarding to analyze and understand the processes involved in air traffic control and to implement this knowledge in a model in order to derive an estimate of what the controller can be expected to do under certain conditions. Even if the effort is considerable, a user model is a powerful tool, as it allows knowledge from a variety of sources to be integrated to provide a more or less precise estimate for the majority of situations, even for those for which little or no statistical data is available. As a starting point, a detailed analysis of the ATC controller's work and of the cognitive processes involved in air traffic control is required. Psychological user models contribute to understanding the basic mechanisms of human information processing and problem solving. The review of these studies will contribute to the design of a model as required to support speech recognition in the ATC domain.
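The statistical approach outlined above can be sketched as a relative-frequency estimate over recorded controller instructions. The situation features and the recorded data are illustrative assumptions:

```python
# Sketch: estimate the probability of each instruction category from its
# observed frequency under matching situation conditions.

from collections import Counter

def instruction_probabilities(observations, situation):
    """observations: list of (situation_features, instruction) pairs.
    situation: the feature tuple describing the present situation.
    Returns {instruction: estimated probability}."""
    matching = [instr for feat, instr in observations if feat == situation]
    counts = Counter(matching)
    total = sum(counts.values())
    return {instr: n / total for instr, n in counts.items()} if total else {}
```

The weakness noted in the text is visible here: with many situation parameters, exact feature matches become rare, so enormous amounts of data would be needed before the estimates become reliable.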

4.3 Cognitive Models of Human Performance

Not only among psychologists but also among engineers and software designers, a variety of motivations exist for developing cognitive models of human behavior. Psychological research is interested in the study of the cognitive processes involved in human thinking and problem solving in order to understand human behavior. The results of psychological research are often used by other disciplines, such as ergonomics or human engineering. A detailed understanding of human problem solving permits an adequate design of the human-machine interface with regard to an optimum selection and presentation of information. A more intuitive understanding of the technical processes the user wishes to control can be obtained if the presentation of the required information corresponds to the internal model he possesses of his environment. Even a predictive evaluation of the quality of human-machine interfaces is possible if the mental processes and the required information can be derived from the task. Models of cognitive processes may also be implemented as computer-based systems serving as intelligent agents to which the user delegates a part of the planning or execution tasks, or as supervisory systems. The human operator can thus be freed from tedious routine occupations, with more cognitive resources remaining for higher-level tasks.

A model of human information processing proposed by Wickens is depicted in Figure 4-3 [Wickens 92]. Physical stimuli are captured by the sensory apparatus and remain in the short-term sensory store for a period of 0.1 to 0.5 seconds after the stimulus

has terminated physically [Johannsen 93]. The stimuli are then retrieved during the process of perception in what Wickens refers to as a "many to one mapping", that is, each stimulus is assigned to one perceptual category. As the process of perception
[Figure: stimuli pass through sensory processing (receptors, short-term sensory store), perception, decision and response selection, and response execution, producing responses; the stages draw on attention resources and on memory (working memory, long-term memory), with feedback from responses to stimuli.]

Figure 4-3    Model of human information processing [Wickens 92].

depends largely on individual experience, no unique deterministic correlation exists between a physical stimulus and its subjective perception [Johannsen 93]. Once the stimulus has been categorized, the operator must decide upon an appropriate action. The decision process may take place on different levels of elaboration, ranging from problem solving to reacting automatically. After a response has been chosen, it must be executed. The outcome of the action is subsequently monitored, and the corresponding events are perceived again as they cause physical stimuli. Attention must be attributed to many of the steps following the short-term sensory store for them to function properly. Attention may thus be regarded as a "searchlight"-like means of selecting relevant pieces of information and of deriving and executing the appropriate responses. Also, attention must be regarded as a resource of limited capacity, and its allocation as a process of conscious or unconscious weighting of information, events and mental processes.


While the model proposed by Wickens is mostly concerned with information processing, a model proposed by Rasmussen and depicted in Figure 4-4 focuses on the problem solving process. Rasmussen distinguishes three levels of performance of skilled human operators that, according to their level of experience, are relevant for their interaction with the environment [Rasmussen 83, Rasmussen 86]:

• Skill-based behavior represents sensory-motor performance which takes place as highly integrated patterns of behavior without conscious control. In routine situations, little or no attention is directed towards skill-based behavior.
• Rule-based behavior applies to familiar situations for which stored rules or procedures exist that have been learned during instruction, derived empirically, or composed in a process of conscious problem solving during previous encounters with similar situations. The goals are typically stored implicitly in the rules or procedures. Rule-based behavior is generally based on explicit knowledge and controlled consciously.
• Knowledge-based behavior is applicable when unfamiliar situations are encountered for which no stored rules, procedures, or skill-based reactions are available. In these situations, the behavior involves problem solving in a goal-driven way, using the explicit knowledge and information the human possesses about his environment.

Plans developed during a process of knowledge-based problem solving may be stored as rules for future encounters with similar problems, while rules executed frequently may take the shape of skills and be performed in an increasingly automated manner.

[Figure: from sensory input (signals) via feature formation to three levels — knowledge-based behavior (symbols: identification, decision of task, planning, guided by goals), rule-based behavior (signs: recognition, state/task association, stored rules for tasks), and skill-based behavior (automated sensori-motor patterns) — leading to actions.]

Figure 4-4    Model of three levels of performance [Rasmussen 83].


A model of the acquisition of human expertise has been proposed by Anderson [Anderson 83 in Kolrep 96]. ACT (Adaptive Control of Thought) distinguishes between declarative and procedural knowledge. Declarative knowledge is knowledge about objects and facts, while procedural knowledge involves production rules with conditions and actions. Production rules are applied in a two-step process: the first step is a selection of the productions whose conditions match items in the working memory. The second step is a decision on which production rule shall be executed, taking into account the quality of the match of the conditional part and the strength of the production rule, which reflects how often the rule has previously been executed successfully and therefore "how easily it comes to the individual's mind". The acquisition of knowledge includes three phases:

• During the cognitive phase, declarative knowledge is used in combination with more general strategies to control the problem solving process. The sources of declarative knowledge include written and aural instructions and manuals.
• During the associative phase, declarative knowledge is converted into domain-specific production rules. Two kinds of learning processes exist for these conversions:
  - Proceduralization: The conditions under which a successful action has been executed are directly linked to the action to generate a new production rule.
  - Composition: Two or more production rules that are frequently used successively are combined into one single rule.
• In the autonomous phase, production rules are adapted by tuning mechanisms:
  - Generalization: The conditional part of a production rule is generalized if the rule turns out to be successful under other than the original conditions.
  - Discrimination: If necessary, the context of validity of a production rule is reduced by using narrower conditions.
  - Strength adaptation: The specific strength of each production rule is adapted according to the frequency of its successful application.

Although developed in a different context, the ACT model shows striking similarities to the model proposed by Rasmussen, as the cognitive phase, the associative phase, and the autonomous phase correspond to what Rasmussen refers to as knowledge-based, rule-based, and skill-based behavior. ACT explains the mechanisms by which knowledge involved in the problem solving process is refined, adapted and converted into elaborate and less attention-consuming forms through frequent use.

Figure 4-5 depicts a conceptual model of operator tasks based on Rasmussen's model, which has been proposed by Rouse & Rouse. This model served as a basis

[Figure 4-5: Conceptual model of the operator's tasks [Rouse & Rouse 83]. Flow: Observation of System State → problem evident? If no: Choice of Procedure directly; if yes: Choice of Hypothesis → Testing of Hypothesis → Choice of Goal → Choice of Procedure → Execution of Procedure.]

for a human error classification scheme that permits errors to be attributed to the step in the problem solving process during which they occurred [Rouse & Rouse 83]. After the relevant information has been retrieved, the user decides whether a problem exists, i.e. whether an unforeseen and unwanted situation has occurred for which no standard procedure is available. Provided there is a problem to solve, the user first constructs a hypothesis about the state of the system or environment he or she interacts with. He or she then tests the hypothesis by comparing conclusions derived from the hypothesis with observations in the real world. After a goal is selected, a procedure is chosen that corresponds to that specific goal, and the procedure is executed. Provided no problem exists that requires explicit problem solving, adequate procedures are chosen and executed directly after observation of the system state. This loop corresponds to the rule-based behavior described by Rasmussen.
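The branching loop described above can be sketched as a simple decision procedure. The state names, the rule table, and the function name below are illustrative assumptions made for this sketch; they are not part of the Rouse & Rouse model itself.

```python
# Illustrative sketch of the Rouse & Rouse problem-solving loop (Figure 4-5).
# State names and the rule table are simplified assumptions for this example.

# Rule-based shortcut: known situations map directly to standard procedures.
KNOWN_PROCEDURES = {
    "aircraft_at_sector_boundary": "hand_over_to_next_sector",
    "aircraft_reached_cleared_level": "monitor_sentry_parameters",
}

def solve(observed_state: str) -> list[str]:
    """Return the trace of problem-solving steps for an observed state."""
    trace = ["observe_system_state"]
    if observed_state in KNOWN_PROCEDURES:
        # No problem evident: choose and execute the standard procedure.
        trace += ["choose_procedure", "execute_procedure"]
    else:
        # Problem evident: knowledge-based reasoning is required.
        trace += ["choose_hypothesis", "test_hypothesis",
                  "choose_goal", "choose_procedure", "execute_procedure"]
    return trace

print(solve("aircraft_at_sector_boundary"))
print(solve("unexpected_descent"))
```

The shortcut branch corresponds to Rasmussen's rule-based level, the long branch to knowledge-based problem solving.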

4.4 Models of Air Traffic Control as a Cognitive Process

According to a definition given by Dörner, air traffic control may be called a complex situation [Dörner 95, Dörner et al. 83]. Aspects of complex situations are:

- Complexity and interdependency: A multitude of aspects and variables exist in the system the user wishes to control, and at least some of the variables are mutually interdependent, so that actions executed to influence one aspect of the system often have unwanted impacts on other aspects. For instance, reducing the speed of one aircraft in order to maintain a safe separation from the preceding aircraft may have an unwanted effect on the separation from a third aircraft behind.
- Opaqueness: Not all system variables can be monitored directly; some must be actively retrieved, while yet others can only be guessed. An air traffic controller may, for example, ask pilots about the minimum and maximum speed or about the climb performance of their aircraft. However, he or she may only guess at the meteorological conditions in a certain area.
- Dynamics: A dynamic system, i.e. a system that changes its state even without input from the operator, often causes time pressure. The need to obtain complete and valid information and to decide upon an optimum action often competes with the necessity to react under time pressure. Air traffic is a highly dynamic system, further complicated by the inertia of the pilot-controller collaboration.
- Incomplete system knowledge: The operator's knowledge about the system he or she wishes to control may be incomplete or even partially incorrect. Even though air traffic controllers are highly skilled, they often cannot know exactly how a specific aircraft type will perform in a certain maneuver. Moreover, how and how quickly an aircrew will react to ATC instructions depends on individual factors the controller can hardly know.
- Conflicting goals: The overall goal to guide each aircraft safely and efficiently through the sector results in a multitude of quickly changing subgoals. Often, actions executed in favor of one goal have negative effects on other goals.

Experiments investigating how the control strategies of ATC controllers change with increasing traffic density have been discussed by Spérandio [Spérandio 77, 78 in Kolrep 96]. In situations of moderate traffic density, the controller tries to provide maximum efficiency and comfort in addition to the primary goals of avoiding unsafe situations and maintaining safety separations. With an increase in traffic, however, he or she abandons all non-safety-relevant goals in order to guarantee safety. While the controller attends to each aircraft individually during periods of low traffic density, groups of aircraft following standard procedures at equal speeds are formed when the traffic density increases. With a further increase, he or she resorts to standard procedures such as assigning holding patterns. This adaptation of strategies to the traffic situation permits the controller to avoid excessive workload.

A multitude of dynamically changing parameters determines the traffic situation, and the controller must continuously update his or her mental representation. Whitfield & Jackson refer to the air traffic controller's picture as a structured and continuously updated mental representation of the traffic situation [Whitfield & Jackson 82]. Anderson demonstrated that, as a consequence of experience, a variety of unconscious mechanisms develop that permit an efficient use of the working memory [Anderson 93]. For example, experienced chess players are capable of remembering a greater number of positions than novice players, as they mentally group sets of three to five pieces. It is very probable that memory organization mechanisms such as these also apply to air traffic control. Leroux claims that two mechanisms of monitoring the traffic situation exist that permit the controller an efficient use of his or her working memory [Leroux 91]:

- For aircraft towards which, due to their safe distance from other traffic, no particular attention must be directed, sentry parameters are monitored, such as the cleared flight level or the expected altitude when reaching a waypoint. Compliance with the sentry parameters is checked occasionally, while the actual parameters are not explicitly remembered.
- Additional attention is directed towards aircraft which deviate from the expected behavior or are in close proximity to other traffic. For these aircraft, relevant parameter monitoring applies, i.e. the actual parameters are monitored frequently and stored in the working memory.

Not only must the present situation be monitored, but the future state and behavior must also be estimated in order to detect possible conflicts and to derive adequate control advisories. As this involves the dynamics of multiple aircraft in three dimensions, a great percentage of the air traffic controller's mental capacity is directed towards visuo-spatial tasks of high complexity. Isaac found that the capability of three-dimensional mental representation is significantly higher among ATC controllers than among non-controllers [Isaac 94 in Kolrep 96]. Isaac also found a good correlation between controllers' three-dimensional memory and their skills as estimated by their superiors.
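Leroux's two monitoring modes can be caricatured as a simple classification rule. The 10 NM proximity threshold, the aircraft attributes, and the decision rule below are invented for illustration only; Leroux does not specify them in this form.

```python
# Toy illustration of Leroux's two monitoring modes. The 10 NM threshold,
# the aircraft attributes, and the classification rule are assumptions
# made for this sketch only.
from dataclasses import dataclass

@dataclass
class Aircraft:
    callsign: str
    distance_to_nearest_nm: float   # horizontal distance to closest traffic
    deviates_from_clearance: bool

def monitoring_mode(ac: Aircraft, proximity_threshold_nm: float = 10.0) -> str:
    """'relevant': parameters monitored frequently and held in working memory;
    'sentry': only occasional compliance checks on sentry parameters."""
    if ac.deviates_from_clearance or ac.distance_to_nearest_nm < proximity_threshold_nm:
        return "relevant"
    return "sentry"

print(monitoring_mode(Aircraft("DLH123", 25.0, False)))  # safely separated
print(monitoring_mode(Aircraft("AFR456", 6.0, False)))   # close to other traffic
```

The point of the sketch is the asymmetry: only the small "relevant" subset loads the working memory, while the rest of the traffic is covered by cheap occasional checks.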

Redding, Cannon & Lierman presented an analysis of the tasks involved in the work of air traffic controllers [Redding, Cannon & Lierman 91]. They differentiated between the control and problem solving strategies of novice, intermediate, and expert controllers and derived implications for the training of controller trainees. The development of a mental model14, i.e. an internal representation of the environment, plays a crucial role in the acquisition of skills. Lenorovitz et al. identified a variety of factors influencing the work of controllers, including his or her experience and the size and structure of the sector [Lenorovitz et al. 91]. An implication of their findings is that the ATC controller's working position should be customizable in order to meet the unique demands of each individual controller working in a specific air sector.

Detailed task analyses of air traffic control have been presented by various authors [Seamster et al. 93, Vortac et al. 94 in Kolrep 96]. Most of the controller's activities are directly correlated to one of the following tasks:

- monitoring the present traffic and the compliance of aircraft with instructions,
- estimating future positions and testing whether conflicts may arise,
- deriving adequate control advisories to direct aircraft along preferred routes and to avoid conflicts,
- communicating with pilots and with other controllers,
- entering data into flight strips and computers.

An early attempt to sketch the mental processes involved in air traffic control was made by Bisseret. Without experimental proof, Bisseret argues that the main mental activity of the ATC controller is a categorization task [Bisseret 71]. From the present state of single aircraft, the controller estimates the future relative positions of pairs of aircraft and classifies these pairs as either possibly conflicting or non-conflicting. Flow-charts were drawn for this reasoning process, with flight levels and relative positions at the top of a hierarchical structure of information associated with each aircraft. According to the mental flow-chart, the operative memory of the controller is organized by and for the reasoning processes. In experiments set up to test the operative memory, Bisseret found that the number of attributes used to memorize aircraft averaged three, independent of traffic density and controller qualification. The number of non-remembered aircraft decreased with higher qualification.

A decomposition of the mental tasks of air traffic controllers has been proposed by Winter and Van Hood [Winter & Van Hood 95]. The analysis is based on the recognize-act cycle (or stimulus-response cycle) [Boff et al. 86, Kantowitz & Sorkin
14 A mental model refers to an internal representation of the environment, including its dynamics and causal relations, and provides the basis of the user's interaction with his or her environment. As opposed to that, a cognitive model refers to an external model of cognitive processes.
83], according to which the goal-directed interaction of an operator with his or her environment can be modeled as the discrete steps of Monitoring, Diagnosis, Plan Generation, Plan Selection, and Plan Execution. The operator's activity is directed towards two meta-goals:

- establishing and maintaining safe separation between aircraft,
- providing flight path (or ground path) guidance information and instructions to the aircraft.

During the situation assessment phase the actual state is monitored, i.e. the operator updates his or her mental model of the airspace situation. In case the actual state of the system deviates from the desired or expected state, the diagnosis function is activated to analyze the situation of specific aircraft as well as the overall airspace state. Hypotheses about reasons for unexpected behavior are constructed and confirmed or rejected. If operator interference is required in order to meet the above mentioned meta-goals, one or more plans are generated. If more than one plan is apt to attain the desired future system state, one plan must be selected. As air traffic consists of a multitude of interdependent sub-systems, each plan must be carefully analyzed with regard to wanted or unwanted impacts on other traffic participants. The execution of a plan usually consists of speaking the appropriate instructions to one or more aircraft.

Freed and Johnston focused on simulating cognitive processes of the man-machine collaboration to evaluate new ATC technologies [Freed & Johnston 95]. A model predicting the occurrence of errors, as well as the time and attention demands that the completion of each task places on the different sensory and cognitive modalities, could help to assess the performance of the human-machine interaction and serve as a complement to costly simulations with controllers. The major components of the model proposed by the authors are concerned with:

- knowledge acquisition,
- action control,
- vision and visual attention,
- speech production,
- hand movements.

Whereas mechanisms of visual attention, information retrieval, and action control are central to the model, no attempt is made to predict controller behavior in terms of decision making. The model is limited to two cognitive modalities: the more elaborate "recalling and executing routine plans" corresponds to the rule-based behavior described by Rasmussen, while "mapping perceptual events to motor responses" corresponds to skill-based behavior.
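The recognize-act cycle used by Winter and Van Hood can be lined up as a small pipeline. The airspace representation, the candidate plans, and the cost-based selection below are invented stand-ins for the much richer mechanisms of the actual model.

```python
# Sketch of the Monitoring → Diagnosis → Plan Generation → Plan Selection →
# Plan Execution cycle. The airspace encoding (callsign -> (actual level,
# cleared level)) and the plan cost measure are assumptions for this example.

def monitor(airspace: dict) -> list[str]:
    """Monitoring + diagnosis: callsigns deviating from their cleared level."""
    return [cs for cs, (actual, cleared) in airspace.items() if actual != cleared]

def generate_plans(callsign: str) -> list[tuple[str, int]]:
    # Plan generation: (instruction, assumed disruption cost to other traffic).
    return [(f"{callsign}: re-clear to present level", 2),
            (f"{callsign}: instruct climb to cleared level", 1)]

def select_plan(plans: list[tuple[str, int]]) -> str:
    # Plan selection: choose the plan with the least assumed impact.
    return min(plans, key=lambda p: p[1])[0]

def cycle(airspace: dict) -> list[str]:
    executed = []
    for callsign in monitor(airspace):          # monitoring + diagnosis
        plans = generate_plans(callsign)        # plan generation
        executed.append(select_plan(plans))     # plan selection + execution
    return executed

print(cycle({"BAW77": (310, 350), "KLM12": (280, 280)}))
```

When no aircraft deviates, the cycle degenerates to pure monitoring, mirroring the report's observation that plans are only generated when the actual state deviates from the expected one.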

As a means to study man-machine interaction and to evaluate human-machine interfaces in air traffic control, Leroux tried to create an explicit cognitive model of the ATC controller and implement it as an expert system [Leroux 91]. He subdivided the controller's knowledge base into the categories meta-knowledge, declarative knowledge, and situation knowledge, the latter containing all the information that changes during the control process. The memory model distinguishes the working memory, which contains all the relevant information, from the operative memory for the presently non-relevant information. A memorization module updates the memory and classifies pieces of information as relevant or non-relevant. As discussed above, sentry parameter monitoring applies to aircraft or pairs of aircraft for which no conflicting situation is expected, while relevant parameter monitoring applies to possibly conflicting aircraft. The implementation of the model in a fuzzy-set based expert system is mentioned, but no details or experimental results are given.

The goal of the research project EnCoRe (Enroute Controller's Representation) at the Technical University of Berlin was to develop a model of the dynamic mental representation enroute ATC controllers possess of the air traffic situation. The allocation of attention is central to the model MoFl (Modell der Fluglotsentätigkeit, Model of Controller Behavior) which was developed during the project. The main components of MoFl are depicted in Figure 4-6. In accordance with Whitfield &
[Figure 4-6: Components of the controller model MoFl [Niessen et al. 97]. The picture (with focal and extrafocal parts and objects with signal features) is operated on by data selection, anticipation, decision, and conflict resolution components under control procedures for timing and update, drawing on data sources such as the radar screen, flight strips, and communication, and on sector knowledge.]


Jackson, the term picture was chosen for the mental representation of the traffic situation. The picture is divided into a focal part and an extrafocal part. The focal part contains aircraft of specific relevance, such as possibly conflicting aircraft, aircraft during climb or descent maneuvers, or aircraft that have recently entered the sector. The extrafocal part of the picture contains aircraft without particular relevance due to less critical operations. The contents of the focal part of the picture are updated more frequently and are more attention-demanding than the extrafocal contents. Three major components of information processing operate on the picture:

- The monitoring cycle includes data selection from different sources of information plus a regular update of the aircraft attributes. Based on pre-attentive extraction of signal features from their attributes, aircraft are classified as either focal or extrafocal.
- The anticipation cycle operates on the focal part of the picture, estimating the future states of the respective aircraft. The anticipation is based on a dynamic mental model involving spatial, temporal, and causal relations between the objects under consideration. If a potential conflict is detected between two or more aircraft, an event is generated in the focal picture.
- In case of a potential conflict, the conflict resolution cycle initiates a timing process to decide which conflict has to be resolved first. Solutions for these conflicts are then generated or recalled, and a decision is made about the most adequate solution. This solution is then executed.

There are indications that aircraft are mentally grouped and that for aircraft in close vicinity short-term information, such as the actual flight level and speed, is memorized, while for other aircraft mid-term parameters are memorized, e.g. the cleared flight level [Bierwagen et al. 94]. MoFl has been implemented in ACT-R, which was designed for the computer-based implementation of Anderson's Adaptive Control of Thought (ACT) theory. The implementation makes it possible to monitor the information processing cycles in a conflict-free enroute scenario [Niessen et al. 97, Bierwagen et al. 97].
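The focal/extrafocal split performed by MoFl's monitoring cycle can be sketched as a classification over signal features. The concrete feature set below is an assumption chosen to match the examples in the text (conflicts, climb/descent, new arrivals), not MoFl's actual feature list.

```python
# Toy sketch of the MoFl picture update: aircraft whose attributes carry
# 'signal features' go into the focal part of the picture, all others into
# the extrafocal part. The feature names are assumptions for this example.

def update_picture(aircraft: dict) -> dict:
    """aircraft maps callsign -> set of observed features."""
    SIGNAL_FEATURES = {"possible_conflict", "climbing", "descending", "new_in_sector"}
    picture = {"focal": [], "extrafocal": []}
    for callsign, features in aircraft.items():
        part = "focal" if features & SIGNAL_FEATURES else "extrafocal"
        picture[part].append(callsign)
    return picture

print(update_picture({
    "DLH1": {"cruising"},
    "AFR2": {"descending"},
    "BAW3": {"possible_conflict", "cruising"},
}))
```

Only the focal list would then be passed on to the anticipation cycle, matching the model's claim that focal contents are updated more frequently.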

4.5 Expert Knowledge Acquisition

Air traffic controllers are 'experts', which means that much of their knowledge is available on an intuitive or even unconscious level. Nisbett and Wilson investigated the human capability of introspection in situations in which intuitive decisions are involved [Nisbett & Wilson 77]. They conclude that "... when people attempt to report on their cognitive processes, that is, on the processes mediating the effects of a stimulus on a response, they do not do so on the basis of any true introspection. Instead, their reports are based on a priori, implicit causal theories, or judgments about the extent to which a particular stimulus is a plausible cause of a given response."

Apparently, the mechanics of decision processes cannot be observed reliably by asking subjects about the reasons for their decisions. The problem of expert knowledge acquisition is well recognized in the field of artificial intelligence where, driven by the desire to make expert knowledge available at multiple locations and for multiple tasks at the same time, expert systems have been developed. Before expert knowledge can be implemented in computer-based systems, it must be acquired and structured. Knowledge engineering techniques have been developed for the acquisition and analysis of expert knowledge. These techniques include:

- Introspection: The subject reflects on his or her own mental processes and the reasons that led to decisions or actions. While introspection was widely used in the early days of psychology, the limitations of this approach have since become apparent. Reflecting on mental processes may interfere with these processes, and a great percentage of human knowledge is held intuitively or unconsciously, so that it can hardly be verbalized. As Nisbett and Wilson point out, subjects may be able to give reasons for their behavior that sound sensible but do not necessarily reflect the actual mental processes [Nisbett & Wilson 77].
- Think aloud protocols: The subject verbalizes every thought or consideration that comes to mind while solving a problem. He or she is instructed not to reflect on these utterances or to structure them in a way considered desirable, but to say freely whatever comes to mind. This method is based on the assumption that thinking often takes place in the form of a mental dialogue. However, when intuitive knowledge and semi-automated patterns are applied, subjects are mostly unable to think aloud. Besides, this technique may interfere with tasks that require verbal communication.
- Inquiries and questionnaires: The subject is asked about specific aspects of his or her behavior in a standardized form. The questions must be restricted to such fields, and posed in such a manner, that the subject is able to answer appropriately. In order to avoid interference with the primary task, inquiries and questionnaires are mostly used off-line.
- Direct observation: The experimenter observes the subject's behavior and takes notes of his or her observations. This helps to answer questions or construct hypotheses about behavioral aspects independently of the subject's capability of verbalization. However, direct observation is limited to physical processes and only very indirectly helps to construct hypotheses about mental processes.

- Teach back techniques: The subject is asked to instruct another person, who accomplishes the task under observation, and to give detailed reasons for every instruction. The advisories are collected and transcribed for later analysis. However, teach back experiments are mostly limited to problem solving for which explicit knowledge exists. If the task itself requires verbal activity, teach back approaches can affect the problem solving process itself.
- Statistical analysis of experimental protocols: If both the parameters classifying the situation and the user's responses to that situation can be logged, these protocols can be analyzed statistically. This does not permit an observation or even an understanding of the mental processes underlying the user's behavior. However, correlations between situational data and responses may be detected by statistical means.
- Data mining: Advanced statistical methods such as data mining (or knowledge discovery in databases) use a variety of algorithms to retrieve knowledge from large databases. Data mining can be employed to detect hidden patterns in a database, to classify and cluster objects, and to generate models of the objects described by the data.

All methods of knowledge acquisition possess specific advantages as well as disadvantages (for a detailed discussion of these techniques see [Engelhardt 97]). Therefore, a combination of several methods of knowledge acquisition has been used to identify the knowledge relevant for controllers' behavior.

4.6 Summary

A cognitive model of the air traffic controller would be a great help in supporting the speech recognizer with situation-specific knowledge. A cognitive model could estimate those clearances that are possible and probable in the actual air traffic situation, in order to limit the speech recognition search space to the most probable sentences. However, this requires a very powerful cognitive model, because the estimation must be performed at runtime, on the one hand restricting the search space as far as possible, while on the other hand removing only those sentences that with great probability will not be used. Existing models mostly either focus on information processing aspects or provide a very coarse task analysis of air traffic control. No existing model predicts or estimates what an air traffic controller will actually do in a certain situation. However, this function is strictly required in order to support the recognition process. A variety of knowledge acquisition techniques are available and can be used to identify the knowledge of air traffic controllers.


5 A Cognitive Model of the ATC Controller

The Cognitive Controller Model (CCM) is proposed as a model of the rule- and skill-based mental processes involved in standard and routine situations of air traffic control. Corresponding to existing cognitive user models, CCM consists of three functions for situation observation and classification, decision making, and execution of actions. Rather than aiming at a deterministic forecast, CCM provides a selection of the most probable user actions. Knowledge acquisition techniques that help to identify the knowledge required for the model are discussed, and it is found that a combination of several techniques should be used. Besides, it seems reasonable to limit the scope of CCM to one specific ATC sector, because the controller's activity depends to a large degree on the sector geometry. The mental processes involved in situation observation and classification, decision making, and execution of actions are discussed in detail, based on simulations in an enroute sector.

5.1 The Cognitive Controller Model (CCM)

Existing models of cognitive processes in air traffic control mostly focus on information processing and attention allocation aspects or provide coarse task analyses. A model designed to support the recognition process must generate a prediction of the most probable user actions and should therefore focus on behavioral aspects. However, even if the internal processes are not of primary interest, they must be analyzed attentively, as black-box approaches do not seem very promising for the ATC domain. Bisseret states [Bisseret 71]: "... a knowledge of external behavior alone seems insufficient: the controller's actions are fairly infrequent and simple; his external activity does not add much to his load. Even the verbal communication remains intrinsically not very complex. It is rather the operation of the mental processes resulting in these messages which sets up the real control complexity." Apparently, modeling the decision processes is the most promising approach to predicting controller behavior. However, there are always several solutions to one problem, and it cannot be decided objectively whether and at what time actions are required. A variety of parameters influences the decision process and, due to individual preferences and expertise, each controller demonstrates a unique pattern of behavior. Deterministic predictions of what individual controllers will actually do in a certain situation are therefore very difficult to accomplish.

A user model must take into account as much knowledge about cognitive processes as necessary. The quality of the model, however, is determined by the quality of the clearance predictions rather than by the fidelity with which the cognitive processes are mapped. The model must meet the following requirements:

- Generation of a situation-tailored prediction of the most probable ATC instructions, as restricted as possible yet large enough to include the greatest percentage of advisories.
- Translation of ATC instructions into the sentences, i.e. sequences of words, the controller is expected to speak.
- Implementation in a computer-based model which generates predictions continuously and in real time.

Casner investigated the predictability of ATC clearances during line flights in the United States, i.e. the likelihood that an ATC clearance in a certain situation could be anticipated by pilots due to earlier encounters of a similar situation [Casner 94]. A considerable number of instructions could be foreseen by pilots who had flown on a route before, while the predictability depended on the flight phase and the sector geometry. He concluded that the controllers' actions are determined by situational factors and individual factors at roughly 50 percent each. Even if Casner's work demonstrates that it is hardly possible to predict an individual controller's actions, it illustrates the benefits of situation knowledge.

Controllers are highly skilled and trained; during normal operation very few situations occur that require explicit problem solving. Most situations have been encountered and dealt with before, so that, in terms of Rasmussen's model, most controller activity can be described as rule-based and skill-based. A great percentage of controller activity consists of visuo-spatial representation and mental arithmetic tasks associated with system observation and preparation for decision making. The Cognitive Controller Model (CCM) is proposed as a model of the cognitive processes involved in air traffic control.
CCM focuses on 'standard situations' in which procedures apply that have been learnt or derived at an earlier encounter of a similar situation. CCM does not map explicit problem solving. Also, CCM is limited to the executive controller and does not consider the more strategic conflict checking and problem solving tasks of the planning controller. The Cognitive Controller Model consists of three functions that correspond to three steps in the model proposed by Rouse & Rouse (see Figure 4-5), provided that no problem exists that would require knowledge-based problem solving. In this case the model is reduced to three modules: Observation of System State, Choice of Procedure, and Execution of Procedure. Accordingly, CCM possesses three functions:


- Observation Function: The relevant parameters of each aircraft in the control sector are observed, such as position, flight level, heading, and speed. Additional information, such as aircraft type, flight plan, and destination, is assessed in order to obtain a complete picture of the situation. Based on these parameters, the dynamics and positions of each aircraft are estimated for a certain period of time into the future. This makes it possible to estimate whether and at what time a conflict may occur between two or more aircraft and whether actions are required.
- Decision Function: After classifying the actual situation, adequate procedures are chosen. These procedures are directly linked to the characteristics of the situation. The choice of procedure involves determining a suitable control advisory and the corresponding parameters. In accordance with the aim of the model, the result of this function is a selection of the most probable clearances.
- Phraseology Function: After an adequate clearance has been determined it must be executed, i.e. the control advisory must be verbally transmitted to the aircraft. The phraseology function determines which sentences the controller may speak to express his or her instructions.
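The three CCM functions can be chained into a minimal pipeline. The conflict test, the single candidate clearance, and the phraseology template below are drastically simplified assumptions for illustration; CCM's actual rules and phraseology are far richer.

```python
# Minimal sketch of the three CCM functions chained together. The conflict
# criterion (same level, arrival at a common fix within 3 minutes), the
# proposed clearance, and the sentence template are assumptions.

def observe(aircraft: list[dict]) -> list[dict]:
    """Observation function: flag aircraft pairs predicted to lose separation."""
    conflicts = []
    for i, a in enumerate(aircraft):
        for b in aircraft[i + 1:]:
            if a["level"] == b["level"] and abs(a["eta_fix"] - b["eta_fix"]) < 3:
                conflicts.append({"pair": (a, b)})
    return conflicts

def decide(conflicts: list[dict]) -> list[tuple[str, int]]:
    """Decision function: propose a probable clearance (climb one of the pair)."""
    return [(c["pair"][0]["callsign"], c["pair"][0]["level"] + 10) for c in conflicts]

def phraseology(clearances: list[tuple[str, int]]) -> list[str]:
    """Phraseology function: render clearances as expected controller sentences."""
    return [f"{cs} climb to flight level {fl}" for cs, fl in clearances]

traffic = [{"callsign": "KLM401", "level": 290, "eta_fix": 12},
           {"callsign": "SAS220", "level": 290, "eta_fix": 13}]
print(phraseology(decide(observe(traffic))))
```

The output of the last stage is exactly what the recognizer needs: a set of candidate sentences rather than an internal representation of the conflict.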

5.2 CCM Functional Architecture

Figure 5-1 depicts the integration of the Cognitive Controller Model (CCM) into the ATC simulation and speech recognition environment (compare Figure 2-13). CCM continuously observes the simulation and keeps track of dynamic data

[Figure 5-1: CCM in the simulation and speech recognition environment. The controller model (observation, decision making, and phraseology functions) observes the ATC simulation (aircraft parameters, control inputs, flight plans, airspace structure) and generates a syntax; a syntax compiler combines this syntax with the dictionary for the automatic speech recognizer. The ATC controller interacts with the simulation via voice communication, speech synthesis, and the radar display.]


such as aircraft parameters and clearance history, as well as static data such as flight plans and airspace geometry. On the basis of the information it possesses about the present state of the simulation, CCM generates an estimate of the most probable clearances. It then uses the phraseology associated with each clearance to construct a dynamic syntax, which contains the sentences CCM considers likely in the actual situation. The syntax must then be compiled with the dictionary prior to being used by the ASR. To ensure that the speech recognizer possesses an updated syntax with minimum delay, the syntax construction must be repeated frequently. Presently, CCM generates an updated syntax once per second.

In a first step, physically possible and impossible clearances are distinguished and a syntax is generated which comprises the possible clearances. For example, only clearances for aircraft that are presently in the sector are possible, and a descent clearance is only possible if a flight level below the present flight level of the concerned aircraft is advised. Compared to the static syntax, the syntax comprising possible clearances already leads to a considerable reduction of the number of sentences. However, further restrictions of the search space are feasible.

In a second and more sophisticated step, the set of clearances that appear probable in the actual situation is elaborated. Based on the behavior and strategies of air traffic controllers that have been observed in experiments and simulations, CCM decides which instructions the controller will probably use. The set of probable clearances is a subset of the possible clearances. Both syntaxes are compiled and transferred to the speech recognizer for decoding the next sentence. The recognition process uses the syntax of probable clearances, which in most cases includes the actually spoken sentence.
However, as a certain likelihood remains that the actual utterance is not included in the set of probable clearances, the syntax of possible clearances is used to repeat the decoding if no satisfying match is found during the first cycle. If the spoken utterance is not included in the set of probable sentences but a phonetically similar sentence exists, that sentence is often returned erroneously by the speech recognizer. This kind of recognition error cannot be eliminated completely, but it can be restricted as far as possible by designing a lean search space.

Figure 5-2 depicts the structure of the Cognitive Controller Model. After the initialization of the airspace structure, a loop of three functions is continuously executed: the observation function, the decision function, and the phraseology function. The observation function retrieves the relevant dynamic information from the simulation data file and generates and updates aircraft objects. Thereafter a conflict probe is executed for each pair of aircraft in the sector. The decision function generates a set


Figure 5-2: The structure of the Cognitive Controller Model. (Diagram: after initializing the airspace structure, the observation function reads the simulation data file with aircraft parameters, flight plans, and clearance history, updates the aircraft objects, and performs the conflict check; the decision function initializes clearances and estimates the possible and probable clearances using two rule bases; the phraseology function applies the ICAO/extended phraseology standards and writes the syntax files possible.sts and probable.sts.)

of clearances that are currently possible, plus a set of the most probable clearances. The decision function uses two different databases of rules, one for the possible and one for the probable clearances. The phraseology function translates the parametric selections generated by the decision function into the phrases that the controller may actually speak. The clearances and the associated phrases are written to the syntax files probable.sts and possible.sts which are then transmitted to the speech recognizer.
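As a toy illustration of the two syntaxes and the two-pass decoding, the sketch below filters the possible descent clearances for one aircraft, narrows them to a "probable" subset, and falls back from the probable to the possible syntax during recognition. All rules, names, and sentences here are invented for illustration, and exact set membership stands in for acoustic decoding:

```python
def possible_clearances(aircraft, levels):
    """Physically possible clearances: only for aircraft presently in the
    sector, and a descent only to a level below the present one.
    (Toy rule set; CCM's actual rule bases are far richer.)"""
    if not aircraft["in_sector"]:
        return []
    return [(aircraft["callsign"], "descend", fl)
            for fl in levels if fl < aircraft["flight_level"]]

def probable_clearances(possible, handover_fl=90):
    """Toy probability rule: of the possible descents, only the descent
    to the standard handover level is considered probable."""
    return [c for c in possible if c[2] == handover_fl]

def recognize(utterance, probable_syntax, possible_syntax):
    """Two-pass decoding: the small probable syntax is tried first, the
    larger possible syntax only if no match is found."""
    if utterance in probable_syntax:
        return utterance, "probable"
    if utterance in possible_syntax:
        return utterance, "possible"
    return None, "rejected"

def phrase(clearance):
    cs, _, fl = clearance
    return f"{cs} descend flight level {fl}"

ac = {"callsign": "DLH123", "in_sector": True, "flight_level": 210}
possible = [phrase(c) for c in possible_clearances(ac, [90, 100, 240])]
probable = [phrase(c) for c in probable_clearances(possible_clearances(ac, [90, 100, 240]))]
```

Note how the probable syntax is a strict subset of the possible one, mirroring the report's two-step restriction of the search space.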

5.3 Air Traffic Control in Frankfurt Westradar 1

Besides general ATC knowledge, sector-specific knowledge such as airspace geometry and standard procedures influences the controller’s decision making. Therefore, a controller model capable of estimating the probabilities of clearances must be adapted to each sector. In order to study the feasibility and benefits of context-sensitive speech recognition, it seemed convenient to limit the scope of the model to an appropriate airspace. The sector should be representative in terms of the procedures used and the traffic density. At the same time, the traffic density should permit some verbal communication with the controller in order to facilitate the knowledge acquisition process.

The sector Frankfurt Westradar 1 (WR1), schematically depicted in Figure 5-3, is a lower-airspace enroute sector west of Frankfurt airport, extending vertically from flight level 100 to flight level 245. WR1 is scheduled as a one-way sector, so that traffic uses the air route between Nattenheim (NTM) and Rüdesheim (RUD) only in the west-to-east direction. The major part of the traffic consists of arriving aircraft with destination Frankfurt, the remaining aircraft being overflights with various other destinations. The border to the adjacent sector Frankfurt Westradar 2 (WR2) south of WR1 is defined by a line running east to west south of the air route NTM-RUD. WR2 is mostly used for departure traffic and is itself scheduled as a one-way sector.

Figure 5-3: Structure of Frankfurt Westradar 1. (Diagram: entry fixes ARCKY and TAU in the west, the air route from NTM to RUD, and KIR to the south.)

The greatest percentage of the arrival traffic enters WR1 flying from ARCKY to NTM between flight level 200 and flight level 250. A handover agreement guarantees that the controller of the adjacent sector Frankfurt Arrival receives the inbound traffic under standard conditions. According to the handover agreement, the aircraft are delivered to Frankfurt Arrival overhead the waypoint RUD at flight level 90 at an indicated airspeed of 250 knots. However, the controllers of WR1 and Frankfurt Arrival can agree on other conventions if required. The airspace structure of WR1 has been implemented at DLR’s Air Traffic Management and Operations Simulator (ATMOS). During extensive simulations controllers were observed and interviewed. Different controllers participated in the simulations, all holding a WR1 sector license. The simulations took about 90 minutes each and were recorded for later analysis. Recorded data included:

- electronic protocols of the simulation parameters, aircraft parameters, and pseudo-pilot inputs of 13 simulations,
- observations and inquiries during 14 simulations,
- video tapes of controller activity during 20 simulations,
- audio tapes of the communication between controller and pseudo pilots during 9 simulations,
- think-aloud protocols of 3 simulations,
- a teach-back protocol of one simulation.

Each clearance recorded during the simulations was categorized according to the clearance type. The frequency of occurrence of the different clearance categories is depicted in Figure 5-4.

In a first step, the electronic transcripts were statistically analyzed in order to identify correlations between the occurrence of specific clearances and the parameters of aircraft, such as the aircraft position or altitude. For some clearance categories correlations were detected; however, these usually corresponded to simple explanations. For other categories no correlations between aircraft parameters and the frequency of occurrence of clearances could be detected. It therefore seemed that statistical analysis could support the knowledge acquisition process but that it should be combined with other techniques. Procedures and strategies were often observed that could not efficiently be described as correlations between different parameters and, consequently, could hardly be discovered by statistical means.
Figure 5-4: Frequency of different clearance categories in WR1 (percentage of instructions):

- Frequency Change: 37.6%
- Altitude: 18.8%
- Heading: 13.1%
- Direct to Fix: 12.3%
- Speed: 13.0%
- Rate of Climb/Descent: 4.3%
- Other: 1.0%

Think-aloud protocols were found to interfere greatly with the controller’s verbal communication with the pseudo pilots. Besides, it was necessary to continuously remind the participants to utter their thoughts. For both reasons, think-aloud protocols did not contribute very much to the knowledge acquisition process. For teach-back, even greater interference with the communication with the pseudo pilots was observed. Furthermore, decisions often had to be made under time pressure, which did not permit the controller to fully explain specific decisions. Teach-back was therefore abandoned for knowledge acquisition.

Rather than merely describing correlations between situation parameters and the frequency of clearances, it was attempted to understand and to model the controller’s decision making. Observation of air traffic controllers during simulation runs, as well as inquiries during and after the simulations, were the prime means of knowledge acquisition. The experimenter sat next to the controller to note observations and hypotheses and to discuss them with the controller during or after the simulation. About 20 hours of simulation with three ATC controllers were attended in this way. Audio tapes of the verbal communication with the pseudo pilots were transcribed and statistically analyzed to identify the phraseology items used most frequently.

5.4 The Observation Function

The first step in the ATC controller’s cognitive activity is the observation and classification of the air traffic situation. He or she constructs a mental representation of the aircraft states as well as of the entire air traffic situation in order to detect possible conflicts and to decide upon adequate measures. A complete description of the situation would require the monitoring of a great number of continuously changing parameters and would exceed human mental capacity. Mechanisms of memory storage are unconsciously applied that help to alleviate the mental workload and to deal with the situation efficiently. Aircraft and situation parameters are classified as relevant or less relevant, and only the most relevant data is used for the mental representation. Furthermore, more attention is directed towards aircraft or pairs of aircraft that for some reason require close observation.

By means of paper flight strips or equivalent electronic means, the controller is informed in advance at what time, from which direction, and at which flight level an aircraft will enter the sector. When flying into the sector, the aircraft issues an init call providing its callsign and flight level. At that point a mental aircraft object is constructed. After the aircraft has been guided through the sector and advised to change to the frequency of the adjacent control sector, its maneuvers are still observed until it is out of range of possible conflicts with other aircraft.

Aircraft are mentally referred to by their callsign, consisting of an airline code and a flight number. Often the controller uses only the airline for identification, provided that no more than one aircraft of the same airline is in the sector. The aircraft position is mostly memorized visually rather than as figures in a coordinate system. If required for a conflict check or to assess the remaining flight time, the distance to other targets, aircraft or fixes, is estimated.

The vertical position is referred to by the flight level and the cleared flight level. The flight level changes continuously during climb and descent maneuvers, so that updating the mental object would require constant observation, whereas the cleared flight level remains constant until the aircraft is instructed otherwise. For aircraft whose vertical maneuvers are considered uncritical because there is little risk of approximation to other targets, only the cleared flight level is remembered. If for some reason the vertical movements must be watched more attentively, the flight level becomes the relevant parameter and is checked continuously.

The way in which other parameters are memorized depends on the aircraft status. For instance, the cleared heading is remembered if assigned, or the cleared fix is stored if the aircraft flies according to flight plan or has been instructed to fly directly to a specific fix. Additional parameters such as airspeed, heading, and rate of climb or descent are rarely remembered, but extracted from the flight strip or aircraft label, or inquired from the pilot if necessary. The parameters of clearances are generally marked on the paper flight strips. Thus the controller keeps track of the flight progress and his or her instructions.

Checking for future conflicts takes place in a two-step process.
In a more intuitive manner, the sector is scanned for possibly conflicting situations, i.e. groups of two or more aircraft for which a close approximation cannot be excluded. Limited attention is directed towards aircraft the controller perceives to be "safe". If the anticipated routes of two or more aircraft may approximate more closely and at a similar altitude, more attention is directed towards each of these aircraft. Based on rules of thumb15, the future position of each aircraft is then estimated explicitly, and it is checked more thoroughly which separation will remain at the point of closest approximation. The "expert" way of classifying situations as either conflicting or safe is the intuitive, pattern-based way, derived from the more explicit way of calculating future positions. Which mechanism applies in a given situation is a matter of expertise, and no clear distinction can be made as to the point at which a controller will perform an explicit estimation.
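The rules of thumb referred to here reduce to simple arithmetic, sketched below (function names are illustrative):

```python
def miles_per_minute(ground_speed_kt):
    """Controllers' rule of thumb: the distance an aircraft covers in one
    minute equals its ground speed divided by 60 (300 kt -> 5 NM/min)."""
    return ground_speed_kt / 60.0

def distance_in(minutes, ground_speed_kt):
    """Estimated distance covered within the given time, used to project
    an aircraft's future position along its track."""
    return miles_per_minute(ground_speed_kt) * minutes
```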

15 Controllers often use simple rules such as "the distance the aircraft will cover in one minute equals its ground speed divided by 60" or, less precisely, "an aircraft covers about five miles per minute" to assess future positions.

Figure 5-5: The observation function. (Flow chart: entries of the simulation data file, containing aircraft parameters, flight plans, and clearance history, are read; for a new aircraft, a new aircraft object is created from the flight plan, otherwise the existing object is updated with aircraft parameters and clearance history; once one second has elapsed, the aircraft status is updated and the conflict check is executed.)

After situations have been diagnosed as safe or after conflicts have been resolved, attention is withdrawn from the respective aircraft.

Figure 5-5 depicts the structure of CCM’s observation function. After an entry in the simulation data file is read that refers to an aircraft, the depicted loop is executed. The aircraft-related data may concern aircraft position and parameters, or clearances that have been issued by the controller and received via mouse or speech recognizer. When a new aircraft enters the sector, the complete flight plan information is written to the simulation data file a few minutes in advance. An aircraft object is generated and initialized when the flight plan is read from the simulation data file. After reading one line in the simulation data file, the observation function determines which aircraft is concerned and whether an aircraft object already exists for that aircraft. If the entry concerns an aircraft for which an aircraft object has already been generated, the object is updated with aircraft parameters, such as position, flight level and speed, or with clearance parameters16. New aircraft parameters are written to the file every four seconds, corresponding to a typical rotation cycle of radar antennas.
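The create-or-update step of this loop might be sketched as follows; the data layout and all names are invented for illustration, as the actual format of the simulation data file is not specified here:

```python
def process_entry(entry, objects):
    """One pass of the observation loop (illustrative): a flight plan
    entry creates a new aircraft object, any later entry updates the
    existing object with aircraft parameters or clearances."""
    cs = entry["callsign"]
    if cs not in objects:
        # New aircraft: initialize the object from the flight plan entry.
        objects[cs] = {"flight_plan": entry.get("flight_plan"),
                       "params": {}, "clearances": []}
    else:
        # Existing aircraft: merge in parameters and clearance history.
        objects[cs]["params"].update(entry.get("params", {}))
        if "clearance" in entry:
            objects[cs]["clearances"].append(entry["clearance"])
    return objects

objects = {}
process_entry({"callsign": "DLH123", "flight_plan": "ARCKY NTM RUD"}, objects)
process_entry({"callsign": "DLH123", "params": {"fl": 210}}, objects)
process_entry({"callsign": "DLH123", "clearance": ("descend", 90)}, objects)
```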

16 CCM requires knowledge of the clearance history because past clearances also determine the aircraft state. Clearances are therefore also read from the simulation data file.

All aircraft-related entries in the simulation data file are read and processed successively until a time span of one second has elapsed. To reduce the complexity and the number of parameters that must be remembered, each aircraft is mentally classified with status categories. These states can be considered auxiliary parameters facilitating the application of rules in the decision function. Still, they were identified in accordance with the way ATC controllers mentally group and distinguish aircraft:

- The lateral status distinguishes an aircraft flying on course to a waypoint, either according to flight plan or because it has been explicitly cleared to do so (TOFIX), an aircraft flying on a cleared heading (HEADING), an aircraft in a holding pattern (HOLD), and an aircraft circling (CIRCLE).
- The vertical status distinguishes climbing aircraft (CLIMB), descending aircraft (DESCENT), and aircraft maintaining their present altitude (MAINTAIN).
- The control status reflects the controller’s responsibility for the aircraft. After the init call but prior to being assumed by the controller, an aircraft is assigned the control state INIT. After it has been assumed and is under control, the state changes to CONTROL. After the aircraft has been advised to contact the controller of the adjacent sector, the control state changes to HANDOVER.
- A fourth status distinguishes between arrival traffic with destination Frankfurt (ARRIVAL) and overflights with other destinations (OVERFLIGHT). Typically, different control strategies are applied to arrival traffic and overflights.

After the states have been determined, a forecast of the future position of each aircraft is performed in order to predict the traffic situation within a planning time horizon.
Future speed, heading, and vertical speed are calculated based on the actual and the cleared parameters and are used for the calculation of the position increments. The future positions and flight levels are calculated incrementally with a resolution of four seconds. The position forecasts are used to check for conflicts between two or more aircraft: the future lateral and vertical separation between every pair of aircraft in the sector is calculated in steps of four seconds within the planning time horizon. Pairs of aircraft whose estimated future separation falls below the required minima of six nautical miles laterally and ten flight levels vertically are marked as conflicting. According to experiments with controllers, the planning time horizon during which executive controllers consider the future traffic state ranges between two and five minutes. CCM currently uses a time horizon of two minutes, which seems sufficient to support the decision making process.
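The pairwise conflict probe described above can be sketched as follows. The thresholds, step size, and horizon are taken from the text, while the linear track model and all names are simplifying assumptions; in particular, flight levels are held constant here instead of being extrapolated from cleared levels and vertical speeds:

```python
import math

def conflict(track_a, track_b, horizon_s=120, step_s=4,
             lat_min_nm=6.0, vert_min_fl=10):
    """Pairwise conflict probe sketch: positions are extrapolated in
    4-second steps over a two-minute horizon; a pair is marked
    conflicting when lateral separation falls below 6 NM while vertical
    separation is below 10 flight levels."""
    for t in range(0, horizon_s + 1, step_s):
        # Straight-line extrapolation; positions in NM, speeds in NM/s.
        ax = track_a["x"] + track_a["vx"] * t
        ay = track_a["y"] + track_a["vy"] * t
        bx = track_b["x"] + track_b["vx"] * t
        by = track_b["y"] + track_b["vy"] * t
        lateral = math.hypot(ax - bx, ay - by)
        vertical = abs(track_a["fl"] - track_b["fl"])
        if lateral < lat_min_nm and vertical < vert_min_fl:
            return True
    return False
```

A head-on pair at the same level is flagged, while the same geometry with 20 flight levels of vertical separation is not.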



5.5 The Decision Function

After the traffic situation has been observed and analyzed, the controller must decide whether and which measures are required. The meta-goal of efficiently controlling each aircraft through the sector without causing hazards to other traffic consists of a variety of quickly changing and interdependent subgoals for individual aircraft or sets of aircraft. However, it was observed that controllers had more difficulty specifying the subgoals that determined their decisions than describing the situations in which they would apply certain strategies. Control strategies can be described by three items controllers remember explicitly when asked about their decisions. These are

- an abstract description of the required action,
- conditions under which it can be applied,
- an estimated effect.

To derive the possible control actions that belong to a strategy, the abstract description must be translated into explicit control advisories, depending on the actual aircraft state.

Because approaching aircraft account for most of the traffic in the sector WR1, and since arrivals are more complicated to handle than enroute traffic, the following considerations are limited to arrival aircraft in the sector Frankfurt Westradar 1. The scope is also limited to conflict-free traffic. If a conflict has to be solved, different strategies have to be considered that are beyond the scope of this study. Control advisories generally depend not only on the status and parameters of one particular aircraft but also on the entire traffic situation, for example on the status of preceding aircraft. The first aircraft in each simulation, with no traffic ahead of it, is a special case, suitable to demonstrate how control could take place if no dependencies on other traffic had to be considered. Figure 5-6 depicts possible control procedures for an isolated aircraft.

- The aircraft enters the sector at position 1, announcing its presence with an init call, and the controller confirms radio contact. At that point, the aircraft typically flies between FL 200 and FL 250 with an indicated airspeed of about 300 knots.
- According to its flight plan, the aircraft proceeds via the fixes Nattenheim (NTM) and Rüdesheim (RUD) to Frankfurt (FFM, east of RUD). However, the controller sometimes advises the aircraft to turn left at position 2 and left again at position 3, knowing that proceeding on a shorter flight route saves time and fuel and is highly appreciated by pilots. He or she may also advise the aircraft to proceed directly to RUD or FFM when at position 2.
- In order to deliver aircraft to the arrival sector at the scheduled handover level of FL 90, the controller has to advise a descent at some point. With common descent rates ranging between 1500 and 2000 feet per minute, an aircraft starting its descent overhead NTM will reach flight level 90 a few miles prior to RUD. Thus, a descent to FL 90 is often advised somewhere around NTM. However, if aircraft reached FL 90 outside the lateral boundaries of the arrival sector, indicated in Figure 5-6 by a line west of RUD, they would enter airspace E, an area in which traffic navigating under visual flight rules (VFR) is also permitted. Therefore some controllers advise a descent to flight level 100 in a first step, followed by a further descent to flight level 90 as soon as the aircraft is overhead the approach area. Due to the lower air density, flying at higher altitudes with constant airspeed is less fuel- and time-consuming. Therefore pilots generally prefer to maintain a higher flight level as long as possible, and the controller usually complies if possible.
- In most cases, at some point in the sector a speed reduction to the handover speed of 250 knots indicated airspeed is advised. Again, pilots appreciate a speed reduction late in the arrival.
- When the aircraft has reached the handover fix Rüdesheim (RUD) at an adequate flight level and with adequate speed, it is advised to change to the radio frequency of Frankfurt Arrival control.

Controlling one isolated aircraft may appear simple at first glance. However, it demonstrates that even in this case there is little chance of predicting exactly if and at which point the controller will speak a particular instruction. If a number of aircraft have to be controlled instead of a single aircraft, the complexity increases further.
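The descent geometry in the third bullet is simple arithmetic: descending from FL 200 to the handover level of FL 90 means losing 11,000 feet, which at 2000 feet per minute takes 5.5 minutes, i.e. about 27 track miles at 300 knots ground speed. A sketch (function names are illustrative):

```python
def descent_time_min(current_fl, target_fl, rate_fpm):
    """Minutes needed to descend between flight levels (1 FL = 100 ft)."""
    return (current_fl - target_fl) * 100 / rate_fpm

def descent_distance_nm(current_fl, target_fl, rate_fpm, ground_speed_kt):
    """Track miles consumed by the descent at constant ground speed."""
    return descent_time_min(current_fl, target_fl, rate_fpm) * ground_speed_kt / 60
```

Comparing this distance with the remaining track miles to RUD explains why the descent is typically advised around NTM.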

Figure 5-6: Possible control advisories for an isolated aircraft. (Diagram: the aircraft enters at position 1 near ARCKY/TAU and may be turned at positions 2 and 3 to shorten its route via NTM towards RUD; KIR lies to the south.)

Figure 5-4 (page 64) depicts the frequency with which clearances of the different categories are issued. The control strategies are mostly concerned with clearances regarding lateral control (guiding aircraft along a suitable lateral routing by heading and waypoint clearances), vertical control (advising flight levels and vertical speed), speed control (instructing the airspeed if necessary), and communication (the instruction to change to the radio frequency of another sector).

Lateral control

Lateral control strategies are often used to guide aircraft away from the standard routes on which they would otherwise proceed. This can be done in order to maintain separation between aircraft, to meet handover sequences or time proposals17, or to shorten the way. Lateral control is concerned with advising a cleared heading or the way to a fix. Aircraft can also be advised to circle at their present position or to enter a holding pattern.

A strategy frequently applied to control the separation between two or more aircraft without speed control is what controllers call a "lineal holding". It consists in advising a turn of about 10 to 40 degrees away from the standard route on which the aircraft is flying. A while later the aircraft is turned back to the route, so that it is delayed by the longer distance it has to fly. Lineal holdings are often used to increase the separation to the preceding aircraft or to delay the aircraft in order to meet a handover time proposal. A typical lineal holding is depicted in Figure 5-7: the aircraft is advised to turn left by 30 degrees at position 1, turns right again by 60 degrees at position 2, and joins the route to RUD again at position 3.

Figure 5-7: "Lineal holding" in WR1. (Diagram: the route from ARCKY/TAU via NTM to RUD with the detour via positions 1, 2, and 3; KIR to the south.)

17 The controller assistance system COMPAS (Computer Oriented Metering, Planning, and Advisory System) generates an optimized arrival sequence and time proposals for the handover between sectors. COMPAS has been in operational service at Frankfurt area control center since 1989.

Vertical control

In order to prevent loss of separation, the controller often advises the second aircraft to descend to a flight level above the cleared flight level of the first aircraft, thus maintaining vertical separation even if the aircraft should for some reason approximate laterally. If, for instance, a communication failure or a failure of the radar system occurred, a standard procedure for the aircraft would be to enter the holding pattern at RUD. In this case a vertical separation of at least ten flight levels must be maintained. Of course, a third aircraft may again be cleared to a higher flight level. Whethe
