fccjxxw.com
非常超级学习网 学习超级帮手
当前位置:首页 >> >>

éCOLE POLYTECHNIQUE FéDéRALE DE LAUSANNE


Matthias Zenger

Programming Language Abstractions for

Extensible Software Components
Doctoral Thesis EPFL, Switzerland

?COLE POLYTECHNIQUE F?D?RALE DE LAUSANNE

Laboratoire des Méthodes de Programmation Institut d'Informatique fondamentale Faculté Informatique et Communications ?cole Polytechnique Fédérale de Lausanne

? 2004, Matthias Zenger
Programming Methods Laboratory Institute of Core Computing Science School of Computer and Communication Sciences Swiss Federal Institute of Technology, Lausanne, Switzerland
A This document was set in ITC Charter using the LTEX typesetting system on MacOS X. The graphical content was produced with OmniGraf?e Professional.

Acknowledgments
First and foremost I would like to express my sincere gratitude and appreciation to my supervisor, Prof. Martin Odersky, for his encouragement, guidance, and support during the last six years. His expertise and understanding and the countless inspiring discussions have been of great importance and in?uence to me, both academically and personally. From 1997 on, I enjoyed working in his group. I would like to thank my friend and former colleague Dr. Christoph Zenger for his interest in my work, his advice and help, as well as the many loud and passionate discussions about work and life in general. I appreciated the collaboration with him in Karlsruhe, Adelaide, and Lausanne. Many thanks also go to Prof. Martin Sulzmann for his support and the numerous interesting discussions in Adelaide and Melbourne. I would like to thank the members of the thesis jury, Prof. Karl Aberer, Prof. Rachid Guerraoui, Prof. Mira Mezini, and Prof. Oscar Nierstrasz, for the efforts and the time they have spent on the examination of this dissertation. Furthermore, I am thankful to my colleagues at the Programming Methods Laboratory of EPFL for their support and their collaboration. In particular, I thank Dr. Erik Stenman for proof-reading parts of this thesis. Thanks also to all other people I was working with in the last years. I had the pleasure to publish papers together with Dr. Jim Buckley, Vincent Cremet, Matthias Jacob, Prof. Günter Kniesel, Dr. Tom Mens, Prof. Michael Philippsen, Dr. Christine R?ckl, and Dr. Awais Rashid. Last but not least, I would like to thank my parents for the help and support they provided me throughout my entire life. I am especially grateful to my wife, Marie-Louise, without whose love, encouragement, and understanding, I would not have ?nished this thesis. Nyon, Switzerland, October 2003

Abstract
With the growing demand for software systems that can cope with an increasing range of information processing tasks, the reuse of code from existing systems is essential to reduce the production costs of systems as well as the time to manufacture new software applications. For this reason, component-based software development techniques gain increasing attention in industry and research. Component technology is driven by the promise of building software by composing off-the-shelf components provided by a software component industry. Therefore, component technology emphasizes the independent development and deployment of components. Even though components look like perfect reusable assets, they embody general software solutions that need to be adapted to deploymentspeci?c needs and therefore cannot be deployed “as is” in general. Furthermore, as architectural building blocks, components are subject to continuous change. For these reasons, it is essential that components can easily be extended by both the component manufacturer to create new versions of components and by thirdparties that have to adapt components for use in speci?c software systems. Since in both cases concrete changes cannot be foreseen in general, mechanisms to integrate unanticipated extensions into components and component systems are required. While today many modern programming techniques, methodologies, and languages provide means that are well suited for creating static black-box components, the design and implementation of extensible components and extensible software systems often remains a challenge. In practice, extensibility is mostly achieved through ad-hoc techniques, like the disciplined use of design patterns and component frameworks, often in conjunction with meta-programming. The use of design patterns and component frameworks requires a rigorous coding discipline and often forces programmers to write tedious “boilerplate” code by hand, which makes this approach fragile and error-prone. Meta-programming techniques on the other hand are rather code-centric and mostly source codebased. Therefore, they are often not very suitable for today’s component technology practice that stresses the binary reuse of black-box components. In this thesis I argue that technical dif?culties in the development of extensible software components are due to the lack of appropriate programming language abstractions. To overcome the problems, concrete programming language mechanisms are proposed to facilitate the creation of extensible software. The proposed language features are strongly typed to help the programmer extend systems safely and consistently.

vi

Abstract

The ?rst part of the thesis illustrates the vision of truly extensible software components by proposing a simple theoretical model of ?rst-class components built on top of a conventional class-based object-oriented language. This typed model includes a small set of primitives to dynamically build, compose, and extend software components safely, while supporting features like explicit context dependencies, late composition, unanticipated component extensibility, and strong encapsulation. The second part takes some ideas from the theoretical model and applies them in the design of the programming language Keris. Keris extends Java with an expressive module system featuring extensible modules. The main contributions are: ? A module system that combines the bene?ts of classical module systems for imperative languages with the advantages of modern component-oriented formalisms. In particular, modules are reusable, generic software components that can be linked with different cooperating modules without the need for resolving context dependencies by hand. ? A module composition scheme based on aggregation that makes the static architecture of a system explicit, and ? A type-safe mechanism for extending atomic modules as well as fully linked systems statically by replacing selected subsystems with compatible versions without needing to re-link the full system. The extensibility mechanism is non-invasive; i.e. it preserves the original version and does not require access to source code. The overall design of the language was guided by the aim to develop a pragmatic, implementable, and conservative extension of Java which supports software development according to the open/closed principle: Systems written in Keris are closed in the sense that they can be executed, but they are open for unanticipated extensions that add, re?ne, or replace modules or whole subsystems. The last part of the thesis ?nally presents a case study which compares an extensible Java compiler implemented using mainstream object-oriented language features with one that was written in Keris. It shows how in practice, extensible modules can be used to develop extensible systems safely and ef?ciently.

Zusammenfassung
Bei Software-Anwendungen, die mit st?ndig neuen Informationsverarbeitungsanforderungen konfrontiert sind, tr?gt die Wiederverwendung von Code wesentlich dazu bei, sowohl die Produktionskosten, als auch die Zeit für die Entwicklung neuer, verwandter Systeme zu reduzieren. Aus diesem Grund kommt komponentenbasierten Softwaretechnologien verst?rkt Aufmerksamkeit im Entwicklungs- und Forschungsbereich zu. Komponententechnologie basiert auf der Vision, Software prim?r durch eine Kombination von vorgefertigten Komponenten zu erstellen, welche von einer globalen Softwarekomponentenindustrie angeboten werden. Obwohl Komponenten eigentlich als ideal wiederverwendbare Bausteine erscheinen, stellen sie dennoch allgemeine Softwarel?sungen dar, die stets an anwendungsspezi?sche Bedürfnisse angepasst werden müssen und daher auch selten in ihrer allgemeinen Form eingesetzt werden k?nnen. Au?erdem unterliegen Softwarekomponenten, als grundlegende architekturelle Bausteine, natürlicherweise st?ndigen Ver?nderungen. Es ist daher unverzichtbar, dass Komponenten auf einfache Art und Weise, sowohl vom Hersteller, als auch von Klienten, erweitert werden k?nnen. Da in beiden F?llen die zukünftigen, konkreten ?nderungen selten im Voraus absehbar sind, muss es m?glich sein, auch unvorhergesehene Erweiterungen an Komponenten und Komponentensystemen vornehmen zu k?nnen. W?hrend viele moderne Programmiertechniken und Programmiersprachen durchaus gut für die Entwicklung statischer Black-Box-Komponenten geeignet sind, bleibt die Entwicklung von erweiterbaren Komponenten und komponentenbasierten Softwaresystemen oft eine Herausforderung. In der Praxis wird Erweiterbarkeit haupts?chlich mit Hilfe von Ad-hoc-Techniken erzielt; z.B. durch eine disziplinierte Verwendung von Entwurfsmustern und Rahmenwerken, oft in Verbindung mit Meta-Programmiertechniken. Eine konsequente Verwendung von Entwurfsmustern und Rahmenwerken zwingt den Programmierer oft dazu, langweiligen und dadurch auch fehleranf?lligen Anpassungscode von Hand zu schreiben. Meta-Programmierung, auf der anderen Seite, basiert meist auf Quellcode und ist damit nicht sonderlich geeignet für den Einsatz in einer auf bin?ren Komponenten beruhenden Technologie. In dieser Dissertation wird argumentiert, dass die technischen Schwierigkeiten bei der Entwicklung von erweiterbaren Softwarekomponenten auf einen Mangel an geeigneten Abstraktionen auf der Programmiersprachenebene zurückzuführen sind. Zur L?sung des Problems werden konkrete Programmiersprachen-Mechanismen vorgeschlagen, die die Entwicklung von

viii

Zusammenfassung

erweiterbarer, komponentenbasierter Software vereinfachen und damit eine sichere und konsistente Evolution von Systemen erm?glichen sollen. Im ersten Teil der Dissertation wird die Vision von ?exibel erweiterbaren Softwarekomponenten an Hand eines einfachen theoretischen Modells veranschaulicht, welches Komponenten im Kontext einer konventionellen, klassenbasierten, objekt-orientierten Sprache einführt. Das typisierte Modell de?niert eine kleine Menge von Primitiven mit deren Hilfe auf typsichere Art und Weise dynamisch Komponenten erzeugt, kombiniert und erweitert werden k?nnen. Im zweiten Teil werden Ideen des theoretischen Modells aufgegriffen und bei dem Entwurf der Programmiersprache Keris eingesetzt. Keris erweitert die Programmiersprache Java um ein ausdrucksstarkes Modulsystem. Diese Arbeit liefert folgende Beitr?ge: ? Ein Modulsystem, welches die Vorzüge klassischer, imperativer Modulsysteme mit den Vorteilen moderner komponentenorientierter Formalismen verbindet. Module repr?sentieren wiederverwendbare, generische Softwarekomponenten, die mit anderen Modulen kombiniert werden k?nnen ohne dass Kontextabh?ngigkeiten von Hand aufgel?st werden müssen. ? Ein Prinzip zur Modulkomposition, welches auf Aggregation beruht und welches die statische Architektur eines Systems explizit macht. ? Ein typsicherer Mechanismus, um sowohl atomare Module, als auch voll verlinkte Systeme statisch zu erweitern, indem ausgew?hlte Subsysteme durch kompatible Versionen ersetzt werden, ohne dass es notwendig wird, das gesamte System erneut zu linken. Der Erweiterungsmechanismus ist nicht destruktiv; er erzeugt eine neue Version einer Softwarekomponente ohne die alte Version zu ver?ndern und ohne auf den Quellcode der alten Version zuzugreifen. Ein Ziel des Sprachdesigns war es, eine pragmatische und implementierbare Erweiterung von Java zu konzipieren, welche es erlaubt, komponentenbasierte Software nach dem Open/Closed Prinzip zu entwickeln: Systeme die in Keris geschrieben sind, sind geschlossen, in dem Sinne, dass sie ausführbar sind; sie sind aber auch offen für zukünftige Erweiterungen, die Module oder ganze Subsysteme neu hinzufügen oder ersetzen. Im letzten Teil der Dissertation wird eine Fallstudie pr?sentiert, welche einen erweiterbaren Java ?bersetzer, der mit g?ngigen, objekt-orientierten Sprachmitteln geschrieben wurde, mit einem in Keris geschriebenen ?bersetzer vergleicht. Mit Hilfe dieser Fallstudie wird demonstriert, welche M?glichkeiten Keris in der Praxis bietet, erweiterbare Systeme sicher und ef?zient zu implementieren.

Contents
Acknowledgments Abstract Zusammenfassung 1 Extensible Component-Based Software 1.1 Introduction . . . . . . . . . . . . . . . . . . . . 1.1.1 Reusability . . . . . . . . . . . . . . . . 1.1.2 Extensibility . . . . . . . . . . . . . . . 1.2 Characteristics of Extensibility Mechanisms 1.3 Classi?cation of Extensibility Mechanisms . 1.3.1 White-Box Extensibility . . . . . . . . 1.3.2 Gray-Boy Extensibility . . . . . . . . . 1.3.3 Black-Box Extensibility . . . . . . . . 1.4 Extensibility Requirements . . . . . . . . . . . 1.5 Programming Language Support . . . . . . . 1.6 Component Engineering Approaches . . . . 1.6.1 Frameworks . . . . . . . . . . . . . . . 1.6.2 Extensibility . . . . . . . . . . . . . . . 1.7 Overview . . . . . . . . . . . . . . . . . . . . . . 1.7.1 Scope . . . . . . . . . . . . . . . . . . . 1.7.2 Contributions and Outline . . . . . . iii v vii 1 1 1 2 3 5 5 8 8 9 11 13 13 14 15 15 15 19 20 20 21 21 22 23 23 23 25 26 27

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

2 A Formal Model for Extensible Software Components 2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Language Integration . . . . . . . . . . . . . . 2.1.2 Coarse-Grained Composition . . . . . . . . . . 2.1.3 Dynamic Manufacturing and Composition . 2.1.4 Extensibility . . . . . . . . . . . . . . . . . . . . 2.2 Prototype-Based Components . . . . . . . . . . . . . . 2.2.1 Components and Component Instances . . . 2.2.2 Service Provision . . . . . . . . . . . . . . . . . 2.2.3 Component Instantiation . . . . . . . . . . . . 2.2.4 Component Specialization . . . . . . . . . . . 2.2.5 Service Forwarding . . . . . . . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

x 2.2.6 Service Abstraction . . . . . . 2.2.7 Composition of Components 2.3 Component Calculus . . . . . . . . . . 2.3.1 Syntax . . . . . . . . . . . . . . 2.3.2 Semantics . . . . . . . . . . . . 2.3.3 Type System . . . . . . . . . . 2.3.4 Type Soundness . . . . . . . . 2.3.5 Instantiation Evaluation . . . 2.3.6 Component Subtyping . . . . 2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 31 34 34 35 38 41 43 44 46 49 51 51 52 52 54 54 56 59 61 62 65 68 77 84 87 87 90 92 98 98 103 109 112 114 117 123 125 126 126 133 137

3 Static Component Evolution with Extensible Modules 3.1 The Java Package System . . . . . . . . . . . . . . . . . 3.1.1 Modularity . . . . . . . . . . . . . . . . . . . . . 3.1.2 Genericity . . . . . . . . . . . . . . . . . . . . . 3.1.3 Extensibility . . . . . . . . . . . . . . . . . . . . 3.2 The Programming Language Keris . . . . . . . . . . . 3.2.1 Declaring Modules . . . . . . . . . . . . . . . . 3.2.2 Linking Modules . . . . . . . . . . . . . . . . . 3.2.3 Accessing Modules . . . . . . . . . . . . . . . . 3.2.4 Initializing Modules . . . . . . . . . . . . . . . 3.2.5 Re?ning Modules . . . . . . . . . . . . . . . . . 3.2.6 Specializing Modules . . . . . . . . . . . . . . 3.2.7 Class Abstractions . . . . . . . . . . . . . . . . 3.2.8 Type System . . . . . . . . . . . . . . . . . . . . 3.2.9 Runtime Types and Re?ection . . . . . . . . . 3.3 Applications of Keris . . . . . . . . . . . . . . . . . . . . 3.3.1 Generic Class Families . . . . . . . . . . . . . . 3.3.2 Design Patterns as Module Aggregates . . . . 3.3.3 Modular Extensions of Design Patterns . . . 3.4 Implementation of Keris . . . . . . . . . . . . . . . . . 3.4.1 Basic Modules . . . . . . . . . . . . . . . . . . . 3.4.2 Module Re?nements and Specializations . . 3.4.3 Module Access . . . . . . . . . . . . . . . . . . . 3.4.4 Classes and Types . . . . . . . . . . . . . . . . . 3.4.5 Type Tests and Casts . . . . . . . . . . . . . . . 3.4.6 Re?ection . . . . . . . . . . . . . . . . . . . . . . 3.4.7 Module Execution . . . . . . . . . . . . . . . . 3.4.8 KeCo: The Keris Compiler . . . . . . . . . . . . 3.5 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.1 Micro Benchmarks . . . . . . . . . . . . . . . . 3.5.2 Real-World Application . . . . . . . . . . . . . 3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Contents

xi

3.6.1 Module Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 3.6.2 Module Systems and Object-Oriented Languages . . . . . . 143 3.6.3 Keris . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 4 Case Study: Extensible Compilers 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Extensibility Problem . . . . . . . . . . . . . . . . . . . . 4.1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.3 Extensible Compiler Phases with Algebraic Datatypes 4.2 JaCo: Design Pattern-Based Extensibility . . . . . . . . . . . . . 4.2.1 Architectural Pattern: Context/Component . . . . . . . 4.2.2 Application to Extensible Compilers . . . . . . . . . . . 4.2.3 Architecture of JaCo . . . . . . . . . . . . . . . . . . . . . 4.2.4 Extending JaCo . . . . . . . . . . . . . . . . . . . . . . . . 4.2.5 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 JaCo2: Extensibility with Extensible Modules . . . . . . . . . . 4.3.1 Architecture of JaCo2 . . . . . . . . . . . . . . . . . . . . 4.3.2 Extending JaCo2 . . . . . . . . . . . . . . . . . . . . . . . 4.3.3 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1 Design Patterns vs. Language Support . . . . . . . . . . 4.4.2 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Related Work and Conclusions 5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Component-Oriented Programming Languages 5.1.2 Architecture Description Languages . . . . . . . 5.1.3 Software Composition Languages . . . . . . . . . 5.1.4 Module Systems . . . . . . . . . . . . . . . . . . . . 5.1.5 Object-Oriented Programming . . . . . . . . . . . 5.1.6 Aspect-Oriented Programming . . . . . . . . . . 5.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 150 150 152 153 157 157 162 166 170 173 175 175 178 180 184 184 186 189 191 191 191 193 193 194 197 198 200 202

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

A Type Soundness for Prototype-Based Components 205 A.1 Subject Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 A.2 Progress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 B Keris Grammar C Principles of Extensible Algebraic Types Figures 213 217 221

xii Listings Bibliography Index Curriculum Vitae

Contents 223 225 239 243

Chapter 1

Extensible Component-Based Software

1.1 Introduction
1.1.1 Reusability
With the growing demand for software systems that can cope with an increasing range of information processing tasks, the reuse of code from existing systems becomes more and more important. Software reusability refers to the ability of software elements to serve for the construction of many different software products [134]. Software reuse is motivated by the observation that software systems often share common elements. By reusing existing software components for the construction of new software systems, one can expect lower costs due to a reduced development time, decreased maintenance requirements, as well as increased reliability and consistency [140, 107, 134, 166]. Furthermore, reusing software means that less software has to be written and consequently that more time and effort may be spent on improving other factors, such as correctness and robustness. Mainly for these reasons, component-based software development techniques gain increasing attention in industry and research. Component technology is driven by the promise of building software by composing off-the-shelf components provided by a software component industry [195]. This is also why component-oriented software engineering emphasizes the independent development and deployment of software components. Even though components look like perfectly reusable assets, it is, unfortunately, often quite dif?cult to reuse software components off-the-shelf. Even though software development is a highly repetitive activity which involves frequent use of common patterns, there is a considerable variation in how these patterns can be used and combined. Without mechanisms supporting the adaptation and extension of software components, programmers are forced into, what Meyer [134] calls, the reuse-redo dilemma: Either the component is reused exactly as it is, or the job has to be redone completely.

2

Extensible Component-Based Software

1.1.2 Extensibility
Software is extensible if it can be adapted to possibly unanticipated changes in the speci?cation. Extensibility is an important property for software because of the following reasons: ? At the basis of all software lies some human phenomenon and hence ?ckleness, yielding ongoing changes in the speci?cation and the implementation of systems [134]. This is why Nierstrasz encourages to see software as a living and evolving entity which is developed and maintained by people [147]. Software can be changed more easily, if it is designed to be extensible [7]. ? Component technology is based on the notion of components being independently developed and deployed by unrelated parties [195]. Since it is quite unlikely that components from external vendors ?t into a speci?c deployment scenario off-the-shelf, it is necessary that software components are adaptable, not only by the manufacturer, but also by third party users. ? Many software systems share a common architecture or even large parts of the implementation. Such families of software systems [158, 27] are much easier to derive from a base system if the base system is extensible. Similarly, software product-lines [99, 205] rely heavily on a mechanism for creating variants of a system which all share a common structure and some common functionality, but which are equipped with possibly different components. As these points show, extensibility boosts signi?cantly software reuse. Since reusability is one of the main aims of component technology, we also consider extensibility to be a major factor in the development of component-based software systems. Given the fundamental importance of extensibility, it is ironic that systems are often not explicitly engineered with this property in mind. In practice, programmers avoid extensible designs and implementations often for the following reasons [7]: ? Extensibility is a technical challenge that increases the complexity of software, making it more dif?cult to develop, test, and deploy. ? Extensible designs and implementations are more time consuming and are therefore more cost-intensive initially. ? Successful extensible designs and implementations require some knowledge about the way a system is going to be extended at a later time. The less is known about possible evolution scenarios, the more dif?cult it is to keep a system open for future extensions. ? Extensibility often decreases the performance.

1.2 Characteristics of Extensibility Mechanisms

3

While there is indeed a trade-off between extensibility and performance, it turns out in practice that the parts of the system for which extensibility is most bene?cial, are often not the most performance-critical ones. Programmers develop extensible software mainly for reducing the cost of implementing new or similar functionality in a system. If it is clear from the beginning that a system will be modi?ed in the future, or will possibly be reused by third parties, it pays off to spent extra time and money on extensibility considerations. As Allen points out in [7], the extreme programming literature [20, 21] compares the addition of extensibility to the purchase of a stock option: You purchase the option early so that you can easily extend the system at a later date. If you eventually exercise this option, you can greatly bene?t from it, even compensating for the critical items in the previous list.

1.2 Characteristics of Extensibility Mechanisms
Change is pervasive in software development. It involves changes of the requirements, of the design, of the implementation, of data representations, etc. This thesis focuses mainly on implementation-related issues, in particular on implementation techniques and formalisms (i.e. programming languages) that support the development of extensible software. This section will review important factors that characterize extensibility mechanisms on an abstract level [40, 132]. The terminology will be used in the next section to setup a classi?cation of techniques for extending software. Object of change. Different extensibility mechanisms differ in what software artifacts they change and at what time this happens. Mechanisms that introduce extensions directly into the source code do this typically at or before compile-time, whereas mechanisms that extend binaries or intermediate code representations like bytecode ?les typically operate at link- or load-time. Extensibility mechanisms that are applied before runtime are said to evolve a system statically, while all other mechanisms provide some form of dynamic software evolution. This distinction is sometimes unclear if link-time or load-time coincides with runtime (e.g. like in Java). Anticipation. We have to distinguish between mechanisms where changes or variations of a software product have to be anticipated and others which support unanticipated requirement changes. Parameterization is, for instance, a form of anticipation. It allows one to vary a certain prede?ned set of features. Inheritance and overriding in combination with late binding, on the other hand, make it possible to extend software without anticipating all possible directions in which a system may evolve in future.

4

Extensible Component-Based Software

Invasiveness. Extensibility mechanisms that introduce or modify features destructively are called invasive. Invasive changes that are applied in place (i.e. that are not applied to a copy of the original software artifact) have a global impact and possibly in?uence all other depending components. A mechanism that supports non-invasive changes has to provide a way to formulate extensions modularly. Versioning. To avoid that incompatible changes invalidate other software components and therefore endanger the consistency of the whole system, extensibility mechanisms often support some form of versioning [183, 195]. While in unversioned environments changes are typically invasive in the sense that new versions overwrite old ones, versioned environments provide means to distinguish old from new versions. In systems that support versioning statically, new and old versions can physically coexist at compile-time, but they are identi?ed at runtime and therefore cannot be used simultaneously in the same context [40]. In contrast to this, fully versioned systems do not only distinguish versions statically, they also distinguish versions dynamically at runtime, allowing two different versions of one component being deployed simultaneously side by side. This is particularly relevant for the dynamic evolution of systems (e.g. systems that allow components to be hot swapped, i.e. dynamically replaced with compatible versions). Here, safe updates of existing components often require that new clients of the component use the services provided by the new version whereas existing clients of the old component continue to access the services of the old one. Independent extensibility. Software changes may be carried out sequentially or in parallel. With sequential software evolution, changes are always applied to the last, most recent version of a component. For the case of parallel evolution it may happen that a component gets extended independently by different parties at the same time. If software is changed in parallel, we have to distinguish between convergent changes and divergent changes. With convergent changes, parallel versions can be merged or integrated together into a new combined version [131]. For divergent changes, different versions of a component coexist inde?nitely without having the possibility to combine the changes and thus use the two extensions jointly. If, for instance, inheritance is used to extend a class in different ways, objectoriented languages with single inheritance do not permit different extensions to be combined into a single class — changes diverge in this case. Languages with multiple inheritance, on the other hand, allow one to consolidate independent class extensions into a single subclass. Extensibility mechanisms which allow programmers to evolve components in parallel and which make it possible to integrate several, independently developed extensions into a combined system support the notion of independent extensibility [194].

1.3 Classification of Extensibility Mechanisms

5

Safety. We distinguish between extensibility mechanisms that provide static and dynamic safety. A mechanism features static safety regarding certain erroneous program behaviors if it ensures at compile-time that the extended system will never be subject to these erroneous behaviors at runtime. An extensibility mechanism provides dynamic safety if there are built-in provisions for preventing or restricting undesired behavior of software extensions at runtime. There are many different notions of safety. One of them is security, which aims at protecting the software from unauthorized access to sensitive parts of a system or to certain resources. Another is behavioral safety, which covers crashes and unpredictable or meaningless behavior at runtime. Yet another notion is backward compatibility. Backward compatibility guarantees that former versions of a software component can safely be replaced by newer versions without the need for global coherence checks during or after load-time. Obviously, the kind and degree of safety that is required has a direct in?uence on the extensibility mechanisms. For example, a certain degree of static safety can be achieved by a programming language’s type system at compile-time, while dynamic type tests have to be used for those cases that are not covered by the static type system. Similarly, systems that support dynamic loading need coherence checks at load-time to ensure that extended components “?t” the rest of the system. Such checks are even necessary for systems that guarantee safety statically but support some form of separate compilation. Here, components that are compiled together might not necessarily also be deployed together.

1.3 Classi?cation of Extensibility Mechanisms
We will now propose a classi?cation of extensibility mechanisms based on what artifacts are changed and in what way they are changed. In general, we distinguish between three different forms of extensibility: white-box extensibility, gray-box extensibility, and black-box extensibility.

1.3.1 White-Box Extensibility
White-box extensibility refers to the ways in which a software system can be extended by modifying or adding to the source code. This is the least restrictive and most ?exible form of extensibility. Depending on the way changes are applied, we have to distinguish further between open-box extensibility and glass-box extensibility [7].

1.3.1.1 Open-Box Extensibility
In open-box extensible systems, changes are performed in an invasive fashion; i.e. they are directly hacked into the original source code. Here are a few implications of this approach:

6

Extensible Component-Based Software

Source

Compiler

Binary

Copy

Source

Compiler

Binary

Modified Source

Compiler

New Binary

Modification

Modification

(a) Invasive change

(b) Souce code duplication

Figure 1.1: Extensibility based on source code reuse.

? It requires that the source code is available and the source code license permits modi?cations. ? Changing source code (especially if it was written by someone else) is an error-prone and tedious activity, especially if the source code and the way a system works is not well understood. ? Since arbitrary changes can be performed, it is easy to break clients which were written for the original system with extensions that do not preserve backward-compatibility. Thus, open-box extensibility is inherently unsafe in its general, unrestricted form. An illustration of open-box extensibility can be found in Figure 1.1a. Open-box extensibility is most relevant to a development team when ?xing bugs, refactoring internal code, or producing the next version of their own software product. In some cases, open-box extensibility may also be of interest to developers attempting to take advantage of existing open-source software. For deriving a variation of an already existing software product, the source code has to be copied and the changes have to be hacked into the copied sources. This principle is illustrated in Figure 1.1b. While this approach makes it extremely easy to develop a new member of a software system family from scratch, it complicates maintenance signi?cantly. The problem is that extension code and the original application code are mixed. When changes, e.g. bug ?xes, get introduced into the original system, these also have to be somehow integrated into the extended system. This can be an extraordinary dif?cult and time-consuming task. And even if the system is modular enough that the actual code that was changed does not con?ict, it is dif?cult to ensure that no assumed invariants of the new code are broken by the own changes.

1.3 Classification of Extensibility Mechanisms

7

Source

Compiler

Binary

Entry point

e ref

rs

to
extends

Source Extension

Compiler

Binary Extension

Entry point

Figure 1.2: Extensibility based on binary reuse.

Therefore, it is not surprising to see that in practice, changes introduced by an open-box approach of that kind diverge most often.

1.3.1.2 Glass-Box Extensibility
Glass-box extensibility refers to the ways in which a software system may be extended when the source code is available, but may not be modi?ed. Programmers that want to extend the system can view the code, but they have to separate their extensions from the original system in a way that does not affect the original system. Glass-box extensibility has several advantages over open-box extensibility: ? Since extensions and the original system are cleanly separated, it gets easier to understand and maintain extensions, as well as the original system. It is, in particular, more easy to combine new versions of the original system with extensions that were developed for the old one. ? Since glass-box extensibility is not directly based on source code modi?cations, it is less likely that the extension process introduces bugs in the original system or invalidates invariants established in the original system. Object-oriented application frameworks provide an example for glass-box extensibility. These frameworks typically rely on features like inheritance and dynamic binding in order to achieve extensibility. Existing functionality is reused and extended by inheriting from framework base classes and overriding prede?ned hook methods. Such white-box frameworks are also called architecturedriven frameworks. Figure 1.2 illustrates how glass-box extensibility can be used to derive an extended variant of a system. Both the extension’s source code and the corresponding binary refer to the original binary, but they neither physically modify the original source nor the original binary. The binary of the extension provides a second entry-point into the system — which is, in fact, the entry-point into the extended

8

Extensible Component-Based Software

system. The old entry-point is still available and can be used to deploy the original system without the extension.

1.3.2 Gray-Boy Extensibility
Figure 1.2 shows that the glass-box approach does not necessarily rely on the availability of the original source code. Technically, only the original binary is required for developing extensions (assuming that the binary contains all relevant meta-data and the development platform supports late binding). Consequently, it is possible to extend systems in a glass-box fashion even if source code is not available. In this case, programmers of extensions could be given an alternative, more abstract documentation of the system’s specialization interface. This interface lists all abstractions that are available for re?nement and speci?es how extensions interact with the original system. The rules for correctly extending a system can be described in form of reuse contracts [188]. This form of extensibility is called gray-box extensibility. It represents a compromise between a pure white-box and a pure black-box approach in the sense that it does not rely on the full exposure of the source code, but still provides internal details about the implementation and extension of a system.

1.3.3 Black-Box Extensibility
Black-box extensibility refers to the ways in which a software system may be extended when no internal details about a system’s architecture and implementation are available. Black-box extensible systems are deployed and extended only by using their interface speci?cation. This approach allows system manufacturers to fully encapsulate their systems and hide all implementation details. As opposed to white-box extensibility, the extensibility mechanism is explicitly built into the system and part of the system’s design. For designing such an extensibility mechanism, it is often necessary to anticipate all possible extension scenarios. For this reason, black-box approaches are often much more limited than white-box approaches. On the other hand, black-box extensible systems are generally easier to use and to extend since they require less knowledge about internal details of a system. Typically, black-box extensions are made through system con?guration applications or through the use of application-speci?c scripting languages. In the context of object-oriented application frameworks, black-box extensibility is mostly achieved by de?ning interfaces for components that can be plugged into the framework via object composition. Existing functionality is reused by de?ning components that conform to a particular interface and by integrating these components into the framework using design patterns like the Strategy pattern [74]. Figure 1.3 illustrates such a plug-in mechanism. Black-box extensibility is most applicable to proprietary components and

1.4 Extensibility Requirements

9

Source

Compiler

Binary

Entry point

Extension Source

Compiler
Exten sion

Figure 1.3: Extensibility based on plug-ins.

frameworks in which the business model of the original development team requires that the source code must not be published, but where external developers should still be given some degree of ?exibility in customizing and extending the functionality of the software. Black-box extensible frameworks are often also called data-driven frameworks.

1.4 Extensibility Requirements
Software component technology emphasizes the construction of software from off-the-shelf components that are provided by a global software component industry consisting of independent component developers [195]. To facilitate the construction, deployment, and evolution of components in such a context, programming techniques and programming platforms have to satisfy the following requirements: ? Components are generic units of software that are implemented in a modular way with explicit context dependencies and with support for separate compilation [195]. ? Facilities for composing independently developed components have to be ?exible but also safe and should not require access to implementation details (black-box component deployment). ? Component composition and component evolution mechanisms have to scale well, since component-oriented programming is targeted towards programming in the large [56]. ? The reuse of components in different contexts should imply the least possible need for explicit adaptation code. ? In support for a smooth software evolution process, components have to be extensible without the need to anticipate all possible future extensions.

10

Extensible Component-Based Software ? The extensibility and deployment mechanism has to be safe in the sense that it rules out erroneous ways to reuse a system [132]. ? Component systems have to be extensible on the system level as well, allowing programmers to plug in alternative or additional components without the need for rewiring the whole system. ? Extensibility has to be non-invasive, avoiding that local changes have a global impact, and making it possible to derive different, independent extensions from a single component. ? Different extensions of a component have to be able to coexist, requiring an appropriate versioning mechanism. ? The extension of components must not require the availability of the full source code since this would violate the principle of binary component reuse (gray-box component extension).

In general, these requirements pose high demands on the implementation platform. The following list enumerates some features that programming languages have to support in order to meet these requirements [193]. ? High-level abstractions for components and composition mechanisms, ? Modular encapsulation (as a high-level information hiding mechanism), ? Parametric polymorphism (to support genericity on the component level), ? Subtype polymorphism (to enable substitutability and variability of software components), ? Late binding and loading (as a basis for the independent deployment of components by third-parties), ? Static type safety (to make large-scale component engineering practical and safe), and ? Covariant re?nement and specialization of abstractions (including types, to support extensibility and reuse). Mainstream class-based object-oriented programming languages, which are today predominantly used for programming software components, do not live up to many of the requirements. In particular, they do not provide suitable abstractions for components which would allow programmers to make the architecture of a system and the relationships of a system’s components explicit. Furthermore, type abstraction facilities are often quite restricted. Even though inheritance allows one to specialize classes covariantly, there is often no way to specialize composite structures, not even types. Furthermore, object-oriented programs are often full of hard links (static variables and methods, speci?c references to classes, etc.) making it dif?cult to evolve systems.

1.5 Programming Language Support

11

Only recently, this lack of support for component-oriented programming gave rise to research about how to best support component technology on the programming language level.

1.5 Programming Language Support
The principal means for writing software are programming languages. But not all programming languages are suitable for all programming tasks. It is important to choose a language that supports application speci?c requirements well. Since component-oriented programming predominantly deals with the manufacturing of components and their deployment, a programming language supports this paradigm well only if it makes it easy to specify and compose components effectively and safely. Apart from such abstraction capabilities, programming languages offer formalisms for composition, reuse, and veri?cation. Abstraction. Component-oriented programming languages have explicit support for component abstractions; i.e. they provide linguistic facilities for de?ning software components on the programming language level. As a unit of composition with contractually speci?ed interfaces and explicit context dependencies, components correspond, on the level of programming languages, very closely to modules, as introduced by imperative languages like Modula-2 [206] and Ada [196]. Nevertheless, modules in those classical module systems do not fully qualify as suitable abstractions for generic software components, since they hard-wire all dependencies by referring to speci?c cooperating modules. In more recent language designs — such as MzScheme [69], ComponentJ [182], ACOEL [187], and ArchJava [5] — language level component abstractions may abstract over their dependencies, making it possible to develop and deploy components independently of each other. Composition and reuse. The independent reuse of software components in different contexts with different cooperating components is the most common form of reuse in the context of component-oriented software development. This form of reuse is enabled by software composition mechanisms like aggregation (object composition) or mixin-based inheritance (class composition), which allow programmers to compile ever-new composites of components and component instances. This thesis focuses on extensibility, a speci?c form of reuse. On the language level, extensibility can be supported with mechanisms that allow one to create new versions of components or to specialize components for more speci?c tasks. Veri?cation. Static analysis tools approximate the runtime behavior of a program before it is executed. Compilers for programming languages that are

12

Extensible Component-Based Software

equipped with a static type system [43] perform exactly such an analysis at compile-time. They automatically prove the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. Furthermore, type systems are also used to enforce modularity properties and to protect the integrity of user-de?ned abstractions. As static analysis tools, type systems have to be conservative. Even though they can be used to categorically prove the absence of bad program behaviors or illegal interactions, they are unsuitable for proving their presence. Consequently, type systems sometimes reject programs that actually behave well at runtime. Nevertheless, type systems are extremely helpful for developing correct and reliable software [42]. They are signi?cant for the following reasons [164]: ? Error Detection: Static type checking allows detection of some programming errors already at compile-time. In addition to the early detection of errors, it is often possible to pinpoint errors more accurately during type checking than at runtime when their effects may not become visible until some time after the actual moment where things begin to go wrong. Type checkers can also be used as maintenance tools, for example, when abstractions are changed and all clients have to be updated accordingly. ? Abstraction: Type systems support the software development process by enforcing disciplined programming. In the context of large-scale software composition, type systems form the backbone of module systems which are used to assemble the components of large systems. Module systems often enforce a separation of component interface speci?cations from concrete implementations. While types are typically used to specify the signature of module members in interface descriptions, module interfaces themselves can be regarded as types of modules. Structuring large systems in terms of modules with clear interfaces leads to a more abstract style of design where interfaces are designed and discussed independently from their eventual implementations. More abstract thinking about interfaces, in turn, generally leads to a better design. ? Language Safety: Type systems do not only make it possible to setup abstractions, they can also enforce their integrity and their correct usage. In particular, they can guarantee that abstractions and established invariants are not broken by clients. Without static type systems, the safety of a programming language has to be provided solely by dynamic checks. ? Documentation: Type declarations constitute a form of documentation giving useful hints

1.6 Component Engineering Approaches

13

about the functionality of an abstraction and its usage. In contrast to documentation in a natural or semi-formal language, this form of documentation can not become outdated as the program evolves, since with every run of the compiler, the type checker will verify its conformance with the implementation. The role of types as machine-checkable documentation is particularly important at the level of module interface descriptions, but can also be useful when reading programs. ? Ef?ciency: Better understanding about what kind of values are handled by a program term helps compilers to produce more specialized code with better runtime performance. In particular, safe languages allow elimination of many dynamic checks, which would otherwise be needed to guarantee safety at least at runtime. Since each of these points is extraordinarily important for the development of software components, static typing is indispensable for component-oriented programming. For this reason, the following chapters have a strong focus on statically typed programming languages in which the manufacturing, the composition, and the evolution of software components is subject to a static type system.

1.6 Component Engineering Approaches
Today, component-oriented programming is mostly carried out in the context of mainstream object-oriented programming languages. Component programming practice relies on component models that are based on component frameworks and meta-programming technology.

1.6.1 Frameworks
Component frameworks are software entities that establish an environment in which components that conform to a certain standard can be instantiated and in which those instances can be plugged into the system. A component framework de?nes rules and interfaces that govern the interaction of component instances that are plugged into the framework. It also enforces architectural principles by forcing component instances to perform certain tasks via mechanisms that are under the control of the framework itself. Some component frameworks are higher-order; they apply this concept hierarchically such that environments established by a component framework are themselves component instances that can be plugged into other component frameworks. The various industrial component models like the Corba Component Model [83], COM [172], JavaBeans [190], and Enterprise Java Beans [191], all have similar design goals and provide, to some degree, similar functionality,

14

Extensible Component-Based Software

but they follow quite different design philosophies. What they have in common is that the implementation of these models is based on technology that goes beyond the infrastructure provided by mainstream object-oriented languages. Such models typically provide a class framework for modeling components and component interactions together with an informally speci?ed set of implementation rules. Component instances are often composed dynamically using meta-programming technology like introspection and re?ection. Some component models, like EJB, even require that contracts are speci?ed completely outside of the programming language, typically using an XML-based approach. These implementation issues all defeat an effective usage of a programming language as a means to guarantee statically the integrity of component systems and their conformance to the implementation standards required by the framework. In particular, there is often not even a way to statically check if the various software artifacts speci?ed in different formalisms are consistent and hence “?t” to each other.

1.6.2 Extensibility
In practice, extensibility is often either achieved by relying on design patterns or by applying meta-programming. For design pattern-based approaches it is necessary to plan extensibility ahead by rigorously deploying suitable patterns that are typically derived from the AbstractFactory pattern [74]. Frameworks in which extensibility is mainly achieved by design patterns that are based on object composition and forwarding are called data-driven frameworks, frameworks where extensions are derived by inheritance and overriding are called architecture-driven frameworks. As opposed to design patterns, meta-programming technology provides ways to extend systems without necessarily planning extensibility ahead. Metaprogramming has many different variations. As an overview, we will only mention a few approaches here. Generative programming [55] aims at transformational approaches to the construction of software. Approaches based on generative programming work best in two areas: They help to produce a larger number of similar components and they can be used to enhance composed systems. In a world of deployable components, both cases must be handled with care to ensure that component boundaries are respected and possible interferences with cooperating components are avoided. No matter if generative programming techniques are applied to source code or some intermediate bytecodes, they most often do not come with any safety guarantees. Aspect-oriented programming techniques [103] make it possible to modularize crosscutting aspects of a system, facilitating the separation of different concerns. A system consisting of various “aspect slices” is assembled by an aspect weaver which merges the fragments into a whole. In the context of aspect-

1.7 Overview

15

oriented programming, a system is typically extended by adding new aspects. But with the current aspect-oriented programming technology it is relatively dif?cult to set up strong component boundaries which enforce encapsulation and which specify context dependencies explicitly. As a consequence, aspects often do not qualify as generic abstractions which can be deployed in arbitrary contexts, even though they are modular units of software [104]. Systems that have to be available constantly and cannot be terminated for introducing extensions are typically evolved using dynamic meta-programming technology like re?ection. Often such approaches are based on special middleware which explicitly supports the dynamic evolution of systems. Extensibility mechanisms based on meta-programming allow systems to be extended in a ?exible and non-invasive manner, but they rarely give safety or consistency guarantees at compile-time. Since many static meta-programming approaches are mostly source code-based, they are not very suitable for today’s component technology practice which stresses the binary deployment of software components. Nevertheless, with a stronger focus on static safety and a tighter language integration, meta-programming technologies like aspect-oriented programming or multi-stage programming [197] are likely to provide helpful tools for developing extensible software in the future.

1.7 Overview
1.7.1 Scope
This thesis investigates how the development of extensible, component-based software can be facilitated with programming languages that support extensible component abstractions explicitly. The work was driven by the belief that only on the programing language level it is possible to ?nd mechanisms that meet all the requirements listed in Section 1.4. The work puts special emphasis on safety and consistency, promoting solutions in strongly typed programming languages.

1.7.2 Contributions and Outline
Chapter 2. The main part of the dissertation is split up into ?ve chapters. The role of Chapter 2 is two-fold: ? It explains the notion of software components and component composition in a formal way in the context of a class-based object-oriented language. The theoretical model captures typical principles like explicit context dependencies, late composition, and strong encapsulation. It motivates how types help to manufacture and compose components safely.

16

Extensible Component-Based Software ? It illustrates the vision of truly extensible software components by introducing a small set of composition primitives for dynamically building, composing, and extending software components in a concise and safe manner.

Chapter 3. Chapter 3 applies the essential ideas of the theoretical model in the design of the programming language Keris. Keris extends the programming language Java with an expressive system of extensible modules. The main contributions of this work are: ? A practical module system that combines the bene?ts of classical module systems for imperative languages with the advantages of modern component-oriented formalisms. In particular, modules are reusable, generic software components that can be linked with different cooperating modules without the need for resolving context dependencies by hand (wiring inference). ? A module composition scheme based on aggregation that makes the static architecture of a system explicit, and ? A type-safe mechanism for extending atomic modules as well as fully linked systems statically by replacing selected subsystems with compatible versions without the need to re-link the full system. The extensibility mechanism is non-invasive; i.e. it preserves the original version and does not require access to source code. The overall design of the language was guided by the aim to develop a pragmatic, implementable, and conservative extension of Java which supports software development according to the open/closed principle: Systems written in Keris are closed in the sense that they can be executed, but they are open for unanticipated extensions that add, re?ne or replace modules or whole subsystems without planning extensibility ahead. Chapter 3 explains the design of Keris, presents application scenarios, shows how the language is implemented, and investigates how ef?cient this implementation is. Furthermore, it discusses the module system of Keris in relation to other module systems. This discussion is based on a general module system taxonomy. Chapter 4. Chapter 4 presents a case study for evaluating the module system of Keris. It focuses on the development of extensible compilers. Regarding extensibility, compilers are interesting for various reasons: ? Programming languages are often extended by different people for different, often domain-speci?c purposes. Implementing compilers for such extended languages from scratch is a tedious and quite redundant process. If compilers would be extensible systems, the implementation of specialized programming languages would be easier and more effective.

1.7 Overview

17

? Compilers are language processors; extending such systems requires that language constructs as well as operations upon language constructs are extensible. It is a well-known technical problem to facilitate extensions in both dimensions. ? Compilers are relatively complex systems with many recursively dependent datatypes and components. It is a challenge to extend such tightly interconnected systems consistently. Chapter 4 compares an extensible Java compiler implemented using mainstream object-oriented language features with one that was written in Keris. It shows how extensible modules can be used in practice to develop extensible systems safely and ef?ciently. Chapter 5. The last chapter concludes the thesis with a presentation of related work, a summary of the problems and solutions discussed in the dissertation, as well as some notes about possible future work.

Chapter 2

A Formal Model for Extensible Software Components

This chapter introduces basic concepts of software component technology and component-oriented programming by discussing details of a simple but expressive theoretical component model. This component model is designed to support the implementation and evolution of lightweight, extensible components in object-oriented programming languages. The model is expressed as a small component-oriented programming language featuring dynamic component manufacturing, composition, and extension in a type-safe way through a minimal set of component specialization primitives. Furthermore, there is support for principles like explicit context dependencies, late composition, unanticipated extensibility, and strong encapsulation of component services. In contrast to other approaches, the services which components provide and require do not have to be linked explicitly. Instead, components are composed using high-level composition operators which implicitly link services by inferring the wiring. Finally, a formalization of the component model is presented which extends Featherweight Java [95, 164], a typed core calculus modeling the essential features of Java. The remainder of this chapter is organized as follows. Section 2.1 discusses the implications of component-based software development, emphasizing, in particular, the importance of software adaptability, extensibility, and software evolution in general. Section 2.2 introduces the component model step by step, presenting the various component specialization primitives and motivating their application. A formalization of the model is described in Section 2.3 in form of a core component calculus. We present a type system and prove that this system is sound with respect to the given operational semantics.

20

A Formal Model for Extensible Software Components

2.1 Motivation
Before presenting details we motivate speci?c design principles of the component model. The main features of the model include: ? Components are ?rst-class core language abstractions, ? Composition operators enable coarse-grained component composition, ? Components can be manufactured and composed dynamically, ? Components are extensible, promoting advanced component reuse, adaptability, and evolution. Furthermore, the model expresses many principles common among componentoriented formalisms, like explicit context dependencies (external linking), cyclic component linking, and strong encapsulation of component services. Component manufacturing, composition, and specialization are type-safe. The type system supports subtype polymorphism for components and component instances.

2.1.1 Language Integration
Currently, component-based programming relies on mainstream object-oriented programming languages. Object-oriented languages seem to promote component-based programming well: They support encapsulation of state and behavior, inheritance and overriding enable extensibility, and subtype polymorphism and late binding make it possible to ?exibly reuse objects and classes. Unfortunately, object-oriented techniques alone are not powerful enough to provide ?exible and type-safe solutions for component composition and evolution. Therefore, industrial component models like CORBA [83], COM [172], and JavaBeans [190] rely on additional concepts, namely component frameworks and meta-programming. They provide a class framework for modeling components and component interactions together with an informal set of implementation rules. Components are composed using meta-programming technology, e.g. re?ection. This ad-hoc approach yields a dynamic and ?exible composition mechanism, but often does not guarantee static type security. Furthermore, the degree of extensibility depends on the framework or the meta-programming tools. In general, it has to be planned ahead, for instance by using suitable design patterns typically derived from the AbstractFactory pattern [74]. This lack of support for unanticipated extensibility hinders a smooth evolution process substantially. In [5], Aldrich and Chambers point out another important issue. They observe that in general, implementation languages are only loosely coupled to architectural descriptions. As a consequence, speci?cations of software architectures [159, 184] formally expressed in architecture description languages [130] are often quite different from the actual object-oriented implementations. This makes it dif?cult to trace architectural properties in the implementation, on

2.1 Motivation

21

which basis it would be possible to verify that an implementation is consistent with the corresponding architectural description. Related to this issue is the critique of Achermann et.al. that object-oriented source code exposes class hierarchies but not object interactions [2]. This lack of explicit architecture makes it dif?cult to adapt an application to new requirements since the code that plugs objects together is distributed among the objects themselves and therefore dif?cult to overlook. For these reasons, various proposals have recently been put forward to integrate concepts from architecture description languages into object-oriented programming languages [182, 187, 5]. These so-called component-oriented programming languages offer linguistic facilities for programming software components, for de?ning component interactions, and for composing software from components. Their promise is to do that in an effective and type-safe way, ruling out illegal interaction patterns.

2.1.2 Coarse-Grained Composition
Existing proposals for component abstractions on the programming language level like ComponentJ [182] and ArchJava [5] directly adopt concepts and principles of architecture description languages. They provide constructs for manufacturing components with required and provided services. A service associates a port name with a type. Components are composed by linking ports with explicit plug instructions. The type system ensures that all ports are linked and that links are established only between compatible ports or service providers. This approach does not scale, since for linking a component with n services, n explicit plug instructions have to be issued specifying the wiring of the component. For large-scale components with a lot of services involved, linking components in such a way is tedious and error-prone. Furthermore, the sequence of plug instructions rather obscures the architecture of the system instead of making it explicit. Therefore, McDirmid, Flatt, and Hsieh argue that component systems should offer the possibility to connect many services at once [129]. We address this requirement by simplifying the interface of components and by providing means to infer the wiring of components. Components can be composed with simple operators and without explicitly plugging ports. We also support incremental linking; i.e. we allow that components get only partially linked. For instance, components can be sent around in a distributed system and only the services available at a speci?c location get linked until we have a fully linked component that ?nally can be instantiated.

2.1.3 Dynamic Manufacturing and Composition
Software component technology distinguishes two main tasks: component manufacturing and component composition. These two tasks are often presented as

22

A Formal Model for Extensible Software Components

separate steps being performed one after the other. But in practice, both tasks coincide when new components are built by composing other components. This form of component manufacturing is called hierarchical component composition. Often it is assumed that component manufacturing is done statically before component composition takes place. Component composition itself cannot always be performed statically, since there are cases where the concrete components are only known at runtime. For this reason, component-based systems are often required to support some form of dynamic linking. This observation implies that we also have to be able to manufacture software components dynamically, since component linking and manufacturing coincide in hierarchical component compositions. Thus, it makes no sense to assume that both manufacturing and composition are atomic tasks that are performed consecutively. In highly dynamic systems, component manufacturing and composition is rather an interleaved process in which components are created and linked incrementally.

2.1.4 Extensibility
When using components from external vendors, it is quite unlikely that the interfaces of these third-party components ?t to the required interfaces off-the-shelf. Therefore, it is often necessary to adapt components before they can be deployed in a particular system [90, 150]. As Section 2.1.3 pointed out already, there are applications where components are only supplied at runtime. In this case it is indispensable that components can even be adapted dynamically on-the-?y. In a prototype-based system, new entities can only be created by cloning an existing entity and by modifying the cloned entity afterwards. Our component model is prototype-based in this sense: new components can only be created by specializing an already existing component. As a consequence, we can derive two different components from a single base component. By doing this, we factor out potentially reusable pieces, avoiding duplicated programming effort. In addition, this technique supports software evolution. Software evolution includes the maintenance and extension of component features and interfaces. Supporting software evolution is important, since components and component systems are architectural building blocks and as such subject to continuous changes. Extensibility is not only required for smoothly evolving single software components. It is even more desired for enabling the development of families of software applications and product-lines in general. Traditionally, components are static black-boxes emphasizing encapsulation over extensibility. Features can be added to components only by creating new components which forward all existing services to the old versions in addition to the new services they provide. This is a cumbersome and error-prone procedure that duplicates programming efforts and complicates maintenance.

2.2 Prototype-Based Components

23

2.2 Prototype-Based Components
We now introduce step by step our component abstractions in the context of a small, statically typed, object-oriented Java-like base language. Our component model relies on a nominal type system [164] of the base language. In nominal type systems, two types with the same structure but a different name are considered to be different, as opposed to structural type systems that match the structure and not the name. The model does not require other base language features like inheritance or even classes, even though we present it here in a class-based context. Therefore it should be straightforward to add similar component abstractions to other object-oriented languages with nominal object types.

2.2.1 Components and Component Instances
In our model, a component is a unit of computation that can be accessed through a well-de?ned interface. A component is a ?rst-class citizen. Its interface speci?es the services it provides to allow other components to interact with it. The interface also speci?es the services a component requires from other components to be able to provide the own services. The component model is prototype-based; i.e. the only way to create a new component is by specializing an already existing prototypical component. For bootstrapping purposes, there is a single prede?ned component that does not provide or require any services. This empty component is denoted by the keyword component. We strictly distinguish between components and component instances. A component describes a template for possibly multiple component instances. It is the component instances that provide the actual services. Services are described by object types, e.g. types de?ned by classes or interfaces. Objects act as service providers. They usually get created at component instantiation time. Therefore, components can be seen as organizational units with well-de?ned interfaces that structure object interdependencies. Components have neither a unique identity, nor an observable state. They come to life through objects at the time they get instantiated. In the remainder of this section we introduce prototype-based components by example. We derive some simple software components that could be used, for instance, in online retail stores to manage stock and clients.

2.2.2 Service Provision
We start by manufacturing a software component that provides access to a customer database. We want every customer to have a unique client number. A service that maps customer names to client numbers could be described by the following interface de?nition:

24

A Formal Model for Extensible Software Components
interface CustomerIDs { int lookupId(String name); }

The CustomerIDs interface consists of a single method lookupId. Given a customer’s name, this method tries to ?nd the corresponding client number. If there is no client number yet for this customer, a new number will be issued and returned by lookupId. Imagine we have the following implementation of the CustomerIDs interface:
class MyCustomerIDs implements CustomerIDs { MyCustomerIDs() { ... } int lookupId(String name) { ... } ... }

With this implementation we are able to manufacture a software component that provides a CustomerIDs service. Since we can only create new components by specializing existing ones, we have to take the empty component as a prototype and specialize it such that it provides a CustomerIDs service. In our calculus, this is done with the provides primitive:
c0 = component provides CustomerIDs as This with new MyCustomerIDs();

The clause d provides C as x with e returns a new component that specializes component d by providing some possibly new services C . These services are implemented by an object speci?ed with expression e. Note that we are extending a component here. Therefore, expression e only gets evaluated at component instantiation time. x is a variable that gets bound to the own component instance. In object-oriented languages this self reference corresponds to variable this or self referring to the own object. Only expression e is in the scope of x. Typically, expression e refers to other services of the own component instance via x. We use a graphical notation to illustrate the structure of components. Figure 2.1 gives an overview. Here, a component is represented by a box. The gray part corresponds to the prototype of the component, the white part de?nes the specialization. In our graphical notation, services are symbolized by diamonds. Objects are black dots. An arrow from a service to an object expresses that this object implements the service. We also have outlined arrows that depict service dependencies. These dependencies are found by inspecting the code; they are not explicit in the calculus. If an object refers to other services, for instance via the self reference, then every such dependency is speci?ed with an outlined arrow. Figure 2.2 illustrates the structure of the previously de?ned component c0.

2.2 Prototype-Based Components

25

Service access Service
pt

Prototype

A

B

Refinement

Object

Service implementation

Figure 2.1: Schematic component notation.

2.2.3 Component Instantiation
We already pointed out that components have to be instantiated before services can be accessed. In our component calculus, a component gets instantiated with the new primitive.
i0 = new c0;

The services of a component instance like i0 get accessed via the service selection operator ::. The expression e :: C selects a service C from component instance e. C is a type name that identi?es a service and at the same time describes the interface of the service. Other component models refer to services via named ports. In these models it is possible to have two distinct ports with the same interface type but different port names. In programming languages with nominal type systems like Java [82] or C# [86], types do not only de?ne structural object properties like available methods or ?elds. They also stand for semantic speci?cations [38], and as such, they are well-suited for specifying roles. In those type systems it is possible to have two distinct types with the same interface description but different type names. Therefore, it is no restriction to describe a service only by its type without having a port name in addition. This simpli?es the de?nition of components and the service access in general signi?cantly. It also acts as a standardization of port names. One only has to know a service’s type in order to access it from a component instance. It is not necessary to lookup the port name in the component speci?cation. We will see later in Section 2.2.7 that this standardization of component port names has another advantage: it promotes automatic composition mechanisms. Of course, in the few cases where two ports

26 Component c0:

A Formal Model for Extensible Software Components Component c1:
c0

Component c2:
c1

A

A B

A B

A = CustomerIDs B = CustomerDB Figure 2.2: Component evolution.

could share a type, we have to create new type names and in the worst case use wrappers to adapt existing objects. Here is an example demonstrating the usage of the component i0. In this example we call the lookupId method of the CustomerIDs service provided by component instance i0. The service selection operator :: and the . operator are both left-associative and both operators have the same precedence.
i0::CustomerIDs.lookupId("John Smith");

2.2.4 Component Specialization
Now imagine the requirements for our customer administration component c0 are changing and we also need the capability to store customer names and addresses. We can describe this new database service with the following interface:
interface CustomerDB { void enter(String name, String address); String lookupName(int id); String lookupAddr(int id); }

Method enter stores a new address in the database. Whenever a new customer is entered, a new client number will automatically be assigned to this new customer. The methods lookupName and lookupAddr ?nd a name or address for a given client number. The following class implements CustomerDB. It depends on a component instance that provides a CustomerIDs service. This component instance is passed as a parameter to the constructor. Following [182], we use the notation [S1 , ..., Sn ] to specify the type for component instances supporting at least the services S1 to Sn .

2.2 Prototype-Based Components
class MyCustomerDB implements CustomerDB { [CustomerIDs] This; MyCustomerDB([CustomerIDs] This) { this.This = This; } ... This::CustomerIDs.lookupId(name) ... }

27

We already mentioned that prototype-based components offer a smooth component evolution mechanism. For creating an extended version of a component, we just have to interpret the old component as a prototype. In our example, the new specialized component evolves out of the old one simply by an application of the provides primitive. The following code specializes component c0 by additionally providing the service CustomerDB.
c1 = c0 provides CustomerDB as This with new MyCustomerDB(This);

The provides primitive can also be used to specialize a component by de?ning a new service implementation for an already provided service. In this case we override the old implementation. Here is the de?nition of component c2 that specializes c1 by using, for instance, a more ef?cient client numbering service.
c2 = c1 provides CustomerIDs as This with new EfficientCustomerIDs();

The service implementation for CustomerDB, speci?ed already in the prototype of c2, now automatically refers to this new numbering service implementation. A graphical illustration of components c1 and c2 can be found in Figure 2.2.

2.2.5 Service Forwarding
Until now, we are only able to develop new components by adding new services or by overriding existing service implementations of a prototypical component. Every service we add gets exported automatically; i.e. it can be accessed from outside the component. This “white-box approach” is necessary to keep the component extensible, because it allows one to override service implementations and to add new service implementations that refer to already existing services. But often we do not want to publish internally used services. Being able to hide internal interfaces is an important feature of component-oriented programming. Our component calculus supports this form of encapsulation with the component projection operator forwards. The clause d forwards C as x to e extends component prototype d with the services C . The new component forwards accesses of these services to the component instance e. Expression e can refer to other

28

A Formal Model for Extensible Software Components Component c3:
A = CustomerIDs B = CustomerDB
c2

A B B

Figure 2.3: Service forwarding.

services of the own component instance via the self reference x. This primitive is primarily used for composing components hierarchically. In the following example it is speci?cally used to hide services and service interconnections. Thus, it turns a “white-box” into a “black-box” by wrapping the original component.
c3 = component forwards CustomerDB as This to new c2;

In this example we create a new component c 3 which only provides a single service CustomerDB by forwarding calls to a component instance of c2. Thus, we hide the CustomerIDs service of component c2. We say, an instance of c2 is nested inside every instance of component c3. We call the hidden CustomerIDs service an internal service of component c3. An illustration of c3 instances can be found in Figure 2.3. Here, the instance of component c2 which is contained in c3 is depicted by a nested box. Service implementations are now arrows pointing from external services to internal services of nested component instances.

2.2.6 Service Abstraction
The previous sections showed how to evolve a component by incrementally adding new services either by a new service implementation or by forwarding services to a nested component instance. In both cases new services and service implementations were introduced at the same time. This approach does not cover the development of components that depend on services provided by other components. Furthermore, it does not even allow programmers to de?ne two services where service implementations depend mutually on each other, because services get introduced linearly, one after the other. We tackle both problems with a service abstraction facility. Before going into detail, we proceed by manufacturing a new component for handling orders of a shop. The service for placing orders is described by the following interface:

2.2 Prototype-Based Components
interface OrderDB { void order(int id, String article, int num); }

29

With method order, new orders can be placed. Orders consists of a client number, an article descriptor and the number of items to deliver. If possible, this method tries to execute the order immediately. Therefore it needs access to a stock database service speci?ed by the following interface:
interface StockDB { void enter(String article, int num); void remove(String article, int num); int available(String article); }

Method order checks if the articles are available. If this is the case, it removes them from the stock database and sends the articles to the customer’s address. Therefore, service implementations of OrderDB like MyOrderDB also need access to the CustomerDB service. Thus, the constructor of the following class expects a component instance providing StockDB and CustomerDB services.
class MyOrderDB implements OrderDB { [StockDB, CustomerDB] This; MyOrderDB([StockDB, CustomerDB] This) { this.This = This; } ... }

Since we do not want our order system component to already commit to a speci?c service implementation for the StockDB and the CustomerDB service, we have to factor out these two services. In order to make use of the component later, we then either have to provide the missing service implementations from outside at composition time, or we further specialize the component and provide service implementations from inside the component. In our component calculus, services are factored out with the service abstraction primitive requires . The requires primitive allows one to de?ne services that are required for implementing other services without the need for specifying a concrete service implementation. We make use of this abstraction facility in the following implementation of component d0 which requires two services CustomerDB and StockDB and provides a OrderDB service. Figure 2.4 contains an illustration of component d0.

30 Component d0:

A Formal Model for Extensible Software Components Component e0:
B = CustomerDB C = OrderDB D = StockDB

B

D

D

C

C

Figure 2.4: Service abstraction.

d0 = component requires CustomerDB requires StockDB provides OrderDB as This with new MyOrderDB(This);

The expression d requires C takes a prototypical component d and returns a specialized version with a service C that has to be provided before the component can be instantiated. Other service implementations can refer to this service, even though there is no implementation known yet. This is why in the example above, self reference This has type [CustomerDB, StockDB, OrderDB] and thus is a legal parameter for the constructor of MyOrderDB. Components have a type of the form (R1 , . . . , Rn ? P1 , . . . , Pm ) where R1 to Rn are services required by the component, and P1 to Pm are the provided services. Consequently, the type of component d0 is (CustomerDB, StockDB ? OrderDB). As already mentioned before, component d0 cannot be instantiated, since not all service provisions are resolved yet. We ?rst have to derive a new component that speci?es implementations for all required services before we can actually create component instances. We continue in our example by de?ning a new component e0 that provides an implementation for a StockDB service.

e0 = component requires OrderDB provides StockDB as This with new MyStockDB(This);

The implementation of service StockDB makes use of an externally supplied OrderDB service. This is, because in cases where new stock arrives and orders are still pending, it would trigger the process of sending out the articles. The type of component e0 is (OrderDB ? StockDB).

2.2 Prototype-Based Components Component f1:
d0

31 Component f2:
d0

B
e0

B

B = CustomerDB C = OrderDB D = StockDB

D

D

D

C

C

C

e0

Figure 2.5: Component composition.

2.2.7 Composition of Components
In the previous section we de?ned two components d0 and e0 that mutually refer to each other; i.e. the service provided by one component is required by the other one. We would now like to link these two components together yielding a component which only requires a CustomerDB service and provides both a OrderDB and a StockDB service. The simplest way to achieve this is to specialize component d0 with an implementation for service StockDB. This service is provided by a specialized version of e0 that refers back to the OrderDB service provided by the enclosing d0 prototype.
f0 = d0 provides StockDB as This with (new (e0 provides OrderDB as Me with This::OrderDB))::StockDB

This technique does not work for components where more than two services depend mutual recursively on each other. For such cases we have to use the forwards primitive in order to link the components together. A graphical illustration of the resulting component f1 can be found in Figure 2.5.
f1 = d0 forwards StockDB as This to new (e0 provides OrderDB as Me with This::OrderDB)

The previously discussed composition schemes use service forwarding where the nested component instance refers back to services provided by the enclosing component being de?ned. Our component calculus offers an alternative to this rather complicated composition pattern. With the mixin operator it is possible to create a new component by mixing in the services provided by another component. The expression e mixin d specializes the prototypical component e with

32

A Formal Model for Extensible Software Components

component d ; i.e. e gets specialized by including all the services provided by component d . Services that are already present in e are automatically overridden by the corresponding services of d . This operation identi?es the self references of both components e and d by binding it to the resulting merged component. The resulting component requires services that are either required by e or d and that are not provided by any of the two components. It provides all the services that are provided by either e or d . Thus, the following expression yields a component f2 of type (CustomerDB ? OrderDB, StockDB).
f2 = d0 mixin e0

When using such a mixin-based composition scheme, one has to be aware that for the expression above, all services e0 provides get mixed in, no matter what static type e0 has in this context. Thus, we might accidentally override services provided by component d0. Sometimes this is desired, for instance, when we want to express that e0 has got the more recent or more trustworthy service implementations than d0. For cases where we want to de?ne explicitly what services to override, we have to use a forwarding-based composition scheme instead. For instance, we could write d0 forwards StockDB as This to new (e0 forwards OrderDB as Me to This). All three components de?ned in this section are equivalent in the sense that they provide and require the same services and that services are implemented by the same objects. Though, Figure 2.5 reveals that the internal structure of components manufactured using the forwarding and the mixin technique are quite different. This is an indication that they possibly behave differently when it comes to specialize both components. In the given example, this is not the case. But one might imagine a bigger nested component instance where overriding a service of the enclosing component does not have any effect on the formerly forwarded service of the nested component, while it would have an effect on the mixin-based approach. We ?nish this section by manufacturing a component that permits access to customer related services only; i.e. CustomerDB and OrderDB. We do this by ?rst linking together the customer management component c2 and the stock management component f2. The linked component c2 mixin f2 provides all the various services introduced in this section. Since we want to restrict the access to customer related services, we have to project the resulting component to a new component g0 offering only the desired services.
g0 = component forwards CustomerDB, OrderDB as This to new (c2 mixin f2)

g0 has type ( ? CustomerDB, OrderDB); thus, it is possible to instantiate this component. The structure of an instance of our ?nal component g0 is presented in

2.2 Prototype-Based Components

33

c2

A

B

B

f2

D

C

C

Figure 2.6: The ?nal component g0.

Figure 2.6. Leaving out some intermediate steps, we could have composed g0 out of three essential components: c2 which administers clients, d0 which handles orders, and e0 which manages the stock.
g0 = component forwards CustomerDB, OrderDB as This to new (c2 mixin d0 mixin e0)

This short expression demonstrates how concise component manufacturing and linking is in the presented model if high-level composition operators like forwards and mixin are used. Furthermore the expression indicates how components are typically deployed. The sub-expression c2 mixin d0 mixin e01 ?rst links components c2, d0, and e0, yielding a single extensible component. This component exposes internal interfaces. We might want that for instance to use this component as a basis for further specializations. But before instantiating (or even selling) it, the internals should be hidden by wrapping the component in a black-box only offering speci?c functionality with restricted support for extensibility. In the example above, this is achieved using the component projection primitive forwards.

1 Please

note that the mixin operator is associative.

34

A Formal Model for Extensible Software Components

2.3 Component Calculus
In this section we present a formalization of our prototype-based component model for a functional subset of Java. Our calculus is built on top of Featherweight Java (FJ) [95]. We omit type casts from the original calculus since type casts are irrelevant for our application and complicate the formal treatment unnecessarily.

2.3.1 Syntax
The syntax of the calculus is presented in Figure 2.7. Like in FJ, a program consists of a collection of class declarations plus an expression to be evaluated. The syntax of classes, constructors, and methods is identical to FJ. We only extend the set of expressions with the primitives introduced in Section 2.2. In particular, Program
P = L;e

program class declaration constructor declaration method declaration variable ?eld selection method invocation object creation empty component service abstraction service implementation component projection component mixin component instantiation service selection object type component type component instance type
Figure 2.7: Syntax.

Class
L = class C extends C { T f ; K ; M }

Constructor
K = C(T f ) { super(f ); this.f = f ; }

Method
M = T m(T x) { return e; } x e.f e.m(e) new C(e) component e requires C e provides C as x with e e forwards C as x to e e mixin e new e e :: C

Expressions
e = | | | | | | | | | |

Types
T = C | C ? C | [C]

2.3 Component Calculus

35

we add an empty component, a service abstraction and implementation primitive, a component projection primitive as well as a component mixin operator. In addition, we have a construct for instantiating components and a service selection operator for accessing services from a component instance. In the calculus, a service is characterized by a class name. In contrast to the presentation in Section 2.2.2, the calculus only supports a provides primitive that introduces a single service. This is no restriction since we can easily model the former semantics by using the more general forwards construct in combination with a nested component that implements several services with a single object. For instance, we could encode the component de?ned by the expression component provides C, D as This with new Impl(This) in the following way:
component forwards C, D as This to createNested(new Impl(This))

This code relies on a function createNested which could have the following implementation:
[C, D] createNested(Impl impl) { return new (component provides C as This with impl provides D as This with impl); }

FJ’s types only consist of class names. For simplicity, even Java’s interface types are left out. For working with components and component instances we need syntactical forms for expressing component types and component instance types. Please note that compared to the explanations in Section 2.2.6 on page 28, we use a slightly simpli?ed syntax for component types without enclosing parenthesis. As in FJ, we write T as a shortcut for T1 , . . . , Tn . We use similar shorthands for sequences like C, f , e, etc. as well as for pairs of sequences like T f . Such a pair of sequences is a shorthand for T1 f1 , . . . , Tn fn . We assume that sequences of ?eld declarations, parameter names, and method declarations do not contain duplicate names. Furthermore, the service implementation and the component projection operators always introduce fresh names for their self reference variable. For the presentation of the operational semantics in the next section we assume to apply alpha-renaming whenever necessary to avoid name capture.

2.3.2 Semantics
The semantics of our calculus are formalized in Figure 2.8 and Figure 2.9 as a small-step operational semantics. The reduction relation has the form e → e

36

A Formal Model for Extensible Software Components

(R-Fld)

?elds(C) = T f
new C(e).fi
(R-Inv)

→ ei

(R-Serv)

service(new e, e, C) = e
new e :: C

→ e

mbody(m, C) = (x, e0 )
new C(e).m(d) → [d/x, new C(e)/this] e0

(R-Req) e requires C (R-MixP)

→ e

(R-MixC) e mixin component → e

e mixin (e0 provides C as x with d) → (e mixin e0 ) provides C as x with d e mixin (e0 forwards C as x to d) → (e mixin e0 ) forwards C as x to d

(R-MixF)

Figure 2.8: Operational semantics.

(RC-Fld)

e → e e.f → e .f

(RC-InvR)

e → e e.m(d) → e .m(d)

(RC-InvA)

ei → ei d.m(. . ., ei , . . .) → d.m(. . ., ei , . . .) ei → ei
new C(. . ., ei , . . .) → new C(. . ., ei , . . .)
(RC-Serv)

(RC-NewA)

(RC-Inst)

e → e
new e → new e
(RC-Req)

e → e e :: C → e :: C

e → e e requires C → e requires C e → e

(RC-Prv)

e provides C as x with d → e provides C as x with d e → e e forwards C as x to d → e forwards C as x to d e → e
(RC-MixR)

(RC-Fwd)

(RC-MixL)

d → d e mixin d → e mixin d

e mixin d → e mixin d

Figure 2.9: Congruence rules for the operational semantics.

2.3 Component Calculus Field lookup ?elds(Object) = ?
CT (C) = class C extends D { T f ; K ; M }

37

?elds(D) = U g

?elds(C) = U g, T f Method body lookup
CT (C) = class C extends D { U f ; K ; M } T m(T x) { return e; } ∈ M

mbody(m, C) = (x, e)
CT (C) = class C extends D { T f ; K ; M } m not de?ned in M

mbody(m, C) = mbody(m, D) Service lookup service(e, e0 provides C as x with d, C) = [e/x] d service(e, e0 forwards C as x to d, Ci ) = [e/x] d :: Ci
D=C

service(e, e0 provides C as x with d, D) = service(e, e0 , D)
D∈C

service(e, e0 forwards C as x to d, D) = service(e, e0 , D)
Figure 2.10: Auxiliary de?nitions for evaluation.

expressing that expression e evaluates to expression e in a single step. Figure 2.8 speci?es the basic evaluation rules, Figure 2.9 de?nes the evaluation contexts. We adopt all reduction rules from FJ and de?ne various new rules for the new syntactical constructs. Service abstractions simply reduce to the prototype component, so they do not have any computational effect. The semantics of mixins are described by three reduction rules, depending on the form of the right operand. Mixing in the empty component results in the same component. For service implementations and component projections the prototype of the right operand is mixed into the left operand and the specialization itself is applied to that new component. Thus, the two operands are incrementally combined into a single component where concrete (i.e. non-abstract) service de?nitions of the right operand override de?nitions of the left operand. The reduction rule for service selection operations relies on an auxiliary function service(e , e, C) which searches the component de?nition e of component instance e for a service C . Note that the service lookup performed by service(e , e, C) is only de?ned on service implementation and component pro-

38

A Formal Model for Extensible Software Components Well-formed types
Object wf

CT (C) = class C extends D {. . .} C wf C ∩C =? C wf [C] wf

C, C wf

C ? C wf

Subtyping
C <: C C <: D D <: E CT (C) = class C extends D {. . .} C <: D D?C [C] <: [D] C <: E C?D D ?C

C ? C <: D ? D

Figure 2.11: Well-formed types and subtyping.

jection terms. Thus, even for cases where e provides a service C , evaluation of service(e , e, C) may not be well-de?ned if e has not been evaluated far enough. In such a case, we ?rst have to apply rules (RC-Inst) and (RC-Serv) to further evaluate the component before making use of the actual service selection rule (RServ). An overview of all auxiliary de?nitions used by the operational semantics of Figure 2.8 are given in Figure 2.10.

2.3.3 Type System
There are three different forms of types: object types, component types and component instance types. An object type is simply denoted by a class name C . An object type is well-formed if the class name appears in the domain of the class table CT . The class table is a mapping from class names to class declarations. As in the presentation of FJ, we assume that we have a ?xed prede?ned class table to simplify the notation. Otherwise we would have to parameterize all typing rules with CT . It is assumed that CT satis?es some sanity conditions: Object ∈ dom(CT ), all types appearing explicitly in CT are well-formed, and there are no cycles in the subtype relation induced by CT . Component types have the form C ? C where C speci?es the services required by the component and C speci?es the provided services. Services are described by object types. A component type is only well-formed if the sets of the provided and required services are disjoint. [C] types a component instance that provides the services C . Figure 2.11 summarizes the well-formedness criteria on types. Method types cannot be written explicitly. In the type system, we use the notation T → T for a method with the argument types T and the result type

2.3 Component Calculus

39

Expression typing
(T-Var) Γ

x : Γ (x) d:C

(T-Fld)

Γ

e:C Γ Γ

?elds(C) = T f
e.fi : Ti e:U U <: T

(T-Inv)

Γ

mtype(m, C) = T → T
Γ d.m(e) : T Γ e:U

(T-New)

?elds(C) = T f
Γ e:??C
new e : [C]
(T-Com) Γ

U <: T

new C(e) : C
(T-Serv)

(T-Inst)

Γ Γ

Γ Γ

e : [C] e :: Ci : Ci

component : ? ? ?

(T-Mix)

Γ Γ

e:C?C

Γ

d:D?D

e mixin d : (C ∪ D)\(C ∪ D ) ? C ∪ D C wf Γ Γ e:D?D

(T-Req)

e requires C : D ∪ C ? D \C Γ , x : [D ∪ D ∪ C] d:B B <: C

(T-Prv)

C wf

Γ Γ

e:D?D

e provides C as x with d : D\C ? D ∪ C e:D?D Γ Γ , x : [D ∪ D ∪ C] d : [B] C?B

(T-Fwd)

C wf

Γ

e forwards C as x to d : D\C ? D ∪ C

Method and class typing
T wf T wf x : T , this : C e:U U <: T

(T-Meth) CT (C) = class C extends D {. . .}

override(m, D, T → T )

T m(T x) { return e; } ok in C D wf
(T-Class)

T wf

K = C(U g, T f ) { super(g); this.f = f ; }

?elds(D) = U g

M ok in C

class C extends D {T f ; K ; M } ok

Figure 2.12: Type system.

40

A Formal Model for Extensible Software Components Method type lookup

CT (C) = class C extends D { U f ; K ; M }

T m(T x) { return e; } ∈ M

mtype(m, C) = T → T
CT (C) = class C extends D { T f ; K ; M } m not de?ned in M

mtype(m, C) = mtype(m, D) Valid method overriding mtype(m, C) = U → U implies U = T and U = T override(m, C, T → T )
Figure 2.13: Auxiliary de?nitions for typing.

T . Note that depending on the context, T denotes either a sequence of types (T1 , . . . , Tn ) or a set of types {T1 , . . . , Tn }. We use shorthands of the form C ∪ D for expressing C ∪ {D}.

Figure 2.11 also de?nes a subtype relation T <: T between two types T and T . Subtyping of object types is identical to subtyping in FJ. A component instance type is a subtype of another component instance type if the services provided by the supertype constitute a subset of the subtype’s provided services. A component type τ1 = C ? C is a subtype of another component type τ2 = D ? D , if τ1 requires less and provides more services than τ2 ; i.e. C ? D and D ? C . This corresponds to the typical co/contravariant subtyping rule for function types [46] adopted already by related approaches to component subtyping [71, 182, 79]. Section 2.3.6 discusses an alternative subtyping rule which is more ?exible but also more complex. The type system is presented in Figure 2.12. There are three different typing judgment forms. The one for classes has the form “L ok” meaning that class declaration L is type correct. The judgment for method declarations has the form “M ok in C ,” expressing that the method declaration M typechecks as a declaration of class C . Both rules are directly taken from FJ. The judgment for expressions Γ e : T relates a type T to an expression e. Most typing rules for expressions are straightforward. (T-Prv) and (T-Fwd) are among the interesting rules. Here, the service provision expression is typed under an extended environment, including the self reference to the own component instance. We assume that the type of the self reference variable corresponds to a component instance type offering both, the services that are required and provided by the component being specialized. The auxiliary de?nitions used for typing ?eld and method selections as well as object creations are directly adopted from FJ and summarized in Figure 2.13.

2.3 Component Calculus

41

2.3.4 Type Soundness
The main purpose of a static type system is to prevent the occurrence of errors during the execution of a program. In the context of our language, errors would typically arise when unknown methods get invoked, or unde?ned names or component services are accessed. In this section we show that our type system from Figure 2.8 prevents such errors. Our reasoning follows the style of Wright and Felleisen [209]. It is based on a subject reduction theorem stating that for a welltyped term e which evaluate to a new term e , this new term e has to be welltyped as well, with a type that is a subtype of the original type. Unfortunately, this property does not hold for the type system and the semantics presented so far. This is due to the fact that our type system supports two different ways to abstract over services: explicitly via the requires primitive, and implicitly via subtyping. The following example exhibits the problems. Suppose we have the following class de?nition which refers to an arbitrary type A:
class C { A a; C(A a) { this.a = a; } (A ? C) foo((A ?) x) { return x provides C as This with new C(This::A); } }

Under the assumption that the identi?er a refers to an object of type A, evaluating the expression new C(a).foo(component) of component type A ? C will yield the term component provides C as This with new C(This::A). Obviously, this term is not well-typed anymore, since This has only type [C] and therefore does not provide the service A which is selected in the service implementation term new C(This::A). We could easily ?x the problem by making component subtyping invariant in the required part. This approach would not restrict the expressiveness, but require that programmers coerce components explicitly to the right type by issuing extra requires clauses. Such coercions could also be done automatically by an appropriate type system formalism which makes implicit introductions of required services explicit by inserting additional requires clauses during type assignment. Since we do not want to make sacri?ces regarding the subtype relation, we pursue a different approach. Instead of making component subtyping invariant in the required part, we give up the necessity to declare required services explicitly before accessing them in another service implementation. In such a setting, the type checker has to infer the requirements of service implementations. In

42

A Formal Model for Extensible Software Components

the type system, this is achieved by weakening the typing rules for provides and forwards terms in the following way:
(T-Prv’)

C wf Γ

Γ

e:D?D

Γ , x : [D ]

d:B

B <: C

e provides C as x with d : (D ∪ D )\(D ∪ C) ? D ∪ C Γ e:D?D Γ , x : [D ] d : [B] C?B

(T-Fwd’)

C wf Γ

e forwards C as x to d : (D ∪ D )\(D ∪ C) ? D ∪ C

In this weaker system, which uses the rules (T-Prv’) and (T-Fwd’), we allow that provides and forwards primitives introduce service abstractions in a nondeterministic way. We show type soundness for this weaker type system. As a consequence, the type system with the stronger typing rules, presented in Figure 2.12, is sound as well in the sense that term evaluation will not get stuck due to “symbol not found,” “unknown method,” or “unknown service” errors. Programs written by users are initially typed with the stronger typing rules. The stronger type system has the advantage that typings are deterministic. Furthermore, its design follows the principle that service abstractions have to be declared explicitly. Weakening the type system is only necessary for subject reduction to hold. We present the type soundness results for our weaker type system in the style of Wright and Felleisen [209]. The full proof can be found in Appendix A. Theorem 2.3.1 (Subject reduction) If all types in Γ are well-formed, Γ and e → e , then Γ e : T for some T <: T .
e:T

For a well-typed term which can be reduced to a second term, Theorem 2.3.1 states that this second term is also well-typed. Furthermore, the type of the second term is a subtype of the type of the ?rst term. In addition to that we can show that the evaluation of every well-typed term does not get stuck. To formalize this, Figure 2.14 introduces a term subset denoting values. A value is either a component, a component instance or an object. For component values we have three different constructors. One denotes the empty component, one adds a new service to an existing component value, and a third one adds services by forwarding them to another component instance. Note that during evaluation, service abstractions are eliminated in expressions with reduction rule (R-Req). Therefore, the de?nition of component values does not include the requires primitive. Theorem 2.3.2 states that every well-typed term is either a value or it can be reduced to another term. In other words, evaluation does not get stuck for welltyped terms. Theorem 2.3.2 (Progress) some e . If
e : T then e is either a value or e → e for

2.3 Component Calculus Values
v = c | new c | new C(v)

43

component component instance object

Component values
c = component | c provides C as x with e | c forwards C as x to e

Figure 2.14: Term values.

2.3.5 Instantiation Evaluation
The operational semantics presented in Figure 2.8 formalizes an evaluation strategy that does not allow for the reduction of service implementation expressions inside of component instances. At component instantiation time, in fact none of these terms get evaluated. A term specifying a service implementation, for example in provides or forwards primitives, only gets evaluated when the service is accessed via the :: operator. Evaluating a service implementation expression more than once does not cause any problems in our calculus, since we only have functional objects without any side-effects. In real-world systems, this form of lazy evaluation can be ef?ciently implemented using a memoization technique, so that for multiple accesses to the same service, the service implementation expression will be evaluated only once. We decided to have this restriction in our calculus for several reasons. First, it keeps the calculus simple. But lazy evaluation also constitutes a reasonable evaluation strategy for service implementations. A strict evaluation order would be dif?cult to de?ne. For instance, we could evaluate the service implementations in the order the component evolution primitives introduce a service. But this would be a completely arbitrary choice, since services can be introduced using the requires primitive in any order, not implying any dependencies. With any ?xed strict evaluation order one risks to access a not yet initialized service from the service implementation that is currently being evaluated. With a lazy service evaluation strategy one still faces this problem, but only for recursive service references. With our operational semantics, such recursive dependencies could possibly lead to in?nite computations. We avoided this problem in the examples of the previous sections by not accessing services of the own component instance in service provision expressions directly. Instead, objects that implement a service of a component access other services of the same component instance only at the time a method of the other service actually has to be called, which happens typically after the component got instantiated.

44

A Formal Model for Extensible Software Components

(S-Emb)

e → e d;D e e C∈D e0 provides C as x with e C\D = ? e0 forwards C as x to e e0 e0 e0 provides C as x with e e0 e0 e0 forwards C as x to e

(S-Prv)

[d/x] e → e d;D e0 provides C as x with e [d/x] e → e d;D e0 forwards C as x to e d;D ∪ C d ;D e0 provides C as x with e d;D ∪ C d;D e0 forwards C as x to e

(S-Fwd)

(SC-Prv)

(SC-Fwd)

Figure 2.15: Operational semantics for component instantiation.

In order to support any reasonable evaluation strategy2 for component instantiations, we could extend our operational semantics. We only have to replace rule (RC-Inst) of Figure 2.8 with the following rule (R-Inst):
(R-Inst)

new e ; ?

e

e

new e → new e

This rule relies on a context dependent reduction semantics for service implementations during component instantiation. Intuitively, the clause d ; D e e expresses that evaluation of term e within component instance d results in term e . Furthermore, services contained in D are overridden and excluded from evaluation. This service exclusion ensures that we do not execute service implementations that are superseded by other more recently de?ned implementations. A de?nition of the service evaluation semantics can be found in Figure 2.15. Rule (S-Emb) embeds the original reduction relation → into making sure that the new semantics are a conservative extension of the previous version. Rules (S-Prv) and (S-Fwd) evaluate a service implementation expression. The rules (SC-Prv) and (SC-Fwd) propagate evaluation to more deeply nested services.

2.3.6 Component Subtyping
The subtyping rule presented so far only supports width-subtyping for component types; i.e. subtypes provide more and require less services. We could relax this rule to support a form of depth-subtyping which incorporates subtyping of
evaluation strategy is considered to be reasonable if it does not evaluate overridden service implementation expressions.
2 An

2.3 Component Calculus

45

service interface types. Here, τ1 <: τ2 would hold for two component types τ1 and τ2 , if the required service types of τ1 are supertypes of the required service types of τ2 . Similarly, the provided service types of τ1 are supposed to be subtypes of the provided service types of τ2 . The following alternative subtyping rule expresses exactly this relationship:
?i?j : Dj <: Ci ?i?j : Cj <: Di ?i?j : Cj <: Di [C] <: [D]

C ? C <: D ? D

To make use of such a rule in our type system, we would also have to update the subtyping rule for component instances together with the typing rules (T-Mix), (T-Req), (T-Prv), and (T-Fwd). Furthermore, the service lookup function would have to be modi?ed to re?ect the fact that we can now override a service by introducing a new service with a specialized type. Overall, depth-subtyping was not considered in the formalization of our calculus since it would have complicated the technical treatment signi?cantly. It would require many subtle technical restrictions dealing with “overlapping services” (service types that share a supertype) and service overriding in general, which would yield both complex and unintuitive semantics and typing rules.

46

A Formal Model for Extensible Software Components

2.4 Discussion
We now summarize the main ingredients of our component model, explain design decisions, and compare the constructs with related work. The following features are reviewed in detail: 1. Components require and provide services, 2. Components are templates for component instances, 3. Components are composed with explicit links (forwarding), or implicitly by wiring inference (mixins), and 4. Components are extensible through inheritance and overriding. Prototype-based components. In the presented model, components are ?rstclass abstractions that have neither state nor identity. Components de?ne the structure of component instances in the same way as classes de?ne the structure of objects. In most class-based languages, classes are either second-class entities, or they are ?rst-class and speci?ed using meta-classes. For simplicity, and in order to avoid such a meta-regress [201], our ?rst-class components are prototypebased [1]. Thus, instead of instantiating components from meta-component descriptions, new components are derived from prototypical components by a set of specialization primitives. Since components are stateless, we do not need a cloning operation known from object-based programming languages [49, 201]. This approach emphasizes the reuse of components in the creation of new, extended components by specialization. In fact, even component composition, which is mostly regarded as the only form of component reuse, is explained in terms of component specialization. Services. Components specify implementations for a set of provided services. These implementations may rely on services provided by other components. Thus, component types are characterized by a set of required and provided services. The presented model requires that required and provided services are speci?ed explicitly. Alternatively, it would be possible to fully infer such information, as Nierstrasz explains in his work on contractual types [148]. Services are described by nominal object types. In Section 2.2.3 we explained already why this approach does not constitute a restriction compared to component models with named ports [182, 187, 5]. Our service abstraction does not only allow us to conveniently refer to an aggregate of functionality, as opposed to individual methods, for instance. It also facilitates to override an aggregate of functionality consistently and promotes distinct, non-interfering views of components. Service speci?cations that are solely based on nominal object types were inspired by COM [172, 93].

2.4 Discussion

47

Composition. Services are added to a component using the service abstraction and service implementation primitives. For composing components, two mechanisms are supported: forwarding and mixin-based composition. Forwarding delegates the implementation of a set of services to another, possibly nested component instance. The signi?cance of the forwarding primitive is two-fold: On the one hand it enables hierarchical component compositions, on the other hand, it is used to hide internal services of encapsulated components. Unlike forwarding, the mixin-based approach merges two components by specializing one component with the services provided by another component and by rebinding the self reference to the merged component. Compared to the approach based on forwarding where the services of the nested component cannot be overridden and are therefore statically linked, component composition based on mixins yields a fully extensible component where it is possible to rede?ne service implementations by overriding. On the other hand, forwarding allows one to specify exactly what services to include, in contrast to the mixinbased approach which always mixes in all provided services. As mentioned already in Section 2.2.7, this may lead to accidental overrides. This weakness of the type system could be addressed, for example, by making overriding explicit and by including negative information in component types. Discussions about forwarding versus delegation (object-based inheritance), which is sometimes also used as an implementation technique for mixins, can be found, for instance, in [195, 105, 39]. Support for dynamic object-based inheritance in a class-based context is provided by Büchi’s and Weck’s generic wrappers [39], Kniesel’s object model Darwin [105], Ostermann, Mezini, and Wittman’s work on FamilyJ [208], and by Ostermann’s delegation layers [156]. Mixins were ?rst identi?ed as linguistic abstractions for generalizing inheritance by Bracha and Cook [29]. It was also Bracha who observed that inheritance can be seen as a mechanism for modular program composition [31]. With his work on the programming language Jigsaw [28], he lifts the notion of classbased inheritance and overriding to the level of modules. A formal account of mixins and mixin-based inheritance is given in [26, 72, 9]. In particular, Bono, Patel, and Shmatikov’s calculus of ?rst-class classes and mixins is similar to our work [26]. Bono’s mixins correspond to components in our model. Classes correspond roughly to components without required services. Based on the same framework, Bettini, Bono, and Venneri recently showed that mixins are suitable abstractions for mobile software components [24]. As opposed to the work by Bono et al., the programming language Scala [151] does not distinguish between classes and mixins. It only has the notion of classes that are interpreted as mixins when used in mixin-based class compositions (inheritance). This is identical to the way components are interpreted in the presented model. Scala’s mixins are formalized in [153]; the design was inspired by Strongtalk [17, 30], an extension of the programming language Smalltalk.

Chapter 3

Static Component Evolution with Extensible Modules

This chapter presents Keris, a pragmatic, backward-compatible extension of the programming language Java [82] with explicit support for modular, componentoriented programming of extensible software. The design of the programming language Keris was driven by the observation that extensibility on the module level can help to develop highly extensible applications [94]. Keris tries to facilitate the development of extensible software in Java by providing an additional layer for structuring software components. This layer features extensible modules as the basic building blocks of software. Keris provides primitives for creating and linking modules as well as mechanisms for extending modules or even fully linked programs statically. Programs written in Keris are closed in the sense that they can be executed, but they are open for extensions that statically add, re?ne or replace modules or even whole subsystems of interconnected modules. Extensibility does not have to be planned ahead and does not require modi?cations of existing source code, promoting a smooth software evolution process. Keris is a strongly typed language. The type system ensures that the de?nition, assembly, and evolution of modules is safe. The overall design of the language was guided by the aim to develop a pragmatic, implementable, and conservative extension of Java which fully reuses Java’s compilation model and target platform. This allows for a seamless integration of existing Java code with Keris. For this reason, Keris does not even utilize customized class loaders, which could have complicated the use of existing Java technology, like the RMI API, which relies already on special class loading techniques itself. Keris adopts Java’s compilation model because it proved to work well in practice. Furthermore, changes would be confusing for many programmers that are used to a development process which exploits separate compilation, as well as dynamic class loading and linking, and which relies on the notion of binary compatibility for relating different versions of binary components, i.e. class?les.

50

Static Component Evolution with Extensible Modules

The remainder of this chapter ?rst discusses shortcomings of the package system of Java for implementing reusable software components. This is followed by a presentation of the design of Keris. In this section, the new language features are introduced incrementally and explained with the help of many small examples. After that, various application scenarios are given to explain how extensible modules support the safe development of extensible software. The following section explains how the Keris compiler translates the high-level language to plain Java code. Finally, benchmarks are used to evaluate this source-level translation with respect to code size and runtime performance. The chapter concludes with a discussion about module systems in general and the module system of Keris in particular.

3.1 The Java Package System

51

3.1 The Java Package System
Like many popular object-oriented languages, Java provides relatively weak abstractions for programming in the large [56]. In this section we argue that Java’s packages are not expressive enough to be useful as abstractions for reusable and extensible software components. We do this by looking at three important properties: modularity, genericity, and extensibility. A more extensive discussion about module system related issues can be found in Section 3.6.

3.1.1 Modularity
Modularity is about the separation of components from other components both logically and physically. Therefore, modularity is essential to allow software components to be developed and compiled independently. This is typically achieved by means of encapsulation and by the explicit speci?cation of contracts between components. These contracts de?ne explicitly what services a component provides and what other components are needed to render the services. Java’s package system offers relatively good support for modular program development. It allows that context dependencies are speci?ed explicitly and it has support for separate compilation. On the other hand, Java’s package abstraction is often too coarse-grained, so that structuring software systems consisting of many smaller subsystems can become very dif?cult on the package level. For instance, large libraries often require means for internal structuring. It is possible to nest packages, but this also limits access to non-public members. Therefore all classes that need to access library internal data, which does not get exposed to the outside world, have to reside in the same package. Java’s package mechanism was designed mainly for structuring the name space and for grouping classes. A package does not even allow programmers to fully encapsulate a set of classes since the Java programming language does not offer a way to close packages.1 Thus, like in most popular object-oriented languages, classes are predominantly used to structure software systems. Classes on the other hand do not fully support modular programming either [192, 37]. In general, classes cannot be compiled separately; mutually dependent classes have to be compiled simultaneously. Since classes do not de?ne context dependencies explicitly, it is dif?cult to ?nd out on what other classes a class depends. This can only be found out by inspecting code. Even though classes are the basic building blocks for object-oriented programming, most classes do not mean anything in isolation. They have a role in a spe1 In Java, class loaders can be used at runtime to ensure that only a ?xed set of classes is loaded

from a package. The concept of sealed packages exploits this mechanism to restrict class loading for classes of such a package only to a particular Jar ?le. Regarding the open nature of packages, it is surprising to see that adding classes to a Java package is a fragile and unsafe operation. It can break programs that import all classes of a package via the star-import command.

52

Static Component Evolution with Extensible Modules

ci?c program structure, but there is only limited support to formulate this role or to make this role explicit. A priori, class interactions are implicit, if not using a special design pattern that emphasizes cooperating classes. Due to the lack of expressing dependencies between classes explicitly, formulating design patterns, software components, the architecture of a system, and even expressing the notion of a library on the level of the programming language turns out to be extremely dif?cult in general. A good example for this problem is the way how industrial component models represent software components in class-based object-oriented languages. In these models, the implementation of a software component is typically guided by a relatively weekly speci?ed programming protocol (e.g. JavaBeans [190]). The composition of software components is even mostly performed outside of the programming language, using meta-programming technologies. Thus, neither the process of manufacturing a component nor the component composition mechanism are type-safe.

3.1.2 Genericity
Modularity is essential for the independent development of software components. But modularity alone does not allow programmers to deploy components independently of each other. Support for independent deployment requires that modules are generic with respect to their context dependencies; i.e. they have to abstract over depending modules. Furthermore, a mechanism is needed to instantiate a component and resolve its context dependencies by linking it with concrete instances of depending components. Thus, genericity is required whenever one wants to reuse a single component in different contexts with different cooperating components. Java packages are not generic. Packages hard-wire their context dependencies (imports) by referring to other concrete packages. Thus, there is no explicit support for the reuse of packages in different contexts or with different compatible dependent packages. Even though references to other packages are speci?c, the Java runtime environment offers possibilities to adjust the “linking context” so that a different implementation of a cooperating package is chosen at load-time; for instance by modifying the class path or by using special class loaders [163]. Such hacks are statically unsafe and therefore do not provide acceptable alternatives for genericity.

3.1.3 Extensibility
Besides modularity and genericity, extensibility is another important property. As the previous chapters pointed out already, extensibility is important because in general, independently developed components do not ?t off-the-shelf into arbitrary deployment contexts. They ?rst have to be adapted to make them compliant

3.1 The Java Package System

53

with a particular deployment scenario. Apart from this, extensibility is an essential requirement for facilitating the evolution of software. Software evolution includes the maintenance and extension of component features and interfaces. A typical software evolution process yields different versions of a single component being deployed in different contexts [132]. Extensibility is also required when developing families of software applications [158, 27]. For instance, software product-lines [99, 205] rely heavily on a mechanism for creating variants of a system which share a common structure but which are con?gured with possibly different components. Java supports the development of extensible software only on a very low level by means of class inheritance and subtype polymorphism. Extensibility has to be planned ahead through the use of design patterns, typically derived from the AbstractFactory pattern [74]. Furthermore, extensibility can often only be achieved by using type tests and type casts due to the lack of appropriate means to re?ne abstractions, in particular types and classes, covariantly. In general, such techniques circumvent the static type system and are therefore dangerous to apply in practice. With Java’s late binding mechanism and its support for re?ection, developing open software which can be extended with plug-ins is relatively easy. Again, this has to be planned ahead and allows only extensions in a restricted framework [132]. For writing applications that are open for extensions that have not been anticipated, often complicated programming protocols have to be strictly observed. An example for such a protocol is the Context/Component design pattern described in Section 4.2.1.

54

Static Component Evolution with Extensible Modules

3.2 The Programming Language Keris
Keris2 extends the programming language Java with an expressive module system that aims at facilitating the development of extensible software components [213, 211]. With the module abstractions of Keris, it is possible to give concrete implementations for concepts like design patterns, libraries, applications, or subsystems. All this is done in a completely extensible fashion, allowing to re?ne existing software or to derive new extended software from existing pieces. To keep software extensible, Keris promotes programming without hard links which are frequently found in Java programs in form of class instantiations or accesses to static methods or ?elds. The module system of Keris is designed to ?t smoothly between Java’s class and package level. With support for true modules, the package system is now mainly used to structure the module name space. Of course, it would easily be possible to add module name space management facilities to the module abstractions of Keris if backward compatibility to Java would be irrelevant.

3.2.1 Declaring Modules
In Keris, modules are the basic top-level building blocks of software supporting separate compilation as well as function and type abstraction in an extensible fashion. In general, modules depend on functionality provided by other modules. In Keris, such context dependencies are speci?ed explicitly. A module can only be deployed in contexts that meet such dependency requirements. We now present a small example that de?nes a module SORTER which provides functions for reading a list of words, for sorting, and for printing out lists. Throughout this chapter we write module names in capital letters to distinguish them clearly from class, method, and variable names. A formal grammar describing the syntax of Keris can be found in Appendix B.
module SORTER requires INOUT { String[] read() { ... INOUT.read() ... } void write(String[] list) { ... INOUT.write(list[i]) ... } String[] sort(String[] list) { ... } }

Explicit context dependencies. The header of the module declaration states explicitly that module SORTER depends on functionality provided by another module INOUT. Within the body of a module it is possible to access the members of the own module as well as all the members of modules that are declared to be
Keris is a double edged dagger originating in the Javanese culture. It was considered a magical weapon, ?lled with great spiritual power. Today it is an object of reverence and respect, symbolizing strength and safety.
2A

3.2 The Programming Language Keris
SORTER
INOUT
implements

55

CONSOLE
String[] read(); void write(String[] list); String[] sort(String[] list); String read(); void write(String str);

INOUT
String read(); void write(String str);

(a) Sorter module

(b) Console module

Figure 3.1: Schematic illustration of modules SORTER and CONSOLE.

required. Members of modules are generally accessed by qualifying their member names with the corresponding module reference. This distinguishes requirements from imports of many traditional module systems which make members of other modules accessible so that they can be used in unquali?ed form. Some module systems have both forms; e.g. Modula-2 [206] supports regular imports as well as quali?ed imports which correspond to our requirements. Module members. In Keris, modules encapsulate functions, ?elds, and class abstractions, like interfaces and classes. The syntax for module member declarations corresponds to the Java syntax used on the class level. Later we will see that modules may also contain submodules. Module interfaces. Regarding the previous listing, it remains to show a speci?cation of module INOUT. We do this by de?ning a module interface that speci?es the signature of this module. Such a module interface does not contain program code, it only lists all members provided by concrete implementations of this module together with their corresponding types.
module interface INOUT { String read(); void write(String str); }

We will now de?ne a module CONSOLE that implements this interface and thus is a possible candidate for being used in conjunction with module SORTER.
module CONSOLE implements INOUT { String read() { ... System.in.read() ... } void write(String str) { System.out.println(str); } }

56

Static Component Evolution with Extensible Modules

This module implements the functions read and write by forwarding the calls to appropriate methods of the standard Java API for text in- and output on a terminal. An alternative implementation for INOUT based on functionality provided by a third module LOG is given by the following program:
module LOGIO implements INOUT requires LOG { String read() { ... System.in.read() ... } void write(String str) { LOG.log("log: " + str); } }

The previous example code shows that a module interface can be implemented by many modules. On the other side, a module implementation can implement many module interfaces. Note that modules are not explicitly required to implement a module interface. This is not strictly necessary since every module implementation implicitly de?nes a module interface of the same name containing all exported module members. Nevertheless, the separation of module implementations from interfaces is an important mechanism that is essential to enable separate compilation of recursively dependent modules. Some module systems, e.g. Oberon’s module system, provide means to support separate compilation without separating module interface de?nitions from module implementations, but this works only for modules without recursive dependencies. Access control. Beside separate compilation, explicit module interfaces are also important as a facility for hiding concrete representations of module members. Furthermore, they can be used as a vehicle for presenting different views on a single module implementation. This gives the programmer more ?exible control over access rights to module members than Java does with its ?xed scoping levels expressed with access modi?ers like public, private, and protected. The only access modi?ers Keris supports on the module level are public and private. Module members which are not explicitly tagged with an access modi?er are considered to be public. Only the public members get exported and can therefore be accessed from clients of the module. Figure 3.1 illustrates some of the modules de?ned so far. Module illustrations consist of two parts: one part shows both the module to de?ne (the topmost box) and the required modules (the boxes on the left side). The other part lists all the other module members. We use the convention that boxes refer to modules and rounded boxes refer to module interfaces.

3.2.2 Linking Modules
Before discussing the module composition mechanism, we have to stress the distinction between modules and module instances. A module can be seen as a “template” for multiple module instances of the same structure and type. We have

3.2 The Programming Language Keris

57

to differentiate between the two, since we want to be able to deploy a module more than once within a software system. For instance, we could have two different instances of the SORTER module that are linked together with different INOUT module instances. Hierarchical composition. In Keris, modules are composed by aggregation. More concretely, a module does not only de?ne functions and variables. It may also de?ne module instances as its members. These nested module instances, we also call them submodules,3 can depend on other modules visible in the same context. The following de?nition for module APP links module SORTER with module CONSOLE by declaring both to be submodules of the enclosing module APP.
module APP { module SORTER; module CONSOLE; void main(String[] args) { String[] list = SORTER.read(); list = SORTER.sort(list); SORTER.write(list); } }

Submodule de?nitions start with the keyword module followed by the name of the module implementation. The enclosing module aggregates for every submodule de?nition an instance of the speci?ed module. Thus, in the example above, module APP aggregates two module instances SORTER and CONSOLE. A submodule can only be de?ned if its deployment context, given by the enclosing module, satis?es all the requirements of the submodule. The requirements of a submodule are satis?ed only if all modules required from the submodule are either provided as other submodules, or if they are explicitly required from the enclosing module. The program above de?nes two submodules SORTER and CONSOLE. Module SORTER requires a module instance INOUT from the deployment context, CONSOLE does not have any context dependencies. The module de?nition of APP is wellformed since it de?nes a CONSOLE submodule that implements INOUT, and therefore provides the module that is required by the SORTER submodule. Note that module CONSOLE is only present in module APP for that reason. Module APP does not refer to members of CONSOLE directly. Figure 3.2(a) illustrates the structure of module APP. The submodules of APP are displayed as nested modules. The wiring of the submodules, which is implicit in Keris programs, is made explicit with an arrow from the implementing module to the requirement.
use a terminology here which is not fully consistent with the one on the class level. Submodules denote nested modules and have nothing to do with subclassing. The motivation for naming nested modules submodules comes from nested modules modeling subsystems. Our terminology is consistent with other module systems supporting hierarchical compositions of modules.
3 We

58

Static Component Evolution with Extensible Modules

LOGSORTER APP
SORTER SORTER INOUT INOUT String[] read(); void write(String[] list); String[] sort(String[] list); LOG CONSOLE String read(); void write(String str); void main(String[] args); LOG String read(); void write(String str); String[] read(); void write(String[] list); String[] sort(String[] list);

LOGIO

(a) Executable module APP

(b) Module LOGSORTER

Figure 3.2: Schematic illustration of modules APP and LOGSORTER.

Unresolved context dependencies. Similarly to the previous code, we could try to link module SORTER with module LOGIO, as shown by the following de?nition of BUGGYAPP.

module BUGGYAPP { module SORTER; module LOGIO; }

E

A veri?cation of the context dependencies reveals that this module declaration is not well-formed. LOGIO requires a module instance LOG which does not get declared within BUGGYAPP. Since we want BUGGYAPP to be parametric in the cooperating module LOG, we have to abstract over the LOG instance by requiring it from the context. This has the effect that inside of the module body we are now able to refer to a module instance LOG without actually giving a concrete de?nition. Therefore the following de?nition of module LOGSORTER is well-formed. Figure 3.2(b) gives a schematic illustration which shows that for LOGSORTER, all requirements of submodules are resolved.

module LOGSORTER requires LOG { module SORTER; module LOGIO; }

3.2 The Programming Language Keris

59

Discussion. As the previous examples show, modules get composed by hierarchically aggregating submodules. A module that hosts a set of submodules is only well-formed if it satis?es the context requirements of all of its submodules. A module satis?es the requirements of a submodule if modules required from that submodule are either present in form of other submodules, or are explicitly required by the host module, or are subsumed by the host module itself. This hierarchical composition mechanism has the advantage that the static architecture of a system becomes explicit. Furthermore, module composition does not require to link modules explicitly by specifying how context dependencies are satis?ed at deployment time. Instead, the module interconnection gets inferred. With this approach we avoid linking modules by hand which can be a tedious task that raises scalability issues [212]. On the other hand, our inference technique only succeeds if we avoid ambiguities; i.e. our type system has to ensure that references to module instances identify modules uniquely in every context. One implication of this is that a module can never de?ne or require two nested module instances (submodules) that implement the same module. If this would be the case, a simple module name could not identify a module implementation unambiguously anymore. A system like this is reminiscent of classical module systems for imperative programming languages like Modula-2 or Ada. Such module systems allow only one implementation for each module globally, whereas Keris has this restriction only locally for every module context. Globally, there are no restrictions, allowing systems to include as many instances of a single module as required. An intuitive motivation for only allowing a single implementation of a module per context is that it would be redundant to have identical module instances providing exactly the same services. Of course, with regard to genericity and the possibility to encapsulate state, this argument is weak. But we will see later, that these limitations can be easily overcome by using module specializations introduced in Section 3.2.6. This mechanism makes it possible to de?ne different, specialized versions of a module together in the same context. Furthermore, it is, of course, always possible to introduce nested modules for instantiating multiple instances of a single module. The type system, discussed in Section 3.2.8, ensures that such multiple instances are used in a consistent, non-con?icting manner.

3.2.3 Accessing Modules
Accessing submodules. The code of module APP, presented in the previous section, shows that aggregated submodule instances are accessed just like required modules simply via the module name. For accessing exported submodules of other modules one has to use the :: operator. For instance, the expression SYSTEM::APP::SORTER accesses the SORTER module instance, which is a submodule of APP, which is a submodule of SYSTEM. Such expressions are also called module paths.

60

Static Component Evolution with Extensible Modules

With the :: operator it is possible to access partial units of other modules. There is a whole body of literature on programming style that tries to motivate why it is good “only to talk to your immediate friends, and never to strangers” [117, 116, 114]. This principle is often called the Law of Demeter [115]. It states that every unit should only communicate directly with its immediate neighbors in a system. The :: operator allows programmers to break this principle. On the other hand, Keris programs that do not make use of the :: operator conform to the Law of Demeter on the module level. Here, modules only access services of other modules that are either explicitly required or aggregated. This restriction, and the Law of Demeter in general, is comparable to the communication integrity property stated by Aldrich, Chambers, and Notkin [145, 5, 6]. Importing modules. For accessing members of deeply nested module instances, it is not very practical to require that one always has to fully qualify module instances by pre?xing them with the right module path. Therefore Keris offers a mechanism to shorten module paths by importing nested module instances. Here is an example:
module SYSTEM { module APP; import APP::SORTER; void main(String[] args) { SORTER.write(SORTER.sort(args)); } }

In this program, module SORTER nested in module APP gets imported in the body of SYSTEM so that one can refer to this module instance simply via the unquali?ed module name. There is no need anymore to pre?x this name with module APP. An import statement is only legal if it does not create ambiguities. It really must be seen as a means to introduce shorthands for modules, since it will never have an in?uence on the inferred wiring of submodules. Similar to Java’s import mechanism on the package level, Keris offers a second import form which imports all members of a speci?ed module instance. This “star-import” statement is used in the following program, which is otherwise identical to the listing above.
module SYSTEM { module APP; import APP::SORTER.*; void main(String[] args) { write(sort(args)); } }

3.2 The Programming Language Keris

61

Keris’ lookup procedure for module members ?rst searches the current module and only if no member is found there, it proceeds by searching all “startimported” modules. If a member is found in more than one “star-imported” module, this is considered to be an ambiguity. Ambiguities have to be resolved by the programmer, who has to explicitly qualifying the member name with the corresponding module instance.

3.2.4 Initializing Modules
Executing modules. Modules without context dependencies like module APP from the previous section can be executed if they de?ne a main method with the following signature: void main(String[] args). When a module is execute, a module instance gets created and the main method of this instance is invoked. Initializing modules. A Keris module implementation can de?ne an arbitrary number of module initializers which initialize variables and perform other sideeffects. Similar to the class initialization process of Java, Keris modules are initialized lazily right at the time one of the module members gets accessed for the ?rst time.4 This approach can provoke cycles in the module initialization process if module initializers of mutually dependent modules refer to each other recursively. In Keris, this problem is resolved in the same way Java handles static initializers in classes: The cycle is broken dynamically, making it possible that a module accesses in its initializer a recursively dependent module which is only partially initialized. For uninitialized variables this means that they refer to some default value. Here is an example for a lazy module initialization:

module MAIN { module OPTIONS; void main(String[] args) { if (args.length > 0) OPTIONS.parse(args); } }

module OPTIONS { HashMap options = new HashMap(); { // initialization code options.put("-silent"); } void parse(String[] args) { ... } }

In this program, submodule OPTIONS only gets initialized if the length of array args in the main method of module MAIN is greater than 0. Otherwise module OPTIONS does not get accessed and therefore the initializers will not be executed. The module remains uninitialized in this case.

Keris, accessing a submodule of a module does not trigger execution of this module’s initializers.

4 In

62

Static Component Evolution with Extensible Modules

Controlling initialization. Sometimes, the designer of a module wants the module to be seen from the outside world as an atomic unit even though internally it is constructed by aggregating several submodules. One can achieve this by declaring the submodules to be private, and thus hide them from external clients. This means does not go far enough, if the designer of such a compound module even wants the module to get fully initialized at the same time. Since submodules are initialized lazily, initializing the compound module does not imply the initialization of all submodules. To address this issue, Keris allows the programmer to interlink the initialization of a submodule with its host module such that whenever the host module is initialized, initialization of the submodule is triggered automatically. In Keris such a coupling can be expressed by tagging a submodule de?nition with the synchronized modi?er. All synchronized submodules get initialized before the host module in the order they are declared, as long as module dependencies do not enforce a different order dynamically. Synchronized submodules are needed when the initializer of a submodule performs side-effects that are supposed to take place before the host module initializer is executed. They are also helpful to control the initialization of submodules whose initializers depend recursively on each other. Here, a wrong initialization order might result in a submodule initializer accessing an uninitialized ?eld of another partially initialized submodule. See Section 3.4.1 on page 98 for a discussion of related issues.

3.2.5 Re?ning Modules
We now come to the problem of extending modules. Since we do not want to break code that uses existing modules, we are neither allowed to touch binaries nor the source code of existing modules. In short, extensibility has to be additive instead of being invasive. Non-invasive extensions. Keris has support for non-invasive extensions through a module re?nement mechanism. This mechanism allows programmers to re?ne an existing module by providing new functionality or by overriding existing functionality. The re?ned version of a module is backward compatible to the original module in the sense that it can be substituted for it. Thus, Keris lifts the notion of compatibility between classes expressed by a subtyping relation to the more coarse-grained level of modules. Creating re?nements. We now present a re?nement of module SORTER that provides a more ef?cient implementation for the sort function. In the example below, we use a merge-sort technique for sorting. Apart from a new implementation of sort which overrides the existing implementation, we also de?ne various other helper functions. One of them is declared to be private, which hides it

3.2 The Programming Language Keris

63

from clients of the module. Such functions do not get exported and can only be used internally.
module FASTSORTER refines SORTER { private String[] sub(String[] list, int start, int end) { ... } String[] merge(String[] first, String[] second) { ... } String[] sort(String[] list) { return (list.length < 2) ? list : merge(sort(sub(list, 0, list.length/2)), sort(sub(list.length/2, list.length))); } }

Module FASTSORTER is a re?nement of module SORTER. Sometimes we also call SORTER the parent module of FASTSORTER. A re?nement inherits the implemented interfaces and the member implementations, including all submodules, from its parent module. Note that it also inherits all the context dependencies. For the example above, this actually means that module FASTSORTER requires an INOUT module. Note that the set of required and aggregated modules has to be disjunct. Otherwise, a reference to a module that is both required and aggregated would be ambiguous. It is therefore not possible that a re?nement aggregates a required module of the parent module. Similar to the re?nement of module implementations and their implicit interfaces, it is also possible to re?ne plain module interfaces like INOUT. Overriding members. Keris uses, like most object-oriented languages, overriding as the primary means to modify members of modules and classes. The overriding rules for module members are almost identical to the rules of Java. In particular, it is not possible to override variables, and method overriding is invariant in the parameter types. As opposed to Java, method overriding is covariant in the result type; i.e. an overriding method may have a result type that is a subtype of the result type of the overridden method. This overriding rule is reminiscent of the rule used in GJ [32]. Covariant return type specializations are often extremely useful when writing extensible software. In particular, they allow one to re?ne factory methods of AbstractFactories so that one can access newly introduced functionality of the created objects without using type casts. Return type specializations are also important when it comes to implement a clone method for a hierarchy of classes [35]. In Keris, module re?nements see the module they re?ne as gray-boxes, as opposed to clients that see a module as a black-box with a well-de?ned interface. Thus, module re?nements can freely access and override private members of the parent module. The private modi?er only indicates that a module member does not get exported and consequently can only be used internally — this includes the use in re?nements which do not logically constitute a different module, but

64

Static Component Evolution with Extensible Modules

can rather be seen as amendments yielding a new version of an existing module. Members which are not supposed to be modi?ed by re?nements have to be declared final. Overriding submodules. So far, we only saw how to re?ne the functionality of atomic modules. Such re?nements are non-invasive; i.e. they do not affect existing code. The question is now, how to integrate re?nements, like the more ef?cient sorting module, into a system that already makes use of the old SORTER module? Since systems are represented by modules, it is probably not surprising that this is done again with a re?nement. As explained before, Keris promotes programming without hard links. Following this idea, we allow overriding submodule declarations in module re?nements. The following code re?nes the executable module APP by covariantly overriding submodule SORTER.
module XAPP refines APP { module FASTSORTER; }

The re?ned module XAPP replaces the nested module implementation SORTER with module FASTSORTER. Consequently, the inherited main method now refers to the FASTSORTER submodule. In fact, we can now access the FASTSORTER submodule via both module names, SORTER and FASTSORTER. The only difference is that when accessed via FASTSORTER, we can refer to the new functions. The ability to re?ne a module interface stepwise to allow different access levels is called incremental revelation [44]. Interface ascription. Submodules may be overridden by re?nements, but it is not possible to replace a submodule with a compatible version which is not a re?nement. For instance, it is not possible to override the submodule CONSOLE in re?nements of APP with alternative implementations of module interface INOUT, even though SORTER just requires an arbitrary INOUT implementation. Keris offers a facility to constrain the interface of a submodule M to a single implemented module interface I . Syntactically this is expressed with the following submodule declaration form: module M implements I . Only members of the designated module interface I may be accessed by the host module or other submodules. This arti?cial restriction of a module implementation to one of its interfaces is called interface ascription.5 It allows a submodule to be overridden with a module that implements this interface but which is not necessarily a re?nement. Here is a reformulation of module APP which makes it possible to replace CONSOLE with an alternative INOUT implementation.
term interface ascription was chosen following the module system terminology of ML which supports a similar concept called signature ascription.
5 The

3.2 The Programming Language Keris

65

module APP { module SORTER; module CONSOLE implements INOUT; ... }

module XXAPP refines APP { module GUI implements INOUT; } module GUI implements INOUT { ... }

Discussion. The previous examples demonstrate that the module assembly and re?nement mechanism is not only restricted to the extension of atomic modules. It also allows programmers to extend fully linked programs, represented by modules with aggregated submodules, by simply replacing selected submodules with compatible versions. There is no need to establish module interconnections again; the fully linked program structure is reused and only the submodules and functions to replace or add have to be speci?ed. This extensibility mechanism features plug-and-play programming. It neither requires source code, nor touches existing binaries. After having re?ned application APP with module XAPP it is still possible to run the old application APP. It would even be possible to assemble a system that makes use of both modules without risking unpredictable interferences.

3.2.6 Specializing Modules
Re?nements vs. specializations. Re?ning a module is the process of extending a module by adding new functionality or by modifying existing functionality through overriding. A module re?nement yields a new version of an existing module. This new version subsumes the old one; i.e. it is backward compatible to the old version. As a consequence, it is always possible to replace a module with one of its re?nements. In the following code, module BUGGYMOD aggregates a submodule that subsumes another submodule. In other words, BUGGYMOD de?nes a context in which two different versions of one module are present. Since references to submodule SORTER are ambiguous within module BUGGYMOD, the program is ill-formed and rejected by the Keris compiler.
module BUGGYMOD requires INOUT { module SORTER; module FASTSORTER; }

E

As Section 3.3.2 will motivate, we would sometimes like to reuse a general module implementation when de?ning a new module with a a more specialized structure and functionality and use different specializations side by side in the

66

Static Component Evolution with Extensible Modules

same context. This process of creating new distinct modules which reuse the definition of an existing module is called module specialization. It is similar to module re?nement in that it is based on inheritance; it is different, because specializations yield new modules compared to re?nements which only create new versions of existing modules. While re?nements are backward compatible to their original module and therefore can safely replace original module instances, specializations represent new, independent modules which do not subsume their original module. This is also why different specializations may be used jointly in the same context. Creating specializations. As an example, we de?ne a specialization of the SORTER module in the following code. Module SETSORTER implements a set semantics for sorting and is due to this change in semantics not implemented as a re?nement of SORTER.
module SETSORTER specializes SORTER { String[] filterDuplicates(String[] list) { ... } String[] sort(String[] set) { return super.sort(filterDuplicates(set)); } }

Module SETSORTER inherits all members and requirements from SORTER and de?nes two new functions. Function filterDuplicates can be used to ?lter out duplicate entries in lists. Function sort overrides the corresponding function in SORTER, but its implementation is still able to refer to the former implementation via the keyword super. As a specialization of SORTER, module SETSORTER is not required to be backward compatible to module SORTER. One of the consequences is that module specializations in general are free to break invariants or contracts established by their parent module without risking erroneous deployments. Such risks get ruled out technically because module specializations do neither subsume their parent module, nor do they automatically implement any of the module interfaces that are implemented by their parent module. Therefore, it is not possible to substitute an instance of the parent module with a module instance of one of its specializations. This restriction turns SORTER and SETSORTER into completely different modules. Moreover it is perfectly legal to de?ne a module with both a SETSORTER and a SORTER submodule, like in the following program.
module SORTING requires INOUT { module SORTER; module SETSORTER; }

3.2 The Programming Language Keris

67

While module re?nements promote the substitutability of modules, module specializations support the notion of conceptual abstraction on the module level [174]. Conceptual abstraction refers to the ability to factor out code and structure shared by several modules into a common parent module which gets specialized independently into different directions. The specializations represent distinct modules that cannot be substituted for the common parent module [52]. Rewiring modules. Often, mutual referential modules have to be specialized at the same time consistently. The ability to refer to a specialized version of a module requires that we are able to specialize context dependencies as well. This “rewiring” is expressed in the following code using the as operator. The as operator allows programmers to replace a module with one of its specializations. In the following code, the MYSORTER module specializes module SORTER and instead of requiring the original INOUT module, it now refers to a specialized MYINOUT module instance.

module MYSORTER specializes SORTER requires MYINOUT as INOUT { ... }

Of course it is also possible to specialize submodule de?nitions if required. We could, for instance, specialize module APP of Section 3.2.2 and use MYSORTER instead of SORTER. Note that for doing this correctly, we also have to specialize module CONSOLE, otherwise we would break a link established in the parent module. Here, module SORTER’s required INOUT module is wired to submodule CONSOLE which implements INOUT.

module MYAPP specializes APP { module MYSORTER as SORTER; module MYCONSOLE as CONSOLE; } module MYCONSOLE specializes CONSOLE implements MYINOUT { ... }

This example shows that module specializations cannot break arbitrary contracts established in parent modules. To ensure type safety, modules linked in parent modules still have to be linked in module specializations. A more complete example for module specializations and the rewiring of modules by specializing context dependencies and submodule de?nitions can be found in Section 3.3.2 on page 90.

68

Static Component Evolution with Extensible Modules

3.2.7 Class Abstractions
Until now we only considered modules with function and variable members. With these modules, Java’s static variables and static methods get super?uous. Static class members can easily be implemented as module members with the bene?t of extensibility and improved reusability. Even though functions on the module level can be quite useful to model global behavior, it is more common for object-oriented languages to have modules that contain class de?nitions. Classes de?ned in a module can freely refer to other members of the module as well as to modules required from the enclosing module. The following module de?nes a class for representing points.
module SPACE { class Point { int x, y; Point(int x, int y) { this.x = x; this.y = y; } int getX() { return x; } int getY() { return y; } } }

Module systems for Java-like programming languages that allow to abstract over classes are not only dif?cult to handle in theory, they are also extremely dif?cult to implement in practice if one wants to stick to Java’s compilation model. In such module systems, classes can, for instance, extend classes of required modules for which only the interface might be given. Consequently, at compile-time, a compiler has to translate the class without knowing its concrete superclass. This raises implementation issues, but also more fundamental questions, e.g. about the possibility to create cycles in the inheritance graph or about methods that override superclass methods accidentally. Ancona and Zucca discuss problems related to this trade-off between class abstraction and implementation inheritance in greater detail in [11]. Separating interfaces from implementations. Since Keris is designed to be fully compatible with Java, including full support for Java’s compilation model, while being implementable on the standard Java platform, Keris does not offer a facility for abstracting over regular classes. Instead, it introduces the notion of class ?elds as an alternative type and class de?nition and extension facility. The design philosophy of Keris follows one of the most important features of module systems, which is the separation of interfaces from implementations, and regards regular classes as implementations of abstract data types. Interfaces, on the other hand, are regarded as nominal signatures of abstract data types. Class ?elds are

3.2 The Programming Language Keris

69

abstractions that connect interfaces with implementations. As opposed to classes and interfaces which are considered to be static and immutable, Keris allows one to abstract over class ?elds. Furthermore, class ?elds are extensible; they can be covariantly overridden in module re?nements and specializations. The rest of this section will discuss Keris’ class abstractions in detail. Class ?elds. In most object-oriented languages class de?nitions introduce many entities at the same time. They de?ne a type, a default implementation of that type, and a constructor which creates new instances of this implementation. The notion of inheritance allows types and class implementations to be reused in the de?nition of new types and new class implementations, but the original entities itself are usually immutable at the language level. So changes to these entities have to be carried out destructively, possibly raising consistency issues. In support of extensible class abstractions, Keris introduces the notion of class ?elds. A class ?eld is a class abstraction which separately de?nes an interface and an implementation. While the interface speci?cation typically refers to a set of regular Java interfaces, the class ?eld implementation consists generally in a reference to a regular class. The following example code de?nes an interface, a class, and a class ?eld within a single module POINTS. The de?nition of interface IPoint within module POINTS shows that interfaces in Keris can also specify the signature of constructors, in contrast to regular interfaces in Java where this is not possible. In addition to interface IPoint, module POINTS declares a class CPoint which implements IPoint and therefore can be used to represent concrete point objects. Finally, a class ?eld Point is de?ned by separately specifying its interface and implementation.
module POINTS requires INOUT { interface IPoint { IPoint(int x, int y); int getX(); int getY(); Point move(int dx, int dy); } class CPoint implements IPoint { int x, y; CPoint(int x, int y) { this.x = x; this.y = y; } int getX() { return x; } int getY() { return y; } Point move(int dx, int dy) { return new Point(x + dx, y + dy); } } class Point implements IPoint = CPoint; Point root() { return new Point(0, 0); } void print(Point p) { INOUT.write(p.getX() + "/" + p.getY()); } }

70

Static Component Evolution with Extensible Modules

In general, the de?nition class T implements I1 , ..., In = C de?nes a class ?eld T which implements the interfaces I1 to In with class C . More precisely, each class ?eld declaration introduces a new nominal type which is a subtype of all the implemented interfaces I1 , ..., In . Furthermore, it speci?es a default implementation C which is used to represent instances of the new type at runtime. The implementations of the functions print and root show that class ?elds are used just like regular classes: They denote types, they can be instantiated, and members of corresponding objects can be accessed. The main difference to regular classes is that class ?elds are virtual and therefore can be covariantly overridden in re?ned modules. Covariant overriding of class ?elds includes the extension of the set of implemented interfaces as well as the ability to specify new class ?eld implementations. On the other hand, class ?elds do not support implementation inheritance — a feature which is only supported by regular classes. Therefore, class ?elds must be rather seen as abstractions that complement classes than abstractions that fully replace them. Overriding. The next example program explains how class ?elds can be overridden in module re?nements. In this program, re?nement COLORPOINTS declares that class ?eld Point now also supports the IColor interface and is implemented by the CColPoint class. Furthermore, print is overridden to include the color in the output. At this point, one might wonder what happens to method root of the original module POINTS which instantiates class ?eld Point. In fact, for the re?ned module which inherits this method, root now returns a colored point since class ?eld Point is overridden in the re?nement.
module COLORPOINTS refines POINTS requires COLOR { interface IColor { void setColor(COLOR.Color col); COLOR.Color getColor(); } class CColPoint extends CPoint implements IColor { COLOR.Color col = COLOR.black; CColPoint(int x, int y) { super(x, y); } void setColor(COLOR.Color col) { this.col = col; } COLOR.Color getColor() { return col; } } class Point implements IPoint, IColor = CColPoint; void print(Point p) { super.print(p); INOUT.write(" col = " + p.getColor()); } }

3.2 The Programming Language Keris

71

Type re?nement. The ability to covariantly re?ne types (or class ?elds in our case) is essential for extending object-oriented software. Most object-oriented languages support interface and implementation inheritance. But inheritance alone does not support software re?nement or software specialization well. Existing code refers to the former type and often cannot be overridden covariantly in a type-safe way to make use of the extended features. For special cases like binary methods, some languages support the notion of self types [36, 35, 151]. But these are not suitable for mutually referential classes that have to be re?ned together to ensure consistency [59]. Here, only virtual types are expressive enough [96, 199, 58, 124]. Unfortunately, virtual types rely in general on dynamic type-checking. Therefore recent work concentrated on restricting the mechanism to achieve static type safety [200, 34]. A formal account of type-safe virtual types is given in [152], which introduces a calculus of classes and objects with abstract type members. Class ?elds are statically type-safe in Keris. This is mainly due to the nature of module re?nements: A re?ned module subsumes the former module and cannot coexist with the former module in the same context. It rather replaces the former module consistently in explicitly speci?ed contexts. Module specializations do not compromise type safety either, since they conceptually yield new modules with class ?elds that do not necessarily have a (subtype) relationship with the original class ?elds in the parent module. The explicit separation of interface and implementation de?nitions does not only promote modular programming, it also helps enormously to evolve software more ?exibly, because modi?cations on the level of interfaces do not impose anything on the implementation level, and vice versa. It is possible to safely extend the interface of a class ?eld without changing its implementation (e.g. to expose more functionality to clients), but it is also possible to extend or even fully exchange the class ?eld implementation without modifying anything on the interface level. As we will discuss in Section 3.4.6, this strict separation of interfaces from implementations also eases hot swapping of modules which might cause a dynamic change of class ?eld implementations. Opaque class ?elds. It is possible to leave out an implementing class when de?ning a class ?eld. Class ?elds without implementations are called opaque. They are similar to abstract methods which only de?ne a signature and defer the concrete implementation. Modules with opaque class ?elds or abstract functions have to be declared abstract. Abstract modules cannot be deployed and their only function is to serve as a parent module for re?nements and specializations. In this sense, module interfaces are abstract by default. Opaque class ?elds are mainly used within module interfaces to de?ne new class types. To illustrate this, we reformulate the POINTS example by cleanly separating the module interface from its implementation. This example also nicely shows that by separating the

72

Static Component Evolution with Extensible Modules

module interface from the implementation, we can express that the POINTS abstraction does not rely on an INOUT module itself. It is, in fact, only the concrete implementation that has this context dependency.
module interface POINTS { interface IPoint { IPoint(int x, int y); int getX(); int getY(); Point move(int dx, int dy); } class Point implements IPoint; Point root(); void print(Point p); } module SIMPLEPOINTS implements POINTS requires INOUT { class Point implements IPoint = { Point(int x, int y) { ... } int getX() { ... } int getY() { ... } Point move(int dx, int dy) { return (dy == 0) && (dy == 0) ? this : new Point(x+dx, y+dy); } } Point root() { return new Point(0, 0); } void print(Point p) { INOUT.write(p.getX() +"/"+ p.getY()); } }

Anonymous class ?eld implementations. In module SIMPLEPOINTS we use an anonymous class declaration to provide a concrete implementation for class ?eld Point. An anonymous class declaration consists of a block de?ning class members. This block can optionally be preceded by a reference to a super class. The use of anonymous classes is sometimes necessary to give the self reference this the right type. In anonymous classes, this is given the type of the corresponding class ?eld. In the example above, it is therefore possible for method move of class ?eld Point to simply return this if both its parameters are zero. This optimization would have not been possible in our former implementation of module POINTS. Note that in selections of the form this.methodOrVariable we type the self reference this with the anonymous implementation type so that one is still able to access private ?elds and methods.

3.2 The Programming Language Keris

73

Abstract class ?elds. Class ?elds can also be tagged with the abstract modi?er. Like abstract classes, such class ?elds cannot be instantiated. Their main aim is to de?ne extensible types. Since abstract class ?elds cannot be instantiated, there is also no need to specify an implementation. Our terminology follows the one of Java which is unfortunately inconsistent regarding the use of modi?er abstract: Abstract classes possibly de?ne an implementation, whereas abstract methods never provide a concrete implementation. The next paragraph will present an example which motivates the use of abstract class ?elds in Keris. This example refers to class ?eld Shape de?ned in the following declaration of module GEO.
module GEO requires POINTS { import POINTS.*; interface IShape { boolean inShape(Point pt); } abstract class Shape implements IShape; void registerShape(Shape s) { registeredShapes.add(s); } HashMap registeredShapes = new HashMap(); }

Note that this module does not have to be declared abstract. It has indeed a class ?eld without an implementation, but since this class ?eld cannot be instantiated, we can safely omit the abstract modi?er in the module de?nition. Dependencies. So far we explained how to declare and how to evolve class ?elds. The presented mechanism does not allow one to relate different class ?elds to each other; every class ?eld represents an own isolated class abstraction. We need a mechanism similar to subclassing (on the implementation level) and subtyping (on the interface level) that makes it possible to introduce dependencies between class ?elds. For this reason, it is possible in Keris to specify subtype relationships between otherwise unrelated class ?elds explicitly. The following code illustrates this feature.
module SHAPES requires GEO, POINTS { import GEO.*; import POINTS.*; interface IBox { IBox(Point topleft, Point botright); } class Box extends Shape implements IShape, IBox = { Box(Point topleft, Point botright) { GEO.registerShape(this); ...} boolean inShape(Point pt) { ... } } }

74

Static Component Evolution with Extensible Modules

In this program we refer to module GEO de?ned in the previous paragraph. GEO de?nes an abstract class ?eld Shape. As we mentioned already before, abstract class ?elds are like abstract classes: They cannot be instantiated but they de?ne a type which can be extended. Module SHAPES de?nes a class ?eld Box which extends Shape. This extends declaration turns Box into a subtype of Shape requiring that Box implements at least all the interfaces implemented by Shape. The type checker has to make sure that it is not possible to link re?nements of GEO and SHAPES where this invariant is broken. Thus, such subtype dependencies between class ?elds promote the consistent re?nement or specialization of class ?eld hierarchies. Here is an example which successfully links modules GEO and SHAPES in the context of module GEOSHAPES.
module GEOSHAPES requires POINTS { module GEO; module SHAPES; }

Imagine, we develop a re?nement XGEO of GEO that adds a new method scale to Shape:
module XGEO refines GEO { interface INewShape { void scale(int factor); } abstract class Shape implements IShape, INewShape; }

We now cannot simply re?ne GEOSHAPES and introduce the re?ned XGEO module in place of the previous GEO module. This would break our dependency, since SHAPES.Box now would not cover the newly introduced XGEO.INewShape interface. Thus, we ?rst have to consistently re?ne SHAPES as well, such that class ?eld Box also implements the new interface XGEO.INewShape. This re?nement can then be linked with XGEO, as the following code fragment shows. The concrete implementation of XSHAPES.Box is irrelevant and therefore left out in the following program.
module XSHAPES refines SHAPES requires XGEO { class Box extends GEO.Shape implements GEO.IShape,XGEO.INewShape =... } module XGEOSHAPES refines GEO { module XGEO; module XSHAPES; }

Note that extends declarations have no implications on the implementation side. For our example this means that Box can be implemented by an arbitrary

3.2 The Programming Language Keris

75

class which can be a subclass of the implementation of Shape, but which is not required to be one. It is possible for two or more class ?elds to depend mutually on each other (yielding a cycle in the extension relationship). In such a case, all recursively dependent class ?elds have to implement the same interfaces — this is a consequence of our requirement that a class ?eld that extends another class ?eld has to implement all the interfaces of this other class ?eld. Since subtyping is antisymmetric in Keris, mutually dependent class ?elds denote the same type. However, they can still have different implementations. In contrast to this, classes (i.e. class ?eld implementations) many not introduce cycles in their inheritance hierarchy. More precisely, a class may not extend other classes that extend already the ?rst class, no matter with what module path the classes are pre?xed. Inheritance and subtyping. Inheritance is a powerful reuse mechanism which is in most statically typed object-oriented languages coupled with subtyping. Sometimes this is unfortunate if one wants to reuse a class without establishing a is-a relationship. For instance, many introductory texts to object-oriented programming motivate inheritance with an example where class Circle inherits from class Point. While on the level of code reuse, this makes perfectly sense, on the level of subtyping, this relationship is questionable: Intuitively, circles are not really specialized points. In Keris it is possible to express the intention of reusing an implementation without implying anything on the typing side. The following module introduces a class ?eld Circle which is a subtype of Shape but which is implemented by a an anonymous subclass of CPoint. In other words, we implement a new subtype of class ?eld Shape by reusing class CPoint which is completely unrelated to Shape.

module MYSHAPES requires GEO, POINTS { import GEO.*; import POINTS.*; interface ICircle { ICircle(int x, int y, int r); } class Circle extends Shape implements IShape, ICircle = CPoint { int radius; Circle(int x, int y, int r) { super(x, y); radius = r; } boolean inShape(Point pt) { ... } } }

From the viewpoint of static typing, class CPoint is also unrelated to Circle. The anonymous subclass of CPoint acts as the implementation of class ?eld Circle so that at runtime, instances of Circle are actually instances of the anonymous subclass of CPoint. Thus, dynamically, there is a strong relation between the

76

Static Component Evolution with Extensible Modules

two.6 This relationship is hidden statically to promote extensibility by allowing programmers to exchange implementations easily with compatible ones. Opposed to this example where subtyping is decoupled from inheritance, Keris also supports the more common approach where types and implementations are re?ned simultaneously, as the following program will exemplify.
module NUSHAPES requires GEO, POINTS, MYSHAPES { import GEO.*; import POINTS.*; import MYSHAPES.*; interface IRing { IRing(int x, int y, int ro, int ri); } class Ring extends Circle implements IShape,ICircle,IRing = Circle { int innerRadius; Ring(int x, int y, int ro, int ri) { super(x, y, ro); innerRadius = ri; } Ring(int x, int y, int ro) { super(x, y, ro); innerRadius = 0; } boolean inShape(Point pt) { ... super.inShape(pt) ... } } }

In this program, we de?ne a class ?eld Ring which depends on the class ?eld Circle.7 It is implemented by an anonymous class that inherits from the implementation of class ?eld Circle. Two things have to be noted: 1. When a superclass of an anonymous class ?eld implementation refers to another class ?eld, this anonymous class actually inherits from the implementation of this other class ?eld. 2. It is only possible to inherit from the implementation of another class ?eld if there is also a dependency on the typing side. Since types de?ned by regular classes are never subtypes of class ?eld types, the second restriction also excludes that regular classes inherit from class ?eld implementations.
there is no difference between a class ?eld and the class ?eld implementation. Therefore it is no coincidence that Keris uses the = letter to specify class ?eld implementations 7 This example shows that it is not always a good idea to put constructors into regular interfaces. A consequence is that dependent class ?elds have to implement such constructors as well. This can make sense, but often it does not. Therefore Keris allows interfaces to be tagged with the modi?er constructor. Such constructor interfaces are collections of constructor signatures. They do not de?ne a type and therefore do not have to be supported by depending class ?elds.
6 Dynamically,

3.2 The Programming Language Keris

77

The second restriction is necessary for type safety reasons. Since the self reference inside of an anonymous class ?eld implementation is typed with the corresponding class ?eld type, it has to be guaranteed that it can only be subclassed by classes where the self reference is assigned a subtype. Otherwise we could re?ne the original class ?eld independently breaking our new class ?eld implementation (where the self reference is typed with the re?ned class ?eld type). In the implementation of class ?eld Circle in module MYSHAPES we do not get this problem, because class CPoint, from which the anonymous class ?eld implementation inherits, is immutable — it cannot be re?ned. Nevertheless, it is important to understand what happens if class CPoint exposes its self reference in some way as an object of type CPoint. Since this code is inherited to the implementation class of Circle, there exists the possibility that at runtime, an object which is statically typed as a Circle, is used as a CPoint. This does not endanger type safety, since Circle objects are anyway instances of a CPoint subclass, but when used as a CPoint it allows access to ?elds and methods that might not explicitly be speci?ed in the implemented class ?eld interfaces. Here one can see that interfaces given in class ?eld de?nitions act as views on the actual class ?eld implementations such that clients of a class ?eld object can only access the speci?ed class ?eld members. Other parts of the system have possibly a different view with different access rights.

3.2.8 Type System
In this section we informally review the basic principles of the type system of Keris. We explain what restrictions have to be made to ensure statically that a system assembled from modules is sound. Furthermore we explain how the type system helps to evolve software consistently. Types. Type systems of Java-like object-oriented languages are usually nominal; i.e. types are identi?ed by their names, not by their structure. Not considering the package system and class nesting, reference types in Java have simply the form C , where C corresponds to a class name. In Keris, on the other hand, classes are typically not de?ned on the top-level, but rather within modules. Since modules can aggregate submodules arbitrarily, a reference type in Keris corresponds to a class name C which is quali

更多相关文章:

非常超级学习网 fccjxxw.com

copyright ©right 2010-2021。
非常超级学习网内容来自网络,如有侵犯请联系客服。zhit325@126.com|网站地图