This book will introduce the JS++ type system. The type system in JS++ is one of its most distinguished and unique features. Learning the JS++ type system is critical to mastering the JS++ programming language.
This book covers materials that range from the beginner level, such as declaring typed variables, to advanced design patterns. We try to break down advanced engineering and theoretical concepts as much as we can. Feedback is always welcome; just email me at roger at roger dot net.
Programs contain data. This data can be in the form of numbers, text, JSON, and so on. All data have types, commonly known as the "data type" or just "type" for short. For example, text is often represented with a "string" type. Numbers may be represented by types like "int" or "double".
Why is it important to have types? Without types, all kinds of errors can happen. For example, how would one logically subtract two texts (strings)? It's impossible, and it should be an error. Likewise, functions can be called, but what if the callee was not a function but a number? Again, this is an error.
Generally, there are two distinct typing disciplines: dynamic typing and static typing. In dynamic typing, the types are determined at runtime when you run the application. In static typing, the types are determined at compile time—before the program ever runs. However, there is no such thing as being "untyped." Programs have data, and data have types! JavaScript is dynamically-typed (typed at runtime) while JS++ is statically-typed (typed at compile time). Why is this distinction important? It means - in JavaScript - your program can have type errors at runtime; in JS++, before you ever run your program, all type errors must be resolved.
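The claim that nothing is truly "untyped" can be checked directly in JavaScript. The following is a minimal sketch (the variable names are illustrative only) that asks the runtime for the type of several values using the standard typeof operator:

```javascript
// Every JavaScript value carries a type at runtime, even though
// no type is ever written in the source code.
const values = [1, "abc", true, undefined, function () {}, { a: 1 }];
const runtimeTypes = values.map((value) => typeof value);

console.log(runtimeTypes.join(", "));
// number, string, boolean, undefined, function, object
```

The types are there all along; dynamic typing only means they are discovered while the program runs.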
In JavaScript, programs will simply keep running with errors, causing errors to creep and propagate across the application. Recall that - even in JavaScript - there is no such thing as "untyped" (or no types). Data have types. In dynamic typing, you simply have to keep track of all the types in your head. This is fine for one or two lines of code, but as applications grow in complexity, this can happen:
var a = 1;
var b = "abc";
// Hundreds of lines later...
var subtotal = a - b; // NaN
document.write("Your subtotal is: " + subtotal); // Your subtotal is: NaN
Yet, JavaScript isn't totally forgiving. Some errors can actually crash the script:
// Simple variable declaration. Innocuous, right?
var a = typo; // ReferenceError: typo is not defined
console.log("Script started"); // Never runs. Script already crashed with ReferenceError.
In certain static type systems, even though values can change often, the types remain consistent. Thus, even if we generate two random numbers, it does not matter if we are subtracting 1, 2, 3, 5000, etc. The type remains the same: it's always a number type, and subtraction is valid on numbers. Types remaining unchanged can be illustrated as follows:
int x = 1;   // Variables - by definition - vary. However, their types always remain the same.
x = 2;       // Data changes. Type remains the same.
x = 3;       // Data changes. Type remains the same.
x = "a";     // Data changes. We tried to change the type, but there is no valid conversion. Error.
In static typing, errors happen at compile time. Compile-time errors must be fixed; otherwise, the program doesn't compile. If a program doesn't compile, there is no program to run. Therefore, a program that might crash from a type error never actually runs... because it never compiled to begin with.
Even with programs that are highly dynamic, such as programs that behave differently on each run due to random number generation, the types remain the same. Since the types remain the same, we can apply logic and mathematics to prove that a program is correct and is free of type errors before it ever runs! This is the essence of static typing. We want full confidence that our Node.js server will not crash from a single typo or that our shopping cart will not produce NaN (Not a Number) error values before we ship it. Whether you're writing ten lines of code or 100,000 lines of code, static typing allows you to build high-quality applications. It's the same tool that experienced, battle-hardened software engineers use to ship high-quality, commercial and industrial software from banking to rocketry.
We will briefly discuss the notion of logical "soundness." Soundness is based on "truth." In other words, if a type system is sound, the types you declare will never be incorrect. (Note that in JS++, you don't have to declare types, which we will discuss.) If a type system is unsound, you can declare a string that, at runtime, turns into a number.
The JS++ type checker implements the first type system that enables static typing but is able to consume a dynamically-typed language (in this case, JavaScript) while remaining sound. Thus, a string will always remain a string, an unsigned int will always remain an unsigned int, and an object based on an Employee class will always remain an Employee object. This is what is meant when you hear that JS++ solved the computer science problem of soundly consuming dynamically-typed code from a statically-typed programming language: the types you declare are simply never incorrect and will always be correctly preserved.
Consider the following code:
var func:any = function() {};
var str1:string = func;
var str2:string = str1.toLowerCase(); // TypeError: undefined is not a function
console.log(str2); // Never runs. Program already crashed from TypeError.
The above code illustrates what can happen when a programming language's type system is unsound. str1 was declared as a string but became a function at runtime. Likewise, the initialization of str2, which depends on str1, crashes. The compiler trusted str1 to be a string even though it wasn't and permitted the method toLowerCase (a method exclusive to strings) to be called. However, since str1 was actually a function, which doesn't have a toLowerCase method, the application crashes at runtime.
In JS++, the declared types are correctly preserved even when introducing the erratic behavior of JavaScript. Whether a feature was implemented differently from one browser to another, an ActiveX control was introduced, or Object.prototype was overwritten using V8's C++ API (and thus the overwriting is never exposed to JavaScript itself), the types you declare with JS++ are guaranteed to always remain correct. Ultimately, why would you declare thousands of types when your strings can become numbers, functions, or nulls on every tenth run of the application? Soundness matters.
In JS++, you are not required to declare types for your variables and functions. The types are guaranteed to be correct if and only if you specify a type. If you write your JS++ code with var and function as you would in JavaScript, you are not guaranteed to benefit from type safety. However, if you change your var into a string, you are guaranteed to always get a string.
("Adding types" to JavaScript might sound simple on the surface, but this was surprisingly difficult to get right - especially to get it logically sound. See Appendix B which discusses why simply "adding types" to JavaScript was such a difficult problem to solve.)
Ultimately, the hope is that you will gradually and incrementally add types to your JavaScript code. Thus, you can "migrate" a JavaScript program from 0% typed to 100% typed. Especially in complex web applications that exceed one million lines of JavaScript code, the confidence that the type you add to a single variable or function is guaranteed to always be correct will empower you to take the first small steps in the daunting task of upgrading complicated code. However, the JS++ type system will benefit developers working on projects large and small by helping to deliver higher quality with some simple changes. Let's get started...
Rather than jumping immediately to the concepts and fundamentals of the type system, this book begins introducing you to the JS++ type system by example.
This book assumes you know how to compile and execute JS++ code.
In this chapter, you will discover how to start adding types to your variables and functions and how to import JavaScript libraries for use in JS++.
In JS++, there are two "kinds" of types: internal types and external types. Internal types are the types exclusive to JS++. External types are any types that are not JS++ types (in most cases, these will be JavaScript types). As we will discuss in later chapters, all external types are "unified" into a single type, the "Unified External Type" or "external type" for short.
By default, the var keyword will declare a variable whose type is the unified external type. We can declare a basic variable using var like this:
var x = 1;
var y = "abc";
Notice how var allows different types of data to be stored. This is because the Unified External Type includes all JavaScript types. Since both numbers and strings have equivalent JavaScript types, they can both be valid for var. In contrast, float is a type exclusive to JS++ and has no JavaScript equivalent. Therefore, the following code will produce an error:
var x = 1.1f; // Floats are numbers with an 'f' or 'F' suffix
[ ERROR ] JSPPE5000: Cannot convert `float' to `external' at line 1 char 8
This actually illustrates the JS++ type system's principle of "isolation," which we will discuss more in the next chapter.
Generally, the best practice is to avoid using var when declaring variables, but only where practical. Recall that JS++ is optionally typed. Everything that is declared with the external type (such as any variable declared with var) exposes us to runtime risk. We don't want runtime type errors. Ideally, we want to catch as many of them at compile time as possible. Therefore, our first example would be better written as:
int x = 1;
string y = "abc";
In addition, to use floats, we would declare the variable as the internal type, float:
float f = 1.1f;
When declaring variables, var declares a variable with external type. All other keywords/types will declare a variable with an internal type. For a list of built-in types, see Appendix A.
Once a variable is "typed," all values and data must fit within the specified type's range of valid values. For example, you cannot store a string inside an int without getting an error:
int x = "abc";
[ ERROR ] JSPPE5000: Cannot convert `string' to `int' at line 1 char 8
Declaring a basic function is the same as in JavaScript:
function minus(a, b) {
    return a - b;
}
By default, the function keyword will declare a function whose return type is external. Furthermore, parameters without a type specified implicitly carry an external type. As we know, external types are dangerous because we can call our minus function like so:
minus(1, 1);   // 0, OK
minus(2, 1);   // 1, OK
minus(1, "a"); // NaN (Not a Number)
However, when we begin adding types, it gets more interesting:
int minus(int a, int b) {
    return a - b;
}

minus(1, 1);   // 0, OK
minus(2, 1);   // 1, OK
minus(1, "a"); // Compile error. Fix or it won't compile.
Now, if we try to subtract a string from an int, we will immediately get a compile error before we ever try to run our program.
Even if you try to be clever and declare one of the function's parameters as a string in order to get the function call on line 7 to pass, the compiler will catch you:
int minus(int a, string b) {
    return a - b;
}
[ ERROR ] JSPPE5000: Cannot convert `string' to `int' at line 2 char 12
[ ERROR ] JSPPE5015: `-' cannot be applied to type `string' at line 2 char 8
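For contrast, plain JavaScript has no compile step that could reject the bad call, so the same protection has to be written by hand and only takes effect when the call actually executes. A minimal sketch, where checkedMinus is a hypothetical helper and not part of JS++ or any library:

```javascript
// Hypothetical hand-written guard: plain JavaScript can only detect the
// bad argument when the call actually executes.
function checkedMinus(a, b) {
    if (typeof a !== "number" || typeof b !== "number") {
        throw new TypeError("checkedMinus expects two numbers");
    }
    return a - b;
}

console.log(checkedMinus(2, 1)); // 1

let caught = null;
try {
    checkedMinus(1, "a");
} catch (error) {
    caught = error; // the mistake surfaces only at runtime
}
console.log(caught instanceof TypeError); // true
```

With typed parameters, the equivalent check is performed by the compiler before the program ever runs.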
Sometimes, functions may have no return value. In these cases, the function should be declared with void like so:
// Example 1
// No return statement
void foo() {
    // Do nothing
}

// Example 2
// Return statement that doesn't return a value
void bar() {
    if (true) return;
}
Even if a function has an external return type and external parameters, it is still an internal function if it is declared within a JS++ source code file. In other words, the function is more akin to a JS++ function than a JavaScript function - it merely has a return type that allows it to return data of any external type and accept parameters of any external type. However, it's still an internal function that possesses the properties of JS++ functions and can be overloaded. In contrast, a function that was declared in JavaScript and was "imported" into JS++ (via the external statement, which we will discuss in the next section) cannot be suddenly overloaded.
In JavaScript, all functions have an arguments object which allows developers to access arbitrary arguments. In JS++, this is not the case unless the function's return type is external (declared with the function keyword).
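As a reminder of what the arguments object provides on the JavaScript side, here is a small sketch in plain JavaScript (the function names are illustrative only). Neither function declares any parameters, yet both can read whatever the caller passes:

```javascript
// In plain JavaScript, a function can read arguments it never declared.
function countArguments() {
    return arguments.length;
}

function sumAll() {
    let total = 0;
    for (let i = 0; i < arguments.length; i++) {
        total += arguments[i];
    }
    return total;
}

console.log(countArguments(1, "a", null)); // 3
console.log(sumAll(1, 2, 3));              // 6
```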
Finally, when declaring function parameters, we have learned that all parameters not specifying a type will default to the unified external type. However, sometimes we want to be explicit that the parameter should be typed with the unified external type. In this case, we use the external keyword like so:
function foo(external a, external b) {}
This is equivalent to:
function foo(a, b) {}
However, a function's return type cannot be declared with the keyword external. This was a design decision since JavaScript functions are declared with the function keyword. If you want a JavaScript function (a function with an "external" return type), declare it with the function keyword.
As we've previously demonstrated, a simple typo can crash your script or program when you use JavaScript:
// Simple variable declaration. Innocuous, right?
var a = typo; // ReferenceError: typo is not defined
console.log("Script started"); // Never runs. Script already crashed with ReferenceError.
However, typos can never crash a JS++ program. This is because all names ("identifiers") must be declared in JS++, and the verification of variables, functions, classes, and so forth being declared is done at compile time rather than runtime.
Typically, JavaScript libraries export everything under a single "namespace" to the global scope. For example, jQuery can be accessed under the $ namespace, which is available globally once you include jQuery on your web page. However, we can't just start using $ in JS++ because it hasn't been declared to JS++ yet. Therefore, we use external imports. Essentially, we are "importing" a JavaScript library into JS++ simply by declaring it with the unified external type. We achieve this using the external statement.
To import jQuery, all we need to write is:
external $;
However, "$" by itself isn't very readable. jQuery actually exports two identifiers into the global scope: $ and jQuery. As a quick history lesson, this occurs because - at the time jQuery was developed - the $ identifier was used for many things with no relation to jQuery, such as an alias for the DOM API's document.getElementById. This is why jQuery provides the noConflict method. Nevertheless, we can actually leverage this fact for readability. We can declare both names as an external import:
external jQuery, $;
This is a common idiom in JS++ for importing jQuery and improves readability. The duplication doesn't affect performance. There isn't actually any "importing" going on under the hood. It really is just a fancy declaration (to ensure that all names must be declared) that is used during compile-time analysis, but the code generator will discard all external imports. External imports are a way to declare, "This library comes from JavaScript and is available under __________ name." Remember, there are internal types and external types. Naturally, there are "internal imports" and "external imports." While external imports are done using the external statement, internal imports are done with the import statement.
When we "import" JavaScript libraries, we are simply just declaring a name as having the unified external type. The name must be accessible from the global scope.
Those are the only two prerequisites. Once those prerequisites have been satisfied, we can just start using the JavaScript library with types:
external jQuery, $;
string url = $("#logo").attr("src");
We will discuss how the above snippet works under the hood in the next chapter once the type system's fundamentals have been fully explained.
There is a simple elegance to the JS++ type system. Despite the complexities of adding types to JavaScript, we've successfully imported and started using jQuery with types in just two source lines of code (excluding whitespace).
As a final tip, if a name is not available under the global scope (such as with complex ad-hoc JavaScript module systems), all you need to do is to write a small script (in JavaScript) to export what you want into the global scope.
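Such an export script can be very short. The sketch below is in plain JavaScript and rests on assumptions: hypotheticalLib is a made-up library name, and globalThis is assumed to be available (modern browsers and Node.js provide it; older environments would use window instead):

```javascript
// Hypothetical library that only exists inside a module-like closure.
(function () {
    const hypotheticalLib = {
        greet: function (name) {
            return "Hi, " + name;
        }
    };

    // Export it by hand so it is reachable from the global scope.
    globalThis.hypotheticalLib = hypotheticalLib;
})();

console.log(globalThis.hypotheticalLib.greet("Ada")); // Hi, Ada
```

Once the name is reachable globally, it can be declared on the JS++ side with an external statement just like $ or jQuery.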
Although "importing" is the common use case, the external statement is not restricted to importing JavaScript libraries to JS++. In fact, any name that can be accessed from JavaScript's global scope can be declared with external.
Therefore, a very basic DOM program using alert and prompt can be built using external statements:
external alert, prompt;
string name = prompt("What is your name?");
alert("Hi, " + name);
The external statement is useful because A) it declares a name to the compiler so that the compiler does not complain that an undeclared name was used, and B) it gives the name the Unified External Type, which we will learn about in the next chapter.
Now that we have provided some examples of the JS++ type system in action, it's easier to grasp the type system's fundamentals.
This chapter will discuss the fundamental pillars of the JS++ type system and how it all works.
In JS++, there are two categories of types: internal types and external types. Internal types are the types that are built into or defined via JS++. External types are all non-JS++ types, such as JavaScript types.
However, "JavaScript types" is a loaded term. We discuss this in detail in Appendix B but, to illustrate the point, how do we determine the types for the following?
var foo = {};
foo[Math.random()] = 1;
Very simply, the "structure" of the object in the above code changes on every run of the application. This is a contrived example, but Appendix B.2 cites a real-world example from one of the most popular JavaScript libraries where the structure and types can vary at runtime.
Here's another example:
var foo:NodeList = document.getElementsByTagName("*");
In Gecko-based browsers like Firefox, getElementsByTagName will return HTMLCollection. However, in all other browser engines, it returns NodeList. The programmer now has a false sense of security and begins writing code as though they have a NodeList (or HTMLCollection), causing runtime errors and crashes. It is further misleading the programmer because he or she ends up consulting the documentation for the wrong type. In most cases, bugs like this can be fixed, but Mozilla has filed this bug as WONTFIX because getElementsByTagName should return an HTMLCollection according to the DOM specification. If Mozilla wanted to fix this bug, they would have fixed it in 1999 (when the bug was first filed).
Instead, as software engineers, we've learned that it's best to ignore the implementation details. We call this "abstraction." In the same way, the Unified External Type abstracts all the platform nuances, implementation variations, and corner cases in trying to type JavaScript.
The Unified External Type, essentially, combines all possible JavaScript types into a single type at compile time. This is a compile-time type only. At runtime, it can be a string, number, host object type, etc. However, at compile time, it's just one type.
By simplifying this complex problem, the JS++ compiler is able to powerfully reason about your types, your code, and your data. Through simplification, it's not only easy to build a robust type system, it's also much easier for the users (developers) to learn and understand.
The JS++ type system has two "regions": the JS++ region, and the non-JS++ region. The two regions cannot intermingle without special circumstances.
Effectively, JS++ "isolates" JavaScript and code that cannot be considered "type safe." This makes "optional typing" easy to grasp. We are starting with 0% safe regions, and we want to gradually migrate to 100% safe regions. Furthermore, we are essentially isolating the statically-typed regions and dynamically-typed regions (which can be difficult for a compile-time type checker to type).
JS++ has no "any" type, that is, a type through which the two regions can intermingle without the compiler's knowledge. Type systems that pre-date JS++ typically had an "any" type to deal with the cases where it would be very difficult to type the JavaScript code. However, this is also why type systems pre-dating JS++ were unsound.
The Unified External Type is not the same as an "any" type. An any type would include the types of the statically-typed language and the dynamically-typed language together. Once an any type is encountered by the type checker, the type system is compromised and becomes unsound; the type checker is no longer able to distinguish between internal and external types. In contrast, the Unified External Type does not include internal types. The Unified External Type effectively "walls off" the external types from contaminating the type system.
Isolation is further expanded to references and closures. If data is "passed" from JS++ to JavaScript, it can be modified by reference. However, if internal types can never be passed by reference (without special rules) to the non-JS++ region, modification never occurs. The types can always be trusted to be correct. Likewise, in the other direction, if data is passed from JavaScript to JS++, we aren't concerned with the runtime type. It could be JSON whose API (and structure) constantly changes. Therefore, we just treat it as one type: the Unified External Type.
However, with primitive types like string, bool, and char, the data for these types is passed by value. JavaScript cannot make modifications that will invalidate these data types.
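The difference between the two passing behaviors can be observed in plain JavaScript. This is only an illustrative sketch (the function and variable names are made up) of why reference types need isolation while primitives do not:

```javascript
// Objects are shared by reference: the callee's change is visible to the caller.
function overwriteFirst(list) {
    list[0] = 42;
}

// Primitives are copied: reassigning the parameter changes only the local copy.
function overwriteText(text) {
    text = 42;
    return text;
}

const names = ["a", "b"];
overwriteFirst(names);
console.log(names[0]); // 42, no longer a string

const title = "abc";
overwriteText(title);
console.log(title); // "abc", unchanged
```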
My colleagues in theory are going to lambast this section, but Facebook also stumbled on a certain observation independently. The observation is that most JavaScript programs are already well-typed. There is logic behind this: at the end of the day, JavaScript still needs to run on a computer.
In order to successfully develop a type system compatible with JavaScript, it's important to understand how JavaScript is written, how JavaScript developers "think," and so on. Developers don't just stop "thinking" just because all variables are declared with "var". They still can't subtract the strings "a" - "b" and hope to get a meaningful result. Thus, even entry-level JavaScript developers are cognizant of "types."
Furthermore, we have an intuitive grasp of types when reading code—even when type annotations aren't available. For example, look at the following JavaScript code. What do you think the types are for each variable? (Please answer using JavaScript types.)
employeesCount;
employeeName;
isEmployee;
Fortunately, most popular JavaScript libraries have documented their types. For instance, jQuery has clearly documented that the .attr() method will return a string. As promised, let's re-visit the jQuery example from Chapter 2:
external jQuery, $;
string url = $("#logo").attr("src");
Since we can be certain we're going to get a string, we store the value into a string variable. Even if this weren't documented, we intuitively know that either A) URLs are represented with strings, B) the "src" attribute of an <img> element is going to be a string, or C) all attributes of HTML elements are strings.
What's important to observe here is that most developers can get the types correctly if they just stop and think. Otherwise, var should be used. When in doubt, leverage the Unified External Type. Type annotations in JS++ are optional.
However, what happens if we get the type wrong?
When you get the type incorrect, it's a logical conversion error, not a type error. Consider how JavaScript is written. When JavaScript developers want data to certainly be a string, what do they do? They convert it to a string:
var foo = bar.toString();
In the same spirit, JS++ will force a conversion on all primitive data types. Strings will always be strings, numbers will always be numbers, and Booleans will always be Booleans because the conversion is forced. (See Appendix A for a list of all primitive "built-in" types.)
Consider the following JS++ source code:
external foo;
string bar = foo;
In the above code, the non-JS++ region (typed as external) intersects with the JS++ region (typed as string). Since JavaScript's behavior is highly dynamic, we need 100% assurance that - even if we think we're getting a string - we will always get a string. Thus, the compiler will force a conversion automatically for variables declared with a primitive type, and the generated code will look something like this:
var bar = foo + "";
Aside: The above code uses an implicit conversion to string rather than toString() and the like because it's generally faster. It's generally faster because - at the JavaScript VM level - we avoid method lookups in case of re-definition, prototype chain lookups, growing the call stack, etc. The JS++ compiler cares about performance. Fortunately, you don't have to worry about performance over readability when writing JS++. The compiler handles that.
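The effect of the generated foo + "" expression can be tried out in plain JavaScript. The sketch below wraps that same expression in a helper; toStringValue is a hypothetical name used only for this illustration:

```javascript
// Mirrors the generated code shown above: concatenating with "" yields a string.
function toStringValue(value) {
    return value + "";
}

const incoming = ["abc", "", 1, 59120, 0, true, false];
const converted = incoming.map(toStringValue);

console.log(converted.every((value) => typeof value === "string")); // true
console.log(converted.join(" | ")); // abc |  | 1 | 59120 | 0 | true | false
```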
Here's a table that quickly illustrates the conversion process. The "incoming" JavaScript value is on the left, and the value it gets converted to in JS++ is listed on the right:
JavaScript Value | Converted Value |
---|---|
"abc" | "abc" |
"John Smith" | "John Smith" |
"" | "" |
1 | "1" |
59120 | "59120" |
0 | "0" |
true | "true" |
false | "false" |
Likewise, what about the int type (which represents a 32-bit signed two's-complement integer)?
JavaScript Value | Converted Value |
---|---|
0 | 0 |
1 | 1 |
1234 | 1234 |
59120 | 59120 |
"abc" | 0 |
"John Smith" | 0 |
"" | 0 |
true | 1 |
false | 0 |
NaN | 0 |
undefined | 0 |
null | 0 |
function(){} | 0 |
{ "a": 1, "b": 2 } | 0 |
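The values in this table happen to match JavaScript's own 32-bit integer coercion. The sketch below is an assumption for illustration only: the text does not say what code the JS++ compiler generates for int, and toInt32 is a hypothetical helper built on the standard bitwise OR operator.

```javascript
// Assumption: value | 0 applies JavaScript's standard 32-bit signed integer
// coercion. This is not necessarily what the JS++ compiler emits.
function toInt32(value) {
    return value | 0;
}

const samples = [
    0, 1, 1234, 59120, "abc", "John Smith", "", true, false,
    NaN, undefined, null, function () {}, { a: 1, b: 2 }
];

console.log(samples.map(toInt32).join(", "));
// 0, 1, 1234, 59120, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0
```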
As illustrated, when an int conversion fails, it simply converts to zero. There is no NaN (Not a Number) as in JavaScript. Why convert to a type's default value (zero) instead of throwing an error? (Hybrid static/dynamic typing like this is known as "soft typing.") Intuitively, you will get the types right most of the time. Runtime errors are not an acceptable trade-off here when most developers will get the types right with surprising accuracy. The general thinking behind runtime contracts is that the errors should be used during development but removed during production. This results in type errors creeping through the system... in production. Meanwhile, if runtime exceptions are being thrown, you will have multiple points of failure, which are not acceptable for production systems. It would be impractical and unacceptable for an application that could potentially have hundreds, thousands, or millions of points where "safe" and "unsafe" code could intersect to potentially throw errors at all the intersection points or have every intersection point wrapped in try-catch blocks.
Conversions are the compromise. It's where JS++ draws the line between theory and practice. Exponentiating the possible points of failure? No. Try-catch blocks everywhere? No. Conversions catching us when we fail so we can narrow our logical + type errors down to just logical conversion errors? Yes. Conversions feel "natural" to the dynamically-typed language programmer. It's a logical error because you performed a conversion and converted incorrectly. You can either fix the conversion, or, if in doubt, use the Unified External Type. An incorrect conversion from an external to internal type doesn't invalidate the type system's logic because the types remain the same due to forced conversions.
Conversions are a fundamental aspect of the JS++ type system. We've discussed how JS++ divides into "regions" with a JS++ region and a non-JS++ (JavaScript) region. We've also discussed that data can be "passed" between these two regions. However, our discussions have been limited to restricting how data can be passed between these two regions. Interestingly, restrictions can be removed if we can effectively "safeguard" or sanitize the data. Consider a StringArray class defined in JS++. If this StringArray is passed to JavaScript, JavaScript can fundamentally change the structure, remove methods, and - worst of all - change the underlying array elements to non-string values. All of these changes would be unknown to the JS++ compiler. How do we protect our StringArray class from JavaScript? What if we had an EmployeeArray class where the structure of the underlying Employee class differed from organization to organization? JS++ understands that there would be too much variation in user-defined types constructed with classes. Therefore, JS++ leaves the user to define how the conversions work for these classes. How you want to "safeguard", normalize, convert, or sanitize your StringArray or EmployeeArray is completely up to you, the one who defined the class. If no conversion is defined, objects built with the class simply cannot "cross over" to the non-JS++ region. To revisit the power of simplification via the Unified External Type, you aren't having to define conversions to and from hundreds of possible types; in fact, you only need to define at most two conversions: from the Unified External Type and to the Unified External Type.
All in all, regarding conversions, it's important to remember: JS++ has built-in conversions for built-in data types and enables user-defined conversions for user-defined types (such as classes).
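To make the idea of a user-defined conversion concrete, here is a rough sketch of the kind of sanitizing logic such a pair of conversions might contain. It is written in plain JavaScript rather than JS++ conversion syntax, and both function names are hypothetical:

```javascript
// Hypothetical incoming conversion: turn any JavaScript value into an
// array that contains only strings.
function stringArrayFromExternal(value) {
    if (!Array.isArray(value)) {
        return []; // not an array at all: fall back to an empty array
    }
    return value.map(function (element) {
        return element + ""; // force each element to a string
    });
}

// Hypothetical outgoing conversion: hand JavaScript a copy, so changes
// made there cannot reach the original.
function stringArrayToExternal(strings) {
    return strings.slice();
}

console.log(stringArrayFromExternal([1, "a", true])); // [ '1', 'a', 'true' ]
console.log(stringArrayFromExternal(null));           // []
```

Only these two directions need to be covered, regardless of how many different shapes the incoming JavaScript data might take.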
Finally, conversions have little to no performance overhead. They only need to occur once on variable declaration, assignment, or function calls when internal types intersect with external types. The JS++ type system is easy, fast, and very powerful.
As we discovered in the previous chapter, we can use the external statement to declare any name that we can access from JavaScript with the Unified External Type. We used the example of the DOM API's alert and prompt to create a simple program:
external alert, prompt;
string name = prompt("What is your name?");
alert("Hi, " + name);
However, many APIs - including the DOM API - may declare many more symbols than just alert and prompt. Naturally, this means we can end up with hundreds of external declarations to use an API. As software engineers, we want to keep our code "DRY" (Don't Repeat Yourself). Therefore, ideally, we shouldn't be including the same external declarations across multiple source files just because these source files all consume the same API. Instead, we can leverage the JS++ module system:
module BasicDOM {
    external alert;
    external prompt;
}
We can then import the module like so:
import BasicDOM;
string name = prompt("What is your name?");
alert("Hi, " + name);
Now, we get the same effect but we are able to "re-use" our external declarations across many source code files simply by importing them. Declaring a module composed of external declarations is known as an "External Module" and such modules exist in the JS++ Standard Library (e.g. Externals.DOM) in order to provide a convenient way to interface with existing JavaScript.
As demonstrated in the previous chapter where we explored the JS++ type system through basic examples, the JS++ programming language was carefully designed as a JavaScript superset.
Existing JavaScript variables and functions (declared with "var" and "function", respectively) have the Unified External Type. Since they do not have an "any" type, JS++ data cannot "cross over" unless there is a valid conversion. Therefore, JS++ seamlessly builds on top of JavaScript. Your existing JavaScript code will continue to work as you incrementally add types.
JS++ fully supports signed and unsigned 8-, 16-, 32-, and 64-bit integer types. This is in stark contrast to JavaScript, which has a single `number` type where every number is an IEEE-754 double. JS++ also supports floating point numbers via `float` and `double`.

    byte a;           // Unsigned 8-bit integer
    signed byte b;    // Signed 8-bit integer
    short c;          // Signed 16-bit integer
    unsigned short d; // Unsigned 16-bit integer
    int e;            // Signed 32-bit integer
    unsigned int f;   // Unsigned 32-bit integer
    long g;           // Signed 64-bit integer
    unsigned long h;  // Unsigned 64-bit integer
    float i;          // IEEE-754 single-precision floating-point number
    double j;         // IEEE-754 double-precision floating-point number
This is important because a lot of data in computers is not represented as `double`s, the way every number is in JavaScript. For example, JavaScript has to deal with RGB values. RGB values range from 0-255. This is an unsigned 8-bit integer. In JS++, we can use `byte`, but, in JavaScript, every number is a `double`. Additionally, web applications commonly need to interact with a database. The MySQL database supports signed and unsigned 8-, 16-, 32-, and 64-bit integer types and IEEE-754 floating-point numbers. The numeric type for a SQL column can influence the size of the database, optimization, etc.
Numeric types also have ranges. See Appendix A for the range of possible values for each numeric type. What happens if a number moves out of its type's range at runtime? It wraps. Therefore, if you have a `byte` whose range is 0-255, and you increment it to 256, it will wrap around to 0. Interestingly, this is how fixed-width integers behave in C, C++, Java, C#, and various other languages too. For example, consider the following C code (the output shown is what typical two's-complement implementations print):

    #include <stdio.h>

    int main(void) {
        signed char x = 127;
        printf("x: %d\n", x); // x: 127
        ++x;                  // Wrapping occurs here at runtime
        printf("x: %d\n", x); // x: -128
        return 0;
    }
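The same wrapping can be observed from plain JavaScript. The following is a minimal sketch using typed arrays, which are one of the few places where JavaScript stores fixed-width integers. It only illustrates wrapping; it is not a description of the code the JS++ compiler generates, and the variable names are our own.

```javascript
// Typed arrays store fixed-width integers, so out-of-range values wrap.
const unsignedByte = new Uint8Array([255]); // range: 0 to 255
unsignedByte[0] += 1;                       // 256 does not fit, so it wraps to 0

const signedByte = new Int8Array([127]);    // range: -128 to 127
signedByte[0] += 1;                         // 128 does not fit, so it wraps to -128

console.log(unsignedByte[0], signedByte[0]); // prints: 0 -128
```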
Furthermore, numeric literals can have suffixes to set their type:

    double a = 1.1d;        // 'd' suffix for 'double'
    double b = 1.1D;        // 'D' suffix for 'double'
    float c = 1.1f;         // 'f' suffix for 'float'
    float d = 1.1F;         // 'F' suffix for 'float'
    long e = 1L;            // 'L' suffix for 'long'
    unsigned long f = 1UL;  // 'UL' suffix for 'unsigned long'
There isn't a lowercase "l" suffix for `long` like other languages because - with some fonts - it can be too easily confused with the number one (`1`). For consistency, there isn't a lowercase "ul" suffix for `unsigned long` either.
Note that the `float`, `long`, and `unsigned long` types are "emulated" in JS++ when compiling to JavaScript. Thus, where performance is a concern, these data types should be avoided. However, the other numeric types will compile to performant code.
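To see why 64-bit integers cannot simply be stored in a JavaScript number, consider the sketch below. It shows that a double loses integer precision past 2^53, and uses the standard `BigInt.asIntN` and `BigInt.asUintN` functions to show what 64-bit wrapping looks like. This is only an illustration of the underlying limitation; we are not claiming that the JS++ compiler emulates `long` this way.

```javascript
// Every JavaScript number is a double, which has 53 bits of integer precision.
const limit = Number.MAX_SAFE_INTEGER;             // 2^53 - 1
const precisionLost = (limit + 1) === (limit + 2); // two different integers compare equal

// BigInt.asIntN / BigInt.asUintN wrap a BigInt into a fixed bit width.
const maxLong = 2n ** 63n - 1n;                            // largest signed 64-bit value
const wrappedLong = BigInt.asIntN(64, maxLong + 1n);       // wraps to the smallest signed 64-bit value
const wrappedUnsignedLong = BigInt.asUintN(64, 2n ** 64n); // wraps to 0

console.log(precisionLost, wrappedLong, wrappedUnsignedLong);
```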
JS++ supports character types. After all, strings are just an array of characters. Character literals are enclosed with backticks (`), otherwise known as the "grave accent". Characters can only contain a single 16-bit Unicode character ranging from U+0000 to U+FFFF. These code points are supported in both the UTF-16 and UCS-2 encodings, which are the two compatible encodings for JS++.

    char a = `a`;
    char b = `b`;
    char c = `\u0063`;
    char d = `\x64`;
Notice that Unicode and hexadecimal escapes are also allowed for character literals.
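For comparison, JavaScript has no character type, but its strings are sequences of 16-bit code units, and the same escapes work inside string literals. The sketch below is plain JavaScript (not JS++) and shows that a code point above U+FFFF needs two code units, which is why it would not fit in a single 16-bit character.

```javascript
const c = "\u0063";                 // Unicode escape for the letter "c"
const d = "\x64";                   // hexadecimal escape for the letter "d"
const codeUnit = "a".charCodeAt(0); // numeric value of the 16-bit code unit for "a"

// U+1F600 lies outside U+0000..U+FFFF, so UTF-16 stores it as two code units.
const astral = "\u{1F600}";

console.log(c, d, codeUnit, astral.length); // prints: c d 97 2
```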
Callbacks are an important part of JavaScript and JS++. Essentially, a callback is a function that is passed as an argument to another function and will be executed (or "called back") when convenient. A prominent example in JavaScript is `setTimeout` where the callback function is called after the specified timeout:

    setTimeout(function() {
        console.log("1000 milliseconds elapsed");
    }, 1000);
JS++ takes this further by including "callback types". Effectively, you can restrict callbacks to a certain type or you can store callbacks in a variable. Callback types are straightforward. They have a syntax of:
return_type(parameter_type, parameter_type, ...)
Notice that we only list the types. We do not provide names in the type itself. Here's an example:

    int(int, int) plus = int(int a, int b) {
        return a + b;
    };
In this example, the callback type is `int(int, int)` at the beginning of the code. We can see this is a variable declaration having the type `int(int, int)`. On the right side of the variable declaration (on the right of the `=` sign), we create an anonymous function expression with the return type `int` and two parameters: `a` and `b`, both typed as `int`.
However, since callbacks are usually passed as an argument to a function, they are most useful in function parameters for restricting the types of the input arguments:

    external console;

    void readFile(void(bool) callback) {
        bool SUCCESS = true, FAIL = false;
        bool read_OK = true; // Placeholder: do the file I/O work here

        if (read_OK) {
            callback(SUCCESS);
        }
        else {
            callback(FAIL);
        }
    }

    readFile(void(bool status) { console.log(status); });
    readFile(void() { console.log("No status parameter."); }); // Compile-time error
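For contrast, here is a rough sketch of what a similar restriction looks like in plain JavaScript, where it can only be enforced at runtime. The check on `callback.length` (the number of declared parameters) is our own stand-in for the callback type; it says nothing about parameter or return types, which is exactly the information a callback type adds.

```javascript
function readFile(callback) {
  // Runtime stand-in for the callback type: require a function with one parameter.
  if (typeof callback !== "function" || callback.length !== 1) {
    throw new TypeError("callback must be a function that accepts exactly one parameter");
  }

  const readOK = true; // Placeholder for the real file I/O work
  callback(readOK);
}

readFile(function (status) { console.log(status); }); // prints: true

try {
  readFile(function () { console.log("No status parameter."); });
} catch (e) {
  console.log(e.message); // The mistake is only caught when this line actually runs
}
```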
JS++ has support for array types. The arrays in JS++ are not restricted to a specific size. Instead, they can dynamically resize at runtime. Arrays can be declared by including `[]` after a type:

    int[] x = [ 1, 2, 3 ];
    string[] y = [ "abc", "def" ];
In addition, JS++ has support for jagged arrays ("arrays of arrays"):
int[][] x = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ];
As we mentioned earlier, JavaScript only supports IEEE-754 `double` values for numbers. We cited an example with RGB colors being a collection of unsigned 8-bit integers with values ranging from 0-255. We can use arrays with the `byte` underlying type to represent RGB values:
byte[] rgb = [ 0xFF, 0xFF, 0xFF ]; // Hexadecimal: #FFFFFF. Color: White.
At this point, we have learned enough about JS++ to start writing "Typed JavaScript". Typed JavaScript is simply a collection of patterns, idioms, and best practices to leverage the JS++ type system to write JavaScript with types.
JS++ comes with many features. However, you don't need to use all its features. You can simply use JS++ to start writing JavaScript with types ("Typed JavaScript").
JavaScript has six basic types: number, string, boolean, function, object, and undefined. JavaScript's `string` and `boolean` types map directly to the JS++ `string` and `bool` types, respectively. JavaScript's `null` and `undefined` fit into JS++'s `void` type. JavaScript's `number` type can map directly to the JS++ `double` type; however, this is not ideal. Using `double` everywhere can be considered a bad practice. Instead, especially if the number is a whole number, use one of the integer types. As a rule of thumb, `int` should work in most cases where whole numbers are needed. This leaves us with functions and objects, which we will discuss in this chapter.
JS++ performs "auto-boxing" of primitive types. Essentially, "auto-boxing" involves wrapping a primitive value with its corresponding object wrapper class. For instance, a `string` value might be boxed like so:

    // Original
    "abc";

    // Boxed. Pseudo-code to aid understanding. This would be optimized away by the compiler.
    new System.String("abc");
However, the JS++ Standard Library implements standard objects like `String` differently from JavaScript. For example, in JS++, the `String.match` function never returns `null` when no matches are found; instead, it returns an empty array. This is one of several such fixes. Due to the differences in implementation, the JS++ compiler needs to know: should we "box" a value with JS++ classes or the JavaScript object equivalent?
In fact, you'll find you can't even call methods on primitive values without specifying whether you want them boxed with JS++ or JavaScript!
"x".toString();
[ ERROR ] JSPPE5038: Failed to auto-box `string'. Did you forget to import `System' or `Externals.JS'? at line 1 char 0
What the JS++ compiler is saying here is that it does not know if the `toString` method comes from JS++ or JavaScript. It's very easy to specify. If we want JS++, we should import the `System` module:

    import System; // Uses JS++ Standard Library

    "x".toString(); // OK
However, we just want to write Typed JavaScript. Therefore, we should avoid the JS++ libraries. We can use the native JavaScript methods by importing the `Externals.JS` module:

    import Externals.JS; // Uses JavaScript

    "x".toString(); // OK
However, be careful when using `Externals.JS` as it will basically box all primitive values as `external`!
As a rule of thumb, JavaScript objects and JSON should be typed with the Unified External Type.

    var x = { a: "a", b: true, c: 1 };
This is for reasons of simplicity. One of the foundations of the JS++ type system was simplifying a very hard problem. As cited in an earlier example, it's very easy for an object's structure to change unpredictably and non-deterministically at runtime:
var foo = {}; foo[Math.random()] = 1;
It's possible to add types to JavaScript objects via the "typed wrapper" design pattern which we will cover later in this book, but it involves using JS++ classes and would fall outside the definition of simply using "JavaScript" with types.
There is no implicit conversion from internal functions to external. Therefore, a manual conversion is necessary if you want to pass functions to JavaScript. This can be achieved using "proxy functions":

    external jsFunction;

    int plus(int a, int b) {
        return a + b;
    }

    function proxy(a, b) {
        return plus(a, b);
    }

    jsFunction(proxy);
Please note that closures cannot be simply "passed" to JavaScript. They would need to be safeguarded such that they no longer bundle their environment; thus, they need to be safeguarded with a proxy function that is a plain function rather than a closure. This can be achieved by passing arguments to the function directly rather than referring to free variables of an enclosing scope. This rewrite of the function will not be performed by the compiler because how you arrange parameters and declare the function will affect how you call it. Therefore, for seamless transition between JS++ and JavaScript, converting closures is left to the developer.
The problem with importing the `Externals.JS` module and boxing all primitives with the external type every time we need to access a property or method is that all properties and methods of an external are also external. This is an illustration of the types depending on whether you are using JS++ or Typed JavaScript:

    import System; // Uses JS++ Standard Library

    true.toString(); // 'string'

    import Externals.JS; // Uses Typed JavaScript

    true.toString(); // 'external'
This is not ideal. Fortunately, JS++ comes with a `Convert` module for explicitly converting external types to internal types. We simply need to import the module and start using it:

    import Externals.JS; // Uses Typed JavaScript
    import Convert;

    Convert.toString(true); // 'string'
Now we are able to get the JS++ internal `string` type rather than `external`. This increases the "typed" regions of our code further and helps us to incrementally move towards having fully-typed code.
Here's a quick list of methods available on the built-in `Convert` module:
Method | Signature |
---|---|
Convert.toString | string(external) |
Convert.toBoolean | bool(external) |
Convert.toByte | byte(external) |
Convert.toSByte | signed byte(external) |
Convert.toShort | short(external) |
Convert.toUShort | unsigned short(external) |
Convert.toInt | int(external) |
Convert.toUInt | unsigned int(external) |
Convert.toDouble | double(external) |
Convert.toChar | char(external) |
The built-in `Convert` module can also be used for converting internal types. In the following example, we see an `unsigned int` does not have an implicit conversion to `int`, and we receive a compiler error:

    unsigned int x = 1;
    int y = x;
[ ERROR ] JSPPE5016: Cannot convert `unsigned int' to `int'. A cast is available at line 2 char 8
However, we can force the conversion from `unsigned int` to `int` by using the `Convert` module's `toInt` method:

    import Convert;

    unsigned int x = 1;
    int y = Convert.toInt(x);
OK (0) errors and (0) warnings
The code now successfully compiles.
Please note that the `Convert` module will actually generate conversion code when its methods are called. Therefore, in the above code, integer wrapping could potentially occur in order to keep values within the range of the type being converted to.
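As a rough illustration of how a conversion can wrap, the JavaScript sketch below uses bitwise operators to force a number into an integer range. The helper names (`toInt`, `toUInt`, `toByte`) are hypothetical; they are not the implementation of the `Convert` module or the output of the JS++ compiler, only a way to see what wrapping during a conversion means.

```javascript
// Hypothetical helpers, named after the target type for readability.
function toInt(x)  { return x | 0; }    // wraps into the signed 32-bit range
function toUInt(x) { return x >>> 0; }  // wraps into the unsigned 32-bit range
function toByte(x) { return x & 0xFF; } // wraps into the unsigned 8-bit range

console.log(toInt(1));          // 1: already in range, so the value is unchanged
console.log(toInt(4294967295)); // -1: the largest unsigned 32-bit value wraps
console.log(toUInt(-1));        // 4294967295
console.log(toByte(256));       // 0
```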
One of the "features" JS++ removed from JavaScript is Automatic Semicolon Insertion (ASI). This has commonly been cited as a reason to enforce a certain curly brace style whereby the curly brace is always on the same line. To quote Douglas Crockford from the book, JavaScript: The Good Parts:
I always use the K&R style [sic], putting the `{` at the end of a line instead of the front, because it avoids a horrible design blunder in JavaScript's `return` statement.
This refers to the following scenarios:

    function foo() {
        return {
            "success": true
        };
    }

    function foo() {
        return
        {
            "success": true
        };
    }
The first snippet of code will return the object. The second snippet will return `undefined`. This is due to JavaScript's "Automatic Semicolon Insertion" silently inserting a semicolon at the end of line 2 in the second snippet. Thus, the second snippet might be better visualized as:

    function foo() {
        return;
        {
            "success": true
        };
    }
Therefore, to achieve consistency, JavaScript developers often do not have curly braces on a new line. However, in JS++, a statement ends only when a semicolon is encountered. There is no "automatic" insertion of semicolons. As a result, in JS++, you are free to use whichever curly brace style suits you.
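The difference can be observed directly in JavaScript. In the sketch below, the function names are our own, and the key is deliberately left unquoted: once the semicolon is inserted after `return`, the orphaned braces are parsed as a block containing a labeled statement, so the function still parses and simply returns `undefined`. (With a quoted key, a JavaScript engine would instead reject the orphaned block with a syntax error.)

```javascript
function braceOnSameLine() {
  return {
    success: true
  };
}

function braceOnNextLine() {
  return
  {
    success: true
  };
}

console.log(braceOnSameLine()); // the object with success: true
console.log(braceOnNextLine()); // undefined
```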
Here's a snippet from the JavaScript language standard on Automatic Semicolon Insertion:
There are three basic rules of semicolon insertion:
When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
- The offending token is separated from the previous token by at least one LineTerminator.
- The offending token is }.
When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).
JavaScript's rules on Automatic Semicolon Insertion are "anti-engineering." If you don't believe your entry-level engineers can figure out the above, they're bound to make mistakes and your code base will suffer from a lack of consistency and quality. When semicolons are required and predictable, this can never happen.
Object-oriented programming (often abbreviated as "OOP") is a programming paradigm centered around objects. An "object" can be anything: your dog, your co-worker, your house.
JS++ is a multi-paradigm programming language that implements object-oriented programming via classes. OOP with classes can be a powerful way to build, structure, and organize your applications.
The particulars of classes and OOP are outside the scope of this book. Instead, this chapter will only discuss classes and OOP as they pertain to the JS++ type system.
When you declare a class, you are also creating a type. The class can be used as a type following its declaration:

    class Employee {}

    Employee john = new Employee();
JS++ is nominally-typed. In other words, the name of the class determines the equivalence of types and whether a type is a subtype of another. This should be familiar to developers coming from C, C++, Java, or C#.
By default, when a class is declared, it is an internal type. In other words, instances of classes - by default - cannot be passed from JS++ to JavaScript.
However, once we define conversions from the class to the Unified External Type, we can pass instances of the class to JavaScript. The beauty of the Unified External Type is that - through simplification - it makes it easy for programmers: instead of defining many conversions to each possible JavaScript type, you only need to define two conversions for your class: one conversion to JavaScript and one conversion from JavaScript.
A conversion to JavaScript (outgoing) might look like this:

    import System;

    class ListView
    {
        private string[] items;

        // Define conversion from internal 'ListView' type to 'external'
        explicit external(ListView b) {
            // Clone the object before sending to external
            return deepCopy(b);
        }
    }
Likewise, a conversion from JavaScript (incoming) might look like this:

    import System;

    class ListView
    {
        private string[] items;

        // Define conversion back to internal type
        explicit ListView(external e) {
            // Tampering check. Make sure JavaScript did not modify the object
            if (!deepEqual(e, this)) {
                throw new System.Exception("ListView was unexpectedly modified! Could not convert back to JS++.");
            }

            // If the object was not tampered with by JavaScript, we can just return it directly.
            // In a class that allows modifications from JavaScript, you may want to use reflection
            // to "fix" the object back up.
            return this;
        }
    }
You can have both conversions (an incoming and outgoing conversion to/from `external`) in your class. However, we split the code to make it easier to read and distinguish.
In our code above, when we send data to JavaScript, we perform a deep clone first. Therefore, any modifications from JavaScript will not affect the data inside JS++. This is in contrast to sending out a reference or shallow copy, where the references may be modifiable by JavaScript. (You may not always want a deep copy of the data each time you have to convert to `external`. For example, for performance purposes, you may be satisfied to sacrifice safety for speed and only want to pass a shallow copy of the data. This is completely acceptable, and how you implement the conversion logic is completely flexible and up to you.) When we have data incoming from JavaScript, we perform a deep equality check; for example, the deep equality check will ensure our `string[]` field (`items`) does not have numbers or undefined values inside it. Altogether, we've created a custom `ListView` class in JS++ that can successfully - and easily - work with JavaScript without compromising the integrity of the JS++ type system.
The previous examples illustrated how we can "safeguard" our data. However, since conversions are user-defined, you are free to convert to/from JavaScript however you wish.
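To make the difference between a deep copy and a shallow copy concrete, here is a small JavaScript sketch of what outside code can do to each. The object shape and the `isStringArray` helper are hypothetical and stand in for the `ListView` example; the JSON round trip is just one simple way to deep-copy plain data and is not the `deepCopy` function used above.

```javascript
// Hypothetical internal data: a list that must only ever contain strings.
const listView = { items: ["a", "b"] };

function isStringArray(value) {
  return Array.isArray(value) && value.every(function (item) {
    return typeof item === "string";
  });
}

// Deep copy: outside code gets its own array, so the original stays intact.
const deepCopy = JSON.parse(JSON.stringify(listView));
deepCopy.items.push(42);
const intactAfterDeepCopy = isStringArray(listView.items);

// Shallow copy: the nested array is shared, so outside code can corrupt it.
const shallowCopy = { ...listView };
shallowCopy.items.push(42);
const intactAfterShallowCopy = isStringArray(listView.items);

console.log(intactAfterDeepCopy, intactAfterShallowCopy); // prints: true false
```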
We have seen how JS++ is able to remain sound when consuming dynamically-typed code with primitive data types. As illustrated in this chapter, JS++ is actually able to soundly consume dynamically-typed code in a statically-typed, object-oriented programming language that can expand to thousands or millions of custom types through the power of simplification.
In JS++, when inheritance occurs, the derivative class is considered a subtype of the base class. In other words, if `Cat` and `Dog` both derive from `Animal`, both types will "fit" into the `Animal` type by nature of their subtyping relationship.

    class Animal {}
    class Cat : Animal {}
    class Dog : Animal {}

    Animal cat = new Cat(); // OK
    Animal dog = new Dog(); // OK
However, in the JS++ type system, conversions are not inherited! Keep this in mind when developing your classes.
In the above example, if the `Animal` class defined a conversion to `external`, `Cat` and `Dog` will not be able to convert to `external` unless conversions are individually defined for the `Cat` and `Dog` classes too.
If there is duplicate functionality that you want to inherit, it may be best to refactor the functionality into a method that gets inherited and can be called by all relevant conversion functions.
All built-in data types in JS++ have a corresponding object wrapper class. For example, `bool` corresponds to `System.Boolean`, `string` corresponds to `System.String`, and so on. For a full list of built-in data types and their corresponding wrapper classes, see Appendix A.
When the object wrapper class is used as the type, all primitive data values can be implicitly converted to the object wrapper class. This is known as auto-boxing.

    import System;

    System.String s1 = new System.String("abc"); // OK
    System.String s2 = "abc"; // OK. Auto-boxing occurs.
Auto-boxing is necessary because the JS++ Standard Library implements classes differently from their JavaScript counterparts. Auto-boxing is also optimized by the compiler for maximum performance.
Auto-boxing can only happen for built-in data types with Standard Library classes. It cannot be implemented for user-defined classes. In relation to the type system, object wrapper classes and their corresponding primitive data types carry the same conversion rules. In other words, if the `System.String` object wrapper class has a conversion to `external` then so does the `string` primitive data type. In fact, the primitive data types obtained these conversions from their object wrapper classes via auto-boxing.
This chapter will discuss design patterns and other common patterns for JS++ as they relate to the type system.
The typed wrapper is a wrapper class that uses JS++ classes to retrofit types to JavaScript.
In Chapter 5, "Typed JavaScript", we recommended typing all JavaScript objects and JSON with the Unified External Type. However, that's only if we restrict our syntax to simply JavaScript "with types". If we're able to flex a bit and take full advantage of JS++, especially with classes, we can achieve even better type safety. For example, consider the following interface for a JavaScript library:

    var StringUtils = {
        isWhitespace: function(s) { /* ... */ },
        isNumeric: function(s) { /* ... */ },
        trim: function(s) { /* ... */ },
        trimLeft: function(s) { /* ... */ },
        trimRight: function(s) { /* ... */ },
        splitLines: function(s) { /* ... */ }
    };
An equivalent JS++ class - with types - might be implemented as:

    external StringUtils;

    class TypedStringUtils
    {
        public static bool isWhitespace(string s) {
            return StringUtils.isWhitespace(s);
        }
        public static bool isNumeric(string s) {
            return StringUtils.isNumeric(s);
        }
        public static string trim(string s) {
            return StringUtils.trim(s);
        }
        public static string trimLeft(string s) {
            return StringUtils.trimLeft(s);
        }
        public static string trimRight(string s) {
            return StringUtils.trimRight(s);
        }
        public static string[] splitLines(string s) {
            return StringUtils.splitLines(s);
        }
    }
Additionally, all the methods of the typed wrapper can be declared with `inline`. Effectively, this will cause the compiler to perform a function inlining optimization so that there would be practically no overhead to using the typed wrapper pattern to retrofit types to JavaScript.
Here's an example usage of the typed wrapper:
TypedStringUtils.isWhitespace(" "); // true
Using the "typed wrapper" design pattern is a powerful way to add types to existing JavaScript code rather than simply declaring everything with the Unified External Type. Effectively, JS++ gives you the best of both worlds: simplicity and convenience with the Unified External Type, and power and flexibility with typed wrappers.
Using typed wrappers, we can gradually migrate our code from 0% type safe to - ideally - 100% type safe.
Array conversions are not automatic. Instead, when an array is incoming from JavaScript, it will need to be manually converted. This usually just involves a simple `for` loop that iterates over the elements of the external array and converts each element to an internal type:

    import System;

    external arr = [ 1, 2, 3 ];
    int[] output = [];

    for (int i = 0, len = arr.length; i < len; ++i) {
        int el = arr[i];
        output.push(el);
    }
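For readers more familiar with JavaScript, the following sketch shows the same element-by-element idea with the per-element check written out by hand. The function name `toIntArray` and the decision to throw on a non-integer element are our own assumptions; in the JS++ version above, the conversion happens where each element is assigned to an `int`.

```javascript
// Hypothetical helper: copy an untrusted array into a new array of 32-bit integers.
function toIntArray(arr) {
  const output = [];
  for (let i = 0, len = arr.length; i < len; ++i) {
    const el = arr[i];
    if (typeof el !== "number" || !Number.isInteger(el)) {
      throw new TypeError("Element " + i + " is not an integer");
    }
    output.push(el | 0); // keep the value within the signed 32-bit range
  }
  return output;
}

console.log(toIntArray([1, 2, 3])); // [ 1, 2, 3 ]
```

Building a new array, rather than reusing the incoming one, means later changes made by JavaScript to the original array cannot affect the converted copy.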
Name | Class | Description | Width | Range |
---|---|---|---|---|
bool | Boolean | Logical Boolean | 8 | true or false |
string | String | A sequence of characters | | |
external | | External type | | |
byte | UInteger8 | Unsigned 8-bit integer | 8 | 0 to 255 |
signed byte | Integer8 | Signed 8-bit integer | 8 | -128 to 127 |
short | Integer16 | Signed 16-bit integer | 16 | -32,768 to 32,767 |
unsigned short | UInteger16 | Unsigned 16-bit integer | 16 | 0 to 65,535 |
int | Integer32 | Signed 32-bit integer | 32 | -2,147,483,648 to 2,147,483,647 |
unsigned int | UInteger32 | Unsigned 32-bit integer | 32 | 0 to 4,294,967,295 |
long | Integer64 | Signed 64-bit integer | 64 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
unsigned long | UInteger64 | Unsigned 64-bit integer | 64 | 0 to 18,446,744,073,709,551,615 |
float | Float | Single-precision floating point type | 32 | -3.402823e38 to 3.402823e38 |
double | Double | Double-precision floating point type | 64 | -1.79769313486232e308 to 1.79769313486232e308 |
char | Character | A single 16-bit Unicode character | 16 | U+0000 to U+FFFF |
First, it's important to understand the difficult challenge of adding types to JavaScript. The problem is not as simple as taking the ECMAScript grammar and augmenting it with type annotations. There's a reason that Microsoft (TypeScript, Safe TypeScript), Google (AtScript, SoundScript), and Facebook (Flow) have all attempted this problem and come up short.
Furthermore, it's very important to understand the corner cases. If you can't trust the language designer to have a strong depth and breadth of understanding of JavaScript corner cases, only use the language with extreme caution because, ultimately, one corner case can cause the entire system to need to be fundamentally re-designed. This document does not attempt to be comprehensive, but it will cover several corner cases that other alternatives either didn't know about or haven't thought about.
We start our list of problems with host objects because they can create the most problems. ECMAScript, the language specification that governs JavaScript, defines host objects as:
A host object is any object supplied by the host environment to complete the execution environment of ECMAScript. Any object that is not native is a host object.
Host objects can thus be anything. You can define an object using C/C++ (with its own types - including custom types) and expose it to JavaScript (e.g. via interfaces provided in V8) and that's considered a valid host object. You might be using Node.js, and your database library might be implemented using native code or your cryptography functions might be implemented in native code - all using host objects. You can insert an ActiveX object into the web page, and this is considered valid JavaScript according to the ECMAScript standard. (Contrary to popular belief, Internet Explorer is not the only web browser that allows ActiveX. Legacy Firefox applications may utilize ActiveX via `GeckoActiveXObject`.) You might be using Windows Script Host and JScript to automate server administration tasks with host objects. Finally, the most popular implementation of host objects: the DOM (Document Object Model) API. Essentially, host objects are the Wild Wild West of JavaScript; they can do whatever they want whenever they want, and this erratic and unpredictable behavior is perfectly legal and valid according to the language standard!
The scariest part of them all? They can introduce thousands of possible types into JavaScript that are not part of the ECMAScript standard, but are perfectly legal according to the ECMAScript standard. In fact, ActiveX does exactly that. This isn't about IE being IE; this is a conforming implementation! Even without IE, who's to say one of your Node.js developers won't leak C/C++ types into your pristine JavaScript in the future? ECMAScript's `typeof` table tells it all:
Type | Result |
---|---|
Undefined | "undefined" |
Null | "object" |
Boolean | "boolean" |
Number | "number" |
String | "string" |
Object (native and doesn't implement [[Call]]) | "object" |
Object (native and implements [[Call]]) | "function" |
Object (host) | Implementation-dependent |
In other words, host objects can introduce whatever types they want into JavaScript's type system.
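The rows of the table above that do not depend on the host can be checked in any JavaScript engine. The sketch below simply records what `typeof` reports for each native case; the property names are our own labels. Host objects are left out precisely because their result is implementation-dependent.

```javascript
const results = {
  undefinedValue: typeof undefined,      // "undefined"
  nullValue: typeof null,                // "object"
  booleanValue: typeof true,             // "boolean"
  numberValue: typeof 1,                 // "number"
  stringValue: typeof "abc",             // "string"
  plainObject: typeof {},                // "object"
  callableObject: typeof function () {}  // "function"
};

console.log(results);
```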
JavaScript objects allow arbitrary key names with arbitrary values. The only guarantee that you have is that the key is a string. The value can be any value having any type - including host object types.
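A short JavaScript sketch makes both halves of that statement visible: whatever is used as a key ends up as a string, while the values can be of any type. The object and its contents are hypothetical examples.

```javascript
const players = {};

players[42] = "a string";         // numeric key, string value
players["alice"] = { score: 10 }; // string key, object value
players[[1, 2]] = true;           // array key, boolean value

// Every key has been converted to a string, whatever it started as.
const keyTypes = Object.keys(players).map(function (key) { return typeof key; });

// A key that was never set yields undefined rather than an error.
const missing = players["bob"];

console.log(Object.keys(players), keyTypes, missing);
```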
The problem with type checking JavaScript objects is that 1) the key names are arbitrary, and 2) the value types are arbitrary. If at least one were consistent, type checking JavaScript objects would not be an issue. Let's examine how this problem might surface in real code:

    var players = {};
    // ...
    var tokenId = Math.random() * 100;
    var tokenValue = players[username][tokenId];
... And that's it. That's all it takes to fool the current generation of type checkers. Let's examine this:
The type checker sees `players`. It can prove that `players` is always a JavaScript object. No problems here.
The type checker sees `players[username]`. This is where it falls apart. If the user is not logged in, this value could be undefined. However, if she is logged in, the corresponding value for this key looks like it might be a nested JavaScript object. The only way to know the type for the value associated with `players[username]` is to actually execute the code.
Even if we just assume `players[username]` is always an object, we trip up again. Once again, we have dynamic property names. Assuming we guess the property name correctly, how can we get the type of the associated value? What if it came in from a web service? What if the API authors decide to change the return types for the REST call?
Of course, this is all hypothetical in order to illustrate the point. Here's a real-world example from Chart.js, which has nearly 20,000 GitHub stars and over 5,000 forks. Here's the exact snippet we're concerned about:

    var computeDimension = function(element,dimension) {
        if (element['offset'+dimension]) {
            return element['offset'+dimension];
        }
        // ...
    }
In this code, for a very popular library, this is clearly a utility function for computing `offsetWidth` or `offsetHeight` for an arbitrary DOM element. However, `dimension` can actually be any arbitrary input. (Although "Width" and "Height" are expected.) There's safeguarding code in the conditional checking for the existence of the property before returning, but, in the eyes of the type checker, this makes no difference. If we supply the expected argument ("Width" or "Height"), we will get a numeric return value; however, if we supply anything else, the value is most likely `undefined`. Since the type checker never runs code, it cannot know what the return type here may be. It can be `number`, `undefined`, or the property may have been a user-defined "offset-foo" which returns a host object type. Quite a lot of code in the wild does have dynamic user-defined properties on DOM elements. Even if we can determine that no such user-defined property exists in the current library, we have no way of guaranteeing that another library we may use now or in the future will not define such a property.
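The three outcomes can be reproduced without a browser. In the sketch below, `fakeElement` is a hypothetical plain object standing in for a DOM element, `offsetFoo` plays the role of a user-defined property, and the elided remainder of the original function is replaced by simply falling through.

```javascript
var computeDimension = function (element, dimension) {
  if (element["offset" + dimension]) {
    return element["offset" + dimension];
  }
  // The rest of the original function is elided in the text; this sketch falls through.
};

// Hypothetical stand-in for a DOM element with one user-defined property.
var fakeElement = { offsetWidth: 300, offsetHeight: 150, offsetFoo: { custom: true } };

console.log(typeof computeDimension(fakeElement, "Width")); // "number"
console.log(typeof computeDimension(fakeElement, "Depth")); // "undefined"
console.log(typeof computeDimension(fakeElement, "Foo"));   // "object"
```

The return type depends entirely on the string passed in at runtime, which is exactly the information a static type checker does not have.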
JavaScript is an example of a highly-dynamic language. It's very difficult to solve this problem until you acknowledge that JavaScript is highly-dynamic.
Here is an example of the type of code that gets written every day in JavaScript that can cause problems:

    function getMessage() {
        if (messages.length > 0) {
            return messages.pop();
        }
    }

    var message = getMessage();
    if (message) {
        console.log(message);
    }
You have to get inside the "mind" of the JavaScript developer. In this case, she is leveraging JavaScript's lax rules on "truthiness". The function getMessage will typically return a string popped from an array. However, if there are no messages available, the default function return value is returned: undefined. Thus, getMessage has two return types: string and undefined. At line 7, we call the function and store its returned value in the variable message. Line 8 is where the average JavaScript developer exploits truthiness. Since undefined and the empty string are both considered "falsy" values in JavaScript, the developer decides it would be more concise to just let the if statement coerce the value based on its truthiness. Thus, there is no incentive to actually return an empty string - it's just extra code and isn't as aesthetically pleasing as exploiting truthiness. The safer (but more verbose) version of this code will look like this:
function getMessage() {
    if (messages.length > 0) {
        return messages.pop();
    }
    else {
        return "";
    }
}

var message = getMessage();
if (message !== "") {
    console.log(message);
}
... But code is never written this way because JavaScript developers think verbosity is "too much like Java."
Even with truthiness, you eventually get a mish-mash of spaghetti code with conditionals ensuring a variable is not undefined. JavaScript is almost never written in a way where every variable is checked for existence, checked for correct type, etc. It would just be too much code. Thus, you open up your business logic to risks like this:
function getMessage() {
    if (messages.length > 0) {
        return messages.pop();
    }
}

// Hundreds of lines of code later

var message = getMessage();
var messageType = message.charAt(0); // First character denotes message type
processMsg(messageType, message);
Recall that getMessage has two possible return types: string and undefined. This code runs fine if getMessage returned a string. However, undefined has no method named charAt. Thus, if getMessage returned undefined, line 10 will crash with a TypeError: Cannot read property 'charAt' of undefined. This will bring down the script. In the case of Node.js, this will bring down the entire server. Thus, the call to processMsg at line 11 never runs, because the application has already crashed at line 10 from a TypeError.
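The crash described above can be reproduced, and guarded against, in a few lines. This is only a sketch: unsafeMessageType and safeMessageType are hypothetical helper names, and returning null as a "no message" marker is an assumption made for illustration.

```javascript
var messages = [];

function getMessage() {
    if (messages.length > 0) {
        return messages.pop();
    }
}

// Unguarded: assumes getMessage always returns a string.
function unsafeMessageType() {
    return getMessage().charAt(0);
}

// Guarded: checks the type explicitly instead of relying on truthiness.
function safeMessageType() {
    var message = getMessage();
    if (typeof message !== 'string' || message.length === 0) {
        return null; // hypothetical "no message" marker
    }
    return message.charAt(0);
}

var caught = null;
try {
    unsafeMessageType(); // the queue is empty, so this throws
} catch (e) {
    caught = e;
}

console.log(caught instanceof TypeError); // true
console.log(safeMessageType());           // null
```

Note how much extra code the guarded version needs for a single call site; multiplied across an application, this is the verbosity that developers tend to skip.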
The next problem is AJAX, specifically XMLHttpRequest. If your only experience with AJAX is via jQuery then you never really got to see what was underneath the hood. Here's what it looks like:
function GetXmlHttpObject() {
    var xmlHttp = null;
    try {
        xmlHttp = new XMLHttpRequest();
    }
    catch (e) {
        try {
            xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
        }
        catch (e) {
            xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
        }
    }
    return xmlHttp;
}
The ActiveXObjects in the above code are how AJAX worked underneath the hood in Internet Explorer. If you were thinking, "Well, my code would never use ActiveX!", you're wrong; you're already using it.
What are the possible return types for the GetXmlHttpObject function above? null and several host object types: a DOM API XMLHttpRequest instance and ActiveX types. Furthermore, AJAX can return some of the following types: text (DOMString), JSON (JavaScript objects, see Section B.2 above), binary data, implementation-dependent types (e.g. moz-blob), etc.
Do you know how real-time streaming works? I don't mean at a high level; I mean at the level of the implementation.
Here's some retired code:
if (window.XDomainRequest) {
    // Generate unique ID for URL so IE will load page correctly
    var bustcache = "";
    for (var i=0; i < 16 ; ++i) {
        bustcache += (~~(Math.random() * 11)).toString(16);
    }
    var xdr = new XDomainRequest(), byteOffset = 0, padFlag = false;
    xdr.open("GET", servlet + "?serve=xdr&pad=ie8&bustcache=" + bustcache + (queryString ? ("&" + queryString) : ""));
    // Disconnect event handler
    xdr.onload = function() {
        typeof _this.disconnect == "function" && _this.disconnect();
    };
    xdr.onerror = function() {};
    xdr.onprogress = function() {
        if (!padFlag) {
            padFlag = true;
            byteOffset = xdr.responseText.length;
            return false;
        }
        _this.callback.apply(
            _this.caller || arguments.caller,
            [ xdr.responseText.substring(byteOffset) ]
        );
        byteOffset = xdr.responseText.length;
    };
    xdr.send();
}
What's peculiar here is the XDomainRequest. Fortunately, it's not an ActiveXObject this time; however, it's still non-standard. It was used in IE8 and IE9, and later removed in favor of a standards-compliant implementation of CORS (cross-origin resource sharing). Real-time streaming is not the only use of XDomainRequest, but if you wanted to successfully stream from a resource that would otherwise violate the same-origin policy (a fundamental browser security policy), XDomainRequest needed to be used along with CORS.
If you're doing real-time streaming, there's non-standard code polluting your code base. Do you know what else might be inside real-time streaming libraries? A decade ago, what we now call "real-time streaming" used to be known by a buzzword: Comet. It was a buzzword like "AJAX." (A decade ago... time flies; have the TypeScript designers even written JavaScript for that long?) Once again, circling back to the "Well, my code would never use ActiveX!" crowd:
// IE6 htmlfile
if (window.ActiveXObject) {
    transportDoc = new ActiveXObject("htmlfile");
    // ...
    setInterval(this.temp, 10000); // Stop garbage collection from killing htmlfile
}
Your Comet library may or may not be using the "htmlfile hack" for IE5 and IE6. (You can achieve real-time streaming in IE 5/6 without it.) As you can see, there is once again an ActiveXObject. Just like with AJAX, if you're doing real-time streaming, there could be ActiveX in your code.
Another peculiarity is the hack around Internet Explorer's garbage collector. It's possible transportDoc can be a host object type, or it can be undefined. However, this was a bug in the browser implementation - specifically, in the garbage collector. How do current-generation type checkers catch this? If we simply perform static analysis (e.g. using Facebook Flow's type inference logic), we never actually see that "transportDoc" can ever be undefined - because it never appears in the code itself and, therefore, cannot be logically deduced.
The previous code to enable real-time streaming in IE6 actually looks more like this:
// IE6 htmlfile
/*@cc_on
@if (@_jscript_version >= 5)
if (window.ActiveXObject) {
    transportDoc = new ActiveXObject("htmlfile");
    // ...
    setInterval(this.temp, 10000); // Stop garbage collection from killing htmlfile
}
@end
@*/
The code is not commented out. The comments are for the Internet Explorer JScript pre-processor, which we used to ensure the code will only execute in IE6. (See MSDN documentation on conditional compilation.)
How do the current generation of type checkers reason about conditional compilation? This is basically pre-processor code that can completely modify the program behavior and semantics.
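A small sketch shows why this is hard to reason about. The variable name transport and its values are hypothetical. In an engine without JScript conditional compilation, such as Node.js, the block below is an ordinary comment and the assignment inside it never runs; in an engine that honors @cc_on, the very same source would execute it.

```javascript
var transport = 'standard';

/*@cc_on
transport = 'htmlfile';
@*/

// Prints "standard" under Node.js, where the block above is just a comment.
console.log(transport);
```

A tool that only sees the token stream of a standard JavaScript parser will conclude that transport is always 'standard', which is not true for every engine the source may run on.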
That means - somehow, somewhere - out there in your sea of legacy code, there could be conditional compilation. There's a good chance you're not going to catch it if you're using human-fed type annotations (e.g. Microsoft TypeScript) because the developer may not know about conditional compilation and think that the code was simply commented out. In his efforts to deduce the types in your sea of legacy code, he deduces incorrectly, and now your entire system is compromised and your type checker cannot be trusted. One error is all it takes.
... Which brings us to the next point: legacy code.
It's easy to get trapped in microcosmic thinking. If your day job always involves what's "new" and what's "cool," it's easy to forget that a large majority of developers don't enjoy such luxuries.
There are Fortune 500 companies with over 1 million lines of JavaScript code. This bears repeating: over 1 million lines of JavaScript code! This code has usually been worked on for over a decade by multiple software developers (who may no longer even be with the company) on a multi-million dollar budget.
It's not just any JavaScript either; it's JavaScript that's designed to work for legacy web browsers. Corporate intranet applications, web applications, SharePoint applications, and so on. The enterprise software development motto is: "If it ain't [broken], don't fix it!" They have no incentive to migrate to the latest, cutting-edge web browser. There's a reason there are still COBOL systems alive and strong today.
Some of these applications might even fall under the category of "mission-critical" applications: for instance, a major insurance company using web forms (for internal staff use) to view and modify customer details. Any disruptions - real or perceived - can affect the business's operations. 99% confidence is about as good as 0% confidence in this world - the 1% of false positives or false negatives (e.g. originating from Microsoft's or Facebook's type systems) can disrupt business. In a talk I gave at DeveloperWeek 2016, I made the case that having a false sense of security (e.g. via Microsoft TypeScript or Facebook Flow) is worse than having no security at all.
If you're living on the cutting edge, it's easy to forget that a lot of these developers don't know the modern JavaScript best practices. Thus, it's unreasonable to develop a type system around "well-written" modern JavaScript conforming to best practices. It's too difficult to say with certainty that a code base with 2 million lines of JavaScript will not contain a single corner case.
JS++ gives you enterprise-level reliability. Attempts by Microsoft, Facebook, and Google are aimed at consumers; where a mobile app that works only 60% of the time may be "good enough."
This is also why ECMAScript 6 is such a design debacle: it introduces breaking changes that can break existing legacy code but doesn't provide enough incentive to trigger a migration. A lot of developers aren't interested in WeakMaps or "maximally minimal classes" (read: crippled classes). Furthermore, it requires a compile step when there are superior alternatives. (Because who wants to compile a dynamically-typed language?) Just because it's an ECMA specification doesn't mean migration is forced.
To quote Douglas Crockford, author of JavaScript: The Good Parts:
"So instead of creating classes, you make prototype objects, and then use the object function to make new instances. Objects are mutable in JavaScript, so we can augment the new instances, giving them new fields and methods. These can then act as prototypes for even newer objects. We don't need classes to make lots of similar objects."
Thus, this is completely possible:
function F() {} F.prototype[Math.random() * 100] = 100;
If an object were to inherit from "F", as Crockford discussed, we have no way of analyzing its structure until runtime (because the randomly-generated number is not generated until runtime). In another example, the property name could depend on user input, like the username of the user currently logged in. When we cannot analyze the structure of a prototype, we cannot know if we might be accessing a non-existent member of the prototype or, even if the member exists, what type the member expects.
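The username case can be sketched as follows. The helper addUserMember and the "greet_" naming scheme are hypothetical, chosen only to show a member whose name is assembled from a runtime value.

```javascript
function F() {}

// Hypothetical helper: the member name is built from a runtime value,
// so it does not appear anywhere in the source text.
function addUserMember(username) {
    F.prototype['greet_' + username] = function () {
        return 'Hello, ' + username;
    };
}

addUserMember('alice');

var instance = new F();
console.log(typeof instance.greet_alice); // "function"
console.log(typeof instance.greet_bob);   // "undefined"
```

Whether instance.greet_bob exists depends entirely on which usernames were passed at runtime, so a compile-time analysis of F cannot answer the question.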
The current generation of type checkers just assume you are writing code with best practices in mind. However, if we have 2 million lines of legacy code, we can't make that assumption. We also can't predict the future. What if we hire someone who isn't familiar with this one case that could break the type checker (especially when he has to remember the hundreds of other cases that can compromise the type checker)? The entire inheritance chain becomes compromised, and all code depending on any of the prototypes becomes compromised and untrusted.
In the developer population, there are typically more bad developers than good developers. Even the good developers were once bad developers. It is unreasonable to assume, in the billions of lines of JavaScript in the wild, that not a single developer has extended Object.prototype or String.prototype. In fact, Prototype.js - which used to be one of the most popular JavaScript libraries - did exactly this. Prototype.js is still very relevant in every industry that isn't using the "newest stuff" - especially in enterprise legacy code. In another instance, Douglas Crockford - often considered an authority on JavaScript code style and best practices - recommends extending Object.prototype with an Object.prototype.begetObject for prototypal inheritance.
It doesn't end there. Staying true to the spirit of robust design, we can't just assume the worst will never happen. The following code is completely possible in JavaScript:
String.prototype.toString = function() { return 1; };
"abc".toString(); // 1
typeof "abc".toString(); // "number"
This is a contrived example, but it illustrates a point: not only do the prototypes for built-in objects get extended, they can sometimes even be completely overwritten!
In millions of lines of legacy code - possibly shrouded by minification, obfuscation, or even IE conditional compilation - this can be extremely hard to detect. When the prototype members are overwritten, their types can be completely changed. In a language like Microsoft TypeScript, this simple unknown can fundamentally invalidate their entire type system.
This is just one example of "bad practices and bad code." For the sake of brevity, we will not be covering other examples.
ECMAScript for XML, or E4X for short, offered a simpler syntax for working with XML documents. Like ECMAScript itself, E4X was a standard governed by ECMA International. The specification is published as ECMA-357. Here's a look at the syntax:
var doc = <document>
              <author name="John Smith" />
          </document>;
var authorName = doc.author.@name;
What's more interesting is that E4X will modify the type system:
typeof <x></x>; // "xml"
E4X also introduces new host objects:
typeof XML; // "function"
XML.toString(); // "function XML() {
                //     [native code]
                // }"
The point to emphasize here is that we can't just dwell on corner cases. There are standards-compliant implementations (in the past, present, and future) that can introduce new types and objects into the ECMAScript execution environment.
E4X is currently deprecated by Mozilla. None of the other major web browsers (IE, Chrome, Safari) implemented it at all - despite its standardization. There are legacy systems relying on E4X for XML handling. How do I know? Because, despite V8 being all the rage, we've used SpiderMonkey and Rhino on the backend before for XML handling. It's just much more intuitive and simpler to deal with XML via E4X.
One of my favorite interview questions to ask is when you might use XML over JSON or vice versa. It's a simple question, and it demonstrates critical thinking and breadth of experience without testing candidates on some hypothetical they may never encounter on the job. One of the best use cases for XML is when you have lots of source code that would otherwise need to be escaped. Choose a data format:
// JSON
{ "snippet": "string x = \"The quick brown fox\\njumped over\\nthe lazy dog.\";" }

<snippet>
string x = "The quick brown fox\njumped over\nthe lazy dog.";
</snippet>
These are contrived examples. You can imagine how this can quickly become unwieldy at scale.
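The escaping overhead can be measured directly with the standard JSON.stringify and JSON.parse functions. This sketch encodes the same one-line source snippet; the variable names are arbitrary.

```javascript
// The source code we want to store: it contains two double quotes and
// two literal backslash-n sequences.
var source = 'string x = "The quick brown fox\\njumped over\\nthe lazy dog.";';

var encoded = JSON.stringify({ snippet: source });
var decoded = JSON.parse(encoded).snippet;

console.log(encoded);            // every quote and backslash is now escaped
console.log(decoded === source); // true: the round trip is lossless
```

The round trip is lossless, but each quote and backslash in the payload costs one extra character, and a human editing the JSON by hand has to get every escape right.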
Even if we can successfully type check objects and prototypes, they can very easily be modified via their references.
For instance, consider the following code:
var x = { y: "This is a string." };
z(x);
When the function z is called, x is passed by reference. If the function z was defined in JavaScript, the x object can very easily be modified in ways that are difficult to analyze at compile time: dynamically-generated keys and values, conditional deletion of keys, values of varying types, etc.
In the case of inheritance, the structure of the prototype can be modified such that our static analysis becomes incorrect. This is dangerous enough that any prototypal OOP that claims to be analyzed or verified by a compiler cannot be trusted.
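As an illustration, here is one possible body for z. It is purely hypothetical (the original never defines z, and the second parameter is added here for the example), but it exercises each kind of modification listed above.

```javascript
// Hypothetical callee standing in for the function z.
function z(obj, dynamicKey) {
    obj[dynamicKey] = 123;        // key known only at runtime
    if (typeof obj.y === 'string') {
        obj.y = obj.y.length;     // the string silently becomes a number
    }
    delete obj.removed;           // a member disappears
}

var x = { y: 'This is a string.', removed: true };
z(x, 'key_' + 7);

console.log(typeof x.y);     // "number"
console.log('removed' in x); // false
console.log(x.key_7);        // 123
```

After the call, nothing that was known about the shape of x at its declaration still holds.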
In JavaScript, there are very few type restrictions. Thus, it is possible for a variable to have many possible types. Consider how a few simple if statements can dramatically increase the complexity of our reasoning:
var x;

if (maybeTrue()) {
    if (alsoTrue()) {
        x = new Foo();
    }
    else {
        x = new Bar();
    }
}
else {
    if (yes()) {
        x = new Baz();
    }
    else {
        x = {};
    }
}
var y = x; // Without running the code, what type does 'x' have?
x.doStuff(); // Does 'Foo', 'Bar', or 'Baz' have a method 'doStuff'?
             // The empty object doesn't have this method.
             // Should we raise an error?
Instantiating different types depending on the branch is not straightforward to type check. (See the previous section on prototypal inheritance.) It's very easy to get a false positive or false negative as well. If we are very restrictive, we can require all possible types for 'x' to have the 'doStuff' method; otherwise, we raise an error. Consider that 'Foo' does define a method 'doStuff' - but none of the other prototypes do. If 'x' was instantiated as an instance of 'Foo', line 20 will be correct. We now have a false negative. Furthermore, by adding restrictions to JavaScript code, we cannot guarantee that millions of lines of legacy code will have abided by these "best practices" or restrictions.
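In plain JavaScript, the only reliable answer to "does x have doStuff?" comes at runtime. The sketch below reduces the example to two constructors; make and tryDoStuff are hypothetical helpers, and returning null when the method is missing is an arbitrary choice for illustration.

```javascript
function Foo() {}
Foo.prototype.doStuff = function () {
    return 'Foo did stuff';
};

function Bar() {} // no doStuff method

// Hypothetical factory: the concrete type depends on a runtime flag.
function make(useFoo) {
    return useFoo ? new Foo() : new Bar();
}

// Runtime guard: check for the method before calling it.
function tryDoStuff(x) {
    if (typeof x.doStuff === 'function') {
        return x.doStuff();
    }
    return null;
}

console.log(tryDoStuff(make(true)));  // "Foo did stuff"
console.log(tryDoStuff(make(false))); // null
```

A checker that rejects the unguarded x.doStuff() call is too strict for the Foo branch, while one that accepts it is too lenient for the Bar branch.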
There is often a difference between the standards and the implementations of the standards. For instance, in Chapter 1, we discussed how even modern web browsers do not consistently implement the DOM API. The following code passes a type check in Microsoft TypeScript:
var foo:NodeList = document.getElementsByTagName("*");
As previously mentioned, Firefox (and all Gecko-based web browsers) will return HTMLCollection for getElementsByTagName. However, in all other browser engines, it returns NodeList. The programmer now has a false sense of security and begins writing code as though they have a NodeList (or HTMLCollection), causing runtime errors and crashes. Whilst in most cases bugs like this can be fixed, Mozilla has filed this bug as WONTFIX because getElementsByTagName should return an HTMLCollection according to the DOM specification. If Mozilla wanted to fix this bug, they would have fixed it in 1999 (when the bug was first filed).
If ever there was a strong case against human type annotations (and especially human-fed type annotations which the type checker should just implicitly trust), such fundamental incorrectness in an official 1.0+ release from Microsoft should be enough.
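One defensive habit, sketched here under the assumption that only a length property and indexed access are needed, is to avoid depending on the concrete collection type at all and copy the result into a plain array. The helper toArray and the fakeCollection object are hypothetical; the latter stands in for the return value of getElementsByTagName so the example runs without a browser.

```javascript
// Copy any array-like collection into a plain array, relying only on
// "length" and indexed access.
function toArray(collection) {
    var result = [];
    for (var i = 0; i < collection.length; i++) {
        result.push(collection[i]);
    }
    return result;
}

// Hypothetical stand-in for a NodeList or HTMLCollection.
var fakeCollection = { 0: 'html', 1: 'body', length: 2 };

var elements = toArray(fakeCollection);
console.log(Array.isArray(elements)); // true
console.log(elements.length);         // 2
```

This works around the discrepancy at the call site, but it is a convention that every developer has to remember; it does nothing to make the annotation itself correct.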
In another example, Internet Explorer incorrectly implements the type for the this keyword in DOM event handling, a major aspect of DOM programming. From QuirksMode:
"The event handling function is referenced, not copied, so the this keyword always refers to the window."
The popular opinion is to bash Internet Explorer every time "standards vs. implementations" are discussed. However, just about every major web browser has an inconsistent implementation of sorts. For example, on Mozilla Firefox, AJAX calls can return data of type moz-blob - as usual, this is unaccounted for by Microsoft TypeScript.
It is not uncommon for web browsers to have non-standard extensions to standard objects (prefixed with moz, webkit, etc - especially for CSS, which gets exposed to JavaScript via the DOM API). One of the major arguments against trusting human editors for type annotations is that even if they get it 100% right (rare), the API can change tomorrow. In this era of auto-updated web browsers, today you could be using Firefox 39, and tomorrow you'll be on Firefox 40 which just removed the proprietary extensions in favor of standards-compliant implementations. Unless you have full-time staff dedicated to just auditing your type annotations, you're going to get it wrong.
We also discussed in Chapter 1 a subtle nuance of instanceof:
// A.html
var myArray = [ 1, 2, 3 ];
myArray instanceof Array; // true

// B.html
myArray instanceof Array; // false
Despite myArray being initialized to an array, myArray instanceof Array returns false. However, it returns true if the instanceof expression was executed from A.html. This is because the Array object in A.html is not the same Array object in B.html. Both web pages have their own unique Array object. This is the correct implementation. If Array gets overwritten or extended in B.html, should that occur in A.html too? Thus, it is imperative for a type checker to account for this. The problem extends beyond instanceof: a type system needs to fundamentally consider whether or not all of the built-in objects (including DOM objects) should have multiple clones. JS++ considered this case and decided on nominal typing (via external) over structural typing. Array can be extended in one iframe but not another, and this can be hard to track in dynamically-typed code. Furthermore, Array and other built-in global objects can be extended in statically-typed external code, such as extension through the V8 API.
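The same effect can be observed outside the browser. As a rough analogue of two frames, this sketch uses the Node.js vm module, whose runInNewContext function evaluates code against a separate set of built-in objects; the variable name foreignArray is arbitrary, and the example assumes a CommonJS environment where require is available.

```javascript
var vm = require('vm');

// The array literal is evaluated in a fresh context with its own Array object.
var foreignArray = vm.runInNewContext('[1, 2, 3]');

console.log(foreignArray instanceof Array);                // false
console.log(Array.isArray(foreignArray));                  // true
console.log(Object.prototype.toString.call(foreignArray)); // "[object Array]"
```

The value is an array by every structural measure, yet it fails the identity-based instanceof test because its prototype chain leads to a different Array.prototype.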
Third-party and untrusted code can overwrite all of your assumptions. Did you assume Number.prototype.toPrecision returns a string? Well, a library author may have decided to go with his own custom implementation because the library needed it.
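The following sketch plays the role of such a library. The override is hypothetical (no particular library is implied), and the original method is restored at the end so the example has no lasting side effects.

```javascript
var originalToPrecision = Number.prototype.toPrecision;

var before = typeof (1.2345).toPrecision(2); // built-in behavior

// Hypothetical third-party override that returns a number instead of a string.
Number.prototype.toPrecision = function (digits) {
    return parseFloat(originalToPrecision.call(this, digits));
};

var after = typeof (1.2345).toPrecision(2); // same call site, new type

// Restore the built-in so the rest of the program is unaffected.
Number.prototype.toPrecision = originalToPrecision;

console.log(before); // "string"
console.log(after);  // "number"
```

Any code that went on to call a string method on the result would work before the library was loaded and throw afterwards, without a single change to the calling code.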
What if you were loading jQuery from an external source and the source was compromised? What if - in a more innocuous scenario - you loaded a library from an external source and the library author updated the library? All your assumptions would be invalidated; thus, when dealing with JavaScript, it's best to make no assumptions.
The nature and history of the web make these types of scenarios occur quite often. Lots of websites - large and small - load third-party code. The rise of content delivery networks (CDNs) will see the use of third-party and untrusted code continue to grow, especially since CDNs play an integral role in page load optimization, a critical element of web development, e-commerce, and user experience. Therefore, we should fully expect third-party and "untrusted" code to be a major component of web development.
Given that all the corner cases make analyzing JavaScript very difficult, it's very difficult to rely on static analysis. To further complicate circumstances, these third-party resources are often minified, optimized, and obfuscated. This is especially true in legacy code, where the original source files may no longer be available or may be very difficult to obtain.
Inevitably, a solution needs to be future-proof. It cannot work today and fail tomorrow.
Human-fed type annotations are not future-proof. The API can change tomorrow. Unless you have a full-time auditing team just to watch for changes, you have to be extremely careful. One error in the type system's logic is like an avalanche that can compromise the entire type system: one error in one statement, which is depended upon by another statement, which is further depended upon by several more statements, leads to an exponentially growing problem.
We have only covered some of the problems with adding types to JavaScript. This document does not attempt to be comprehensive, and there are hundreds of edge cases that are not listed in this document but were considered in the design. There should be more than enough evidence we understand the problem more than anyone else at this stage. For instance, there was no discussion of:
The undocumented ActiveX "unknown" type
Dynamic evaluation (eval, Function constructor, and friends)
The scoping semantics of eval
Closures being passed by reference that can modify and invalidate every single variable, function, and class/prototype
Typing exceptions
etc.
This document only scratches the surface of the corner cases that had to be considered.
You now have a sense of how much work was involved in developing a solution. Each proposal was vetted heavily. Did it pass this edge case? Does it work with that edge case? If not, throw it out and start over.
It's easy to think about this problem from a superficial level. This kind of code isn't hard to type check:
var x = 1;
var y = "abc";
var z = x - y; // Error. We didn't even need type annotations.
It's once you begin thinking about the problems at a deeper level - which can only be gained from experience - that you begin to recognize the colossal challenge. You would then need to ground that real-world experience with a strong understanding of the theoretical computer science and research. Once again, if you're just going to add types to the syntax, don't you think Microsoft, Facebook, or Google would have solved this by now?
The key takeaway from this chapter is that the next time someone presents a new type system for JavaScript, you only need to ask yourself one question: does he understand the problems with type checking JavaScript? Examples of questions not to ask include: A) "Does it come from a major company like Microsoft, Facebook, or Google?", or B) "Does he have a great understanding of type systems (like Hindley-Milner type inference)?" The former is not a guarantee of quality, and the latter might just end up being a science project.