Lexical environments: Common Theory

Time: 2019-12-2

Original text: ECMA-262-5 in detail. Chapter 3.1. Lexical environments: Common Theory.

brief introduction

In this chapter, we will discuss lexical environments in detail. A lexical environment is a mechanism many languages use to manage static scope. To understand the concept better, we will also briefly discuss the alternative, dynamic scope, which ECMAScript does not use directly. We will see how environments help manage lexically nested code structures and provide full support for closures. Lexical environments were introduced in the ECMA-262-5 specification, but the concept is independent of ECMAScript and is used in many languages. In fact, the previous ES3 series already covered some technical parts of this topic, such as variable and activation objects and the scope chain. Strictly speaking, lexical environments are just a more abstract, theoretical replacement for those ES3 concepts. Since we are now in the ES5 era, I suggest discussing and explaining ECMAScript in terms of these new definitions. We will also discuss more general concepts at a lower level of abstraction, such as the activation record (the activation object in ES3) and the call stack. This chapter is devoted to the general theory of environments, and it also touches on programming languages theory (PLT). We will look at the topic from different perspectives and at implementations in different languages. The goal is to understand why lexical scope is needed and how these structures are created under the hood. In fact, once we fully understand the general theory of scope, most questions about scope in ES disappear by themselves.

general theory

The concepts used in ES (activation object, scope chain, lexical environment) are all related to the concept of scope. The definitions given in ES are just local implementations of scope and its related terms.

Scope

Scope is used to manage the visibility and accessibility of variables in different parts of a program. Some encapsulation abstractions, such as namespaces and modules, are built on scope. They make a system more modular and help avoid naming conflicts. Functions have local variables, and so do code blocks. Scope encapsulates this internal data and raises the level of abstraction. Scope lets us use the same variable name in different parts of a program with different meanings and different values. From this point of view, a scope is an enclosing context in which a variable is associated with a value. We can also say that a scope is the logical boundary within which a variable has its meaning. For example, the terms "global variable" and "local variable" reflect the lifetime (extent) of a variable. Code blocks and functions give scopes their main property: a scope can nest other scopes or be nested itself. However, not all implementations support nested functions, and not all implementations provide block-level scope. Let's consider the following C code:

#include <stdio.h>
#include <stdbool.h>

// global "x"
int x = 10;

void foo() {

  // local "x" of "foo" function
  int x = 20;

  if (true) {
    // local "x" of if-block
    int x = 30;
    printf("%d\n", x); // 30
  }

  printf("%d\n", x); // 20

}

int main(void) {
  foo();
  printf("%d\n", x); // 10
  return 0;
}

This can be represented by the following figure:

[Figure: nested scopes of the C example]

Before ES6, ECMAScript did not support block-level scope:

var x = 10;
 
if (true) {
  var x = 20;
  console.log(x); // 20
}
 
console.log(x); // 20

ES6 introduced the "let" keyword to create block-level variables:

let x = 10;
if (true) {
  let x = 20;
  console.log(x); // 20
}
 
console.log(x); // 10

Before ES6, block-level scope could be emulated with an immediately invoked function expression (IIFE):

var x = 10;
 
if (true) {
  (function (x) {
    console.log(x); // 20
  })(20);
}
 
console.log(x); // 10

Static (lexical) scope

With static scope, an identifier refers to its nearest lexical environment. The word "lexical" here relates to the program text. The place in the source where a variable is declared determines the scope in which it will be resolved at runtime. The word "static" means that the scope of an identifier can be determined while the program is being parsed. In other words, suppose we can tell, just by reading the code before the program runs, in which scope a variable will be resolved. Then we are dealing with static scope. For example:

var x = 10;
var y = 20;
 
function foo() {
  console.log(x, y);
}
 
foo(); // 10, 20
 
function bar() {
  var y = 30;
  console.log(x, y); // 10, 30
  foo(); // 10, 20
}
 
bar();

In this example, variable x is defined in the global scope, which means it is also resolved in the global scope at runtime. Variable y has two definitions. As we said, the nearest lexical scope that contains the variable is used, and a function's own scope has the highest priority. Therefore, inside the bar function, y is resolved to 30: the local variable y of bar shadows the global variable y. However, inside the foo function, y is still resolved to 20. This holds even though foo is called from bar, which has its own y. Here bar is the caller and foo is the callee. What matters is where foo is defined: from there, the nearest lexical environment that contains a variable y is the global one. Today static scope is used in many languages: C, Java, ECMAScript, Python, Ruby, Lua, etc.

dynamic scope

Dynamic scope does not resolve variables in the lexical environment. Instead, it uses a dynamically formed stack of variables. Every time a variable declaration is encountered, the variable is pushed onto the stack. When the variable's lifetime ends, it is popped off the stack. Let's look at some pseudocode:

// *pseudo* code - with dynamic scope
 
y = 20;
 
procedure foo()
  print(y)
end
 
 
// on the stack of the "y" name
// currently only one value 20
// {y: [20]}
 
foo() // 20, OK
 
procedure bar()
 
  // and now on the stack there
  // are two "y" values: {y: [20, 30]};
  // the first found (from the top) is taken
 
  y = 30
 
  // therefore:
  foo() // 30!, not 20
 
end
 
bar()

In other words, with dynamic scope the context of the caller affects how the variables of the callee are resolved.
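The following small JavaScript sketch makes the contrast concrete. It simulates dynamic scope with one stack of values per name, mirroring the {y: [20, 30]} comments in the pseudocode. The helpers "declare", "lookup" and "withScope" are hypothetical names made up for this illustration. JavaScript itself resolves its own variables statically, so the dynamic behavior is emulated by hand.

```javascript
// Hypothetical simulation of dynamic scope: every name maps
// to a stack of values, and a lookup always takes the top one.
const dynamicStacks = new Map();

function declare(name, value) {
  if (!dynamicStacks.has(name)) dynamicStacks.set(name, []);
  dynamicStacks.get(name).push(value);
}

function lookup(name) {
  const stack = dynamicStacks.get(name);
  if (!stack || stack.length === 0) {
    throw new ReferenceError(name + " is not defined");
  }
  return stack[stack.length - 1];
}

// Run "body" with some local declarations pushed, and pop them
// again when the "procedure" ends (end of the variables' lifetime).
function withScope(bindings, body) {
  for (const [name, value] of Object.entries(bindings)) declare(name, value);
  try {
    return body();
  } finally {
    for (const name of Object.keys(bindings)) dynamicStacks.get(name).pop();
  }
}

declare("y", 20); // {y: [20]}

function foo() {
  return lookup("y");
}

function bar() {
  // {y: [20, 30]} while foo runs
  return withScope({ y: 30 }, () => foo());
}

const fromTop = foo();  // 20
const fromBar = bar();  // 30, not 20
const afterBar = foo(); // 20 again: bar's "y" was popped
```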

Name binding

In high-level languages, we do not operate on the memory addresses of data. Instead, we refer to data directly by variable names. A name binding is the association of an identifier with an object. An identifier can be bound or unbound. If it is bound to an object, it refers to that object.

Rebinding

// bind "foo" to {x: 10} object
var foo = {x: 10};

console.log(foo.x); // 10

// bind "bar" to the same object
// as "foo" identifier is bound

var bar = foo;

console.log(foo === bar); // true
console.log(bar.x); // OK, also 10

// and now rebind "foo"
// to the new object

foo = {x: 20};

console.log(foo.x); // 20

// and "bar" still points
// to the old object

console.log(bar.x); // 10
console.log(foo === bar); // false


Mutation

// bind an array to the "foo" identifier
var foo = [1, 2, 3];
 
// and here is a *mutation* of
// the array object contents
foo.push(4);
 
console.log(foo); // 1,2,3,4
 
// also mutations
foo[4] = 5;
foo[0] = 0;
 
console.log(foo); // 0,2,3,4,5
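The difference between rebinding and mutation shows up when two identifiers are bound to the same object. A mutation through one name is seen through the other, while a rebinding changes only what that one name refers to. A minimal sketch:

```javascript
// "foo" and "bar" are bound to the same array object
var foo = [1, 2, 3];
var bar = foo;

// mutation: the shared object changes, so both names see it
foo.push(4);
console.log(bar.join(",")); // 1,2,3,4

// rebinding: only "foo" now refers to a new object
foo = [10, 20];
console.log(bar.join(",")); // 1,2,3,4
console.log(foo === bar);   // false
```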


Environments

In this part, we will talk about techniques for implementing lexical scope, working with more abstract entities. From now on, we will use the term "environment" instead of "scope", since ES5 uses this term as well: the global environment, the local environment of a function, etc. As we mentioned, an environment specifies the meaning of the identifiers in an expression. ECMAScript uses a call stack to manage the execution of functions. Let's consider some general models for storing variables. It is interesting to compare systems with and without closures.

Activation record model

Suppose a language has no first-class functions, or does not allow inner functions at all. Then the simplest place to store local variables is the call stack itself. The call stack holds a special data structure called an activation record, which stores the bindings of the environment. It is sometimes also called a call stack frame. Every time a function is called, its activation record (containing the parameters and local variables) is pushed onto the stack. So when a function calls another function, another frame is pushed onto the stack. When the context finishes, its activation record is popped off the stack, which means that all its local variables are destroyed. This model is used in C.
For example:

void bar(int x);

void foo(int x) {
  int y = 20;
  bar(30);
}

void bar(int x) {
  int z = 40;
}

int main(void) {
  foo(10);
  return 0;
}

The call stack will change as follows

callStack = [];
 
// "foo" function activation
// record is pushed onto the stack
 
callStack.push({
  x: 10,
  y: 20
});
 
// "bar" function activation
// record is pushed onto the stack
 
callStack.push({
  x: 30,
  z: 40
});
 
// callStack at the moment of
// the "bar" activation
 
console.log(callStack); // [{x: 10, y: 20}, {x: 30, z: 40}]
 
// "bar" function ends
callStack.pop();
 
// "foo" function ends
callStack.pop();

[Figure: the call stack at the moment the "bar" function is called]

ECMAScript uses much the same logic for function execution, but with some important differences. The call stack corresponds to the execution context stack in ES, and the activation record corresponds to the activation object of ES3. Unlike C, however, ECMAScript does not remove an activation object from memory while a closure references it. This happens when an inner function uses variables of the outer function and that inner function is returned to the outside. In that case the activation object must be stored not on the stack, but in the heap (dynamically allocated memory). It is kept alive as long as closures reference variables of that activation object. Moreover, not only this activation object is saved but, if needed, all of its parent activation objects as well.

var bar = (function foo() {
  var x = 10;
  var y = 20;
  return function bar() {
    return x + y;
  };
})();
 
bar(); // 30


Because foo created a closure, its frame is not removed from memory even after foo has finished executing, since the closure still references it.
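A common way to observe that the frame survives is a counter. The local variable of the outer function keeps its value across calls of the returned closure. That would not be possible if the frame had been popped off a stack. The name "makeCounter" is only illustrative.

```javascript
function makeCounter() {
  // "count" lives in the frame of "makeCounter"; the returned
  // closure keeps that frame alive after "makeCounter" returns
  var count = 0;
  return function () {
    count += 1;
    return count;
  };
}

var counter = makeCounter();
var first = counter();  // 1
var second = counter(); // 2: the same saved frame is updated

// each call of "makeCounter" creates a new, independent frame
var other = makeCounter();
var otherFirst = other(); // 1
```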

Environment frame model

Unlike C, ECMAScript has inner functions and closures. Moreover, all functions in ECMAScript are first-class.

First-class functions

A first-class function can be treated as an ordinary object: it can be created at runtime, passed as an argument, and returned as a value. A simple example:

// create a function expression
// dynamically at runtime and
// bind it to "foo" identifier
 
var foo = function () {
  console.log("foo");
};
 
// pass it to another function,
// which in turn is also created
// at runtime and called immediately
// right after the creation; result
// of this function is again bound
// to the "foo" identifier
 
foo = (function (funArg) {
 
  // activate the "foo" function
  funArg(); // "foo"
 
  // and return it back as a value
  return funArg;
 
})(foo);

Funargs and higher-order functions

A function passed as an argument is called a "funarg", short for functional argument. A function that accepts functions as arguments is called a higher-order function, which is similar to the concept of an operator in mathematics.
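Here is a small sketch with hypothetical names. "applyTwice" is a higher-order function that accepts a funarg and calls it twice. "adder" shows the related case of a function returned as a value.

```javascript
// higher-order function: accepts the funarg "fn"
function applyTwice(fn, value) {
  return fn(fn(value));
}

// a function whose value is another function
function adder(n) {
  return function (x) {
    return x + n;
  };
}

var addThree = adder(3);
var result = applyTwice(addThree, 10); // 16
var squared = applyTwice(function (x) { return x * x; }, 3); // 81
```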

Free variable

A free variable is a variable that a function uses but that is neither a parameter nor a local variable of that function. In other words, a free variable does not live in the function's own environment, but in one of the surrounding environments.

// Global environment (GE)
 
var x = 10;
 
function foo(y) {
 
  // environment of "foo" function (E1)
 
  var z = 30;
 
  function bar(q) {
    // environment of "bar" function (E2)
    return x + y + z + q;
  }
 
  // return "bar" to the outside
  return bar;
 
}
 
var bar = foo(20);
 
bar(40); // 100

In this example, we have three environments: GE, E1 and E2, which correspond to the global object, the foo function and the bar function respectively. For the bar function, the variables x, y and z are free: they are neither parameters nor local variables of bar. Note that foo itself does not use the free variable x; only the inner function bar uses it. Moreover, bar is created during the execution of foo. Nevertheless, the bindings of the parent environment are saved, so that the binding of x can be passed on to the inner nested function. As a result, bar(40) returns the expected 100. This means that bar remembers the environment of the foo activation even after foo has finished. This is where the model differs from the stack-based activation record model. Suppose we allow inner functions, want static (lexical) scope, and treat functions as first-class. Then we have to save all the free variables a function needs at the moment the function is created.
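Each activation of foo creates a new environment, and each returned bar remembers its own one. The sketch below reuses the example above with two calls; the names "bar20" and "bar0" are only illustrative. In JavaScript the closure keeps the bindings themselves rather than copies of their values, which is why the later assignment to x is visible.

```javascript
var x = 10;

function foo(y) {
  var z = 30;
  return function bar(q) {
    // x, y and z are free variables of "bar"
    return x + y + z + q;
  };
}

var bar20 = foo(20); // remembers the environment where y = 20
var bar0 = foo(0);   // remembers a different environment where y = 0

var a = bar20(40); // 10 + 20 + 30 + 40 = 100
var b = bar0(40);  // 10 + 0 + 30 + 40 = 80

// the free "x" is looked up through the saved environment on each call
x = 11;
var c = bar20(40); // 101
```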

Environment definition

The most direct and simple way to implement this is to save the complete parent environment in which the function is created. Later, when the function is activated, we create its own environment for its local variables and parameters. We then set the outer environment of this new environment to the saved one, so that free variables can be found there. We use the term "environment" in two senses: a single binding object, or the whole chain of binding objects along the nesting depth. In the latter case, we call each binding object an environment frame. From this viewpoint, an environment is a sequence of frames. Each frame is a record of bindings that associates variable names with values. We use this abstract notion of a record without specifying its implementation structure. It may be a hash table in the heap, stack memory, registers of a virtual machine, etc. For example, environment E2 has three frames: its own (bar's), foo's, and the global one. Environment E1 has two frames: foo's own and the global one. The global environment GE has only one frame: the global one.


A frame contains at most one binding for any variable. Each frame has a pointer to its enclosing environment, and for the global frame this outer link is null. The value of a variable with respect to an environment is the value given by its binding in the first frame of that environment that contains a binding for it. If no frame contains such a binding, the variable is unbound in that environment, as "z" is in the following example:

var x = 10;
 
(function foo(y) {
   
  // use of free-bound "x" variable
  console.log(x); // 10
 
  // own-bound "y" variable
  console.log(y); // 20
   
  // and free-unbound variable "z"
  console.log(z); // ReferenceError: "z" is not defined
 
})(20);

The sequence of environment frames forms what we call a scope chain. Note that one environment may serve as the outer environment of several inner environments:

// Global environment (GE)
 
var x = 10;
 
function foo() {
 
  // "foo" environment (E1)
 
  var x = 20;
  var y = 30;
 
  console.log(x + y);
 
}
 
function bar() {
   
  // "bar" environment (E2)
 
  var z = 40;
 
  console.log(x + z);
}

In pseudocode:

// global
GE = {
  x: 10,
  outer: null
};
 
// foo
E1 = {
  x: 20,
  y: 30,
  outer: GE
};
 
// bar
E2 = {
  z: 40,
  outer: GE
};


The binding of x in environment E1 shadows the binding of the same name in the global environment.
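The lookup rule described earlier can be sketched directly over these objects: take the binding from the first frame that has one, following the "outer" links. The "lookup" helper is hypothetical, and it simply treats "outer" as a reserved key rather than a variable.

```javascript
// the environments from the pseudocode above
var GE = { x: 10, outer: null };
var E1 = { x: 20, y: 30, outer: GE };
var E2 = { z: 40, outer: GE };

// walk the chain of frames until a binding for "name" is found
function lookup(name, env) {
  while (env !== null) {
    if (name !== "outer" && Object.prototype.hasOwnProperty.call(env, name)) {
      return env[name];
    }
    env = env.outer;
  }
  throw new ReferenceError(name + " is not defined");
}

var xInFoo = lookup("x", E1); // 20: E1's own "x" shadows the global one
var xInBar = lookup("x", E2); // 10: not in E2, found in GE
var zInBar = lookup("z", E2); // 40
// lookup("y", E2) throws: "y" is bound only in E1
```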

Rules of function creation and application

A function is created relative to a given environment. The resulting function object consists of the function's code (its body) and a pointer to the environment in which the function was created.

// global "x"
var x = 10;
 
// function "foo" is created relatively
// to the global environment
 
function foo(y) {
  var z = 30;
  console.log(x + y + z);
}

In pseudocode:

// create "foo" function
 
foo = functionObject {
  code: "console.log(x + y + z);",
  environment: {x: 10, outer: null}
};


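Note that the function refers to its environment, and one of the bindings in that environment (the global "foo") refers back to the function itself. A function is applied to a set of arguments by creating a new frame. In this frame, the formal parameters are bound to the arguments and the local variables are bound as well. The function body is then evaluated in this newly created environment. The outer link of the new frame is the environment saved in the function object.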

// function "foo" is applied
// to the argument 20
 
foo(20);

The corresponding pseudocode:

// create a new frame with formal 
// parameters and local variables
 
fooFrame = {
  y: 20,
  z: 30,
  outer: foo.environment
};
 
// and evaluate the code
// of the "foo" function 
 
execute(foo.code, fooFrame); // 60
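The two rules can be combined into a small runnable model. This is only a sketch under stated assumptions. The helpers "createFunction", "applyFunction" and "lookup" are hypothetical. The function's code is represented as a JavaScript function that receives a "get" resolver, not as a source string.

```javascript
// resolve "name" by walking the frames via their "outer" links
function lookup(name, frame) {
  for (var env = frame; env !== null; env = env.outer) {
    if (name !== "outer" && Object.prototype.hasOwnProperty.call(env, name)) {
      return env[name];
    }
  }
  throw new ReferenceError(name + " is not defined");
}

// creation rule: a function object is its code plus the
// environment in which it was created
function createFunction(params, locals, code, environment) {
  return { params: params, locals: locals, code: code, environment: environment };
}

// application rule: bind parameters and locals in a new frame whose
// outer link is the saved environment, then evaluate the code there
function applyFunction(fn, args) {
  var frame = { outer: fn.environment };
  fn.params.forEach(function (param, i) { frame[param] = args[i]; });
  Object.keys(fn.locals).forEach(function (name) { frame[name] = fn.locals[name]; });
  return fn.code(function get(name) { return lookup(name, frame); });
}

var globalEnv = { x: 10, outer: null };

// foo(y) { var z = 30; return x + y + z; }, created in the global environment
var foo = createFunction(["y"], { z: 30 }, function (get) {
  return get("x") + get("y") + get("z");
}, globalEnv);

var result = applyFunction(foo, [20]); // 60
```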


Closures

A closure is the combination of a function's code and the environment in which the function was created. Closures were invented as a solution to the so-called "funarg problem".

The funarg problem

Suppose a function is returned to the outside and uses free variables of the parent environment in which it was created. What happens then? This case is known as the "upward funarg problem".

(function (x) {
  return function (y) {
    return x + y;
  };
})(10)(20); // 30

As we know, with lexical scope the enclosing frames are kept in the heap. This is the key to the solution: keeping these bindings on the stack, as C does, is not possible here. The saved code together with its environment is exactly a closure. The other case, known as the "downward funarg problem", arises when we pass a function as an argument to another function. Should the free variables of the funarg be resolved in the scope where the function was defined, or in the scope where it is executed?

var x = 10;
 
(function (funArg) {
 
  var x = 20;
  funArg(); // 10, not 20
 
})(function () { // create and pass a funarg
  console.log(x);
});

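Lexical scope answers this question. The free variable x of the funarg is resolved in the environment where the funarg was created, so 10 is printed, not 20.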
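In terms of environment frames, the two possible answers differ only in which environment becomes the outer link of the funarg's new frame. Under lexical scope it is the environment saved when the funarg was created; under dynamic scope it is the caller's. The objects below are a hypothetical model of the example above, not how an engine represents them.

```javascript
function lookup(name, env) {
  for (; env !== null; env = env.outer) {
    if (name !== "outer" && Object.prototype.hasOwnProperty.call(env, name)) {
      return env[name];
    }
  }
  throw new ReferenceError(name + " is not defined");
}

// global environment, where the funarg is created
var globalEnv = { x: 10, outer: null };

// environment of the function that receives and calls the funarg
var callerEnv = { x: 20, outer: globalEnv };

// the funarg remembers the environment it was created in
var funArg = { environment: globalEnv };

// lexical (static) scope: the new frame links to the saved environment
var lexicalFrame = { outer: funArg.environment };
var lexicalX = lookup("x", lexicalFrame); // 10

// dynamic scope: the new frame would link to the caller's environment
var dynamicFrame = { outer: callerEnv };
var dynamicX = lookup("x", dynamicFrame); // 20
```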

Combined environment frame model

Obviously, if some variables are not needed by any inner function, there is no need to save them.

// global environment
 
var x = 10;
var y = 20;
 
function foo(z) {
 
  // environment of "foo" function
  var q = 40;
 
  function bar() {
    // environment of "bar" function
    return x + z;
  }
 
  return bar;
 
}
 
// creation of "bar"
var bar = foo(30);
 
// applying of "bar"
bar();

Neither function uses the variable y, so we do not need to save it in the closures of foo and bar. The same is true for q, the local variable of foo, which bar does not use. The global variable x is not used by foo itself, but it still has to be saved, because the deeper inner function bar needs it. So the closure of bar can be represented as:

bar = closure {
  code: <...>,
  environment: {
    x: 10,
    z: 30,
  }
}
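The saving described above can be sketched as copying only the needed bindings into one flat environment when bar is created. This assumes the set of free variables ("x" and "z") is already known, for example from parsing. "captureFreeVariables" is a hypothetical helper. Copying values like this is only faithful when the captured variables are never reassigned afterwards, because ECMAScript closures share the bindings themselves.

```javascript
function lookup(name, env) {
  for (; env !== null; env = env.outer) {
    if (name !== "outer" && Object.prototype.hasOwnProperty.call(env, name)) {
      return env[name];
    }
  }
  throw new ReferenceError(name + " is not defined");
}

// copy only the bindings for the given free names into a flat object
function captureFreeVariables(freeNames, env) {
  var captured = {};
  freeNames.forEach(function (name) {
    captured[name] = lookup(name, env);
  });
  return captured;
}

var globalEnv = { x: 10, y: 20, outer: null };
var fooEnv = { z: 30, q: 40, outer: globalEnv }; // activation of foo(30)

// "bar" uses only "x" and "z", so "y" and "q" are not saved
var barClosure = {
  code: function (env) { return env.x + env.z; },
  environment: captureFreeVariables(["x", "z"], fooEnv)
};

var barResult = barClosure.code(barClosure.environment); // 40
```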

Note: what follows are examples of closures in Python and other languages.