Depot4 User Manual(3P2)


3.8. Target elements

The result of a translation is called a target. Every rule may specify zero or more targets. A target text definition starts with the symbol ->. Its structure is described by nearly the same mechanisms as those already known from the source part.
Targets are numbered in the order of their definition, starting with one. The maximum number of targets is fixed (currently 8).
Target definitions use no explicit concatenation operator; consecutive elements are simply written one after another.

Example:

   nonterm = SourcePart
    -> TargetPart1        (* can be referenced as nonterm_ or nonterm_1 *)
    -> TargetPart2.       (* dto. nonterm_2 *)

3.8.1 Targets

The i-th target of a nonterminal or class terminal name is referenced as name_i. For i=1 the digit can be omitted (name_1 = name_). Using a target for which no definition is given (e.g. nonterm_4 in the above example) causes a runtime error message and inserts the text <***UNDEFINED TARGET***> into the generated result.
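
The lookup rule can be modelled in a few lines of Python. This is only an illustrative sketch, not Depot4 code: the function name resolve_target and the list representation of a rule's targets are assumptions, the runtime error message that accompanies the placeholder is not modelled, and target number zero (see 3.13.1) is outside its scope.

```python
UNDEFINED = "<***UNDEFINED TARGET***>"

def resolve_target(targets, reference):
    """Return the text for a reference such as 'nonterm_' or 'nonterm_2'.

    `targets` is a hypothetical list holding the target texts of one rule
    in definition order (the first target is at list index 0).
    """
    _name, _sep, digits = reference.rpartition("_")
    index = int(digits) if digits else 1    # name_ is shorthand for name_1
    if 1 <= index <= len(targets):
        return targets[index - 1]
    return UNDEFINED                        # placeholder text for a missing target

targets = ["TargetPart1", "TargetPart2"]
first = resolve_target(targets, "nonterm_")
second = resolve_target(targets, "nonterm_2")
missing = resolve_target(targets, "nonterm_4")
```

Here first and second pick up the two defined targets, while missing receives the placeholder text.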

3.8.2 Literals

Literals are written as strings like in the source part of the rule. Substitutions may also be used. As abbreviations are pointless in the target, the character '$' has no special meaning (and must not be doubled in leading position).

3.8.3 Expressions

It is possible to include arbitrary expressions as long as the translator can determine their types and, in addition, these types are INTEGER, SYM, or TXT. Imported variables and functions are always assumed to be of type TXT. (See pseudo conversions to learn how to overcome problems which may originate from this.)

There is also a conditional expression

   (!boolExpr; targetExpr1 | targetExpr2 ) or
   (!boolExpr; targetExpr1 )

where, depending on the value of boolExpr, either targetExpr1 or targetExpr2 is inserted; in the second form, nothing is inserted if boolExpr is FALSE.

3.8.4 Source copy

If the type of the source and that of the target are compatible (as in text-to-text translation), this feature may be applied to copy the source text stretch accepted by a production to the target. Be aware that leading blanks and comments usually belong to this text.

Example:

   nonterm = SourcePart
    -> SOURCETEXT.       (* just copies the source *)

Remark: SOURCETEXT must not be applied if SourcePart contains (direct or indirect) calls to From or Back.

3.8.5 Generic target elements

These elements are essential for the extensibility of generated language processors. The Ml4 translator need not know anything about them, because their interpretation is defined elsewhere. Their syntax is:

'<' identifier ' ' {character} '>'

It is the user's responsibility to bind a procedure to that identifier, which will take the character sequence as parameter. If there is no such interpretation, generic elements are ignored during target output.

Example:

   nonterminal = 'IF' expression 'THEN' statement
    -> 'if (' expression_ ')' <'Indentation +'> statement_ <'Indentation -'>.

This could be interpreted as formatting information.

3.9. Controls

The application of EBNF structures such as iteration or option during the parse of the source text also delivers implicit information (e.g. the number of iterations or the selected branch). These data can be made explicit by use of so-called controls. In fact, controls are INTEGER variables. There are two modes of operation: control by source and control by value.

3.9.1 Control by source

Control by source is applicable in the source (parse) part of an Ml4 production only.
It means that values resulting from the parsing process, i.e. defined by the structure of the source, are implicitly assigned to special control variables (of type INTEGER).

Implicit control:

In every Ml4 production four variables are declared implicitly:

c: The number of the accepted branch is stored in the variable c. Counting starts with one.
   (* An identity mapping that removes leading spaces and comments: *)
   Digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9' -> c-1.
O: The variable O is 1 if a source part enclosed in option brackets appears in the actual source; otherwise its value is 0. In the following example the variable O is used implicitly to transfer information from source to target: the character "-" is included in the target only if it is present in the source.
   SigndNum = ['-'] Num -> ['-'] Num_.
i, N: The two variables i and N deal with iterations. N stores the number of iterations found in the source; i is the loop variable. In the following example N is again used implicitly in source and target, so that the numbers of iterations in source and target are identical. One instance of the nonterminal Dcl is needed per iteration; the instances are distinguished by an index that uses the loop variable i.
   DclSequence = { Dcl[i] } -> { Dcl_[i] }.

Be aware that implicitly controlled structures may interact in unexpected ways when they are nested, because they may access the same control variables.
In particular, the implicit control of a branch within a repetitive structure is not handled adequately.

Example: The rule
UnexpectedTranslation = { ('1'|'2'|'3') } -> { ('1'|'2'|'3') }.
applied to "12123" results in "33333". The reason is that the value of the (implicit) c is overwritten in every iteration and finally set to 3, so only its last value is propagated into target generation. (The effective handling of such cases is dealt with next.)
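
The effect can be reproduced with a small Python model. It is a sketch of the behaviour described here, not of the actual Depot4 runtime: the function name is hypothetical, and the single variables c and n stand in for the implicit controls c and N.

```python
def unexpected_translation(source):
    """Model { ('1'|'2'|'3') } -> { ('1'|'2'|'3') } with one shared c."""
    literals = "123"
    c = 0          # implicit branch control, shared by all iterations
    n = 0          # implicit iteration count (N)
    for ch in source:
        if ch not in literals:
            break                       # the iteration ends at the first non-match
        c = literals.index(ch) + 1      # overwritten on every iteration
        n += 1
    # Target generation: n iterations, each selecting branch c (its last value).
    return "".join(literals[c - 1] for _ in range(n))

result = unexpected_translation("12123")
```

Because only one c exists, every generated iteration selects the branch that was accepted last.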

Explicit control:

Alternatively, it is possible to have the appropriate value assigned to a variable. This variable must have been declared explicitly or be one of the implicitly declared ones. Explicit controls are enclosed in slashes /... /. A control must be the first significant entity (after the opening bracket, if any) of its surrounding structure.

For options and alternatives the control consists simply of a variable.
DclSequence = /c/ ident | [/O/ '-'] number.

In iterations, it is possible to choose a loop variable, its start value, and a variable for the total number of iterations (end value). The default case is /i=1..N/. There are several shortcuts:

   /j/               for     /j=1..N/
   /=2/              for     /i=2..N/
   /..M/             for     /i=1..M/
   /j=2/             for     /j=2..N/
   /=3..M/           for     /i=3..M/
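
The shortcut rules amount to filling in three defaults (loop variable i, start value 1, end variable N). The following Python sketch spells this out; the function expand_control and its regular expression are illustrative assumptions, not part of the Ml4 translator, and the empty control // is passed through unchanged.

```python
import re

# /var=start..end/ with every component optional
_CONTROL = re.compile(
    r"^/\s*(?P<var>[A-Za-z]\w*)?\s*"
    r"(?:=\s*(?P<start>[^./]+?))?\s*"
    r"(?:\.\.\s*(?P<end>[^/]+?))?\s*/$"
)

def expand_control(shortcut):
    """Expand an iteration control shortcut to the full form /var=start..end/."""
    if shortcut.replace(" ", "") == "//":
        return "//"                      # empty control: no defaults are filled in
    match = _CONTROL.match(shortcut)
    if match is None:
        raise ValueError(f"not an iteration control: {shortcut!r}")
    var = match.group("var") or "i"      # default loop variable
    start = match.group("start") or "1"  # default start value
    end = match.group("end") or "N"      # default end variable
    return f"/{var}={start}..{end}/"

expanded = [expand_control(s) for s in ("/j/", "/=2/", "/..M/", "/j=2/", "/=3..M/")]
```

The list expanded then holds the right-hand column of the table above, in the same order.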

Suppress control:

Sometimes there is no need to store the values into controls. But even then they are used implicitly and this may cause problems when structures of the same type are nested. Therefore it is possible to suppress any assignment by giving // as (empty) control.

   DclSequence = 'VAR' {// ident {//',' ident}':' type}.

3.9.2 Control by value

Here the flow of information is in the opposite direction: the value of an expression controls the syntactic structure. Control by value may be implicit or explicit, too. There is only a slight difference in syntax: instead of variables one can write integer expressions (exception: the iteration index variable). Control by value is the only mode available in target descriptions. An expression [/v/ 'abc'] acts like: if v>0 then generate('abc').
The following rule produces the text "123456789":

Generate1to9= -> {/..9/ i }.
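
As a rough Python model of control by value in a target (the helper names generate_option and generate_iteration are hypothetical and only mirror the behaviour described above):

```python
def generate_option(v, text):
    """Model of [/v/ text] in a target: emit text only if v > 0."""
    return text if v > 0 else ""

def generate_iteration(start, end, body):
    """Model of {/i=start..end/ body}: body is evaluated for each value of i."""
    return "".join(body(i) for i in range(start, end + 1))

# Models  Generate1to9= -> {/..9/ i }.  with the default start value 1.
digits = generate_iteration(1, 9, str)
```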

Control by value can also be useful in the source part. There is a simple way to override the default control-by-source behavior: write an expression that is syntactically different from a simple variable, e.g., by enclosing a variable in brackets.

Example:

NonCFG = {'a'}{/..(N)/'b'}{/..(N)/'c'}.

accepts the famous (non-context-free) language aⁿbⁿcⁿ. The first iterator is controlled by source, i.e., the number of 'a' symbols determines the value of N. The subsequent iterators fail if the number of 'b' or 'c' symbols differs from N.
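
The interplay of the two control modes can be mimicked in Python. This is a sketch under stated assumptions: the function name is hypothetical, whitespace skipping is ignored, and input left over after the last 'c' is treated as a failure, which the rule itself does not specify.

```python
def accepts_anbncn(text):
    """Model of NonCFG = {'a'}{/..(N)/'b'}{/..(N)/'c'}."""
    pos = 0
    n = 0
    # Control by source: count the 'a' symbols, which defines N.
    while pos < len(text) and text[pos] == "a":
        pos += 1
        n += 1
    # Control by value: exactly N 'b' symbols, then exactly N 'c' symbols.
    for symbol in ("b", "c"):
        for _ in range(n):
            if pos >= len(text) or text[pos] != symbol:
                return False
            pos += 1
    return pos == len(text)   # assumption: the whole input must be consumed
```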

Another example of exploiting controls and indices is the inversion of a (non-empty) comma-separated identifier list:

RevertList = ident[1] {/=2/ ','ident[i]} -> ident[N]{/..N-1/',' ident[N-i]}.
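
To trace how the indices work, the rule can be replayed step by step in Python. The sketch below is only a model: the function name is hypothetical, identifiers are obtained by a plain split on commas, and N is assumed to be 1 when the iteration starting at 2 runs zero times.

```python
def revert_list(source):
    """Model of ident[1] {/=2/ ',' ident[i]} -> ident[N]{/..N-1/ ',' ident[N-i]}."""
    parts = [p.strip() for p in source.split(",")]
    ident = {1: parts[0]}                 # ident[1]
    n = 1                                 # assumption: N = 1 without iterations
    for i, name in enumerate(parts[1:], start=2):
        ident[i] = name                   # {/i=2..N/ ',' ident[i]}
        n = i
    target = ident[n]                     # ident[N]
    for i in range(1, n):                 # {/i=1..N-1/ ...}
        target += "," + ident[n - i]      # ',' ident[N-i]
    return target

reverted = revert_list("a,b,c")
```

For three identifiers N is 3, so the target emits ident[3] followed by ident[2] and ident[1].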

3.10. Parse control

3.10.1 Conditions

Parsing may be controlled by boolean conditions. Evaluating a boolean expression to FALSE has the same effect as a missing symbol in the source text. (This is sometimes called a semantic condition.)
The syntax is

!boolExpr

Thus, the above example could be rewritten as

NonCFG = {'a'}{/..c/'b'} !c=N; {/..O/'c'} !O=N.

(This is also an example of safely reusing the predeclared variables c and O, because the rule contains no options or alternatives.)

A condition in connection with an alternative can be used to simulate a conditional statement in Ml4:

(!boolExpr; src1 | src2 )

is semantically equivalent to (imaginary)

IF boolExpr THEN src1 ELSE src2 FI

It is important that there is always a second (maybe empty) branch even in incomplete conditional statements, i.e. the meaning of IF boolExpr THEN src1 FI must be expressed as (!boolExpr; src1 | ).

3.10.2 Avoidance of backtracking

There is a special operator to avoid further backtracking: !! (without any symbol in between). It is best applied after the acceptance of unique keywords or the like.
Its use may speed up parsing and will usually improve error localization. However, this operator changes the global state of the current parser (similar to Prolog's cut); e.g., if it is reached somewhere within an alternative, no further branch will be checked. So it should be used sparingly and preferably in high-level rules (i.e. those near the root).

3.11. Declarations

The scope of variables in Ml4 may be local or global. There are also static variables, which will be discussed in 3.14.3.

3.11.1 Local variables

The scope of a local variable is the single production in which it is declared, within a VAR declaration part:

VAR oi, ot: INT; AR: ARR 20 OF SYM;

There must not be more than one declaration part per production.

3.11.2 Global variables

Global variables are used to transfer information between different productions. Their lifetime is, as for local ones, connected with the invocation of the respective production. The difference is that they are seen from any production, which is called directly or indirectly from this production, i.e., during its lifetime.

For global variables, one has to distinguish between declaration and specification of variables:

Declaration is the creation of an appropriate variable instance. (This will possibly shadow an existing variable with the same name.)
Specification arranges the access to an already existing variable. The location of the actual declaration may vary for different elaborations. It is the user's responsibility to ensure that there is always a declaration active if the rule containing the specification is invoked.

There is one global variables part, starting with the keyword GLOBVAR.

Declaration:

DCL count, number: INT;

Specification:

USE count, number: INT;

Declarations and specifications of global variables can be arbitrarily mixed.

The runtime system connects specified global variables with their targets. During translation there is no check for adequate bindings. At runtime it is checked for each specified variable whether a target instance has been declared (otherwise the parser issues an error). The runtime system also keeps track of type compatibility (based on structural equivalence).
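
The binding of USE to DCL behaves like dynamic scoping. The following Python sketch models that behaviour with a stack of declaration frames. All names in it (GlobalScopes, UnboundGlobal, production) are hypothetical, and the type compatibility check is not modelled.

```python
from contextlib import contextmanager

class UnboundGlobal(RuntimeError):
    """Raised when a USE finds no active DCL (models the runtime error)."""

class GlobalScopes:
    def __init__(self):
        self._frames = []      # one dict of declared variables per active production

    @contextmanager
    def production(self, **declared):
        """Enter a production; keyword arguments model its DCL part."""
        self._frames.append(dict(declared))
        try:
            yield self
        finally:
            self._frames.pop() # lifetime ends when the production is left

    def _frame_of(self, name):
        for frame in reversed(self._frames):   # innermost declaration shadows outer ones
            if name in frame:
                return frame
        raise UnboundGlobal(name)

    def get(self, name):
        return self._frame_of(name)[name]

    def set(self, name, value):
        self._frame_of(name)[name] = value

scopes = GlobalScopes()
with scopes.production(count=0):               # DCL count
    with scopes.production():                  # called production with USE count
        scopes.set("count", scopes.get("count") + 1)
    with scopes.production(count=10):          # a nested DCL shadows the outer one
        scopes.set("count", 11)
    outer_count = scopes.get("count")          # reads the outer instance again
```

The increment made through USE reaches the outer declaration, whereas the assignment under the nested DCL only affects the shadowing instance.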

3.12. Local substitution

By using the directive LOCAL, the implementation of nonterminals can be locally overloaded. (Locally means for the parse tree rooted by the nonterminal containing the directive.) This change influences the current production and all productions which are called subsequently. When the production is finally left, the initial configuration is reestablished.
For every nonterminal to be overloaded, the name of a module containing the overloading implementation must be assigned.

LOCAL 'id' <- 'Dp4Stdlex';

I.e., as long as the current production is active, the implementation for the symbol id will be taken from a module Dp4Stdlex.
A module which is intended for substitution only should have a name that differs from the names of all its productions.
The module name (on the right of the arrow) must not be that of the module containing the LOCAL declaration.
Additionally, one can substitute a production by a differently named one. For example,

LOCAL 'id' <- 'ident:Dp4ExtLex';

means that id is substituted by ident from module Dp4ExtLex.

3.13. Additional features

3.13.1 Symbolic value generation

Every production can also generate a SYM value - referenced as target number zero ("nt_0").

Example:

  myIdent = ident
   => 'yy' ident_0  (* this is the SYM value, referenced by myIdent_0 *)
   -> 'yy'ident_.   (* ordinary target, used as myIdent_ or myIdent_1 *)

This is useful mainly when describing typical (class) terminals by means of Ml4.

3.13.2 Parameters

Parameterized productions have not been implemented yet.

3.14. Static environment

A production may have a static environment, consisting of import declarations, type definitions, variable declarations, and initialization. All but the last have to precede the rule. In modules, there is only one common environment.

3.14.1 Imports

Imports allow the linking of Ml4 code with external software, i.e., they make it an open system. As Depot4 claims host language independence, there are restrictions.
The Ml4 translator assumes that any imported entity is applied correctly. Usually, this requires some knowledge of the implementation strategies (type mapping etc.) of the respective system. Incorrect application of imported elements will not be reported until the system's host language compiler processes the generated code.
One can import constants, types, variables, and procedures of the host language.
At translation time, there is no check whether the referenced entities exist or are accessible.

Imports are declared in a list of simple identifiers, e.g., IMPORTS Module1, pack2. As this does not meet all requirements, there is an additional mapping mechanism. It allows a string to be defined for each of these module identifiers; this string is what is actually inserted into the host program.
Mappings can be declared locally in an import or globally (e.g. in the configuration). An individual mapping borrows its syntax from Oberon, i.e., it looks like IMPORTS Module1:= 'Module_1', pack2:= 'MyPacks.Pack.p2'.

3.14.2 Type definitions

An environment may contain one type definition part enclosed in TYPE ... TYPEND. Type definitions have the form identifier = type and are separated by semicolons, where type has to be a valid type description (see 3.4.).
(Types may be defined by use of imported types.)

3.14.3 Variable declarations

Variable declarations in the environment describe static variables, i.e. they come into existence when the implementation is actually loaded and keep their values until the module is eventually replaced.
The syntax of such a declaration is the same as for local variable declaration.

3.14.4 Initialization

Initialization starts with the keyword INIT and follows the (last) production. It may contain assignment statements and procedure calls only. All variables used must be static.
As an example, the identifying message of the Ml4 translator is issued in the init part of its root production:

INIT Dp4OP.WrStr('Depot4: Ml4/Java - Translator 1.9.2 '); Dp4OP.WrStr(Ml4Date); Dp4OP.WrLn();

3.15. Comment format definition

Comment formats of the source may be defined by calling the intrinsic procedure DefCom. It takes two parameters, both of type SYM. Up to four different comment formats can be active at the same time. They are handled in a stack-like fashion, but each entry can be replaced independently of its current position. A first-match strategy is applied: the last defined format (on top of the stack) is tried first, then the last but one, etc.
The many subtle differences between comment formats that occur in the real world make the definition look a little clumsy at first. As nesting is the default, non-nesting formats must always be defined by use of a control sequence.
A comment definition consists of two parts. The first string defines the comment start characters. It may be preceded by the control sequence. The second string describes the comment's end, followed by one character which is the comment's equivalent in terms of the grammar (usually a white space; for line comments one may like to use "\n"). This character must not be omitted.

One simple, possibly nested comment format:
   DefCom('(*', '*) ')
defines Pascal-style comments. The space at the end of the second string is essential (in Pascal, as in many other languages, a comment acts like a single space).
Additional comment formats (nesting is also possible): the first string starts with a control sequence enclosed in $ $, e.g.
   DefCom('$+${', '} ')
Preceding definitions can be overridden by giving the position of the new format:
   DefCom('$1${', '} ')
makes this the top record (all others are deleted).
To delete just the last definition:
   DefCom('$-$', '')
Non-nesting comment formats always need the extended form, starting with "$?x", where ? stands for one of the characters "+" (additional), "-" (replacing the last), or "1"..."4", and x is "-" for non-nesting comments, or "*" or empty for nesting comments, e.g.
   DefCom('$1-$/*', '*/ ')
for C-style comments (overriding the first/default).
It is possible to invoke a nonterminal NONTERM to consume the text stretch following the opening. Any remaining text will be skipped as usual, i.e. the closing must not be consumed by NONTERM! E.g.
   DefCom('$+:NONTERM${$', '} ')
for TurboPascal-like directives.
Remark: Any targets generated by NONTERM will be ignored (unless saved via parameters or global variables).

Examples:

Java comments
```
  DefCom('$1-$/*', '*/ '); DefCom('$+-$//','\n ');  
```
suffices to define comment formats for Java. However, if documentation comments need to be handled differently, one has to add
```
  DefCom('$+-$/**', '*/ ');
```
Be aware of the sequence
```
  DefCom('$+-$/*', '*/ ');... DefCom('$+-$/**', '*/ ');
```
because otherwise any comment starting with "/**" would already be skipped as one of the form "/*".
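
Why the order matters follows from the first-match strategy, in which the most recently defined format is tried first. A minimal Python model of that search (the function first_match and the list representation of the stack are assumptions made for illustration only):

```python
def first_match(openers, text):
    """Return the comment opener chosen for `text`, or None.

    `openers` lists the comment start strings in definition order;
    the last defined one (top of the stack) is tried first.
    """
    for opener in reversed(openers):
        if text.startswith(opener):
            return opener
    return None

good_order = first_match(["/*", "//", "/**"], "/** doc */")   # "/**" defined last
bad_order = first_match(["/**", "//", "/*"], "/** doc */")    # "/*" defined last
```

With "/**" defined last it is found first; with the opposite order the plain "/*" format already matches and the documentation comment is never recognized as such.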
Handling of pseudo comments
Comment format definition:
```
  DefCom('$+:NtDirect$(*!', '*) ');
```
Nonterminal definition:
```
  NtDirect=
    GLOBVAR USE traceState: BOOL;
    ('notrace'|'trace'|(*ignore*)); traceState:= c=2;.
```
could be used to analyze ... (*!trace -here starts the ordinary comment- version 33 29-FEB-00 *) ... and would result in traceState being set to TRUE.
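
The decision made by NtDirect can be modelled in Python as follows. This is a sketch only: the function name is hypothetical, the text passed in is assumed to be the stretch directly after the opening "(*!", and the return value stands in for the assignment to traceState.

```python
def nt_direct(text):
    """Model of ('notrace'|'trace'|(*ignore*)); traceState:= c=2;"""
    branches = ["notrace", "trace", ""]      # the third branch is empty and always matches
    for c, literal in enumerate(branches, start=1):
        if text.startswith(literal):
            return c == 2                    # traceState := (c = 2)
    return False                             # not reached: the empty branch always matches

trace_state = nt_direct("trace -here starts the ordinary comment- version 33 29-FEB-00 ")
```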

Remark

Depot4 and its Ml4 language have evolved through several rounds of change in which features were added and removed. Some features were added rather ad hoc to meet an urgent need. Later on, it became clear that they do not fit well into the general structure, that they are just a special case of some problem which should be solved more generally, or that they are limited to certain host languages. Unfortunately, once a program has been released and applied, it is no longer easy to remove anything without invalidating existing applications. Such features therefore have to be kept in the system even if there is no need for them. This is common in software engineering and is called backward compatibility. Ml4 is no exception in this respect. Features that are still present but should not be used in any new program are marked with (-).

These are:

Variables chrChr, nxtChr and the intrinsic procedure getChr.
As there is no character type, they do not fit into the type system. As a result, their implementation causes several restrictions.
Intrinsic procedures NoSkip and ReSkip.
There is no need for them; use < ... > instead.
