Ml4 language
Ml4 (Meta-Language of Depot4) is based on EBNF. In fact, it is a true extension of the notation introduced by N. Wirth.
An Ml4 program per se does not exist; instead there is a set of Ml4 productions, which can be translated independently of each other. Thus, Ml4 features production-based (i.e. rule-based) modularization. Translators are configured dynamically by selecting one of the rules as root production, i.e., the nonterminal on the left-hand side of that production is declared as the start symbol of the grammar. This forms an applicable language processor.
The choice of the language root can be changed dynamically at any time. Together with the dynamic loading of modules, this makes it possible to test parts of the language processor before all productions have been implemented.
The formal description of the EBNF in section 3.3 is already a set of valid Ml4 productions, which can be translated by the Depot4 metalanguage translator into executable code. By choosing Rule as start symbol we get an acceptor for EBNF productions.
An Ml4 production has the general structure:
identifier = sourceExpression -> targetExpression .
where the part starting with ->, called the target production, is optional and may occur more than once.
... Ml4's unique features.
All the structure operators of the EBNF are available on both the source and the target side. There are further extensions such as declarations, assignment and call statements, etc.
Elements of Ml4 that are not part of the basic EBNF (i.e. extensions) are separated from each other and from the basic elements by semicolons. (In fact, semicolons may be used within the EBNF parts, too.)
Comments are enclosed in (* and *) and may be nested, i.e., a comment must not contain any unbalanced *), even if quoted.
(Names such as DO or do are very likely to clash and thus should be avoided.)
Keywords of the Ml4 language:
ARR DCL END FLEX GLOBVAR IMPORTS INIT MODULE REC TYPE TYPEND USE VAR
Literals are enclosed in single quotes, e.g. ':=' or 'BEGIN'.
If the character ' itself is needed in a literal, it has to be written twice, e.g. '''Hallo!'''.
Escape sequences:
\n   newline, chosen according to the actual operating system
\c   carriage return
\l   line feed
\f   form feed
\t   horizontal tabulator
\v   vertical tabulator
\B   bell
\b   backspace
\\   \
\0   null byte
A literal may be abbreviated if it starts with $i, where i is one of the digits 1...9 giving the minimum number of characters required. So '$3INTEGER' accepts the strings INT, INTE, INTEG, INTEGE, and INTEGER, but not INTEGERS. If a literal starts with the character '$', then the character '$' has to be written twice, e.g. '$$a$' accepts the string $a$.
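The abbreviation rule can be illustrated with a small Python sketch. It is not part of Depot4: the function names parse_literal and accepts are hypothetical, the literal is passed without its surrounding quotes, and clamping a minimum that exceeds the literal's length is an assumption.

```python
def parse_literal(spec):
    """Split a literal body into (text, minimum number of characters)."""
    if spec.startswith("$$"):
        # A doubled '$' stands for a literal that really begins with '$'.
        text = spec[1:]
        return text, len(text)
    if len(spec) >= 2 and spec[0] == "$" and spec[1] in "123456789":
        text = spec[2:]
        # Assumption: a minimum larger than the text means "no abbreviation".
        return text, min(int(spec[1]), len(text))
    return spec, len(spec)


def accepts(spec, word):
    """True if word is the literal itself or a long-enough abbreviation."""
    text, minimum = parse_literal(spec)
    return minimum <= len(word) <= len(text) and text.startswith(word)
```

For example, accepts("$3INTEGER", "INTEG") holds, while accepts("$3INTEGER", "INTEGERS") does not.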
A literal such as 'REAL' would also accept the starting sequence of the string REALUM; writing 'REAL' $ prevents this.
identifier = expression.
identifier is the name of a nonterminal. The dot marks the end of the production. expression is the collection of all right-hand sides of the productions with identifier on the left-hand side.
Sequence:
B = A1 A2 ... An.
Alternative:
B = A1 | A2 | ... | An.
Option:
B = [ A ].
Because square brackets are also used for indexing in the enhanced language (Ml4), an option following an identifier must be separated from it by at least one space (or other delimiter).
Iteration may be represented directly (without using recursion) by curly braces. Iteration is useful for expressing left association when left recursion is forbidden (as in Depot4).
B = { A }.
Parentheses group elements:
B = 'a' ('b'|'c') 'd'.
describes the language {'abd', 'acd'}.
Rule = ident '=' Expr '.'.
Expr = Term { '|' Term }.
Term = { Factor }.
Factor = string | ident |'('Expr')'|'['Expr']'|'{'Expr'}'.
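The four productions above are enough to build an acceptor for EBNF rules. The following Python sketch is a hand-written recursive-descent version of that idea, not the code Depot4 generates. The token shapes are assumptions: ident is taken as letter {letter|digit}, and string as a single-quoted literal with doubled quotes.

```python
import re

TOKEN = re.compile(r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*)"
                   r"|(?P<string>'(?:[^']|'')*')"
                   r"|(?P<punct>[=.|()\[\]{}]))")
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        k, v = self.peek()
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, found {v!r}")
        self.pos += 1

    def rule(self):      # Rule = ident '=' Expr '.'.
        self.take("ident")
        self.take("punct", "=")
        self.expr()
        self.take("punct", ".")

    def expr(self):      # Expr = Term { '|' Term }.
        self.term()
        while self.peek() == ("punct", "|"):
            self.pos += 1
            self.term()

    def starts_factor(self):
        kind, value = self.peek()
        return kind in ("ident", "string") or (kind == "punct" and value in CLOSERS)

    def term(self):      # Term = { Factor }.
        while self.starts_factor():
            self.factor()

    def factor(self):    # Factor = string | ident | '(' Expr ')' | '[' Expr ']' | '{' Expr '}'.
        kind, value = self.peek()
        self.pos += 1
        if kind == "punct":
            self.expr()
            self.take("punct", CLOSERS[value])


def accepts_rule(text):
    """True if text is exactly one syntactically valid EBNF production."""
    try:
        parser = Parser(tokenize(text))
        parser.rule()
        return parser.pos == len(parser.tokens)
    except SyntaxError:
        return False
```

Each method mirrors one production, with the iteration braces turned into while loops, so no left recursion is needed.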
Ml4 allows empty productions, i.e. empty = . is valid.
There are three kinds of data types in Ml4: primitive types, structured types, and opaque types. The latter are of interest only in connection with the import feature and allow simple handling (declaration, parameter passing) of foreign data.
Ml4
production.
- INT - actually $3INTEGER
- Integer type is mapped on the respective type of the host language.
- REAL
- Floating point type, mapped on a real type, too. There is only a limited support for this type, e.g., no conversions are available.
- BOOL - actually $4BOOLEAN
- The boolean type, whose values are TRUE and FALSE.
- SYM
- A type, whose values are symbols, i.e., possibly limited strings of characters. They can be, at least, concatenated and compared.
- TXT
- This is the basic target type. Values of this type can only be concatenated.
- TAR - actually $3TARGET
- Strictly speaking, this is not a primitive type, as it is the result of a nonterminal's invocation, i.e. the collection of targets. There are no operations other than assignment and selecting a certain target with the trailing underscore notation.
- RECORD - actually $3RECORD
- The syntax of a record definition follows that of Pascal/Modula (without variants).
- ARRAY - actually $3ARRAY
- An array is a fixed-size vector of elements (which may in turn be of array type). Only the number of elements is given; indexing starts at zero.
- FLEX - also FLEX1, resp. FLEX2
- Flexible arrays (FLEXes) are suited to storing information in connection with EBNF's iteration construct. They have no upper limit on the number of their elements. Accessing a non-existing element f[i] will create it.
The index range of a FLEX starts with one.
Using this data type requires runtime management of the associated data structures and is thus expected to be less efficient than ordinary arrays in most host languages.
Flexible arrays may be of dimension one (FLEX / FLEX1) or two (FLEX2).
ARR 20 OF INT
REC name, town: SYM; age: INT; gen: BOOLEAN END
FLEX OF SYM
FLEX2 OF RECORD F: FLEX OF INTEGER; AAR: ARRAY 10 OF ARRAY 5 OF REAL END
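The behaviour described for one-dimensional FLEX arrays (indices starting at one, elements created on first access) can be mimicked in Python as follows. This is only a sketch of the semantics, not how a Depot4 host implementation stores FLEXes. The class name Flex, the default_factory parameter, and the sparse dictionary storage are assumptions.

```python
class Flex:
    """One-dimensional flexible array whose indices start at one."""

    def __init__(self, default_factory):
        self._items = {}                      # assumption: sparse storage
        self._default_factory = default_factory

    def _check(self, index):
        if index < 1:
            raise IndexError("FLEX indices start at one")

    def __getitem__(self, index):
        self._check(index)
        if index not in self._items:
            # Accessing a non-existing element creates it.
            self._items[index] = self._default_factory()
        return self._items[index]

    def __setitem__(self, index, value):
        self._check(index)
        self._items[index] = value

    def __len__(self):
        # Number of elements created so far.
        return len(self._items)
```

A FLEX OF SYM would then correspond roughly to Flex(str), where reading names[3] on a fresh array yields an empty string and creates that element.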
Ml4
.
For efficiency reasons, Ml4 allows several productions to be combined into a module. This is restricted to groups of nonterminals of which only one is called from outside, while the others are needed only locally. The name of the module has to be the name of the nonterminal called from outside.
Productions and modules are translated separately. A used nonterminal (i.e., a nonterminal on the right-hand side of the rule) need not be defined yet.
The nonterminal's identifier (i.e., the left-hand side of the rule) becomes the identifier of all the generated entities (host language source file, object file, etc.). This means that if there are two productions with the same left-hand side, translating one of them may overwrite the implementation of the other.
There is just one global name space for all productions. Thus, it is useful to follow a naming convention when defining new rules.
Depot4 supports prefixing: if an identifier contains a lower-case letter or digit followed by a capital letter, the part before the first such capital is regarded as a common prefix (e.g. Dp4 is the prefix of Dp4ExAmPlE1). This avoids name collisions and is also used for automatic structuring (into subsystems/packages) if the host system supports this.
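The prefix rule can be expressed as a single regular-expression search. The Python sketch below is only an illustration; the function name common_prefix is hypothetical, and restricting letters to ASCII is an assumption.

```python
import re

# A lower-case letter or digit that is immediately followed by a capital letter.
_BOUNDARY = re.compile(r"[a-z0-9](?=[A-Z])")


def common_prefix(identifier):
    """Return the part before the first such capital, or '' if there is none."""
    match = _BOUNDARY.search(identifier)
    return identifier[:match.end()] if match else ""
```

With this reading, common_prefix("Dp4ExAmPlE1") yields Dp4, and an identifier such as lextst has no prefix.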
Ml4
production.
The basic structure of this part is given by EBNF.
The following terminals are supplied with every Depot4 implementation:
digit: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
letter
hexdigit
id, ident: letter {letter|digit} (see also Treatment of Keywords)
intn, num, integer: digit{digit}
number: digit{digit}['.'{digit}[('E'|'D')['+'|'-']digit{digit}]]
filename
line
ident4root
any
str, string
stresc, stringesc: "abc\"de" is accepted as abc"de.
dqstr: "abc""de" is accepted as abc"de.
dqstresc: "abc\"de" is accepted as abc"de.
sqstr: 'abc''de' is accepted as abc'de.
sqstresc: 'abc\'de' is accepted as abc'de.
In a construct such as [ident] 'END', the closing END will be accepted as an identifier. There are at least two ways to overcome this. First, one can change the grammar, e.g. into (ident 'END'|'END'), which solves the problem.
Depot4
has a more convenient solution now. All words that are not to be treated as identifiers can be written into a file. By default, Depot4 looks in the current directory for a file NoIdent.lst (the name can be changed in module Dp4Config) and excludes all the words it contains from being recognized as identifiers.
Another list can be installed by calling NoIdents from module Dp4Stdlex. The argument is the filename string. This call discards the previous list and installs a new one, which will be empty if no file was found. pushNoIdents(filenameString) additionally saves the old settings, while popNoIdents() restores the saved status.
The syntax of an exclusion file is simple: just list the words, separated by spaces or newlines. If capitalization is insignificant, upper case letters must be used.
IMPORTS Dp4Stdlex;
lextst = Dp4Stdlex.NoIdents('PascalNoIdent.lst');
{ ident } 'END'
Dp4Stdlex.pushNoIdents('CNoIdent.lst')
{ ident } 'end'
Dp4Stdlex.popNoIdents();
{ ident } 'UNTIL'
.
With file PascalNoIdent.lst containing at least END and UNTIL, and file CNoIdent.lst containing end, this rule will accept
alfa beta END END UNTIL end end UNTIL
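The interplay of NoIdents, pushNoIdents and popNoIdents in this example can be replayed with a small Python model. It is a sketch under simplifying assumptions, not the Dp4Stdlex implementation: the word lists are passed directly instead of being read from files, comparison is case-sensitive only, and the names NoIdentLexer and accept_lextst are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")
PASCAL_WORDS = {"END", "UNTIL"}   # stands in for PascalNoIdent.lst
C_WORDS = {"end"}                 # stands in for CNoIdent.lst


class NoIdentLexer:
    def __init__(self):
        self._current = frozenset()
        self._saved = []

    def no_idents(self, words):
        # Discard the previous list and install a new one.
        self._current = frozenset(words)

    def push_no_idents(self, words):
        # Save the old list, then install the new one.
        self._saved.append(self._current)
        self._current = frozenset(words)

    def pop_no_idents(self):
        # Restore the most recently saved list.
        self._current = self._saved.pop()

    def is_ident(self, word):
        return bool(IDENT.fullmatch(word)) and word not in self._current


def accept_lextst(text):
    """Replay: {ident} 'END', push C list, {ident} 'end', pop, {ident} 'UNTIL'."""
    lexer, tokens, pos = NoIdentLexer(), text.split(), 0

    def idents_then(literal):
        nonlocal pos
        while pos < len(tokens) and lexer.is_ident(tokens[pos]):
            pos += 1
        if pos < len(tokens) and tokens[pos] == literal:
            pos += 1
            return True
        return False

    lexer.no_idents(PASCAL_WORDS)
    if not idents_then("END"):
        return False
    lexer.push_no_idents(C_WORDS)
    if not idents_then("end"):
        return False
    lexer.pop_no_idents()
    return idents_then("UNTIL") and pos == len(tokens)
```

Under the C list, END and UNTIL count as ordinary identifiers, which is why the second iteration consumes them before the literal end is matched.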
There are two possibilities to modify nonterminals in the description of the source:
Name:NT
The nonterminal NT gets the new designation Name. Renaming is usually used if a nonterminal occurs at several positions in a production:
Prod = F1:Fact [ Op F2:Fact]
-> F1_ [Op_ F2_].
Renaming can also be used the other way round: different nonterminals in different branches of an alternative can be given the same name if they are to be treated equally:
Stat = S:IfStat | S:AssStat | S:ForStat
-> S_.
NT[index]
Nonterminals can be provided with indices. This is usually used in connection with iterations:
DclSeq = { Dcl[i] }
-> { Dcl_[i] }.
Every nonterminal can have at most two indices.
To distinguish between the brackets for indices and those for options, the following rule has to be obeyed: there must not be a space, newline or comment between the nonterminal and the opening index bracket. In contrast, there has to be a delimiter between a nonterminal and an opening option bracket.
Renaming and indexing can be combined:
Seq = { D:ConstDef[i] | D:TypeDef[i] }
-> { D_[i] }.
The skipping of delimiters can be suppressed by enclosing a part of the source expression in < and >. In this way class terminals can easily be implemented, too.
Integer = digit < { digit } >.
By the enclosure in < ... >, delimiters inside the number are prohibited. The first digit is an exception: it is left outside the enclosure so that delimiters in front of the number can still be ignored.
A construct such as <digit [digit>] will not work correctly if only one digit was accepted.
Although Ml4 aims at translation descriptions that are highly independent of the system's actual host language, it does not take a purist's view and offers an interface to those basic system features. The interface is defined by procedures (or routines or methods) encapsulated in a unit called a module, e.g. a class in Java or an Ada package. Calls to such procedures may be embedded in the source text of the parsing part.
The import of modules is described in 3.14.1.
... Ml4 code position.
Intrinsic procedures are described in 3.7.2, regardless of whether they are proper procedures (i.e. have no return value) or not.
The result of an assignment is not reverted during back-tracking. (Thus permitting unbounded lookahead.)
3.7. Expressions
Expressions can be built similarly to the rules of Pascal, i.e., with three levels of priority. Unary operators (sign, NOT) are of the highest level.
3.7.1 Operators
+, -, OR
*, DIV, MOD, &
=, #, <=, >=, <, > (# means not equal)
ABS(IntegerExpression):IntegerValue
ABS(RealExpression):RealValue
IntegerExpression MOD 2 = 0
ODD(IntegerExpression):BoolValue
Len(SymExpression):IntValue
Len(TxtExpression):IntValue
Str2Int(SymExpression):IntValue
Str2Int(TxtExpression):IntValue
Int2Str(IntExpression):SymValue
SourcePosition():SymValue
BOOLEAN(BooleanExpression):BooleanValue
INTEGER(IntegerExpression):IntegerValue
REAL(RealExpression):RealValue
SYM(SymExpression):SymValue
TXT(TxtExpression):TxtValue
LogPos and Dp4OP.ERROR, thus avoiding the need to import Dp4OP for this reason only.
ERROR but indicated as "warning" and issued only if a showWarnings flag is set.
The following variables are predeclared in every Ml4 production and, thus, must not be explicitly redeclared. They serve as default control variables (see there), but can, with some care, be applied elsewhere, too.
Integer: N, O, i, c
Variables with special function (all of type SYM):
Ml4Date: the current date in a default format (as defined in procedure Date)
curChr: current character to parse (-)
nxtChr: the character following curChr (if it exists) (-)