Appendix C: Solutions for frequent problems

This appendix gives solutions for frequently asked questions. We tried to show working examples. Most of them are included in the examples package of the Depot4 distribution. You can help to make this more complete by sending your questions to

Skip text until keyword/symbol

Sometimes one has to process/translate only parts of the source, which start with special symbols, keywords, or something else.
ExampleSkipExcl =
 { ('%%'  | any); !c#1; }

ExampleSkipIncl =
 c:= 2;
 {!c#1; ('%%'  | any) }
ExampleSkipExcl stops in front of the keyword, ExampleSkipIncl skips the keyword, too.
If there is a keyword to be found, one should add a branch:
ExampleSkipExcl =
 { ('REPEAT' $ | ident | any); !c#1; }
Otherwise the parser might already stop at something like NOREPEAT.
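
The difference between stopping in front of a keyword and skipping it, and the reason for the whole-word check, can be illustrated outside Ml4. The following Python sketch is only an analogy and not Depot4 code; the function names skip_excl and skip_incl are hypothetical, and it handles word-like keywords only.

```python
import re

def _keyword_pattern(keyword: str) -> re.Pattern:
    # Match the keyword only as a whole word, so that NOREPEAT does not
    # count as an occurrence of REPEAT (this mirrors the extra ident branch).
    return re.compile(r"(?<![A-Za-z0-9_])" + re.escape(keyword) + r"(?![A-Za-z0-9_])")

def skip_excl(text: str, keyword: str) -> int:
    """Return the position in front of the keyword (keyword not consumed)."""
    match = _keyword_pattern(keyword).search(text)
    return match.start() if match else len(text)

def skip_incl(text: str, keyword: str) -> int:
    """Return the position just behind the keyword (keyword consumed)."""
    match = _keyword_pattern(keyword).search(text)
    return match.end() if match else len(text)
```

If the keyword does not occur at all, both functions return the length of the text, i.e. everything is skipped.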

A call to Skip in front of a large alternative may also speed up the parser if it is likely that many characters have to be skipped at this point.

Avoid leading spaces, comments, ... in SOURCETEXT

Because SOURCETEXT is an exact copy of the corresponding stretch of source text, there is no way to exclude comments, spaces, newlines, etc. within a construct from being copied. However, you can avoid such insignificant characters in front of the construct. This is done by invoking the intrinsic procedure Skip() before entering the respective rule.
idcp = ident -> SOURCETEXT.
withhd = idcp -> idcp_.
withouthd = Skip(); idcp -> idcp_.
Then withhd will translate "    alfa" into "    alfa", while withouthd produces just "alfa".

Two-pass translation

This can be achieved easily by writing the target to a file and including this file afterwards. If this is too slow, one can use internal buffering by replacing Dp4Streams with Dp4StrBuf.
There is no need to limit this to two passes.
  IMPORTS Dp4Streams, Dp4Chars, Dp4DocFiles;
ExampleTwoPasses =
(* write 1st target of ExampleFirstPass on file: *)
  Dp4Streams.Tar2Strm(ExampleFirstPass_, Dp4DocFiles.New("Temp1File"));
(* make this file the source for further processing: *)
-> ExampleSecndPass_.

ExampleFirstPass = ident -> ident_ '1st'.
ExampleSecndPass = ident -> '2nd Pass:' ident_.
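
The general idea of feeding the target of one pass into the next as its source can be sketched in Python. This is an analogy under stated assumptions, not Depot4 code: the functions first_pass, second_pass and two_passes are hypothetical, and an in-memory io.StringIO buffer stands in for the temporary file or string buffer.

```python
import io

def first_pass(source: str) -> str:
    # Stand-in for the first translation: append a marker to the identifier.
    return source.strip() + "1st"

def second_pass(source: str) -> str:
    # Stand-in for the second translation: prefix the result of the first pass.
    return "2nd Pass:" + source.strip()

def two_passes(source: str) -> str:
    buffer = io.StringIO()             # intermediate target (in memory)
    buffer.write(first_pass(source))   # write the first target into the buffer
    buffer.seek(0)                     # make the buffer the new source
    return second_pass(buffer.read())
```

Further passes can be chained in the same way by writing each result into a fresh buffer.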

Individual test of productions in complex environments

Selecting a certain nonterminal as root will no longer work if global declarations are involved (which is very likely in any real application).
Here is a proposal to overcome this problem:
Create a special rule that establishes the needed environment and then calls ident4root. Test your productions by putting their name in front of the source.
TestRoot =
  GLOBVAR DCL globvar1, globvar2,....;  (* declarations *)
  globvar1:= ...                        (* initializations - if needed *)
  ident4root                            (* calls the wanted production *)
-> ident4root_ .                        (* gives the result *)
Let's say you want to test your rule MyExpr: then select TestRoot as root and write as source
MyExpr a+(B*Pi)/128
and the result will be quite similar to selecting MyExpr as root in a simple grammar without global variables.

Tuning branch selection

If branch selection is determined by leading keywords, it can be sped up by applying the following principle:

Instead of

 ('keyword1' $ ... |'keyword2' $ ... |'keyword3' $... )
one should better use
This avoids touching the source repeatedly.
(Do not expect too much. More often than not, this tuning will not result in noticeable speed-ups. If there is a really large number of keywords, i.e. >20...40, and the selection is executed really frequently, one should think about using an imported hash table. Thus, replace the second line by (/My.hash(Ident_0)/ ...|...|...) )

One can achieve a nominal speed-up by placing a Skip() in front of alternatives. But be careful: this may interfere with the suppression of skipping (<...> constructs).
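
The contrast between testing each keyword against the source in turn and reading the leading identifier once followed by a table lookup can be sketched in Python. This is an analogy, not Depot4 code; sequential_select, hashed_select and the handler table are hypothetical names.

```python
import re

IDENT = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)")

def sequential_select(source: str, handlers: dict):
    # Touches the source once per keyword until one of them matches.
    stripped = source.lstrip()
    for keyword, handler in handlers.items():
        match = re.match(re.escape(keyword) + r"(?![A-Za-z0-9_])", stripped)
        if match:
            return handler(stripped[match.end():])
    return None

def hashed_select(source: str, handlers: dict):
    # Reads the leading identifier once, then selects the branch by lookup.
    match = IDENT.match(source)
    if not match:
        return None
    handler = handlers.get(match.group(1))
    return handler(source[match.end():]) if handler else None
```

Both functions select the same branch; the second one inspects the source only once, regardless of the number of keywords.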

Make linefeed ended comments

There are programming languages in which a line end also closes a comment (e.g. Ada, C++). One can produce such comments simply with the help of the line terminal.
makeComment = { line[i] }
-> {'// ' line_[i] }.
This rule describes a translation that converts any text into C++ comment format.
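
For comparison, the same line-by-line transformation can be sketched in Python. This is an analogy only; the function name make_comment is hypothetical, and the handling of line ends is an assumption of the sketch rather than a statement about the output of the rule above.

```python
def make_comment(text: str) -> str:
    # Prefix every line of the text with the C++ line comment marker.
    return "".join("// " + line + "\n" for line in text.splitlines())
```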

Copy comments from source to target

Keeping comments is a much more demanding task than one may think at first sight.
First, there are questions that cannot, in principle, be resolved at the syntactic level, such as whether a certain comment belongs to its predecessor or its successor. One author may be used to writing "END(*LOOP*);" while another prefers "END;(*LOOP*)".
Second, comments are, like spaces, usually skipped before parsing. Even if they are saved elsewhere, the problem of which syntactic entity they should be attached to remains.
Comments are, in general, much more closely related to natural languages than any other feature of formal languages. Handling them optimally would require understanding them, i.e. a semantic analysis that is far beyond the scope of a syntax-controlled translator. Thus there is, obviously, no fully satisfying solution.

However, with Depot4 there are two ways to tackle this problem:

  1. Fully parse comments
    This will probably produce the best results, but it is also by far the most expensive approach. It means that comments are no longer skipped; rather, they are real elements of the language, i.e. your language in fact no longer has comments in the ordinary sense.
    Instead one has to define a certain production, e.g.
     PascalComment=
       ('(*'c:= 2; { !c#1;('*)'|any)}
       |'{' c:= 2; {!c#1;('}'|any)})
    and (one comment may be followed by another)
     Comment= {PascalComment[i]} ->  {PascalComment_[i]}
    To suppress the default comment definition call DefCom('$-$', '');.
    Finally, calls to Comment must be inserted (and handled!) in the grammar at every point where a comment is allowed. By this, the size of the grammar will easily be doubled - at least.

  2. Collect them during skipping
    This is a more convenient way but it has some drawbacks too. It is based on the possibility to define a production to handle the text inside a certain comment format. This is used to store the skipped text in a global variable. At different points in the grammar the saved comments can be handled (inserted) and the global be cleared.
    The advantage is that the trade-off between the expense to handle comments close to their origin and the resulting blow-up of the grammar can be defined individually. The more often the global is inspected the closer the results will come to the first approach. (Of course, the size of the grammar will also come closer to that.)
    The main drawback is due to the parsing strategy, which tries branches sequentially. In some situations comments will be skipped several times and thus be saved several times, too. To a certain degree, this can be circumvented by inserting calls to Skip(); in front of alternatives and options and before the end of iterations. (Note that this cannot be done in general because it prevents the correct operation of skipping suppression <...>.)

    The following example code illustrates this approach. You can experience the mentioned problem by deleting the Skip(); in the last but one line.

    ExampleKeepCom=
      GLOBVAR USE coms: FLEX OF TXT;  USE nrOfComs: INT;
      DefCom('[*', '*] ');                   (* accept nested comments but do not
          call ExampleKeepCom recursively *)
     {ExampleKeepC1; INC(nrOfComs); coms[nrOfComs]:= ExampleKeepC1_; }
     DefCom('$1-:ExampleKeepCom$[*', '*] ');  (* re-activate ExampleKeepCom *).
    ExampleKeepC1= {('*]' | any)!c#1; } !N>0;
    -> '(* ' SOURCETEXT ' *)\n'.
    ExampleKeepInsert=  (* call this whenever you want to insert cumulated comments *)
      GLOBVAR USE coms: FLEX OF TXT;  USE nrOfComs: INT;
      VAR nr: INT;
      nr:= nrOfComs; nrOfComs:= 0;
      {/ coms[i] }.
    (*-------------------------------- DEMO ---------------------------------------*)
    ExampleKeepComDemo= (* --- root of the demo --- *)
        GLOBVAR DCL coms: FLEX OF TXT;  DCL nrOfComs: INT;
        nrOfComs:= 0;
        DefCom('$1-:ExampleKeepCom$[*', '*] ');
        '(*' { ExampleKeepComDemoElems[i] ExampleKeepInsert[i]}'*)'
    -> { ExampleKeepComDemoElems_[i] ExampleKeepInsert_[i]}.
    ExampleKeepComDemoElems=
      Skip();  (* this is important to avoid multiplied comment texts - try it *)
      (//e:id | e:str | e:num) -> e_ ':'.

Combined solution

If, perhaps due to some conventions, comments are frequently used at fixed points of the grammar (e.g. the end of a construct is qualified as "END (*IF*)" or "} /*while*/"), a mixed approach may be useful:

In general, comments are handled by the second method, but at these fixed points the first one is applied.

 statementWithComment=  statement <{' '} Comment>
 -> statement_ '--' Comment_ '\n'.
Insert a call to statementWithComment wherever you expect a comment of this type.
(Note: This will append the Ada-like comment regardless of whether there is a comment in the source or not. This can be avoided if the new comment delimiters are added within Comment itself.)

Copy and process comments

Keeping comments may cause additional problems if source and target language embody different commenting principles, e.g. free format comments vs. line bound comments.
Then comment texts need not only be saved but also be processed. With Ml4 this can be achieved by a nested layer of processing.
IMPORTS Dp4StrBuf, Dp4Streams;
KeepCom=
  VAR bs: Dp4Streams.Stream;
  DefCom('(*', '*) ');               (* as above *)
   bs:= Dp4StrBuf.New('tmpstr');     (* make input stream  *)
   Dp4Streams.Tar2Strm(KeepC1_, bs); (* from accepted      *)
   From(Dp4Streams.StrmSrc(bs));     (* comment text       *)
   KeepC2 Back();                    (* and process it     *)
   INC(nrOfComs); coms[nrOfComs]:= KeepC2_;
 DefCom('$1-:KeepCom$(*', '*) ');    (* re-activate KeepCom *).

KeepC1= {('*)' | any)!c#1; } !N>0;

KeepC2=  {line[i]}                (* prefix each line *)
-> '\n' {'//' line_[i] }.
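
The nested processing (first isolate the text of a free-format comment, then process that text line by line) can be sketched in Python. This is an analogy, not Depot4 code; the names collect_comments and convert_comment are hypothetical, and the sketch assumes non-nested (* ... *) comments.

```python
import re

# Outer layer: find the text between the comment delimiters.
COMMENT = re.compile(r"\(\*(.*?)\*\)", re.DOTALL)

def convert_comment(body: str) -> str:
    # Inner layer: turn each line of the comment text into a line comment.
    return "".join("// " + line.strip() + "\n" for line in body.strip().splitlines())

def collect_comments(source: str) -> list:
    # Save every comment of the source in its converted form.
    return [convert_comment(match.group(1)) for match in COMMENT.finditer(source)]
```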


Reentrance and multithreading

Generally, Depot4's translators are reentrant and, as long as no imported or external components are involved, thread safe.
One can perform a nested translation by use of the From/Back mechanism (see example).
In principle, the instantiation of a separate parser by use of Dp4Tools.translate is possible too.
Starting from version 1.9, multithreading is possible. The data structures for the management of global data and parsing objects (cp. substitution of implementations) are now stored in a Translator object. To avoid changes to existing translators, the previous interface is maintained with the help of a (largely hidden) default Translator object. To exploit the enhancement, one has to explicitly use a dedicated Translator object for each thread.
However, the runtime does not handle any locking/synchronization issues when translators from different threads want to access the same object.
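
The pattern of one dedicated translator object per thread, with locking left to the caller for anything that is shared, can be sketched in Python. This is an analogy and does not show the Depot4 interface; the Translator class below is a hypothetical stand-in, as are all other names.

```python
import threading

class Translator:
    """Hypothetical stand-in that keeps its global data per instance."""
    def __init__(self):
        self.globals = {}

    def translate(self, source: str) -> str:
        self.globals["calls"] = self.globals.get("calls", 0) + 1
        return source.upper()

def translate_in_threads(sources):
    results = [None] * len(sources)
    shared_log = []                # object shared between the threads
    log_lock = threading.Lock()    # the caller has to provide the locking

    def worker(index, source):
        translator = Translator()  # dedicated object for this thread
        results[index] = translator.translate(source)
        with log_lock:
            shared_log.append(source)

    threads = [threading.Thread(target=worker, args=(i, s))
               for i, s in enumerate(sources)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, shared_log
```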

Defining application specific error messages

A simple and convenient way to define translator specific error messages is by setting them in the static environment of the root production or a dedicated production, which in turn is invoked by the root.
MODULE ExampleRoot
IMPORTS Dp4Messages;
ExampleRoot= ( ExampleActualRoot  (* This will cause an error as it is not implemented *)
             | ERROR('exp.general'); )
-> ( ExampleActualRoot_
    | 'Sorry, nothing generated'
  Dp4Messages.setErrorTxt('exp.general', 'Unspecified error');
  Dp4Messages.setErrorTxt('exp.typ', 'Type mismatch');
  Dp4Messages.setErrorTxt('exp.unKwn', 'Unknown/undeclared identifier');
END ExampleRoot
Using the above as a translator will produce a log like:
*Loading ExampleActualRoot failed
"" pos 0  error: Unspecified error

© J. Lampe 1998-2010