REXX Language implementation
crexx is a custom REXX-to-bytecode toolchain that translates Classic REXX semantics into an optimized bytecode format executed by a specialized VM. The process happens through four main binaries:
- rxc - The Compiler
- rxas - The Assembler
- rxlink - The Linker
- rxvm - The Virtual Machine (Interpreter)
The pipeline of transforming REXX source code into executable bytecode is structured as follows:
- .re rules (e.g., compiler/rxcpbscn.re and assembler/rxasscan.re).
- Token structs).
- Lemon (e.g., compiler/rxcpbgmr.y and compiler/rxcpopgr.y).
- DO ... END is overloaded: statement-leading DO remains the normal grouped/loop form, while expression-position DO ... END becomes BLOCK_EXPR. The grammar resolves the command-start ambiguity by routing top-level command expressions through a restricted command_expression spine while leaving the general expression grammar free to accept block expressions.
- ASTNode C structs that capture operations, scopes, typing, and tree associations.
- ==, >>, <<, >>=, <<=) are carried as a distinct OP_COMPARE_S_* family. During type validation both operands are retargeted to TP_STRING, but the optimiser still preserves intrinsic numeric constant types long enough to stringify from value rather than source spelling. That keeps folded behaviour aligned with runtime cases like 01 == 1.
- NUMERIC DIGITS / FORM / CASE settings.
- BLOCK_EXPR (DO ... END used as an expression) and LEAVE_WITH (LEAVE WITH expr). The association pointer links each LEAVE_WITH back to its owning BLOCK_EXPR, similar to how loop LEAVE / ITERATE link to DO.
- rxcp_exit.c), which intercepts unrecognized IMPLICIT_CMD nodes, invokes user-provided rxplugin macros to generate replacement source code, parses the interpolated strings (preserving literal quotes), and surgically grafts the resulting AST back into the main tree without violating return-type constraints.
- main() wrapper, that procedure is marked is_implicit_main. Later typing and emission use that marker to interpret classic arg() / arg[] / arg[n] access against the hidden command-line .string[] that the VM already passes to main. Ordinary procedures still use normal vararg semantics, and explicit zero-argument main() does not gain accidental source-level visibility of the hidden VM argv payload.
- DO / IF / INSTRUCTIONS emitted by exits are therefore a supported shape, and debug validation is staged after that scope rebuild so the validator sees the stabilized tree rather than the transient pre-scope fragment form.
- namespace ... expose global variables to implicitly bind into local PROCEDURE scopes, eliminating legacy PROCEDURE EXPOSE boilerplate.
- rxcp_val_sym.c (Step 3 - Pass 3), the compiler walks the AST (build_symbols_walker) to identify explicit NODE_REGISTER allocations via the with register.N[.view] clause on class attributes. It then automatically maps any remaining unmapped attributes of a class to unused VM registers (r1, r2, etc.) by synthesizing implicit NODE_REGISTER AST nodes. For normal classes, prefer this implicit allocation and keep callers on factories/methods; explicit with register... mappings should be reserved for genuine physical interop where a fixed layout is required.
- rxcp_ast_walk.c, rxcp_emit_*.c) traverse the tree.
- rxas Assembly instructions.
- expose, assembler aliasing, dynamic-index varargs, receiver/copyback shapes without proof, and interface/member dispatch that cannot be proved monomorphic.
- META_INLINE payloads alongside normal callable metadata. Libraries preserve this metadata for downstream rxc optimisation; final linked images strip it by default.
rxas)
- rxas Assembly instructions.
- rxbin bytecode).
rxlink, optional)
- .rxbin modules into a single linked image with one shared constant-pool record and one shared-backed module record per selected module.
- META_SRC and META_FILE) for smaller deployable artifacts without removing runtime contract metadata.
rxvm)
- rxvm reads and executes the rxbin bytecode.
- rxldmod.
- rxvm_run function (e.g., in rxvmmain.c / rxvmintp.c).
rxc no longer treats every import location as both source and binary space. Import discovery is now split into two root classes:
- source roots: .rexx
- binary roots: .rxbin, optional .rxas, and .rxplugin
The primary source root is the directory containing the source file
being compiled. Additional source roots come from -s. Binary roots
come from any -i paths and the compiler executable directory.
Repeated -i and -s options are accumulated in order. Search order
keeps project source files ahead of deployed binary artifacts.
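The root ordering described above can be sketched in C as follows. This is only an illustration, not rxc's code: RootClass, SearchRoot, and build_search_order are hypothetical names, and placing the compiler executable directory after the -i paths is an assumption, since the text does not state the order inside the binary class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical root classes; these names are illustrative, not rxc's. */
typedef enum { ROOT_SOURCE, ROOT_BINARY } RootClass;

typedef struct {
    RootClass cls;
    const char *path;
} SearchRoot;

/* Append one root if there is room; returns the new count. */
static size_t push_root(SearchRoot *out, size_t n, size_t cap,
                        RootClass cls, const char *path)
{
    if (n < cap) {
        out[n].cls = cls;
        out[n].path = path;
        n++;
    }
    return n;
}

/*
 * Build the search order: primary source directory, then -s roots in the
 * order given, then -i roots in the order given, then the compiler
 * executable directory (assumed to come last).
 * Returns the number of roots written, never more than cap.
 */
size_t build_search_order(const char *primary_src_dir,
                          const char *const *s_opts, size_t n_s,
                          const char *const *i_opts, size_t n_i,
                          const char *compiler_dir,
                          SearchRoot *out, size_t cap)
{
    size_t n = 0, k;
    assert(out != NULL);
    n = push_root(out, n, cap, ROOT_SOURCE, primary_src_dir);
    for (k = 0; k < n_s; k++)
        n = push_root(out, n, cap, ROOT_SOURCE, s_opts[k]);
    for (k = 0; k < n_i; k++)
        n = push_root(out, n, cap, ROOT_BINARY, i_opts[k]);
    n = push_root(out, n, cap, ROOT_BINARY, compiler_dir);
    return n;
}
```

Because every source root is pushed before any binary root, a lookup that walks the array front to back sees project source files before deployed binary artifacts.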
For source discovery the compiler now performs a lightweight header scan
before any full parse. That scan reads the leading options,
namespace, and import clauses so namespace-invisible files can be
rejected before rexbpars() and rxcp_val() are invoked. Full source
parsing is still used once a source file is actually selected as an
import candidate.
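A minimal sketch of such a header scan is shown below. It is not the rxc implementation: the function names are hypothetical, and it assumes one clause per line, no comments, and no ';' separators in the header. It only shows the idea of reading the leading options, namespace, and import clauses and stopping at the first ordinary statement.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if s starts with lowercase keyword kw (case-insensitive)
 * and the keyword is not just a prefix of a longer identifier. */
static int starts_with_kw(const char *s, const char *kw)
{
    size_t k;
    for (k = 0; kw[k] != '\0'; k++) {
        if (tolower((unsigned char)s[k]) != kw[k])
            return 0;
    }
    return !isalnum((unsigned char)s[k]) && s[k] != '_';
}

/*
 * Scan only the leading header clauses of src. If a namespace clause is
 * found before the first ordinary statement, copy its name into out
 * (truncated to cap - 1 characters) and return 1; otherwise return 0
 * and leave out empty.
 */
int scan_header_namespace(const char *src, char *out, size_t cap)
{
    const char *p = src;
    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*p != '\0') {
        size_t n = 0;
        /* Skip indentation and blank lines. */
        while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        if (starts_with_kw(p, "namespace")) {
            p += strlen("namespace");
            while (*p == ' ' || *p == '\t')
                p++;
            while (p[n] != '\0' && !isspace((unsigned char)p[n]) && n + 1 < cap) {
                out[n] = p[n];
                n++;
            }
            out[n] = '\0';
            return n > 0;
        }
        /* Anything other than options/import ends the header. */
        if (!starts_with_kw(p, "options") && !starts_with_kw(p, "import"))
            return 0;
        while (*p != '\0' && *p != '\n')
            p++;
    }
    return 0;
}
```

A caller could compare the returned name against the namespaces that are visible to the importing file and skip the full parse when there is no match.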
Within a binary root, same-stem artifacts are collapsed to the freshest
candidate. If timestamps tie, .rxbin is preferred over .rxas.
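The collapse rule can be illustrated with a short C sketch. The Artifact type and pick_freshest are hypothetical names, and only the two kinds named in the tie-break rule are modeled; how rxc actually stores candidates is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical artifact kinds for one stem inside a binary root. */
typedef enum { ART_RXBIN, ART_RXAS } ArtifactKind;

typedef struct {
    const char *path;
    ArtifactKind kind;
    time_t mtime;   /* last modification time */
} Artifact;

/*
 * Return the index of the preferred candidate among n same-stem
 * artifacts, or -1 when there are none. The newest timestamp wins;
 * on an exact tie, .rxbin is preferred over .rxas.
 */
int pick_freshest(const Artifact *c, size_t n)
{
    size_t best = 0, k;
    if (n == 0)
        return -1;
    assert(c != NULL);
    for (k = 1; k < n; k++) {
        double d = difftime(c[k].mtime, c[best].mtime);
        if (d > 0.0) {
            best = k;
        } else if (d == 0.0 && c[k].kind == ART_RXBIN &&
                   c[best].kind == ART_RXAS) {
            best = k;
        }
    }
    return (int)best;
}
```

Using difftime rather than comparing time_t values directly keeps the sketch independent of how the platform represents time_t.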
Level B interface support is now implemented across the compiler, assembler metadata path, and VM.
The source surface includes:
- interface
- class implements .iface ...
- * factories and named factories
- match
- expr as .type
- expr is .type
- typeof(expr)
- .pkg..thing()
Interface methods with bodies are emitted as final/default methods. The class must still implement abstract members, but it may not override a final/default interface member.
Qualified references use namespace..symbol; the left side must be an
imported namespace, not a class or interface name. namespace::symbol
remains accepted as a compatibility alias.
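As an illustration of the two accepted spellings, the following C sketch splits a qualified reference at the first ".." or "::" separator. split_qualified is a hypothetical helper, not a compiler function, and it performs only the lexical split; checking that the left side is an imported namespace would be a separate step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split ref at the first ".." or "::" separator.
 * On success, stores the length of the namespace part in *ns_len and
 * returns a pointer to the symbol part inside ref. Returns NULL when
 * there is no separator or when either side is empty.
 */
const char *split_qualified(const char *ref, size_t *ns_len)
{
    const char *dots, *colons, *sep;
    assert(ref != NULL && ns_len != NULL);
    dots = strstr(ref, "..");
    colons = strstr(ref, "::");
    sep = dots;
    /* Use whichever separator occurs first in the string. */
    if (sep == NULL || (colons != NULL && colons < sep))
        sep = colons;
    if (sep == NULL || sep == ref || sep[2] == '\0')
        return NULL;
    *ns_len = (size_t)(sep - ref);
    return sep + 2;
}
```

Both spellings yield the same namespace and symbol pair, which is what makes the "::" form a plain alias.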
The contract model is carried through normal .rxbin metadata with:
- META_INTERFACE
- META_IMPLEMENTS
- META_MEMBER
That metadata is sufficient for import reconstruction of class/interface headers without parsing procedure bodies. Imported stubs are not re-exported as new local contracts, and richer imported stubs replace poorer duplicates.
Created objects carry their concrete class identity. The VM then resolves contract calls through load/link-time registries:
srcmethod resolves the effective method for an interface/class receiver. The
registry prefers a concrete class method and otherwise falls back to a final
interface default method.
srcfproc resolves interface factories. Every candidate provider is evaluated
through its effective match; omitted match behaves as score 1, scores
<= 0 reject, highest positive score wins, and tied scores are broken
alphabetically by concrete class name.
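The factory selection rule above can be sketched in C. Provider and select_provider are hypothetical names, and the sketch assumes that "alphabetically" means plain strcmp order with the earlier name winning; the VM's real registry layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one candidate provider for an interface factory. */
typedef struct {
    const char *class_name; /* concrete class name */
    int has_match;          /* 0 when the provider omits match */
    int match_score;        /* result of match; ignored if has_match is 0 */
} Provider;

/*
 * Return the winning provider, or NULL when every candidate is rejected.
 * An omitted match counts as score 1, scores <= 0 reject, the highest
 * positive score wins, and ties go to the alphabetically first class name.
 */
const Provider *select_provider(const Provider *p, size_t n)
{
    const Provider *best = NULL;
    int best_score = 0;
    size_t k;
    assert(p != NULL || n == 0);
    for (k = 0; k < n; k++) {
        int score = p[k].has_match ? p[k].match_score : 1;
        if (score <= 0)
            continue;
        if (best == NULL || score > best_score ||
            (score == best_score &&
             strcmp(p[k].class_name, best->class_name) < 0)) {
            best = &p[k];
            best_score = score;
        }
    }
    return best;
}
```

Breaking ties by name makes the result independent of the order in which modules were loaded or linked.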
cREXX now has an explicit split between the user-facing source model and the mutable compiler tree.
- SourceNode tree in compiler/rxcp_source_tree.c.
- context->source_tree is the canonical user-facing tree for authored structure, diagnostics, semantic sidecars, metadata anchors, and editor projection.
- context->ast / work_ast remains the mutable compiler tree for import loading, exit dispatch, fixed-point rewrites, optimization, and emission.
- ASTNode instances keep explicit links back to the source tree so later rewritten nodes can still report against authored source.
Parser mode (rxc --syntaxhighlight) uses the same parser and early source
preparation, but it routes through compiler/rxcp_highlight_controller.c and
serializes DSLSH from source_tree, not from the later rewritten work tree.
The controller also keeps retained parser-mode cache state for imports and exit
discovery across requests.
For the compiler-side build order and tree-split details, see Parsing Pipeline Anatomy.
For the DSLSH/editor mapping and parser-mode contract, see cREXX DSLSH Integration.
ASTNode (from compiler/rxcp_ast.h)
The ASTNode forms the backbone of the compilation process, maintaining context, tree structure (parent/child/sibling relations), value/target typing details, code generation fragments, and parser token details.
struct ASTNode {
    Context *context;
    int node_number;
    NodeType node_type;
    char* file_name;
    ValueType value_type; /* Value type */
    size_t value_dims; /* Value dimensions */
    int *value_dim_base;
    int *value_dim_elements;
    char* value_class; /* Value class name */
    int *target_dim_base;
    int *target_dim_elements;
    ValueType target_type; /* Target type */
    size_t target_dims; /* Target dimensions */
    char* target_class; /* Target class name */
    int high_ordinal; /* Order of node after validation but before optimisations */
    int low_ordinal; /* lowest in this tree root */
    int register_num;
    char register_type;
    int additional_registers;
    int num_additional_registers;
    char is_ref_arg;
    char is_opt_arg;
    char is_const_arg;
    char is_varg;
    ASTNode *free_list;
    ASTNode *parent, *child, *sibling;
    ASTNode *association; /* E.g. for LEAVE / ITERATE relevant DO node or LEAVE_WITH relevant BLOCK_EXPR */
    Token *token;
    Scope *scope;
    char *node_string;
    size_t node_string_length;
    char free_node_string;
    rxinteger int_value;
    int bool_value;
    double float_value;
    char* decimal_value; /* Decimal value as a string */
    int exit_obj_reg; /* VM register index of the attached Exit object */
    /* These are only valid after the set_source_location walker has run */
    Token *token_start, *token_end;
    char *source_start, *source_end;
    int line, column;
    SymbolNode *symbolNode;
    /* These are used by the code emitters */
    OutputFragment *output; /* Primary node output or loop assign / init instruction */
    OutputFragment *cleanup; /* Clean up logic */
    OutputFragment *loopstartchecks; /* Begin Loop exit checks */
    OutputFragment *loopinc; /* Loop increments */
    OutputFragment *loopendchecks; /* End Loop exit checks */
};
DO loops from Lemon Parser to AST
In the compiler, block scopes such as DO loops are strictly translated from Lemon grammar tokens into a parent-child AST topology.
Example from compiler/rxcpbgmr.y:
tk_doloop(D) ::= TK_DO(T).
    { D = ast_f(context, DO, T); }
do(G) ::= tk_doloop(T) dorep(R) TK_EOC instruction_list(I) TK_END.
    { G = T; add_ast(G,R); add_ast(G,I); }
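To make the tree shape produced by those grammar actions concrete, here is a small self-contained C sketch. MiniNode and the mini_* helpers are hypothetical stand-ins, not the compiler's ASTNode, add_ast, or add_sbtr; they only mirror the parent/child/sibling linkage and the association pointer under simplified assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an AST node (illustrative only). */
typedef struct MiniNode MiniNode;
struct MiniNode {
    const char *kind;        /* e.g. "DO", "REPEAT", "INSTRUCTIONS" */
    MiniNode *parent, *child, *sibling;
    MiniNode *association;   /* e.g. LEAVE -> enclosing DO */
};

/* Append younger at the end of older's sibling chain. */
void mini_add_sibling(MiniNode *older, MiniNode *younger)
{
    assert(older != NULL && younger != NULL);
    while (older->sibling != NULL)
        older = older->sibling;
    older->sibling = younger;
    younger->parent = older->parent;
}

/* Append child as the last child of parent. */
void mini_add_child(MiniNode *parent, MiniNode *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->child == NULL) {
        parent->child = child;
        child->parent = parent;
    } else {
        mini_add_sibling(parent->child, child);
    }
}

/* Walk up from node to the nearest enclosing DO and record it in
 * node->association. Returns that DO node, or NULL if there is none. */
MiniNode *mini_link_to_loop(MiniNode *node)
{
    MiniNode *p;
    assert(node != NULL);
    for (p = node->parent; p != NULL; p = p->parent) {
        if (strcmp(p->kind, "DO") == 0) {
            node->association = p;
            return p;
        }
    }
    return NULL;
}
```

In this sketch a DO node ends up with its repetition clause as first child and the instruction list as the next sibling, which is the same order in which the grammar action attaches R and I.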
- TK_DO token is identified, a new ASTNode of type DO is instantiated (via ast_f).
- dorep (like TO, BY, FOR attributes) into a REPEAT AST node or docond (like WHILE / UNTIL).
- instruction_list(I) contains all expressions and assignments defined within the loop body.
- REPEAT clause, WHILE/UNTIL conditions, and the actual loop body instructions are iteratively appended as child nodes to the parent DO node using the add_ast(parent, child) and add_sbtr(older_sibling, younger_sibling) C functions.
- association pointer to link commands like LEAVE or ITERATE directly back to the target enclosing DO loop node.
Once compilation via rxc and rxas is complete, rxvm handles the execution.
Modules are loaded into memory by functions such as rxldmod. The VM then sets up its contexts, loads any dynamically or statically linked extensions, and invokes rxvm_run to step through and execute the instructions in the loaded bytecode.
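The general shape of such an execution loop can be sketched with a toy register machine in C. Nothing here reflects the real rxvm instruction set or the rxvm_run signature: the opcodes, MiniInstr, and mini_vm_run are invented purely to show a fetch, decode, and dispatch cycle over loaded instructions.

```c
#include <assert.h>
#include <stddef.h>

#define MINI_REGS 8

/* Hypothetical opcodes for a toy register machine. */
typedef enum { OP_LOADI, OP_ADD, OP_EXIT } MiniOp;

typedef struct {
    MiniOp op;
    int a, b, c;   /* operand meaning depends on the opcode */
} MiniInstr;

static int valid_reg(int r)
{
    return r >= 0 && r < MINI_REGS;
}

/*
 * Execute instructions until OP_EXIT and return the value of its
 * register operand. Returns -1 if execution runs past the last
 * instruction without reaching OP_EXIT.
 */
long mini_vm_run(const MiniInstr *code, size_t n)
{
    long r[MINI_REGS] = { 0 };
    size_t pc = 0;
    assert(code != NULL || n == 0);
    while (pc < n) {
        const MiniInstr *i = &code[pc++];   /* fetch and advance */
        assert(valid_reg(i->a));
        switch (i->op) {
        case OP_LOADI:                      /* r[a] = immediate b */
            r[i->a] = i->b;
            break;
        case OP_ADD:                        /* r[a] = r[b] + r[c] */
            assert(valid_reg(i->b) && valid_reg(i->c));
            r[i->a] = r[i->b] + r[i->c];
            break;
        case OP_EXIT:                       /* stop with r[a] */
            return r[i->a];
        }
    }
    return -1;
}
```

A production interpreter would also handle branches, calls, and module boundaries; the sketch leaves those out to keep the dispatch cycle visible.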