How does MJS Yabasic work?

MJS Yabasic compiles your source code into an object bytecode. The bytecode was designed to resemble that of the Java Virtual Machine, but in reality it is closer to the one used in the CPython implementation of Python.

The compilation is performed by the ProgramInfo constructor.

First the program is transformed from textual source code into an array of 'Form's known as the 'Form Array'. A 'Form' is a representation of a single Yabasic command (PRINT, FOR, DOT, GTRIANGLE, etc.) together with all its parameters. The 'Form Array' holds Forms indexed by their sequence number in the code. The sequence number consists of the line number of the Form in the source code, followed by some number of suffixes, such as the position of the Form within a line separated by colons. As the Form Array is manipulated, additional elements can be inserted between existing ones by appending alphabetic suffixes.

The Form Array is constructed by sourceLinesToFormArray(), which takes the raw text content of the textbox as a string and returns the program in its Form representation.

sourceLinesToFormArray() first splits the incoming program string into an array of lines at each newline character. No Yabasic Form is allowed to straddle multiple lines. (This implicitly makes the newline character a special separator token within the Yabasic grammar.)

Each line is split into an array of symbolic tokens with tokeniseLine(). This is a syntactic operation, not a semantic one. An array of token objects is returned for each line. This is done by repeatedly calling getFirstToken() on the string, which returns the first valid token from the string together with the remainder of the string. Specifically, getFirstToken() takes a string as input and returns one of three possibilities:

* An object {token: token object created from parsing, remainder: string remainder to be analysed later} if the token analysis was successful.
* true if there are no more tokens to be retrieved from this line.
* false if the character sequence is egregiously not a token and compilation should fail.

The following tokenise rules are defined in MJS Yabasic:

* Free whitespace (outside of a string literal) separates tokens but is not a token in itself.
* Lines that begin with a straight quote ' or hash # are comments. The rest of the line is discarded.
* The symbols ( ) + - * = , : ; can all be immediately identified as standalone symbol tokens.
* / may be followed by / to produce a // line comment which suppresses the line, else it is a /.
* < may be followed by > or = to produce <> or <=, else it is a <.
* > may be followed by = to produce >=, else it is a >.
* If the character sequence begins with ", it is a string literal. A string literal is a pair of speechmark characters enclosing any number of: anything except a speechmark or backslash (capturing all non-latin printables), OR an escape sequence, which is captured as a pair. If this requirement is violated, this is a syntax error. When a string literal is captured, the valid Yabasic escape sequences contained within are converted to their JavaScript equivalents (these are the escaped slash, newline and carriage return).
* If the character sequence begins with a number or dot, it is considered a numeric literal (notice that + and - do not designate a numeric literal).
* If the character sequence begins with an underscore or an alphabetic character (notice that this unfortunately doesn't extend to non-latin characters), it is an identifier or keyword. After locating the full string, if its text matches one of the Yabasic keywords (case insensitive), this token is a keyword and the text is converted to uppercase. Otherwise it is an identifier and the case is maintained.
* If none of these rules apply, the token is bad and there is a compilation error.

A token object consists of:

    {
      type: a TokenType constant indicating the type of the token,
      text: the text of the keyword or literal as it appeared in the source,
      value: (undocumented),
      valuetype: a low-level internal Yabasic ValueType instance holding the value of the token if a literal
    }

Some examples:

The type field is the primary designator of tokens. Tokens that are just symbols with no parameters have the following structure:

    {type: TokenType.LEFT_BRACE}

For Yabasic keywords, the text of the keyword is stored in the text field in uppercase.

    {type: TokenType.KEYWORD, text: 'PRINT'}

For string literals, the text of the string is stored in the valuetype field, and the original string with speechmarks is kept in text.

    {type: TokenType.STRING_LITERAL, text: '"matt"', valuetype: ValueTypeString("matt")}

For numeric literals, the same rules apply.

    {type: TokenType.NUMERIC_LITERAL, text: '2.5', valuetype: ValueTypeNumeric(2.5)}

For each line, we aim to produce a unique sequence identifier of the form 'LINE-SUBSET', where LINE is the 1-based line number and SUBSET is the number of the command across the line accounting for colons. (A 'subsequence'.)

For each line, the token array is treated as a FIFO queue and searched for colons. Each token sequence either side of a colon token is a separate subsequence. Each subsequence is treated as a new line in later stages.

Once this is done for all program lines, we have the complete token line array holding the token sequence for each subsequence present in the program.

The function tokenLineArrayToFormArray() converts a token line array into a complete Form Array by analysing the token sequences and comparing them against a set of valid preset syntactic forms based on the Yabasic language grammar.

The function tokenLineToFormList() attempts to analyse a token series and return a list of all the Forms found. Each token line may result in one or more Forms. Each line is analysed for a valid Form, and once one has been located, its tokens are removed from the line.
If there are any tokens remaining, they are then analysed for additional Forms: MJS Yabasic allows for complete Forms to appear consecutively without interposed colon tokens.

The rules for Form analysis are rather arbitrary:

* If the first token is a variable name, then the Form is either EVALUATE_AND_DISCARD (an expression is evaluated and the result discarded) or ASSIGNMENT (an expression is evaluated and the value placed into a left-hand-side value). If, after scanning for a left-hand-side value, the next token is a = symbol, then this is an ASSIGNMENT Form: the value of evaluating the right-hand-side expression is assigned to the left-hand-side value. Else, this is an EVALUATE_AND_DISCARD Form: the value of evaluating the expression is discarded.
* Similarly, if the first token is a Yabasic keyword that acts as a function, this is an EVALUATE_AND_DISCARD Form.
* If the first token is a numeric literal, then it is expected the line is of the form '10 PRINT "Hello"': the numeric literal is taken as the text of a new LABEL Form. This is one of the cases where allowing consecutive Forms is a good idea.
* If none of the above special cases applies, then the keyword pattern matching system tokenLineKeywordFormsAssimilate() is used.

tokenLineKeywordFormsAssimilate() attempts to match the contents of the token stack against a list of predefined syntactic templates representing the built-in features of the Yabasic language. It tries each template in turn, returning a complete Form as soon as a complete match is found. The test is performed by the function tokenLineKeywordFormAssimilate(), which takes a Form template and a copy of the token stack as parameters.

A predefined Form template is specified as a string containing a space-separated series of Yabasic keywords or token-type placeholders.

Keywords are specified as words all in uppercase. These must be matched exactly for the Form to be applicable. When a keyword is tokenised, it is converted to uppercase.
Lowercase letters indicate a placeholder for a token or token tree:

* n is an evaluatable expression.
* v is a single variable name.
* cn is a comma separated list of expressions, possibly empty (this has not been verified).
* s is a string literal.
* cl is a comma separated list of valid left-hand-side values.
* cv is a comma separated list of variable names.
* k is an arbitrary keyword.
* b is a valid label identifier: a numeric literal or a numeric variable name.

The template is checked left to right, attempting to assimilate tokens from the stack. If the token from the stack matches the keyword specified, or the parser can assemble the appropriate expression type from the token series, then it is accepted and the next part of the template is checked. When the function successfully assimilates a placeholder, the token trees generated are placed into the args[] array of the Form object.

There are two additional characters that may appear in a Form template: [ and ] denote a series of optional tokens. The brackets may only appear at the end of a template. When a [ is encountered, the assimilation up to this point is treated as a checkpoint. Any error that occurs after this point will result in a Form being returned based on all the tokens used before the [. If the assimilation successfully enters the [ and matches all the template parts until the ], then the full Form is assimilated.

For example, consider the following template:

    FOR v = n TO n [ STEP n ]

This will match the token series

    FOR a = 1 TO 21 STEP 2

directly, by matching it as 'FOR', NVARIABLE_NAME, EQUALS, NUMERIC_LITERAL, 'TO', NUMERIC_LITERAL, 'STEP', NUMERIC_LITERAL. The assimilation carries on into the brackets, is successful, and returns the complete Form with four arguments. In this case, no tokens remain to be returned to the caller.

It will also match the token series

    FOR b = 10 TO 20 PRINT b NEXT
When the function reaches the token after the NUMERIC_LITERAL 20 (the keyword PRINT), that token does not form part of a valid expression, so it will not be gathered into a token tree with the 20. The function will attempt to match the keyword 'PRINT' against the keyword 'STEP'. This will fail, and the function will return to the checkpoint, returning the Form generated up to this point with three arguments. The remaining tokens are returned to the caller.

One important language feature of Yabasic is implemented here as a result of the Form template list: an IF without a THEN is a 'sticky if' (represented internally as an IF_STICKY Form). The 'sticky if' automatically encloses all Forms present on the current line. This is implemented by using two rules:

* IF v THEN is a normal IF Form, which must be followed by an ENDIF.
* IF v is an IF_STICKY Form.

If an IF_STICKY is present when the Forms of a line are generated, it is converted to an IF and an ENDIF is appended to the end of the line.

The series of Forms returned from tokenLineToFormList() is iterated through, and each Form is assigned a unique sub-subsequence number based on its position within the subsequence. For example, LINE-SUBSET-SUBSUBSET could be 0001-00001-02 for the second Form found within the first sequence of tokens on the first line of the program.

At this point, the Form Array is a raw re-representation of the original Yabasic program. To make compilation simpler, the control structure Forms ON, REPEAT-UNTIL, WHILE-WEND and FOR-NEXT can be restated in terms of nested IF, GOTO and LABEL statements.

scrambleFormArrayBlockControlStructures() analyses the program structure in terms of nested complete control structure blocks, restating them in terms of IF and GOTO base Forms. Unlike some other BASICs, Yabasic expects the control structure Forms to compose well-formed enclosing blocks similar to C, rather than allowing arbitrary NEXT placement.
A stack of open control structures is maintained, recording the type of the most recent control structure, and the names of the labels necessary to perform the jump described by CONTINUE or BREAK Forms.

When the opening Form of a control structure is encountered, an entry is pushed onto the stack, and label names are generated describing the beginning and the end of the control structure. These names are used to refer to the endpoints of the control structure when it is flattened into GOTO jumps. They are composed of ascending alphabetic labels, to ensure the correct sequence in relation to each other and the correct placement within the main program.

When the corresponding close structure is located, the frame is destroyed, allowing CONTINUE and BREAK to refer to the next highest control structure. The use of the stack allows the compiler to enforce completeness of the control structure Forms. SUB Forms must always appear at the lowest level.

For ON, the transformation proceeds as follows:

    ON n k cn
    ON variable GOTO/GOSUB comma separated list of identifiers

The runtime examines the value of INT(variable), and executes either GOTO or GOSUB to one of the named labels based on the value. If the value is <= 1, the first label is used. For subsequent labels, if the value equals an incrementing N, then that label is used. Values beyond the last N are treated as if they were the last case. Therefore the form:

    ON value GOTO label1,label2,label3

can be re-expressed as:

    00ONTEST = n
    IF 00ONTEST <= 1 THEN
      GOTO label1
    ELSEIF 00ONTEST = 2 THEN
      GOTO label2
    ELSEIF 00ONTEST = 3 THEN
      GOTO label3
    ELSE
      GOTO label3
    ENDIF

The use of a final ELSE allows for simple handling where there are only one or two cases in the label series. As 00ONTEST is only checked within the immediate Form, there is no risk of collision between adjacent ONs. Yabasic identifiers cannot begin with a number, so there is no risk of collision with the user program.
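The ON rewrite can be sketched in a few lines of JavaScript. This is only an illustration: expandOn() and selectOnLabel() are hypothetical names, not functions from MJS Yabasic, the cascade is emitted as plain text lines rather than Form objects, and Math.trunc is assumed as a stand-in for Yabasic's INT(). The first test follows the prose rule (values <= 1 select the first label).

```javascript
// Hypothetical helper, not part of MJS Yabasic: restates
// "ON expr GOTO|GOSUB label1,label2,..." as an IF cascade.
function expandOn(exprText, jumpKeyword, labels) {
  if (labels.length === 0) {
    throw new Error("ON needs at least one label");
  }
  const lines = ["00ONTEST = " + exprText];
  labels.forEach((label, i) => {
    // The first case catches everything <= 1; later cases test equality.
    const test = i === 0
      ? "IF 00ONTEST <= 1 THEN"
      : "ELSEIF 00ONTEST = " + (i + 1) + " THEN";
    lines.push(test, "  " + jumpKeyword + " " + label);
  });
  // Values beyond the last case fall through to the last label.
  lines.push("ELSE", "  " + jumpKeyword + " " + labels[labels.length - 1], "ENDIF");
  return lines;
}

// Reference semantics the cascade is meant to reproduce.
// Math.trunc stands in for Yabasic's INT() here, which is an assumption.
function selectOnLabel(value, labels) {
  const n = Math.trunc(value);
  if (n <= 1) return labels[0];
  return labels[Math.min(n, labels.length) - 1];
}

console.log(expandOn("value", "GOTO", ["label1", "label2", "label3"]).join("\n"));
```

Because the final ELSE repeats the last label, the same generator works unchanged for a one-label or two-label ON.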
For REPEAT, the transformation proceeds as follows:

    REPEAT
      BODY
    UNTIL condition

The commands within BODY are executed, then the condition is evaluated. If the result is nonzero, the loop terminates, otherwise the program cursor returns to the beginning of the loop. The runtime considers REPEAT to be the beginning of the loop, the point immediately preceding the UNTIL to be the condition point, and the point immediately after the UNTIL to be the exit of the loop. With these definitions in place, REPEAT can be re-expressed as:

    LABEL XXXX-YYYYY-ZZREPEATSTART
    BODY
    LABEL XXXX-YYYYY-ZZREPEATCONDITION
    IF condition THEN
      GOTO XXXX-YYYYY-ZZREPEATEND
    ENDIF
    GOTO XXXX-YYYYY-ZZREPEATSTART
    LABEL XXXX-YYYYY-ZZREPEATEND

The fragment XXXX-YYYYY-ZZ refers to the sequence index string of the REPEAT Form. This ensures that all program flow changes relating to this REPEAT are logically connected correctly.

For WHILE, the transformation proceeds as follows:

    WHILE condition
      BODY
    WEND

The condition is evaluated. If the result is zero, the loop terminates, otherwise the program cursor enters and executes BODY. When the program cursor reaches WEND, it returns to the point before the condition evaluation. The runtime considers WHILE to be the beginning of the loop, as well as the condition point. The point immediately after the WEND is the end of the loop. With these definitions in place, WHILE can be re-expressed as:

    LABEL XXXX-YYYYY-ZZREPEATSTART
    LABEL XXXX-YYYYY-ZZREPEATCONDITION
    IF condition THEN
    ELSE
      GOTO XXXX-YYYYY-ZZREPEATEND
    ENDIF
    BODY
    GOTO XXXX-YYYYY-ZZREPEATCONDITION
    LABEL XXXX-YYYYY-ZZREPEATEND

For FOR, the transformation proceeds as follows:

    FOR v = a TO b STEP s
      BODY
    NEXT v

When the FOR is encountered for the first time, a, b and s are evaluated in that order. The value of a is assigned to v. Then, if a is beyond the extent of b with respect to the sign of s, the loop is terminated. Else BODY is executed. Then a, b, and s are evaluated again. The value of s is added to v.
Then the termination test is executed again. With these bonkers definitions in place, FOR can be re-expressed as:

    LABEL XXXX-YYYYY-ZZFORSTART
    FOR_v_VALUE_VALUE = a
    FOR_v_LIMIT_VALUE = b
    FOR_v_STEP_VALUE = s
    v = FOR_v_VALUE_VALUE
    GOTO XXXX-YYYYY-ZZFORSKIPSTEPADD
    LABEL XXXX-YYYYY-ZZFORCONDITION
    FOR_v_VALUE_VALUE = a
    FOR_v_LIMIT_VALUE = b
    FOR_v_STEP_VALUE = s
    v = v + FOR_v_STEP_VALUE
    LABEL XXXX-YYYYY-ZZFORSKIPSTEPADD
    IF (v > FOR_v_LIMIT_VALUE AND FOR_v_STEP_VALUE >= 0) OR (v < FOR_v_LIMIT_VALUE AND FOR_v_STEP_VALUE <= 0) THEN
      GOTO XXXX-YYYYY-ZZFOREND
    ENDIF
    BODY
    GOTO XXXX-YYYYY-ZZFORCONDITION
    LABEL XXXX-YYYYY-ZZFOREND

At this point, the only flow control Forms that are present in the program are IF, ELSE, ELSEIF, ENDIF, GOTO, GOSUB and SUB.

The function resolveFormArrayIFControlStructures() dissolves all IF, ELSE and ELSEIF Forms, replacing them with a uniform IF_NOT_GOTO Form which branches to its .branch target if its argument evaluates to false-like, or proceeds to .next otherwise.

The mechanism of resolveFormArrayIFControlStructures() is similar to scrambleFormArrayBlockControlStructures(): it considers the program as a series of nested control structures designated by beginning and ending Forms. The program at this point will only have IF / ENDIF, IF / ELSEIF / ENDIF, IF / ELSE / ENDIF or SUB / END SUB groupings.

Each IF encountered opens a logical block, creating a stack entry. The stack entry records the sequence index of the Form where the IF was found. This is used to construct LABEL names to direct control flow through the IF. A case counter is maintained for each IF, reflecting the number of ELSE or ELSEIF cases encountered so far.

Each IF is transformed into an IF_NOT_GOTO Form. (The left column shows the sequence index of each Form.)

    a   IF x THEN
    b     BODY
    c   ENDIF

Becomes:

    a   IF_NOT_GOTO x a1
    b   BODY
    ca  LABEL a1
    cb  LABEL aendif

The numeric part of the label is the case counter.
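The stack-and-case-counter mechanism can be sketched as follows. This is an illustrative reconstruction, not the code of resolveFormArrayIFControlStructures(): the function name resolveIfs(), the input shape { seq, kind, cond, text } and the text-line output are all assumptions, and the handling of ELSEIF and ELSE (jump to the endif label, then place the label for the failed case) is one reading of the scheme that reproduces the IF / ENDIF listing above.

```javascript
// Hypothetical sketch: flattens IF / ELSEIF / ELSE / ENDIF into IF_NOT_GOTO,
// GOTO and LABEL lines. Each input form is { seq, kind, cond?, text? }.
function resolveIfs(forms) {
  const out = [];
  const stack = []; // one frame per open IF: { seq, caseCounter }

  // Forms generated from one source Form get alphabetic suffixes (ca, cb, ...).
  const emitGenerated = (seq, texts) =>
    texts.forEach((t, i) => out.push(seq + String.fromCharCode(97 + i) + " " + t));

  for (const form of forms) {
    const top = stack[stack.length - 1];
    if (form.kind === "IF") {
      stack.push({ seq: form.seq, caseCounter: 1 });
      out.push(form.seq + " IF_NOT_GOTO " + form.cond + " " + form.seq + "1");
    } else if (form.kind === "ELSEIF" || form.kind === "ELSE" || form.kind === "ENDIF") {
      if (!top) throw new Error(form.kind + " without IF at " + form.seq);
      // Label that the previous failed test jumps to.
      const failLabel = "LABEL " + top.seq + top.caseCounter;
      if (form.kind === "ENDIF") {
        emitGenerated(form.seq, [failLabel, "LABEL " + top.seq + "endif"]);
        stack.pop();
        continue;
      }
      top.caseCounter++;
      // The finished case body skips the remaining cases.
      const generated = ["GOTO " + top.seq + "endif", failLabel];
      if (form.kind === "ELSEIF") {
        generated.push("IF_NOT_GOTO " + form.cond + " " + top.seq + top.caseCounter);
      }
      emitGenerated(form.seq, generated);
    } else {
      out.push(form.seq + " " + form.text); // ordinary Forms pass through
    }
  }
  if (stack.length > 0) {
    throw new Error("IF at " + stack[stack.length - 1].seq + " has no ENDIF");
  }
  return out;
}

console.log(resolveIfs([
  { seq: "a", kind: "IF", cond: "x" },
  { seq: "b", kind: "OTHER", text: "BODY" },
  { seq: "c", kind: "ENDIF" },
]).join("\n"));
```

Keeping the counter in the stack frame is what lets nested IFs generate non-colliding labels: each frame derives its label names from its own sequence index.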
As each test within an IF, ELSEIF, ELSE cascade fails, the program counter jumps to the next case to test. This is designed to be similar to how the algorithm would be implemented in assembly language. The endif label is provided for each case to fall into once it has completed its body.

    a   IF x THEN
    b     BODY1
    c   ELSEIF y THEN
    d     BODY2
    e   ELSE
    f     BODY3
    g   ENDIF

Becomes:

    a   IF_NOT_GOTO x a1
    b   BODY1
    ca  GOTO aendif
    cb  LABEL a1
    cc  IF_NOT_GOTO y a2
    d   BODY2
    ea  GOTO aendif
    eb  LABEL a2
    f   BODY3
    ga  LABEL a3
    gb  LABEL aendif

This complete Form Array is returned to the ProgramInfo constructor.

The Form Array is embellished in formArrayExpandLOCALSTATIC() by considering the behaviour of LOCAL and STATIC in relation to loop counters. The system uses hidden mangled names to hold the limit and step values for a FOR loop. If the user designates these variables to be LOCAL or STATIC, the same designation should also be applied to the mangled names (FOR_v_VALUE_VALUE) to ensure the loop semantics carry correctly. This allows recursive calls to a function containing a FOR loop.

The Form Array up to this point has been held as a disconnected series of independent Form objects. The sequence has been enforced by storing the Forms in an array, referencing them by a lexical sort.

The function formArrayResolveJumps() connects the Forms together by populating their .next field with a reference to the next Form that would be executed following normal control flow. The rules for this are as follows:

* If not overridden, the .next field of any Form is the next Form that appears lexically (by sequence index) in the containing Form Array.
* The .next field of a GOTO Form is set to the result of a lookup of its target label. A failure is a compilation error.
* The .branch field of a GOSUB Form is set to the result of a lookup of its target label. A failure is a compilation error.
* The .branch field of an IF_NOT_GOTO pseudo-Form is set to the result of a lookup of its target label.
A failure is a compilation error, and also an error in the compiler, as the target label should have been automatically generated when the IF_NOT_GOTO pseudo-Form was created.

Some Forms are designated as 'non-executable'. These Forms provide information to the compiler rather than having a run-time effect. These are DATA, LABEL and SUB. These Forms are not set as .next or .branch members during Form Array manipulation. The execution chain should therefore leap over them, with the result that subsequent translation stages will not contain a representation of them.

The Form Array is analysed for DATA Forms by the DataMaster constructor. The DataMaster analyses the Form Array in lexical sequence order and produces a singly linked list of data items in the order that they would be READ. As the DataMaster proceeds through the Form Array, any LABELs it encounters are added to a list of pending labels. When the DataMaster encounters a DATA item, all the pending labels are 'applied' to that DATA item so that the item can be made the current DATA item under the cursor as a result of RESTORE.

The DATA Form uses its args array to contain all the declared DATA items as expression trees. For each DATA item that the DataMaster constructor encounters, it attempts to evaluate the value of the expression in a contextless (no symbol table) immediate environment using hastyEvaluateTokenTree(). If the evaluation succeeds, the resulting value is stored in the DataMaster as a ValueType. If the evaluation throws an exception (because it attempts to evaluate the value of a variable, or a non-constant function such as RAN() is called), this is a compilation error.

With the Form Array and DataMaster ready, RESTORE Forms can now be fully resolved. RESTORE Forms take a single argument, a textual Yabasic label that precedes the DATA item that should be READ next (or no arguments, which refers to the beginning of the program).
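The pending-label bookkeeping can be illustrated with a small sketch. It is not the DataMaster itself: buildDataList() and the form shapes are hypothetical, and DATA values are assumed to be already-evaluated constants rather than expression trees passed through hastyEvaluateTokenTree().

```javascript
// Hypothetical sketch of the DataMaster walk: builds a singly linked list of
// DATA items and maps each LABEL to the first DATA item that follows it.
function buildDataList(forms) {
  let head = null;
  let tail = null;
  const labels = new Map(); // label name -> data item node
  let pending = [];         // labels seen since the last DATA item

  for (const form of forms) {
    if (form.kind === "LABEL") {
      pending.push(form.name);
    } else if (form.kind === "DATA") {
      for (const value of form.values) {
        const node = { value, next: null };
        if (tail) tail.next = node; else head = node;
        tail = node;
        // Every pending label now refers to this item; later items get none.
        for (const name of pending) labels.set(name, node);
        pending = [];
      }
    }
    // All other Forms are irrelevant to the data list.
  }
  return { head, labels };
}

const data = buildDataList([
  { kind: "LABEL", name: "first" },
  { kind: "DATA", values: [1, 2] },
  { kind: "PRINT" },
  { kind: "LABEL", name: "second" },
  { kind: "LABEL", name: "third" },
  { kind: "DATA", values: ["x"] },
  { kind: "LABEL", name: "dangling" },
]);
console.log(data.labels.get("second").value);
```

A label with no DATA item after it (such as "dangling" here) never gets an entry, which corresponds to a RESTORE target that cannot be resolved.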
formArrayResolveRESTOREs() iterates through all the Forms, setting the argument of each RESTORE Form to the data object stored within the DataMaster for that label, if it exists. An attempt to RESTORE to a label that is not present is NOT a compilation error; it is a runtime error. This is manifested as a RESTORE Form with an empty args array.

From this point, the program must be considered as a directed graph of Forms.

formArrayLocateFirstExecutable() scans the program, looking for the first executable Form. (There is a possible bug here if you start a program with a SUB. It seems that execution would jump into the first line of the body of the SUB, instead of leaping over it by treating the SUB Form as a GOTO to the end of the SUB, as would happen if it were encountered in the body of the program.)

Next, formArrayFindSubroutines() searches the program for SUB Forms, and constructs an object mapping the SymbolTable identifier number of each SUB subroutine name to a Subroutine() instance.

From this point the symbol table is introduced. This is handled by the class SymbolTable. The purpose of the symbol table is to allow a mapping from Yabasic identifiers found within the source code to simple integer values. This assists the runtime by allowing it to perform load and store operations using an array with an integer index instead of a string index. Each time the SymbolTable is queried for the 'slot number' of an identifier, the unique slot number allocated to that identifier is returned, or a new one is generated if the identifier has not been introduced to that SymbolTable before. When SymbolTable.addToken is used to introduce a symbol to the table, it also augments the token with a .slot field holding the slot number of the symbol.

The Subroutine object represents the constant information that defines each subroutine within a Yabasic program. It takes the Form Array, the Form that declares the SUB, the sequence index identifier, and the symbol table.
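The slot allocation performed by the symbol table, as described above, can be sketched as below. SketchSymbolTable is a hypothetical stand-in for illustration only; apart from the addToken name and the .slot field, which the text mentions, its methods and internals are assumptions.

```javascript
// Hypothetical stand-in for the symbol table: maps identifier text to small
// integer slot numbers so the runtime can index an array instead of a string map.
class SketchSymbolTable {
  constructor() {
    this.slots = new Map(); // identifier text -> slot number
    this.names = [];        // slot number -> identifier text
  }

  // Returns the existing slot for a name, or allocates the next free one.
  slotFor(name) {
    if (!this.slots.has(name)) {
      this.slots.set(name, this.names.length);
      this.names.push(name);
    }
    return this.slots.get(name);
  }

  // Augments the token with a .slot field, as the text describes for addToken.
  addToken(token) {
    token.slot = this.slotFor(token.text);
    return token;
  }
}

const table = new SketchSymbolTable();
const store = []; // the runtime's symbolic store, indexed by slot number
const tokenA = table.addToken({ text: "a" });
store[tokenA.slot] = 42;
console.log(store[table.slotFor("a")]);
```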
From the given entry point Form, it uses symbol_table.gatherSymbolsFromEntryPoint() to add all of the symbols in the program graph below this point to the symbol_table. It then gathers up the parameters, making a list of the parameter names and types.

For each parameter, the variable of that name within the scope of the subroutine must be considered as LOCAL. To accomplish this, we consider the parameters passed on the stack just as for any other function call. We transfer these parameters from the stack to the symbolic store by constructing a prelude sequence that designates each parameter variable as LOCAL and assigns the value to the symbolic store. We do this in opcodes directly. The final opcode's .next field links to the first opcode of the subroutine body.

As a result, each Subroutine identifies two entry points: the 'Yabasic entry point' Form directly following the SUB Form, and the effective compiled entry point produced by considering the prelude.

After all the subroutines have been identified, the entry point of the program is used to scan for symbol entries using symbol_table.gatherSymbolsFromEntryPoint(), just like the subroutines. The symbol table now contains identifiers for all the variables throughout the program.

At this point the Form graph is complete. Previous versions of Yabasic would interpret the Form graph directly (still older versions would calculate the destinations of branches each time the program cursor moved, and there was no symbol table). To accelerate the execution of the program, the information-rich Form graph representation needs to be transformed into a smaller, more regular opcode graph representation.

An opcode is composed as follows:

    {
      opcode: an OPCODE constant indicating the type of opcode
      args: array of arguments. The length and types of this depend on the type of the opcode
      next: reference to the next opcode to execute based on normal flow
      branch: reference to the next opcode to execute based on conditional flow
    }

The following opcodes are defined in the MJS Yabasic low-level implementation: VALUE, POP, READ, RESTORE, LOAD, LOADARRAY, STORE, STOREARRAY, EVAL, CALLUSER, GOSUB, GOTO, IFNOTGOTO, RETURN, SCOPE and DIM. (If next and branch aren't given, the opcode is part of the linear flow.)

OPCODE.VALUE

    {
      opcode: OPCODE.VALUE
      args: [ValueType instance]
    }

The single ValueType instance held in args[0] is pushed to the top of the temporary value stack. This is used whenever a direct value needs to be introduced into the stack.

OPCODE.POP

    {
      opcode: OPCODE.POP
      args: []
    }

Removes the topmost ValueType from the temporary value stack.

OPCODE.READ

    {
      opcode: OPCODE.READ
      args: []
    }

Reads the next DATA item from the DataMaster, advances the data item cursor, and places the value on the top of the temporary value stack.

OPCODE.RESTORE

    {
      opcode: OPCODE.RESTORE
      args: [resolved data item reference]
    }

Resets the data item cursor within the DataMaster to the data item held in args[0]. This reference will have been set by formArrayResolveRESTOREs(), resolving to the next appropriate data item after the Yabasic label given in the program.

OPCODE.LOAD

    {
      opcode: OPCODE.LOAD
      args: [slot number of variable to load from]
    }

Loads the current value from the symbolic store at the given slot number with respect to the current cursor stack state and pushes it onto the top of the temporary value stack.

OPCODE.LOADARRAY

    {
      opcode: OPCODE.LOADARRAY
      args: [slot number of array to load from, number of dimensions in locator]
    }

Loads the current value from the symbolic store at the given slot number as an array access with respect to the current cursor stack state and pushes it onto the top of the temporary value stack.
This opcode expects the topmost values on the stack to be the values of the array indices (the 'locator') necessary to access the array item. These need to be in FILO order: the rightmost array index should be the topmost item on the temporary value stack when this opcode is encountered.

OPCODE.STORE

    {
      opcode: OPCODE.STORE
      args: [slot number of variable to store to]
    }

Pops the topmost value from the temporary stack and stores the value into the symbolic store at the given slot number with respect to the current cursor stack state.

OPCODE.STOREARRAY

    {
      opcode: OPCODE.STOREARRAY
      args: [slot number of array to store to, number of dimensions in locator]
    }

Pops a value from the temporary stack and stores the value into the symbolic store at the given slot number as an array access with respect to the current cursor stack state. This opcode expects the topmost values on the stack at this point to be the values of the array indices (the 'locator') necessary to access the array item, followed by the value to store. These need to be in FILO order: the rightmost array index should be the topmost item on the temporary value stack when this opcode is encountered.

OPCODE.EVAL

    {
      opcode: OPCODE.EVAL
      args: [reference to JavaScript function evaluate*() to execute]
    }

Calls JavaScript function args[0] with the current state of the runtime and the temporary value stack. The number of arguments popped by the function or pushed as return values is determined by the function itself. Where a function can take different numbers of arguments, there are separate handlers for each case. The called function will expect the parameter values to be present on the top of the stack in FILO order.

OPCODE.CALLUSER

    {
      opcode: OPCODE.CALLUSER
      args: []
      branch: reference to opcode object to branch to
    }

Calls a user-defined Yabasic subroutine by creating and pushing a new cursor stack frame of Subroutine type starting at the branched opcode.
The management of parameters is handled by the prelude of the called subroutine. The called subroutine will expect the parameter values to be present on the top of the stack in FILO order. When the called subroutine returns, its stack frame is destroyed and execution returns to the calling frame, resuming at the .next field of this opcode.

OPCODE.GOSUB

    {
      opcode: OPCODE.GOSUB
      args: []
      branch: reference to opcode object to branch to
    }

Calls a GOSUB by creating and pushing a new cursor stack frame of Gosub type starting at the branched opcode. When the called GOSUB returns, its stack frame is destroyed and execution returns to the calling frame, resuming at the .next field of this opcode.

OPCODE.GOTO

    {
      opcode: OPCODE.GOTO
      args: []
    }

GOTO has no effect. Its function is to be a no-op opcode object which passes execution to its .next field, which, during compilation, most likely will be set to an opcode derived from a Form not in the original linear flow.

OPCODE.IFNOTGOTO

    {
      opcode: OPCODE.IFNOTGOTO
      args: []
      branch: reference to opcode object to jump to on ( [top] == 0 )
      next: reference to opcode object to jump to otherwise
    }

A conditional GOTO. Pops the topmost value from the temporary value stack. If this value is numeric zero, the opcode cursor is set to .branch, else it is set to .next. If the value is not of numeric type, this is a runtime error.

OPCODE.RETURN

    {
      opcode: OPCODE.RETURN
      args: []
    }

Destroys the topmost entry from the cursor frame stack (generated by GOSUB or CALLUSER), returning execution flow to the calling code.

OPCODE.SCOPE

    {
      opcode: OPCODE.SCOPE
      args: [slot number to apply scope string to, scope string]
    }

Applies the scoping label args[1] to slot number args[0] with respect to the current cursor stack state. The valid scoping labels are SCOPE_LOCAL and SCOPE_STATIC. (Default scope is SCOPE_GLOBAL, but you can't apply this.)
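To make the roles of .next and .branch concrete, here is a minimal sketch of a run loop covering only VALUE, POP, GOTO and IFNOTGOTO. It is not the MJS Yabasic runtime: the local OPCODE table, run() and makeBranchProgram() are hypothetical, plain JavaScript values stand in for ValueType instances, and there is no cursor frame stack.

```javascript
// Local stand-in for the opcode constants; the real table is not shown here.
const OPCODE = { VALUE: "VALUE", POP: "POP", GOTO: "GOTO", IFNOTGOTO: "IFNOTGOTO" };

// Follows .next / .branch references until the graph runs out.
function run(entry) {
  const stack = []; // temporary value stack
  let op = entry;
  while (op) {
    switch (op.opcode) {
      case OPCODE.VALUE:
        stack.push(op.args[0]);
        op = op.next;
        break;
      case OPCODE.POP:
        stack.pop();
        op = op.next;
        break;
      case OPCODE.GOTO:
        op = op.next; // a no-op whose .next was rewired at compile time
        break;
      case OPCODE.IFNOTGOTO: {
        const top = stack.pop();
        if (typeof top !== "number") {
          throw new Error("IFNOTGOTO expects a numeric value");
        }
        op = top === 0 ? op.branch : op.next;
        break;
      }
      default:
        throw new Error("Unsupported opcode in this sketch: " + op.opcode);
    }
  }
  return stack;
}

// Builds: push condition; if zero go to the "else" arm, otherwise the "then" arm.
function makeBranchProgram(condition) {
  const thenOp = { opcode: OPCODE.VALUE, args: ["then"], next: null };
  const elseOp = { opcode: OPCODE.VALUE, args: ["else"], next: null };
  const test = { opcode: OPCODE.IFNOTGOTO, args: [], next: thenOp, branch: elseOp };
  return { opcode: OPCODE.VALUE, args: [condition], next: test };
}

console.log(run(makeBranchProgram(0)), run(makeBranchProgram(1)));
```

Because every opcode carries direct object references to its successors, the loop never has to look up a label or an index at run time.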
OPCODE.DIM

    {
      opcode: OPCODE.DIM
      args: [slot number of array to re-DIM, number of dimensions]
    }

Re-DIMs array args[0] by using the topmost args[1] values on the temporary value stack as dimension sizes. Pops the values used.

Each Form in the Form graph needs to be converted into an opcode representation, and then these opcodes linked together to form an opcode graph which can then be quickly executed by the runtime.

The function formArrayProduceInitialOpcodeSeries() traverses the graph of Forms from a given entry point and translates each Form into the low-level opcode series needed to perform its function. This is executed against the main entry point of the program, and each subroutine, producing a graph of augmented Forms each containing their opcode equivalents.

The function formProduceInitialOpcodeSeries() converts a single Form into its opcode equivalent. Each Form has a set algorithm for its conversion to an opcode series. Most opcode series produced by a Form are a simple linear progression, with a single opcode representing a branch to some other Form.

FORM_NAME.LOCAL

The Form LOCAL can have multiple arguments (LOCAL a, b, c). These symbolic arguments will have been augmented with their slot number during SymbolTable.addToken(). For each of the arguments, an OPCODE.SCOPE is produced that instructs the runtime to apply the appropriate scoping label to that variable slot in the current cursor stack frame.

FORM_NAME.STATIC

The Form STATIC can have multiple arguments (STATIC a, b, c). These symbolic arguments will have been augmented with their slot number during SymbolTable.addToken(). For each of the arguments, an OPCODE.SCOPE is produced that instructs the runtime to apply the appropriate scoping label to that variable slot in the current cursor stack frame.
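The LOCAL and STATIC conversions are simple enough to sketch directly. scopeFormToOpcodes() and the form shape { name, args } are hypothetical; only the SCOPE opcode layout (slot number, then scope label) and the .slot field on argument tokens come from the text.

```javascript
// Hypothetical sketch: turns a LOCAL or STATIC Form into a linear series of
// SCOPE opcodes, one per argument, linked through .next.
function scopeFormToOpcodes(form) {
  const labelByForm = { LOCAL: "SCOPE_LOCAL", STATIC: "SCOPE_STATIC" };
  const label = labelByForm[form.name];
  if (label === undefined) {
    throw new Error("Not a scoping Form: " + form.name);
  }
  // Each argument token is assumed to already carry its .slot number.
  const ops = form.args.map((token) => ({
    opcode: "SCOPE",
    args: [token.slot, label],
    next: null,
  }));
  for (let i = 0; i + 1 < ops.length; i++) {
    ops[i].next = ops[i + 1];
  }
  return ops;
}

const localOps = scopeFormToOpcodes({
  name: "LOCAL",
  args: [{ text: "a", slot: 3 }, { text: "b", slot: 4 }, { text: "c", slot: 7 }],
});
console.log(localOps.map((op) => op.args));
```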
FORM_NAME.RESTORE After formArrayResolveRESTOREs(), RESTORE Forms will have an args array which contains a reference to a valid data item if the resolve succeeded, or an empty array if it failed. For a successful resolve, an OPCODE.RESTORE opcode is generated that executes a RESTORE to the data item referred to by args[0]. Otherwise, the following sequence is generated, which effects a runtime error in which a message is printed and the program terminates: OPCODE.VALUE ["Error: RESTORE to undefined label.\n"] OPCODE.EVAL [FORM_NAME.PRINT(1)] OPCODE.EVAL [FORM_NAME.END(0)] FORM_NAME.END_SUB The Form END_SUB is similar to a Form RETURN, but it results in a ValueTypeUnknown type value being pushed to the temporary value stack first. OPCODE.VALUE [ValueTypeUnknown] OPCODE.RETURN FORM_NAME.GOTO The Form GOTO directly maps to an opcode GOTO. At this point we directly place the target Form from the Form GOTO into the .next field of the opcode. This is resolved to an opcode target in a later stage. FORM_NAME.GOSUB The Form GOSUB directly maps to an opcode GOSUB. At this point we directly place the target Form from the Form GOSUB into the .branch field of the opcode. This is resolved to an opcode target in a later stage. In Yabasic, a GOSUB should eventually reach a RETURN to restore the previous cursor stack frame. As the RETURN keyword is shared with the return of a SUB call (the RETURN keyword with no argument returns from a GOSUB), the RETURN keyword in isolation will push an instance of ValueTypeUnknown to the temporary value stack. Therefore the GOSUB opcode will be followed by a POP opcode to remove the ValueTypeUnknown. FORM_NAME.ASSIGNMENT The function of ASSIGNMENT is to evaluate the right side of the equals sign and assign the result to the location designated by the left-hand-side expression.
The functions opcodesLeftEvaluateToken() and opcodesRightEvaluateToken() analyse the given token tree and produce opcode strategies that calculate the result of, or assign to, the expression represented by the token tree. The ASSIGNMENT Form results in an opcode sequence that performs the following actions:
1) Prepares the arguments necessary to assign to the left-hand-side expression. This includes actions such as calculating the values of the indices that appear within array locators.
2) Evaluates the value of the right-hand-side expression.
3) Assigns the value of the top of the stack into the appropriate result store as indicated by the type of the base token in the left-hand-side token tree.
For example: a(5) = b+c
First, the value of the array locator is evaluated:
    OP_000002: (to OP_000003) OPCODE_VALUE #5 ; <-- #5
Second, the value of the right-hand-side expression is evaluated:
    OP_000003: (to OP_000004) OPCODE_LOAD [ 1] ; <-- b
    OP_000004: (to OP_000005) OPCODE_LOAD [ 2] ; <-- c
    OP_000005: (to OP_000006) OPCODE_EVAL evaluatePLUS_2
Finally, the store is performed:
    OP_000006: (to OP_000007) OPCODE_STOREARRAY [ 0], 1
FORM_NAME.READ The function of READ is to read the next value from the DataMaster and place the value into the location designated by each of the left-hand-side expressions that appear in its args array. For each argument, this functions similarly to ASSIGNMENT, except the evaluation of the right-hand-side expression is replaced with a READ opcode.
1) Prepares the arguments necessary to assign to the left-hand-side expression. This includes actions such as calculating the values of the indices that appear within array locators.
2) OPCODE.READ
3) Assigns the value of the top of the stack into the appropriate result store as indicated by the type of the base token in the left-hand-side token tree.
For an argument list, this will READ into each designated location in turn.
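The three-phase ASSIGNMENT sequence described above (prepare the locator, evaluate the right-hand side, store) can be sketched as follows for a(5) = b+c. The builder helpers and the function assignmentToOpcodes() are hypothetical simplifications for this example; opcode names are plain strings here rather than the real OPCODE constants.

```javascript
// Hypothetical, simplified opcode builders; slot numbers follow the example (a = 0, b = 1, c = 2).
const value = (v) => ({ opcode: "VALUE", args: [v] });
const load = (slot) => ({ opcode: "LOAD", args: [slot] });
const evalOp = (handlerName) => ({ opcode: "EVAL", args: [handlerName] });
const storeArray = (slot, dims) => ({ opcode: "STOREARRAY", args: [slot, dims] });

// Concatenate the three phases and link each opcode to its successor.
function assignmentToOpcodes(prepareArguments, produceResult, store) {
  const series = [...prepareArguments, ...produceResult, store];
  for (let i = 0; i < series.length - 1; i++) {
    // Only fill in .next where it has not already been set.
    if (series[i].next === undefined) {
      series[i].next = series[i + 1];
    }
  }
  // The final .next is left unset here; the caller points it at whatever follows the Form.
  return series;
}

const series = assignmentToOpcodes(
  [value(5)],                                   // 1) locator for a(5)
  [load(1), load(2), evalOp("evaluatePLUS_2")], // 2) b + c
  storeArray(0, 1)                              // 3) store into a(...), one dimension
);
console.log(series.map((op) => op.opcode).join(" -> "));
// prints "VALUE -> LOAD -> LOAD -> EVAL -> STOREARRAY"
```

A READ would follow the same pattern with the middle phase replaced by a single READ opcode.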
FORM_NAME.DIM The function of DIM is to (re-)allocate the given named arrays using the values within their arguments lists. The array size designators are stored in the form's arguments array as token trees. For each argument, the token tree will have a FUNCTION_CALL at its root (the syntax for specifying an array to be DIMmed is the same as a function call invocation). The arguments of the FUNCTION_CALL will be the array size designators as right-hand-side expressions. So, for each argument of the Form DIM: The token tree is examined with opcodesLeftEvaluateToken(). Two of the resulting opcode sequences are used: First, .prepare_arguments is used, which evaluates the values of all the expression token trees present as arguments to the root node of the token tree. (For a function call, this evaluates the values of all the function parameters prior to calling the function. For a DIM, the same sequence is used to determine the values of the locators.) Second, .dim_operation is used, which produces the necessary DIM opcode to initialise the array with the correct number of dimensions using the topmost values on the temporary value stack. FORM_NAME.RETURN The Form RETURN has a corresponding opcode RETURN. However, the opcode only destroys the topmost cursor frame stack entry. Upon destroying a cursor frame stack entry, it is expected that there will always be a return value present on the top of the temporary value stack. The translation from Form to opcode produces this value. For a RETURN Form with no argument, an OPCODE.VALUE is produced which pushes an instance of ValueTypeUnknown() to the top of the temporary value stack. This covers every case, since a value-returning SUB, a non-value-returning SUB and a GOSUB can all use the same RETURN Form to return. For a RETURN Form with an argument, the .produce_result strategy of opcodesRightEvaluateToken() is used to calculate the value of the returned expression.
After either the result or the ValueTypeUnknown has been produced, OPCODE.RETURN performs the return. FORM_NAME.IF_NOT_GOTO The Form IF_NOT_GOTO directly maps to an opcode IFNOTGOTO. At this point we directly place the target Form from the Form IF_NOT_GOTO into the .branch field of the IFNOTGOTO opcode. This is resolved to an opcode target in a later stage. The generated opcode sequence is as follows: A strategy for evaluating the argument of the Form as a right-hand-side expression is produced using opcodesRightEvaluateToken(). This is followed by the IFNOTGOTO opcode. FORM_NAME.EVALUATE_AND_DISCARD The Form EVALUATE_AND_DISCARD is similar to IF_NOT_GOTO and ASSIGNMENT in that a right-hand-side expression is evaluated. However, after evaluation the result of the calculation is immediately destroyed. Therefore, after the evaluation sequence is generated by opcodesRightEvaluateToken(), an OPCODE.POP is placed. FORM_NAME.PRINT The Form PRINT is a special Yabasic language feature due to the different ways it can be used. At this point, the PRINT Form will only contain an argument list of things that should be printed to the console. These need to be flattened to repeated calls to the evaluation_handler for PRINT. For each argument to the Form: a strategy for evaluating the value of the token tree is produced using opcodesRightEvaluateToken(), and this is followed by an OPCODE.EVAL calling the PRINT handler for one argument. FORM_NAME.INPUT The Form INPUT is a special Yabasic language feature due to... God knows. The arguments to the Form INPUT are stored in the arguments array; this includes the optional string literal prompt text. The Form INPUT is first classified by whether or not the first argument is a STRING_LITERAL token. If so, then this should be printed to the console. Otherwise a "?" should be printed. For the first case, a sequence for evaluating the value of the first token is generated, otherwise the "?"
is pushed onto the temporary value stack by OPCODE.VALUE. For each of the remaining arguments to Form INPUT, a sequence similar to READ is generated. However, the call to the evaluation_handler for READ is replaced with a call to the evaluation_handler for either INPUT$ or INPUT. The type of INPUT evaluation_handler required is inferred from the identifier used at the root of the assignee: if the assignment is to 'a', then a numeric value is required and the handler for INPUT is called. If the assignment is to 'b$', then a string value is required and the handler for INPUT$ is called. 1) Prepares the arguments necessary to assign to the left-hand-side expression. this includes actions such as calculating the values of the indices that appear within array locators. 2) OPCODE.EVAL [INPUT or INPUT$] 3) Assigns the value of the top of the stack into the appropriate result store as indicated by the type of the base token in the left-hand-side token tree. For all remaining Forms, no special cases are required. The following generic transformation is used for all other Forms: For each argument to the Form, generate an evaluation sequence using opcodesRightEvaluateToken(). This will result in a sequence that evaluates all arguments to the Form in turn: the temporary value stack will contain each of these results in rightmost-first order. For each Yabasic Form, there is a corresponding evaluation_handler: a JavaScript function which performs the necessary operation within the context of the runtime. References to these are stored in the evaluation_handler collection. When they are added, a reference to the function is stored, together with the number of arguments it expects. Each function can be accessed by a key which has the same name as the original Form. (So a function that handles SETRGB c,r,g,b will be referenced by 'SETRGB' and have four arguments.) 
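A minimal sketch of such a handler collection is shown below. The Map-based registry, the helper names addHandler() and handlerFor(), and the stand-in SETRGB handler are all assumptions for illustration; the real collection's shape is not shown in this document, and the real stack holds ValueType instances rather than bare numbers.

```javascript
// Hypothetical registry keyed by Form name.
const evaluationHandlers = new Map();

// Store a reference to the function together with the number of arguments it expects.
function addHandler(name, argCount, fn) {
  evaluationHandlers.set(name, { fn, argCount });
}

// Look up the handler for a Form; a mismatch models a compile-time error.
function handlerFor(name, suppliedArgCount) {
  const entry = evaluationHandlers.get(name);
  if (!entry || entry.argCount !== suppliedArgCount) {
    throw new Error(`No handler for ${name} with ${suppliedArgCount} argument(s)`);
  }
  return entry;
}

// Stand-in SETRGB handler: the stack holds c, r, g, b with b on top, so b is popped first.
addHandler("SETRGB", 4, (context, stack) => {
  const b = stack.pop();
  const g = stack.pop();
  const r = stack.pop();
  const c = stack.pop();
  context.colours[c] = [r, g, b];
});

// SETRGB 1, 255, 128, 0: arguments are evaluated left to right, then the handler runs.
const context = { colours: [] };
const stack = [1, 255, 128, 0];
const evalOpcode = { opcode: "EVAL", args: [handlerFor("SETRGB", 4).fn] };
evalOpcode.args[0](context, stack);
console.log(context.colours[1]); // [ 255, 128, 0 ]
```

Checking the argument count at lookup time is what lets a wrongly-sized call be rejected before any opcode is executed.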
An OPCODE.EVAL opcode is generated, with the args array containing the evaluation_handler corresponding to this Form's name and argument count as retrieved from the evaluation_handler collection. After all the opcodes for this Form have been generated, they need to be linked together into a linear sequence. Each opcode is considered in turn. If the opcode's .next field is not already set (GOTO opcodes will already have a target), then the .next of the opcode is set to the next opcode in the sequence. The .next of the final opcode is set to a reference to the .next of the Form being transformed. The transformation is now complete: the Form now contains a reference to an array of opcodes in its .opcode_series field which implement its function. The entry point of the program is now the first opcode within the opcode series of the entry point Form. The function opcodeSeriesLinkThroughForms() links together the opcode series within the graph of Forms by replacing any target reference to a Form within .next or .branch fields with the corresponding initial opcode within the target Form. This produces a directed graph of opcodes, with a known entry point, that holds the full semantic content of the entered program. This graph is the final structure of the program. The compilation is complete! The MJS Yabasic runtime: With the ProgramInfo complete, MJS Yabasic instantiates an ExecutionContext to hold the state of the runtime for the currently running program. Execution of MJS Yabasic programs is represented and controlled by an instance of ExecutionContext. ExecutionContext holds the state of a self-contained Yabasic runtime: variable contents, program counter, etc. The ExecutionContext holds the following in its fields: A reference to the MJSYabasicContext representing the MJS Yabasic instance on the page (the environment). A flag indicating whether the current program has been paused pending user keyboard input due to an execution of INKEY with timeout.
(Need to describe how INKEY works, and input in general.) An instance of InputCharsBuffer which provides the parsing capability for INPUT. (Need to describe how INPUT works, the parser and the slicer.) The program start date. This is used to calculate the time elapsed since program start for TIME$. The ProgramInfo of the currently running program. A reference to the SymbolTable from within the ProgramInfo. A SymbolStore holding the values of variables in the base global cursor stack state. An array mapping subroutine handles to SymbolStores, holding the values of variables with STATIC scope associated with each subroutine. The cursor frame stack as an array of CursorFrameContext instances. The temporary value stack as an instance of ValueStack. Other parts of the runtime are considered to be part of the MJS Yabasic interface (the environment) and aren't recreated whenever a program execution begins. This includes an amount of persistent state relating to the Yabasic graphics model: Yabasic has two graphics buffers of size 640x512 pixels. These are labelled 0 and 1. There are two designated roles: the Draw buffer and the Disp (display) buffer. The Draw and Disp buffer roles can be assigned to graphics buffer 0 or 1; they can be the same buffer or different. The contents of the Disp buffer are shown on the screen directly. All draw commands affect the contents of the Draw buffer. There is no separate text mode as in PS2 Yabasic; text output is appended to the contents of the bottom console. At the beginning of the program, both buffers are completely black and both the Disp and Draw buffers are set to buffer 0. There are four colours, 0, 1, 2 and 3. These are 24-bit RGB colours with 8 bits per channel. At the beginning of the program, these are set to (0,0,0), (255,255,255), (255,0,0) and (0,255,0) (black, white, red and green). Most drawing commands are performed using the value of colour 0. Commands containing CLEAR are performed using the value of colour 1.
GTRIANGLEs are drawn using colours 1, 2 and 3 for the first, second and third vertices. Yabasic has the concept of 'window origin', set using the WINDOW ORIGIN commands. The window origin dictates how the coordinate space maps onto the display buffer. The window origin is specified as a two character long string, with the first character indicating the origin for x coordinates, and the second character indicating the origin for y coordinates. The first character may be 'l' for left, 'c' for centre or 'r' for right; the second character may be 't' for top, 'c' for centre or 'b' for bottom. Any combination may be used, but the axes cannot be swapped. The default is 'lt', which specifies a coordinate space with 0, 0 in the upper left corner. If 'c' is used, for example in 'cc', then the origin is moved to half way across the display buffer in that direction: 'cc' specifies that objects drawn at 0, 0 be displayed at 320, 256 on the display buffer. If the 'r' or 'b' values are used, then the coordinate space is flipped: 'rb' indicates that 0, 0 should be placed in the lower right corner of the display buffer, with increasing values moving left and up. This will flip all primitives drawn to the Draw buffer. Note that this doesn't flip the rendering of text, but it does manipulate the start position. Similarly, there is a two character string named the 'text_align_poke' which affects how the given coordinate for TEXT commands maps to the display buffer. The characters available are the same as those for window origin. The string controls which position along the 'text rectangle' the given coordinate maps to. The default value is 'lb', which indicates that the coordinates will be at the 'left bottom' of the drawn string: the string will be drawn to the right and above the given coordinates. If there are multiple lines, the topmost line will be moved up so that the lowermost line meets the given coordinate.
If the value is changed to 'rt', then the coordinates will be at the 'right top' of the drawn string: the string will be drawn to the left and below the given coordinates. When 'cc' is used, the drawn string will only be offset by complete character cells. There is also a drawing 'pen' which holds the location of the last LINE drawing operation. This is used as the starting coordinate for subsequent LINE TO commands, and is set whenever a LINE or LINE TO command is executed. It begins in a 'lifted' state, represented by there being no coordinates stored. When its position is set, the pen is 'down' and the coordinate values exist. A command which requires that the pen be 'down' in order to draw will have no visible effect while the pen is lifted, but will result in the pen being 'down'. bug that line_2 doesn't draw anything?!?!?! (How the program execution works.) Execution of a program begins at the first opcode in the opcode graph. Normal program flow is to load the program cursor with the opcode object referenced in the .next field. Non-linear control flow uses the .branch field to either change control flow within the current cursor stack frame, or create a new cursor stack frame at the given destination depending on the opcode. MJS Yabasic program execution proceeds by interpreting opcodes from the graph one at a time, the program cursor navigating the graph by following the .next field of each opcode, or the .branch field as directed by GOSUB, CALLUSER and IFNOTGOTO calls. As the browser model of JavaScript requires script execution to terminate to update the screen and handle input, a callback system is set up to trigger incremental execution of the Yabasic program in the form of an 'advance' of the ExecutionContext. Each 'advance' executes a small number of opcodes, before returning to the calling JavaScript environment. An 'advance' can be terminated for one of a number of reasons: * A maximum number of executed opcodes is reached.
This is to prevent the browser from becoming unresponsive while program execution is caught inside long-running loops. * The environment detects that Select and Start on the simulated PlayStation 2 pad are being pressed simultaneously, or the Escape key on the keyboard is pressed. This interrupts execution by explicitly executing an END opcode to end execution 'normally'. * An executed opcode causes the program to pause or terminate normally. * The drawing condition. In PS2 Yabasic, if the current Draw buffer is the same as the current Disp buffer, drawing commands immediately affect the graphics display. This is simulated in MJS Yabasic by maintaining a counter of drawn primitives since the last visual update. When this counter exceeds a predefined limit, no further opcodes are executed during this 'advance' and JavaScript execution flow returns from the callback, allowing the screen to update. This is a necessary but inexact simulation of the behaviour of PS2 Yabasic. If necessary, the limit can be decreased to 1 (to force the screen to update after every draw onto the Disp buffer) by checking the Single Buffer Compatibility Mode checkbox. * The INKEY$ handler is called to retrieve the value of the currently pressed key, no key is pressed and a timeout is set. This requires the runtime to yield completely to the calling environment in order to detect further keypresses. When this occurs, a blank string entry is pushed onto the top of the temporary value stack: this holds the value that would be returned from the INKEY$ call. During the construction of MJSYabasicContext, an instance of InkeyHandler is generated to respond to all keydown events: when INKEY$ is called with a timeout, the field .inkey_halt_on is set within the ExecutionContext, which allows the InkeyHandler to recognise keypresses occurring during an INKEY$-initiated timeout. When this happens, the INKEY$ string corresponding to the pressed key is determined.
If the key is a valid INKEY$ key, the handler directly manipulates the top entry of the temporary value stack, resets the .inkey_halt_on flag, clears the timeout callback that would have occurred if no key were pressed, and manually triggers an 'advance'. * An exception is thrown by opcode execution. Runtime errors in opcode execution manifest as JavaScript exceptions. These aren't handled internally by the simulator: they will rise to the 'advance' routine, be caught there, and cause the program to stop. When any condition causes the conclusion of an 'advance' without terminating the program fully, the 'advance' concludes by registering a new timeout to trigger the next frame. The duration of the timeout is calculated by subtracting the time elapsed as a result of the current 'advance' from the required interval. For commands such as WAIT, the interval is explicitly provided in the form of the argument. When the drawing condition is triggered, the synchronisation to a PAL vertical blank is simulated and the interval is set to 20ms (50Hz). For a well-designed program that uses the Disp and Draw buffers correctly, this ensures a smooth frame rate. If the opcode limit is triggered, the next 'advance' is scheduled instantaneously (yet asynchronously to allow for a display and UI update). In PS2 Yabasic the INPUT command uses a character buffer (for no other reason than to make things more complicated apparently). An INPUT command consists of an optional prompt followed by a series of variable left-hand-side expressions. When an INPUT command is received, the object is to place a value of the correct type (respecting the name of the identifier used) into each of the left-hand-side expressions. The receiving and parsing of INPUT values is performed by an instance of InputCharsBuffer. InputCharsBuffer maintains a stream of characters input by the user in response to an INPUT command. The InputCharsBuffer can be requested to supply either a string or numeric value.
In either case, it inspects the buffer, attempting to parse the largest number of characters from the front of the buffer as an instance of the required type. If the buffer is empty at the time of the call, a prompt is raised to ask the user for text input (the prompt given to INPUT is displayed in the text console as it would be for PS2 Yabasic). For strings, the first quoted string is taken (quotes are stripped) if one is available, else the series of characters up until the first space is returned. If there are no characters, the empty string is returned. For numbers, the greatest number of characters that represent a valid number is taken. When a valid series of characters has been parsed, they are removed from the buffer, and it is trimmed. Subsequent requests for values will use the remaining characters in the input buffer before requesting more input from the user. This means that if the user enters a character sequence like "1 2 3 4 5", and the program then requests five numeric values with INPUT, no further prompts will be displayed, regardless of how the value requests are spread across different commands. There is no way for the programmer to ensure that the input buffer is blank before requesting new data: the user's previously entered unused characters will be used first. describe how cursorframecontext works The MJS Yabasic runtime implements variable access using class SymbolStore. SymbolStore stores the values of all the variables (specifically in an array mapping variable slot integers to instances of ValueType) and array contents within a given scope. ExecutionContext contains a reference to a SymbolStore for variables within the global scope. ExecutionContext also contains an array holding a SymbolStore for each Subroutine handle within the program: these hold the values of the variables affected by the STATIC scope label within the subroutine. scoping labels live in the symbol store and not the cursor frame context?
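The buffer-draining behaviour of INPUT described above can be sketched with a simplified, synchronous stand-in for InputCharsBuffer. The class name, method names and askUser callback are hypothetical; the number grammar, the assumption that a quoted string must sit at the front of the buffer, and the handling of a non-numeric buffer are guesses made only for this example.

```javascript
// A simplified stand-in for InputCharsBuffer. The real class prompts through the browser;
// here askUser is a hypothetical callback that returns the user's text synchronously.
class InputBufferSketch {
  constructor(askUser) {
    this.buffer = "";
    this.askUser = askUser;
  }

  // Ask the user for more text only when nothing is left over from earlier input.
  refillIfEmpty() {
    if (this.buffer === "") {
      this.buffer = this.askUser().trim();
    }
  }

  // Remove the first `length` characters and trim what remains.
  consume(length) {
    this.buffer = this.buffer.slice(length).trim();
  }

  nextString() {
    this.refillIfEmpty();
    const quoted = /^"([^"]*)"/.exec(this.buffer);
    if (quoted) {
      this.consume(quoted[0].length);
      return quoted[1]; // quotes stripped
    }
    const word = /^\S*/.exec(this.buffer)[0]; // up to the first whitespace
    this.consume(word.length);
    return word;
  }

  nextNumber() {
    this.refillIfEmpty();
    // Longest prefix that reads as a decimal number (assumed grammar).
    const match = /^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?/.exec(this.buffer);
    if (!match) {
      return 0; // assumption: the real behaviour for non-numeric input is not described here
    }
    this.consume(match[0].length);
    return Number(match[0]);
  }
}

// One line of user input satisfies five numeric requests with a single prompt.
let prompts = 0;
const input = new InputBufferSketch(() => {
  prompts++;
  return "1 2 3 4 5";
});
const values = [];
for (let i = 0; i < 5; i++) {
  values.push(input.nextNumber());
}
console.log(values, prompts); // [ 1, 2, 3, 4, 5 ] 1
```

Because leftover characters are always consumed first, a sixth request in this example would trigger a second prompt, while any unused text from an earlier line would silently feed later INPUT commands.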
Execution The ExecutionContext maintains execution program counter state in a stack of CursorFrameContext instances. The CursorFrameContext class represents the program execution information available from the perspective of the opcode cursor within its level of execution. New instances of CursorFrameContext are generated when execution branches such that it may return to this point: that is, as a result of an OPCODE.GOSUB or OPCODE.CALLUSER. Each CursorFrameContext holds a reference to the next opcode to be executed in its .program_counter field, and references to SymbolStore instances for the STATIC, LOCAL and (implicit) GLOBAL scopes. describe how the stack works and the purpose of it, shadowing earlier things and using the topmost designated symbolstores and stuff like that When program execution begins, a CursorFrameContext is constructed with GOSUB context. Any attempt to access STATIC or LOCAL variables in this outermost CursorFrameContext (or, more correctly, to designate variables as either LOCAL or STATIC) is a runtime error. A CursorFrameContext can be constructed either within the context of a GOSUB command, which manipulates the program flow without affecting the interpretation of any scoping labels; or a CALLUSER command, which manipulates the program flow and opens a new context stack frame within the context of a subroutine call. The two cases are differentiated by the last argument of the CursorFrameContext constructor, which takes a handle to Subroutine if this is CALLUSER-context invocation or null if this is a GOSUB-context invocation. The GOSUB-context constructor creates a new CursorFrameContext starting execution at the designated opcode. It assumes the STATIC and LOCAL SymbolStore references from its parent (if possible). This means that if a called SUB invokes a GOSUB to change the program cursor position, it will continue to maintain the STATIC and LOCAL assignments it had previously. 
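The difference between the two construction contexts can be sketched as below. The class names, field names other than .program_counter, and the constructor signature are assumptions for illustration; the CALLUSER branch follows the behaviour described in the text that follows (the subroutine's STATIC store plus a fresh LOCAL store).

```javascript
// Hypothetical, stripped-down stand-ins for SymbolStore and CursorFrameContext.
class SymbolStoreSketch {
  constructor() {
    this.variable_store = [];
  }
}

class CursorFrameSketch {
  // `subroutine` is a Subroutine handle for CALLUSER context, or null for GOSUB context.
  constructor(startOpcode, parent, staticStores, subroutine) {
    this.program_counter = startOpcode;
    if (subroutine === null) {
      // GOSUB context: keep whatever STATIC and LOCAL stores the parent had, if any.
      this.static_store = parent ? parent.static_store : null;
      this.local_store = parent ? parent.local_store : null;
    } else {
      // CALLUSER context: STATIC store belongs to the subroutine, LOCAL store is fresh.
      this.static_store = staticStores[subroutine];
      this.local_store = new SymbolStoreSketch();
    }
  }
}

const staticStores = [new SymbolStoreSketch()]; // one store per subroutine handle
const root = new CursorFrameSketch("main", null, staticStores, null); // outermost frame
const call = new CursorFrameSketch("sub0", root, staticStores, 0);    // SUB call
const gosub = new CursorFrameSketch("label", call, staticStores, null); // GOSUB inside the SUB
console.log(gosub.local_store === call.local_store); // true
```

The outermost frame has no LOCAL or STATIC store at all in this sketch, which matches the rule that designating such variables at the top level is a runtime error.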
The CALLUSER-context constructor creates a new CursorFrameContext starting execution at the designated opcode. It assumes the STATIC SymbolStore reference appropriate to the designated Subroutine, and creates a new SymbolStore for LOCAL assignments. This means that all STATIC labelled variable accesses will now refer to variables associated with the Subroutine, and all LOCAL labelled variable accesses will refer to variables that exist only for the duration of the execution of the subroutine. When an OPCODE.RETURN is executed, the topmost entry in the cursor frame stack is discarded. This has the effect of returning program execution to the next opcode that follows the GOSUB or CALLUSER opcode that caused execution to branch, destroying all the variables that were designated LOCAL if this was the CALLUSER entry that caused them to be generated: a GOSUB within a CALLUSER manifests as two consecutive entries in the cursor frame stack, both referring to the same SymbolStore for their LOCAL variables. The SymbolStore is only discarded when the covering GOSUBs are destroyed followed by the CALLUSER, returning execution from the user's Yabasic SUB. Scalar variable retrieval is performed by the OPCODE.LOAD opcode. This opcode contains the variable slot number to load in args[0], and returns with the value of this variable on the top of the temporary value stack. The opcode interpreter is a wrapper around CursorFrameContext.retrieveVariableScalar(). The method retrieveVariableScalar works by first determining the correct SymbolStore to interrogate for the variable using CursorFrameContext.determineSymbolStoreForScalar(). If a user SUB has been called at any point in the past, .subroutine_local_symbol_store will be set within the current CursorFrameContext. This symbol store contains a mapping for all variable slots to scoping labels: i.e. a variable's scoping label is determined by the scoping label applied to it within the context of the most recently called user SUB.
If a variable does not have a scoping label within the context of the most recently called user SUB, the variable is global, and therefore uses the global scope symbol store. This means that a variable declared local within a SUB loses its label temporarily if another variable of the same name is referred to within the sub-SUB: the use of the variable 'x' within a SUB using 'x' as a temporary variable for a loop is distinct from the variable 'x' declared as LOCAL (explicitly or implicitly from the SUB parameter list) within any of the calling SUBs preceding it in the call stack. Having determined the correct instance of SymbolStore to manipulate to retrieve the value of the variable, .retrieveVariableScalar() inspects it to determine if the variable exists within the SymbolStore's .variable_store field; this is simply an array mapping variable slots to instances of ValueType. .variable_store is not automatically initialised to contain instances of ValueType: in fact, the absence of a value ('undefined' JS retrieval) reflects the lack of an assignment to this variable in the given context. If the storage object exists, the .type and .value of the stored ValueType can be copied to the prepared ValueType on the top of the value stack (these are JS primitive fields (well, the .value might be a string but there's nothing I can do about that), so should be a LOT faster than assigning the ValueType itself). Otherwise, the 'value of the missing variable' needs to be implicitly computed at run-time. If the variable does not share a name with one of the magic variable-like keywords, then a default value and type can be assigned. For strings, this is the string value type with .value = "". For numerics, this is .type = ValueType.VALUE_TYPE_NUMERIC, .value = 0.
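The retrieval path described above can be sketched as follows. The frame layout, the labels map, the function names and both type constants are hypothetical stand-ins; in particular the sketch assumes that an unassigned string variable defaults to a string-typed empty value, and it leaves out the magic variables entirely.

```javascript
// Hypothetical stand-ins for the ValueType constants.
const VALUE_TYPE_NUMERIC = "numeric";
const VALUE_TYPE_STRING = "string";

// Pick the store for a slot from its scoping label; an unlabelled slot is global.
function determineStore(frame, slot) {
  const label = frame.labels[slot];
  if (label === "SCOPE_LOCAL") return frame.local_store;
  if (label === "SCOPE_STATIC") return frame.static_store;
  return frame.global_store;
}

// Copy the stored primitives into `target`, or supply the default for an unassigned variable.
function retrieveScalar(frame, slot, isStringName, target) {
  const stored = determineStore(frame, slot)[slot];
  if (stored !== undefined) {
    target.type = stored.type;
    target.value = stored.value;
  } else if (isStringName) {
    target.type = VALUE_TYPE_STRING; // assumed default for names ending in $
    target.value = "";
  } else {
    target.type = VALUE_TYPE_NUMERIC;
    target.value = 0;
  }
  return target;
}

// Stores are plain arrays indexed by slot here, standing in for .variable_store.
const frame = {
  labels: { 3: "SCOPE_LOCAL" },
  local_store: [],
  static_store: [],
  global_store: [],
};
frame.global_store[7] = { type: VALUE_TYPE_NUMERIC, value: 42 };

console.log(retrieveScalar(frame, 7, false, {}).value); // 42
console.log(retrieveScalar(frame, 3, true, {}).value === ""); // true
```

Copying .type and .value into an existing target object, rather than pushing the stored object itself, mirrors the copy-the-primitives approach the text describes.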
There are several magic variables in the Yabasic language that act as a cross between variables and keywords: they can be evaluated and they will automatically contain a value (which may possibly change over time autonomously). These include DATE$, TIME$, PI and EULER. When a value is assigned to one of these variables, it loses its magic and begins to act as an ordinary variable. Until then, when the value of one of these variables is inspected, the magic value of the variable is returned. Scalar variable assignment is performed by the OPCODE.STORE opcode. This opcode contains the variable slot number to store in args[0], and gathers the value to store by popping the top of the temporary value stack. The opcode interpreter is a wrapper around CursorFrameContext.storeVariableScalar(). This method is the dual of CursorFrameContext.retrieveVariableScalar() and works in much the same way. Once the correct SymbolStore for the variable access has been determined, the ValueType field values popped from the top of the temporary value stack are stored into the appropriate storage object in the SymbolStore (one may be created for this purpose if one does not already exist). The act of assigning to a magic variable creates the storage location for its variable slot, preventing it from acting magic when retrieved in this context in the future. Array variable access works by a similar mechanism to scalar variable access, with the extra step of determining the array locator. In MJS Yabasic (unlike proper Yabasic), arrays are defined to always be in GLOBAL scope: determineSymbolStoreForArray() always returns a reference to the global scope SymbolStore. Before an array can be used, its dimensions must first be established with an OPCODE.DIM invocation. The function of OPCODE.DIM is to initialise (or re-initialise) the array at a given variable slot with the dimensions supplied as arguments. 
OPCODE.DIM stores the variable slot to be used in args[0] and the number of dimensions of the array in args[1]. When an array is DIMmed with OPCODE.DIM, SymbolStore.DIMArray() is called against the relevant SymbolStore (this is always the global scope in MJS Yabasic). This method takes as arguments the variable slot of the array to be DIMmed, the number of dimensions, a reference to the stack and a reference to the SymbolStore to use. DIMArray creates a new entry object in the array_store of the SymbolStore passed to it. The entry object is a basic JavaScript object containing the dimensions of the DIMmed array as an array of integer values in .dimensions and the values contained in the array as a one-dimensional integer-keyed array in .values. Subsequent calls against the same variable slot will replace the contents of the entry object, obliterating the previous Yabasic array and replacing it with a new, freshly-DIMmed one. For array variable access: After determining the SymbolStore, the correct storage object to retrieve the values from or assign the values into must be determined by SymbolStore.locateVariableArray(). This method takes the variable slot from the array name, the number of dimensions, and references to the temporary value stack and SymbolTable. The tuple of values that references a slot within a multi-dimensional array is referred to as the 'locator'. The contents of arrays in MJS Yabasic are stored one-dimensionally: the locateVariableArray() method must collapse the multi-dimensional locator into a one-dimensional index number to be used as an index to entry_object.values[]. The locator is converted into a one-dimensional index number by considering the topmost 'dimensions' values from the temporary value stack, multiplying by the place-value of the appropriate axis and summing them. For example, an array that is dimmed by DIM a(2,3) will have two dimensions, one of size 3 and one of size 4. 
The sizes are one larger than the DIM arguments because Yabasic arrays extend from zero to the given number inclusive. We consider the twelve elements of this array to begin at (0,0), then (0,1), (0,2), (0,3), (1,0), (1,1)... all the way to (2,2), (2,3). The rightmost number is the least significant digit, so each increment of it increases the one-dimensional index number by one. The leftmost number is the most significant digit; its place value is the product of the sizes of all the axes to its right. Each increment of the leftmost locator value in an array of DIM a(2,3) therefore increases the index by four (the product of the sizes of the dimensions to the right of this place). Given that the array locator values are present on the temporary value stack in rightmost-first order (since they are evaluated left-to-right), the least significant contribution to the index is calculated first, and the place value accumulates further multiplicands as the calculation moves left, popping values from the stack. The final one-dimensional index number is the sum of the contributions from all axes.

With the index number established, the .values array for this entry can be inspected. If a ValueType storage location exists at this position, a reference to it is returned; otherwise one is created at this position and a reference to it is returned. Assignment then works as per scalar variables.

The EVAL opcode is the sole interface between the simulated Yabasic environment and the JavaScript environment. All drawing operations occur through this opcode. The EVAL opcode has the format: {args: [Reference to JavaScript function]}. The handler is called as a function with arguments of this (the ExecutionContext running the opcode) and the temporary value stack. Each handler consumes a fixed number of arguments from the temporary value stack and possibly pushes a result. During compilation, the correct form of the handler is selected from the available combinations.
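A minimal sketch of how an EVAL opcode and its handler might fit together is given here. The HANDLERS table, compileCommand(), runEval(), the colours field on the context and the {type, value} stand-in for ValueType are all hypothetical names introduced for illustration; only the opcode format {args: [handler]} and the handler arguments (the execution context and the temporary value stack) follow the description above.

```javascript
// Hypothetical sketch of EVAL handlers chosen by argument count at compile time.

const HANDLERS = {
  // command name -> argument count -> handler
  SETRGB: {
    4: (context, stack) => {
      // Arguments were pushed left-to-right, so they are popped right-to-left.
      const b = stack.pop();
      const g = stack.pop();
      const r = stack.pop();
      const c = stack.pop();
      for (const v of [c, r, g, b]) {
        if (v.type !== "number") throw new TypeError("SETRGB expects numeric arguments");
      }
      context.colours[c.value] = [r.value, g.value, b.value]; // assumed side effect
    },
  },
};

function compileCommand(name, argumentCount) {
  const forms = HANDLERS[name];
  const handler = forms && forms[argumentCount];
  if (!handler) {
    throw new SyntaxError(`${name} does not accept ${argumentCount} argument(s)`);
  }
  return { opcode: "EVAL", args: [handler] };
}

function runEval(instruction, context, stack) {
  instruction.args[0](context, stack); // the handler consumes its own arguments
}

const num = (value) => ({ type: "number", value });
const context = { colours: {} };
const stack = [num(1), num(255), num(128), num(0)]; // SETRGB 1,255,128,0
runEval(compileCommand("SETRGB", 4), context, stack);
```

In this sketch the lookup by argument count happens inside compileCommand(), before any opcode is executed.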
Selecting the handler at compile time rejects Yabasic command calls with the incorrect number of arguments before the program runs. Temporary value stack underrun should be impossible during normal operation, as the compilation process for a command such as SETRGB c,r,g,b first evaluates all four arguments in left-to-right order, leaving the values on the temporary stack ready to be read in right-to-left order. An underrun indicates a fault in the compiler.

Each JavaScript handler performs type checking on the ValueType instances retrieved from the stack, for example to catch an attempt to pass a string type to a wholly numeric command such as SETRGB. The check is made when the handler runs, because MJS Yabasic does not perform static type checking, and in fact allows the storage of string types in unsuffixed variables and numeric types in dollar-suffixed variables. Once the ValueTypes have been retrieved, the action of the handler is performed by a JavaScript function. (Interface through the .environment. ?)

The graphics screen of MJS Yabasic is produced using a CANVAS element, generated within the MJSYabasicContext constructor and attached to the main element of the MJS Yabasic instance. Colours are handled internally as 32-bit values that can be written to a 32-bit graphics buffer. The endianness of the host system is determined by inspecting the byte order of a four-byte ArrayBuffer, using an instance of Uint32Array as a view to insert a four-byte integer value and an instance of Uint8ClampedArray to inspect the byte order. The two graphics buffers are instances of ImageData produced by calling createImageData on the 2D graphics context of the CANVAS element. Each buffer has a Uint8ClampedArray and Uint32Array view created to manipulate its content. Colour values are written into the ImageData object through the Uint32Array, which allows complete 32-bit colour values to be written directly with a single array access.
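The endianness probe and the 32-bit colour write can be sketched with standard typed arrays alone. The function names isLittleEndian() and packColour() are illustrative and not taken from the MJS Yabasic source, and a plain ArrayBuffer stands in for the buffer behind ImageData.data so that the sketch runs outside a browser.

```javascript
// Sketch using only standard typed arrays; function names are assumed.

function isLittleEndian() {
  const probe = new ArrayBuffer(4);
  new Uint32Array(probe)[0] = 0x0a0b0c0d;
  // On a little-endian host the least significant byte is stored first.
  return new Uint8ClampedArray(probe)[0] === 0x0d;
}

// ImageData keeps its bytes in R, G, B, A order, so the 32-bit word has to be
// arranged to match the byte order of the host.
function packColour(r, g, b, a, littleEndian) {
  const word = littleEndian
    ? (a << 24) | (b << 16) | (g << 8) | r
    : (r << 24) | (g << 16) | (b << 8) | a;
  return word >>> 0; // keep the value unsigned
}

// A plain buffer stands in for ImageData.data here (2 x 1 pixels).
const pixels = new ArrayBuffer(2 * 4);
const bytes = new Uint8ClampedArray(pixels); // byte view, as ImageData exposes it
const words = new Uint32Array(pixels);       // one element per pixel

// Write the second pixel with a single array access.
words[1] = packColour(255, 128, 0, 255, isLittleEndian());
```

After the write, bytes 4 to 7 hold 255, 128, 0, 255 regardless of host byte order. With a real ImageData object, the Uint32Array view would be created over the buffer of its data array.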
To show the contents of the ImageData on screen, putImageData is called against the 2D graphics context of the CANVAS element to cause the ImageData of the required buffer to be written to the active display. (Drawing primitives?)