== Design ==
{{Main|MUMPS syntax}}
{{how-to|section|date=February 2022}}

=== Overview ===

MUMPS is a language designed for building database applications. Secondary language features were included to help programmers write applications that use minimal computing resources. The original implementations were [[interpreter (computing)|interpreted]], though modern implementations may be fully or partially [[compiler|compiled]]. Individual "programs" run in memory [[memory management (operating systems)#Partitioned allocation|"partitions"]]. Early MUMPS memory partitions were limited to 2048 bytes, so aggressive abbreviation greatly aided multi-programming on severely resource-limited hardware: more than one MUMPS job could fit into the very small memories available at the time. Support for multi-user systems was another design goal, reflected in the word "'''M'''ulti-'''P'''rogramming" in the acronym. Even the earliest machines running MUMPS supported multiple jobs running at the same time. With the change from mini-computers to micro-computers a few years later, even a "single user PC" with a single 8-bit CPU and 16K or 64K of memory could support multiple users, who could connect to it from (non-[[GUI|graphical]]) [[computer terminal|video display terminals]].

Because memory was originally tight, the language design for MUMPS valued very terse code. Every MUMPS command or function name can be abbreviated to between one and three letters, e.g. {{mono|Quit}} (exit program) as {{mono|Q}}, {{mono|$P}} = {{mono|$Piece}} function, {{mono|R}} = {{mono|Read}} command, {{mono|$TR}} = {{mono|$Translate}} function. Spaces and end-of-line markers are significant in MUMPS because line scope promoted the same terse language design. Thus, a single line of program code could express, with few characters, an idea for which other programming languages could require 5 to 10 times as many characters. Abbreviation was a common feature of languages designed in this period (e.g., [[FOCAL-69]], early BASICs such as [[Tiny BASIC]], etc.). An unfortunate side effect of this, coupled with the early need to write minimalist code, was that MUMPS programmers routinely left code uncommented and abbreviated it heavily. As a result, even an expert MUMPS programmer could not simply skim a page of code to see its function but had to analyze it line by line.

Database interaction is transparently built into the language. The MUMPS language provides a [[hierarchical database model|hierarchical database]] made up of [[persistence (computer science)|persistent]] [[sparse array]]s, which is implicitly "opened" for every MUMPS application. All variable names prefixed with the caret character ({{code|^}}) use permanent (instead of RAM) storage, maintain their values after the application exits, and are visible to (and modifiable by) other running applications. Variables using this shared and permanent storage are called ''Globals'' in MUMPS, because they are "globally available" to all jobs on the system. The more recent and more common use of the name "global variables" in other languages denotes a more limited scope: there, [[scope (computer science)|unscoped variables]] are "globally" available to any programs running in the same process, but are not shared among multiple processes. The MUMPS storage mode (i.e. globals stored as persistent sparse arrays) gives the MUMPS database the characteristics of a [[document-oriented database]].<ref>{{cite web |url=http://gradvs1.mgateway.com/download/extreme1.pdf |title=Extreme Database programming with MUMPS Globals |publisher=Gradvs1.mgateway.com |access-date=2013-08-13}}</ref>

All variable names that are not prefixed with the caret character ({{code|^}}) are temporary and private. Like global variables, they also have a hierarchical storage model, but are only "locally available" to a single job, and are therefore called "locals". Both "globals" and "locals" can have child nodes (called ''subscripts'' in MUMPS terminology). Subscripts are not limited to numerals; any [[ASCII]] character or group of characters can be a subscript identifier. While this is not uncommon for modern languages such as Perl or JavaScript, it was a highly unusual feature in the late 1970s. This capability was not universally implemented in MUMPS systems before the 1984 ANSI standard, as only canonically numeric subscripts were required by the standard to be allowed.<ref>{{cite web |url=http://71.174.62.16/Demo/AnnoStd?Frame=Main&Page=a202005&Edition=1977 |title=The Annotated M[UMPS&#93; Standards |publisher=71.174.62.16 |date=2011-11-29 |access-date=2013-08-12}}</ref> Thus, the variable named {{code|^Car}} can have subscripts "Door", "Steering Wheel", and "Engine", each of which can contain a value and have subscripts of its own. The variable {{code|^Car("Door")}} could have a nested subscript of "Color", for example. Thus, one could write

<syntaxhighlight lang="text">
SET ^Car("Door","Color")="BLUE"
</syntaxhighlight>

to modify a nested child node of {{code|^Car}}. In MUMPS terms, "Color" is the 2nd subscript of the variable {{code|^Car}} (both the names of the child-nodes and the child-nodes themselves are likewise called subscripts). Hierarchical variables are similar to objects with properties in many [[object-oriented programming|object-oriented]] languages. Additionally, the MUMPS language design requires that all subscripts of variables are automatically kept in sorted order. Numeric subscripts (including floating-point numbers) are stored from lowest to highest. All non-numeric subscripts are stored in alphabetical order following the numbers. In MUMPS terminology, this is ''canonical order''. By using only non-negative integer subscripts, the MUMPS programmer can emulate the [[array data type|array]] data type from other languages. Although MUMPS does not natively offer a full set of [[database management system|DBMS]] features such as mandatory schemas, several database management systems have been built on top of it that provide application developers with flat-file, relational, and [[network database]] features.
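
The ordering rule can be illustrated with a minimal sketch. The local array {{code|x}} and its subscripts are hypothetical, and the loop relies on the {{mono|$ORDER}} function described under Features. Assuming an implementation that uses the standard collation, the subscripts come back in canonical order regardless of the order in which they were set:

<syntaxhighlight lang="text">
 SET x("b")=1,x(10)=1,x(2)=1,x("a")=1,x(-1)=1,x(1.5)=1
 SET s="" FOR  SET s=$ORDER(x(s)) QUIT:s=""  WRITE s,!
 ; writes -1, 1.5, 2, 10, a, b on separate lines: numbers first, then strings
</syntaxhighlight>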

Additionally, there are built-in operators which treat a delimited string (e.g., [[comma-separated values]]) as an array. Early MUMPS programmers would often store a structure of related information as a delimited string, parsing it after it was read in; this saved disk access time and offered considerable speed advantages on some hardware.
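
As an illustration of this practice, the following sketch stores a three-field record in a single node and unpacks it with {{mono|$PIECE}}. The global {{code|^Patient}}, the field layout, and the caret delimiter are assumptions made for the example, not taken from any particular application:

<syntaxhighlight lang="text">
 SET ^Patient(1)="Smith^John^1970" ; one node holds the whole record
 SET rec=^Patient(1) ; a single database read fetches every field
 WRITE $PIECE(rec,"^",2),! ; writes John
 SET $PIECE(rec,"^",3)=1971,^Patient(1)=rec ; change one field, write the record back
</syntaxhighlight>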

MUMPS has no data types. Numbers can be treated as strings of digits, or strings can be treated as numbers by numeric operators (''coerced'', in MUMPS terminology). Coercion can have some odd side effects, however. For example, when a string is coerced, the parser turns as much of the string (starting from the left) into a number as it can, then discards the rest.  Thus the statement <code>IF 20<"30 DUCKS"</code> is evaluated as <code>TRUE</code> in MUMPS.
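
A few further cases, sketched here with arbitrary literal values, show how the same rule plays out with arithmetic and comparison operators:

<syntaxhighlight lang="text">
 WRITE "30 DUCKS"+5,! ; 35: the leading "30" is used and " DUCKS" is discarded
 WRITE +"12abc",! ; 12: unary plus forces numeric interpretation
 WRITE "DUCKS"+1,! ; 1: a string with no leading number counts as 0
 WRITE 20<"30 DUCKS",! ; 1 (true), as in the IF example above
 WRITE "007"=7,! ; 0: the = operator compares strings, so nothing is coerced
</syntaxhighlight>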

Other features of the language are intended to help MUMPS applications interact with each other in a multi-user environment. Database locks, process identifiers, and [[atomicity (database systems)|atomicity]] of database update transactions are all required of standard MUMPS implementations.
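
A minimal sketch of cooperative locking follows. The global {{code|^Account}}, the subscript 42, and the five-second timeout are hypothetical; the sketch assumes an implementation that supports incremental {{mono|LOCK}} with a timeout, and it does not show transaction processing:

<syntaxhighlight lang="text">
 LOCK +^Account(42):5 ; wait up to 5 seconds; $TEST records whether the lock was obtained
 ELSE  WRITE "Job ",$JOB," could not lock the account",! QUIT
 SET ^Account(42)=$GET(^Account(42))+100 ; update the node while holding the lock
 LOCK -^Account(42) ; release the lock
</syntaxhighlight>

Locks of this kind are advisory: they coordinate only those jobs that request the same lock name before touching the data.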

In contrast to languages in the C or [[Niklaus Wirth#Programming languages|Wirth]] traditions, some space characters between MUMPS statements are significant. A single space separates a command from its argument, and a space, or newline, separates each argument from the next MUMPS token. Commands which take no arguments (e.g., <code>ELSE</code>) require two following spaces. The concept is that one space separates the command from the (nonexistent) argument, the next separates the "argument" from the next command. Newlines are also significant; an <code>IF</code>, <code>ELSE</code> or <code>FOR</code> command processes (or skips) everything else till the end-of-line. To make those statements control multiple lines, you must use the <code>DO</code> command to create a code block.
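
The spacing and line-scope rules can be seen in a short sketch; the variables {{code|x}} and {{code|n}} are hypothetical. The first {{mono|IF}} governs the rest of its own line only, the argumentless {{mono|ELSE}} is followed by two spaces, and the argumentless {{mono|DO}} opens a block whose lines are marked with a leading period:

<syntaxhighlight lang="text">
 SET x=12,n=0
 IF x>10 WRITE "big",! SET n=n+1 ; both commands run only when x>10
 ELSE  WRITE "small",! ; skipped here, because the IF above succeeded
 IF x>10 DO
 . WRITE "still big",!
 . SET n=n+1
</syntaxhighlight>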

=== Hello, World! example ===

A simple [["Hello, World!" program]] in MUMPS might be:

<syntaxhighlight lang="text">
  write "Hello, World!",!
</syntaxhighlight>

and would be run with the command <code>do ^hello</code> after it has been saved to disk as the routine {{mono|hello}}. To execute the code directly, a "label" (any alphanumeric string) in the first position of the program line is needed to tell the MUMPS interpreter where to start execution. Since MUMPS allows commands to be strung together on the same line, and since commands can be abbreviated to a single letter, this routine could be made more compact:

<syntaxhighlight lang="text">
 w "Hello, World!",!
</syntaxhighlight>

The '<code>,!</code>' after the text generates a newline. This code would return to the prompt.

=== Features ===

ANSI X11.1-1995 gives a complete, formal description of the language; an annotated version of this standard is available online.<ref>{{cite web|url=http://71.174.62.16/Demo/AnnoStd|title=The Annotated M[UMPS] Standards|website=71.174.62.16|access-date=26 February 2018}}</ref>

Language features include:

; Data types : There is one universal [[data type]], which is implicitly [[type conversion|coerced]] to string, integer, or floating-point data types as context requires.

; Booleans {{nobold|(called ''truthvalues'' in MUMPS)}}: In IF commands and other syntax that has expressions evaluated as conditions, any string value is evaluated as a numeric value and, if that is a nonzero value, then it is interpreted as True. <code>a<b</code> yields 1 if a is less than b, 0 otherwise.

; Declarations : None. All variables are created dynamically the first time a value is assigned.

; Lines : are important syntactic entities, unlike their status in languages patterned on C or Pascal. Multiple statements per line are allowed and are common. The scope of any {{mono|IF}}, {{mono|ELSE}}, and {{mono|FOR}} command is "the remainder of current line."

; Case sensitivity : Commands and intrinsic functions are case-insensitive. In contrast, variable names and labels are case-sensitive.  There is no special meaning for upper vs. lower-case and few widely followed conventions. The percent sign (%) is legal as first character of variables and labels.

; Postconditionals : Execution of almost any command can be controlled by following it with a colon and a truthvalue expression. <code>SET:N<10 A="FOO"</code> sets A to "FOO" if N is less than 10; <code>DO:N>100 PRINTERR</code> performs PRINTERR if N is greater than 100. This construct provides a conditional whose scope is less than a full line.
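
:The difference from {{mono|IF}} can be sketched with a hypothetical variable {{code|n}}. Each postconditional affects only its own command, so the commands that follow on the same line still run:

:<syntaxhighlight lang="text">
 SET n=3
 WRITE:n<10 "small",! WRITE:n>100 "large",! WRITE "done",!
 ; writes "small" and "done"; only the second WRITE is skipped
</syntaxhighlight>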

; Abbreviation : You can abbreviate nearly all commands and native functions to one, two, or three characters.

; Reserved words : None.  Since MUMPS interprets source code by context, there is no need for reserved words. You may use the names of language commands as variables, so the following is perfectly legal MUMPS code:

:<syntaxhighlight lang="text">
GREPTHIS()
       NEW SET,NEW,THEN,IF,KILL,QUIT SET IF="KILL",SET="11",KILL="11",QUIT="RETURN",THEN="KILL"
       IF IF=THEN DO THEN
       QUIT:$QUIT QUIT QUIT ; (quit)
THEN  IF IF,SET&KILL SET SET=SET+KILL QUIT
</syntaxhighlight>

:MUMPS can be made more obfuscated by using the abbreviated command syntax, as shown in this terse example derived from the example above:

:<syntaxhighlight lang="text">
GREPTHIS()
       N S,N,T,I,K,Q S I="K",S="11",K="11",Q="R",T="K"
       I I=T D T
       Q:$Q Q Q
T  I I,S&K S S=S+K Q
</syntaxhighlight>

; Arrays : are created dynamically, stored as [[B-tree]]s, are [[sparse matrix|sparse]] (i.e. use almost no space for missing nodes), can use any number of subscripts, and subscripts can be strings or numeric (including floating point). Arrays are always automatically stored in sorted order, so there is never any occasion to sort, pack, reorder, or otherwise reorganize the database. Built-in functions such as {{mono|$DATA}}, {{mono|$ORDER}}, {{mono|$NEXT}} (deprecated), and {{mono|$QUERY}} provide efficient examination and traversal of the fundamental array structure, on disk or in memory.

:<syntaxhighlight lang="text">
for i=10000:1:12345 set sqtable(i)=i*i
set address("Smith","Daniel")="dpbsmith@world.std.com"
</syntaxhighlight>
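
:As a sketch of how sparse nodes can be examined, {{mono|$DATA}} reports whether a node holds a value, has descendants, or does not exist. The example assumes the {{code|address}} array contains only the single node set on its first line:

:<syntaxhighlight lang="text">
 SET address("Smith","Daniel")="dpbsmith@world.std.com"
 WRITE $DATA(address("Smith","Daniel")),! ; 1: the node has a value and no descendants
 WRITE $DATA(address("Smith")),! ; 10: no value of its own, but descendants exist
 WRITE $DATA(address("Jones")),! ; 0: the node does not exist
</syntaxhighlight>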

; Local arrays : variable names not beginning with caret (i.e. "^") are stored in memory by process, are private to the creating process, and expire when the creating process terminates. The available storage depends on implementation. For those implementations using partitions, it is limited to the partition size (a small partition might be 32K). For other implementations, it may be several megabytes.

; Global arrays : <code>^abc, ^def</code>. These are stored on disk, are available to all processes, and persist after the creating process terminates. Very large globals (for example, hundreds of gigabytes) are practical and efficient in most implementations. This is MUMPS' main "database" mechanism. It is used instead of calling on the operating system to create, write, and read files.

; Indirection : In many contexts, <code>@VBL</code> can be used, and effectively substitutes the contents of VBL into another MUMPS statement. <code>SET XYZ="ABC" SET @XYZ=123</code> sets the variable ABC to 123. <code>SET SUBROU="REPORT" DO @SUBROU</code> performs the subroutine named REPORT. This substitution allows for [[lazy evaluation]] and late binding, and serves as the operational equivalent of "pointers" in other languages.
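
:Indirection also works with subscripted references. In this sketch the variable {{code|name}} is hypothetical and holds a complete global reference as a string, which plays the role of a pointer to the node:

:<syntaxhighlight lang="text">
 SET name="^Car(""Door"",""Color"")" ; a global reference held as a string
 SET @name="RED" ; same effect as SET ^Car("Door","Color")="RED"
 WRITE @name,! ; writes RED
</syntaxhighlight>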

; Piece function : This breaks variables into segmented pieces guided by a user specified separator string (sometimes called a "delimiter"). Those who know [[awk]] will find this familiar. <code>$PIECE(STRINGVAR,"^",3)</code> means the "third caret-separated piece of {{mono|STRINGVAR}}." The piece function can also appear as an assignment (SET command) target.

:<code>$PIECE("world.std.com",".",2)</code> yields {{samp|std}}.

:After
:<syntaxhighlight lang="text">
SET X="dpbsmith@world.std.com"
</syntaxhighlight>

:<code>SET $P(X,"@",1)="office"</code> causes X to become "office@world.std.com" (note that {{mono|$P}} is equivalent to {{mono|$PIECE}} and could be written as such).

; Order function : This function treats its input as a structure, and finds the next index that exists which has the same structure except for the last subscript. It returns the sorted value that is ordered after the one given as input. (This treats the array reference as a content-addressable data rather than an address of a value.)

:<syntaxhighlight lang="text">
Set stuff(6)="xyz",stuff(10)=26,stuff(15)=""
</syntaxhighlight>

:<code>$Order(stuff(""))</code> yields {{samp|6}}, <code>$Order(stuff(6))</code> yields {{samp|10}}, <code>$Order(stuff(8))</code> yields {{samp|10}}, <code>$Order(stuff(10))</code> yields {{samp|15}}, <code>$Order(stuff(15))</code> yields {{samp|""}}.

:<syntaxhighlight lang="text">
Set i="" For  Set i=$O(stuff(i)) Quit:i=""  Write !,i,?10,stuff(i)
</syntaxhighlight>

:Here, the argument-less {{mono|For}} repeats until stopped by a terminating {{mono|Quit}}.  This line prints a table of {{mono|i}} and {{code|stuff(i)}} where {{mono|i}} is successively 6, 10, and 15.

:For iterating the database, the Order function returns the next key to use.

:<syntaxhighlight lang="text">
GTM>S n=""
GTM>S n=$order(^nodex(n))
GTM>zwr n
n=" building"
GTM>S n=$order(^nodex(n))
GTM>zwr n
n=" name:gd"
GTM>S n=$order(^nodex(n))
GTM>zwr n
n="%kml:guid"
</syntaxhighlight>

MUMPS supports multiple simultaneous users and processes even when the underlying operating system does not (e.g., [[MS-DOS]]). Additionally, an environment can be specified for a variable, for instance by naming a machine in the reference (as in <code>SET ^|"DENVER"|A(1000)="Foo"</code>), which allows access to data on remote machines.

=== Criticism ===
{{criticism section|date=February 2022}}

Some aspects of MUMPS syntax differ strongly from those of more modern languages, which can cause confusion, although those aspects vary between different versions of the language. On some versions, whitespace is not allowed within expressions, as it ends a statement: <code>2 + 3</code> is an error, and must be written <code>2+3</code>. All operators have the same precedence and are [[operator associativity|left-associative]] (<code>2+3*10</code> evaluates to 50). The operators for "less than or equal to" and "greater than or equal to" are <code>'></code> and <code>'<</code> (that is, the Boolean negation operator <code>'</code> plus a strict comparison operator in the opposite direction), although some versions allow the use of the more standard <code><=</code> and <code>>=</code> respectively. Periods (<code>.</code>), not whitespace, are used to indent the lines in a DO block. The ELSE command does not need a corresponding IF, as it operates by inspecting the value in the built-in system variable <code>$test</code>.
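
The evaluation order and the negated comparison operators can be checked with a few lines, sketched here with arbitrary literal values:

<syntaxhighlight lang="text">
 WRITE 2+3*10,! ; 50: evaluated strictly left to right, as (2+3)*10
 WRITE 2+(3*10),! ; 32: parentheses are needed to multiply first
 WRITE 3'>5,! ; 1: "not greater than", i.e. less than or equal to
 WRITE 5'<5,! ; 1: "not less than", i.e. greater than or equal to
</syntaxhighlight>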

MUMPS [[scope (computer science)|scoping]] rules are more permissive than those of other modern languages. Declared local variables are scoped using the stack. A routine can normally see all declared locals of the routines below it on the call stack, and routines cannot prevent routines they call from modifying their declared locals, unless the caller manually creates a new stack level (<code>do</code>) and aliases each of the variables it wishes to protect (<code>. new x,y</code>) before calling any child routines. By contrast, undeclared variables (variables created by using them, rather than by declaration) are in scope for all routines running in the same process, and remain in scope until the program exits.
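
The protection idiom described above can be sketched as follows. The labels {{mono|MAIN}} and {{mono|CHILD}} and the variable {{code|x}} are hypothetical:

<syntaxhighlight lang="text">
MAIN ; hypothetical caller
 SET x=1
 DO
 . NEW x ; new stack level: the caller's x is hidden and saved
 . DO CHILD
 WRITE x,! ; writes 1: the original x was restored when the block ended
 DO CHILD
 WRITE x,! ; writes 99: this time CHILD overwrote the caller's x
 QUIT
CHILD SET x=99 QUIT
</syntaxhighlight>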

Because MUMPS database references differ from internal variable references only in the caret prefix, it is dangerously easy to unintentionally edit the database, or even to delete a database "table".{{r|Richmond_1984_Thesis}}