The syntax of a programming language is like the grammatical rules of the programming language. These rules are often specified by some form of meta syntax. One commonly used meta syntax is called BNF (Backus-Naur form).
The syntax of C++ is intermediate compared to other programming languages. A plain BNF description of C++ can be difficult to follow. However, a BNF description that makes use of hyperlinks is considerably easier to follow. Fortunately, Alessio Marchetti has already created a hyperlinked BNF description of the syntax of C++.
Each syntactic rule (also called a “production”) consists of two parts:
Let us consider the following example. An iteration-statement is describe as follows:
token | expansion |
---|---|
iteration-statement | while ( condition ) statement do statement while ( expression ); for ( for-init-statmentconditionopt ; expressionopt ) statement for ( for-range-declaration : for-range-initializer ) statement |
In this syntactic rule, the token iteration-statement
has four alternative expansions as each line of the “expansion” column is one alternative. In each expansion alternative, anything that is boldface must be entered verbatim, while anything not in boldface is a token with its own expansion. The subscript “opt” designates the token to be optional. Optional means the same as zero or one occurrence.
Essentially, a token (italicized and not boldfaced) is a placeholder. A placeholder can be holding a space (a position in a sequence) for nothing or many components. A boldfaced item is also known as a “terminal”, a terminal specifies the text expected verbatim.
BNF can be used to express the repetition of a pattern. Let us consider the syntactic rule of a statement-seq
(statement sequence):
token | expansion |
---|---|
statement-seq | statement statement-seq statement |
This shows that there are two alternatives to expand a statement-seq
token:
statement
: the first alternative is a simpler token, a single statement.statement-seq statement
: the second alternative specifies a statement sequence, followed by a single statement.Let us consider an example that does not relate to a complex programming language. For brevity, we will use the following notation:
token1
::= token2
blah
The above example is a rule to expand token1
to token2
followed by the word “blah” verbatim. You can consider the symbol ::= to mean “can expand to”.
With this notation, now we define the following rules:
friend
::= Alifriend
::= Johnfriend
::= Changfriends
::= friend
friends
::= friends
and friend
Note that R1, R2, and R3 are alternatives to expand friend
, while R4 and R5 are alternatives to expand friends
. A more concise way to represent the same set of rules is as follows:
friend
::= Ali | John | Changfriends
::= friend
| friends
and friend
In the concise representation, the vertical bar symbol |
is used to separate the alternatives to expand the token on the left-hand side of the “::=” symbol.
Let us consider how the sentence “John and Ali and Chang” is considered syntactically correct as the token friends
.
friend
.friends
token.friend
token.friend
token.friends
token recognized in step 2, a verbatim “and”, and a friend
token recognized in step 4. As R5 fires, now we have a new friends
token recognized for the partial text of “John and Ali”.friend
token.friend
token.friends
token (corresponding to “John and Ali”), a verbatim word “and”, and also a friend
token corresponding to “Change”. We now have another friends
token recognized to represent the entire text of “John and Ali and Chang”.Graphically, we can represent it as follows:
flowchart LR
john[John]
and1[and]
ali[Ali]
and2[and]
chang[Chang]
friend1(friend)
friends1(friends)
friend2(friend)
friends2(friends)
friend3(friend)
friends3((friends))
john-->and1
and1-->ali
ali-->and2
and2-->chang
john-.-> friend1
friend1-.-> friends1
ali-.->friend2
friends1-.->friends2
and1-.->friends2
friend2-.->friends2
chang-.->friend3
friends2-.->friends3
and2-.->friends3
friend3-.->friends3
In this diagram, solid arrows indicate the flow in the text to be processed, “John and Ali and Chang”. Dotted arrows indicate the recognition of a token (think of these arrows as “reversed expansion”).
The following is generated by ChatGPT.
*statement-seq* ::= *statement* | *statement-seq statement*The second alternative allows *statement-seq* to repeat by including itself in the expansion.
*digit* ::= "0" | "1" | "2" | ... | "9"
, how can you express a non-negative integer?*integer* ::= *digit* | *integer digit*This allows one or more digits to form a number like "123".
*friends*
?friends = friend friends = friends + " and " + friendThis is similar to defining a sequence recursively, where the base case is a single friend, and the recursive case adds "and friend" to the existing sequence.