ANTLR
Reference Manual
Credits
Project Lead
Terence Parr
Support from
jGuru.com
Your View of the Java Universe
Help with initial coding
John Lilly, Empathy Software
C++ code generator by
Peter Wells and Ric Klaren
C# code generation by
Micheal Jordan, Kunle Odutola and Anthony Oguntimehin.
Infrastructure support from Perforce:
The world's best source code control system
Substantial intellectual effort donated by
Loring Craymer
Monty Zukowski
Jim Coker
Scott Stanchfield
John Mitchell
Chapman Flack (UNICODE, streams)
Source changes for Eclipse and NetBeans by
Marco van Meegen and Brian Smith
ANTLR Version 2.7.2 January 19, 2003
How ANTLR is Installed in Computer Science
Like many Java-based tools, ANTLR asks you to change your classpath globally to include the ANTLR support library. But global variables are terrible, so in Computer Science I have set up an antlr command, which can be used in three ways and which avoids classpath problems:
How to use antlr     The official version
                     export CLASSPATH=.:/usr/local/src/antlr-2.7.2/antlr.jar
antlr t.g            java antlr/Tool t.g     (to compile the grammar)
antlr *.java         javac *.java            (to compile the classes)
antlr Main           java Main               (to run a program)
For this to work, the grammar file must end in .g. The official version still works, if you really want to do it that way. Here is the examples directory from the source:
examples
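As a rough illustration of how the three forms of the antlr command listed above could be provided, here is a minimal sketch of a wrapper written as a shell function. It is not the actual local script: the dispatch on the file suffix, the RUN dry-run variable, and the ANTLR_JAR variable name are all assumptions made for this example. Only the jar path and the underlying java and javac commands come from the official version.

```shell
# Hypothetical sketch of an antlr wrapper; not the actual local script.
# The jar location is the one given in the official CLASSPATH setting.
ANTLR_JAR=/usr/local/src/antlr-2.7.2/antlr.jar

# antlr FILE...: choose the underlying tool from the first argument.
# The classpath is passed per command, so nothing is changed globally.
# Setting RUN=echo (an assumed dry-run switch) prints the command
# instead of running it.
antlr() {
    classpath=".:$ANTLR_JAR"
    case "$1" in
        *.g)    $RUN java  -classpath "$classpath" antlr.Tool "$@" ;;  # compile the grammar
        *.java) $RUN javac -classpath "$classpath" "$@" ;;             # compile the classes
        *)      $RUN java  -classpath "$classpath" "$@" ;;             # run a program
    esac
}
```

With this function loaded into a shell, antlr t.g, antlr *.java and antlr Main would each run the matching command from the official column, with the classpath set for that one command only.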
Ian Holyer
What's ANTLR
ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C++, or C# actions [You can use PCCTS 1.xx to generate C-based parsers].
Computer language translation has become a common task. While compilers and tools for traditional computer languages (such as C or Java) are still being built, their number is dwarfed by the thousands of mini-languages for which recognizers and translators are being developed. Programmers construct translators for database formats, graphical data files (e.g., PostScript, AutoCAD), and text processing files (e.g., HTML, SGML). ANTLR is designed to handle all of your translation tasks.
Terence Parr has been working on ANTLR since 1989 and, together with his colleagues, has made a number of fundamental contributions to parsing theory and language tool construction, leading to the resurgence of LL(k)-based recognition tools.
Here is a chronological history and credit list for ANTLR/PCCTS.
Check out Getting started for a list of tutorials, and get your questions answered at the ANTLR FAQ at jguru.com.
See also http://www.ANTLR.org and glossary.
If you are looking for the previous main version (PCCTS 1.33) of ANTLR rather than the new Java-based version, see Getting started with PCCTS.
Download ANTLR 2.7.2.
- Meta-Language Vocabulary
- Header Section
- Parser Class Definitions
- Lexical Analyzer Class Definitions
- Tree-parser Class Definitions
- Options Section
- Tokens Section
- Grammar Inheritance
- Rule Definitions
- Atomic Production elements
- Simple Production elements
- Production Element Operators
- Token Classes
- Predicates
- Element Labels
- EBNF Rule Elements
- Interpretation Of Semantic Actions
- Semantic Predicates
- Syntactic Predicates
- ANTLR Meta-Language Grammar
- Lexical Rules
- Predicated-LL(k) Lexing
- Keywords and literals
- Common prefixes
- Token definition files
- Character classes
- Token Attributes
- Lexical lookahead and the end-of-token symbol
- Scanning Binary Files
- Scanning Unicode Characters
- Manipulating Token Text and Objects
- Filtering Input Streams
- ANTLR Masquerading as SED
- Nongreedy Subrules
- Greedy Subrules
- Nongreedy Lexer Subrules
- Limitations of Nongreedy Subrules
- Lexical States
- The End Of File Condition
- Case sensitivity
- Ignoring whitespace in the lexer
- Tracking Line Information
- Tracking Column Information
- But...We've Always Used Automata For Lexical Analysis!
- What's a tree parser?
- What kinds of trees can be parsed?
- Tree grammar rules
- Transformations
- Examining/Debugging ASTs
- Introduction
- Pass-Through Token Stream
- Token Stream Filtering
- Token Stream Splitting
- Token Stream Multiplexing (aka "Lexer states")
- The Future
- Introduction
- Grammar Inheritance and Vocabularies
- Recognizer Generation Order
- Tricky Vocabulary Stuff
- ANTLR Exception Hierarchy
- Modifying Default Error Messages With Paraphrases
- Parser Exception Handling
- Specifying Parser Exception-Handlers
- Default Exception Handling in the Lexer
- Programmer's Interface
- Multiple Lexers/Parsers With Shared Input State
- Parser Implementation
- Parser Class
- Parser Methods
- EBNF Subrules
- Production Prediction
- Production Element Recognition
- Standard Classes
- Lexer Implementation
- Token Objects
- Token Lookahead Buffer
- Building the ANTLR C# Runtime
- Specifying Code Generation
- C#-Specific ANTLR Options
- A Template C# ANTLR Grammar File
- Notation
- Controlling AST construction
- Grammar annotations for building ASTs
- Leaf nodes
- Root nodes
- Turning off standard tree construction
- Tree node construction
- AST Action Translation
- Invoking parsers that build trees
- AST Factories
- Heterogeneous ASTs
- AST (XML) Serialization
- AST enumerations
- A few examples
- Labeled subrules
- Reference nodes
- Required AST functionality and form
- File, Grammar, and Rule Options
- Options supported in ANTLR
- language: Setting the generated language
- k: Setting the lookahead depth
- importVocab: Initial Grammar Vocabulary
- exportVocab: Naming Output Vocabulary
- testLiterals: Generate literal-testing code
- defaultErrorHandler: Controlling default exception-handling
- codeGenMakeSwitchThreshold: controlling code generation
- codeGenBitsetTestThreshold: controlling code generation
- buildAST: Automatic AST construction
- ASTLabelType: Setting label type
- charVocabulary: Setting the lexer character vocabulary
- warnWhenFollowAmbig
- Command Line Options

