Create a Programming Language
Many people have difficulties or frustrations with the programming languages they use every day. Some want things to be handled more abstractly, while others dislike implementing features they wish were 'standard'. Whether you are an IT professional or just a hobbyist, many times you may find yourself wanting to create a new programming language.
Steps
- Become familiar with terminology. Compiler writers often use unfamiliar terminology. Read up on compilers before proceeding. Be sure to know everything that you need to know.
- Decide what problem your language is solving. Is it addressing a domain-specific problem, or is it a general purpose language?
- Think about the semantics of your language and the concepts of it.
- Are you going to allow direct pointer access or not?
- What are the data types of your language?
- Is it a static or dynamic language?
- What is your memory model? Are you going to use a garbage collector or manual memory management? (If you use a garbage collector, prepare to write one or adapt an existing one to your language.)
- How are going to handle concurrency? Are you going to use a simple threading/locking model or something more complex like Linda or the actor model? (Since nowadays computers have multiple cores.)
- Are there primitive functions embedded in the language or will everything come from a library?
- What is the paradigm or paradigms of you language? Functional? Object-oriented? Prototype (like JavaScript)? Aspect-oriented? Template oriented? Or something entirely new?
- How is your language going to interface with existing libraries and languages (mainly C)? This point is important if you're building a domain-specific language.
- Finally, some of the answers to this questions are going to be answered by the second step and will help you answer the next step.
- Think of some specific tasks that someone would want to be able to perform with your language. For example, 'they may want to direct a robot to follow a line' or 'they may want to create relatively portable desktop programs in it' or 'they may want to create web applications with it'.
- Experiment with syntax ideas (the text of the language) for the above examples.
- Be careful to keep your language in the context-free language category or something inside it. Your parser generator and you will appreciate it later on.
- Write out a formal grammar for the syntax.
- Decide whether the language will be interpreted or compiled. Meaning that in the interpreted world your user will typically edit your program in an editor, and run it directly on the interpreter; while in the compile world, your user will edit your program, compile it, save the resulting executable somewhere and run it.
- Write the front end scanner and parser or find a tool that helps you with this.
- Also, think about how your compiler/interpreter will warn your user about erroneous programs and syntax errors.
- Use the parser information to write the object code or an intermediate representation. Have the parser create an AST, then create your object code from the AST using three address code or its big brother SSA, then create a symbol table to define your functions, global variables, etc.
- Also, depending on your language, you may also want to create virtual pointer tables or information tables for you classes (in order to support reflection or RTTI).
- Write the executor or code generator that will bind everything together.
- Write many test programs to test the language.
- You want to create programs that stress the burdens of your formal grammar in order to see that your compiler accepts everything that is inside your definition and rejects everything that is outside of it.
- Consider how the user will debug their own programs.
- If your language uses a standard library, you will want to write it. Along with a garbage collector or other runtime features if you need it.
- Specifically, if you write a compiler, you will need the code that the operating system will execute in order to begin running the user code (for example, allocating all global variables).
- Publish your language, along with the specification for it and some examples of what you can do in it.
- Don't forget to document how you can integrate with existing libraries, languages and how to use the runtime features and/or standard library.
Tips
- Start by designing your language and don't write any code, until you are satisfied and have answered all (or most) of the questions or problems related to your design, since it's easier to change the design earlier than later.
- Know your target platform (operating system and libraries) for your compiler/interpreter after all, you are going to use it and manipulate it.
Warnings
- Think if you really need a new language, and what your language has of new that other languages don't have (It may be a combination of features or a single feature).
- Writing languages is difficult if you don't know what you're doing. It takes a lot of practice, too.
- Prepare to spend some time in language design, since you won't have a chance to change your language once you've written the compiler and past the design point.
- Don't try to base your features into a union of several languages, like saying that you language will be a union of language X, language Y and language Z. History has shown us that languages created in such a way will never find success, or everyone would be programming PL/1 instead of something based on C.
Things You'll Need
- Patience.
- Knowledge about language features and language design (you may want to read Programming Language Design Concepts from David A. Watt).
- Knowledge about compiler theory (since you will be writing a compiler/interpreter for your language and your implementation will be the reference implementation).
- Uses for your language (remember that some of the most used languages like C or Lisp were created in order to do something specific like creating Unix or doing symbolic computation).