remedyvm

A toy RISC virtual machine inspired by Bell Lab's `dis' and Tsoding's `bm'
git clone git://git.ethandl.dev/remedyvm
Log | Files | Refs

readme.md (6287B)


      1 # RemedyVM, the register-memory dynamic virtual machine
      2 
      3 **NOTE:** This is a personal project for funzies pls don't bully.
      4 
      5 ## Overview
      6 
      7 RemedyVM will be a (memory↔stack↔heap)-based VM target, with an AOT
      8 compiler, a JIT compiler, and a simple interpreter.
      9 
     10 The memory/stack/heap approach allows for register allocation
     11 optimisations at run/compile-time, whilst still offering compilers
     12 targeting RemedyVM the option to put memory into a stack or the heap.
     13 
     14 A memory machine (MM) is a machine with "infinite" registers, each
     15 corresponding with a temporary value. Much like a register machine,
     16 temporaries can easily occupy real registers in JIT compilation, but
     17 much like a stack machine, the registers are in a growing list.
     18 
     19 There are currently no plans for garbage collection, but if it is to be
     20 implemented in the future, it will likely be done similarly to Bell
     21 Labs' `dis` VM, taking a hybrid reference count/mark and sweep approach.
     22 According to Winterbottom and Pike, the memory machine architecture
     23 makes this easier to accomplish.
     24 
     25 The bytecode should be compilable straight to machine code ahead of time
     26 (AOT), removing *any* runtime in the process. The JIT compiler will need
     27 a psuedo-runtime, all functions are *thunks* until first run, then they
     28 are "evaluated" (JIT compiled) to a function that runs with no overhead.
     29 
     30 Some functions/labels may force themselves to be JIT compiled or
     31 interpreted so as to adopt a structure that could only come about from
     32 being lazily evaluated on the fly. Potentially a full runtime could be
     33 set-up to allow for AOT compiled lazy-thunks.
     34 
     35 The runtime JIT process may need to have multiple steps, akin to the V8
     36 JavaScript runtime. Register allocation may be heavy on resources as it
     37 is a graph-colouring problem, so an initial round may not allocate
     38 machine registers optimally, or may even run interpreted.
     39 
     40 Whilst the standard implementation will only really be a psuedo-runtime,
     41 the initial implementation will simply be an interpreted VM.
     42 
     43 To be clear, this project is defining a VM target, a VM interpreter, and
     44 a VM compiler with JIT/AOT.
     45 
     46 Given the stack-register hybrid approach, temporaries will be stored in
     47 one of 2 places to be determined at runtime: 1. A dynamic array of
     48 registers that is determined at VM startup/compile-time to match the
     49 host architecture. This is the "memory" in memory machine. 2. The
     50 regular stack. 3. The heap.
     51 
     52 Given the simplicity, beauty, and popularity of RISC machines, the
     53 instruction set of RemedyVM will be very simple and should be similar to
     54 ARM, MIPS, and other RISC instruction sets.
     55 
     56 RemedyVM should be as platform agnostic as possible, though it's hard to
     57 get a "one implementation fits all", the underlying platform assumptions
     58 should try to be as minimal as possible.
     59 
     60 Given that RemedyVM is inherently not a very stable target, compilers do
     61 not have much control over what goes into registers, and what goes onto
     62 the stack. This is the hardest problem that this project aims to solve.
     63 This issue is discussed in more depth in the [unstable target
     64 section](#unstable-target-solutions), but the solution picked by
     65 RemedyVM is to have a *memory machine* RISC architecture.
     66 
     67 ## Unstable compiler target solutions
     68 
     69 Given the virtual machine decides the number of registers to allocate at
     70 run/comptime, compilers have an unstable target!
     71 
     72 How can a compiler be sure that an architecture will be able to put 20
     73 variables into registers? It can't!
     74 
     75 Some solutions: 1. Treat all temporaries as being in an ordered dynamic
     76 list, the runtime register allocation takes the $K$ lowest temporaries
     77 and puts them into registers. (Where $K$ is the number of registers for
     78 a target) - This is the method used in the `dis` VM by Bell Labs, they
     79 use a very similar Memory Machine architecture. 2. Leave it up to the
     80 compiler, any excess "registers" get treated as temporaries. - This
     81 would be bad for targets that have too few or many registers. 3.
     82 Stabilise the VM target and just pick a number of registers. - The JVM
     83 takes this to the extreme and has only machine registers, PC registers,
     84 and the stack. 4. Restrict compilers to just use a stack, the JIT will
     85 decide on registers. - This would make RemedyVM a simple stack-based VM
     86 
     87 It seems from these that the best solution is solution 1, to go with an
     88 infinite-register hybrid VM, or rather a Memory machine. Unlike `dis`,
     89 RemedyVM will be a RISC-like architecture for ease of instruction
     90 selection. RemedyVM is thus a *memory virtual machine* with stack & heap
     91 components for records that might not fit well into the architecture.
     92 
     93 ## Goals
     94 
     95 - [ ] Lay-out the basic instruction set
     96   - [ ] Move
     97   - [ ] Stack
     98   - [ ] Heap
     99   - [ ] Arithmetic
    100   - [ ] Logical
    101   - [ ] Jumps
    102   - [ ] Conditional instructions
    103     - [ ] All instructions should be conditional
    104 - [ ] Implement an interpreter
    105 - [ ] Compilation
    106   - [ ] Assembly generation
    107   - [ ] Linking
    108     - [ ] Static linking
    109     - [ ] Dynamic linking (?)
    110     - *Note: Linking will probably just be done with the regular gcc
    111       tooling, at least initially.*
    112   - [ ] Optimise instruction selection for RISC platforms
    113   - [ ] Optimise instruction selection for x86 platforms
    114   - [ ] Implement AOT compiler
    115     - [ ] Link to libc etc
    116   - [ ] Implement JIT compiler
    117     - [ ] Runtime implementation (?)
    118     - [ ] Lazily compiled thunk implementation
    119 - [ ] Implement a basic compiler targeting RemedyVM
    120   - [ ] Basic lisp-like LL(1) language
    121   - [ ] Simpler Lisp-Haskell hybrid
    122   - [ ] Mojo
    123 - [ ] Tests
    124   - [ ] Unit VM tests
    125   - [ ] Fibonacci
    126   - [ ] Factorial
    127   - [ ] Mandelbrot
    128   - [ ] Game of life
    129   - [ ] Memory tests
    130   - [ ] Arithmetic tests
    131 - [ ] Cross-Platform
    132   - [ ] Linux
    133   - [ ] Plan9(front)
    134   - [ ] OpenBSD
    135   - [ ] MacOS
    136   - [ ] Windows
    137 - [ ] Documentation (ugh)
    138   - [ ] Design doc
    139   - [ ] Target spec
    140 - [ ] Implementation languages
    141   - [ ] C
    142     - The most cross-platform, will support every target
    143   - [ ] Haskell
    144     - Will support most targets
    145   - [ ] Rust (?)
    146   - [ ] Go (?)
    147   - [ ] Zig (?)
    148   - [ ] Java (?)
    149     - Icky, but also have already written *some* of a compiler in Java.
    150 
    151 ### Note
    152 
    153 - This is a personal project, and is not going to be a serious virtual
    154   machine.
    155 - It is just experience, my first virtual machine!
    156 - The more advanced features may not ever be implemented.