Mr Speaker

Basic-er BASIC

C64

Today I wrote an "eBASIC" code snippet that contained a macro. The macro expansion included another macro that when expanded generated 6502 assembler code - which itself happened to contain a KickAssembler macro: that in turn generated its own 6502 assembler. The result of the entire macro formed part of a BASIC program that POKEd the assembled bytes into memory and called the machine code via SYS. The final BASIC program was then compiled back to machine language as a .prg file and run in a JavaScript Commodore 64 emulator. The output of the running emulator was used as an input texture to a WebGL shader, and rendered as the screen of a 3D model of a C64 computer monitor, updating 60 times a second.

This is the story of the funnest thing I've worked on, ever.

It started as a simple text-munging script to "save some time" while writing Commodore 64 BASIC. It ballooned into eBASIC: a cornucopia of scope-creepy feature additions. File includes, macros, inline 6502 assembly compilation... I estimate I'll have to write at least fifty BASIC programs to break even on the time I've spent on my eBASIC-to-BASIC tool.

eBASIC -> BASIC transformation

Lately I've been tinkering with creating a game (I mentioned it in my WebGL gamma-correction post) - the centerpiece of which is an emulated Commodore 64 computer. For the game content I need to write a lot of C64 software, and the easiest way to get that done (I reasoned) was with BASIC.

The programs would largely consist of screens of text (with some embellishments) - so I didn't need anything fancy. Initially. But as the programs grew, things got hairy. The first thing I found was I always needed more line numbers. And also, writing line numbers is really annoying.

If you've never written C64 BASIC, you might not know that you need to manually assign a number to every line of code. The lines are ordered - so if you have code:

1 PRINT "Mr SPeaker RuleZ"
2 GOTO 1

... you can't put any instructions in between the two lines! You need to re-number line 2 to, say, line number 3. Traditionally you minimize this issue by having larger gaps as you go:

10 PRINT "mR SpEAker ruLez"
20 GOTO 10

Now you can add more lines between 10 and 20. But you still eventually run out as you add more and more stuff!

The original goal: No Line Numbers

I decided it was time to automate. I'd make a quick little script that would take a BASIC program without line numbers, and add them automatically. For GOTOs and GOSUBs (statements that require a target line number) I'd use labels indicated with a tilde. So the source:

~top
  print "automation saves time! I promise!"
  goto ~top

Would be transformed into:

10 print "automation saves time! I promise!"
20 goto 10

A simple regex finds all the label targets, and a second pass replaces calls with the assigned line number. It worked a treat, and only took me 30 minutes to write, polish, and integrate into my build process. A job well done! Time to get back to the real work of making the game...
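The two passes might look roughly like the sketch below. It is not the actual eBASIC source: the function name, the step of 10, and the rule that a label sits alone on its own line are all assumptions made for illustration.

```javascript
// Hypothetical sketch: add line numbers and resolve ~labels.
// Assumes a label is a line containing only "~name", and a step of 10.
const numberize = (lines, step = 10) => {
  const labels = new Map();
  const code = [];

  // Pass 1: collect code lines; a label points at the next code line.
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue;
    const label = line.match(/^~(\w+)$/);
    if (label) {
      labels.set(label[1], (code.length + 1) * step);
    } else {
      code.push(line);
    }
  }

  // Pass 2: prefix each line with its number and swap ~name for the target.
  // Caveat: a tilde inside a PRINT string would also be treated as a label.
  return code.map((line, i) => {
    const resolved = line.replace(/~(\w+)/g, (ref, name) => {
      if (!labels.has(name)) throw new Error(`Unknown label: ${ref}`);
      return String(labels.get(name));
    });
    return `${(i + 1) * step} ${resolved}`;
  });
};
```

Fed the three-line `~top` source above, this version returns the two numbered lines shown, with `goto ~top` resolved to `goto 10`.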

...Though... you know what would be good? - I started to think to myself - would be if I could use hex and binary numbers instead of decimal. I have to bust out the calculator every time I go to POKE a VIC chip address. I know $d018, I don't know 53272.


Tiny enhancements: Hex, Binary, Comments

C64 BASIC only supports decimal numbers. I want to do "POKE 0xd012, 0xff", or "POKE 1,0b11111011". And in fact, even better if it supported the underscore separator: "POKE 1,0b_1111_1101" to easily be able to tell which bits are set.

const regexReplace = (line, regex, f) =>
  [...line.matchAll(new RegExp(regex, "g"))].reduce(f, line);

const replaceHex = (line) =>
  regexReplace(line, "0x[0-9a-fA-F_]+", (line, match) =>
    line.replace(match[0], parseInt(match[0].replace(/_/g, ""), 16))
  );

const replaceBinary = (line) =>
  regexReplace(line, "0b([0-1_]+)", (line, match) =>
    line.replace(match[0], parseInt(match[1].replace(/_/g, ""), 2))
  );

Again, a couple of quick-n-dirty regexes and we were away! I composed the replaceHex and replaceBinary functions and mapped them over the program lines. Another 20 minutes, another big time saver!
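As an aside, the same conversion can be done in one pass by handing `String.prototype.replace` a callback. This is an alternative sketch, not the code eBASIC uses, and the `decimalize` name is just borrowed for illustration.

```javascript
// Alternative sketch: convert 0x... and 0b... literals in a single pass.
// Underscores are treated as digit separators and dropped before parsing.
const decimalize = (line) =>
  line.replace(/0x([0-9a-fA-F_]+)|0b([01_]+)/g, (match, hex, bin) =>
    String(
      hex !== undefined
        ? parseInt(hex.replace(/_/g, ""), 16)
        : parseInt(bin.replace(/_/g, ""), 2)
    )
  );

// Map it over the program lines, as with the composed version.
const converted = ["poke 0xd018, 0b_0001_0101", "poke 0xd012, 0xff"].map(decimalize);
// converted: ["poke 53272, 21", "poke 53266, 255"]
```

Like the originals, it is regex-only and has no idea about string literals, so a `0x` sequence inside a PRINT string would get converted too.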

One more easy one while I was here: Normally comments in BASIC use the REM keyword (e.g., 10 REM This is a comment!), but I don't need them in my compiled program, taking up valuable tokens. So I just strip out any line starting with a hash sign (e.g., # comment begone!). If I want a comment to actually appear in the output, I use a standard REM statement.
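A minimal sketch of that filter, assuming comments always occupy a whole line and that indented comments should go too (both are guesses about the real rules):

```javascript
// Hypothetical sketch: drop whole-line "#" comments, keep everything else
// (including REM statements, which are meant to survive into the output).
const decomment = (lines) =>
  lines.filter((line) => !line.trimStart().startsWith("#"));
```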

An hour-or-so in total and I'd immensely improved my BASIC coding experience! Such a good feeling making a productivity tool. I sure hope the feeling isn't addictive...

...

Scope is creeping: Multi-lines, macros

Ok, just one more thing. I now had the ability to make my code more readable with blank lines and comments - but that's not what BASIC programs look like. You don't have a lot of space on the C64, so a standard technique is to jam as many commands as you can on to one line. Usually a lot of commands. I want a "continue" token that will let the source code be easy to read, but the BASIC be... BASIC.

for n=0 to 62 :
  read q :
  poke 0x340+n,q :
next

I ran a reduce over the source, so that any line ending with a colon would be concatenated with the next. The output is the much more BASIC-looking:

10 for n=0 to 62 :read q :poke 832+n,q :next
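The reduce could be sketched something like this. The name comes from the pipeline shown later in the post; the trimming and blank-line handling are assumptions.

```javascript
// Hypothetical sketch: join a line onto the previous one when the previous
// line ends with ":" (the "continue" token). Blank lines are skipped.
const sameLinerize = (lines) =>
  lines.reduce((out, raw) => {
    const line = raw.trim();
    if (line === "") return out;
    const prev = out[out.length - 1];
    if (prev !== undefined && prev.endsWith(":")) {
      out[out.length - 1] = prev + line;
    } else {
      out.push(line);
    }
    return out;
  }, []);
```

Run on the four-line loop above, it produces the single line `for n=0 to 62 :read q :poke 0x340+n,q :next`, ready for numbering and hex conversion.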

I put all my new tools to work cleaning up some existing BASIC game code. Refactoring a few programs turned up quite a bit of duplication. For example, there were fancy-looking rainbow text colors in a bunch of places - which meant manually inserting the color codes in between each text character. This made the text hard to read in the source code, and was tedious to update.

I knew what the answer was. And I knew I was at a crossroads. Obviously my language (I was now calling it "my language") needed macros. Obviously I shouldn't be wasting time adding macros to my language. And obviously I should get back to making the flippin' game!

...


So, macros in eBASIC are JavaScript functions that can be called from tags in the eBASIC source code. The return value of the function is injected in place of the calling tag. If the result is an array, then multiple lines will be injected. Here, for example, is a call to a macro inside a string that will be printed:

print "{{ rainbow("hello") }}"

The rainbow function expands to the following BASIC, where each character in the word hello is split and preceded by a different C64 color code:

10 print "{wht}h{red}e{cyn}l{pur}l{grn}o"

And the macro implementation is standard JavaScript. You can define the macros anywhere in the source file - I usually stick them at the bottom of the page. Here is the code for "rainbow-izing" a string:

{! rainbow = (str) => {
  // COLS is a magic list of C64 cols.
  const cols = COLS.slice(1);
  // Prefix each character with the next color code, cycling through cols.
  return str.split("")
    .map((c,i) => `{${cols[i%cols.length]}}${c}`)
    .join("");
} !}

Macro definitions are normal JS functions, wrapped in a curly-brace-with-exclamation-mark tag. Macro calls are standard JS function calls, wrapped in double-curly-braces. eBASIC finds all of the blocks of macro definitions - parses their name, arguments, and code body - then makes them into callable functions:

env[m.name] = new Function(...m.args, "with(this){ " + body + "}");

In a second pass of the source, any macro calls are expanded within a minimal "context" that has access to other macros, as well as some useful constants (well, one constant so far: COLS - which is a list of C64 color tokens).

const ctx = { COLS: ["blk", "wht", "red", ... ] };
const res = env[func].apply(ctx, args);
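To make the expansion step concrete, here is a stripped-down sketch. It is not the real eBASIC code: it evaluates the whole call expression instead of parsing out a function name and arguments, it only handles string results (no arrays), and `expandMacros`, the shortened `COLS` list, and the hand-written `env` are all stand-ins.

```javascript
// Hypothetical macro environment. In eBASIC the functions would be built
// from {! ... !} blocks; here they are plain JS for illustration.
const COLS = ["blk", "wht", "red", "cyn", "pur", "grn"]; // shortened list
const env = {
  COLS,
  rainbow: (str) => {
    const cols = COLS.slice(1);
    return str
      .split("")
      .map((c, i) => `{${cols[i % cols.length]}}${c}`)
      .join("");
  },
};

// Replace each {{ call }} tag with the result of evaluating the call against
// the macro table. Function-constructor bodies are non-strict, so the same
// with(this) trick works here.
const expandMacros = (line, macros) =>
  line.replace(/\{\{\s*(.+?)\s*\}\}/g, (tag, call) =>
    String(new Function("with (this) { return (" + call + "); }").call(macros))
  );

const expanded = expandMacros('print "{{ rainbow("hello") }}"', env);
// expanded: print "{wht}h{red}e{cyn}l{pur}l{grn}o"
```

Evaluating source text like this is only sensible because the eBASIC files are my own; it would be a terrible idea for untrusted input.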

"Necessary" creep: File includes

Being able to return an array from the macro means you can generate all sorts of source code. It allowed me to really clean up and de-dupe my programs - but it also spawned another problem. Now I needed to use my custom macros across multiple source files. Sigh. No point fighting it, just implement file includes:

{{ "./snippets/custom_chars.ebas" }}

File includes use the same syntax as macro calls, but contain just a single string - the name and path of the file to include. The contents will be split into lines, and injected wherever the tag is.

Because this step happens right at the beginning, the type of the content is not important: it will just be dumped directly, and then processed by subsequent steps - so it can be plain BASIC, more eBASIC, and even macros or asm (covered shortly). The include files are recursively processed: an include file can include include files of its own.

async function includify(lines) {
  return lines.reduce(async (acp, line) => {
    const ac = await acp;
    ...
    const txt = await readFile(name); // load file contents
    const inc = await includify(txt.split("\n"));
    return [...ac, ...inc]; // inject contents into source lines
  }, []);
}

Scope is bounding: Inline assembly compilation

Inline assembly. This idea had been percolating in my brain since the very beginning. I finally allowed myself to start this feature on the condition that I make it the absolutely last feature I would add before getting back to the game.

The goal was to have blocks of assembly that would be compiled with KickAssembler (my assembler of choice). The compiled bytes are converted to DATA statements that are POKEd into memory then called via SYS.

{{!"set screen colors via asm"
  *=$c000
  lda #0    // 0 = color black
  sta $d020 // set border color
  sta $d021 // set background color
  rts       // return from subroutine
!}}

This snippet of 6502 assembler sets the base location to $c000. The instructions set the background and border colors. After conversion they should become the following BASIC code:

10 rem set screen colors via asm
20 data 169,0,141,32,208,141,33,208,96
30 for za = 0 to 8
40 read zb:poke 49152+za,zb:next za
50 sys 49152

If you are unfamiliar with 6502 opcodes, the DATA statement on line 20 represents the assembled code above: LDA (LoaD the Accumulator) with an immediate value is opcode 0xA9 (169 decimal). The final command in the routine is RTS which is opcode 0x60, or 96 in decimal. The 9 bytes of the program are POKEd into memory from address $c000 (49152 decimal), and then executed with SYS 49152.
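Generating that loader from an array of bytes is the easy part. A rough sketch, assuming the bytes and load address are already in hand: the `asmToBasic` name and the 16-bytes-per-DATA-line split are inventions for this example, while the `za`/`zb` variables mirror the output above.

```javascript
// Hypothetical sketch: turn assembled bytes into BASIC loader lines
// (line numbers get added later by the numbering step).
const asmToBasic = (bytes, address, title = "", perLine = 16) => {
  const data = Array.from(bytes);
  const lines = title ? [`rem ${title}`] : [];
  // Split long routines over several DATA statements (perLine is a guess).
  for (let i = 0; i < data.length; i += perLine) {
    lines.push(`data ${data.slice(i, i + perLine).join(",")}`);
  }
  lines.push(`for za = 0 to ${data.length - 1}`);
  lines.push(`read zb:poke ${address}+za,zb:next za`);
  lines.push(`sys ${address}`);
  return lines;
};

const loader = asmToBasic(
  [169, 0, 141, 32, 208, 141, 33, 208, 96],
  0xc000,
  "set screen colors via asm"
);
// loader holds the five lines shown above, minus the line numbers.
```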

My script finds blocks of assembler wrapped in the proper (increasingly confusing) tags. Using a child process, it runs shell commands to compile. Unfortunately, KickAssembler doesn't work directly with stdin/stdout (as far as I can tell), so I write the block to a temp file, then compile using the temp file as source, and another as destination. The bytes are read back from disk, ready for POKE-ing.

Because I couldn't figure out how to convert the stream from cat to bytes in Node, I ended up piping the file through xxd and stripping the bytes out of the text conversion. Super dodgy.
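For what it's worth, the xxd detour may not be needed: Node's `fs.readFile`, called without an encoding, resolves to a `Buffer`, which is a `Uint8Array` of the raw bytes. The sketch below assumes the assembler output is a standard .prg, where the first two bytes are the little-endian load address; `parsePrg` is a made-up helper name and the file name in the comment is a placeholder.

```javascript
// Hypothetical sketch: split a .prg image into load address and code bytes.
// Accepts any Uint8Array, which includes Node's Buffer.
const parsePrg = (buf) => {
  if (buf.length < 2) throw new Error("Too short to be a .prg file");
  const address = buf[0] | (buf[1] << 8); // low byte first
  return { address, bytes: Array.from(buf.subarray(2)) };
};

// In the build script this might be used as (untested against KickAssembler):
//   const { address, bytes } = parsePrg(await fs.promises.readFile("out.prg"));
const sample = parsePrg(Uint8Array.from([0x00, 0xc0, 169, 0, 96]));
// sample.address is 49152 ($c000); sample.bytes is [169, 0, 96]
```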

There are a couple of extra features I included: you can give a block a title (that appears as a REM statement, and shows up in compilation/error output), and there's also a way to inject a block at the next byte, rather than needing to specify a memory address like *=$c000 manually for every block.

An unintentional feature: because it uses KickAssembler, all of its macros and functionality are available in the inline blocks. Which is a bit mind-bending. And as the assembly compilation happens early in the transformation - you can "include" the inline block from a different file - or wrap the include tag in an inline asm block tag so you can include full regular .asm files.


So far it seems to work ludicrously well: even for complex assembly programs using interrupts. It's so convenient, and just... fun. Some 8-bit systems of the day (such as the BBC Micro) had this as a default feature. I really wish it had been in C64 BASIC as it would have made such a great learning tool as a kid.

const includified = await includify(input);
const asmified    = await inlineASMify(includified);
const macrofied   = macrofy(asmified);
const decommented = decomment(macrofied);
const sameLinered = sameLinerize(decommented);
const numberized  = numberize(sameLinered);
const decimalized = decimalize(numberized);
const output      = stringify(decimalized);

Regrets?

None.

Sure, the scope creeped and creeped. I had started devising an AST for the language before I realized I had creeped too far.

And sure, I ended up pouring a bunch of time into something that even I probably won't need. But really, I have no explicit intention of actually ever releasing this game anyway. It's my game-dev equivalent of (*ahem* a sub-section of) jazz: more entertaining to produce than to consume. This game is turning into the funnest thing I've ever worked on, and creating eBASIC-to-BASIC just made it more fun.
