Monday, October 17, 2016

State of the RC2016/10


RC2016/10 is a little more than halfway over... so where am I?

Pretty much in the weeds.

I'm almost at the point where I wanted to be at the beginning of the challenge. I'm still working on finishing up the SD loader routines.  Well, I've finished the SD/Micro side of things (in the SSDD1 module - Serial SD Drive) both for the real physical drive as well as for the emulator, for loading only.  The emulation also supports writes now, and both support file/directory manipulations:  directory listings, make directory, remove directory/file. (For the FAT filesystem anyway.)

I've got a bunch of things on my plate right now, between work, finishing up a contract or two, the animatronic bird project has reappeared a bit, plus time for the family, lack of sleep, and general lack of motivation for anything.

I haven't done any of the sector IO stuff yet, as I want to get regular files working.

In any event, here's a quick bullet summary of the current state of the project:

Done:

  • Hardware for SD interface
  • Hardware for ROM/RAM switcher
  • SSDD1 process design (see image above)
  • SD drive (SSDD1) firmware (preliminary)
  • SSDD1 emulation for file and directory support
  • SSDD1 firmware for directory support and file reading
  • CP/M research to figure out what needs to be done, and what prior work exists.

ToDo:

  • SSDD1 firmware for file writing
  • SSDD1 firmware and emulation for simulated sector IO
  • Z80 SSDD1 decoder (Hex string+checksum to proper formatter)
  • Z80 IHX decoder (stream from SSDD1 to RAM writer)

Gameplan tasks (Roughly in order of probable completion):
  • Z80 IHX decoder
  • Load a ROM from the SD card into RAM, switch off the boot ROM, restart, run from RAM
  • Backport emulation into the firmware to get them equal
From here, I can go down one of two paths.  I can completely flesh out the rest of the SSDD1 and get the sector IO code implemented, which is probably the best course of action, just for completeness.  Or I could start working on porting/implementing the CP/M BIOS ROM, which might give me the "win" kick that I need to get the sector interface implemented.  Then again, once I have the regular file IO stuff done (which it is), the sector IO stuff is the same plus a bit more wrapper implementation... after all, it's just a bunch of 128 byte files in subdirectories... so...
  • Sector IO SSDD1 emulation
  • Backport Sector IO to SSDD1 firmware
  • CP/M Bios
  • CP/M Bootloader into LLoader ROM
  • Burn new LLoader ROM to 27C512 EPROM
  • Boot a RC2014 to CP/M!
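
For reference, the IHX decoder step in the lists above amounts to parsing Intel HEX records and writing their data bytes into RAM. The sketch below is a Python model of that idea, not the Z80 routine itself; the names `parse_ihx_record` and `load_ihx` are made up for illustration, and it only handles data (type 00) and end-of-file (type 01) records.

```python
def parse_ihx_record(line):
    """Decode one Intel HEX record into (record_type, address, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    # All bytes of a valid record, checksum included, sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    count = raw[0]
    if len(raw) != count + 5:
        raise ValueError("length mismatch")
    address = (raw[1] << 8) | raw[2]
    record_type = raw[3]
    data = raw[4:4 + count]
    return record_type, address, data


def load_ihx(lines, ram):
    """Write data records into ram; return True once the EOF record is seen."""
    for line in lines:
        if not line.strip():
            continue
        record_type, address, data = parse_ihx_record(line)
        if record_type == 0x01:          # end-of-file record
            return True
        if record_type == 0x00:          # data record
            for offset, value in enumerate(data):
                ram[(address + offset) & 0xFFFF] = value
    return False


ram = bytearray(0x10000)                 # hypothetical 64k RAM image
done = load_ihx([":03000000C30080BA", ":00000001FF"], ram)
print(done, ram[0:3].hex())
```

The first record in the usage lines carries the three bytes C3 00 80 (a Z80 `JP $8000`) at address $0000.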

Sunday, October 2, 2016

RC2014/LL and RC2016/10



The RetroChallenge is upon us again!

This time around, I plan on working on my Z80 homebuilt computer, the RC2014.  (Website for RC2014, Order a RC2014) For a few months now, I've had my RC2014 computer built, and modified to be an RC2014/LL computer. What this means is that I have some modified modules using no additional external hardware.

The above picture shows my RC2014/LL system with its extra RAM module, and the C0 Serial expansion board to the left, with the SD card interface board (SSDD1) on it.

The basics of this design:

Unmodified RC2014 modules:
  • Z80 CPU module
  • Clock module (* see below)
  • Serial console interface
  • RAM module (for RAM in the range $8000 through $FFFF)
Modified RC2014 modules:
  • Second RAM module
  • ROM module
  • Digital IO module
Additional hardware:
  • Second ACIA Serial port at $C0
  • SSDD1 (Serial SD Drive)
(*) While I list the clock module as unmodified, technically it is modified. I have added a 10 uF (50V) cap between the reset line and ground to be a quick-and-dirty power-on reset circuit. It works perfectly.  Every time I power on the computer, it "presses the reset button" for me. ;)

The plans to mod these parts are available here.  This is currently fully functional and tested. The modifications use unused gates on the boards, so that it requires no extra additional hardware or boards to implement. The basic theory to the /LL modifications are as follows:



Bit 0 (0x01) of the Digital IO module is tied to one of the extra bus lines on the backplane.  Let's call this "Extra-A".  When you do an "out" to that port of "0x01", it will trigger the Extra-A line.




The ROM module "out of the box" is configured such that if there is a memory access, which is a read from address $0000 through $7FFF, it will enable the ROM, and it will put its data on the bus.  My modification adds in one extra condition to this.  It adds in that only if the Extra-A line is LOW, that the ROM will be enabled.  This means that when Extra-A is 0, the ROM works. When Extra-A is 1, it's as though the ROM doesn't exist.



The RAM module "out of the box" sits at the high half of memory space ($8000-$FFFF), and is always enabled on memory READ or WRITE to those addresses.  The modification to this module is threefold.  First, it sits the RAM at the low half of memory space ($0000-$7FFF). Secondly, it is set up that WRITEs to this memory space will always work, regardless of Extra-A.  Thirdly, it is set up that READs to this memory space will ONLY work when Extra-A is HIGH.

The end result of this is that when Extra-A is low, reads in the low half of memory will come from the ROM.  When it is high, reads will come from RAM.  Writes ALWAYS go to RAM.
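
To make the decode rules concrete, here is a small Python model of the behaviour described above. It is only a sketch: the class name `LLMemory` and its methods are invented for illustration, and it assumes Extra-A is low at power-on so the ROM is visible first.

```python
class LLMemory:
    """Model of the RC2014/LL memory map as described in the text."""

    def __init__(self, rom_image):
        # 32k ROM window at $0000-$7FFF, padded with $FF (assumed fill value).
        self.rom = bytes(rom_image).ljust(0x8000, b"\xff")[:0x8000]
        self.ram = bytearray(0x10000)    # two 32k RAM modules, 64k total
        self.extra_a = 0                 # assumption: low at power-on

    def out_digital_io(self, value):
        """Bit 0 of the Digital IO port drives the Extra-A line."""
        self.extra_a = value & 0x01

    def read(self, addr):
        addr &= 0xFFFF
        if addr < 0x8000 and self.extra_a == 0:
            return self.rom[addr]        # Extra-A low: low reads come from ROM
        return self.ram[addr]            # otherwise reads come from RAM

    def write(self, addr, value):
        self.ram[addr & 0xFFFF] = value & 0xFF   # writes ALWAYS go to RAM


mem = LLMemory(b"\x11\x22")
mem.write(0x0000, 0xAA)       # lands in RAM, hidden behind the ROM for now
print(hex(mem.read(0x0000)))  # ROM byte
mem.out_digital_io(0x01)      # switch the ROM off
print(hex(mem.read(0x0000)))  # RAM byte written earlier
```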

This is quirky...

I admit this.  It means that you can (and I will) write a bootloader/monitor ROM that is enabled on power-on, will read from a mass-storage device and write into anywhere in RAM... It can load a 64 kbyte memory image into RAM, and then switch off the ROM and it will all work.  The quirkiness is that you cannot verify the loaded-in memory in the low range of RAM, since reads will come out of the ROM.   Obviously, you also need to do this routine completely out of registers, as your stack variables will get overwritten if you're not careful.

Anyway....

If you want it to behave like a stock RC2014, remove the jumper from the IO board to Extra-A, and instead add a jumper from ground to Extra-A.  I have added a switch to mine to force this in the case where the IO board comes up in the wrong state.

Additionally, I have added a second serial port, which basically follows the circuit for the first port, but it sits at $C0, and does not have an interrupt line wired to it.  The RX/TX on that one comes out to a FTDI-like pinout header, which is where my SSDD1 module plugs in.  The plans for this serial port are available here, and can be seen as the brown perf-board on the left of the topmost image of this post.

The SSDD1 module...


The Serial SD Drive module is the mass storage module that I've created for my Z80 to interface with. I know that I can push the FAT filesystem support onto the Z80, but that would require substantial effort.  I instead decided to go with the model where I have a smart serial-based device that you tell it "i want this file" and it sends it out.  Like a local BBS. ;)

The use of a serial-driven drive is not unprecedented.  It's somewhat modeled after the Commodore 64/Vic 20's "IEC" serial interface for floppy drives.  It also mirrors it in that the drive has some smarts in it to deal with the drive architecture.

I also went this route for the ultimate form of this computer, which is to run CP/M.  CP/M expects drives with a Drive+128 byte Sector layout.  While other Z80-CP/M computers implement this by having direct interface to the sectors on the spinny disk/cf/sd card, I will do it by having files on the SD card, for the most amount of flexibility.  There will be a "drives" folder on the card containing folders named "A", "B", "C", and so on.  These letters are the drives.  In each of those will be directories for the tracks, named "0000", "0001", etc and in each of those, files named "0000" "0001" "0002" etc. These files are the simulated sectors on the disks.  It will be easy to build virtual drives for other use out of this.  It also means that I won't need some special interface on a modern computer to talk with this.  I won't have to do 'dd' style transfers to get at the data... It's all just sitting on a FAT filesystem.
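
The drive/track/sector layout above maps directly onto a path on the card. The Python sketch below shows one way to build that path; `sector_path` is a hypothetical helper, and it assumes the four-digit track and sector names are zero-padded decimal, which the description does not actually specify.

```python
SECTOR_SIZE = 128  # CP/M sector size in bytes, as stated above


def sector_path(drive, track, sector, root="drives"):
    """Return the file path holding one simulated sector.

    drive is 0 for A, 1 for B, and so on. Zero-padded decimal names
    are an assumption; the real firmware may use another convention.
    """
    if not 0 <= drive < 26:
        raise ValueError("drive must be 0..25")
    if not (0 <= track <= 9999 and 0 <= sector <= 9999):
        raise ValueError("track and sector must fit in four digits")
    letter = chr(ord("A") + drive)
    return "{}/{}/{:04d}/{:04d}".format(root, letter, track, sector)


print(sector_path(0, 0, 1))    # drive A, track 0, sector 1
print(sector_path(1, 12, 3))   # drive B, track 12, sector 3
```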

The implementation of this module is based on an Arduino with a microSD breakout module.  The serial interface communicates directly with it, and it sends the content via serial back to the Z80 host.

The above picture shows the SSDD1 module off of the RC2014/LL expansion board, and instead wired to a breakout board where I have a second FTDI serial-USB interface so that I can debug the hardware more easily.

And that brings me to the current Retro Challenge...

My goals for this month are a few things to finish up this computer system...
  • Finish up the firmware for the SSDD1
    • Sector load/save support
    • File load/save of intel hex files
    • directory create, list, remove
  • Write the SSDD1 emulation for the emulator
  • Finish up the loader and burn it to a ROM
  • Write a CP/M bios that uses the SSDD1 interface
  • Build CP/M disk/sector files 
  • Play Zork
Stretch goals:
  • Extend the NASCOM BASIC to support the SSDD1 for loading and saving.


Friday, June 24, 2016

The RC2014 Computer: 1. Emulation


As you may know, I'm bigtime into Z80 computerey stuff.  For the past 20 years or so, I've been hacking Pac-Man ROMs, been maintaining the Ms Pac Disassembly, and have made my own Z80-Pac based programs over the years.

Fairly recently, I got a Kaypro II from a friend at interlock, and it worked perfectly, and looked brand new.  I felt like I couldn't hold on to it... "it belongs in a museum!" ...so I donated it to ICHEG/Strong Museum of Play.  But it helped whet my appetite for a CP/M computer.

Another project I've been wanting to do was to start with a Commodore 64 (I know it's not Z80, it's 6502ish), a floppy drive, some blank disks and a hardware manual and code up, from scratch, an OS.  Start out by making a text editor, assembler, GEOS-like GUI, etc.

These projects recently had an opportunity to overlap, and they all seem to converge on the RC2014 modular Z80 computer.

The RC2014 computer is a backplane-based modular computer created by Spencer Owen, based on the Z80SBC by Grant Searle.  There are modules for the CPU, RAM, ROM, Serial Terminal interface, and so on.  It is available as a kit from tindie.com.  Once assembled, you hook up a serial terminal to it, power it on, and you get a 1980s-esque BASIC prompt onto which you can write your 32kbytes of program.  This is based on Grant's simplified Z80 computer, so there is no off-line storage.

My general plan for the RC2014 is:
  1. Emulation:
    1. Create an emulator for the system to aid with rapid development
      1. also bring my "bleu-romtools" from Google Code to github
    2. Add a serial-based storage solution to the RC2014 emulation to confirm proof-of-concept
    3. Add ROM swap out to the emulation
    4. Add 32k of ram to give a full flat 64k of ram to the emulation
  2. Hardware:
    1. Get a RC2014 kit
    2. Build the RC2014 kit
    3. Make a test ROM to run on real hardware to verify my toolchain is working
    4. Create a new serial card that sits at port 0xC0 (second serial)
    5. Create the SD Drive firmware for the serial arduino
    6. Hack the ROM and Digital IO boards to allow for disabling the ROM
    7. Add 32k hacked RAM to the system
  3. Name: RC2014/LL
    1. At this point, the architecture is different enough and well defined enough that I think a new name for this configuration is in order. I call this configuration "RC2014/LL".
  4. Port CP/M
    1. Create the BIOS
    2. Create the sector-based emulation layer in the SD drive
    3. Boot CP/M
    4. Play Zork


I started out by making an emulator using the Z80 Pack emulation system.  Once I got this running, I realized the limitations of this emulator and looked around and found another emulator that suited my needs better. (I wanted a way to "swap" memory around, which Z80 Pack would not do in a way that wasn't a major hack.)

I created a layer that adds disableable memory regions, and added emulation of the 6850 ACIA serial chip, and threw the 32k RAM BASIC ROM at it, and it started right up, running BASIC!

I added a second 32k of RAM (easy to do when you're emulating it!), and started creating the SD interface, also using the 6850 ACIA for communications.  I then added a port, emulating the IO card, on which bit 0 (0x01), when set, will disable the ROM... so any reads to the low area of memory will then read from the RAM.  Whether this is set or not, all writes to that area of memory will actually happen to the RAM... they will just be hidden from reads until that bit is set.

I now have the basics of the RC2014/LL system emulated in software!  I created a boot/diagnostic ROM which can be used for all RC2014 systems which can probe memory to determine type (ROM, RAM, unpopulated), peek and poke memory, In and out IO, and other utility functions for the SD card interface.

Currently, I'm writing the SD card API, which I will port to the Arduino Leonardo and SD breakout board which I have ordered.  Once those come this weekend, I'll shove them into my own serial board, burn a test ROM, and see how it goes....

Sunday, February 21, 2016

Hacking my own Arduino Mega


At Interlock, I was handed the old controller board for a gutted 3D printer that was being rebuilt. "Do whatever you want with this." A close inspection of the board showed that it had a main microcontroller of the ATmega 1280, which is the chip used in older Arduino Megas.  The interface to USB however was an ATmega 8u2, which is the chip used in newer Arduino Megas, and you may also know it from older Arduino Unos... modern Uno R3s use a 16u2.

This board had custom firmware on it so that it didn't look like an Arduino, or any sort of serial connection to the host computer it's plugged into... so as-is, it was useless for general use as an Arduino; taking advantage of the GUI and clicky-clicky programmer interface.

So my thought was, it might be nice to have my own 'Mega for testing and such.  Could this board be set up in a way that might make this process and outcome easy?  Turns out it mostly was.


The original board got its power, 24V, from power terminals on the board.  It needed to power the stepper motors and such, so it needed to be beefy.  This was dropped down to 5V and 3.3V on the board itself.

There is a USB B jack for connecting this to a host computer, which did not have its 5V connected, so my thought was, what if I hooked up this 5V to the USB jack?  Would that be enough to power the chips?


I added this jumper, which connects the +5 on the USB jack to the 5v bus on the board, and plugged it in, and sure enough, it beeped and came to life without its host power supply.

Next up would be reprogramming the micros to have the arduino bootloader and code on them.


I hooked up my fairly cheesy Arduino D-15 (hacked stepper motor controller) ISP to the 6 pin header, which thankfully was already populated and labelled on the board!  I plugged it into the port labelled "1280 ISP", selected the Arduino Mega, with 1280 micro from the Arduino 1.6.6 menus, selected Arduino ISP for the programmer, then selected "Burn Bootloader".  In about a minute, it seemed to have completed successfully.... if something didn't jibe, it would spew out sync or device errors to the screen.  Seemed good so far!

Next, was hooking it up to the jack labelled 8u2 ISP.  This was a little trickier because I wasn't installing the bootloader (which the Arduino IDE makes REALLY easy to do), but rather the secondary micro's firmware, which basically was just a USB-Serial interface driver.

Long story short, I grabbed the 8u2 code from github, "MEGA-dfu_and_usbserial_combined.hex", and used the following command line (using a mixture of the code on that page, with the parameters that my system used via the Arduino IDE on my Mac):

    ./avrdude -p at90usb82 -F -cstk500v1 -P/dev/cu.usbserial-A800czia -b19200 -U flash:w:8u2.hex  -U lfuse:w:0xFF:m -U hfuse:w:0xD9:m -U efuse:w:0xF4:m -U lock:w:0x0F:m -C/Users/me/Library/Arduino15/packages/arduino/tools/avrdude/6.0.1-arduino5/etc/avrdude.conf

In short, it sets the CPU to at90usb82, uses the stk500v1 communications protocol over the /dev/cu.usbserial driver, at 19200 baud.... it programs the file 8u2.hex, sets fuses and sets other avrdude configuration stuff.

After lots of text scrolling by from running that, I was able to drop a program I was working on, onto it via the Arduino IDE directly, without any problems at all! I set the port to the serial Mega, set the board to "Arduino Mega", cpu set at "Mega 1280", clicked 'upload' and bam, fully functional serial communications from the serial monitor down through to the '1280 on the board.


Whoo! Free Arduino Mega for me!

Edit: Here's the pinouts of stuff I beeped out.

 * 4 - Piezo +
 * 6 - heat
 * 7 - fan
 *
 * 24 - A Dir
 * 25 - A Step
 * 26 - A Enable
 * 27 - A Pot
 *
 * 28 - B Dir
 * 29 - B Step
 *
 * 36 - debug 2
 * 37 - debug 3
 * 38 - (nc)
 * 39 - B Enable
 * 40 - debug 4
 * 41 - PG0
 * 42 - TP33 / Z-MAX
 * 43 - TP32 / Z-MIN
 * 44 - Extra +/R85
 * 45 - bp heat
 * 46 - TP31 / Y-MAX
 * 47 - TP30 / Y-MIN
 * 48 - TP29 / X-MAX
 * 49 - TP28 / X-Min
 *
 * A0 - X Dir
 * A1 - X Step
 * A2 - X Enable
 * A3 - X Pot
 *
 * A4 - Y Dir
 * A5 - Y Step
 * A6 - Y Enable
 * A7 - Y Pot
 *
 * A8  - Z Dir
 * A9  - Z Step
 * A10 - Z Enable
 * A11 - Z Pot
 *
 * A12 - PK4 / JP7
 * A13 - PK5 / JP7
 * A14 - PK6 / JP6
 *
 * A15 - TP27 / HBP Therm

The molex switch connectors seem to have the pinout: (signal) (ground) (ground) (+5v)

Monday, February 1, 2016

A (mostly) Finished 6502 LlamaCalc(ulator) (RC2016/1 Post-Mortem)


February 1st sees the end of RetroChallenge RC2016/1.  My entry for this month was to create a calculator for the Commodore/MOS KIM-1, by way of 6502 and the KIM-Uno emulation project.  I wanted to have a working somewhat-calculator running on the system, but more importantly, I wanted to learn 6502 assembler.

So let's see what my goals were for this RetroChallenge, as I set them out at the beginning of the project:
Starting today, I'm going to attempt to better learn 6502 asm in my copious amounts of free time for the RC2016/01 Retrocomputing Competition.  To prepare for this, over the past year I've gotten into working with Oscar Vermeulen's awesome KIM Uno kit, as well as pushing out my own updated firmware for it in the form of my Kim Uno Remix project on github.
...
For the challenge, I want to use this system to make a simple integer programmer's calculator which I can run on the KIM Uno itself.  Press keys to shift in the nibbles, then switch it into a mode where i can affect the data.  Convert hex to decimal, do bitshifts, add, multiply, etc.
In short, even though I didn't accomplish everything I outlined here, I feel like I was completely successful in the project.  The calculator application is incomplete according to the above feature set, but that wasn't really the goal of this whole thing. I wanted to learn 6502 Asm, which I did. (I didn't finish the BCD to HEX conversions, nor did I implement multiply/divide math functions.)

What were my problems?

I think that one thing that held me back was getting my head around doing multibyte math with only a carry bit. For some reason I got it into my head that this wasn't enough, which of course it is.

Another thing that kept me from getting everything done was that I spent a lot of time to understand the BCD/Hex algorithms.  The code that I used was ultimately very similar to sample code online, but I decided that I really wanted to understand how it worked, so I didn't put it in until that was true.

And of course, just the general lack of time because of various other things including: my daytime job, playing with my kid, two contracts to work on at home, being sick, etc.

What did I achieve?

Over the course of the month, I learned a lot about how to work with such a limited set of registers.  I came from Z80 world where you have a bunch of 16 bit registers.  6502 has one 8 bit accumulator (A), two 8 bit indexing registers (X,Y) which each can only be used for certain operations.

Most everything, seemingly, is done by interacting with memory locations, specifically those in the "zero page". The 6502 has this idea where the 16 bit address's top byte is the "page" of memory.  The memory in the zero page would be bytes from $0000 through $00FF.  This is generally used for OS and general use variables, etc., since there are small opcodes specifically for working with them.
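
As a quick illustration of the page idea, this Python sketch splits a 16 bit address into its page (top byte) and offset (bottom byte). The helper names are invented for the example.

```python
def split_address(addr):
    """Return (page, offset) for a 16 bit address."""
    addr &= 0xFFFF
    return addr >> 8, addr & 0xFF


def is_zero_page(addr):
    """True for $0000-$00FF, the range the short zero-page opcodes can reach."""
    page, _ = split_address(addr)
    return page == 0


page, offset = split_address(0x41FE)
print("page ${:02X}, offset ${:02X}".format(page, offset))
print(is_zero_page(0x00F1), is_zero_page(0x0200))
```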

I'm getting into too much detail. I'll instead outline all of my accomplishments for the month...

  • Learned 6502 ASM
  • Improved the "KIM Uno Remix" Desktop application (QT for portability)
    • Added a memory snooper
    • Better graphics palette
    • More speed support
  • Learned indexing (using X and Y registers)
  • Wrote the LlamaCalc input routine 3 times, learning 6502 opcodes better each time
  • Came up with a decent user interface for LlamaCalc that's somewhat learned-intuitive
  • LlamaCalc features implemented:
    • Display/UI states for LlamaCalc (Splash, Result, Menu, Error)
    • centralized interface for doing math functions, error handling, etc
    • 8 level stack of numbers to be used (changeable at build time)
    • Push/Pop stack functions
    • Hexadecimal to BCD conversion
    • bit shift left by one bit
    • bit shift right by one bit
    • 24 bit addition
    • 24 bit subtraction
  • Designed an RLE compression scheme for graphics
  • Added RLE decompressor to the source code projects
  • Played with optimizing screen display
  • Oh yeah, created a full repository for 6502 code, with libraries etc. on github
  • Every time I learned something new, I created another project in the Projects6502 repo

So yeah. I feel like I was successful...


I will soon have a walkthrough of using LlamaCalc using a KIM-Uno device.

Here's the source code for everything:

Monday, January 25, 2016

6502 - 24 Bit Math and a little BCD (RC2016/1)

I decided that another experiment/lesson to do on my way to making my calculator app was to learn how to do multibyte math and possibly experiment with BCD/Decimal vs Hexadecimal. (Or as Mark Watney calls them, "Hexidecimals".)

I kinda like breaking down this project into multiple "lessons" as it were.  It makes me feel like I'm following along lesson plans in a book.  Perhaps I should go the other way around and actually make the book I would be following if I were following a book to make this thing.

The code for this can be found in my github repository.

I broke down the application into a few main steps:

  1. display the last result
  2. add together the two previous results
  3. store that sum into a result variable
  4. repeat
When broken down further, we see that we also have to have some method for "kickstarting" it, as it were, since the first two numbers in the sequence do not follow the standard Fibonacci rule. (Quick reminder: each value in the Fibonacci sequence is the previous two values added together. Very simple.  For the first two values, there is no "previous two", so they are just hardcoded as "0, 1".)

Computation of the sequence can be described as :
  1. hardcoded "0"
  2. hardcoded "1"
  3. use algorithm to sum previous two values
  4. same as 3
  5. etc
For doing the math, I wanted to have variables that mimicked the 3 bytes we are able to display on the KIM, so I use 24 bits (3 bytes) to store them.  I broke down the math functions to be generic in that they can perform using two variables ("I" and "J") and store their result in a third variable "RESULT".  From there, additional functions were created to move the values around between them.  For example, we need to "roll" the values through if we want to make this repeatable. So the computation sequence can be described as:
  1. RESULT, I and J all set to '0'
  2. refresh display RESULT "0"
  3. RESULT gets "1"
  4. refresh display RESULT "1"
  5. shift the values through:
    1. J gets I's value
    2. I gets RESULT's value
  6. add:  RESULT gets the value from adding I and J
  7. repeat at step "4"
And this is basically the procedure as seen in the source.  
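
The same roll-and-add procedure can be modelled in a few lines of Python. This is a sketch of the steps listed above rather than a translation of the 6502 source; `fibonacci_24bit` is an illustrative name, and stopping when the sum no longer fits in 24 bits stands in for the carry check.

```python
MAX_24BIT = 0xFFFFFF  # largest value that fits in the three display bytes


def fibonacci_24bit():
    """Yield each displayed RESULT until the next sum overflows 24 bits."""
    i = j = 0
    result = 0
    yield result              # steps 1-2: everything is 0, display it
    result = 1                # step 3: RESULT gets 1
    while True:
        yield result          # step 4: display RESULT
        j = i                 # step 5.1: J gets I's value
        i = result            # step 5.2: I gets RESULT's value
        total = i + j         # step 6: add I and J
        if total > MAX_24BIT:
            return            # carry out of the third byte: stop
        result = total


values = list(fibonacci_24bit())
print(values[:8])
print("last value that fits: ${:06X}".format(values[-1]))
```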

The multibyte addition was actually a lot simpler than I thought it was going to be. My first thought was "how could this possibly work if i were to add like 100 to 100... you end up with "2" for the carry instead of "1"."  Obviously, you can see the error here, but for some reason this got stuck in my head and suddenly, all of the multibyte (16+ bit) math seemed near impossible to deal with.  I think it was the multiplication that seemed hard, but when you break it down as multistep additions instead of multiplications, it all makes sense.  I blame this on the cold and fuzzy head I have right now.  I'm just not thinking right... also extra time at work... sure... and um... an ARP storm.  all contributing factors to not thinking clearly. ;)

The basic procedure for doing multibyte math is to observe the carry bit.  The carry bit is set when math on two 8 bit values exceeds the 8 bit container.  If you think of it in decimal, when you add 1 to 9, you get "0" with a carry of "1" which ends up in the next digit space, resulting in a "10".  So if you were to add 99 and 04, you end up with 03 with a carry of 1, resulting in "103".  Math on the 6502 is no different, other than we're (probably) using hex where the value can go from 0-9,a-f rather than just 0-9 for each digit.  The math for addition is basically:
  1. for each byte (starting from the least significant on the right)
    1. add one byte to the other, with the carry bit from the previous byte
    2. store that result in the RESULT
Or, more precisely
  1. clear "Carry" (Carry = 0)
  2. register A gets I0 (A = I0)
  3. add J0 to A.  (A = A + J0 + Carry)
  4. store the result in RESULT0
  5. A = I1
  6. A = A + J1 + Carry
  7. RESULT1 = A
  8. A = I2
  9. A = A + J2 + Carry
  10. RESULT2 = A 
I think you can see that this can be carried out indefinitely for multiple bytes.
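
Here is the same add-with-carry chain as a Python sketch, with byte 0 as the least significant byte. The function name `add_multibyte` is made up for the example. Note that one byte position can never produce a carry bigger than 1, since the largest possible sum is 255 + 255 + 1 = 511.

```python
def add_multibyte(i_bytes, j_bytes):
    """Add two equal-length little-endian byte lists.

    Returns (result_bytes, carry_out), mirroring a CLC followed by
    one LDA/ADC/STA group per byte.
    """
    if len(i_bytes) != len(j_bytes):
        raise ValueError("operands must have the same number of bytes")
    carry = 0                          # clear Carry
    result = []
    for a, b in zip(i_bytes, j_bytes):
        total = a + b + carry          # A = I_n + J_n + Carry
        result.append(total & 0xFF)    # RESULT_n = low 8 bits
        carry = total >> 8             # Carry is 1 if the sum passed $FF
    return result, carry


# $0000FF + $000001 = $000100, no carry out of the third byte
print(add_multibyte([0xFF, 0x00, 0x00], [0x01, 0x00, 0x00]))
# $FFFFFF + $000001 overflows 24 bits, so the final carry is set
print(add_multibyte([0xFF, 0xFF, 0xFF], [0x01, 0x00, 0x00]))
```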

The "display the result" was pretty straightforward as well.  The "RESULT" bytes were stored into INH, POINTL and POINTH, and then the SCANDS function is called. This refreshes those three values out to the KIM's LED display.  Then a call to GETKEY stores the current key press value into the accumulator register.  If nothing is pressed, this fills A with $15, or KEY_NONE as I have it defined.  Then it just sits in a tight loop refreshing the display and waiting for any key to be pressed.
  1. refresh display
  2. check for key press
  3. no key press? repeat at 1
  4. return
So the end result is a program that advances to the next sequence number each time you press a key.

When it fills all the digits, that is, when we get a "carry" out of the third byte while doing the math, I display "EEEEEE" as a cheesy error display and wait for a press.  When something is pressed then, it resets and starts all over again.


As for BCD, I basically have run the code both in BCD (decimal) mode and hex mode, just to see how it works out.  Turns out I was worried for nothing.  It all 'just worked' fine in both modes.
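
The reason the same add chain works in both modes is that, in decimal mode, each byte add still produces one byte plus a single carry; only the digit arithmetic changes. The Python sketch below models an add of two packed-BCD bytes for valid BCD inputs only. It is an illustration (`adc_decimal` is an invented name), not a cycle-accurate model of the 6502's flag behaviour.

```python
def adc_decimal(a, b, carry):
    """Add two packed-BCD bytes plus carry; return (bcd_sum, carry_out).

    Assumes both inputs hold valid BCD digits (0-9 in each nibble).
    """
    low = (a & 0x0F) + (b & 0x0F) + carry
    half_carry = 0
    if low > 9:                # low digit overflowed past 9
        low -= 10
        half_carry = 1
    high = (a >> 4) + (b >> 4) + half_carry
    carry_out = 0
    if high > 9:               # high digit overflowed past 9
        high -= 10
        carry_out = 1
    return (high << 4) | low, carry_out


# 99 + 04 = 103: the byte holds "03" and the carry holds the hundreds digit
value, carry = adc_decimal(0x99, 0x04, 0)
print("{:02X} carry {}".format(value, carry))
```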

So yeah.  My throat is sore, and I'd love to just go to sleep right now.

Thursday, January 21, 2016

6502 - RLE Image Renderer (RC2016/1)


I finished up my RLE (Run-Length Encoded) image renderer last night.  It would have been much simpler but there were a few things that I wanted to deal with to have proper full support for sprite placement and large image rendering.

The basic concept of RLE is that instead of storing just a series of pixel colors, we also store the number of times each pixel is repeated.  As described in the previous post, we know that this hardware uses the lower nibble of each byte to store the color number.  We will use the upper nibble to indicate repetitions as well as other commands, which we'll get to later...

Using '0' for the number of repetitions makes no sense, so it will never be used when the image is encoded. (repeat "red" pixels 0 times? nope.) So we'll use '0' in the top nibble to indicate commands.  A few commands that we will need are:

$00 - End of image (stop rendering, return)
$0F - End of line (no more pixels on this line, start over vertically down one pixel from the start of this line)

Which leaves $01 through $0E, which we will use as a "skip".  Advance the screen position, but do not draw any pixels to the screen. We can use this to allow images to have "transparency".
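
Putting the three cases together (pixel run, command, skip), a decoder for this scheme can be sketched in Python as below. This models the byte format described above on a plain list-of-rows "screen"; the function name, the `width`/`height` parameters, and using `None` for untouched pixels are all choices made for the example, not part of the 6502 renderer.

```python
END_OF_IMAGE = 0x00
END_OF_LINE = 0x0F


def decode_rle(data, width, height):
    """Decode RLE bytes into rows of color numbers (None = not drawn)."""
    rows = [[None] * width for _ in range(height)]
    x = y = 0
    for byte in data:
        count, value = byte >> 4, byte & 0x0F
        if count:                          # top nibble 1-15: run of pixels
            for _ in range(count):
                if x < width and y < height:
                    rows[y][x] = value     # low nibble is the color number
                x += 1
        elif byte == END_OF_IMAGE:         # $00: stop rendering
            break
        elif byte == END_OF_LINE:          # $0F: back to line start, down one
            x, y = 0, y + 1
        else:                              # $01-$0E: skip (transparent)
            x += value
    return rows


# 3 pixels of color 1, skip 2, 1 pixel of color 2, next line, 2 pixels of color 5
for row in decode_rle([0x31, 0x02, 0x12, 0x0F, 0x25, 0x00], 6, 2):
    print(row)
```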

One thing to deal with was that after 255 bytes (at most), the referencing will go into another bank.  If everything fits in one bank, that's fine, but the screen itself is 4 banks, so this was something that needed to be addressed.  (HA! Addressed! I'm hilarious!)  If this isn't dealt with, and we only are incrementing the lower byte of a two byte address, we'll just keep reading (or writing) forever inside of one bank. $41FE, $41FF, $4100, etc  rather than $41FE, $41FF, $4200, $4201 ...

So basically instead of just incrementing the screen pointer by one, indirectly using
    inc IMGPTR    ; will wrap around inside a bank. bad.
I instead had to add a '1' to it, then add the carry bit onto the high byte of the value.  I need to take a step back here.  The 6502 only really has grasp of 8 bit (one byte) values.  It can use 16 bit values for addresses, stored as two bytes, but all math functions happen on the one-byte scale.
    clc          ; clear the carry bit  (Carry = 0)
    lda IMGPTR   ; A = *IMGPTR
    adc #$01     ; A = A + 1 + C
    sta IMGPTR   ; *IMGPTR = A
      ; at this point, the carry bit is either set or not,
      ; so we will add 0 into the next byte with carry
    lda IMGPTR+1 ; A = *IMGPTR+1
    adc #$00     ; A = A + 0 + C
    sta IMGPTR+1 ; *IMGPTR+1 = A
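
To see the difference between the two approaches, here is a Python model of both increments operating on a (low byte, high byte) pair. The function names are invented for the illustration.

```python
def inc_low_byte_only(lo, hi):
    """Model of a bare 'inc IMGPTR': the high byte never changes."""
    return (lo + 1) & 0xFF, hi


def inc_16bit(lo, hi):
    """Model of the clc/adc sequence: carry ripples into the high byte."""
    total = lo + 1
    lo, carry = total & 0xFF, total >> 8
    hi = (hi + carry) & 0xFF
    return lo, hi


def as_address(lo, hi):
    return "${:02X}{:02X}".format(hi, lo)


print(as_address(*inc_low_byte_only(0xFF, 0x41)))  # wraps inside the bank
print(as_address(*inc_16bit(0xFF, 0x41)))          # crosses into the next one
```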

Why use RLE? A couple reasons.  First of all, it will save ROM space.  The RLE encoded (compressed) images should take a bit less space inside the ROM.  An alternative would be to store color data in both nibbles of the byte, then just shift them out to the screen.  We would lose the ability for transparency, but packing guarantees 50% space savings, whereas the RLE savings depend on the image.
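
For comparison, the two-pixels-per-byte alternative mentioned above could look like the Python sketch below. It is only an illustration of that packing idea; the helper names are made up, and putting the first pixel in the upper nibble is an arbitrary choice.

```python
def pack_nibbles(pixels):
    """Pack 4-bit color numbers two per byte, first pixel in the high nibble."""
    padded = list(pixels) + [0] * (len(pixels) % 2)   # pad odd lengths with 0
    return bytes(
        ((padded[k] & 0x0F) << 4) | (padded[k + 1] & 0x0F)
        for k in range(0, len(padded), 2)
    )


def unpack_nibbles(packed, count):
    """Recover the first count pixels from packed bytes."""
    pixels = []
    for byte in packed:
        pixels.append(byte >> 4)
        pixels.append(byte & 0x0F)
    return pixels[:count]


packed = pack_nibbles([1, 2, 3])
print(packed.hex(), unpack_nibbles(packed, 3))
```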

The full source code for this project is over at github.

The image shown at the top of this post shows three sprites stored in the ROM.  They were hand-encoded from graph paper sketches of various sources.  The rainbow was just coded from scratch to test out everything.

The red ghost is obviously borrowed from Namco's "Pac-Man" arcade game.  The mouse is borrowed from Nintendo's "Goonies" arcade game. Both are used for educational/demonstrative purposes here.