VGA card development, tiled graphics mode part 1

With hardware-accelerated drawing functions added to improve performance in EmuTOS, the VGA card seemed to be in a good position for new functionality.

One of my long-term goals for the VGA card project has been to support “game” modes, specifically running at 320x240 resolution. While bitmap modes are possible, updating a full bitmap directly, even at 320x240, is beyond the capabilities of the 68000 CPU for anything interactive. Inspired by classic consoles and arcade systems from the 80s and 90s, I decided to go with a tile-based graphics mode.

Tile-based graphics is a rendering technique where the screen is composed of many small, reusable image blocks called tiles. The main benefit here is that to draw a full screen, instead of having to update every pixel, you just need to update a tile map and the VDP handles the screen drawing. That means a lot less work for the CPU, freeing it up for other tasks.

Initial attempt at implementing tile graphics mode #

For the tilemap mode, I thought this was a good starting point:

Tile and Map Specifications

Tile Size: 8x8 pixels
Map Dimensions: 64x64 tiles (512x512 pixels)
Colour Depth: 8-bpp (256 colours)
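
To put rough numbers on the specification above, here is a back-of-the-envelope calculation in Python. It only uses the figures already given in this post (320x240 target, 8x8 tiles, 64x64 map, 8-bpp); the variable names are mine and purely illustrative.

```python
# Back-of-the-envelope sizes derived from the specification above.
SCREEN_W, SCREEN_H = 320, 240   # target resolution in pixels
TILE = 8                        # tiles are 8x8 pixels
MAP_W = MAP_H = 64              # map size in tiles
BYTES_PER_PIXEL = 1             # 8-bpp

bitmap_bytes = SCREEN_W * SCREEN_H * BYTES_PER_PIXEL      # full-screen bitmap
visible_tiles = (SCREEN_W // TILE) * (SCREEN_H // TILE)   # map cells on screen
map_entries = MAP_W * MAP_H                               # cells in the whole map
tile_bytes = TILE * TILE * BYTES_PER_PIXEL                # storage for one tile

print(bitmap_bytes)   # 76800
print(visible_tiles)  # 1200
print(map_entries)    # 4096
print(tile_bytes)     # 64
```

Redrawing a whole screen through the tile map touches 1,200 map cells rather than 76,800 pixels, which is where the saving for the 68000 comes from.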

To test the tile mode, I wanted a well-known, recognisable image. Arcade games used tile maps heavily, so I took a section of the Ghouls n Ghosts map from vgmaps.com.

Using Pro Motion NG, I converted part of the map into an 8x8 tile image and corresponding map data. This provided a solid reference for initial development.

I created some quick tools to convert the tile image data into a format suitable for the VRAM, and added a dual-port RAM in the FPGA to store the map data.
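
As an illustration of what such a conversion tool might do, here is a minimal Python sketch. The actual VRAM format is not described in this post, so the layout here is an assumption: each 8x8 tile is stored as 64 consecutive bytes, row by row, with tiles taken left to right and top to bottom from the tile image. The function name `tiles_to_vram_bytes` is hypothetical.

```python
TILE = 8  # tile edge in pixels

def tiles_to_vram_bytes(pixels):
    """Flatten a tile image (rows of 8-bit palette indices) into tile-major bytes.

    Assumed layout: 64 consecutive bytes per tile, row-major within the tile.
    """
    height = len(pixels)
    width = len(pixels[0])
    if height % TILE or width % TILE:
        raise ValueError("image dimensions must be multiples of the tile size")
    out = bytearray()
    for ty in range(0, height, TILE):        # tile rows, top to bottom
        for tx in range(0, width, TILE):     # tiles within a row, left to right
            for y in range(TILE):            # the 8 pixel rows of one tile
                out.extend(pixels[ty + y][tx:tx + TILE])
    return bytes(out)

# Tiny synthetic tile image: two tiles side by side, filled with 1s and 2s.
sheet = [[1] * TILE + [2] * TILE for _ in range(TILE)]
vram = tiles_to_vram_bytes(sheet)
print(len(vram))  # 128
```

With this layout the start of any tile is simply its index multiplied by 64, which keeps the address calculation on the hardware side cheap.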

Fetching the tile data in real time, as each pixel is displayed, is not going to work, so the idea was to use a similar method to the framebuffer mode: start populating a FIFO buffer with data at the start of the horizontal blank at the end of a line.

For each pixel, it would look up the tile index based on the expected display coordinate, fetch the data from VRAM and add it to the display FIFO.

Displaying each tile on screen involves several steps:

Lookup tile index from the tile map
Fetch the tile data from VRAM
Get the pixel from the tile data
Write the pixel data to the FIFO

graph LR; A[Index lookup]-->B[VRAM fetch]; B-->C[Tile to pixel]; C-->D[Pixel to FIFO];
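
The four steps above can be modelled in software. The Python sketch below is not the VHDL running on the card, just a reference model of the address arithmetic. It assumes a tile map stored row by row (64 entries per row), an 11-bit tile index (matching the `10 downto 0` slice used in the VHDL later in this post) and 64 consecutive bytes per tile in VRAM. The function name `fetch_line` is hypothetical.

```python
TILE = 8        # tile edge in pixels
MAP_W = 64      # map width in tiles
SCREEN_W = 320  # pixels per display line

def fetch_line(y, tile_map, vram):
    """Return the 320 pixels of display line y (the contents pushed to the FIFO)."""
    fifo = []
    for x in range(SCREEN_W):
        # 1. Index lookup: which map cell covers this pixel?
        map_addr = (y // TILE) * MAP_W + (x // TILE)
        tile_index = tile_map[map_addr] & 0x7FF  # keep the low 11 bits
        # 2. VRAM fetch: assumed layout of 64 consecutive bytes per tile
        vram_addr = tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)
        # 3. Tile to pixel, 4. pixel to FIFO
        fifo.append(vram[vram_addr])
    return fifo

# Tile 0 is blank, tile 1 holds the values 0..63 so every pixel is identifiable.
vram = bytes(64) + bytes(range(64))
tile_map = [0] * (MAP_W * MAP_W)
tile_map[1] = 1                      # second cell of the first map row uses tile 1
line = fetch_line(2, tile_map, vram)
print(line[8:16])  # [16, 17, 18, 19, 20, 21, 22, 23]
```

A model like this is handy for generating expected line data to compare against simulation output.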

The VGA timing is still for 640x480, but the tile map is targeting 320x240 resolution, so each pixel needs to be doubled on both the X and Y axes.

At this stage, doubling the pixels on the X axis simply means displaying each pixel from the FIFO twice. To double the pixels vertically, it currently reads the data and adds it to the FIFO twice, once for each of the two VGA rows.
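
The doubling described above boils down to dividing the VGA coordinates by two. A small Python sketch of that mapping follows; the function names are hypothetical.

```python
def double_line(line):
    """Horizontal doubling: emit every source pixel twice."""
    out = []
    for pixel in line:
        out.extend((pixel, pixel))
    return out

def source_line(vga_y):
    """Vertical doubling: two consecutive VGA lines share one 320x240 source line."""
    return vga_y // 2

doubled = double_line([10, 20, 30])
print(doubled)           # [10, 10, 20, 20, 30, 30]
print(source_line(479))  # 239
```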

Initial output and timing issues #

After a few false starts and some bug fixing, I managed to get something that was starting to look like a tiled image.

I eventually arrived at a nice stable image: not perfect, but starting to look more like the output that I was expecting.

The main issue in this image is that the first tile row actually starts around a third of the way into the image, and the last row is repeated at the top of the screen. Other than that, the image looked like a good starting point and gave a sense of optimism that the code was moving in the right direction.

Fixing the image #

…and starting the curse of off-by-one clock cycles #

My initial thought on fixing the image was that the FIFO wasn’t starting at the correct point. With some work on the timings, I managed to get something that looked a bit more reasonable but still had issues: the image still looked as if it was starting to draw the row too late. Convinced this was an issue with the FIFO, I replaced the FIFO with a line buffer for the row. This has two benefits. It gives deterministic positions, in that reading pixel 0 should always return the correct position, and having a full line buffered means that it no longer needs to fetch the data twice to duplicate the row vertically.

Unfortunately, this didn’t resolve the problem: the start of the image was still showing the wrong tile data.

This is where the, for lack of a better term, “interesting” debugging process started. The tile at index 0 was wrong, but there were a few moving parts to the puzzle. Was the CPU side writing the correct data? To rule that out, I initialised the tile map RAM with known data, then in simulation I also initialised the SRAM with known data. With this, everything looked good in simulation, but on real hardware the problem still occurred. After a lot of tests writing test data to the RAM and tile maps, and many other debugging steps, I forced the map index for the first tile in the row to a known value as a test, and it showed up as expected.

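  -- Debug only: force a known tile index to rule out the tile map contents
  if tile_row = 0 then
      -- tile_index is unsigned, so use an aggregate rather than the integer literal 0
      tile_index <= (others => '0');
  else
      tile_index <= unsigned(tm_rd_data(10 downto 0));
  end if;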

This shows the first tile row picking up the forced tile. There are also pixels in front of the tile which shouldn’t be there, but that’s a separate issue for the moment. The tile I was expecting to be shown first in the line was actually appearing as the second tile, so something was not working here.

Simulation vs real hardware #

I had assumed that the dual-port RAM in the FPGA’s BRAM had a read latency of a single clock cycle, and in simulation it looked as if it was doing exactly what was expected. When fetching the tile map index, the code set the address for the tile location and read the data back on the next clock cycle.

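    when S_FETCH_TM =>
        -- Present the tile map address to the dual-port RAM
        tm_rd_addr <= std_logic_vector(resize(tm_addr, tm_rd_addr'length));
        state      <= S_WAIT_TM;

    when S_WAIT_TM =>
        -- Sample the read data on the very next clock (too early on real hardware)
        tile_index <= unsigned(tm_rd_data(10 downto 0));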

Adding an extra wait state to the tile map index fetch fixed the problem.

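    when S_FETCH_TM =>
        -- Present the tile map address to the dual-port RAM
        tm_rd_addr <= std_logic_vector(resize(tm_addr, tm_rd_addr'length));
        state      <= S_WAIT_TM;

    when S_WAIT_TM =>
        -- Extra wait state: give the RAM one more clock before sampling
        state      <= S_WAIT_TM2;

    when S_WAIT_TM2 =>
        -- Read data is now valid on real hardware
        tile_index <= unsigned(tm_rd_data(10 downto 0));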

Now the tiles are showing in the correct position. This was the first of many issues where I had to add an additional wait state to fix an off-by-one clock problem. There were still a few stray pixels at the start of the line, but those turned out to be a similar issue where I had to insert an additional wait state, this time in the SRAM controller. This is a situation where real hardware does something different from the simulated version.
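
To show why sampling one clock too early moves every tile along by one position, here is a toy cycle-level model in Python. It is only an illustration: this post does not pin down the exact cause of the extra latency on hardware, so the latency of 2 used below is an assumption, and `SyncRam` and `fetch` are hypothetical names rather than anything from the real design.

```python
class SyncRam:
    """Toy synchronous RAM: read data appears `latency` clock edges after the address."""

    def __init__(self, contents, latency):
        self.contents = contents
        self.addr = 0
        self.pipe = [None] * latency  # values in flight; pipe[-1] drives rd_data

    @property
    def rd_data(self):
        return self.pipe[-1]

    def tick(self):
        # On each clock edge the word at the current address enters the pipeline.
        self.pipe = [self.contents[self.addr]] + self.pipe[:-1]

def fetch(ram, addr, wait_states):
    """Present an address, wait the given number of clocks, then sample rd_data."""
    ram.addr = addr
    for _ in range(wait_states):
        ram.tick()
    return ram.rd_data

tile_map = [10, 11, 12]

# Assumed hardware behaviour: two clocks of latency, but the FSM waits only one.
ram = SyncRam(tile_map, latency=2)
too_early = [fetch(ram, a, wait_states=1) for a in range(3)]
print(too_early)  # [None, 10, 11]

# Same RAM with the extra wait state added.
ram = SyncRam(tile_map, latency=2)
fixed = [fetch(ram, a, wait_states=2) for a in range(3)]
print(fixed)      # [10, 11, 12]
```

In the first run each fetch returns the data for the previous address, which matches the symptom of the expected first tile showing up as the second one.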

Scrolling the map #

With the tiles now showing correctly, the next step was to implement scrolling.

The map data filled the full 64 tiles on the X axis, but on the Y axis it was only 30 tiles, so I duplicated the top of the map just to have data to test.

Scrolling in the Y direction was a simple process: it’s just adding the Y scroll value to the current line value before fetching the tile. Per-pixel scrolling on the X axis is a little more complex because, if the starting position is in the middle of a tile, it has to fetch an additional tile for the row. After the scroll values are added, the X and Y values are wrapped at 512 pixels so the display wraps around the tile map data.
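
The scroll arithmetic can be sketched in a few lines of Python. This is a model of the calculation rather than the VHDL itself. It assumes the wrap is done by masking to 9 bits, which works because 512 is a power of two, and the function names are hypothetical.

```python
TILE = 8            # tile edge in pixels
MAP_PIXELS = 512    # 64 tiles * 8 pixels
SCREEN_W = 320      # visible pixels per line

def scrolled_coord(screen_pos, scroll):
    """Add the scroll offset and wrap around the 512-pixel map."""
    return (screen_pos + scroll) & (MAP_PIXELS - 1)

def tiles_for_line(scroll_x):
    """Number of tiles that must be fetched to cover one 320-pixel line."""
    first = scroll_x // TILE
    last = (scroll_x + SCREEN_W - 1) // TILE
    return last - first + 1

print(scrolled_coord(300, 300))  # 88
print(tiles_for_line(0))         # 40
print(tiles_for_line(4))         # 41
```

When the X scroll is a multiple of 8 the line covers exactly 40 tiles; any other offset straddles a partial tile at each end, so 41 are needed.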

After getting the tile display working reliably, this was a logical extension of what the code was already doing and quick to implement.

Conclusion #

Implementing a tile-based graphics mode for the VGA card was a challenging but eventually rewarding process. The experience highlighted the value of simulation, but also the pitfalls of hardware timing and where simulation can fail you.

This gave me a good starting point and a solid base to work with, but things then went in an unexpected direction, to be continued in part 2.