# BameGoy ### Nintendo GameBoy Hardware Emulator ### GameBoy Specs **SoC**: Nintendo DMG-CPU (Sharp LR35902) **CPU**: 4.194304 MHz Sharp SM83 **Screen**: 45.5mm x 41.5mm Memory: 64 KB - On SoC: 256 B "bootstrap" ROM, 127 B High RAM - Internal: 8 KB RAM, 8 KB Video RAM - External: (in cartridge) up to 1 MB ROM, up to 128 KB RAM **Resolution**: 160 (w) × 144 (h) pixels (10:9 aspect ratio) - 4 possible colors of gray **Power**: 70–80 mAh, 4 x AA batteries I/O: joypad, audio, graphics, LCD ### GB-Z80 Specs **GB-Z80:** Using modified open-source Z80, also operating at 4.19 MHz **Timing**: Instructions in multiples of 4 cycles **Interrupts**: V-Blank, LCD controller, timer, serial, and joypad **I/O**: performed through memory load/store instructions **Modifications:** for this specific usage, there were modifications, shown in the table on the right. | Opcode | Z80 | GB CPU | |-------------|------------------------------|----------------| | 08 | EX AF,AF | LD (nn),SP | | 10 | DJNZ PC+dd | STOP | | 22 | LD (nn),HL | LDI (HL),A | | 2A | LD HL,(nn) | LDI A,(HL) | | 32 | LD (nn),A | LDD (HL),A | | 3A | LD A,(nn) | LDD A,(HL) | | D3 | OUT (n),A | - | | D9 | EXX | RETI | | DB | IN A,(n) | | | DD | <ix> prefix</ix> | - | | E0 | RET PO | LD (FF00+n),A | | E2 | JP PO,nn | LD (FF00+C),A | | E3 | EX (SP),HL | - | | E4 | CALL P0,nn | - | | E8 | RET PE | ADD SP,dd | | EA JP PE,nn | | LD (nn),A | | EB | EB EX DE,HL - | | | EC | CALL PE,nn | - | | ED | <pre><prefix></prefix></pre> | - | | F0 | RET P | LD A, (FF00+n) | | F2 | JP P,nn | LD A,(FF00+C) | | F4 | CALL P,nn | - | | F8 | RET M | LD HL,SP+dd | | FA | JP M,nn | LD A,(nn) | | FC | CALL M,nn | | | FD | <iy> prefix</iy> | - | | CB 3X | SLL r/(HL) | SWAP r/(HL) | ### Memory #### 64 KiB Memory allocated as follows: # System Block Diagram ### PPU ### PPU Modes During a scanline, PPU enters specific modes with distinct functions: Mode 2: OAM Scan - PPU searches OAM memory for sprites and stores in buffer - 80 T-cycles (PPU checks new OAM every 2 T-cycles) Mode 3: Drawing PPU transfers pixels to LCD Mode 0: H-Blank - "Padding" for remainder of scanline til reaches total of 456 T-cycles - PPU "paused" Mode 1: V-Blank - "Padding" similar to H-Blank - Takes place at the end of every frame for longer duration # PPU Modes & Timing # Object Attribute Memory (OAM) GameBoy can display up to 40 moveable objects/sprites - Maximum of **10** objects per scanline - Each object consist of 4 bytes - PPU compares (LCDC bit 2) to determine sprite height Object attributes in memory at \$FE00-\$FE9F #### Writing to OAM: - Data is written to a buffer in WRAM - WRAM copied to OAM - Direct OAM writing only works during HBlank or VBlank OAM Sprite Memory SFE00 - SFE9F Byte 0: Y-Position Actual Y position on screen = Byte 0 - 16 Byte 1 - X-Position: Actual X position on screen = Byte 1 - 8 Byte 2 - Tile Number: Unsigned 8 bit integer used for Sprite fetching. #### Byte 3 - Sprite Flags: | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 | |-----------------------|--------|--------|-------------------|---------------------------------------|----------------------|-------------|-------------| | OBJ-to-BG<br>Priority | Y-Flip | X-Flip | Palette<br>Number | Mode 0<br>STAT<br>Interrupt<br>Enable | Coincide<br>nce Flag | PPU<br>Mode | PPU<br>Mode | Bit 7: OBJ-to-BG Priority 0 = Sprite is always rendered above background 1 = Background colors 1-3 overlay sprite, sprite is still rendered above color 0 Bit 6: Y-Flip If set to 1 the sprite is flipped vertically, otherwise rendered as normal Bit 5: X-Flip If set to 1 the sprite is flipped horizontally, otherwise rendered as normal Bit 4: Palette Number If set to 0, the OBP0 register is used as the palette, otherwise OBP1 Bit 3-0: CGB-Only flags ### Object Attribute Memory (OAM) #### **OAM DMA Transfer:** Writing to \$FF46 – DMA initiates DMA transfer from WRAM to OAM #### Selection priority: - Only selects first (up-to) 10 objects to be drawn; can apply to off-screen objects - Setting Y=0 would therefore "hide" sprite #### **Drawing priority:** - Priority given to smaller X coordinate - If X identical, sprite located first in OAM has higher priority ### PPU Block Diagram ### Background Fetching Fetching pixels takes 2 T-Cycles to complete, with the process: - 1. Fetch Tile No. - a. For background pixel offsets tile data by 32 \* (((LY + SCY) & OxFF) / 8) - 2. Fetch Tile Data (Low) - a. Fetches first byte of tile data with offset of 2 \* ((LY + SCY) mod8) - 3. Fetch Tile Data (High) - a. Same as Low, except next byte is read and stored - 4. Push to FIFO - a. Only executes if FIFO is fully empty - Step 4 usually restarts twice before pushing Steps 1-3 take 6 T-cycles, PPU takes 8 T-cycles to shift out all 8 pixels ### Pixel FIFO Individual pixels are pushed to the LCD one by one - Each pixel holds the information of color, palette, sprite priority, and background priority - The **pixel fetcher** is responsible for loading the FIFO registers Background and Pixel FIFO are merged when pushed to LCD ### Background Scrolling SCX register allows for scrolling background on a per-pixel basis - While shifting pixels out of background FIFO, start of scanline **SCX mod 8** pixels are discarded - Simultaneously, per-tile horizontal scrolling is handled with fetching process - Results in PPU Mode 3 extending by SCX mod 8 cycles ### Sprite Fetching The Sprite Fetcher works very similarly to the background fetcher: - 1. Fetch Tile No. - 2. Fetch Tile Data (Low) - 3. Fetch Tile Data (High) - 4. Push to FIFO - Tiles taken from \$8000-\$87FF and unsigned numbering If (X-Position of any sprite in buffer) ≤ (current Pixel-X-Position + 8), sprite fetch is initiated: - resets the Background Fetcher to step 1 and temporarily pauses it - pixel shifter to the LCD is also suspended ### Sprite Timing #### Once sprite fetch is completed - PPU starts pushing pixels to LCD - Background fetcher is restarted - Delay occurs if <6 pixels remaining in Background FIFO (delay=6 REMAINING\_PIXEL\_COUNT) #### Sample timing diagram: # Sprite Timing #### Example of tiles creating background map: ### Tile Data Tile data stored in **VRAM** in **\$8000-\$97FF** Tiles can be displayed as part of **Background/Window** and/or moveable **Sprites** #### \$8000 Method: - \$8000: base pointer - TILE\_NUMBER: unsigned 8-bit integer - Add \$8000 to (TILE\_NUMBER \* 16) #### \$8800 Method: - \$9000: base pointer - SIGNED\_TILE\_NUMBER: signed 8-bit integer - Add \$9000 to (SIGNED\_TILE\_NUMBER \* 16) ``` abcdefgh ijklmnop ia jb kc ld me nf og ph 00111100 01111110 00 10 11 11 11 11 10 00 $3C $7E 01000010 01000010 00 11 00 00 00 00 11 00 $42 $42 01000010 01000010 00 11 00 00 00 00 11 00 $42 $42 00 11 00 00 00 00 11 00 $42 $42 01000010 01000010 01011110 00 11 01 11 11 11 11 00 $7E $5E 00001010 00 01 01 01 11 01 11 00 01111110 $7E $0A 01010110 00 11 01 11 01 11 10 00 01111100 $7C $56 $38 $7C 00111000 01111100 00 10 11 11 11 10 00 00 abcdefgh ijklmnop ia ib kc ld me nf og ph Sample tile data ``` ### Tile Map Game Boy contains two 32×32 tile maps in VRAM at \$9800-\$9BFF and \$9C00-\$9FFF Each tile map has 1 byte indexes of tiles to be displayed - Tiles obtained through address from VRAM selected via LCDC register #### Background-to-Object Priority - BG/OBJ priority declared in **3** places: BG Map bit 7, LCDC bit 0, OAM bit 7 - Sample priority table shown below: | LCDC bit 0 | OAM attr bit 7 | BG attr bit 7 | Priority | |------------|----------------|---------------|-----------------------------| | 0 | 0 | 0 | OBJ | | 0 | 0 | 1 | OBJ | | Ø | 1 | 0 | OBJ | | Ø | 1 | 1 | OBJ | | 1 | 0 | 0 | OBJ | | 1 | 0 | 1 | BG color 1–3, otherwise OBJ | | 1 | 1 | 0 | BG color 1–3, otherwise OBJ | | 1 | 1 | 1 | BG color 1–3, otherwise OBJ | # Interrupt Handling | VBlank | INT \$40 | Requested every time Game Boy enters VBlank | | |--------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------|--| | STAT | INT \$48 | Can be triggered by various sources | | | Timer | INT \$50 | Requested every time the timer overflows | | | Serial | INT \$58 Requested upon completion of a serial data transfer (8 serial clock cycles after starting transfer) | | | | Joypad | INT \$60 | Requested when a button is pressed | | ### Color Rendering The GameBoy is capable of 4 colors - Each color takes 2 bits - Each tile in tile data set is held in 16 bytes. Each color is achieved with the equation: - (Each bit of first byte) + (bit in same position on second byte) => calculates color number ### Scrolling GameBoy scrolling is 160x144 pixels, but background map is 256x256 pixels $\rightarrow$ **scrolling** - Background is defined at the top-left of screen - By moving this point between frames, the background can scroll - Top-left is defined by registers **Scroll X**, **Scroll Y** ### Timing Proper timing ensure proper synchronization with the CPU, frame rate and refresh rate, memory access, and rendering pipelines (For testing) - Calibrated clock cycles using gtkwave - Example final timing diagram of PPU ### Timing #### Example timing diagram of memory mechanism - "dots [8:0]" represents instructions processed by the PPU, for the sake of testing - System stalls during testing but not deployment - "dot" increments, then moves onto the next instruction. # PPU Testing Using interactive test benches, gradually built up capabilities of PPU - Example images of initial PPU testing - Found issues with timing and synchronization # PPU Testing Tile map drawn from one tile Tile map drawn from multiple tiles # PPU Testing **Sprites implemented** Sprites implemented and handles LCDC flags, window, multiple BG maps, and scrolling GameBoy Video: 160x144 pixels VGA: 1280 x 1024 pixels @ 60Hz VGA Game Window: **960 x 864 pixels** (centered on screen) - GameBoy pixel data written into framebuffer @ 4MHz - VGA reads framebuffer with pixel frequency @ 108 MHz - One pixel every cycle - Referenced Lab 3 as a VGA scaling resource - Background set to Pantone 292! | General timing | | | | |---------------------|---------------------|--|--| | Screen refresh rate | 60 Hz | | | | Vertical refresh | 63.981042654028 kHz | | | | Pixel freq. | 108.0 MHz | | | | | - | | | # Joypad #### **NES Controller** - Based on states of output pins (P14-15) and input pins (P10-13), CPU can identify a button press - Configured to our CPU using libusb (referenced lab2) ### Cartridge Obtained Tetris cartridge from online open source Cartridge header (\$0100-\$014F) provides the information for running a game, namely: - 0147 Cartridge type - 0148 ROM Size - 0149 RAM Size Cartridge ROM loaded to On-Chip RAM on DE1-SoC # PPU Compilation #### Analysis and Synthesis Resource Utilization of PPU | Analysis | Analysis & Synthesis Resource Utilization by Entity < | | | | | | |---------------------------|--------------------------------------------------------|---------------------|---------------------------|--|--|--| | < <filte< th=""></filte<> | | | | | | | | | Compilation Hierarchy Node | Combinational ALUTs | Dedicated Logic Registers | | | | | 2 | ▼ [PPU3:GB_PPU] | 734 (729) | 488 (456) | | | | | 1 | PPU_SHIFT_REG:bg_fifo | 3 (3) | 16 (16) | | | | | 2 | [PPU_SHIFT_REG:sp_fifo] | 2 (2) | 16 (16) | | | | | Flow Summary | | |---------------------------------|---------------------------------------------| | < <filter>&gt;</filter> | | | Flow Status | Successful - Sat Aug 3 15:18:00 2024 | | Quartus Prime Version | 21.1.0 Build 842 10/21/2021 SJ Lite Edition | | Revision Name | soc_system | | Top-level Entity Name | soc_system_top | | Family | Cyclone V | | Device | 5CSEMA5F31C6 | | Timing Models | Final | | Logic utilization (in ALMs) | 3,576 / 32,070 ( 11 % ) | | Total registers | 2493 | | Total pins | 356 / 457 ( 78 % ) | | Total virtual pins | 0 | | Total block memory bits | 705,792 / 4,065,280 ( 17 % ) | | Total DSP Blocks | 0/87(0%) | | Total HSSI RX PCSs | 0 | | Total HSSI PMA RX Deserializers | 0 | | Total HSSI TX PCSs | 0 | | Total HSSI PMA TX Serializers | 0 | | Total PLLs | 1/6(17%) | | Total DLLs | 1 / 4 ( 25 % ) | | Ana | lysis & Synthesis Resource Usage Summary | | |-----|--------------------------------------------------------|-----------------------------------------------------------------------------------| | | <filter>&gt;</filter> | | | | Resource | Usage | | 1 | Estimate of Logic utilization (ALMs needed) | 3694 | | 2 | | | | 3 | <ul> <li>Combinational ALUT usage for logic</li> </ul> | 5620 | | 1 | 7 input functions | 115 | | 2 | 6 input functions | 1518 | | 3 | 5 input functions | 1256 | | 4 | 4 input functions | 1102 | | 5 | <=3 input functions | 1629 | | 4 | | | | 5 | Dedicated logic registers | 2211 | | 6 | | | | 7 | I/O pins | 356 | | 8 | I/O registers | 186 | | 9 | Total MLAB memory bits | 0 | | 10 | Total block memory bits | 705792 | | 11 | | | | 12 | Total DSP Blocks | 0 | | 13 | | | | 14 | ▼ Total PLLs | 4 | | 1 | PLLs | 4 | | 15 | | | | 16 | Total DLLs | 1 | | 17 | Maximum fan-out node | soc_system:soc_system0 soc_system_Main_PLll altera_pll:altera_pll_i outclk_wire[1 | | 18 | Maximum fan-out | 903 | | 19 | Total fan-out | 37130 | ### Game Play Testing with PPU Game Boy boot screen Dr. Mario start screen ### Open Source Emulator We utilized an open source emulator to base our systems on - Used functional CPU in conjunction with our PPU, hardware/software interface, and peripherals (joypad) - Independently tested our PPU with testbenches, performing successfully in simulation - Ran into issues with integration into larger emulator - Confirmed PPU issues by successfully running open source game play ### Qsys File #### Tested open source CPU to ensure functionality - Generated Qsys file shown below - Managed interconnects and removed SDRAM (used utilized on-chip RAM instead) ### Open Source Game Play Confirmed the function of our peripherals and hardware/software interface through integration with open source emulator, shown below Tetris game play Dr. Mario game play # Demo!