# FPGgram

Diana Valverde, Tonye Brown

## Overview

- Aim
- Design
  - Neural Network Structure
  - Hardware
  - Software
- Results
- Lessons Learned

#### Aim

Our project focuses on using a convolutional neural network for image processing. Specifically, we would like to recreate an image in an artistic style. The output image is created by a convolutional neural network that recognizes the content of one image and applies the style of a separate image.

Our aim is to accelerate this very deep convolutional neural network by implementing layers of the network in hardware and letting a software program interface between these layers.

## VGG network

Our project implemented the convolution and average pool functions, as well as additional units to handle back-propagation.
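As a software reference for the average pool function, the sketch below assumes a 2x2 window with stride 2, the usual VGG pooling geometry; the slides do not state the window size. `avg_pool_2x2` is a hypothetical name and not the interface of the hardware module.

```python
def avg_pool_2x2(tile):
    """Average-pool a 2D list with a 2x2 window and stride 2 (assumed geometry)."""
    rows, cols = len(tile), len(tile[0])
    assert rows % 2 == 0 and cols % 2 == 0, "tile dimensions must be even"
    return [
        [
            (tile[r][c] + tile[r][c + 1] + tile[r + 1][c] + tile[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Small illustrative input; the values are made up for the example.
tile = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
pooled = avg_pool_2x2(tile)
```

Each output element is the mean of one non-overlapping 2x2 patch, so a 4x4 input yields a 2x2 output.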





# Top level

- 64 bit ISA
- Memory control instruction
  - High 32 bits: 1 bit (mem or ALU), 3 bits (buffer to read/write), 1 bit (reset), 26 bits DDR3 address
  - Low 32 bits: 8 bits (stride), 8 bits (rows), 16 bits (block)
- ALU instruction
  - High 32 bits: 1 bit (mem or ALU), 3 bits (buffer to read), 4 bits (output sub ID), 2 bits (input/output sub block), 2 bits (read row + whether it's a row or column - also use input/output sub block), 1 bit (reverse mask), 19 empty bits
  - Low 32 bits: 16 bits (input block ID), 16 bits (output block ID)
- 128-bit data may use shared memory to make this transfer
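The low word of the memory control instruction can be sketched as plain bit packing. The field widths (8, 8, 16) come from the list above; the assumption that fields are packed most-significant first in the listed order is ours, and `pack_mem_low` / `unpack_mem_low` are hypothetical helper names.

```python
STRIDE_BITS, ROWS_BITS, BLOCK_BITS = 8, 8, 16  # widths from the slide


def pack_mem_low(stride, rows, block):
    """Pack stride, rows and block into a 32-bit word (MSB-first order is assumed)."""
    for name, value, width in (
        ("stride", stride, STRIDE_BITS),
        ("rows", rows, ROWS_BITS),
        ("block", block, BLOCK_BITS),
    ):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (stride << (ROWS_BITS + BLOCK_BITS)) | (rows << BLOCK_BITS) | block


def unpack_mem_low(word):
    """Inverse of pack_mem_low: return (stride, rows, block)."""
    stride = (word >> (ROWS_BITS + BLOCK_BITS)) & ((1 << STRIDE_BITS) - 1)
    rows = (word >> BLOCK_BITS) & ((1 << ROWS_BITS) - 1)
    block = word & ((1 << BLOCK_BITS) - 1)
    return stride, rows, block


word = pack_mem_low(stride=4, rows=16, block=0x0102)
```

Range checks at pack time catch values that would silently spill into a neighbouring field.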

## Mem control Unit

- Read buffer 1
  - 256x256 buffer
  - Composed of 64 4x4 RAMs
    - This is to access blocks of data in one clock cycle
  - Bit for padded read
- Main buffer used
- Reads image from DDR3 based on stride, row, block inputs

#### Mem control unit contd.

- Read buffer 2
  - 256x128 buffer
  - Composed of 4x4 RAMs
  - Bit for padded read
- Secondary buffer, used only for Gram matrix calculations
- Reads from DDR3
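For context, the Gram matrix used in style transfer holds the inner products between flattened feature maps of a layer. The sketch below is a plain software reference under that standard definition; it does not model how the hardware divides the work between the two read buffers, and `gram_matrix` is a hypothetical name.

```python
def gram_matrix(features):
    """Return G where G[i][j] is the dot product of feature maps i and j.

    `features` is a list of equally long rows, one flattened feature map per row.
    """
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


# Two tiny made-up feature maps of three elements each.
features = [
    [1, 2, 3],
    [0, 1, 0],
]
gram = gram_matrix(features)
```

Every entry needs two feature maps at once, which is one plausible reason for a second read buffer, although the slides do not spell this out.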

#### Mem control unit contd.

- Mask buffer
  - 4x4 buffer
- Reads mask from memory

#### Write back accumulator

- Write buffer
  - 256x256
  - Composed of 16 dual-port RAMs
    - To write/accumulate in one cycle
- Writes results back to DDR3 once accumulation is done

#### 64 RAMs

- Otherwise takes 1-2 hours to compile
  - Single-cycle 4x4 block access is too complicated for the Altera tools to optimize/infer as RAM
- Single-cycle access made possible with striding

| 0  | 1  | 2  | 3  | 12 | 13 | 14 | 15 | 8  | 9  | 10 | 11 |
|----|----|----|----|----|----|----|----|----|----|----|----|
| 4  | 5  | 6  | 7  | 0  | 1  | 2  | 3  | 12 | 13 | 14 | 15 |
| 8  | 9  | 10 | 11 | 4  | 5  | 6  | 7  | 0  | 1  | 2  | 3  |
| 12 | 13 | 14 | 15 | 8  | 9  | 10 | 11 | 4  | 5  | 6  | 7  |
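The layout in the table can be reproduced with a short index function. This is a sketch of the 16-index pattern shown, extended with modulo arithmetic beyond the 4x12 region drawn on the slide; that extension, and the names `ram_index` and `window_rams`, are assumptions. It does not attempt to model all 64 RAMs.

```python
def ram_index(row, col):
    """RAM index (0-15) for the element at (row, col), following the table's pattern."""
    group = (row - col // 4) % 4  # row group rotates by one every 4 columns
    return group * 4 + col % 4


# The table from the slide, row by row.
TABLE = [
    [0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11],
    [4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15],
    [8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3],
    [12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7],
]


def window_rams(row0, col0):
    """Set of RAM indices touched by the 4x4 window whose top-left corner is (row0, col0)."""
    return {ram_index(row0 + r, col0 + c) for r in range(4) for c in range(4)}


matches_table = all(
    ram_index(r, c) == TABLE[r][c] for r in range(4) for c in range(12)
)
conflict_free = all(
    len(window_rams(r, c)) == 16 for r in range(8) for c in range(8)
)
```

Under this mapping every 4x4 window, aligned or not, touches 16 different RAMs, which is the property that allows a whole block to be read in one cycle.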

#### ALU

- Takes one 64-bit instruction as input, along with a 6x6 matrix of 32-bit values.
- If the first bit is high, the instruction goes to the ALU
- 3 bits encode which buffer to read/write from.
- 4 bits for output sub ID from block
- 2 bits for input/output sub-block
- 2 bits enable read and encode whether data is in rows or columns
- 1 bit for reverse mask
- 16 bits encode the block ID in memory for input
- 16 bits encode the block ID in memory for output
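The ALU instruction layout can be written as a table-driven encoder and decoder. The field widths follow the slides (they add up to 64 bits); the field names and the assumption that fields are packed most-significant first in the listed order are ours.

```python
# (name, width) pairs in slide order; names are hypothetical, order assumed MSB first.
ALU_FIELDS = [
    ("is_alu", 1),        # high = instruction is for the ALU
    ("buffer", 3),        # which buffer to read/write
    ("out_sub_id", 4),    # output sub ID within the block
    ("sub_block", 2),     # input/output sub-block
    ("read_mode", 2),     # read enable and row/column selection
    ("rev_mask", 1),      # reverse mask
    ("unused", 19),       # empty bits
    ("in_block_id", 16),  # input block ID in memory
    ("out_block_id", 16), # output block ID in memory
]
assert sum(width for _, width in ALU_FIELDS) == 64


def encode_alu(**values):
    """Build a 64-bit instruction word; fields that are not given default to 0."""
    word = 0
    for name, width in ALU_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_alu(word):
    """Split a 64-bit instruction word back into a dict of named fields."""
    fields = {}
    shift = 64
    for name, width in ALU_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


instr = encode_alu(is_alu=1, buffer=2, out_sub_id=5, rev_mask=1,
                   in_block_id=7, out_block_id=9)
decoded = decode_alu(instr)
```

Keeping the layout in one list means a change to a field width only has to be made in one place.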



# **Multipliers**

- 27-bit fixed-point multiplications for ALU units
  - 1 sign bit
  - 14 bits integer
  - 13 bits fraction
- 112 multipliers on board
- 144 multipliers needed for 3x3 convolutions
- Solution: 112 hard multipliers, 22 soft multipliers
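The figure of 144 multipliers is consistent with a 3x3 convolution over one 6x6 input tile: 4x4 output positions times 9 products each. That reading is ours, not stated on the slides. The sketch below counts the multiplications and uses 13 fraction bits as listed above; truncating the product with an arithmetic shift is an assumption, and Python integers do not model overflow of the hardware word width.

```python
FRAC_BITS = 13  # fraction width from the slide


def to_fixed(x):
    """Convert a float to fixed point with FRAC_BITS fraction bits."""
    return int(round(x * (1 << FRAC_BITS)))


def fixed_mul(a, b):
    """Multiply two fixed-point values; the shift drops the extra fraction bits."""
    return (a * b) >> FRAC_BITS  # truncation (floor) is assumed, not confirmed


def conv3x3_tile(tile, kernel):
    """3x3 convolution (correlation form) of a 6x6 tile with no padding.

    Returns the 4x4 output and the number of multiplications performed.
    """
    mults = 0
    out = []
    for r in range(4):
        row = []
        for c in range(4):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += fixed_mul(tile[r + i][c + j], kernel[i][j])
                    mults += 1
            row.append(acc)
        out.append(row)
    return out, mults


# Constant inputs chosen so the expected result is easy to check by hand.
tile = [[to_fixed(1.0)] * 6 for _ in range(6)]
kernel = [[to_fixed(0.5)] * 3 for _ in range(3)]
out, mults = conv3x3_tile(tile, kernel)
```

In hardware the 144 products of a tile would be computed in parallel rather than in nested loops; the loops here only make the count explicit.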

## **Additional Peripherals**

To fully implement this project, a VGA framebuffer was also implemented on the device. Pixel data is sent from the CPU, stored in the framebuffer, and displayed through the VGA capabilities of the FPGA.



Compilation Report - SoCKit_DDR3_RTL_Test (screenshot; resource usage by entity, with entity names given where legible):

| Compilation Hierarchy Node | LC Combinationals | LC Registers | Block Memory Bits | DSP Blocks | Pins |
|----------------------------|-------------------|--------------|-------------------|------------|------|
| SoCKit_DDR3_RTL_Test       | 87624 (154)       | 5982 (135)   | 1540512           | 111        | 242  |
| (name not legible)         | 31996 (703)       | 869 (869)    | 0                 | 111        | 0    |
| backprop_pool:bp           | 100 (100)         | 0 (0)        | 0                 | 0          | 0    |
| convolution:convo          | 31044 (0)         | 0 (0)        | 0                 | 95         | 0    |
| dot_product:dp             | 149 (149)         | 0 (0)        | 0                 | 16         | 0    |
| fpga_ddr3:fpga_ddr3_inst   | 4181 (0)          | 3951 (0)     | 213408            | 0          | 0    |
| mem_control:mv             | 51198 (2093)      | 950 (0)      | 1327104           | 0          | 0    |
| mask_buffer:mb_i           | 58 (58)           | 288 (288)    | 0                 | 0          | 0    |
| (name not legible)         | 17636 (83)        | 75 (66)      | 442368            | 0          | 0    |
| (name not legible)         | 16161 (82)        | 66 (66)      | 442368            | 0          | 0    |
| (name not legible)         | 15250 (105)       | 521 (69)     | 442368            | 0          | 0    |
| sld_hub:auto_hub           | 95 (1)            | 77 (0)       | 0                 | 0          | 0    |

Note: For table entries with two numbers listed, the numbers in parentheses indicate the number of resources of the given type used by the specific entity alone. The numbers listed outside of parentheses indicate the total resources of the given type used by the specific entity and all of its sub-entities in the hierarchy.

Fitter message (partially legible): "... blocks of type combinational node. However, the device contains only 83820 blocks."