## Assembly Languages II

#### Prof. Stephen A. Edwards

with contributions from Prof. Brian Evans, Niranjan Damera-Venkata and Magesh Valliappan, UT Austin

Copyright © 2001 Stephen A. Edwards All rights reserved

### Last Time

- General model of assembly language
  - Undifferentiated sequence of instructions Arithmetic instructions (ADD, SUB)
  - Control-flow (JMP, CALL, RET)
- Four main types: CISC, RISC, DSP, and VLIW
- CISC
  - Few, special-purpose registers Complex addressing modes

  - · Powerful instruction (e.g., string move)
- RISC
  - Many general-purpose registers
  - Few addressing modes Arithmetic operations don't touch memory
  - · Simple instructions

Copyright © 2001 Stephen A. Edwards All rights reserved

#### **Digital Signal Processor Apps.**

- Low-cost embedded systems Modems, cellular telephones, disk drives, printers
- High-throughput applications
  - · Halftoning, base stations, 3-D sonar, tomography
- PC based multimedia
  - · Compression/decompression of audio, graphics, video
- Embedded processor requirements
  - · Inexpensive with small area and volume
  - · Deterministic interrupt service routine latency
  - Low power: ~50 mW (TMS320C54x uses 0.36 μA/MIPS)

Copyright © 2001 Stephen A. Edwards All rights reserved

# **Conventional DSP Architecture**

- Harvard architecture
  - · Separate data memory/bus and program memory/bus
  - · Three reads and one or two writes per instruction cycle
- Deterministic interrupt service routine latency
- Multiply-accumulate in single instruction cycle
- Special addressing modes supported in hardware
  - · Modulo addressing for circular buffers for FIR filters
  - Bit-reversed addressing for fast Fourier transforms
- Instructions to keep the pipeline (3-4 stages) full
  - · Zero-overhead looping (one pipeline flush to set up)
  - · Delayed branches

Copyright © 2001 Stephen A. Edwards All rights reserved

# **Conventional DSPs**

|                  | Fixed-Point                     | Floating-Point                        |  |
|------------------|---------------------------------|---------------------------------------|--|
| Cost/Unit        | \$5 - \$79                      | \$5 - \$381                           |  |
| Architecture     | Accumulator                     | load-store or<br>memory-register      |  |
| Registers        | 2-4 data, 8 address             | 8-16 data, 8-16 address               |  |
| Data Words       | 16 or 24 bit                    | 32 bit                                |  |
| Chip Memory      | 2-64K data and program          | 8-64K data and program                |  |
| Address<br>Space | 16-128K data,<br>16-64K program | 16M – 4Gdata,<br>16M – 4G program     |  |
| Compilers        | Bad C                           | Better C, C++                         |  |
| Examples         | TI TMS320C5x;<br>Motorola 56000 | TI TMS320C3x;<br>Analog Devices SHARC |  |

Copyright © 2001 Stephen A. Edwards All rights reserved

# **Conventional DSPs**

- Market share: 95% fixed-point, 5% floating-point
- Each processor comes in dozens of configurations · Data and program memory size
  - · Peripherals: A/D, D/A, serial, parallel ports, timers
- Drawbacks
  - · No byte addressing (needed for image and video)
  - · Limited on-chip memory
  - · Limited addressable memory on most fixed-point DSPs
  - · Non-standard C extensions to support fixed-point data

#### **DSP Example**

- Finite Impulse Response filter (FIR)
- Can be used for lowpass, highpass, bandpass, etc.
- Basic DSP operation
- For each sample, computes



- a<sub>0</sub> ... a<sub>k</sub> are filter coefficients
- x<sub>n</sub> and y<sub>n</sub> are the nth input and output sample
   Copyright © 2001 Stephen A. Edwards All rights reserved

# 56001 Programmer's Model



#### 56001 Datapath



# 56001 Memory Spaces

- Three memory regions, each 64K:
  - 24-bit Program memory
  - · 24-bit X data memory
  - · 24-bit Y data memory
- Idea: enable simultaneous access of program, sample, and coefficient memory
- Three on-chip memory spaces can be used this way
- One off-chip memory pathway connected to all three memory spaces
- Only one off-chip access per cycle maximum

Copyright © 2001 Stephen A. Edwards All rights reserved

#### 56001 Address Generation

- Addresses come from pointer register r0 ... r7
- Offset registers n0 ... n7 can be added to pointer
- Modifier registers cause the address to wrap around
- Zero modifier causes reverse-carry arithmetic

| Address | Notation | Next value of r0 |
|---------|----------|------------------|
| r0      | (r0)     | r0               |
| r0 + n0 | (r0+n0)  | r0               |
| r0      | (r0)+    | (r0 + 1) mod m0  |
| r0 – 1  | –(r0)    | r0 – 1 mod m0    |
| r0      | (r0)-    | (r0 – 1) mod m0  |
| r0      | (r0)+n0  | (r0 + n0) mod m0 |
| r0      | (r0)-n0  | (r0 – n0) mod m0 |

Copyright © 2001 Stephen A. Edwards All rights reserved

# FIR Filter in 56001



#### FIR Filter in 56001

| movep         | y:input, >      | "Load a sample from an I/O" (c:(r0)                                                           |  |
|---------------|-----------------|-----------------------------------------------------------------------------------------------|--|
| "Clear accumu | llator A"       | "Load a sample from X memory<br>into x0, advance the pointer"                                 |  |
| cir           | a               | x:(r0)+, x0 <u>y:(r4)+, y0</u>                                                                |  |
| rep<br>mac    | #n-1<br>x0,y0,a | "Load a coefficient from Y memory<br>into y0, advance the pointer"<br>x:(r0)+, x0 y:(r4)+, y0 |  |
| macr          | x0,y0,a         | (r0)-                                                                                         |  |
| movep         | a, y:outp       | ut                                                                                            |  |

Copyright © 2001 Stephen A. Edwards All rights reserved

### FIR Filter in 56001

| movep              | y:input, x:            | (r0)                                                         |
|--------------------|------------------------|--------------------------------------------------------------|
| cir<br>"Denest the | a                      | x:(r0)+, x0 y:(r4)+, y0                                      |
| "Repeat the        | e next instructio      | n n-1 times                                                  |
| rep<br>mac         | r-1<br>#n-1<br>x0,y0,a | etch next sample and coefficient"<br>x:(r0)+, x0 y:(r4)+, y0 |
| macr               | x0,y0,a                | "a = a + x0 * y0"                                            |
| movep              | a, y:outpu             | t                                                            |

Copyright © 2001 Stephen A. Edwards All rights reserved

# FIR Filter in 56001

| movep                                                                                                                       | y:input, x:(r    | y:input, x:(r0)                           |                               |  |
|-----------------------------------------------------------------------------------------------------------------------------|------------------|-------------------------------------------|-------------------------------|--|
| clr                                                                                                                         | а                | x:(r0)+, x0                               | y:(r4)+, y0                   |  |
| rep<br>mac                                                                                                                  | #n-1<br>x0,y0,a  | x:(r0)+, x0                               | y:(r4)+, y0                   |  |
| macr                                                                                                                        | x0,y0,a          | (r0)- ← "G<br>ne                          | Bet ready for the ext sample" |  |
| movep                                                                                                                       | a, y:output<br>≮ | "a = a + x0 * y0 and<br>round the result" |                               |  |
| "Write the filtered result to an I/O<br>device in Y data memory"<br>Copyright © 2001 Stephen A. Edwards All rights reserved |                  |                                           |                               |  |

# TI TMS320C6000 VLIW DSP

- Eight instruction units dispatched by one very long instruction word
- Designed for DSP applications
- Orthogonal instruction set
- Big, uniform register file (16 32-bit registers)
- Better compiler target than 56001
- Deeply pipelined (up to 15 levels)
- Complicated, but more regular, datapath

Copyright © 2001 Stephen A. Edwards All rights reserved

# Pipelining on the C6

- One instruction issued per clock cycle
- Very deep pipeline
- · 4 fetch cycles
- 2 decode cycles
- · 1-10 execute cycles
- Branch in pipeline disables interrupts
- Conditional instructions avoid branch-induced stalls
- No hardware to protect against hazards
   Assembler or compiler's responsibility

Copyright © 2001 Stephen A. Edwards All rights reserved

# 'C6 Datapath



yright © 2001 Stephen A. Edwards All rights reserved

# 'C6 Datapath

- Two identical halves
- Each has
- 16 32-bit registers
- Logical/Arithmetic (.L)
- Shifter/Branching (.S)
- Multiplier (.M)
- · Data/Memory (.D)
- One cross path



Copyright © 2001 Stephen A. Edwards All rights reserved

# Peripherals

- Often the whole point of the system
  - Memory-mapped I/O
  - Magical memory locations that make something happen or change on their own
- Typical meanings:
  - · Configuration (write)
  - · Status (read)
  - · Address/Data (access more peripheral state)

# Example: 56001 Port C

Copyright © 2001 Stephen A. Edwards All rights reserved

FIR in 'C6 Assembly

FIRLOOP:

|| [B0] B

Ш Ш LDH .D1

LDH .D2

.S2

predicated instruction: "Execute only if B0 is non-zero"

|| [B0] SUB .L2

"Run all of these

instructions in parallel"

"Load a halfword (16 bits)"

MPY .M1X A2, B2, A3 ; Sample \* Coefficient

ADD .L1 A4, A3, A4 ; Accumulate result

X: "Use the cross path"

; Fetch next sample

; Branch if non-zero

; Fetch next coefficient

; Decrement loop count

'Do this on unit D1"

\*A1++, A2

\*B1++, B2

B0, 1, B0

FIRLOOP

Nine pins each usable in one of two ways • Simple parallel I/O Serial interface Parallel Serial PC0 RxD Serial Communication Interface (SCI) PC1 TxD PC2 SCLK PC3 SC0 PC4 SC1 PC5 SC2 Synchronous Serial Interface (SSI) PC6 SCK PC7 SRD PC8 STD

Copyright © 2001 Stephen A. Edwards All rights reserved

Copyright © 2001 Stephen A. Edwards All rights reserved

# Port C Registers for Parallel Port

Port C Control Register

· Selects mode (parallel or serial) of each pin

#### X: \$FFE1



Port C Data Direction Register · I/O direction when used in parallel mode X: \$FFE3

0 = Input

1 = Output

Copyright © 2001 Stephen A. Edwards All rights reserved

# **Port C Registers for Parallel Port**

#### Port C Data Register

· Returns input data or sets output state of parallel port X: \$FFE5

> Read: pin state Write: set output pin state

## Port C SCI

- Three-pin interface
- 422 Kbit/s NRZ asynchronous interface (RS-232-like)
- 3.375 Mbit/s synchronous serial mode
- Multidrop mode for multiprocessor systems
- Two Wakeup modes
- Idle line
- Address bit
- Wired-OR mode
- On-chip or external baud rate generator
- Four interrupt priority levels

Copyright © 2001 Stephen A. Edwards All rights reserved

# Port C SCI Registers



Copyright © 2001 Stephen A. Edwards All rights reserved

# **Port C SCI Registers**

#### X: \$FFF1

SCI Status Register (readonly) Human Service Ser

#### **Port C SCI Registers**



Copyright © 2001 Stephen A. Edwards All rights reserved

# Port C SSI

- Intended for synchronous, constant-rate protocols
   Easy interface to serial ADCs and DACs
- Many more operating modes than SCI

Copyright © 2001 Stephen A. Edwards All rights reserved

- Six Pins (Rx, Tx, Clk, Rx Clk, Frame Sync, Tx Clk)
- 8, 12, 16, or 24-bit words

# **Port C SSI Registers**

- \$FFEC SSI Control Register A
   prescaler, frame rate, word length
- \$FFED SSI Control Register B
  - Interrupt enables, various mode settings
- \$FFEE SSI Status/Time Slot Register
   Sync, empty, overrun
- \$FFEF SSI Receive/Transmit Data Register

Copyright © 2001 Stephen A. Edwards All rights reserved