CSEE4823                                   Handout #39d
Prof. Steven Nowick                        12/3/11

Mini-Project: FAQ (frequently-asked questions)

The following summarizes some questions/answers and clarifications about
the mini-project. Listed from most recent to oldest:

======================================================================
12/11/11
======================================================================

Q. Multiple sources for the same variable: When doing the datapath for
the ASM, if we find a particular block has multiple loads from multiple
sources, can we use a MUX with a status signal to select which source?
And if so, should it just be the state of the controller? Or is it just
better to have 3 of the same datablocks, each with a different load
source?

A. I sketched a solution to this in class. There are no status signals --
remember that status signals are outputs of the datapath that go back to
the control. Also, MUXes do *not* appear in pseudo-code or gen.-ASM.

But if you detect 3 different sources going to the same destination with
different load statements, e.g. y:=x, y:=z, y:=120, then the variable y
gets allocated (during resource allocation) a 3-1 MUX, which allows each
of these (in separate cycles, of course) to write to the same location.
The variable y typically has control signal inputs, like most datapath
registers covered in class and in handout #34. In addition, the MUX gets
*additional control signals*, as outputs from the control ASM, to
configure it for each distinct operation.

======================================================================
12/10/11
======================================================================

Q. How different can my pseudo-code (step #1) be from my generalized ASM
(step #2)?

A. The answer to this was covered in some class lectures. The pseudo-code
should be very close in level to the generalized ASM. See handout #34 as
an example. The idea is that the pseudo-code is already broken into
simple RTL operations that map directly to the generalized ASM.
Of course, the pseudo-code is behavioral, and therefore inherently
higher-level, and it has no notion of 'state' or clock cycle (unlike the
generalized ASM). Nonetheless, as indicated in class, the operations in
the pseudo-code and the generalized ASM should be at a similar level,
and the former should correspond simply and directly to the latter.

Q. WHAT TO HAND IN: for steps #1 (pseudo-code) and #2 (generalized ASM),
should we add comments, notations, clarifications, etc. to make our
submitted projects more readable and understandable?

A. YES! These are basic requirements, as we went over explicitly for
HW#5. In particular, much like Handout #34 (but in more detail, because
you are handling more complex designs):

(i) listing basic variables/storage units (before step #1):

*Before* presenting your pseudo-code, you should neatly list each
'variable' you will use. (For ex., the variables in handout #34 included
Input, Output, Data and Ocount.) For each variable, indicate the total #
of bits, any breakdown into individual fields, subfield names, etc.

I strongly suggest you DRAW A FIGURE for each storage variable,
indicating the above. You can show any breakdown into fields in the
figure.

The above is part of general documentation, whether for TAs, management
or design review. This documentation makes it easier for others to
follow your pseudo-code and generalized ASM.

(ii) commenting and annotating your pseudo-code (step #1)

It is very useful to include comments within your pseudo-code to
document the major steps, what each part does, etc. In addition to
inserted text comments, you can also include brackets around groups of
instructions, as in handout #34, to highlight the larger parts of your
code.

(iii) commenting and annotating your generalized ASM (step #2)

It is useful, as in HW#5, to draw dotted lines around groups of ASM
blocks that form steps, and to annotate them.
For example, in the FP adder problem, you can group ASM blocks that
handle special symbols, and indicate where initialization occurs, where
output occurs, and where an inner loop body occurs and what it does.

All such neat *hierarchical* documentation adds to the readability of
your solution, helps to present your solution more clearly, and is part
of professional practice.

======================================================================
12/8/11
======================================================================

NOTE: The 12/3 FAQ entry on "parity operation" has been updated below.
It is correct that you are allowed to use such a parity block as a
library component, with N inputs and 1 output. However, the earlier FAQ
entry incorrectly said it is similar to an LU. That is not right: an LU
(using an XOR operation) performs multiple 2-bit parity operations only
(not N-way parity), and produces N results (not 1).

======================================================================
12/6/11
======================================================================

----------------------------------
GENERAL GUIDELINES: BOTH PROBLEMS
----------------------------------

===================================================
1) RTL operations: is '-' a simple RTL operation?
===================================================

Q. Can we simply write "F=A-B"? According to what you mentioned in
today's class, we cannot write any "complex operation" in our
pseudo-code. One way to look at subtraction is that although "F=A-B"
seems quite simple, we actually implement the subtraction as
"F=A+B'+1", which is obviously a "complex" operation. So should we write
the code like "B=B'; B=B+1; F=A+B"? Is this the right way to go?

A. It depends on what your target operation is. You are treating the
'operator' as an addition, i.e. an adder. In this argument, to use + for
subtraction, there are 2 distinct RTL steps: bitwise invert (i.e. LU)
followed by + (with a cin specified as 1).
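As a quick sanity check of this two-step view, the Python model below computes A - B as a bitwise invert (the LU step) followed by an add with cin = 1, and compares the result with ordinary subtraction modulo 2^N. It is a minimal sketch, not a required deliverable; the 8-bit width and the helper names (`lu_invert`, `adder`, `subtract_two_steps`) are illustrative assumptions.

```python
WIDTH = 8                      # assumed datapath width for this sketch
MASK = (1 << WIDTH) - 1

def lu_invert(b: int) -> int:
    """Step 1 (LU): bitwise invert of an N-bit word, i.e. B'."""
    return ~b & MASK

def adder(a: int, b: int, cin: int) -> int:
    """Step 2 (adder): N-bit sum a + b + cin, with the carry-out discarded."""
    return (a + b + cin) & MASK

def subtract_two_steps(a: int, b: int) -> int:
    """A - B computed as A + B' + 1 (invert, then add with cin = 1)."""
    return adder(a, lu_invert(b), cin=1)

# Exhaustive check against ordinary subtraction modulo 2^N.
for a in range(1 << WIDTH):
    for b in range(1 << WIDTH):
        assert subtract_two_steps(a, b) == (a - b) & MASK

print(subtract_two_steps(5, 7))  # 254, i.e. -2 in 8-bit two's complement
```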
However, this is not necessary! We have a primitive unit in the library:
an *ALU*. This unit is already configurable to perform one of several
different logic or arithmetic operations within 1 clock cycle.

So, given that we have an ALU available, you are allowed (and it is the
best approach) to write any basic logic or arithmetic operation the ALU
can perform (like '-') as a basic RTL operation, and it will be mapped
to the ALU, which performs it in 1 cycle. That is, you can write A-B and
map it to an ALU, and it fits the requirements.

======================================================================
12/3/11
======================================================================

----------------------------------
GENERAL GUIDELINES: BOTH PROBLEMS
----------------------------------

===================================================
1) RTL operations: pseudo-code and generalized ASM
===================================================

Your RTL operations should be simple. The idea is that you are breaking
down a relatively complicated specification into a series of very simple
steps.

In the pseudo-code, these steps are 'behavioral' (i.e. no clock cycles
indicated).

In the generalized ASM, these steps include clock ticks: each state box
in your generalized ASM corresponds to an FSM state, and all operations
in the state occur in the same clock cycle.

As a result, in the 'resource allocation' step, each micro-operation
will be mapped to a very simple unit in the Gajski-based library.

DO NOT LOOK FOR COMPLEX OPERATIONS AND UNITS! As I have indicated,
unless you get special permission from me, you must only use components
in the given library.

Also, if you start working on the problem by trying to design the
hardware and architecture, you are off track! Follow handout #34: your
focus should be entirely on the pseudo-code and generalized ASM first.
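To make concrete the point that all operations in one state box take effect in the same clock cycle, here is a minimal Python sketch. It is an illustrative model, not a library component, and the function name `clock_state` and the register names are hypothetical. Every right-hand side is evaluated from the register values at the start of the cycle, and all destinations are then updated together, as registers loaded on the same clock edge would be.

```python
from typing import Callable, Dict

Registers = Dict[str, int]
# One RTL transfer "dest := expr", where expr reads the current register values.
Transfer = Callable[[Registers], int]

def clock_state(regs: Registers, transfers: Dict[str, Transfer]) -> Registers:
    """Apply all register transfers of one ASM state box in a single clock cycle.

    All right-hand sides read the *old* register values; the new values are
    then committed together.
    """
    new_values = {dest: expr(regs) for dest, expr in transfers.items()}
    updated = dict(regs)
    updated.update(new_values)
    return updated

# One state box containing:  x := y + z ;  y := x   (hypothetical example)
regs = {"x": 1, "y": 10, "z": 5}
regs = clock_state(regs, {
    "x": lambda r: r["y"] + r["z"],
    "y": lambda r: r["x"],
})
print(regs)  # {'x': 15, 'y': 1, 'z': 5}: y received the OLD value of x
```

If the same two transfers were instead placed in two consecutive state boxes, y would receive the new value 15. This is one reason the generalized ASM, unlike the behavioral pseudo-code, must make clock cycles explicit.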
basic RTL operations:
---------------------

For example, typical RTL operations are very simple, as you have seen in
Handout #34 (1's counter):

   x := y + z
   x := y - z
   x := y >> 3
   h := j
   j := j + 1
   z := (x > y)   [sequential]   or simply   (x > y)   [combinational]
   ... etc.

Most of your RTL operations will look like the above.

parity operation:
-----------------

In addition, I will allow simple operations like 'odd parity' or 'even
parity', which is actually an N-way XOR function.

other special operations:
-------------------------

In addition, for 1 simple step, 'Hamming-correct', I have allowed you to
assume a combined library block to do this operation. It is extremely
simple: assuming you already have computed a syndrome, the block
includes a decoder followed by a set of XOR2 gates. This is a very
common combined operation, so I am allowing you to assume this combined
library block as a combined unit.

allowed multi-step operations ('chaining'):
-------------------------------------------

See details in the requirements of Handout #39. I indicate that in a few
simple cases, you can have 2 combinational operations, one after the
other, in the same clock cycle.

You should *not* assume a single 'merged' datapath block. Instead, this
involves 2 separate blocks, and you must identify *both* operations
explicitly in your pseudo-code and generalized ASM.

For example, I have allowed a combinational shift before (or after) an
addition. To specify this in pseudo-code and the generalized ASM, you
simply list a composite operation: e.g.

   x := (a << 1) + b     -- shifts a before adding to b
   y := (a - b) >> 3     -- shifts result after subtracting a - b

In this case, since you are doing combinational shifting instead of
sequential shifting, you would allocate a barrel shifter in the resource
allocation for the shift part of the operation.
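As an illustration of chaining, the sketch below models the two example composite operations as two *separate* combinational functions, a barrel shifter and an ALU, evaluated back to back within one cycle. It is a minimal Python model under stated assumptions: an 8-bit unsigned datapath, logical shifts, and results truncated to the datapath width. The function names are hypothetical.

```python
WIDTH = 8                      # assumed datapath width for this sketch
MASK = (1 << WIDTH) - 1

def barrel_shift(value: int, amount: int, left: bool) -> int:
    """Combinational (barrel) shifter: logical shift by `amount` in one step."""
    shifted = value << amount if left else value >> amount
    return shifted & MASK

def alu(a: int, b: int, op: str) -> int:
    """Combinational ALU configured for one operation ('+' or '-')."""
    result = a + b if op == "+" else a - b
    return result & MASK

def chained_x(a: int, b: int) -> int:
    """x := (a << 1) + b: the shifter output feeds the ALU in the same cycle."""
    return alu(barrel_shift(a, 1, left=True), b, "+")

def chained_y(a: int, b: int) -> int:
    """y := (a - b) >> 3: the ALU output feeds the shifter in the same cycle."""
    return barrel_shift(alu(a, b, "-"), 3, left=False)

print(chained_x(3, 4))    # (3 << 1) + 4 = 10
print(chained_y(100, 4))  # (100 - 4) >> 3 = 12
```

Keeping the shifter and the ALU as two distinct functions mirrors the requirement that both blocks be allocated explicitly; nothing in the model merges them into a single unit.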
Both combinational components would be explicitly allocated in the
'resource allocation' step, and they would be tied together
appropriately for the given composite (i.e. chained) operation.

However, very few operations can be chained under my guidelines. See the
explicit discussion on p. 3 of Handout #39. Normally, an RTL operation
is simply a single step!

-------------------------------------------------------------------------------------

-----------------------------------------------
PROBLEM #1: HYBRID FLOATING-POINT INTEGER UNIT
-----------------------------------------------

====================
1) SMALL CORRECTION:
====================

Q. How long should data be available on input and output buses?

A. Data must be available long enough to be read. In a Moore-style
generalized ASM, using our design approach, an extra cycle is required
for the start of reading (the controller detects Start in one cycle, but
the state box that loads the data is only entered in the next cycle).

REVISED REQUIREMENTS: DETAILS

(i) data inputs: Handout #39 (p. 5) states that Start is asserted high
for 1 clock cycle, and the 2 data operands (Input-I, Input-FP) arrive in
the same cycle. The handout also indicates (p. 3, "Input Bus Validity")
that the two data inputs will be valid for 1 extra clock cycle, i.e. 2
cycles total.

SUMMARY: NO CHANGE
Use the above assumptions, as stated in Handout #39.

(ii) data output: Handout #39 (p. 5) states that "Done" is asserted high
for 1 clock cycle, and that the FP addition result should be output in
the same cycle.

SUMMARY: CHANGE
Following the approach in (i), "Done" is still asserted high for 1 clock
cycle, and the FP addition result is output in the same cycle. However,
*keep* the Data output valid for 1 extra clock cycle, i.e. 2 cycles
total.

====================================================
2) SPECIAL FP OPERATIONS: SIGNED 0, SIGNED INFINITY
====================================================

Q. How do I do operations with the special reserved FP symbols, +/-0 and
+/-inf?

A. First, for your integer operand, if it is 0, assume it can be treated
in FP as +0.
So you will never be adding two -0 operands. Of course, the integer
operand is never infinity. Given these constraints, the relevant
operations with signed 0 and infinity are, for any N:

   +0   + N = N
   -0   + N = N
   +inf + N = +inf
   -inf + N = -inf

Note: For motivation for signed 0 and infinity, see the references below.

=====================
3) BANKER'S ROUNDING:
=====================

Q. How do I do banker's rounding?

A. First, when you need to round, realize that you may have many extra
bits which must be eliminated. For example, if you have a 23-bit
mantissa and a 32-bit result, the last 9 bits must be eliminated. Should
you consider only the 24th bit when rounding, or all further bits, i.e.
bits 24-32? The answer is the *latter*: all the remaining bits.

The specification of banker's rounding should be clear from Handout #39,
p. 5. The only important note is the one above: you must consider all
remaining bits. They may push the rounding in one direction or the other
(beyond considering just the 24th bit). But if you consider all
remaining bits and there is still no clear direction to round, banker's
rounding gives you a 'tie-breaker' to tell you which way to round.

=====================================================
4) RESOURCES ON FP ADDITION, SPECIAL FP SYMBOLS, etc.
=====================================================

Q. What other resources are available as background on FP operations,
special symbols, etc.?

A. There are many resources online. Consider:

 - Wiki pages:
     http://en.wikipedia.org/wiki/Floating_point
     http://en.wikipedia.org/wiki/Signed_zero

 - other pages:
     http://www.mrob.com/pub/math/floatformats.html (IEEE 754 parts)

I also gave you useful pointers to a section in the Patterson/Hennessy
Computer Organization book.
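The sketch below shows one common reading of banker's rounding (round half to even) applied to a wide result, using *all* of the discarded bits as described in item 3. It is a hedged Python model, not the handout's specification: Handout #39, p. 5, remains the authority. The 32-bit input width, the 23 kept bits, and the name `bankers_round` are illustrative assumptions, and a carry out of the kept field (which a real FP adder would handle by renormalizing) is deliberately left out.

```python
def bankers_round(value: int, total_bits: int = 32, keep_bits: int = 23) -> int:
    """Keep the top `keep_bits` of an unsigned `total_bits`-wide value,
    rounding with round-half-to-even over ALL discarded bits.

    Sketch only: a carry out of the kept field simply yields a wider result
    here; a real FP adder would renormalize and adjust the exponent instead.
    """
    if not 0 < keep_bits < total_bits:
        raise ValueError("need 0 < keep_bits < total_bits")
    drop = total_bits - keep_bits            # e.g. 32 - 23 = 9 bits to eliminate
    kept = value >> drop                     # the bits that survive
    discarded = value & ((1 << drop) - 1)    # all bits being eliminated
    half = 1 << (drop - 1)                   # 100...0: exactly half of one kept LSB

    if discarded > half:                     # more than half: round up
        return kept + 1
    if discarded < half:                     # less than half: truncate
        return kept
    # Exact tie (first discarded bit 1, all later bits 0): round to even.
    return kept + 1 if kept & 1 else kept

# The first discarded bit (the "24th bit") is 1 in all three cases;
# the later discarded bits and the tie-breaker decide the direction.
print(bankers_round((4 << 9) | 0b100000001))  # 5: above half, round up
print(bankers_round((4 << 9) | 0b100000000))  # 4: exact tie, 4 is already even
print(bankers_round((5 << 9) | 0b100000000))  # 6: exact tie, round 5 up to even 6
```

Looking only at the first discarded bit would treat the first two cases identically; examining all 9 discarded bits is what separates "round up" from "tie".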
-------------------------------------------------------------------------------------

--------------------------------------------------
PROBLEM #2: FAULT-TOLERANT CHANNEL INTERFACE UNIT
--------------------------------------------------

====================
1) SMALL CORRECTION:
====================

Q. How long should data be available on input and output buses?

A. Data must be available long enough to be read. In a Moore-style
generalized ASM, using our design approach, an extra cycle is required
for the start of reading.

The following is fairly detailed, but involves only small changes in how
long data is available on the input and output buses. In a few cases,
data is assumed to be valid for 1 extra cycle, to allow it to be
identified and stored.

(a) REVISED REQUIREMENTS: "TRANSMITTER MODE"

(i) data inputs: Handout #39 (p. 8) states that Start is asserted high
for 1 clock cycle, in the same cycle as the 32-bit "Input" word arrives.
On p. 3 of the handout ("Input Validity"), it indicates that the data
must be valid for 1 extra cycle, i.e. 2 cycles total.

SUMMARY: NO CHANGE
"Start" is still asserted high for 1 cycle, in the same cycle as the
32-bit "Input" arrives. The "Input" remains valid for 2 cycles.

(ii) data outputs: Handout #39 (p. 8) states that "Done1-out" is
asserted high for 1 clock cycle, and that the first output row will be
sent on the output bus "Row-out" in the same cycle. Then the remaining
rows are output on consecutive clock cycles.

SUMMARY: CHANGE
To allow the external receiver to identify and store the 1st row
correctly, make the 1st row available on the "Row-out" bus for
*2 cycles*. All remaining rows are then transmitted, for 1 cycle each,
on consecutive cycles.

So, "Done1-out" is asserted high for exactly 1 cycle, along with the 1st
row placed on output bus "Row-out". But the row remains on this bus for
1 extra cycle, before the remaining rows are transmitted (1 per cycle,
consecutively).
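The revised transmitter-mode output timing can be pictured as a cycle-by-cycle trace. The Python sketch below only illustrates the rule in (a)(ii) and is not part of the required deliverables; the function name `transmitter_trace`, the row values, and the number of rows are hypothetical, since the actual row count and width come from Handout #39.

```python
from typing import List, Tuple

def transmitter_trace(rows: List[int]) -> List[Tuple[int, int]]:
    """Per-cycle (Done1-out, Row-out) values, starting in the Done1-out cycle.

    Revised rule: Done1-out is high for exactly 1 cycle; the 1st row stays on
    Row-out for 2 cycles; each remaining row is then sent for 1 cycle,
    on consecutive cycles.
    """
    trace = [(1, rows[0]), (0, rows[0])]      # 1st row held for 2 cycles
    trace += [(0, row) for row in rows[1:]]   # remaining rows, 1 per cycle
    return trace

# Hypothetical 4-row example.
for cycle, (done1_out, row_out) in enumerate(transmitter_trace([0xA, 0xB, 0xC, 0xD])):
    print(f"cycle {cycle}: Done1-out={done1_out} Row-out={row_out:#x}")
```

Under this rule, transmitting n rows occupies n + 1 cycles, starting with the cycle in which Done1-out is high.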
(b) REVISED REQUIREMENTS: "RECEIVER MODE"

(i) data inputs: Handout #39 (p. 8) states that "Done1-in" is asserted
high for 1 cycle, simultaneous with the input of the first row on
"Row-in". However, this is not enough time to store the first row, so
you should assume that the first row is available for 2 cycles.

SUMMARY: CHANGE
To allow the router node to identify and store the 1st row correctly,
we want the 1st row available as input on the "Row-in" bus for
*2 cycles*. All remaining rows are then received, for 1 cycle each, on
consecutive cycles.

(ii) data output: Handout #39 (p. 9) states that the final correct word
is placed on the "Output" bus for 1 cycle, and the control output "Done"
is asserted high for this same 1 cycle. However, this is not enough time
to store the result in the local processor, so you should make the
result available for 2 cycles.

SUMMARY: CHANGE
"Done" is still asserted high for 1 clock cycle, and the correct word is
still output on the "Output" bus in the same cycle. However, *keep* the
data output valid for 1 extra clock cycle, i.e. 2 cycles total.

-------------------------------------------------------------------------------------
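For receiver mode (b)(i), the reason for the extra cycle shows up in a small model. Assume a Moore-style controller that sees Done1-in in one cycle but only enters its load state in the next, so it reads the first row in that second cycle. The Python sketch below is a simplified, hypothetical illustration; the per-cycle trace format and the function name `receive_rows` are assumptions, not part of Handout #39. It samples Row-in one cycle after Done1-in and then one row per cycle.

```python
from typing import List, Tuple

def receive_rows(trace: List[Tuple[int, int]], num_rows: int) -> List[int]:
    """Capture `num_rows` rows from a per-cycle trace of (Done1-in, Row-in).

    Simplified Moore-style model: Done1-in is detected in cycle t, but the
    state that loads Row-in is entered only in cycle t+1, so the 1st row is
    sampled in t+1 and rows 2..n in cycles t+2, t+3, ...
    """
    t = next(i for i, (done1_in, _) in enumerate(trace) if done1_in == 1)
    return [trace[t + 1 + k][1] for k in range(num_rows)]

# Revised protocol: the 1st row (0xA) is held on Row-in for 2 cycles.
good = [(0, 0), (1, 0xA), (0, 0xA), (0, 0xB), (0, 0xC)]
# Original protocol: the 1st row is on Row-in for only 1 cycle.
bad = [(0, 0), (1, 0xA), (0, 0xB), (0, 0xC), (0, 0xD)]

print(receive_rows(good, 3))  # [10, 11, 12]: all rows captured
print(receive_rows(bad, 3))   # [11, 12, 13]: the 1st row is lost
```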