Coding Techniques for Bus Functional Models In Verilog, VHDL, and C++
By Ben Rhodes and Dan Notestein, SynaptiCAD
Bus functional models (BFMs) are simplified simulation models that accurately reflect the I/O-level behavior
of a device without modeling its internal computational abilities. For example, a bus functional model
of a microprocessor would be able to generate PCI read and write transactions to a PCI device model
to initialize and test the PCI device's functionality, but the microprocessor BFM would not be capable
of reading CPU instructions from a memory and executing them correctly (that would require a complete
behavioral-level model of the processor). Bus functional models are commonly used in test benches to
stimulate design models and verify their functionality. For the purposes of this paper, the design
models being tested are either RTL or gate-level models of the system.
Using transactors to model transaction signaling protocols
Bus functional models typically serve as an abstraction layer between the transaction level of system
functionality, which describes what data is being exchanged between two devices, and the signaling level,
which dictates how this data is exchanged during the transaction. At the transaction level, a transaction
can be viewed as a simple function call with parameters for the data being exchanged. At the signaling
level, this call is converted into signal transitions on the appropriate clock cycles, along with handshaking
logic that ensures the data exchange is properly synchronized. The part of the BFM that performs the
signaling when a transaction function is called is known as a transactor. Transactors are the only parts
of a BFM that interact directly with the signals of a design model; the remaining code in a BFM manipulates
only transaction-level data. The figure below shows how transactors serve as an interface between the
transaction-level code in the testbench and the signals of the design model being tested.
Figure 1: Transactors are driven by the transaction manager and stimulate the model under test (MUT)
For best simulation performance, transactors should generally be modeled in the language of the design
under test, since there is typically a performance penalty for simulation activity that crosses simulation
language barriers. The transaction-level part of the BFM, by contrast, is often more convenient to model
in a language that directly supports data structures and dynamic memory allocation. There is usually
little if any penalty for writing the transaction-level code in a higher-level language, because that code
works with data in larger chunks that don't need to interact as often with the simulation kernel.
Master and slave transactors
Transactors can be divided into two broad categories: master transactors that initiate a transaction
and slave transactors that respond to a master transactor. Master transactors are generally modeled
as procedures that are called whenever a transaction should be started, whereas slave transactors tend
to be modeled as a group of related parallel processes that run for the entire simulation, responding
whenever they recognize that a transaction is addressed to them.
Although modeling master transactors as procedures is convenient for the code that initiates master
transactions, the underlying implementation of a master transaction may require multiple parallel
processes, which neither VHDL nor Verilog allows in functions. This problem can be overcome by modeling
the master transactor as a state machine that responds to handshaking signals triggered by an
"ApplyTransaction" procedure, so that the master transactor looks like a procedure call to the
transaction-level code of the BFM. By default, this creates a transactor that does not block the calling
process, but blocking transactions can be achieved by calling a version of the "ApplyTransaction"
procedure that waits for a completion signal from the transactor.
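As an illustration only, the following C++ sketch models a master transactor as a clocked state machine driven by a request/done handshake. It assumes a cycle-based model in which `clockEdge()` stands in for the HDL clock process; the class, its address-then-data write phases, and all member names are hypothetical. The non-blocking `applyTransaction()` only raises the request, while `applyTransactionBlocking()` keeps advancing the clock until the transactor reports completion.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical master transactor for a two-phase (address, then data) write.
// The state machine advances once per clock edge; callers only touch the
// request/done handshake, so the transactor looks like a procedure call.
class MasterTransactor {
public:
    enum class State { Idle, Address, Data, Done };

    // Non-blocking apply: latch the parameters and raise the request.
    // Returns false if the transactor is busy or a request is pending.
    bool applyTransaction(std::uint32_t addr, std::uint32_t data) {
        if (request_ || state_ != State::Idle) return false;
        addr_ = addr;
        data_ = data;
        done_ = false;
        request_ = true;
        return true;
    }

    // Blocking apply: wait until the request is accepted, then keep
    // advancing the clock until the transactor signals completion.
    void applyTransactionBlocking(std::uint32_t addr, std::uint32_t data,
                                  const std::function<void()>& advanceClock) {
        while (!applyTransaction(addr, data)) advanceClock();
        while (!done_) advanceClock();
    }

    // One state-machine step per clock edge; bus signals are driven here.
    void clockEdge() {
        switch (state_) {
        case State::Idle:
            if (request_) { request_ = false; state_ = State::Address; }
            break;
        case State::Address:
            busAddr = addr_;   // drive the address phase
            state_ = State::Data;
            break;
        case State::Data:
            busData = data_;   // drive the data phase
            state_ = State::Done;
            break;
        case State::Done:
            done_ = true;      // completion handshake back to the caller
            state_ = State::Idle;
            break;
        }
        ++cycles;
    }

    bool done() const { return done_; }
    State state() const { return state_; }

    // Stand-ins for bus signals plus a cycle counter, for observation only.
    std::uint32_t busAddr = 0;
    std::uint32_t busData = 0;
    unsigned cycles = 0;

private:
    State state_ = State::Idle;
    bool request_ = false;
    bool done_ = false;
    std::uint32_t addr_ = 0;
    std::uint32_t data_ = 0;
};
```

Calling `applyTransaction()` twice in a row shows the non-blocking behavior: the second call is refused until the state machine has consumed the first request, and a subsequent blocking call first waits for the pending transaction to finish.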
It is frequently necessary to model a transaction as a set of cooperating processes, but this leads
to two problems: (1) the processes must be synchronized so that they start and stop together, and (2)
it is easy to introduce races between when signals are sampled and when they are driven. In Verilog,
the processes can be synchronized with a fork-join. VHDL has no fork-join, but a pseudo fork-join can
be used to achieve the same effect. This technique uses a resolved handshaking signal that is monitored
and driven by all the processes to be forked (see Writing Testbenches, Janick Bergeron, pp. 135-137, for
a detailed explanation of this technique).
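The core of a pseudo fork-join is the resolution function of the shared handshake signal. The C++ sketch below mimics one under assumed semantics: each forked process drives Busy while working and Done when finished, and the resolved value reads Done only once every driver is Done. The `Hs` values, `resolve()`, and `runPseudoFork()` are illustrative names, not code from Bergeron's book.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Values each forked process can drive onto the shared handshake signal.
enum class Hs { Idle, Busy, Done };

// Resolution function in the spirit of a VHDL resolved signal: it merges all
// drivers into the single value that every process observes.
//   any driver Busy   -> Busy  (the fork is still running)
//   all drivers Done  -> Done  (the join condition is met)
//   otherwise         -> Idle
inline Hs resolve(const std::vector<Hs>& drivers) {
    if (std::any_of(drivers.begin(), drivers.end(),
                    [](Hs v) { return v == Hs::Busy; }))
        return Hs::Busy;
    if (!drivers.empty() &&
        std::all_of(drivers.begin(), drivers.end(),
                    [](Hs v) { return v == Hs::Done; }))
        return Hs::Done;
    return Hs::Idle;
}

// Simulates forking processes that each take cycles[i] clock cycles.
// The "join" is waiting until the resolved handshake reads Done.
// Returns the number of cycles until the join completes.
inline unsigned runPseudoFork(const std::vector<unsigned>& cycles) {
    if (cycles.empty()) return 0;
    std::vector<Hs> drivers;
    for (unsigned c : cycles) drivers.push_back(c == 0 ? Hs::Done : Hs::Busy);
    std::vector<unsigned> remaining = cycles;
    unsigned elapsed = 0;
    while (resolve(drivers) != Hs::Done) {
        ++elapsed;  // every process waits for the next clock edge
        for (std::size_t i = 0; i < drivers.size(); ++i)
            if (remaining[i] > 0 && --remaining[i] == 0) drivers[i] = Hs::Done;
    }
    return elapsed;
}
```

As with a Verilog fork-join, the join completes only when the slowest process finishes.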
It is often desirable to be able to restart these processes in the middle of a transaction, effectively
resetting the transaction. In Verilog, this can be done using disable statements. In VHDL it is more
awkward, because an abort status signal must be checked after every wait statement in the transaction
processes. By adding an additional state to the handshaking signal that implements the pseudo fork-join,
we can reuse this signal as the abort status signal. This technique allows any of the processes in the
pseudo fork-join to abort the transaction.
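A hedged C++ sketch of the abort extension: the handshake gains an Abort value that dominates resolution, and each simulated process checks the resolved value at every wait point (here, once per clock cycle) before continuing. The enum, `resolve()`, `ForkResult`, and `runAbortableFork()` are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Handshake values, now with an extra Abort state.
enum class Hs { Idle, Busy, Done, Abort };

// Abort dominates the resolution, so one aborting driver is seen by all.
inline Hs resolve(const std::vector<Hs>& drivers) {
    auto has = [&drivers](Hs v) {
        return std::find(drivers.begin(), drivers.end(), v) != drivers.end();
    };
    if (has(Hs::Abort)) return Hs::Abort;
    if (has(Hs::Busy)) return Hs::Busy;
    if (!drivers.empty() &&
        std::all_of(drivers.begin(), drivers.end(),
                    [](Hs v) { return v == Hs::Done; }))
        return Hs::Done;
    return Hs::Idle;
}

struct ForkResult {
    unsigned cycles;  // cycles until the fork ended
    bool aborted;     // true if it ended through Abort
};

// Process i needs cycles[i] clock cycles. Process `aborter` (-1 for none)
// drives Abort on cycle `abortAt`. The resolved value is checked before each
// wait, which is the per-wait abort check the VHDL version needs.
inline ForkResult runAbortableFork(const std::vector<unsigned>& cycles,
                                   int aborter, unsigned abortAt) {
    std::vector<Hs> drivers;
    for (unsigned c : cycles) drivers.push_back(c == 0 ? Hs::Done : Hs::Busy);
    std::vector<unsigned> remaining = cycles;
    unsigned elapsed = 0;
    for (;;) {
        Hs resolved = resolve(drivers);
        if (resolved == Hs::Abort) return {elapsed, true};  // all processes quit
        if (resolved != Hs::Busy) return {elapsed, false};  // join (or nothing forked)
        ++elapsed;  // wait for the next clock edge
        for (std::size_t i = 0; i < drivers.size(); ++i) {
            if (static_cast<int>(i) == aborter && elapsed == abortAt) {
                drivers[i] = Hs::Abort;
                continue;
            }
            if (remaining[i] > 0 && --remaining[i] == 0) drivers[i] = Hs::Done;
        }
    }
}
```

Because the abort travels on the same resolved signal, no separate abort wiring is needed: whichever process detects the reset condition simply drives Abort.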
Avoiding race conditions in transactor sampling code
Race conditions can arise in a transactor when you need to sample the value of a signal and also drive
other signals that could affect the sampled value. Generally this can be avoided by sampling the value
before driving the other signals, but when multiple processes are involved, the order in which these
statements execute is no longer known. In simple cases, this can be avoided by using non-blocking
assignments in Verilog (in VHDL, signal assignments behave this way by default, as long as you are not
using shared variables). However, if one process enables the execution of another process through
zero-delay handshaking signals, the extra delta cycles can still lead to race conditions. This kind of
code often occurs when a condition in the first process enables the execution of the second process,
for example, when a signal's stability needs to be checked after a particular clock edge. Such sampling
code can often be inlined in the enabling process, but this is not possible when the stability-checking
code includes wait statements that would block the execution of the enabling process. To solve this
problem, the following method can be used:
- Place the sampling code in a separate process that waits on a triggering event from the
initiating process.
- If the sampling process needs to sample at the same clock edge that triggered it, then the
initiating process needs to store the initial value of the signal to be sampled.
- To start the sampling process, use an event trigger ("->") in Verilog or toggle a std_logic
signal in VHDL. Using this technique, you can trigger multiple sampling processes from the initiating
process without introducing delta cycles in the initiating process.
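The method above can be sketched in C++ under a few assumptions: a two-phase `Signal` whose writes become visible only after `commit()` (standing in for a Verilog non-blocking assignment or a VHDL signal update), and a `StabilityChecker` whose `trigger()` plays the role of the event. The initiating step passes the value it stored at the triggering edge, so the checker compares later samples against the pre-edge value even though another process drives a new value in the same step. All names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Two-phase signal: write() schedules a value that becomes visible only after
// commit(), like a Verilog non-blocking assignment or a VHDL signal update.
template <typename T>
class Signal {
public:
    explicit Signal(T v) : cur_(v), next_(v) {}
    T read() const { return cur_; }
    void write(T v) { next_ = v; }
    void commit() { cur_ = next_; }
private:
    T cur_;
    T next_;
};

// Sampling process: started by an event that carries the value the
// initiating process stored at the triggering edge, then checks on each
// later clock edge that the signal still holds that value.
class StabilityChecker {
public:
    void trigger(int storedValue, unsigned holdCycles) {  // the "->" event
        expected_ = storedValue;
        remaining_ = holdCycles;
        active_ = holdCycles > 0;
    }
    void onClock(int sampled) {  // one wait point of the sampling process
        if (!active_) return;
        if (sampled != expected_) { ++violations; active_ = false; return; }
        if (--remaining_ == 0) active_ = false;
    }
    unsigned violations = 0;
private:
    int expected_ = 0;
    unsigned remaining_ = 0;
    bool active_ = false;
};

// driven[0] is the initial value; driven[k] (k > 0) is assigned by some other
// process at edge k-1 and is visible from edge k onward. At edge 0 the
// initiator stores the current value and triggers the checker in the same
// step, before any scheduled update lands.
inline unsigned runScenario(const std::vector<int>& driven, unsigned hold) {
    Signal<int> data(driven.empty() ? 0 : driven[0]);
    StabilityChecker checker;
    for (std::size_t edge = 0; edge < driven.size(); ++edge) {
        if (edge == 0)
            checker.trigger(data.read(), hold);  // initiating process
        else
            checker.onClock(data.read());        // sampling process
        if (edge + 1 < driven.size())
            data.write(driven[edge + 1]);        // competing driver
        data.commit();                           // end of time step
    }
    return checker.violations;
}
```

For example, `runScenario({7, 8, 8}, 2)` reports one violation: the checker was handed the stored value 7 from the triggering edge, so the change to 8 driven in that same step is caught rather than silently becoming the reference value.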
Data structures and data packing for serializing packet data
Data structures are useful for modeling complex data at a high level of abstraction. This can be very
helpful when passing data between modules and tasks since multiple pieces of data can be passed as a
single logical unit. Classes are even more useful since tasks and functions can be associated with each
data structure for encapsulating algorithms specific to the type of data structure, such as packing
and randomization.
Classes are a core feature of C++, but they aren't available in VHDL or Verilog. However, you can create
pseudo-classes in these HDLs. In Verilog, you create a module with regs, tasks, and functions to represent
a class. Two routines need to be defined to convert the class to and from an array of bits in order to
pass instance information across module and task boundaries (this is very similar to the concept of using
$realtobits and $bitstoreal to pass real numbers across module boundaries). In VHDL, you can create
a record to represent the data structure, usually placed in a package. For each class method, the first
parameter should be of the record type so that the method can operate on the internals of a particular
data structure instance; use an inout parameter in procedures that modify the instance (VHDL functions
accept only in parameters). A Verilog example is shown below.
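module packet_type;
  // Data members of the pseudo-class
  reg [7:0] FIELD0;
  reg [7:0] FIELD1;
  // Packed image of the instance; its width equals the sum of the field widths
  reg [15:0] tb_packed_bits;

  // Convert the instance to a bit vector. Verilog functions must declare at
  // least one input, so "dummy" is present but unused.
  function [15:0] tobits;
    input dummy;
    begin
      tb_packed_bits = { FIELD1, FIELD0 };
      tobits = tb_packed_bits;
    end
  endfunction

  // Load the instance from a bit vector produced by tobits.
  task frombits;
    input [15:0] tb_packed_bits_in;
    begin
      tb_packed_bits = tb_packed_bits_in;
      { FIELD1, FIELD0 } = tb_packed_bits;
    end
  endtask
endmodule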
Data packing is necessary when you need to translate data structures into a form that the bus protocol
in use can carry. It is very convenient to pass high-level data structures around within a test bench,
but at some point these data structures usually need to be transmitted across an actual bus in the
hardware models. A clean way to do this is to create a class method that converts the data structure
into either an array of bits or an array of bytes (depending on the bus protocol). In Verilog, this can
even be the same method that was written to pass the class across module and task boundaries, as
described above. Below is an example of how to do this in VHDL:
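library ieee;
use ieee.std_logic_1164.all;

-- The package name is illustrative; the record and its methods live together.
package class0_pkg is
  -- Pseudo-class: the record holds the data members
  type CLASS0 is record
    FIELD0 : bit_vector(7 downto 0);
    FIELD1 : bit_vector(7 downto 0);
  end record;
  function pack(this : CLASS0) return std_logic_vector;
  function unpack(packed_data : std_logic_vector(15 downto 0)) return CLASS0;
end package;

package body class0_pkg is
  -- Serialize: FIELD0 occupies the low byte, FIELD1 the high byte
  function pack(this : CLASS0) return std_logic_vector is
    variable packed_data : std_logic_vector(15 downto 0);
  begin
    packed_data(7 downto 0)  := To_StdLogicVector(this.FIELD0);
    packed_data(15 downto 8) := To_StdLogicVector(this.FIELD1);
    return packed_data;
  end function;

  -- Deserialize: the inverse of pack
  function unpack(packed_data : std_logic_vector(15 downto 0)) return CLASS0 is
    variable dataStructure : CLASS0;
  begin
    dataStructure.FIELD0 := To_bitvector(packed_data(7 downto 0));
    dataStructure.FIELD1 := To_bitvector(packed_data(15 downto 8));
    return dataStructure;
  end function;
end package body;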
VHDL and Verilog do have some limitations when using these pseudo-class techniques. In Verilog, to pass
a class instance into a task or another module, it must first be converted into a bit array, and inside
the task it must then be converted back into a module instance. This means that an additional pseudo-class
module instance, visible from the scope of the task, must be created to hold the data converted from the
bit array that was passed in. Also, Verilog and VHDL pseudo-class solutions lack more advanced features
available in C++ classes, such as data hiding, inheritance, and polymorphism.
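For contrast, the same two-field structure in C++ can be a real class with private fields and packing methods. This is a sketch: the name `Class0`, the low-byte-first layout (chosen to match the VHDL `pack` function above), and the byte-oriented variant are assumptions.

```cpp
#include <cstdint>
#include <vector>

// C++ counterpart of the two-field CLASS0 record. Fields are private (data
// hiding) and the packing algorithms are methods of the class.
class Class0 {
public:
    explicit Class0(std::uint8_t field0 = 0, std::uint8_t field1 = 0)
        : field0_(field0), field1_(field1) {}

    std::uint8_t field0() const { return field0_; }
    std::uint8_t field1() const { return field1_; }

    // Bit-oriented packing: FIELD0 in bits 7..0, FIELD1 in bits 15..8.
    std::uint16_t packBits() const {
        return static_cast<std::uint16_t>((field1_ << 8) | field0_);
    }

    // Byte-oriented packing for a byte-wide bus (low byte first, assumed).
    std::vector<std::uint8_t> packBytes() const { return {field0_, field1_}; }

    // Inverse of packBits.
    static Class0 unpackBits(std::uint16_t bits) {
        return Class0(static_cast<std::uint8_t>(bits & 0xFFu),
                      static_cast<std::uint8_t>(bits >> 8));
    }

private:
    std::uint8_t field0_;
    std::uint8_t field1_;
};
```

Because instances are ordinary values, they can be passed to functions and stored in containers directly; packing is needed only at the point where data meets the bus.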
Developing transaction generators and managers to stimulate a design
Once transactors have been created for a BFM, a transaction generator must be created that can generate
the different types of transaction calls and the inputs for those calls. The transactions are typically
a mix of directed tests, used to set up and test specific functionality, combined with long runs of
randomly generated transactions to catch any problem cases missed by the directed tests. Constrained
random testing is used when a system has too many potential input sequences to test them all (the
typical situation for virtually all system-level designs), because it saves time compared to manually
writing the huge number of directed tests that would otherwise be required.
The term constrained random refers to randomly generated transactions whose values are constrained by
the generator to meet certain requirements. Typically, the constraints require that the parameters of
a transaction be logically consistent with one another, with the transaction protocol, and with the
implementation of the design under test. For example, the addresses of read transactions might be
constrained so that most of them fall within the address space of the device under test. Constraining
the parameters in this fashion means fewer transactions need to be generated to test the system,
reducing the overall run time of the test bench.
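A minimal C++ sketch of such a constrained generator, assuming a hypothetical read transaction with an address and a burst length. The constraints shown (a configurable fraction of word-aligned addresses inside a device window, the rest anywhere in the 32-bit space, lengths of 1 to 16 words) are illustrative choices, and the class assumes a word-aligned base and a window of at least 4 bytes.

```cpp
#include <cstdint>
#include <random>

// Hypothetical read transaction parameters.
struct ReadTxn {
    std::uint32_t addr;  // byte address, word aligned
    unsigned length;     // burst length in words
};

// Constrained-random generator. Illustrative constraints:
//  - with probability inRangeProb, the address lies in [base, base + size)
//  - otherwise it may be anywhere in the 32-bit space (exercises decode errors)
//  - addresses are 4-byte aligned; lengths are 1..16 words
// Assumes base is word aligned and size >= 4.
class ReadTxnGenerator {
public:
    ReadTxnGenerator(std::uint32_t base, std::uint32_t size,
                     double inRangeProb, unsigned seed)
        : base_(base), size_(size), inRange_(inRangeProb), rng_(seed) {}

    ReadTxn next() {
        std::uint32_t addr;
        if (inRange_(rng_)) {
            std::uniform_int_distribution<std::uint32_t> word(0, size_ / 4 - 1);
            addr = base_ + word(rng_) * 4;
        } else {
            std::uniform_int_distribution<std::uint32_t> word(0, 0x3FFFFFFFu);
            addr = word(rng_) * 4;
        }
        std::uniform_int_distribution<unsigned> len(1, 16);
        return {addr, len(rng_)};
    }

private:
    std::uint32_t base_;
    std::uint32_t size_;
    std::bernoulli_distribution inRange_;
    std::mt19937 rng_;  // a fixed seed makes a failing run reproducible
};
```

Keeping a small out-of-range fraction rather than zero is a design choice: it still exercises error handling while spending most transactions on the device's real address space.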
Using hierarchical references to transactors
When generating master transactor calls to test your design, it is frequently useful to be able to call
transactors that are located in different BFM instantiations. For example, a higher-level BFM may contain
several ATM port BFMs with SendPacket transactors that need to be initiated from the higher-level BFM.
This requires that the transactors be hierarchically addressable from the higher-level BFM. Hierarchical
referencing of transactors is supported natively in Verilog and is easy to implement in C++, but it is
not natively supported in VHDL. Below is a technique that can be used to emulate hierarchical referencing
in VHDL. Although this technique is discussed here for the purpose of supporting hierarchical calls to
transactors, it can also be applied whenever a testbench requires hierarchical access to components of
the design.
The basic idea behind hierarchically accessible transactors is to create a global array of control signals,
one for each transactor instance. As each transactor initializes itself, it registers with a hash table
that maps the transactor instance's hierarchical name to the appropriate index into the control signal
array. Additional arrays are needed to store the parameters for each type of transactor. Generics can
be used to pass the instance name strings down through the hierarchy to each transactor instance. The
figures below show the flow of control for the transactor and for the Apply function that initiates a
transaction on the transactor.
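Independently of the VHDL details, the registration scheme can be sketched in C++: a map from hierarchical instance name to an index into a vector of control records, filled as each transactor registers during initialization. `TransactorControl`, `TransactorRegistry`, `serviceTransactor()`, and the instance names in the test are hypothetical. Because the control records live in a `std::vector`, references to them should only be held once registration is complete.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// One slot of the global control array: a request flag, a parameter, and a
// completion count the caller can observe.
struct TransactorControl {
    bool request = false;
    unsigned param = 0;
    unsigned completed = 0;
};

// Maps hierarchical instance names to indices into the control array.
class TransactorRegistry {
public:
    // Called once by each transactor instance during initialization.
    std::size_t registerInstance(const std::string& hierName) {
        auto result = index_.emplace(hierName, controls_.size());
        if (!result.second)
            throw std::runtime_error("duplicate transactor: " + hierName);
        controls_.emplace_back();
        return result.first->second;
    }

    std::size_t lookup(const std::string& hierName) const {
        auto it = index_.find(hierName);
        if (it == index_.end())
            throw std::out_of_range("unknown transactor: " + hierName);
        return it->second;
    }

    // The "Apply" side: find the transactor by name and raise its request.
    void apply(const std::string& hierName, unsigned param) {
        TransactorControl& c = controls_[lookup(hierName)];
        c.param = param;
        c.request = true;
    }

    TransactorControl& control(std::size_t i) { return controls_.at(i); }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<TransactorControl> controls_;
};

// The transactor side: each instance watches only its own slot.
inline void serviceTransactor(TransactorRegistry& reg, std::size_t myIndex) {
    TransactorControl& c = reg.control(myIndex);
    if (c.request) {
        c.request = false;  // a real transactor would run its protocol here
        ++c.completed;
    }
}
```

The caller never needs a direct handle to the transactor; the hierarchical name string is enough, which is exactly what the VHDL generics supply.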
Using a transaction manager queue to mix transaction streams
For simple test sequences, you can execute a series of transactors from a single process, one after
the other. If you want multiple transactors to execute at the same time, you can use non-blocking
calls to the transactors. But if you want multiple sequences of transactions running in parallel,
you must develop a more involved transaction sequencer.
One solution is to create a process for each sequence of transactions that you want to run in parallel.
However, this is limiting in situations where you need control over all the types of transactions from
one process. For example, to fully exercise an ATM switch, you need to send ATM cells to every input port
simultaneously. Randomly choosing the port number and the ATM cell data to transmit also strengthens the
test bench. So it would be useful to generate N cells and transmit them to the switch through randomly
chosen ports, without allowing any one transmitter to block another. A second solution, therefore, is to
create a transaction manager that reads transactions from a queue and executes them one after the other.
You could have one transaction manager instance per port and place transactor calls randomly into their
queues. This is difficult to do in Verilog and beyond the scope of this paper, so we cover only how to
implement this solution in VHDL and C++.
In VHDL, you can implement a transaction manager by using the hierarchical referencing technique above
and creating the following: 1) an additional record type, TApplyCall, that stores a Transactor Node
and the transactor's parameters; 2) a queue of TApplyCalls; 3) functions that place TApplyCalls on
the queue; and 4) a process that reads TApplyCalls from the queue and uses them to execute a transactor.
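For comparison, the same structure can be sketched in C++: a generic call record (the analogue of TApplyCall) holding a node index and text parameters, a queue of those records, a push function, and a loop that pops each call and dispatches on a type tag stored with the transactor node. `std::istringstream` plays the role of reading fields from a VHDL "line". The transactor kinds, parameter formats, and all names are hypothetical, and each case only records what it would do rather than driving control signals.

```cpp
#include <cstddef>
#include <deque>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum class TransactorKind { SendCell, ReadReg };

struct TransactorNode {  // registered transactor plus its type tag
    std::string name;
    TransactorKind kind;
};

struct ApplyCall {       // analogue of TApplyCall
    std::size_t node;    // index of the target transactor node
    std::string params;  // analogue of the VHDL "line" of parameters
};

class LineBasedManager {
public:
    explicit LineBasedManager(std::vector<TransactorNode> nodes)
        : nodes_(std::move(nodes)) {}

    // Place a call on the queue.
    void push(std::size_t node, std::string params) {
        queue_.push_back({node, std::move(params)});
    }

    // Pop calls in order; switch on the node's kind to decide how to parse
    // the parameters and which transactor routine to run.
    void runAll() {
        while (!queue_.empty()) {
            ApplyCall call = std::move(queue_.front());
            queue_.pop_front();
            const TransactorNode& n = nodes_.at(call.node);
            std::istringstream in(call.params);
            switch (n.kind) {
            case TransactorKind::SendCell: {
                unsigned vpi = 0, vci = 0;
                in >> vpi >> vci;  // decimal fields
                trace.push_back(n.name + ": cell vpi=" + std::to_string(vpi) +
                                " vci=" + std::to_string(vci));
                break;
            }
            case TransactorKind::ReadReg: {
                unsigned addr = 0;
                in >> std::hex >> addr;  // hexadecimal address field
                trace.push_back(n.name + ": read addr=" + std::to_string(addr));
                break;
            }
            }
        }
    }

    std::vector<std::string> trace;  // what each dispatched call did

private:
    std::vector<TransactorNode> nodes_;
    std::deque<ApplyCall> queue_;
};
```

The text-parameter approach keeps a single call type for every transactor, at the cost of parsing at dispatch time and losing compile-time checking of the parameters.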
The transactor parameters can be represented using a "line" in VHDL so that TApplyCall can be used for
all types of transactors. You would then add a data member to the Transactor Node that represents the
type of the transactor, which the transaction manager can switch on to determine which method to call
to run the transactor. That method would be responsible for extracting the appropriate parameters from
the parameters "line" and executing the transactor using the control signal index, as described in the
"hierarchical referencing" section.
In C++, a class can be written to represent the transaction manager. This class would read transactions
from a queue and call a virtual method, Execute, to run each one. There would be a base class that all
transactor classes derive from, and each transactor class would have its own data member representing
the parameters for a particular transaction. Each transactor class would be responsible for actually
performing a particular bus transaction when its Execute method is called (e.g., by using TestBuilder,
SCV, or PLI). For each transactor call that you want to place in the queue, you would create a new
instance of the transactor class, set up its parameters data member, and push it onto the queue.
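A minimal sketch of this arrangement, assuming C++14 and a hypothetical `SendCell` transactor. Here `Execute` only appends to a trace vector, where a real implementation would drive the simulator through whatever interface the testbench uses.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Base class that every queued transactor call derives from.
class Transaction {
public:
    virtual ~Transaction() = default;
    // Performs the bus transaction. This sketch only records it.
    virtual void Execute(std::vector<std::string>& trace) = 0;
};

// Example transactor class with its own parameter data members.
class SendCell : public Transaction {
public:
    SendCell(unsigned port, unsigned payload) : port_(port), payload_(payload) {}
    void Execute(std::vector<std::string>& trace) override {
        trace.push_back("port" + std::to_string(port_) + " send " +
                        std::to_string(payload_));
    }
private:
    unsigned port_;
    unsigned payload_;
};

// Reads transactions from its queue and runs them one after the other.
class TransactionManager {
public:
    void push(std::unique_ptr<Transaction> t) { queue_.push(std::move(t)); }

    void runAll(std::vector<std::string>& trace) {
        while (!queue_.empty()) {
            std::unique_ptr<Transaction> t = std::move(queue_.front());
            queue_.pop();
            t->Execute(trace);  // virtual dispatch selects the transactor
        }
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::queue<std::unique_ptr<Transaction>> queue_;
};
```

Creating one `TransactionManager` per port and pushing calls onto randomly chosen managers gives the per-port mixing described for the ATM switch example, and adding a new transactor type only requires a new derived class.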
Using a golden reference model to verify design output in the face of randomized input
A golden reference model is an unclocked, behavioral model of the system design that can be used to
verify the output of a lower-level model (either RTL or gate level). The golden reference model must
model both the design under test and the functionality of the surrounding BFMs. The same transactions
are applied to both the lower-level model under test and the golden reference model, and the outputs
of the two models are compared to ensure that the lower-level model is functioning properly. By using
a golden model, verification engineers can avoid having to manually determine the expected results of
their directed tests. Further, a golden reference model is virtually required when performing constrained
random tests, since it would take too long to manually determine the expected results for a large number
of randomly generated transactions. The figure below shows a typical structure for a testbench that uses
a golden reference model to verify the output from the design model.
When written in C++, a golden reference model usually consists of several classes, one for each type of
device in the system. Each class contains a function for each type of transaction that the device
participates in. These functions take their inputs and compute the appropriate outputs in zero simulation
time, since they are all untimed behavioral code. The code for the golden reference model is also much
simpler than the RTL code, because it doesn't need to account for low-level protocol details such as
when data becomes available during a transaction or the handshaking requirements of a transaction.
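As an example of how simple such a class can be, here is a hedged C++ sketch of a golden model for a hypothetical ATM switch: one configuration transaction (`setRoute`) and one data transaction (`sendCell`), each completing in a single function call with no timing. The reduced `Cell` fields and the drop-on-unrouted-VPI rule are assumptions for illustration.

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <stdexcept>
#include <vector>

// Only the cell fields the golden model needs (hypothetical reduction).
struct Cell {
    std::uint16_t vpi;
    std::uint32_t payloadTag;  // stands in for the 48-byte payload
};

// Untimed golden model of a switch: one member function per transaction type,
// each returning immediately with no clocks, back-pressure, or handshaking.
class GoldenSwitch {
public:
    explicit GoldenSwitch(unsigned numPorts) : expected_(numPorts) {}

    // Configuration transaction: route a VPI to an output port.
    void setRoute(std::uint16_t vpi, unsigned outPort) {
        if (outPort >= expected_.size()) throw std::out_of_range("bad port");
        routes_[vpi] = outPort;
    }

    // Data transaction: returns the output port, or -1 if the cell is
    // dropped because its VPI is unrouted. The input port is unused here.
    int sendCell(unsigned /*inPort*/, const Cell& cell) {
        auto it = routes_.find(cell.vpi);
        if (it == routes_.end()) return -1;
        expected_[it->second].push_back(cell);
        return static_cast<int>(it->second);
    }

    // Cells the design should emit on `port`, in arrival order.
    const std::deque<Cell>& expectedOn(unsigned port) const {
        return expected_.at(port);
    }

private:
    std::map<std::uint16_t, unsigned> routes_;
    std::vector<std::deque<Cell>> expected_;
};
```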
The outputs from the golden reference model can be generated before, during, or after the testing of
the design under test. Running the golden reference model and the simulation model in parallel has one
advantage: the randomization of the transactions and transaction data can be modified at runtime
according to the coverage requirements of the test bench. However, this approach requires that the
output values of both models be available at the same time during the test bench run so that they can
be compared. This can be achieved by calling the appropriate golden reference model function at the end
of each transactor's execution, when the results from the lower-level model become available. Since the
golden reference model is untimed, its outputs are available immediately after the function call is
made, and the results of the two models can be compared.
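A sketch of that comparison point, assuming a hypothetical memory-like device: the transactor's completion code calls the scoreboard, which updates or queries an untimed golden model and compares immediately. Unwritten locations are assumed to read as 0, and all names are illustrative.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Untimed golden model of a hypothetical memory-mapped device.
class GoldenMemory {
public:
    void write(std::uint32_t addr, std::uint32_t data) { mem_[addr] = data; }
    std::uint32_t read(std::uint32_t addr) const {  // unwritten reads 0 (assumed)
        auto it = mem_.find(addr);
        return it == mem_.end() ? 0u : it->second;
    }
private:
    std::unordered_map<std::uint32_t, std::uint32_t> mem_;
};

// Hooks called by the transactors when a transaction on the design model
// completes. The golden model answers immediately, so results can be
// compared as soon as the design's result is known.
class Scoreboard {
public:
    void onWriteDone(std::uint32_t addr, std::uint32_t data) {
        golden_.write(addr, data);
    }
    void onReadDone(std::uint32_t addr, std::uint32_t dutData) {
        std::uint32_t expected = golden_.read(addr);
        ++checks;
        if (expected != dutData)
            errors.push_back("addr " + std::to_string(addr) + ": expected " +
                             std::to_string(expected) + ", got " +
                             std::to_string(dutData));
    }
    unsigned checks = 0;
    std::vector<std::string> errors;
private:
    GoldenMemory golden_;
};
```

Because a mismatch is detected at the transaction that caused it, the error message can point to the simulation time and transaction directly, and coverage-driven generators can react before the run ends.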
Conclusion
Transaction-based BFMs enable very robust, reusable testbenches to be created, but some problems arise
when writing these types of testbenches because of limitations in VHDL and Verilog. In this paper, we have
examined several coding techniques for working around these problems within the HDLs themselves, as well
as some ways to overcome them using a combination of C++ and Verilog or VHDL. SynaptiCAD makes a graphical
bus-functional model generator called TestBencher Pro that generates the code described in this paper.
Daniel Notestein, co-founder of SynaptiCAD, is the chief architect for SynaptiCAD's WaveFormer Pro and
VeriLogger Pro products. Notestein obtained his bachelor's degree in electrical engineering, with minors
in computer science and math, from Virginia Tech and his MSEE from the University of Texas.
Ben Rhodes is the project leader for SynaptiCAD's TestBencher Pro product. His areas of special expertise
include VHDL, Verilog, SystemC, OpenVera, and e test bench coding. Rhodes obtained his BSEE from Virginia
Tech.