Yossi Veller, Mentor Graphics
The shift toward electronic system level (ESL) design and verification is beginning as the productivity of RTL modeling and verification techniques lags behind the remarkable growth of design complexity. ESL methodologies focus on the architecture of the design, raising the level of abstraction for design, modeling, and validation to the transaction level.
A transaction-level modeling (TLM) platform provides a framework within which many essential design and verification tasks can be performed. Moreover, there is a growing recognition of the advantages of extending this flow by directly synthesizing high-level abstraction code to a hardware implementation; i.e., by using high-level synthesis (HLS).
The overriding obstacle to adoption of such an ESL flow is that the same basic model cannot be used for all of the critical ESL design and verification tasks; i.e., virtual prototyping, hardware verification and validation, and performance analysis. This is because the code for the TLM models used to express identical system component behavior varies widely for each of these tasks, incurring a prohibitive amount of modeling effort.
A software virtual prototype has to run as fast as possible. A rule of thumb says that unless the software runs at 50 MIPS or more, application engineers will abandon it and wait for a physical prototype that delivers the needed performance. Firmware and operating system engineers may be more patient, but if bringing up the OS takes an hour or more, their productivity is in peril. Hence these models include only the minimal behavior that allows the software to run unhindered. Timing is abstracted away as much as possible, so untimed or SystemC TLM2.0 LT (loosely timed) models of computation are chosen. Timers must be modeled because an OS relies on them, but their timing is usually tied to the host wall clock rather than to the target architecture timing. Thus if the application has timing dependencies, these facets can't be checked on the virtual platform and the engineers must wait for the board in order to test them.
Software-based performance and power analysis has to be based on the target architecture timing. It allows application engineers to tune their code to be most effective on the actual platform; the effect of the processor choice can be evaluated only this way. Caches and cache-coherency mechanisms strongly influence performance and can't be omitted, as they are in a software virtual prototype. The same holds for shared resources, like buses and memories. Some form of statistical timing modeling has to be used in order not to slow down the simulation too much (per the 50 MIPS rule). A fast power model of the architecture that reacts to the software's actions during execution is essential in order to evaluate various power management strategies.
Hardware-based performance and power analysis requires models with a high degree of timing fidelity. These models allow system architects to optimize the system and validate that it can meet its requirements with minimal resources. The performance estimation must be finely resolved because scenarios have to be evaluated with a high degree of confidence. For example, system architects examine sequences of transactions to check whether the cache and snoop models perform as expected. Hence the SystemC TLM2.0 AT (approximately timed) model of computation is chosen for the timing fidelity it delivers. SystemC TLM2.0 AT models enable modeling of pipelined transactions and computations; however, they don't adhere to the 50 MIPS rule: they run closer to 1 MIPS. Furthermore, SystemC TLM2.0 AT code uses a non-blocking interface while SystemC TLM2.0 LT uses a blocking interface, so their structure is very different.
Hardware verification at the transaction level can use the same models used for hardware-based performance analysis, but with an emphasis on more accurate timing. These models allow system engineers to verify that the system does what it is supposed to do without going into fine cycle-by-cycle detail. The models achieve this accuracy not by executing each cycle but by counting cycles.
Firmware debugging at the transaction level can use the AT models used for hardware verification, unless signoff is needed. Signoff can be achieved only with exactly cycle-accurate models; i.e., an RTL representation. However, in multi-core designs, which have a highly nondeterministic environment, and in the presence of caches and memory coherence mechanisms, an accurate model represents only one scenario. Hence validation has to rely on statistical mean values, which can be modeled with TLM AT models. Moreover, accurate models arrive late, run slowly, and can cover only a few scenarios.
High-level synthesis (HLS) requires models whose inter-block interfaces are cycle-accurate and at the pin level, so that the generated hardware can be optimized. The code in a synthesizable model should specify the bit widths of variables in order to let the tool optimize the amount of logic. The models should also be unambiguous; e.g., in TLM there is no problem accessing shared resources concurrently, but for HLS the code has to specify the arbitration between the accesses. A big problem with HLS is that synthesis tools insert clocks into the implementation of untimed code, so the simulation results of the source code will not exactly match those of the resulting HDL.
The modeling effort required to create this wide variety of model styles and objectives is prohibitive. Hence companies have either shunned ESL completely or chosen to do only one of these activities, most often software virtual prototyping.
Fortunately, a methodology built around standard, scalable transaction-level models presents a solution in the form of a single scalable model that handles all ESL abstraction levels and design tasks. In the scalable modeling style, a single TLM2.0 model is divided into distinct, separate parts: functionality, interfaces, and a timing and power overlay. The functional code is written in a simplified style that is fast to execute and contains no timing or other implementation artifacts. Communication between the functional threads is done through channels; for example, TLM2.0 sockets, FIFOs, etc.
Figure 1: A scalable transaction-level model entirely separates functionality from the timing and power architecture as well as the communication layer, allowing them to be connected and disconnected on the fly.
The most significant innovation is how the architectural aspects of timing and power overlay the functional description. The scalable modeling style is based on an aspect-oriented language that specifies how internal events and events on channels are spaced in time and how much power is consumed. This scalable modeling language is referred to as the policies.
A simple example of a policy expresses that "if a write on port p2 comes after a read from p1, it will occur 3 cycles later."
The policies language can be enhanced with associated C++ callbacks that compute any kind of timing, including the effects of preemption; the same applies to power calculations. This is analogous to the relationship between PSL and the HDL languages, where part of the PSL description is expressed in VHDL or Verilog. The policies' callbacks use an API defined in terms of functional events, and values from the functional model can be used in the timing and power calculations. Essentially, if enough work is invested in the timing and power overlay, the accuracy of the resulting representation of the design is limited only by the accuracy of the TLM communication channels.
The channels can attach policies to their actions. The same channels are designed in a way that can make designs using them deterministic, and they can also have pin-level, cycle-accurate representations of their actions. Thus, the same functional models can easily be refined into synthesis-ready models. The separation of the interface from the other aspects of the model allows changing the bus protocol without changing the functional code or, in principle, the timing layer. This change is reflected both in the TLM simulation and in the synthesis results.
Thus, the same scalable platform can run at hundreds of MIPS for software development, 50 MIPS for software performance analysis, and 1 MIPS for hardware performance analysis and low-level software and hardware verification. These different performance levels are accomplished through a run-time switch that allows the platform to run fast until an interesting point is reached and only then invokes a higher-precision mode of operation.
During the design process, timing and power accuracy evolve with the model refinement process from an abstract untimed view into a detailed implementation view of the target micro-architecture, all represented within a single model. This layered approach allows the same model to be used in a switchable mode, alternating between a fast untimed software execution mode (LT mode) and a detailed simulation mode for hardware verification and performance/power analysis (AT mode).
Scalable transaction-level models supply the infrastructure for the entire ESL design flow. It is critical to have simplified modeling practices and tools that filter out language and complex semantic issues and let designers concentrate on pure functionality and architectural issues. The scalable transaction-level model approach allows users to quickly explore various complex micro-architecture alternatives in the system context with minimal coding effort while keeping the code representing the functionality intact.