Application Note
Table of Contents
Basics of Power Analysis
  Dynamic Power
    Switching Power
    Internal Power
  Static Power
What all types of analysis are possible?
  Vector-based average power calculation
    i. Gate Level VCD
    ii. RTL VCD based
  Propagation-based average power calculation
Vector profiling or Cycle accurate analysis
  Identifying VCD window with maximum activity
  Identifying VCD window with maximum power
Clock gating Metrics
Appendix A Power Calculation and Input Data
Appendix B Scripts and Flows
Appendix C Flow Recommendation & Debugging
Dynamic Power:
Switching Power
Switching power is the power consumed in charging and discharging interconnect capacitances. In most cases, this type of power consumption dominates because large drivers have to drive large capacitive loads.

$P = 0.5 \cdot C_L \cdot V^2 \cdot F \cdot A$

Where:
$C_L$  Output capacitive loading, taken from the SPEF and .lib
$V$    Supply voltage
$F$    Frequency, taken from the VCD, SDC, TWF, or user defined
$A$    Average switching activity, taken from the VCD or computed

Internal Power
Internal power is the power consumed in charging and discharging the interconnect and device capacitances internal to a cell. Internal power can be divided into two parts:
. Pin Power
. Arc Power

Internal power is calculated by using the internal power tables provided in the .lib, which capture the characterized internal power over a range of input slew rates and external loading. The tables reflect the combination of both the internal switching and internal feedthrough power. The tables are generated from SPICE simulation during library characterization. If k-factor power scaling parameters (for process, temperature, and voltage) are specified in the .lib file, the power engine takes them into consideration when calculating internal power (Note: timing-related scaling factors are not handled by the power engine).
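As a quick numerical illustration of the switching power formula given earlier in this section, the following minimal Python sketch evaluates it for a single net. The function name and the input values (10 fF load, 1.0 V supply, 100 MHz frequency, activity of 0.2) are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any library or design.

```python
def switching_power(c_load_f, v_supply, freq_hz, activity):
    """Return switching power in watts using P = 0.5 * CL * V^2 * F * A."""
    return 0.5 * c_load_f * v_supply ** 2 * freq_hz * activity


# Hypothetical net: 10 fF load, 1.0 V supply, 100 MHz, activity 0.2
p_watts = switching_power(10e-15, 1.0, 100e6, 0.2)
print(f"{p_watts * 1e6:.3f} uW")  # prints "0.100 uW"
```

Because the formula is linear in every term except the supply voltage, halving the activity halves the switching power, while a 10% drop in supply voltage reduces it by about 19%.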
Static Power
Leakage power is also classified as static power because it is consumed by devices when they are not switching. It includes state-dependent leakage, that is, leakage that depends on the state of the gate (whether a transistor is on or off). This value comes from the .lib file if it exists. If k-factor power scaling parameters (for process, temperature, and voltage) are specified in the .lib file, the power engine takes them into consideration when calculating leakage power (Note: timing-related scaling factors are not handled by the power engine).
This application note covers the first two input types for power calculation.

i. Gate Level VCD
Since gate-level and RTL VCDs are obtained from design simulation, the coverage is normally known to the user. There are situations when a gate-level VCD is not available, especially in the early stages of design. In such cases an RTL-level VCD can still give a reasonable power number if the EDA tools support it. Conformal LEC, the equivalence checking tool from Cadence, can generate an RTL-to-gate map file that can be fed directly to the EPS power analysis engine along with the RTL VCD to support RTL VCD based power analysis.

The power engine calculates the number of 0<->1, 0/1<->X, and 0/1<->Z transitions.
1. The 0<->1 transition is counted as 1.
2. 0/1<->X transitions are counted as 0.5 by default.
3. 0/1<->Z transitions are counted as 0.25 by default.
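The weighting of transitions described above can be sketched in a few lines of Python. This is only an illustration of the counting rule, not the power engine's implementation: the function name and the single-character value encoding are assumptions, and the handling of direct X<->Z transitions (ignored here) is not described in this note.

```python
def weighted_transitions(values, x_factor=0.5, z_factor=0.25):
    """Count transitions in a sequence of net values ('0', '1', 'x', 'z').

    0<->1 counts as 1, 0/1<->X as x_factor, 0/1<->Z as z_factor.
    """
    total = 0.0
    samples = [v.lower() for v in values]
    for prev, cur in zip(samples, samples[1:]):
        if prev == cur:
            continue
        pair = {prev, cur}
        has_binary = bool(pair & {"0", "1"})
        if pair == {"0", "1"}:
            total += 1.0
        elif "x" in pair and has_binary:
            total += x_factor
        elif "z" in pair and has_binary:
            total += z_factor
        # X<->Z is not covered by the note; it is ignored here (assumption).
    return total


count = weighted_transitions(list("01x1z0"))
print(count)  # prints 2.5
```

The two keyword arguments play the role of the default factors listed above, so a different weighting can be tried by passing other values.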
Note:
1. You can use the -x_transition_factor and -z_transition_factor options of the set_power_analysis_mode command to change the default value of 0/1<->X or 0/1<->Z transitions. The power engine also calculates the duty cycle of each net for state-dependent internal or leakage power calculation, as described previously.
2. The power engine also takes the clock definition from the VCD. The tool gives higher priority to the clock definition from the VCD if there is a discrepancy between the clock frequency in the SDC or TWF and the VCD.

ii. RTL VCD based
The following command performs instance name mapping between the RTL netlist and the GATE level netlist, wherein the GATE level netlist is Golden and the RTL netlist is Revised; it specifies the mapping file MapFile.alt:

map_activity_file -rtl2gate MapFile.alt -golden gate

An example of the mapping file output from Conformal is as follows:
Mapped points: SYSTEM class
1-th mapped points:
(G) + 1 PI /x_ick
(R) + 595 PI /x_ick
2-th mapped points:
(G) + 2 PI /x_jreset_cp_p
(R) + 594 PI /x_jreset_cp_p
3-th mapped points:
(G) + 3 PI /x_mreset_cp_p
(R) + 593 PI /x_mreset_cp_p
........
60-th mapped points:
(G) + 4582 DFF /cpexec0/cpddecls0/gi_opls/q_reg_reg[35]
(R) + 3856 DFF /cpexec0/cpddecls0/gi_opls/q_reg_reg_35_
61-th mapped points:
(G) + 4583 DFF /cpexec0/cpddecls0/gi_opls/q_reg_reg[34]
(R) + 3855 DFF /cpexec0/cpddecls0/gi_opls/q_reg_reg_34_
62-th mapped points:
(G) + 4633 DFF /cpexec0/cpddecex0/gi_opex/q_reg_reg[21]
(R) + 3805 DFF /cpexec0/cpddecex0/gi_opex/q_reg_reg_21_
Where G refers to the Golden netlist name and R refers to the Revised netlist name. By default, the tool assumes that the Golden netlist name refers to the RTL net name and the Revised netlist name refers to the GATE level net name.

Propagation-based average power calculation
The power engine calculates the switching probability, as well as the static state probability, of each net in the design. The propagation-based approach is vector-independent and provides coverage for all nets in a design. However, the accuracy depends on good starting values, that is, information about the switching probabilities at the primary inputs of the design. Simple examples are clock, reset, and enable inputs. Obtaining an accurate prediction without information about the switching probabilities of these special inputs is difficult, and in most cases an inaccurate prediction causes an overestimation of the power consumption.

Activity Propagation in the power engine
Activity propagation inside the power engine can be divided into the following categories.

a. Activity propagation through combinational cells
Activity propagation through combinational cells is the simpler case. The power engine gets the function of the combinational cell from the .lib and uses that function to propagate activity through the cell. The power engine also propagates the duty cycle through combinational cells. The only tricky part is when there are combinational loops inside the design. In this case the power engine seeds activity at an input to break the combinational loop. The seeded activity is based upon an internal heuristic of the power engine and takes into account the activity of neighboring pins.

b. Activity propagation through sequential cells
Activity propagation through a sequential cell is based upon the activity of the input pin, the set or reset pin, and the scan enable pin.
However, most sequential cells are in sequential loops, such as state machines, so activity propagation through them relies on heuristics that seed activity at the input of the sequential cell. Therefore it is recommended to provide activity at the outputs of sequential cells. With the power engine you can either use the set_default_switching_activity command to specify the average activity on sequential cells, or use RTL, VCD, or TCF to seed activity at sequential cells. In the example of a sequential cell in a loop, the activity at the output of the sequential cell cannot be resolved using propagation. In this case, iterating the loop to determine the activity at the output Q of the sequential cell results in activity that diminishes towards 0. This shows that computing activity in a sequential loop with a heuristic method is an intractable problem for propagation-based power calculation. Therefore, in order to get good average power numbers for the design under test, you must always specify the average activity at the output of the sequential cell using the above-mentioned command. In addition, you can override this default activity using VCD or TCF.

COPYRIGHT 2011, CADENCE DESIGN SYSTEMS, INC. ALL RIGHTS RESERVED.
c. Activity propagation through macros
The major component of power in macros is internal power. The internal power of a macro is highly sensitive to the activity on its read and write signals. A small change in the activity of the read or write signal can cause a large change in the internal power numbers. Therefore, it is recommended that users specify activity at the read and write signals of macros.

d. Activity propagation through clock network and clock gates
For accurate propagation through the clock network, it is important that users specify the TWF file, which also contains the clock frequency of generated clocks. The propagation of clock activity through clock gating cells depends upon the activity of the clock enable signal. Since the clock enable is a signal net, it will generally have low activity unless specified otherwise, which makes the power calculation very optimistic. Therefore, it is recommended that users specify activity at the enable pin of clock gating cells for proper propagation through the clock network. You can use the set_default_switching_activity command in EPS to specify the average activity at the enable signal of all clock gating cells.
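This note does not give the equations the power engine uses for propagation. Purely as an illustration of the idea in item (a), the sketch below applies the textbook probabilistic model to a 2-input AND gate, assuming statistically independent inputs: the output static probability is the product of the input probabilities, and each input's transition density is weighted by the probability that the other input is high. The function names and input values are hypothetical.

```python
def and_static_probability(p_a, p_b):
    """Probability that Y = A & B is high, assuming independent inputs."""
    return p_a * p_b


def and_transition_density(p_a, p_b, d_a, d_b):
    """Transition density at Y = A & B.

    A toggle on A reaches Y only while B is high, and vice versa.
    """
    return p_b * d_a + p_a * d_b


# Hypothetical inputs: P(A) = 0.25, P(B) = 0.5, densities 0.1 and 0.2
p_y = and_static_probability(0.25, 0.5)
d_y = and_transition_density(0.25, 0.5, 0.1, 0.2)
print(p_y, round(d_y, 6))
```

The same reasoning explains the recommendation in item (d): if the enable of a clock gating cell is given a low static probability, very little clock activity propagates to the gated clock, and the downstream clock power is underestimated.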
For more options on vector profiling, refer to the EPS command reference for the report_vector_profile command. You can also use -write_profiling_db to generate and load a profiling database and view a graphical representation.
You can generate metrics for clock_gating_efficiency and register_gating_efficiency using the following command options:
1. report_power -clock_gating_efficiency generates a report that includes the Clock Gating Efficiency (CGE) for all clock gating instances as well as for the different hierarchies in the design. The average CGE for the different hierarchies is also reported.
Clock Gating Efficiency is calculated as:

CGE = (toggles at clock gate output / toggles at clock gate input)

A sample report is given below:

Instance   Toggles_at_CG_input   Toggles_at_CG_output   CGE
ICG1       20                    16                     0.8
ICG2       16                    8                      0.5
.........
Average CGE = (avg of all CGEs above)
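The CGE arithmetic in the sample report can be reproduced with a short Python sketch. The data structure and function name are hypothetical; only the two rows shown in the sample report are used, so the average computed here covers just those two instances.

```python
def clock_gating_efficiency(toggles_in, toggles_out):
    """CGE = toggles at clock gate output / toggles at clock gate input."""
    return toggles_out / toggles_in


# (instance, toggles at CG input, toggles at CG output) from the sample report
rows = [("ICG1", 20, 16), ("ICG2", 16, 8)]
cge = {name: clock_gating_efficiency(t_in, t_out) for name, t_in, t_out in rows}
average_cge = sum(cge.values()) / len(cge)

for name, value in cge.items():
    print(name, value)
print("Average CGE", average_cge)
```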
2. report_power -register_gating_efficiency generates a report that includes the Register Gating Efficiency (RGE) and Data Aware Gating Efficiency (DAGE) for all sequential cell instances as well as for the different hierarchies in the design. The average RGE and DAGE for the different hierarchies are also reported.
Register Gating Efficiency is calculated as:

RGE = 1 - (toggles at register clock pin / root clock toggles)

A sample report is given below:

Instance   Root_Clock_Toggles   Toggles_at_Clock_Pin   RGE
Reg1       20                   8                      0.6
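The RGE value in the sample row follows directly from the formula, as this minimal Python sketch shows. The function name is hypothetical, and DAGE is not computed because its formula is not given in this note.

```python
def register_gating_efficiency(root_clock_toggles, clock_pin_toggles):
    """RGE = 1 - (toggles at register clock pin / root clock toggles)."""
    return 1.0 - clock_pin_toggles / root_clock_toggles


# Reg1 from the sample report: 20 root clock toggles, 8 at the clock pin
rge_reg1 = register_gating_efficiency(20, 8)
print(round(rge_reg1, 3))  # prints 0.6
```

An ungated register sees every root clock toggle and therefore has an RGE of 0, while a register whose clock never toggles has an RGE of 1.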
Inputs can have several sets of power table pairs, each associated with a when clause that specifies the logical condition of inputs that the tables apply to. A call to the table lookup function utilizes a procedure that returns a weighted sum of the energies based on the when clause functions and the signal activity. The weighting of energies is similar to the propagation of static probabilities (duty cycles):
The procedure for the weight calculation is similar. However, one complication of the energy weighting is that the coverage of the when clauses might not be complete. For example, one set of state-dependent tables includes only two when clauses:

When: A & !B;
When: !A & B;

This set of clauses does not account for cases where A and B are either both high or both low. In normal operation, neither of these conditions may appear, so the incomplete coverage may not matter. However, the transition density data is static and lacks signal correlation; the conditions not included in the when clauses should be accounted for, or else the internal power will be underestimated. Scaling can resolve this. But it turns out that the most common situation for incomplete clauses is in memories, where assuming the energy to be 0 is the more correct thing to do. As a result, PowerMeter assumes an energy of 0 for missing clauses.

Implementation for state-based internal power on the inputs is straightforward. The implementation for the internal power contribution from a single input port is:
For multiple input ports, each port has a set of energy tables, instead of just one pair, and each pair includes a when clause, from which a probability can be calculated:
As an example, consider a port with the following data, and with P(A) = 0.25 and P(B) = 0.50:
The calculation of energy for a single transition on this input would be:
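The table data for this example is not available in this text, so the following Python sketch only illustrates the weighting scheme, using the two when clauses and the probabilities P(A) = 0.25 and P(B) = 0.50 quoted above. The energy values, the use of a single energy per clause instead of a rise/fall pair, and the assumption that A and B are independent are all hypothetical.

```python
P_A, P_B = 0.25, 0.50

# (clause label, clause probability, energy in arbitrary units); energies are made up
clauses = [
    ("A & !B", P_A * (1.0 - P_B), 2.0),
    ("!A & B", (1.0 - P_A) * P_B, 4.0),
]

# Missing clauses (A & B, !A & !B) contribute an energy of 0, as stated in the text.
covered_probability = sum(prob for _, prob, _ in clauses)
weighted_energy = sum(prob * energy for _, prob, energy in clauses)

print(covered_probability)  # prints 0.5
print(weighted_energy)      # prints 1.75
```

Only half of the probability space is covered by the two clauses here, which is why the choice between scaling and assuming zero energy for the missing states has a visible effect on the result.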
Output internal energy is a weighted sum of values extracted from internal power tables that are associated with timing arcs. These tables are indexed by input transition time and output load capacitance. The values for each arc are weighted by the transition density of the corresponding inputs. The resulting energy value is multiplied by the transition density of the output pin to calculate the power. When state-dependent arc-based (output) internal power tables are present, if two or more tables apply to the same arc, they are weighted by the when clauses in the same manner as described in the previous section for state-dependent internal power (input pins).

As an example, we will calculate the output internal power for an AND gate with inputs A and B and output Y. The goal is to calculate the internal power contributed by the output Y. This example does not include any state dependency. The relevant portion of the .lib file is included at the end of this document.
The first step is to determine the output load capacitance on Y. The output load is the sum of the parasitic capacitance of the net connected to the Y pin of the instance and the pin capacitances of all of the input pins that the net drives. In the example shown, this value is 0.0085 pF + 0.03 pF = 0.0385 pF. Convert this value as required to the units specified in the .lib file for capacitive loads:

capacitive_load_unit (1,pf);

The next step is to associate the timing arcs with the tables in the .lib file. An AND gate has eight arcs, each corresponding to a change in output logic level in response to a change in an input logic level. Normally, these eight arcs are represented by four tables in the .lib file. The correspondence is as follows:
The energy for a transition has two components, rise ($E_{Y,rise}$) and fall ($E_{Y,fall}$). These two values are averaged to give the total energy for one transition, because output Y rises as often as it falls. Note that in the following equations, the rise and fall contributions are averaged as well, for the same reason.
In the above equations, D(A) is the transition density on input A, and E(Y- A-) is the energy value looked up from the corresponding table as described above. For this example, the energy is:
3. Leakage Power:
As the leakage component of power dissipation increases, accurate estimation of it becomes more and more critical. Originally, the leakage component of a cell's power dissipation was modeled in Liberty libraries as a single number. However, it is now more common to associate different leakage power values with different input combinations (state dependence).
The method to compute leakage power is as follows. Extract the state-dependent leakage data from a cell's Liberty .lib description and compute a weighted sum of the leakage values based on the instance's input probabilities. The state dependence is expressed as a set of logical functions describing various input conditions. This set of functions may or may not be complete (that is, covering all possible input combinations). In addition, a generic leakage power value may also be provided. PowerMeter covers all of the possible combinations of available data. State-dependent leakage data appears in a Liberty library in the following form:

cell_leakage_power : 14.335 ;
leakage_power() {
  when : "!A1 !A2" ;
  value : 9.120 ;
}
leakage_power() {
  when : "!A1 A2" ;
  value : 16.467 ;
}
leakage_power() {
  when : "A1 !A2" ;
  value : 12.364 ;
}
leakage_power() {
  when : "A1 A2" ;
  value : 19.390 ;
}

Note that the set of when clauses is complete: all possible combinations of the inputs A1 and A2 are accounted for. Also note that there is a generic leakage power value that is not associated with any condition. We assume the following combinations of input data for our approach:
1. Complete clause set with or without generic leakage value
2. Incomplete clause set with generic leakage value
3. Incomplete clause set without generic leakage value
4. Over-complete clause set with generic leakage value (error condition)
5. Over-complete clause set without generic leakage value (error condition)
If the clause set is complete, we expect the probabilities of all of the clauses to add up to 1.0. Verification of a complete clause set requires logical analysis of the statements. As extensive library verification is not within the functional requirements of PowerMeter, we base our assessment of the clause set's completeness on the sum of the probabilities. If this sum is equal to 1.0, the set is complete. If the sum is less than 1.0, we assume the set is incomplete. If the sum is greater than 1.0 (which would indicate over-coverage), we assume that there is an error in the library data. In order to decide which of the above five conditions we have, we need two pieces of information: whether or not we have a generic leakage value, and the sum of the clause probabilities. We find the probability sum as follows:
Then, use the following equations and procedures to detect and calculate the five input conditions:

1. Complete clause set with or without generic leakage value: This condition is assumed if probTotal is equal to 1.0. For this case we calculate the weighted sum of all available clauses:

2. Incomplete clause set with generic leakage value: For this case, we use the generic leakage value to fill in the missing clauses. We do this by weighting the generic leakage value with 1.0 minus probTotal:

$\text{leakagePower} = \text{leakSum} + \text{genericLeakage} \times (1.0 - \text{probTotal})$

where genericLeakage is the cell's generic leakage value.

3. Incomplete clause set without generic leakage value: For this case we simply scale up the leakSum value to accommodate the missing clauses. We accomplish this by dividing it by probTotal.

4. Over-complete clause set with generic leakage value (error condition): This case occurs when probTotal is greater than 1.0 and there is a generic leakage value. For this case, we simply revert to the generic value (a warning is also printed to alert the user that there is a possible problem with the library).

5. Over-complete clause set without generic leakage value (error condition): Without a generic leakage value, we must use the incorrect leakage data as best we can. We do this by calculating leakSum and scaling it down, using the same equation as condition 3.
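The five conditions above can be sketched as a single Python function. This is a reading of the procedure described in the text, not PowerMeter's code: the function name, the floating-point tolerance used to test for a sum of 1.0, and the error raised when no usable data exists are assumptions. The example values are the leakage_power entries from the Liberty snippet shown earlier, with each clause given a hypothetical probability of 0.25.

```python
def leakage_power(clauses, generic=None, tol=1e-9):
    """clauses: list of (probability, leakage value); generic: cell_leakage_power or None."""
    prob_total = sum(prob for prob, _ in clauses)
    leak_sum = sum(prob * value for prob, value in clauses)

    if abs(prob_total - 1.0) <= tol:          # condition 1: complete set
        return leak_sum
    if prob_total < 1.0:
        if generic is not None:               # condition 2: fill gap with generic value
            return leak_sum + generic * (1.0 - prob_total)
        if prob_total <= tol:                 # nothing usable (assumed error handling)
            raise ValueError("no leakage data available")
        return leak_sum / prob_total          # condition 3: scale up
    if generic is not None:                   # condition 4: revert to generic value
        return generic
    return leak_sum / prob_total              # condition 5: scale down


complete = [(0.25, 9.120), (0.25, 16.467), (0.25, 12.364), (0.25, 19.390)]
partial = complete[:2]

print(leakage_power(complete, generic=14.335))
print(leakage_power(partial, generic=14.335))
print(leakage_power(partial))
```

With equal clause probabilities the complete set gives 14.33525, which is close to the generic value of 14.335 in the snippet; the incomplete set gives different answers depending on whether the generic value is available to fill the missing half of the probability space.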
set_power_analysis_mode -method static \
    -create_binary_db false \
    -write_static_currents true
## Reading VCD file
read_activity_file -format vcd -start 10ns -stop 20ns dmac_mac.vcd \
    -vcd_scope top/dma_dut
set_power_output_dir outputs
report_power -cap -instances {*} -outfile inst-pwr.rpt
Application Note on Average Power Analysis using EPS

set_power_output_dir outputs
report_power -cap -instances {*} -outfile inst-pwr.rpt
## The following command performs instance name mapping between RTL netlist
## and GATE level netlist, wherein the GATE level netlist is Golden and the
## RTL netlist is Revised; it specifies the mapping file MapFile.alt
#map_activity_file -rtl2gate MapFile.alt -golden gate
set_power_output_dir outputs
report_vector_profile -activity
## The following command performs instance name mapping between RTL netlist
## and GATE level netlist, wherein the GATE level netlist is Golden and the
## RTL netlist is Revised; it specifies the mapping file MapFile.alt
#map_activity_file -rtl2gate MapFile.alt -golden gate
set_power_output_dir outputs
report_vector_profile -power
-vcd_scope top/dma_dut
## The following command performs instance name mapping between RTL netlist
## and GATE level netlist, wherein the GATE level netlist is Golden and the
## RTL netlist is Revised; it specifies the mapping file MapFile.alt
#map_activity_file -rtl2gate MapFile.alt -golden gate
set_power_output_dir outputs
## The following command writes out profiling databases which can later be
## viewed as histograms using the SimVision interface. You can specify this
## parameter during both activity and power profiling. The profiling
## databases support instance level and power/ground net histograms. It
## creates an output file name with the *.trn extension.
## report_vector_profile -power -write_profiling_db true
report_vector_profile -activity -write_profiling_db true
view_dynamic_waveform -type profile -waveform_files profiling_db
## Reading VCD file
read_activity_file -format vcd -start 10ns -stop 20ns dmac_mac.vcd \
    -vcd_scope top/dma_dut
## The following command performs instance name mapping between RTL netlist
## and GATE level netlist, wherein the GATE level netlist is Golden and the
## RTL netlist is Revised; it specifies the mapping file MapFile.alt
#map_activity_file -rtl2gate MapFile.alt -golden gate
set_power_output_dir outputs
report_power -clock_gating_efficiency
## The following command performs instance name mapping between RTL netlist
## and GATE level netlist, wherein the GATE level netlist is Golden and the
## RTL netlist is Revised; it specifies the mapping file MapFile.alt
#map_activity_file -rtl2gate MapFile.alt -golden gate
set_power_output_dir outputs
report_power -register_gating_efficiency
read_lib timing_and_power.lib
read_verilog design.v
set_top_module top_module_name_in_verilog
set_wire_load_model -name <wireload_name_from_dotlib>
## EXAMPLE:
## set_wire_load_model -library COM3_LV_WIRE -name COM3_LV_WLM_0.0
read_sdc design.sdc
# Max Corner Power analysis commands
set_power_analysis_mode -method static \
    -corner max \
    -create_binary_db false \
    -write_static_currents true
## Reading VCD file
read_activity_file -format vcd -start 10ns -stop 20ns -vcd_scope top/dma_dut dmac_mac.vcd
report_power -net -outfile eps_reportpowerwlm.txt
## Note: 1) WLM support commands can be found in ETS command reference
##       2) WLM support for Power calc is not available through EDI
Here are some examples that show how well the above methodology works:
The table clearly shows that if you specify activity only on inputs, the results vary from -25% to +45%, whereas if you specify the average sequential activity of the design, the results are pessimistic by 25-30%. This is expected, because combinational activity propagation is meant to be pessimistic. Here are two more examples that show how well the results correlate after specifying activity at macros and clock gating cells.
Testcase 4:
Expected power = ~375mW
Power after specifying default input activity = 225mW
Power after specifying default input activity + macro activity = 254mW
Power after specifying default input activity + macro activity + clock gating activity = 370mW

Testcase 5:
Expected power = 10W
Power after specifying default input activity = 5.5W
Power after specifying clock gating activity + macro activity = 10.8W
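To make the correlation in the two testcases easier to compare, the small Python sketch below expresses each estimate as a percentage deviation from the expected power. The helper name is hypothetical, and the expected value for Testcase 4 is taken as exactly 375 mW even though the note gives it as approximate.

```python
def deviation_percent(estimated, expected):
    """Signed deviation of an estimate from the expected value, in percent."""
    return (estimated - expected) / expected * 100.0


testcase4 = {  # expected ~375 mW
    "input activity only": deviation_percent(225, 375),
    "+ macro activity": deviation_percent(254, 375),
    "+ clock gating activity": deviation_percent(370, 375),
}
testcase5 = {  # expected 10 W
    "input activity only": deviation_percent(5.5, 10),
    "clock gating + macro activity": deviation_percent(10.8, 10),
}

for label, value in {**testcase4, **testcase5}.items():
    print(f"{label}: {value:+.1f}%")
```

In both testcases the estimate with input activity alone is 40% or more below the expected power, and it moves to within single-digit percentages once macro and clock gating activity are specified.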
Debug Techniques:
1. Annotation coverage, as reported in the log while the input files are parsed:

Parsing SPEF file ../input/design.spef1
Filename (capacitance)                 : ../input/design.spef1
Names in file that matched to design   : 549/549
Annotation coverage for this file      : 549/549 = 100%
...
Filename (activity)                    : design.vcd
Names in file that matched to design   : 2/2
Annotation coverage for this file      : 2/549 = 0.364299%

2. report_instance_power: use this command to debug the components of the power calculation done by EPS. The report generated by report_instance_power follows:
Instance: inst1
Cell: Std_NOR
Liberty file: std_cell.lib
Internal power: 3.67164e-05mW
Switching power: 2.70928e-05mW
Leakage power: 1.81823e-06mW;
Total power: 6.56274e-05mW;
Net  Pin  Direction  Voltage(V)  Duty  Density   Cap(pf)     Rise slew(ns)  Fall slew(ns)  Power(mW)
n1   Y    Output     1.08        0.5   2.25e+07  0.00206468  0.0834         0.0736         2.70928e-05
n2   B    Input      1.08        0.5   2.25e+07  0.0294135   0.219          0.1865         0
n3        Input      1.08        0.5   2.25e+07  0.0794393   0.2982         0.193
Leakage power
When                 Duty   Power
((!(A)) & (!(B)))    0.25   0.25*4.80144e-09
((!(A)) & (!(B)))    0.25   0.25*0
((A) & (B))          0.25   0.25*1.74581e-10
((A) & (B))          0.25   0.25*1.43885e-12
((!(A)) & (B))       0.25   0.25*9.54676e-10
((!(A)) & (B))       0.25   0.25*1.91847e-12
((A) & (!(B)))       0.25   0.25*1.33789e-09
((A) & (!(B)))       0.25   0.25*9.59233e-13

Internal power
From    To     when     activity/ns   energy         power(mW)
A ^  -> -    : none  :  0.0225        -1.34308e-19   -3.02193e-09
A ^  -> -    : none  :  0.0225        8.98457e-17    2.02153e-06
B ^  -> -    : none  :  0.0225        -1.64624e-19   -3.70404e-09
B ^  -> -    : none  :  0.0225        6.98163e-17    1.57087e-06
A v  -> Y ^  : none  :  0.0225        7.15743e-20    1.61042e-09
A v  -> Y ^  : none  :  0.0225        1.32744e-15    2.98674e-05
B v  -> Y ^  : none  :  0.0225        4.86779e-20    1.09525e-09
B v  -> Y ^  : none  :  0.0225        1.61739e-15    3.63913e-05
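When debugging with this report, it can help to recompute a few of the printed numbers by hand. The Python sketch below does this for the leakage total, one internal power row, the switching power of net n1, and the total power, using only values copied from the sample report. The unit interpretation (leakage terms in watts, energies in joules, density in transitions per second) is an assumption inferred from the fact that the arithmetic then matches the reported milliwatt values; it is not stated in the report itself.

```python
# Leakage terms from the "Leakage power" table, each weighted by a duty of 0.25
leakage_terms_w = [4.80144e-09, 0.0, 1.74581e-10, 1.43885e-12,
                   9.54676e-10, 1.91847e-12, 1.33789e-09, 9.59233e-13]
leakage_mw = sum(0.25 * value for value in leakage_terms_w) * 1e3

# Second internal power row: activity 0.0225 per ns, energy 8.98457e-17
internal_row_mw = 0.0225 * 1e9 * 8.98457e-17 * 1e3

# Net n1: 0.5 * C * V^2 * density, with C = 0.00206468 pF and V = 1.08 V
switching_mw = 0.5 * 0.00206468e-12 * 1.08 ** 2 * 2.25e7 * 1e3

# Total = internal + switching + leakage, using the reported component values
total_mw = 3.67164e-05 + 2.70928e-05 + 1.81823e-06

print(f"leakage      {leakage_mw:.5e} mW")
print(f"internal row {internal_row_mw:.5e} mW")
print(f"switching    {switching_mw:.5e} mW")
print(f"total        {total_mw:.5e} mW")
```

If a recomputed value differs noticeably from the report, the usual suspects are the inputs on that row: the annotated capacitance, the transition density, or the duty cycle assigned to the when condition.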