VLSI Physical Design: August 2015

Monday 31 August 2015

Calculations Related to Power Planning

Power Calculations

1. Number of core power pads required for each side of the chip = (Total core power) / {(Number of sides) * (Core voltage) * (Maximum allowable current for an I/O pad)}

2. Core current (mA) = (Core power) / (Core voltage)

3. Core P/G ring width = (Total core current) / {(No. of sides) * (Maximum current density of the metal layer used for the P/G ring)}

4. Total current = Total power consumption of the chip (P) / Voltage (V)

5. No. of power pads (Npads) = Itotal / Ip

6. No. of power pins = Itotal / Ip

Where,
Itotal = total current
Ip is obtained from the I/O library specification.

7. Total power = Static power + Dynamic power
= Leakage power + [Internal power + External switching power]
= Leakage power + [{Short-circuit power + Internal power} + External switching power]
= Leakage power + [{(Vdd*Isc) + (C*V*V*F)} + (1/2*C*V*V*F)]
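As a rough illustration, the power formulas above can be chained in a few lines of Python. This is only a sketch: the function names and all numeric values are hypothetical, and splitting C into an internal and a load capacitance is an assumption (the formula above writes both simply as C).

```python
import math

def core_current_ma(core_power_mw, core_voltage_v):
    """Formula 2: core current in mA (mW / V = mA)."""
    return core_power_mw / core_voltage_v

def core_pads_per_side(core_power_mw, core_voltage_v, max_pad_current_ma, num_sides=4):
    """Formula 1, rounded up because pads come in whole numbers."""
    current_ma = core_current_ma(core_power_mw, core_voltage_v)
    return math.ceil(current_ma / (num_sides * max_pad_current_ma))

def pg_ring_width_um(total_current_ma, max_density_ma_per_um, num_sides=4):
    """Formula 3: ring width from the metal layer's current density limit."""
    return total_current_ma / (num_sides * max_density_ma_per_um)

def total_power_w(leakage_w, vdd, i_sc, c_internal, c_load, freq_hz):
    """Formula 7: leakage + short-circuit + internal + external switching."""
    short_circuit = vdd * i_sc
    internal = c_internal * vdd * vdd * freq_hz
    ext_switching = 0.5 * c_load * vdd * vdd * freq_hz
    return leakage_w + short_circuit + internal + ext_switching

# Hypothetical example values (not taken from any real library).
i_core = core_current_ma(core_power_mw=500.0, core_voltage_v=1.0)
n_pads = core_pads_per_side(500.0, 1.0, max_pad_current_ma=50.0)
w_ring = pg_ring_width_um(i_core, max_density_ma_per_um=2.0)
p_total = total_power_w(leakage_w=0.001, vdd=1.0, i_sc=0.002,
                        c_internal=1e-9, c_load=2e-9, freq_hz=1e8)
print(i_core, n_pads, w_ring, p_total)
```

The pad count is rounded up with math.ceil because a fractional pad is not meaningful; the ring width is left as a real number, since it would normally be snapped to the technology grid afterwards.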

IR Drop

When current flows through a metal wire, a voltage drop occurs due to the resistance of the metal. This is known as IR drop.

IR drop is of two types:

1. Static IR drop

Static IR drop is independent of cell switching; the drop is calculated from the wire resistance. Methods to improve static IR drop:

1. Increase the width of the wires

2. Provide more wires

2. Dynamic IR drop

Dynamic IR drop is calculated taking the switching of the cells into account. Dynamic IR drop can be improved by the following methods:

1. Place decap cells between the switching cells

2. Increase the number of straps.

Calculation related to IR drop

1. Average current through each strap = Istrapavg = (Itotal) / (2 * Nstraps) mA

2. Approximate IR drop at the center of the strap = Vdrop (or IRdrop)

= Istrapavg * Rs * (W/2) * (1/Wstrap)

3. Number of straps between two power pads

Nstrappinspace = Dpadspacing / Lspace

Minimum ring width = Wring = Ip / Rj µm
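The strap calculations above can be tried out with a small Python sketch. It reads the voltage-drop expression as Istrapavg * Rs * (W/2) * (1/Wstrap), and it assumes that Rs is the sheet resistance of the strap metal (ohm/square), W is the strap length across the core, and Rj is a current density limit in mA/µm. The function names and all numbers are hypothetical.

```python
def strap_avg_current_ma(total_current_ma, num_straps):
    """Istrapavg = Itotal / (2 * Nstraps)."""
    return total_current_ma / (2 * num_straps)

def strap_center_drop_mv(i_strap_avg_ma, sheet_res_ohm_sq, core_width_um, strap_width_um):
    """Vdrop = Istrapavg * Rs * (W/2) * (1/Wstrap); mA * ohm gives mV."""
    return i_strap_avg_ma * sheet_res_ohm_sq * (core_width_um / 2) * (1 / strap_width_um)

def straps_between_pads(pad_spacing_um, strap_spacing_um):
    """Nstrappinspace = Dpadspacing / Lspace."""
    return pad_spacing_um / strap_spacing_um

def min_ring_width_um(pad_current_ma, density_limit_ma_per_um):
    """Wring = Ip / Rj (Rj assumed to be in mA per um of width)."""
    return pad_current_ma / density_limit_ma_per_um

# Hypothetical example values.
i_avg = strap_avg_current_ma(total_current_ma=500.0, num_straps=25)
v_drop = strap_center_drop_mv(i_avg, sheet_res_ohm_sq=0.05,
                              core_width_um=1000.0, strap_width_um=5.0)
n_between = straps_between_pads(pad_spacing_um=200.0, strap_spacing_um=50.0)
w_ring = min_ring_width_um(pad_current_ma=50.0, density_limit_ma_per_um=2.0)
print(i_avg, v_drop, n_between, w_ring)
```

Doubling the number of straps halves Istrapavg and therefore the estimated drop, which matches the earlier advice to increase the number of straps.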

Sunday 30 August 2015

Core Utilization

Utilization: Utilization defines the area occupied by standard cells, macros, and blockages. In general, utilization is fixed at 70 to 80%, because more inverters and buffers will be added during CTS (Clock Tree Synthesis) in order to maintain minimum skew.

Core utilization = (standard cell area + macro cell area) / total core area

A core utilization of 0.8 means that 80% of the core area is occupied by cells, whereas 20% is left free for routing.
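A minimal Python sketch of the utilization formula, plus the inverse question (how much core area a target utilization implies). The function names and the areas are hypothetical.

```python
def core_utilization(std_cell_area, macro_area, core_area):
    """Core utilization = (standard cell area + macro area) / core area."""
    return (std_cell_area + macro_area) / core_area

def core_area_for_target(std_cell_area, macro_area, target_utilization):
    """Core area needed so that the cells fill exactly the target fraction."""
    return (std_cell_area + macro_area) / target_utilization

# Hypothetical areas in square microns.
util = core_utilization(600_000.0, 200_000.0, 1_000_000.0)
needed = core_area_for_target(600_000.0, 200_000.0, target_utilization=0.7)
print(f"utilization = {util:.2f}, core area for 70% = {needed:.0f} um^2")
```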

IO Placement / Pin placement

If you are doing a digital-top design, you need to place the IO pads and IO buffers of the chip. Take a rectangular or square chip that has pads on four sides. To start with, you may get the sides and relative positions of the pads from the designers. You will also get a maximum and minimum die size according to the package you have selected. IOs are mostly placed using a Perl script.

Some Basic Rules For Placing Macros

Macro placement

Once you have the size and shape of the floorplan ready and have initialized the floorplan, thereby creating the standard cell rows, you are ready to hand-place your macros. Do not use any auto placement; I have not seen anything that works. Flylines in your tool will show you the connections between the macros and the standard cells or IOs.

Use flylines and make sure you place blocks that connect to each other close together
For a full-chip, if hard macros connect to IOs, place them near the respective IOs
Consider the power straps while placing macros. You can club macros/memories
Creating Power Rings and Straps
Avoid placing macros in front of ports.
Arrange the macros so as to get a contiguous core area.
Macro spacing is given by: spacing = {[(no. of pins) * pitch] + space} / (no. of metal layers in the horizontal or vertical direction)
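The macro spacing rule of thumb above is easy to evaluate in Python. This is just a sketch; the function name and the numbers are hypothetical, and the result is only a first estimate of the channel width.

```python
def macro_spacing_um(num_pins, pitch_um, extra_space_um, num_layers):
    """spacing = (pins * pitch + space) / layers in the routing direction."""
    if num_layers <= 0:
        raise ValueError("need at least one routing layer in this direction")
    return (num_pins * pitch_um + extra_space_um) / num_layers

# Hypothetical macro edge: 100 pins on a 0.2 um pitch, 1 um of extra space,
# and 4 metal layers available in the direction of the channel.
channel = macro_spacing_um(num_pins=100, pitch_um=0.2, extra_space_um=1.0, num_layers=4)
print(f"channel width ~ {channel:.2f} um")
```

More routing layers in the channel direction means each track can be stacked, so the required spacing shrinks proportionally.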

Design Netlist

Physical design is based on a netlist which is the end result of the Synthesis process. Synthesis converts the RTL design usually coded in VHDL or Verilog HDL to gate-level descriptions which the next set of tools can read/understand. This netlist contains information on the cells used, their interconnections, area used, and other details. Typical synthesis tools are:

Cadence RTL Compiler/Build Gates/Physically Knowledgeable Synthesis (PKS)

Synopsys Design Compiler

During the synthesis process, constraints are applied to ensure that the design meets the required functionality and speed (specifications). Only after the netlist is verified for functionality and timing is it sent to the physical design flow.

Aspect Ratio

Aspect Ratio of Core/Block/Design

The Aspect Ratio of Core/Block/Design is given as:

The aspect ratios of different core shapes are given below:

The Role of Aspect Ratio on the Design:

The aspect ratio affects the routing resources available in the design
The aspect ratio affects congestion
Floorplanning needs to be done according to the aspect ratio
The placement of the standard cells is also affected by the aspect ratio
The timing, and thereby the frequency of the chip, is also affected by the aspect ratio
The clock tree built on the chip is also affected by the aspect ratio
The placement of the IO pads in the IO area is also affected by the aspect ratio
The packaging is also affected by the aspect ratio
The placement of the chip on the board is also affected
Ultimately, everything depends on the aspect ratio of the core/block/design. All of these points will be covered in future articles.

Blockages


Placement blockages prevent the placement engine from placing cells at specific locations. Routing blockages block routing resources on one or more layers and can be created at any point in the design flow. In general, placement blockages are created at the floorplanning stage and routing blockages are created before running any router. Blockages act as guidelines for the placement of standard cells: they do not direct the tool to place standard cells in a particular area, but they prevent the tool from placing standard cells in the blocked areas (for both placement and routing blockages). During CTS (Clock Tree Synthesis), more buffers and inverters are added to balance the skew, and blockages are used to reserve space for these buffers and inverters.

Placement blockages

Use placement blockages to:

Define std-cells and macro area
Reserve channels for buffer insertion
Prevent cells from being placed at or near macros
Prevent congestion near macros

Soft (Non buffer blockage)

Only buffers can be placed and standard cells cannot be placed.

Hard (Std-cell blockage)

Blocks all std-cells and buffers from being placed. Std-cell blockages are mostly used to:

Avoid routing congestion at macro corners
Restrict std-cells to certain regions in the design
Control power rails generation at macro cells

Partial blockages

By default, a placement blockage has a blockage factor of 100%, so no cells can be placed in that area. Partial blockages make this more flexible: to reduce placement density without blocking 100% of the area, change the blockage factor of an existing blockage to a lower value.
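To make the blockage factor concrete, here is a small Python sketch. It assumes that a blockage factor of X% caps the cell density inside the region at (100 - X)%; the function name and the numbers are hypothetical, and the exact behaviour depends on the tool.

```python
def max_cell_area_um2(region_area_um2, blockage_factor_pct):
    """Cell area still allowed inside a partially blocked region."""
    if not 0 <= blockage_factor_pct <= 100:
        raise ValueError("blockage factor must be between 0 and 100")
    return region_area_um2 * (100 - blockage_factor_pct) / 100

full = max_cell_area_um2(1000.0, 100)    # default blockage: nothing can be placed
partial = max_cell_area_um2(1000.0, 40)  # partial blockage: 60% density allowed
print(full, partial)
```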

Keepout Margin (Halo)

fig-1: Halo

It is the region around the boundary of fixed macros in the design in which no other macros or std-cells can be placed. It allows placement of buffers and inverters in its area. A pictorial representation of a halo is shown in figure 1.
Halos of adjacent macros can overlap; the size of the halo then determines the default top-level channel size between macros. Halos prevent cells from being placed at or near the macros.
If a macro is moved from one place to another, its halo moves with it.

Monday 24 August 2015

Scripts used in IC Compiler

Purpose and contents of the different scripts

1. init_design_icc.tcl

The purpose of this file is to handoff a floorplanned CEL to the next step, which is the place_opt step. Depending on the input format (MW, Verilog, DDC), it will read the appropriate files and also include the floorplan information provided via either a DEF input file, or already existing in the initial floorplanned CEL.

If the input format is MW CEL, no SDC constraints are loaded, because they are assumed to be in the CEL already. The same is the case when loading DDC. Only when loading a Verilog netlist is the read_sdc command executed.

We strongly recommend using group paths to differentiate the input-to-flop, flop-to-output, and input-to-output feedthrough paths. That will improve visibility during optimization. Just as with the SDC constraints, we do not create the group paths when entering with a MW CEL or a DDC, but only when entering with a pure ASIC flow, i.e. Verilog + SDC constraints.

If certain floorplan constraints need to be added (such as placement or routing blockages), it is recommended to do this in this file, after the read-DEF section.

This file also sets up the different MV and MCMM portions.

The output CEL created by this script is called: init_design_icc.

2. place_opt_icc.tcl

The purpose of this script is to execute the placement and the placement based optimization. The default command that is executed is:

place_opt -area_recovery -effort low

That will provide the fastest result while still giving good QoR. After reading the initial CEL created in the previous step, a file called icc_scripts/common_optimization_settings_icc.tcl is sourced. That file contains several settings that are recommended for each of the optimization steps that follow. Because we also have the capability to execute place_opt -cts, which also creates the clock tree, it is also necessary to include the icc_scripts/common_cts_settings_icc.tcl file. That file specifies any clock tree related settings.

In addition to the default place_opt command mentioned above, the script contains several other flavors of the place_opt flow. These steps are commented out, and a detailed explanation is provided for each of them. It is sufficient to comment out the undesired place_opt command and uncomment the desired one.

E.g., if the user wants to run scan chain reordering in place_opt, they have to comment out the default place_opt command and uncomment the following lines:

## What commands do you need when you want to optimize SCAN?

# read_def $ICC_IN_SCAN_DEF_FILE

# redirect -file $REPORTS_DIR/scan_chain_pre_ordering.rpt {report_scan_chain}

# place_opt -area_recovery -optimize_dft -num_cpus $ICC_NUM_CPUS

The output CEL created in this step is called place_opt_icc.

3. clock_opt_icc.tcl

The purpose of this script is to execute the following three steps:

•Clock tree synthesis and clock tree optimization (CTO)

•Optimization of the post-cts design, including hold fixing based on virtual routes

•Routing of the clock tree

The file icc_scripts/common_cts_settings_icc.tcl needs to be edited when you want to define any clock tree specific requirements.

Examples are:

•Clock tree exceptions.

•Non-default routes (NDRs), e.g. to define double spacing for Xtalk avoidance on clock nets.

•Definition of clock tree master cells, for clock tree synthesis, or delay insertion during clock tree optimization (CTO).

•Inter clock delay balancing options specified via set_inter_clock_delay_options.

By default, ICC-RM does not execute any of these clock tree settings, because these are obviously very design dependent.

In the clock_opt_icc.tcl script itself, there are 3 variants of the default clock_opt flow provided:

•How to execute inter clock delay balancing?

•How to update the IO-latency after CTS?

•What commands to execute once your design becomes too congested after clock tree synthesis?

The cell that is saved at the end is called clock_opt_icc

4. route_opt_icc.tcl

The purpose of this script is to execute the routing step and proceed with the post route optimization in order to close the design for timing, DRC, and other design constraints.

The command that is executed is the mainstream route_opt command, i.e.:

•route_opt -effort low -xtalk_reduction

The tool will run by default in Xtalk Delta Delay (XDD) mode.

To enable static noise (aka glitches), as well as some advanced timing analysis capabilities (Arnoldi, timing windows, CRPR) you will have to edit the file:

./icc_scripts/common_route_si_settings.tcl and uncomment the appropriate lines.

The route_opt_icc.tcl script also contains (in comments) the commands required to run leakage power optimization, as well as some of the frequently used variants of route_opt:

•Incremental route_opt optimization

•Limiting the potential disturbance to the design using -size_only

•Hold fixing only optimization

The cell that is saved at the end is called route_opt_icc

5. chip_finish_icc.tcl

The purpose of this script is to provide the commands to execute the following chip finishing steps:

•Antenna fixing against the plasma effect.

•Critical area reduction by executing timing-driven detail route wire spreading (global route wire spreading is on by default as part of the Xtalk avoidance).

•Redundant via insertion.

•Standard cell filling.

•Timing driven Metal filling.

None of these steps are enabled by default, but they can be controlled easily by editing the chip finishing variables in the icc_setup.tcl file.

The cell that is saved at the end is called chip_finish_icc

6. signoff_opt_icc.tcl

This Tcl script opens the chip-finished cell and executes run_signoff and signoff_opt.

The run_signoff command runs Synopsys's signoff extraction and timing tools: Star-RCXT and PrimeTime. The ICC database is annotated with these signoff numbers.

The signoff_opt command optimizes the design by making use of these signoff delays. After every optimization loop, the design is incrementally extracted by Star-RCXT and incrementally timed by PTSI. With this methodology, the output is a design that is signoff ready.

This step is run by default after chip finishing. If that step includes metal fill, signoff_opt_icc.tcl will also execute the required trim_eco_filler commands to clean up some of the modified filler polygons.

7. outputs_icc.tcl

The purpose of this script is to create several output files that will allow you to proceed with the next steps of the flow.

Following files are generated:

•Verilog netlist with and without PG connections

•SBPF binary parasitic file (ASCII SPEF command is commented out)

•GDSII streamout file

Onchip Variation (OCV)

Static timing analysis of a design is performed to estimate its working frequency after the design has been fabricated. Nominal delays of the logic gates, as per characterization, are calculated, and some pessimism is applied on top of that to see whether there will be any setup and/or hold violation at the target frequency. However, the transistors manufactured are not all alike. Also, not all transistors receive the same voltage or are at the same temperature. The characterized delay is just the most probable delay. The delay variation of a typical sample of transistors on silicon follows the curve shown in figure 1. As shown, most of the transistors have nominal characteristics. Typically, timing signoff is carried out with some margin. By doing this, the designer tries to ensure that a larger number of transistors is covered. There is a direct relationship between margin and yield: the greater the margin, the larger the yield. However, beyond a certain point, increasing the margin brings little further increase in yield. At that point, the extra margin costs the designer more than the increase in yield saves. Therefore, margins should be chosen so as to maximize profit.

Number of transistors v/s delay for a typical silicon transistors sample
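The diminishing return of extra margin can be illustrated with a short Python sketch. It assumes, purely for illustration, that delay follows a normal (Gaussian) distribution and that margin is expressed in standard deviations above the nominal delay; real silicon distributions and yield models are more involved.

```python
import math

def fraction_covered(margin_in_sigma):
    """Fraction of a normal distribution with delay <= mean + margin."""
    return 0.5 * (1.0 + math.erf(margin_in_sigma / math.sqrt(2.0)))

# Coverage for margins of 0 to 4 sigma above the nominal delay.
for k in range(5):
    print(k, round(fraction_covered(k), 5))

# Extra coverage bought by one more sigma of margin, early vs. late.
gain_1_to_2 = fraction_covered(2) - fraction_covered(1)
gain_3_to_4 = fraction_covered(4) - fraction_covered(3)
```

Under this assumption, going from 1 to 2 sigma of margin covers far more additional transistors than going from 3 to 4 sigma, which is the flattening of the yield curve described above.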

We have discussed above how variations in the characteristics of transistors are taken care of in STA. These variations in transistors' characteristics as fabricated on silicon are known as OCV (On-Chip Variation). The reason for OCV, as discussed above, is that the transistors on a chip are not all alike in geometry, in their surroundings, or in their position with respect to the power supply. The variations are mainly caused by three factors:

Process variations: The fabrication process includes diffusion, drawing of metal wires, gate drawing, etc. The diffusion density is not uniform throughout the wafer. Also, the width of a metal wire is not constant; say the width is 1 um +/- 20 nm. So, metal delays are bound to lie within a range rather than at a single value. Similarly, the diffusion regions of all transistors will not have exactly the same diffusion concentrations. So, all transistors are expected to have somewhat different characteristics.
Voltage variation: Power is distributed to all transistors on the chip with the help of a power grid. The power grid has its own resistance and capacitance, so there is a voltage drop along the grid. Transistors situated close to the power source (or those with less resistive paths from the power source) receive a larger voltage than other transistors. That is why delay varies across transistors.
Temperature variation: Similarly, the transistors on the same chip cannot all be at the same temperature. So, there are variations in characteristics due to variation in temperature across the chip.

How to take care of OCV: To tackle OCV, the STA for the design is closed with some margins. There are various margining methodologies available. One of these is applying a flat margin over the whole design. However, this is overly pessimistic, since some cells may be more prone to variations than others. Another approach is applying cell-based margins, based on silicon data about which cells are more prone to variations. There are also methodologies based on other ideas, e.g. location-based margins and statistically calculated margins. As STA advances, more accurate and faster methodologies are emerging.

Signal Integrity


Signal integrity is the ability of an electrical signal to carry information reliably and to resist the effects of high-frequency electromagnetic interference from nearby signals. Effects: crosstalk, EM (electromigration), antenna effects.

Crosstalk:

Switching of the signal in one net can interfere with a neighboring net due to cross-coupling capacitance. This effect is known as crosstalk. Crosstalk can lead to crosstalk-induced delay changes or static noise.

Techniques to solve Crosstalk

Double spacing => more spacing=>less capacitance=>less cross talk

Multiple vias => less resistance=>less RC delay

Shielding => constant cross coupling capacitance =>known value of crosstalk

Buffer insertion => boost the victim strength.

Net ordering => in same metal layer change the net path.

Layer assignment => change the metal layer of one of the two nets if possible (one signal in metal 3 and the other in metal 4).

Signal electromigration:

Electromigration is the permanent physical movement of metal in thin wire connections resulting from the displacement of metal ions by flowing electrons. Electromigration can lead to shorts and opens in wire connections, causing functional failure of the IC device.

High current densities cause wearing of the metal due to EM.

Techniques to solve EM:

1) Increase the width of the wire

2) Buffer insertion

3) Upsize the driver

4) Switch the net to higher metal layer
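Why widening the wire helps can be seen from the current density J = I / (width * thickness). The Python sketch below checks a wire against a density limit and computes the minimum width that satisfies it. The function names, the limit, and the wire dimensions are hypothetical; real EM limits come from the foundry's technology files.

```python
def current_density(current_ma, width_um, thickness_um):
    """Current density in mA/um^2 through the wire cross-section."""
    return current_ma / (width_um * thickness_um)

def min_width_um(current_ma, thickness_um, j_max):
    """Narrowest wire that keeps the density at or below j_max."""
    return current_ma / (thickness_um * j_max)

# Hypothetical net: 2 mA through a 0.5 um wide, 0.5 um thick wire,
# checked against a made-up limit of 5 mA/um^2.
J_MAX = 5.0
j_now = current_density(2.0, width_um=0.5, thickness_um=0.5)
violates = j_now > J_MAX
w_fix = min_width_um(2.0, thickness_um=0.5, j_max=J_MAX)
print(j_now, violates, w_fix)
```

Switching to a higher metal layer helps in the same way when that layer is thicker or allows wider wires, since both enlarge the cross-section.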

Antenna effects

The antenna effect [plasma induced gate oxide damage] is an effect that can potentially cause yield and reliability problems during the manufacture of MOS integrated circuits. The IC fabs normally supply antenna rules that must be obeyed to avoid this problem and violation of such rules is called an antenna violation. The real problem here is the collection of charge.

A net in an IC will have at least one driver (which must contain a source or drain diffusion, or an implant in newer technologies) and at least one receiver (which consists of a gate electrode over a thin gate dielectric). Since the gate dielectric is very thin, the layer will break down if the net somehow acquires a voltage somewhat higher than the normal operating voltage of the chip. Once the chip is fabricated, this cannot happen, since every net has at least some source/drain implant connected to it. The source/drain implant forms a diode, which breaks down at a lower voltage than the oxide (either forward diode conduction or reverse breakdown) and does so non-destructively. This protects the gate oxide. But during fabrication, if the voltage builds up to the breakdown level while the gate is not yet protected by this diode, the gate oxide will break down.

Antenna rules are normally expressed as an allowable ratio of metal area to gate area. There is one such ratio for each interconnect layer. Each oxide has a different rule.
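A per-layer antenna ratio check might look like the following Python sketch. The layer names, the limits, and the areas are hypothetical, and real antenna rules (cumulative ratios, diode credit, and so on) are more detailed than this.

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Ratio of metal area to gate area for one layer of a net."""
    return metal_area_um2 / gate_area_um2

def antenna_violations(metal_area_by_layer, gate_area_um2, max_ratio_by_layer):
    """Return (layer, ratio) for every layer that exceeds its allowed ratio."""
    violations = []
    for layer, area in metal_area_by_layer.items():
        ratio = antenna_ratio(area, gate_area_um2)
        if ratio > max_ratio_by_layer[layer]:
            violations.append((layer, ratio))
    return violations

# Hypothetical net: metal area per layer and made-up per-layer limits.
limits = {"M1": 400.0, "M2": 400.0}
areas = {"M1": 50.0, "M2": 120.0}
result = antenna_violations(areas, gate_area_um2=0.25, max_ratio_by_layer=limits)
print(result)
```

Here only M2 exceeds its limit, so a fix would target the M2 portion of the net.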

Antenna violations must be fixed by the router. Possible fixes include connecting the gate oxide to the highest metal layer, adding vias near the gate oxide to connect to the highest layers used, and adding a diode to the net near the gate. Adding a diode raises the capacitance, which makes the circuit slower and makes it consume more power.

Techniques to solve antenna violations

1. Jumper insertion

2. Diode insertion near logic gate input pin

3. Buffer Insertion

Saturday 22 August 2015

Advanced On-Chip Variation

What is Advanced OCV?

AOCV uses intelligent techniques for context specific derating instead of a single global derate value, thus reducing the excessive design margins and leading to fewer timing violations. This represents a more realistic and practical method of margining, alleviating the concerns of overdesign, reduced design performance, and longer timing closure cycles.

Advanced OCV determines derate values as a function of logic depth and/or cell, and net location. These two variables provide further granularity to the margining methodology by determining how much a specific path in a design is impacted by the process variation.

There are two kinds of variations.

1) Random Variation

2) Systematic Variation

Random Variation-

Random variation is proportional to the logic depth of each path being analyzed.

The random component of variation occurs from lot-to-lot, wafer-to-wafer, on-die and die-to-die. Examples of random variation are variations in gate-oxide thickness, implant doses, and metal or dielectric thickness.

Systematic Variation-

Systematic variation is proportional to the cell location of the path being analyzed.

The systematic component of variation is predicted from the location on the wafer or the nature of the surrounding patterns. These variations relate to proximity effects, density effects, and the relative distance of devices. Examples of systematic variation are variations in gate length or width and interconnect width.

Take the example of random variation: given the buffer chain shown in Figure 1, with a nominal cell delay of 20, the nominal path delay at stage N = N * 20. In a traditional OCV approach, timing derates are applied to scale the path delay by a fixed percentage: set_timing_derate -late 1.2; set_timing_derate -early 0.8

Figure 1: Depth-Based Statistical Analysis

Statistical analysis shows that the random variation is less for deeper timing paths and not all cells are simultaneously fast or slow. Using statistical HSPICE models, Monte-Carlo analysis can be performed to measure the accurate delay variation at each stage. Advanced OCV derate factors can then be computed as a function of cell depth to apply accurate, less pessimistic margins to the path.
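The difference between a flat derate and a depth-based derate can be sketched in Python using the buffer chain example (nominal cell delay of 20, derates of 1.2 and 0.8). The depth-based model here is a simplifying assumption: it treats the per-stage variations as independent, so the relative spread of the path delay shrinks as 1/sqrt(depth). Real Advanced OCV tables come from Monte-Carlo characterization, not from this formula.

```python
import math

NOMINAL_CELL_DELAY = 20.0  # nominal delay per buffer stage, as in the example

def flat_ocv_window(depth, early=0.8, late=1.2):
    """Traditional OCV: the same derate regardless of path depth."""
    nominal = depth * NOMINAL_CELL_DELAY
    return early * nominal, late * nominal

def depth_based_late_derate(depth, single_stage_derate=1.2):
    """Assumed model: the margin above 1.0 shrinks as 1/sqrt(depth)."""
    return 1.0 + (single_stage_derate - 1.0) / math.sqrt(depth)

for depth in (1, 4, 16):
    early, late = flat_ocv_window(depth)
    aocv_late = depth_based_late_derate(depth) * depth * NOMINAL_CELL_DELAY
    print(depth, round(late, 2), round(aocv_late, 2))
```

At depth 1 the two approaches agree; for deeper paths the depth-based late delay is noticeably smaller than the flat one, which is where the reduction in pessimism comes from.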

Figure 2a shows an example of how PrimeTime Advanced OCV would determine the path depth for both launch and capture. These values index the derate table, as shown in Figure 7, to select the appropriate derate values.

Fig 2a-Depth Based Advanced OCV

Analysis of systematic variation shows that paths comprised of cells in close proximity exhibit less variation relative to one another. Using silicon data from test chips, Advanced OCV derate factors based on relative cell location are then applied to further improve accuracy and reduce pessimism on the path. Advanced OCV computes the length of the diagonal of the bounding box, as shown in Figure 2b, to select the appropriate derate value from the table.
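The bounding-box lookup can be sketched in Python as follows. The derate table, the distance breakpoints, and the cell coordinates are all hypothetical, and the sketch simply picks the table entry for the first breakpoint at or above the diagonal; an actual tool may interpolate between table entries instead.

```python
import bisect
import math

# Hypothetical distance-based table: breakpoints in um and late derates.
# The last derate applies beyond the largest breakpoint.
DISTANCE_UM = [100.0, 500.0, 1000.0]
LATE_DERATE = [1.03, 1.05, 1.08, 1.10]

def bbox_diagonal(cell_locations):
    """Diagonal of the bounding box enclosing all cells of the path."""
    xs = [x for x, _ in cell_locations]
    ys = [y for _, y in cell_locations]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

def distance_derate(cell_locations):
    """Select the derate for the first breakpoint >= the diagonal."""
    diagonal = bbox_diagonal(cell_locations)
    return LATE_DERATE[bisect.bisect_left(DISTANCE_UM, diagonal)]

# Hypothetical path with three cells, coordinates in um.
path_cells = [(0.0, 0.0), (300.0, 400.0), (100.0, 50.0)]
print(bbox_diagonal(path_cells), distance_derate(path_cells))
```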

Fig2b -Distance Based advanced OCV

PrimeTime Advanced OCV Flow -

PrimeTime internally computes depth and distance metrics for every cell arc and net arc in the design. It picks the conservative values of depth and distance thus bounding the worst-case path through a cell.

Fig-3
