Model Calibration
From Traffic Analysis and Microsimulation
Status: Draft
Introduction
Microsimulation is a tool for analyzing the performance of roadways that have complicated geometric configurations and congested traffic. The core of a microsimulation model is a set of mathematical algorithms that evaluate the motion of each vehicle on a second-by-second basis as it interacts with the roadway network, the traffic control system, and other vehicles in the network. To account for the wide range of traffic conditions in the real world, all commercial microsimulation software packages allow various parameters to be adjusted by the modeler. For example, most drivers in a small city such as Eagle River or Fond du Lac are probably rather "laid back" compared to their counterparts in Madison or Milwaukee, who in turn are not as aggressive as drivers in downtown Chicago or Manhattan. Therefore, driver aggressiveness and other model parameters must be adjusted to account for local conditions.
Model calibration refers to the process of assuring that a model reproduces real-world traffic conditions reasonably well. Microsimulation models that have not been properly calibrated can produce unrealistic or misleading results. Therefore, before applying the model it is essential for the Project Manager to assure that it has been properly calibrated.
By definition, a model is a simplified representation of reality, and no model can reproduce reality perfectly. This guideline seeks to strike a balance between "perfection" and what is practical to achieve within the time-frame and budget of a normal highway project. The overall goal of this process is to assure that the model is a good representation of the actual traffic conditions. To achieve this, the model should not only meet the mathematical targets outlined in this guideline (Realism Tests 1 & 2), but should also be realistic in terms of overall traffic patterns such as lane use, queuing, and route choice (Realism Test 3).
Microsimulation models contain many adjustable parameters, and the relevant adjustments vary for each software package. If a model fails to achieve calibration targets, it is essential to verify that the right parameters are modified to correct the situation. For example, adjusting driver aggressiveness or link cost factors will not successfully compensate for a flawed origin-destination matrix. The user manuals and technical support service for each software product generally provide some guidance on calibration parameters, but both usually assume that the model is free of major flaws. If serious model calibration problems are encountered, advice from an independent expert who is experienced with the relevant software can help assure that the calibration process does not consume excessive amounts of time and other resources.
Preparation and delivery of the Model Calibration Report is a major milestone in the microsimulation model development process. A Microsoft Excel template that can be used to prepare the calibration report can be found at this link. The template is in a zip file; to use it you will need a decompression tool such as 7-Zip.
Calibration vs. Validation
The terms "calibration" and "validation" are used somewhat interchangeably by traffic modelers. Some people use one term to refer to the process that software developers go through to test their products, and the other to describe the steps necessary to assure that the model for a specific location works properly. Others use the two terms to refer to different phases of the process of making sure that the model correctly represents real-world traffic conditions. All of these activities are iterative and inter-related, so for simplicity the word "calibration" is used throughout this document to refer to the whole process.
Why Model Calibration is Necessary
The purpose of traffic modeling is to predict the future by exploring the effect future traffic volumes and proposed roadway improvements will have on the operation of the transportation system. The process of developing a microsimulation model starts with a BASE-YEAR MODEL (representing the existing traffic conditions) and then "forks" into various scenarios representing future-year DO-MINIMUM and BUILD alternatives.
Although the existing conditions model may be unimportant to decision-makers, it is vital to the modeling process. The only way to determine that a model is working properly is to compare the base year model with the real-world traffic. If the base year model cannot reproduce the existing traffic conditions with a reasonable degree of accuracy, then it will be of no value in predicting the future. Building a design-year model on an uncalibrated base would be as unwise as building a highway on uncompacted soil, or constructing a bridge on unstable footings.
Many things can go wrong in the modeling process, even if everyone involved has the best intentions. For example, a typographical error in the origin-destination matrix can have a large impact on traffic flows. Some modeling errors are obvious: for example a model of existing conditions might show gridlock in a location where real-world traffic is free-flowing. At other times, errors are subtle and hard to detect. When a subtle but significant error goes undetected, misleading conclusions can be drawn from the model. As a result, money could be spent in the wrong place, potentially leading to criticism that the agency implemented a solution that did not work. This would be the opposite of the desired outcome of modeling, which is to help assure that proposed designs work as expected.
Running an uncalibrated or miscalibrated model with future volumes (or proposed roadway geometry) can cause many problems. Typically the model will either give misleading results, or it will lock up and give no results at all. Even if the uncalibrated model can be run, its results are highly dubious and should be regarded with utmost suspicion. Therefore, the existing conditions model must be successfully calibrated before work begins on any design year scenarios.
Benefits of Careful Calibration
Developing a traffic simulation model is both labor-intensive and data-intensive. By consistently following the recommendations in this guideline, Project Managers can achieve the following benefits:
- The quality and integrity of the model can be assured, protecting the Department's investment in the traffic model.
- Mistakes and modeling glitches can be detected and corrected prior to showing the model to decision makers or the public.
- Model developers can assure that their results are reliable and suitable for further analysis, such as benefit/cost studies.
- Management can be confident that recommendations based on microsimulation studies are trustworthy.
- Projects from different locations can be compared more easily, since they are calibrated to a consistent standard. This is important when the model is being used to make decisions about how to allocate resources.
Global, Categorical, and Link-Specific Calibration Factors
Microsimulation models are composed of links (roadway segments) and nodes (points where there is a widening, narrowing, change in speed limit or curvature, intersection, etc.). Global calibration parameters affect the entire network. Categorical factors affect a category of the network's links (such as all off-ramps). Link-specific factors only influence vehicles while they are driving on an individual link in the network.
When a model is being calibrated, global and categorical parameters should be adjusted first. Link-specific factors should only be used for final fine-tuning of the model, and they should be used sparingly. Link-specific parameters (such as cost factors) should be avoided on any links that will be eliminated or substantially modified in the DO-MINIMUM or BUILD scenarios.
Carrying Calibration Parameters Forward Into Model Forks
Once the main calibration parameters have been set for the BASE YEAR model, they must be carried forward without change in each subsequent fork (scenario). For example, if it is necessary to use a MEAN TARGET HEADWAY of 0.85 to reproduce the level of congestion in the existing real-world network during the AM peak hour, then the same MEAN TARGET HEADWAY value should be used for the AM peak hour model in the future years. Forcing more traffic through the network by reducing the future-year MEAN TARGET HEADWAY to 0.80 could be tempting—but it implies that in the future drivers will become more aggressive, alert, and adept: an assumption that is probably not realistic.
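A simple consistency check can help catch parameters that were changed by accident between forks. The following is a minimal sketch only: the dictionary layout, the function name, and the second parameter (mean_reaction_time) are assumptions made for illustration, not the export format of any particular software package.

```python
def changed_parameters(base, fork):
    """Return {name: (base_value, fork_value)} for every parameter that differs.

    A parameter missing from one scenario is reported with None on that side.
    """
    names = set(base) | set(fork)
    return {
        name: (base.get(name), fork.get(name))
        for name in sorted(names)
        if base.get(name) != fork.get(name)
    }


# Hypothetical AM-peak parameter sets (values are illustrative only).
base_year_am = {"mean_target_headway": 0.85, "mean_reaction_time": 1.0}
build_am = {"mean_target_headway": 0.80, "mean_reaction_time": 1.0}

differences = changed_parameters(base_year_am, build_am)
for name, (base_value, fork_value) in differences.items():
    print(f"{name}: base={base_value}, fork={fork_value}")
```

An empty result indicates that the fork uses the same values as the BASE YEAR model for every parameter listed.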
The Model Scoping, Forking, and Scenarios guideline provides more information about model forking, including advice about the sequence of calibration efforts to reduce the number of forks in the modeling process.
Use of Balanced Target Flows for the Existing Network
Usually the available traffic data for a microsimulation study area is unbalanced. For example, if one starts at an Automatic Traffic Recorder (ATR) station on a freeway mainline and proceeds in the direction of travel, adding the raw on-ramp volumes and subtracting the raw off-ramp volumes, the result will almost never match the volume measured at the next downstream ATR. This happens for three main reasons:
- Most microsimulation projects require combining data collected on different days.
- There are inherent imperfections in the data collection process. For example, a vehicle could be counted twice (or not counted at all) if it is changing lanes as it drives over a detection loop.
- Data collected manually (such as intersection turning counts) is subject to human error.
Microsimulation models cannot account for these imperfections, so the data set should be "balanced" to create a mathematically consistent volume set. In general, balanced volumes should be used as the traffic volume targets for the base-year model. The use of balanced volumes usually removes statistical outliers from the target volume set, making it easier to achieve calibration targets. More information about the balancing process will be contained in the Traffic Volume Balancing guideline.
On arterials and other corridors that are not access controlled, side zones may need to be included in the model to account for traffic generated by developments located between intersections. A "soft" balancing process should be used to verify that the variation in traffic volumes between intersections can reasonably be accounted for by the amount of development that exists in the corridor. "Soft balancing" means that the volume entering each intersection is similar to the sum of the volumes leaving the intersections that feed it. This is distinguished from "hard balancing" where the entering volume must exactly equal the sum of the relevant upstream volumes (which would be the case for an access-controlled facility).
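The freeway arithmetic described in this section can be sketched as a short check. This is an illustration under stated assumptions: the function names are hypothetical and the counts are made-up hourly volumes, not field data.

```python
def expected_downstream(upstream, on_ramps, off_ramps):
    """Volume implied at the next station: upstream + on-ramps - off-ramps."""
    return upstream + sum(on_ramps) - sum(off_ramps)


def imbalance(upstream, on_ramps, off_ramps, downstream):
    """Measured downstream volume minus the volume implied by upstream counts."""
    return downstream - expected_downstream(upstream, on_ramps, off_ramps)


# Hypothetical raw hourly counts between two ATR stations.
upstream_atr = 4200
on_ramp_counts = [350, 410]
off_ramp_counts = [500]
downstream_atr = 4530

gap = imbalance(upstream_atr, on_ramp_counts, off_ramp_counts, downstream_atr)
print(f"Imbalance at downstream ATR: {gap} vehicles per hour")
```

On an access-controlled facility the balanced volume set should bring this gap to zero ("hard balancing"). On an arterial, a non-zero gap may be acceptable if it can be explained by side zones ("soft balancing").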
Random Seeds
Real-world traffic varies considerably from day to day, and even from minute to minute. Microsimulation models attempt to mimic this effect by using stochastic (randomized) variables to account for variations in driver behavior and departure time. Each software package provides a mechanism for the modeler to select one or more SEEDS that determine a sequence of random numbers that control these parameters during the model run. Assuming that all other factors remain the same, two model runs with the same SEED value will produce identical results.
To account for the range of traffic conditions encountered in the real world, during the calibration process each model should be run with seven different seed values:
- Calibration statistics should be prepared for each of the seven SEEDS.
- The highest and lowest results may be discarded, leaving five model runs that are included in the Model Calibration Report.
- The same five SEEDS should be used for the DO-MINIMUM and BUILD models.
Since the microsimulation packages use proprietary algorithms to generate their random number sequences, the use of prime numbers as seed values is recommended.
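The seven-seed procedure can be sketched as follows. The sketch assumes that each run is summarized by a single statistic (here, a hypothetical total network throughput) and that "highest and lowest" refers to that statistic; the guideline does not specify which measure to rank by. The seed values and results are invented for illustration.

```python
def trim_extremes(results):
    """Drop the runs with the lowest and highest statistic.

    `results` maps seed -> summary statistic for exactly seven runs.
    Returns the five remaining seeds in ascending order.
    """
    if len(results) != 7:
        raise ValueError("expected results for exactly seven seeds")
    ranked = sorted(results, key=results.get)  # seeds ordered by statistic
    return sorted(ranked[1:-1])                # discard first and last


# Hypothetical prime seeds and made-up throughput totals (vehicles).
run_results = {
    7: 10120,
    13: 9980,
    29: 10050,
    53: 10310,
    101: 9875,
    211: 10090,
    431: 10200,
}

kept = trim_extremes(run_results)
print("Seeds to report and reuse in later scenarios:", kept)
```

The returned list is the set of seeds that would then be reused for the DO-MINIMUM and BUILD models.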
Unreleased Vehicles and Stuck Vehicles
Unreleased or BLOCKED vehicles can occur when there are capacity problems in the model. For example, the screen shot at the right shows an intersection where a signal timing problem is causing a bottleneck for westbound vehicles. As westbound traffic attempting to get through the intersection backs up, Zone 51 becomes overloaded and no more vehicles can be released. In this software package, the problem is identified by the magenta shading superimposed over Zone 51, and by the appearance of a blocked vehicle counter, which in this case shows that 18 vehicles have been unable to enter the network (the number of BLOCKED vehicles will change as the model runs).
Unreleased vehicles are a serious model calibration problem, since they create a mismatch between the travel demand and the actual number of vehicles that are successful in getting through the network. Typically, this results in a downstream traffic volume undercount and gives the false impression that downstream operations are better than they actually are. (In these situations the model generally fails Realism Tests 3.2, 3.3 and/or 3.4, which in turn invalidates the results of Realism Tests 1 & 2).
Blocked vehicle problems must be resolved before computing any of the mathematical targets (Realism Tests 1 & 2). In most cases, this means fixing the model feature that is causing the blockage (in the previous example that might mean revising the signal timings). If a serious blockage problem exists in the real world, the link where the blockage exists should be extended so that the entire queue can be accommodated in the model: the actual queue length should be measured in the field and matched very closely in the model. In the unlikely event that it is not possible to extend the link where the congestion occurs, the real-world queue at the site of the blockage should be carefully observed in the field and should very closely match the sum of the modeled queue length plus the number of unreleased vehicles.
Some software packages include "blockage removal" features which "destroy" blocked vehicles by removing them from the network automatically if they are stuck for a pre-determined number of seconds. These tools can be helpful during pre-calibration model building, but because they result in a traffic volume undercount their use is generally not acceptable in a "calibrated" model. Instead, the problem that is causing the blockages needs to be fixed. For instance in the case of the two green trucks, the ramp calibration parameters may need to be adjusted so that they can merge correctly into the mainline traffic stream.
The GEH Formula
The traffic volumes in different portions of a highway corridor typically vary over a wide range. For example, the mainline of a freeway might carry 5000 vehicles per hour, while one of the freeway’s on-ramps carries only 50 vehicles per hour. In that situation, it would not be possible to select a single percentage that can be used as a model acceptance criterion for both volumes. For example, setting a volume tolerance of 5% would permit a modeled mainline flow of 5000 ± 250 vehicles, which would be very lenient compared to the ramp tolerance of 50 ± 3 vehicles. Some traffic modelers use a matrix of tolerance percentages for various volume ranges, but this can be cumbersome and is prone to mathematical discontinuities.
To overcome these problems, Geoffrey E. Havers developed a continuous volume tolerance formula while working as a transport planner in London, England in the 1970s. Colleagues dubbed it the GEH formula. Although its mathematical form is similar to a chi-squared test, it is not a true statistical test. Rather, it is an empirical formula that has proven useful for a variety of traffic analysis purposes.
For hourly traffic flows, the GEH formula is:
G_{H} = \sqrt{\frac{2(m - c)^{2}}{m + c}}
- Where:
- m is the traffic volume from the traffic model (vehicles per hour)
- c is the real-world traffic count (vehicles per hour)
In an Excel spreadsheet, if the modeled hourly flow is in cell M1 and the real-world hourly count is in cell C1, this would be written as
=SQRT(2*(M1-C1)^2/(M1+C1))
The GEH formula was created for hourly flows and is not completely self-scaling. Traffic volumes from shorter or longer durations must be converted to hourly flow rates before computation of the GEH formula. For example, 15 minute volumes must be multiplied by 4 to compute the hourly flow rate.
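For readers working outside a spreadsheet, the same calculation can be sketched in Python. The function names are hypothetical, and treating two zero flows as a perfect match is an assumption of this sketch (the formula itself is undefined when both flows are zero). The example volumes reuse the 5000 and 50 vehicle-per-hour illustration from this section, plus a made-up pair of 15-minute counts.

```python
import math


def geh(m, c):
    """GEH statistic for a modeled hourly flow m and an observed hourly count c."""
    if m + c == 0:
        return 0.0  # assumption: two zero flows are treated as a perfect match
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def hourly_rate(volume, minutes):
    """Convert a volume counted over `minutes` to an equivalent hourly flow rate."""
    return volume * 60 / minutes


# A 5% overshoot on the mainline versus a 3-vehicle overshoot on the ramp.
mainline_geh = geh(5250, 5000)
ramp_geh = geh(53, 50)

# 15-minute volumes must be converted to hourly rates before applying the formula.
quarter_hour_geh = geh(hourly_rate(130, 15), hourly_rate(120, 15))

print(f"Mainline GEH: {mainline_geh:.2f}")
print(f"Ramp GEH: {ramp_geh:.2f}")
print(f"15-minute example GEH: {quarter_hour_geh:.2f}")
```

Both the mainline and ramp examples fall well below 5, which illustrates how one formula can be applied across very different volume ranges.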
For daily traffic volumes, a simplistic approximation is that the peak hourly flow is about 10% of the Annual Average Daily Traffic (AADT). Using this approximation, the GEH formula for daily flows can be computed as follows:
G_{D} = \sqrt{\frac{0.2(M - C)^{2}}{M + C}}
- Where:
- M is the traffic volume from the traffic model (AADT)
- C is the real-world traffic count (AADT)
In an Excel spreadsheet, if the modeled Annual Average Daily Traffic (AADT) is in cell M1 and the real-world AADT is in cell C1, this would be written as
=SQRT((0.2*M1^2-0.4*C1*M1+0.2*C1^2)/(M1+C1))
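The daily form follows from substituting the 10% peak-hour approximation into the hourly formula. A short derivation, assuming m = 0.1M and c = 0.1C, suggests why the spreadsheet expression above has the coefficients 0.2 and 0.4:

```latex
\begin{aligned}
G_{D} &= \sqrt{\frac{2\,(0.1M - 0.1C)^{2}}{0.1M + 0.1C}}
       = \sqrt{\frac{0.02\,(M - C)^{2}}{0.1\,(M + C)}}
       = \sqrt{\frac{0.2\,(M - C)^{2}}{M + C}} \\
      &= \sqrt{\frac{0.2M^{2} - 0.4CM + 0.2C^{2}}{M + C}}
\end{aligned}
```

The last line is the expanded form used in the Excel formula.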
Applying the GEH Formula to Individual Traffic Flows in the Model
For individual traffic flows, the following rules of thumb can be applied using the GEH formula:
GEH less than 5 | Acceptable fit, probably OK. |
GEH between 5 and 10 | Caution: possible model error or bad data. |
GEH greater than 10 | Warning: high probability of modeling error or bad data. |
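These rules of thumb can be expressed as a small helper. This is a sketch only: the function names and category labels are hypothetical, and the example flows are made up.

```python
import math


def geh(m, c):
    """GEH statistic for a modeled hourly flow m and an observed hourly count c."""
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def classify(geh_value):
    """Map a GEH value to the rule-of-thumb category from the table above."""
    if geh_value < 5:
        return "acceptable"
    if geh_value <= 10:
        return "caution"
    return "warning"


# Hypothetical (modeled, observed) hourly flows for three links.
examples = [(1150, 1000), (1250, 1000), (1400, 1000)]
labels = [classify(geh(m, c)) for m, c in examples]

for (m, c), label in zip(examples, labels):
    print(f"modeled={m}, observed={c}, GEH={geh(m, c):.1f}: {label}")
```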
In general the GEH needs to be less than 5 for any location that is particularly important to the study goals. Examples of such sites include:
- Links representing crossings of major rivers or other large bridges, viaducts, causeways, tunnels, etc.
- "Bottleneck links" representing choke-points on congested networks.
- Links being considered for major reconfiguration.
- Turn flows that will be used to make final design decisions at high-volume intersections.
Acceptance Criteria for the Model as a Whole
As noted in the introduction, the goal of model calibration is to assure that the model as a whole is a good representation of the actual traffic conditions. This means that the model must not only meet the mathematical targets related to traffic volumes and speeds, but must also be reasonable in terms of overall traffic patterns such as lane choice, queuing, and routing. Microsimulation models must meet all of the criteria identified in Realism Tests 1, 2, and 3 below. Broadly speaking this means the model must conform in the following ways:
- The modeled traffic volumes match the observed (real-world) volumes within acceptance thresholds.
- The modeled travel times and speeds are similar to the actual ones.
- The modeled travel patterns are realistic throughout the network.
Test | Criteria | Acceptance Targets |
1.1. | G_{H} < 5.0 | At least 85% of freeway and arterial mainline links. |
1.2. | G_{H} < 5.0 | At least 85% of entrance and exit ramps. |
1.3. | G_{H} < 5.0 | At least 75% of intersection turn volumes. |
1.4. | Individual flows within ±400 vehicles per hour for flows exceeding 2700 vehicles per hour. | At least 85% of applicable mainline links. |
1.5. | G_{H} < 4.0 for total flows on screenlines. | All (or nearly all) screenlines. |
1.6. | Total screenline flows (normally 5+ links) within ±5%. | All (or nearly all) screenlines. |
If the model is prepared on an AADT basis, use G_{D} in place of G_{H} and multiply the criteria in Realism Test 1.4 by 10.
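The percentage-based targets in Realism Tests 1.1 through 1.4 can be checked with a few lines of code. The following is a sketch under stated assumptions: the function names are hypothetical, the link volumes are invented, and Test 1.4 is assumed to apply to links whose observed (rather than modeled) flow exceeds 2700 vehicles per hour.

```python
import math


def geh(m, c):
    """GEH statistic for a modeled hourly flow m and an observed hourly count c."""
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def share_with_geh_below(pairs, limit=5.0):
    """Fraction of (modeled, observed) pairs whose GEH is below `limit`."""
    passing = sum(1 for m, c in pairs if geh(m, c) < limit)
    return passing / len(pairs)


def share_within_tolerance(pairs, threshold=2700, tolerance=400):
    """Test 1.4: among links with observed flow above `threshold`,
    the fraction whose modeled flow is within +/- `tolerance` vph.
    Returns None when no link exceeds the threshold."""
    applicable = [(m, c) for m, c in pairs if c > threshold]
    if not applicable:
        return None
    passing = sum(1 for m, c in applicable if abs(m - c) <= tolerance)
    return passing / len(applicable)


# Hypothetical (modeled, observed) hourly flows on mainline links.
mainline_links = [
    (5100, 5000), (3150, 3600), (2900, 2800), (1480, 1500),
    (820, 800), (2650, 2600), (4300, 4250),
]

geh_share = share_with_geh_below(mainline_links)
high_volume_share = share_within_tolerance(mainline_links)

print(f"Test 1.1: {geh_share:.0%} of links have GEH < 5 (target 85%)")
print(f"Test 1.4: {high_volume_share:.0%} of high-volume links within 400 vph (target 85%)")
```

In this made-up data set the same link (3150 modeled versus 3600 observed) fails both checks. Test 1.1 is still met overall, but Test 1.4 is not, because fewer links are subject to it.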
Test | Criteria | Acceptance Targets |
2.1. | Modeled travel time within ±1 minute for routes with observed travel times less than 7 minutes. | At least 85% of routes. |
2.2. | Modeled travel time within ±15% for routes with observed travel times greater than 7 minutes. | At least 85% of routes. |
2.3. | Travel speeds within ±10 mph. | At least 85% of mainline links. |
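The travel time criteria in Realism Tests 2.1 and 2.2 can be sketched in the same way. The route data below are invented, the function names are hypothetical, and a route with an observed time of exactly 7 minutes is assumed to fall under the percentage rule, since the table does not state which rule applies at the boundary.

```python
def travel_time_ok(modeled_min, observed_min):
    """Apply Test 2.1 (within 1 minute) to routes observed at under 7 minutes,
    and Test 2.2 (within 15%) to all other routes."""
    difference = abs(modeled_min - observed_min)
    if observed_min < 7:
        return difference <= 1.0
    return difference <= 0.15 * observed_min


def share_passing(routes):
    """Fraction of (modeled, observed) routes that meet their criterion."""
    return sum(1 for m, o in routes if travel_time_ok(m, o)) / len(routes)


# Hypothetical (modeled, observed) route travel times in minutes.
routes = [
    (4.5, 5.2), (6.8, 5.5), (3.0, 3.4),                 # observed under 7 minutes
    (10.0, 11.0), (14.0, 12.0), (8.1, 8.0), (20.0, 18.5),
]

short_routes = [r for r in routes if r[1] < 7]
long_routes = [r for r in routes if r[1] >= 7]

short_share = share_passing(short_routes)
long_share = share_passing(long_routes)

print(f"Test 2.1: {short_share:.0%} of short routes pass (target 85%)")
print(f"Test 2.2: {long_share:.0%} of long routes pass (target 85%)")
```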
Test | Criteria | Acceptance Targets |
3.1. | Routing check for 10 most important Origin-Destination pairs. | 9 of 10 follow routes that a typical driver would use. |
3.2. | Visually realistic queuing patterns at intersections and congested links. | Entire network. |
3.3. | Real-world bottlenecks replicated. | Entire network. |
3.4. | Bottlenecks in locations where real-world bottlenecks do not exist. | None in entire network. |
3.5. | Vehicles doubling back or making unrealistic U-turns. | None in entire network. |
3.6. | Vehicles exiting and then re-entering a freeway, side-street, or driveway. | Rare or non-existent in entire network. |
3.7. | Freeway lane choice. | Consistent with field conditions. |
3.8. | Freeway merging. | Consistent with field conditions. |
3.9. | Vehicle types and truck percentages. | Correct on freeway mainlines and for network as a whole. Arterial intersections correct if required to meet study goals. |
Screenlines
In physical geography, a "ridgeline" runs along the top of a set of hills, dividing the rainfall into two or more watersheds. Similarly, a screenline (also called a cordon line) is a line or curve drawn across a map of the study area, separating two or more "traffic sheds" or sets of traffic flows. At least one screenline should be established in any model that includes more than one parallel route.
To illustrate the screenline concept, the image below shows the Green Bay area, which is bisected by the Fox River. As of 2010, the Green Bay area has five highway bridges that cross this wide waterway. It is fair to say that Green Bay has two traffic sheds: east and west of the river. If a model were being developed to evaluate the feasibility of installing a sixth bridge, it would be important for that model to have a very accurate representation of the total traffic volume that crosses the Fox. Since the Fox River is such an important traffic shed boundary, in such a model the river should be treated as a screenline and the total traffic flows crossing the screenline should be checked for conformance with the criteria in Realism Tests 1.5 and 1.6. In this instance, two checks would be made: the sum of the eastbound traffic volumes for all 5 bridges, and the sum of the westbound flows on the 5 bridges.
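A screenline check of this kind can be sketched as follows. The bridge volumes are made-up numbers for illustration (they are not actual Green Bay counts), and the function name and result keys are hypothetical. The same check would be repeated for the westbound direction.

```python
import math


def geh(m, c):
    """GEH statistic for a modeled hourly flow m and an observed hourly count c."""
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def screenline_check(modeled, observed):
    """Compare total modeled and observed flows across one screenline direction.

    Applies Realism Test 1.5 (GEH of the totals below 4.0) and
    Realism Test 1.6 (totals within +/- 5%).
    """
    total_modeled = sum(modeled)
    total_observed = sum(observed)
    total_geh = geh(total_modeled, total_observed)
    relative_difference = (total_modeled - total_observed) / total_observed
    return {
        "geh": total_geh,
        "relative_difference": relative_difference,
        "passes_1_5": total_geh < 4.0,
        "passes_1_6": abs(relative_difference) <= 0.05,
    }


# Hypothetical eastbound hourly volumes on five bridges.
eastbound_modeled = [1850, 2400, 950, 3100, 1200]
eastbound_observed = [1900, 2300, 1000, 3000, 1250]

result = screenline_check(eastbound_modeled, eastbound_observed)
print(f"Screenline GEH: {result['geh']:.2f}")
print(f"Relative difference: {result['relative_difference']:+.1%}")
```

Because only the totals are compared, individual bridges can deviate in opposite directions and still satisfy the screenline tests. The individual links remain subject to the link-level criteria in Realism Test 1.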
Other situations where screenlines might be used include:
- Total traffic entering/exiting a study area.
- Total traffic entering/exiting a downtown.
- Total traffic entering/exiting a special traffic generator, such as a stadium or airport.
- Total traffic crossing a major jurisdictional boundary, such as the state line.
An example of the use of screenlines in a regional travel demand forecasting situation can be found on the Southern California Association of Governments website.
Calibration of Models with Perturbation and/or Dynamic Assignment
PERTURBATION and DYNAMIC ASSIGNMENT are features present in some microsimulation software packages (such as Paramics). These parameters affect vehicle routing in the network.
When PERTURBATION and DYNAMIC ASSIGNMENT are turned off, microsimulation models use an "All Or Nothing" (AON) method to determine vehicle routing. This method is derived from the concept of Wardrop equilibrium, with all vehicles using the least-cost (fastest) path to get from origin to destination. Some software packages offer a refinement of this concept by allowing a PERTURBATION adjustment. The idea is that some drivers actually follow routes that are slightly longer than the least-cost path (in the real world this could occur for any number of reasons: for example, a driver sticks to a particular route by force of habit, doesn't like driving on a particular road, or chooses a route because she needs to deposit some letters in the mailbox along the way). The PERTURBATION adjustment factor permits the modeler to account for these real-world phenomena, but excessive perturbation can cause erratic vehicle behavior such as vehicles making U-turns or loops.
When DYNAMIC ASSIGNMENT is enabled in a model, some vehicles periodically re-evaluate their route choice based on actual traffic conditions. DYNAMIC ASSIGNMENT should be considered only for complex, highly congested urban networks where drivers have good information about traffic conditions such as frequent radio traffic reports or variable message signs, and where the effects of vehicle re-routing during the trip are important for the study goals.
Generally, DYNAMIC ASSIGNMENT can be used with or without PERTURBATION. The steps for determining which combination to use are as follows:
- ALL OR NOTHING assignment should be used during model building and initial calibration.
- If appropriate, various levels of PERTURBATION should then be introduced to see if they improve calibration and reasonableness.
- Once an appropriate level of PERTURBATION has been selected, various levels of DYNAMIC ASSIGNMENT can then be tested to see if they improve calibration and reasonableness.
Perturbation and dynamic assignment should not be used unless they improve the model calibration or are necessary for other key requirements of the study.
Calibration in the Context of the Model Development Process
When preparing the model calibration statistics, model runs from the primary modeling software (e.g. Paramics Modeller) should always be used. It is not acceptable to prepare the model calibration report using statistics from the O-D matrix estimation software (e.g. Paramics Estimator). These are different steps of the model development process, as indicated in the flowchart below.
As discussed in more detail in the addendum on Route Assignment in Paramics Estimator, the matrix estimation tool in the Quadstone Paramics software suite can work around some network coding problems when ALL-OR-NOTHING traffic assignment is selected. This is a useful feature, but the traffic volumes will not necessarily be the same when the matrix is brought back into Paramics Modeller, where problems such as intersection coding errors can affect throughput. (If DYNAMIC ASSIGNMENT is used, incorrect network coding will strongly influence the O/D matrix estimation process, possibly resulting in matrix errors or failure to converge to a stable matrix).
After a stable matrix has been obtained using the matrix estimation tool, the O/D matrix must be brought back to the primary modeling software (e.g. Paramics Modeller), where glitches can be identified and fixed. Therefore, for the purpose of preparing the model calibration reports the network must be run using the primary modeling software.
Calibration of Future-Year Models
After calibration has been successfully completed for the EXISTING CONDITIONS model, testing of the future-year DO MINIMUM scenario may begin. At this stage the forecasted future-year volumes will become important. There are two possible situations:
- Microsimulation Model as Forecasting Tool. In some cases the microsimulation model itself is used to forecast the future volumes. This is typically done by adjusting the base-year Origin-Destination matrix using suitable growth factors. The simplest method is to multiply the entire O/D matrix by a growth factor (for example 0.75% per year). Another method is to increase the volume on selected O/D pairs; this method is often used when new development is anticipated, using the techniques outlined in the ITE Trip Generation Handbook. A combination of the two methods can also be used.
When the microsimulation model is the forecasting tool, the resulting link volumes must be carefully compared against the existing volumes to assure that the forecast is reasonable. The first check is to review the growth rates on each link in the model to assure that they are within the normal range (about 0.5% to 1.0% per year in most parts of Wisconsin). Next the spatial pattern of volume increases should be reviewed; these increases should occur in appropriate locations based on the type and scale of the anticipated changes in land use and development. Another important check is to assure that there are not any unexpected reductions in volumes (this can happen in microsimulation models if vehicles get stuck somewhere in the network and fail to reach their destination).
- Matching External Forecasts. If the future-year forecasts have been prepared externally (using a regional travel demand forecasting model or a trend analysis method), it will be necessary to assure that the volumes in the future-year microsimulation model are similar to the forecast targets. This can be done using the GEH methodology and criteria outlined previously. If significant differences (high GEHs) are found, it will probably be necessary to talk with the people who created the external forecasts. Considerable effort and dialogue may be necessary to clarify whether the problem lies in the microsimulation model, the external forecasts, or a combination of both.
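For the case where the microsimulation model is the forecasting tool, the uniform growth factor and the link-level growth rate review can be sketched as follows. This sketch assumes compound annual growth (the guideline does not say whether the rate is compounded or applied linearly), and the function names, zone labels, link identifiers, and volumes are all hypothetical.

```python
def grow_matrix(od_matrix, annual_rate, years):
    """Scale every O/D cell by the same factor, assuming compound growth."""
    factor = (1 + annual_rate) ** years
    return {pair: trips * factor for pair, trips in od_matrix.items()}


def annual_growth_rate(base_volume, future_volume, years):
    """Compound annual growth rate implied by two volumes `years` apart."""
    return (future_volume / base_volume) ** (1 / years) - 1


def flag_links(base, future, years, low=0.005, high=0.010):
    """Return {link: rate} for links whose implied annual growth falls outside
    the range low..high. Volume reductions are flagged as negative rates."""
    flagged = {}
    for link, base_volume in base.items():
        rate = annual_growth_rate(base_volume, future[link], years)
        if not low <= rate <= high:
            flagged[link] = rate
    return flagged


# Hypothetical base-year O/D trips, grown at 0.75% per year for 20 years.
base_od = {("A", "B"): 400, ("A", "C"): 250}
future_od = grow_matrix(base_od, annual_rate=0.0075, years=20)

# Hypothetical base-year and modeled future-year link volumes (vph).
base_links = {"L1": 3000, "L2": 1200, "L3": 800}
future_links = {"L1": 3480, "L2": 1150, "L3": 1100}

flagged = flag_links(base_links, future_links, years=20)
for link, rate in flagged.items():
    print(f"{link}: implied growth {rate:+.2%} per year, review needed")
```

In this made-up example, L2 is flagged because its volume drops (a possible sign of stuck vehicles upstream), and L3 is flagged because its growth exceeds the typical range. A flagged link is a prompt for review rather than an automatic failure, since local development can justify higher growth.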
In all cases, travel times, speeds, queuing and overall patterns must also be checked for each future-year model scenario. These reviews are similar to Realism Tests 2 and 3, but are reasonableness checks rather than numerical targets.