Evaluation of approaches to internal validation of multinomial Logit models: The case of personal travel mode choice
| Published in | Communication in statistics. Case studies and data analysis Vol. 11; no. 3; pp. 316-342 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Taylor & Francis, 03.07.2025 |
| ISSN | 2373-7484 |
| DOI | 10.1080/23737484.2025.2522358 |
| Summary: | The predictive validity of discrete choice models is key for policy making in the transportation sector. Several approaches exist for internal validation, i.e., validation in which the model is estimated and validated on the same population. Each approach is characterized by a sampling strategy and an accuracy metric. Sampling strategies include in-sample (also referred to as apparent), split-sample, cross-validation, and bootstrapping. Accuracy metrics include McFadden rho-squared, percentage of right classification, McFadden proportion of right predictions, Brier score, polytomous discrimination index, and hypervolume under the ROC manifold. It is widely recognized that in-sample strategies are overly optimistic because the model is optimized for performance in the sample in which it is estimated. Approaches to internal validation have previously been evaluated in clinical epidemiology with logistic regression models. This paper evaluates them using synthetic and real datasets on personal travel mode choice modeled with multinomial Logit. The performance of each approach is evaluated against the apparent performance in the full population. With both synthetic and real data, cross-validation produces the lowest bias for most metrics. The metric with the lowest bias is data-specific. Bootstrapping produces the lowest variability. |
|---|---|
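
The summary describes each validation approach as a pairing of a sampling strategy with an accuracy metric. The following is a minimal Python sketch of that pairing, not the authors' implementation. It assumes choices are coded as integer alternative indices, takes the equal-shares log-likelihood as the reference for McFadden rho-squared (one common convention), and uses a hypothetical stand-in model, `fit_shares`, that predicts smoothed market shares (roughly what a constants-only Logit would predict) in place of a full multinomial Logit estimator. All function names are illustrative.

```python
import math
import random


def brier_score(probs, choices):
    """Multiclass Brier score: mean over observations of the summed squared
    difference between predicted probabilities and the 0/1 choice indicator."""
    total = 0.0
    for p, y in zip(probs, choices):
        total += sum((pj - (1.0 if j == y else 0.0)) ** 2 for j, pj in enumerate(p))
    return total / len(choices)


def pct_right(probs, choices):
    """Share of observations whose highest-probability alternative was chosen."""
    hits = sum(
        1 for p, y in zip(probs, choices)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return hits / len(choices)


def mcfadden_rho2(probs, choices):
    """1 - LL(model) / LL(0), with LL(0) taken at equal shares (an assumption)."""
    ll_model = sum(math.log(p[y]) for p, y in zip(probs, choices))
    ll_null = sum(math.log(1.0 / len(p)) for p in probs)
    return 1.0 - ll_model / ll_null


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_shares(choices, n_alt):
    """Hypothetical stand-in model: add-one smoothed market shares."""
    counts = [1.0] * n_alt
    for y in choices:
        counts[y] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def apparent(metric, choices, n_alt):
    """In-sample (apparent) performance: fit and evaluate on the same data."""
    shares = fit_shares(choices, n_alt)
    return metric([shares] * len(choices), choices)


def cross_validated(metric, choices, n_alt, k=5, seed=0):
    """k-fold cross-validation: fit on k-1 folds, evaluate on the held-out fold."""
    scores = []
    for test in k_fold_indices(len(choices), k, seed):
        held_out = set(test)
        train = [y for i, y in enumerate(choices) if i not in held_out]
        shares = fit_shares(train, n_alt)
        test_y = [choices[i] for i in test]
        scores.append(metric([shares] * len(test_y), test_y))
    return sum(scores) / k


if __name__ == "__main__":
    # Synthetic mode choices among 3 alternatives (illustrative data only).
    rng = random.Random(42)
    choices = rng.choices([0, 1, 2], weights=[0.6, 0.3, 0.1], k=200)
    for name, metric in [("Brier", brier_score), ("rho2", mcfadden_rho2)]:
        print(name, apparent(metric, choices, 3), cross_validated(metric, choices, 3))
```

Replacing `fit_shares` with an actual multinomial Logit estimator, and adding split-sample or bootstrap resampling alongside `k_fold_indices`, would extend the sketch to the other strategies the summary lists.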