Evaluation of approaches to internal validation of multinomial Logit models: The case of personal travel mode choice

Bibliographic Details
Published in	Communications in Statistics: Case Studies, Data Analysis and Applications, Vol. 11, no. 3, pp. 316-342
Main Authors	Parmar, Janak; Delle Site, Paolo
Format	Journal Article
Language	English
Published	Taylor & Francis 03.07.2025
Subjects	Accuracy metric; internal validation; multinomial Logit; optimism bias; prediction validity; sampling strategy; travel mode choice
ISSN	2373-7484
DOI	10.1080/23737484.2025.2522358

Summary:	The prediction validity of discrete choice models is key for policy making in the transportation sector. For internal validation, i.e., when the model is estimated and validated on the same population, different approaches exist. Each approach is characterized by a sampling strategy and an accuracy metric. Sampling strategies include in-sample (also referred to as apparent), split-sample, cross-validation, and bootstrapping. Accuracy metrics include McFadden rho-squared, percentage of right classification, McFadden proportion of right predictions, Brier Score, polytomous discrimination index, and hypervolume under the ROC manifold. It is widely recognized that in-sample strategies are overly optimistic because the model is optimized for performance in the sample in which it is estimated. The performance of approaches to internal validation has previously been evaluated in clinical epidemiology with logistic regression models. This paper evaluates approaches to internal validation using synthetic and real datasets related to personal travel mode choices modeled using multinomial Logit. The performance of each approach is evaluated against the apparent performance in the full population. With both synthetic and real data, cross-validation produces the lowest bias with most metrics. The metric with the lowest bias is data-specific. Bootstrapping produces the lowest variability.
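Illustration (not taken from the article): the summary pairs sampling strategies with accuracy metrics, and a minimal Python sketch may help make three of the named metrics and one strategy concrete. Everything here is an assumption made for illustration: the function names are hypothetical, rho-squared is computed against an equal-shares null model with all alternatives available, and the Brier Score is the unnormalized multi-class form (squared errors summed over alternatives, averaged over observations). The article may define these quantities differently.

```python
import numpy as np

def rho_squared(probs, chosen):
    """McFadden rho-squared, 1 - LL(model) / LL(null).
    Assumption: the null model assigns equal probability 1/J to each alternative."""
    n, j = probs.shape
    ll_model = np.log(probs[np.arange(n), chosen]).sum()
    ll_null = n * np.log(1.0 / j)
    return float(1.0 - ll_model / ll_null)

def pct_correct(probs, chosen):
    """Share of observations whose highest-probability alternative is the chosen one."""
    return float(np.mean(probs.argmax(axis=1) == chosen))

def brier_score(probs, chosen):
    """Multi-class Brier Score (assumed unnormalized form): lower is better."""
    onehot = np.eye(probs.shape[1])[chosen]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation.
    Each observation appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[m] for m in range(k) if m != i])
        yield train, folds[i]

# Toy example: 4 travellers, 3 modes (made-up probabilities, not real data).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
chosen = np.array([0, 1, 2, 1])
print(rho_squared(probs, chosen), pct_correct(probs, chosen), brier_score(probs, chosen))
```

In a cross-validated evaluation, a model would be estimated on each `train` index set and the metrics computed on the matching `test` set, then averaged across folds. The in-sample (apparent) variant computes the same metrics on the data used for estimation.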