Large‐scale assessments such as the Programme for International Student Assessment (PISA) have field trials where new survey features are tested for utility in the main survey. Because of resource constraints, there is a trade‐off between how much of the sample can be used to test new survey features and how much can be used for the initial item response theory (IRT) scaling. Utilizing real assessment data of the PISA 2015 Science assessment, this article demonstrates that using fixed item parameter calibration (FIPC) in the field trial yields stable item parameter estimates in the initial IRT scaling for samples as small as n = 250 per country. Moreover, the results indicate that for the recovery of the county‐specific latent trait distributions, the estimates of the trend items (i.e., the information introduced into the calibration) are crucial. Thus, concerning the country‐level sample size of n = 1,950 currently used in the PISA field trial, FIPC is useful for increasing the number of survey features that can be examined during the field trial without the need to increase the total sample size. This enables international large‐scale assessments such as PISA to keep up with state‐of‐the‐art developments regarding assessment frameworks, psychometric models, and delivery platform capabilities.