Despite a body of research on subscale score reporting at the individual level, little research has examined subscale score estimation in international large-scale assessments (ILSAs). This dissertation evaluated the commonly available methods for subscale score estimation in order to identify a model suitable for item parameter estimation, population score estimation, and the reporting of valuable subscale scores. It further compared these models to identify the better-fitting one. By investigating the accuracy and bias of model parameter estimates under different test conditions, this dissertation aimed to provide practitioners with general guidelines for estimating subscale scores under different test specifications. The dissertation comprised simulation studies and an empirical study. Its findings advance existing knowledge of subscale score estimation by extending the discussion to the ILSA context. This dissertation argues that different subscale score estimation methods may be better suited to different test conditions and sample compositions.