We used the multiple object tracking (MOT) paradigm to investigate sustained attention to several objects simultaneously, while directly manipulating task difficulty. Using pupillometry, we investigated the role of effort in the MOT task and aimed to distinguish between four proposed models of MOT: Pylyshyn’s early-vision model, a purely attentional account, Yantis’ perceptual grouping model, and a purely serial account. Pupil size showed a phasic increase when subjects tracked several objects, and a decrease when they passively viewed the display. Moreover, the phasic pupil dilation in tracking conditions was proportional to task difficulty. Pupil responses have previously been shown to be closely related to activation of the locus coeruleus, which in turn is thought to modulate attention through its norepinephrinergic projections. Importantly, phasic activity of the locus coeruleus has been associated with task “exploitation”. The results appear to be in line with Pylyshyn’s early-vision model and with a purely attentional account of MOT, whereas the other models may have more difficulty explaining them. Since these models differ in the role they assign to effort in MOT, we offer suggestions for distinguishing between them further and for clarifying which mechanisms enable us to attend to several objects simultaneously.