When solving dynamic visuo-spatial tasks, the brain copes with perceptual and cognitive processing challenges. In the multiple-object tracking (MOT) task, the number of objects to be tracked (i.e., load) imposes attentional demands, but so does spatial interference from irrelevant objects (i.e., crowding). It is currently unclear whether load and crowding engage separate cognitive and physiological mechanisms. Such knowledge is important for understanding the neurophysiology of visual attention, and it would help resolve conflicting views between theories of visual cognition, particularly concerning the sources of capacity limitations. To address this problem, we varied the degree of processing challenge in the MOT task in two ways: first, by the number of objects to track, and second, by the spatial proximity between targets and distractors. In one cohort, we measured task-induced pupil dilations and saccades during MOT; in a separate cohort, we measured brain activity with fMRI during MOT. The behavioral results in both cohorts revealed that increased load and crowding reduced accuracy in an additive manner. Load was associated with pupil dilations, whereas crowding was not. Activity in dorsal attentional areas and the frequency of saccades increased proportionally with higher levels of both load and crowding. Higher crowding additionally recruited ventral attentional areas, which may reflect orienting mechanisms. Activity in the brainstem nuclei, the ventral tegmental area/substantia nigra and the locus coeruleus, showed clearly dissociated patterns. Our results constitute convergent evidence from independent samples that processing challenges due to load and object spacing may rely on different mechanisms.