Accurately simulating gross primary productivity (GPP) in terrestrial ecosystem models is critical because errors in simulated GPP propagate through the model and introduce additional errors in simulated biomass and other fluxes. We evaluated simulated daily average GPP from 26 models against estimated GPP at 39 eddy covariance flux tower sites across the United States and Canada. None of the models in this study matched estimated GPP within observed uncertainty. On average, models overestimated GPP in winter, spring, and fall and underestimated GPP in summer. Models overpredicted GPP under dry conditions and at temperatures below 0°C. Improvements in simulated soil moisture and in the ecosystem response to drought or humidity stress will improve simulated GPP under dry conditions. Adding a low-temperature response that shuts down GPP at temperatures below 0°C will reduce the positive bias in winter, spring, and fall and improve simulated phenology. The negative bias in summer and the poor overall performance resulted from mismatches between simulated and observed light use efficiency (LUE). Improving simulated GPP requires better leaf-to-canopy scaling and better values of the model parameters that control the maximum potential GPP, such as εmax (maximum LUE), Vcmax (unstressed Rubisco catalytic capacity), or Jmax (the maximum electron transport rate).
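As an illustration of the low-temperature response discussed above, the following minimal sketch shows how a temperature scalar might shut down GPP below 0°C in a generic light use efficiency formulation, GPP = εmax × f(T) × APAR. It is not taken from any of the evaluated models. The function names, the linear ramp, the 20°C optimum, and the εmax value of 1.8 g C per MJ are hypothetical placeholders chosen only for demonstration.

```python
def temperature_scalar(t_air_c, t_min_c=0.0, t_opt_c=20.0):
    """Hypothetical temperature response f(T) in the range [0, 1].

    Returns 0 at or below t_min_c (GPP shut down), 1 at or above
    t_opt_c, and a linear ramp in between. The ramp shape and the
    default thresholds are assumptions, not values from the study.
    """
    if t_air_c <= t_min_c:
        return 0.0
    if t_air_c >= t_opt_c:
        return 1.0
    return (t_air_c - t_min_c) / (t_opt_c - t_min_c)


def gpp_lue(apar, t_air_c, eps_max=1.8):
    """Generic LUE estimate: GPP = eps_max * f(T) * APAR.

    apar    : absorbed photosynthetically active radiation (MJ m-2 day-1)
    t_air_c : air temperature (deg C)
    eps_max : maximum light use efficiency (g C MJ-1), placeholder value
    """
    return eps_max * temperature_scalar(t_air_c) * apar


def mean_bias(simulated, estimated):
    """Mean of (simulated - estimated); positive means overestimation."""
    if len(simulated) != len(estimated):
        raise ValueError("series must have the same length")
    return sum(s - e for s, e in zip(simulated, estimated)) / len(simulated)


if __name__ == "__main__":
    # Made-up winter days: the tower-based estimate is zero below freezing.
    apar = [2.0, 3.0, 2.5]
    t_air = [-8.0, -3.0, -1.0]
    estimated = [0.0, 0.0, 0.0]

    # Without a temperature limit (f(T) fixed at 1), GPP stays positive.
    unlimited = [1.8 * a for a in apar]
    limited = [gpp_lue(a, t) for a, t in zip(apar, t_air)]

    print("bias without low-temperature response:", mean_bias(unlimited, estimated))
    print("bias with low-temperature response:", mean_bias(limited, estimated))
```

In this toy case the scalar removes the positive winter bias entirely because every day is below freezing. With real tower data the effect would depend on the chosen response shape and thresholds.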