Accurate solar forecasting has lately come to rely on advances in artificial intelligence and on the availability of databases with large amounts of information on meteorological variables. In this paper, we present the methodology used to build a large-scale, public irradiance dataset, CyL-GHI, containing refined data from 37 stations located in the Spanish region of Castile and León (Spanish: Castilla y León, ...