
When selecting solar resource data for PV projects, many in the solar industry still rely on outdated or questionable practices. One especially concerning practice is the use of multiple datasets—or worse, mixing them—to artificially enhance financial attractiveness for investors, banks and stakeholders.
PV developers are often offered several datasets so that they can pick the one that best justifies their business case. Others go a step further, combining values from diverse sources into artificial constructs. These approaches may yield comforting numbers at the project financing stage, but they lack scientific rigour and integrity, and they open the door to frustrating surprises in the future.
As a scientist, I want to address this recurring issue in our industry. The core argument I want to make is simple: one high-quality, physics-based dataset that is validated, consistent and traceable will always outperform even the most elaborate mosaic of empirical assumptions and patchwork datasets. Let me explain why.
Physics is universal
Solar radiation is governed by physical laws that apply equally in Texas, Indonesia, Patagonia and South Africa. Satellite observations and global weather models provide input data to solar radiation models in the same way everywhere on Earth.
When the modelling of the state of the atmosphere, aerosols, cloud cover and terrain features respects these physical principles, there is no need to ‘optimise’ such datasets. A globally consistent, physics-based solar dataset ensures comparable results across different regions.
To improve the accuracy of solar model outputs at a specific site, local ground measurements can be applied in a process known as site adaptation. This scientifically rigorous method fine-tunes the original time series data to better reflect the site’s unique geographical conditions, without changing the core structure of the global model.
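As a minimal illustration of the principle, and not of any particular provider’s workflow, the sketch below fits a simple linear correction between satellite-modelled and measured GHI over their overlap period and then applies it to the full satellite time series. Real site-adaptation procedures are more elaborate, but the logic is the same: fine-tune the model locally rather than replace it.

```python
import numpy as np

def site_adapt(satellite_full, satellite_overlap, ground_overlap):
    """Illustrative site adaptation: a linear correction fitted on the
    period where satellite-modelled and measured GHI overlap, applied
    to the full satellite time series (all values in W/m^2).

    Real workflows add measurement quality control, separate treatment
    of GHI and DNI, and non-linear corrections; this sketch only shows
    the principle of local fine-tuning.
    """
    sat = np.asarray(satellite_overlap, dtype=float)
    grd = np.asarray(ground_overlap, dtype=float)
    slope, intercept = np.polyfit(sat, grd, deg=1)     # least-squares linear fit
    adapted = slope * np.asarray(satellite_full, dtype=float) + intercept
    return np.clip(adapted, 0.0, None)                 # irradiance cannot be negative
```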
Mixing datasets breeds inconsistency
Datasets built on different assumptions or methodologies often do not align. Attempting to mix GHI (Global Horizontal Irradiance) values from one source with DNI (Direct Normal Irradiance) from another and DIF (Diffuse Horizontal Irradiance) from a third breaks the fundamental physical relationship between them. It’s akin to assembling a car from unrelated parts; each component might work on its own, but together they fail to perform.
In a well-calibrated solar model, these three irradiance components form a tightly coupled system: GHI is the sum of DIF and the horizontal projection of DNI. If one changes, the others must adjust accordingly. Violating this balance compromises simulation accuracy and the integrity of the outputs.
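The coupling can be written as GHI = DIF + DNI·cos(θz), where θz is the solar zenith angle. As a rough illustration, the sketch below (plain NumPy, with invented numbers and an arbitrary 5% tolerance) checks how far a set of irradiance components departs from this closure; components taken from one consistent model stay within measurement and model uncertainty, whereas mixed sources typically do not.

```python
import numpy as np

def closure_residual(ghi, dni, dif, zenith_deg):
    """Residual of the closure equation GHI = DIF + DNI * cos(zenith).

    Irradiance values in W/m^2, solar zenith angle in degrees.
    """
    ghi, dni, dif = (np.asarray(x, dtype=float) for x in (ghi, dni, dif))
    cos_z = np.cos(np.radians(np.asarray(zenith_deg, dtype=float)))
    modelled_ghi = dif + dni * np.clip(cos_z, 0.0, None)  # ignore sun below horizon
    return ghi - modelled_ghi

# Invented example values: the middle record mixes components from different sources
ghi = [650.0, 480.0, 120.0]
dni = [820.0, 600.0, 0.0]
dif = [110.0, 90.0, 118.0]
zenith = [48.0, 55.0, 75.0]

residual = closure_residual(ghi, dni, dif, zenith)
relative = np.abs(residual) / np.maximum(np.asarray(ghi), 1.0)
print(np.where(relative > 0.05)[0])  # flags the physically inconsistent record
```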
Objectivity, transparency and repeatability cannot be compromised
Using a single, validated dataset ensures that all stakeholders—developers, financiers, technical advisors and operators—are working from the same foundation. The model’s assumptions can be openly inspected and validated against ground measurements, and the accuracy can be statistically evaluated via deviation metrics calculated from the time series.
This transparency supports confidence in the long-term performance expectations and financial plans.
Contrast this with ‘patchwork datasets’ subjectively built by combining monthly averages and then backfitting them into synthetic hourly profiles. Such methods may have made sense 20 years ago, when data was sparse and less reliable, but they are obsolete in today’s data-rich environment.
Avoiding ‘black magic’
Subjective tweaking of data—whether by adjusting coefficients, mixing sources or retrofitting synthetic Typical Meteorological Year (TMY) datasets—amounts to black-box manipulation. These shortcuts might produce appealing results in Excel, but they lack scientific rigour, transparency and reproducibility.
Worse, they foster a false sense of confidence that leads to costly underperformance and disputes. Data that has been manipulated cannot be independently validated against ground measurements, shifting decision-making from evidence-based reasoning into the realm of belief.
A physics-based dataset, by contrast, avoids subjective manipulation. It relies on transparent, verifiable models that are continuously refined and calibrated using the latest satellite observations, global weather data and quality-controlled ground measurements. Such a dataset behaves consistently under diverse conditions and incorporates safeguards to detect and flag unusual events—such as extreme weather, aerosols from wildfires or volcanic ash from large eruptions—ensuring anomalies do not go unnoticed.
The importance of model harmony
Physics-based modelling is not just about good inputs; it’s about system-wide coherence. For example, adjusting the aerosol parameters in a clear-sky model affects not just the solar radiation values, but also downstream effects such as PV module heating and inverter loading.
It’s like adjusting one gear in a finely tuned machine; the rest of the gears must be recalibrated too for it to run smoothly. These interdependencies require that all modelling layers—from satellite calibration to radiative transfer and cloud dynamics to electrical modelling—speak the same language of physics.
This is why using a modular but harmonised modelling platform is so crucial. It enables small improvements—like better calibration constants or updated aerosol data—to propagate through the system in a controlled, physics-consistent manner, respecting the Earth’s geographical diversity.
Stable accuracy over time and across all sites
The argument for ‘picking the best dataset’ for a region implies that no single model can perform consistently everywhere. This is fundamentally untrue if the model is built right. A high-quality physics-based system will show stable accuracy in regions as diverse as Alberta, Rajasthan and Bavaria. Minor regional deviations may occur, but they will be within quantifiable uncertainty margins. There are no wild swings that would justify swapping datasets.
Validation statistics calculated for individual locations are often mistakenly interpreted as model uncertainty. In reality, the performance and uncertainty of a solar model for a specific region can only be accurately assessed through validation at multiple representative sites. Reliable uncertainty estimates can only be provided by experts who have comprehensive knowledge and control over both the solar models and the ground measurements used in validation.
In addition to geographic consistency, it is equally important that solar resource datasets remain stable and consistent over long periods of time. This stability allows for meaningful analysis of year-to-year variability and long-term trends, spanning more than 30 years in some regions.
The solar models integrate data from multiple satellite missions and from atmospheric and meteorological models, as well as a high-resolution digital terrain model. All data streams are safeguarded by rigorous quality monitoring and harmonisation procedures, ensuring an uninterrupted real-time data supply.
Moreover, real-world validation is straightforward. Developers can compare any number of years of satellite-based model irradiance with data from on-site pyranometers, compute RMSE and bias metrics and objectively choose the superior model. There’s no need for guesswork or data manipulation, just science.
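For illustration, a minimal version of such a comparison might look like the sketch below, which computes the mean bias error (MBE) and RMSE of modelled irradiance against pyranometer measurements. The numbers are invented and the metric set is deliberately reduced; a full validation would also look at distributions, time resolution and data quality flags.

```python
import numpy as np

def validation_metrics(modelled, measured):
    """Deviation metrics between modelled and measured irradiance (W/m^2).

    Both inputs are aligned 1-D arrays covering the same time steps, e.g.
    hourly GHI from the satellite model and from an on-site pyranometer.
    """
    modelled = np.asarray(modelled, dtype=float)
    measured = np.asarray(measured, dtype=float)
    error = modelled - measured
    mbe = error.mean()                           # mean bias error
    rmse = np.sqrt(np.mean(error ** 2))          # root mean square error
    return {
        "MBE": mbe,
        "RMSE": rmse,
        "rMBE_%": 100.0 * mbe / measured.mean(),    # bias relative to measured mean
        "rRMSE_%": 100.0 * rmse / measured.mean(),  # RMSE relative to measured mean
    }

# Invented hourly GHI values, for illustration only
print(validation_metrics([420.0, 610.0, 705.0, 300.0],
                         [410.0, 630.0, 690.0, 310.0]))
```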
Ultimately, an important advantage of selecting a single, consistent and validated data source for long-term financial evaluation is that the same data stream can be used for future performance monitoring and short-term forecasting.
What you should demand from your data provider:
- Physics-based modelling: Models built on fundamental physical principles, not heuristics or legacy assumptions.
- Transparent validation: Comprehensive benchmarking against high-quality on-site measurements.
- High resolution: One- to 15-minute time series data better capture short-term variability and enable realistic PV system modelling.
- Long-term coverage: Extensive archives of historical data, spanning the maximum possible timeframe, to support reliable P50/P90 values and trend analysis (see the sketch after this list).
- Traceability: Every data point, model assumption and adjustment is clearly explained, independently verifiable and reproducible.
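As an illustration of the last two points, the sketch below derives indicative P50/P90 values from a long-term series of annual totals, assuming a normal distribution and combining interannual variability with an assumed overall model uncertainty. The ten-year series and the 4% uncertainty figure are purely illustrative, not taken from any specific provider.

```python
import numpy as np
from scipy.stats import norm

def exceedance_values(annual_totals, model_uncertainty_pct=4.0):
    """Indicative P50/P90 from annual totals (e.g. GHI in kWh/m^2 or yield in kWh/kWp).

    Assumes a normal distribution; model_uncertainty_pct is an assumed
    1-sigma model/measurement uncertainty, combined with interannual variability.
    """
    y = np.asarray(annual_totals, dtype=float)
    p50 = y.mean()
    interannual = y.std(ddof=1) / p50                   # relative 1-sigma variability
    combined = np.sqrt(interannual**2 + (model_uncertainty_pct / 100.0)**2)
    p90 = p50 + norm.ppf(0.10) * p50 * combined         # value exceeded in 90% of cases
    return p50, p90

# Purely illustrative ten-year series of annual GHI (kWh/m^2)
print(exceedance_values([1810, 1765, 1842, 1798, 1775, 1820, 1790, 1805, 1760, 1835]))
```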
What you should avoid:
- Mixing datasets from different providers.
- Synthetic generation of hourly TMY from monthly averages.
- Arbitrary regional preferences unsupported by physical validation.
- Manual tweaking of data without scientific justification.
The future of solar energy lies not in approximations, averages or subjective adjustments, but in high fidelity to real-world physics. A single, well-calibrated dataset built on sound physical principles is a strategic advantage for any PV developer. As PV projects scale in size, complexity and financial scrutiny, the industry must retire the patchwork approach to resource modelling and prioritise physics.