One high-quality dataset is enough: Rethinking common data practices in PV projects

By Marcel Suri, CEO, Solargis
The comparison of model outputs with ground-measured data from reference stations ensures the accuracy of solar models and reduces uncertainty across all climates. Image: Solargis.

When selecting solar resource data for PV projects, many in the solar industry still rely on outdated or questionable practices. One especially concerning practice is the use of multiple datasets—or worse, mixing them—to artificially enhance financial attractiveness for investors, banks and stakeholders.

PV developers are often offered several datasets in order to pick the one that best justifies their business case. Others go a step further, combining values from diverse sources into artificial constructions. These approaches may yield comforting numbers in the project financing stage, but they lack scientific rigor and integrity, and open the door to frustrating surprises in the future.


As a scientist, I want to address this recurring issue in our industry. The core argument I want to make is simple: one high-quality, physics-based dataset that is validated, consistent and traceable will always outperform even the most elaborate mosaic of empirical assumptions and patchwork datasets. Let me explain why.

Physics is universal

Solar radiation is governed by physical laws that apply equally in Texas, Indonesia, Patagonia and South Africa. Satellite observations and global weather models feed solar radiation models with input data in the same way, everywhere on the globe.

When the modelling of the state of the atmosphere, aerosols, cloud cover and terrain features respects these physical principles, there is no need to ‘optimise’ such datasets. A globally consistent, physics-based solar dataset ensures comparable results across different regions.

To improve the accuracy of solar model outputs at a specific site, local ground measurements can be applied in a process known as site adaptation. This scientifically rigorous method fine-tunes the original time series data to better reflect the site’s unique geographical conditions, without changing the core structure of the global model.
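
As an illustration only, and not a description of Solargis’ proprietary procedure, one simple form of site adaptation is a regression-based correction of the satellite-derived series against overlapping ground measurements. The sketch below assumes hypothetical hourly GHI arrays from a model and a co-located ground station; real site-adaptation methods are considerably more refined.

```python
import numpy as np

def site_adapt(ghi_model, ghi_ground):
    """Illustrative site adaptation: fit a linear correction between
    modelled and measured GHI over the overlap period, then apply it
    to the full modelled series. Real methods are more sophisticated."""
    ghi_model = np.asarray(ghi_model, dtype=float)
    ghi_ground = np.asarray(ghi_ground, dtype=float)

    # Fit only on daytime values with valid ground measurements
    mask = (ghi_model > 0) & ~np.isnan(ghi_ground)
    slope, intercept = np.polyfit(ghi_model[mask], ghi_ground[mask], deg=1)

    # Apply the correction everywhere, keeping night-time values at zero
    adapted = np.where(ghi_model > 0, slope * ghi_model + intercept, 0.0)
    return np.clip(adapted, 0.0, None), slope, intercept

# Synthetic example: a model series with a small, hypothetical bias
hours = np.arange(24 * 30)
ghi_true = np.maximum(0.0, 800 * np.sin(np.pi * (hours % 24 - 6) / 12))
ghi_biased = 1.03 * ghi_true + 5.0          # hypothetical biased model output
adapted, slope, intercept = site_adapt(ghi_biased, ghi_true)
print(f"fitted slope={slope:.3f}, intercept={intercept:.1f} W/m2")
```

The essential point is that the correction is derived transparently from measurements and applied to the whole series, rather than replacing parts of the dataset with values from other sources.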

Mixing datasets breeds inconsistency

Datasets built on different assumptions or methodologies often do not align. Attempting to mix GHI (global horizontal irradiance) values from one source with DNI (direct normal irradiance) from another and DIF (diffuse horizontal irradiance) from a third breaks their fundamental physical relationship. It is akin to assembling a car from unrelated parts: each component might work on its own, but they fail to perform together.

In a well-calibrated solar model, these three irradiance components form a tightly coupled system. If one changes, the others must adjust accordingly. Violating this balance compromises simulation accuracy and integrity of the outputs.
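
The coupling between the three components is the standard closure relation GHI = DIF + DNI × cos(θz), where θz is the solar zenith angle. The short check below, using hypothetical values, shows how substituting a DNI value taken from a different source immediately breaks that balance.

```python
import math

def ghi_from_components(dni, dif, zenith_deg):
    """Closure relation: GHI = DIF + DNI * cos(solar zenith angle)."""
    return dif + dni * math.cos(math.radians(zenith_deg))

# Hypothetical, internally consistent values from one model (zenith = 40 deg)
zenith = 40.0
dni, dif, ghi = 750.0, 120.0, 694.5      # W/m2, chosen to satisfy the relation
print(abs(ghi_from_components(dni, dif, zenith) - ghi))       # ~0: consistent

# Swap in a DNI value from another source and the closure no longer holds
dni_other = 820.0
print(abs(ghi_from_components(dni_other, dif, zenith) - ghi)) # large residual
```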

Objectivity, transparency and repeatability cannot be compromised

Using a single, validated dataset ensures that all stakeholders—developers, financiers, technical advisors and operators—are working from the same foundation. The model’s assumptions can be openly inspected, its outputs validated against ground measurements, and its accuracy statistically evaluated through deviation metrics calculated from the time series.

This transparency supports confidence in the long-term performance expectations and financial plans.

Contrast this with ‘patchwork datasets’ subjectively built by combining monthly averages and then backfitting them into synthetic hourly profiles. Such methods may have made sense 20 years ago, when data was sparse and less reliable, but they are obsolete in today’s data-rich environment.

Avoiding ‘black magic’

Subjective tweaking of data—whether by adjusting coefficients, mixing sources or retrofitting synthetic Typical Meteorological Year (TMY) datasets—results in black-box manipulation. These shortcuts might produce appealing results in Excel, but they lack scientific rigor, transparency and reproducibility.

Worse, they foster a false sense of confidence that leads to costly underperformance and disputes. Data that has been manipulated cannot be independently validated against ground measurements, shifting decision-making from evidence-based reasoning into the realm of belief.

A physics-based dataset, by contrast, avoids subjective manipulation. It relies on transparent, verifiable models that are continuously refined and calibrated using the latest satellite observations, global weather data and quality-controlled ground measurements. Such a dataset behaves consistently under diverse conditions and incorporates safeguards to detect and flag unusual events—such as extreme weather, aerosols from wildfires or volcanic ash from large eruptions—ensuring anomalies do not go unnoticed.

The importance of model harmony

Physics-based modeling is not just about good inputs; it’s about system-wide coherence. For example, adjusting the aerosol parameters in a clear-sky model affects not just the solar radiation values, but also downstream effects such as PV module heating and inverter loading.

It’s like adjusting one gear in a finely tuned machine; the rest of the gears must be adjusted too for the machine to run smoothly. These interdependencies require that all modeling layers—from satellite calibration, radiative transfer and cloud dynamics to electrical modelling—speak the same language of physics.

This is why using a modular but harmonised modeling platform is so crucial. It enables small improvements—like better calibration constants or updated aerosol data—to propagate through the system in a controlled, physics-consistent manner, respecting the Earth’s geographical diversity.

Stable accuracy over time and across all sites

The argument for ‘picking the best dataset’ for a region implies that no single model can perform consistently everywhere. This is fundamentally untrue if the model is built correctly. A high-quality physics-based system will show stable accuracy in regions as diverse as Alberta, Rajasthan and Bavaria. Minor regional deviations may occur, but they will stay within quantifiable uncertainty margins. There are no wild swings that would justify swapping datasets.

Validation statistics calculated for individual locations are often mistakenly interpreted as model uncertainty. In reality, the performance and uncertainty of a solar model for a specific region can only be accurately assessed through validation at multiple representative sites. Reliable uncertainty estimates can only be provided by experts who have comprehensive knowledge and control over both the solar models and the ground measurements used in validation.

In addition to geographic consistency, it is equally important that solar resource datasets remain stable and consistent over long periods of time. This stability allows for meaningful analysis of year-to-year variability and long-term trends, spanning more than 30 years in some regions.

The solar models integrate data from multiple satellite missions and from atmospheric and meteorological models, together with a high-resolution digital terrain model. All data streams are safeguarded by rigorous quality monitoring and harmonisation procedures, ensuring an uninterrupted real-time data supply.

Moreover, real-world validation is straightforward. Developers can compare any number of years of satellite-based model irradiance with data from on-site pyranometers, compute RMSE and bias metrics and objectively choose the superior model. There’s no need for guesswork or data manipulation, just science.
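
As a minimal sketch of such a comparison, assuming hypothetical arrays of modelled and pyranometer-measured hourly GHI over the same period, the bias and RMSE can be computed in a few lines:

```python
import numpy as np

def validation_metrics(ghi_model, ghi_measured):
    """Bias and RMSE of modelled vs. measured irradiance, with relative
    values expressed against the mean of the measurements."""
    model = np.asarray(ghi_model, dtype=float)
    meas = np.asarray(ghi_measured, dtype=float)

    valid = ~np.isnan(model) & ~np.isnan(meas)
    diff = model[valid] - meas[valid]

    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    mean_meas = meas[valid].mean()
    return {
        "bias_w_m2": round(bias, 1),
        "rmse_w_m2": round(rmse, 1),
        "rel_bias_pct": round(100 * bias / mean_meas, 1),
        "rel_rmse_pct": round(100 * rmse / mean_meas, 1),
    }

# Hypothetical example: the model slightly overestimates the measurements
measured = np.array([450.0, 620.0, 710.0, 300.0, 95.0])
modelled = np.array([465.0, 610.0, 735.0, 310.0, 100.0])
print(validation_metrics(modelled, measured))
```

Whichever model shows the lower deviation over a representative measurement period is, objectively, the better choice for the site.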

Ultimately, an important advantage of selecting a single, consistent and validated data source for long-term financial evaluation is that the same data stream can be used for future performance monitoring and short-term forecasting.

What you should demand from your data provider:

  • Physics-based modeling: Models built on fundamental physical principles, not heuristics or legacy assumptions.
  • Transparent validation: Comprehensive benchmarking against high-quality on-site measurements.
  • High resolution: Time series data at one- to 15-minute resolution better capture short-term variability and enable realistic PV system modeling.
  • Long-term coverage: Extensive archives of historical data, spanning the maximum possible timeframe, to support reliable P50/P90 values and trend analysis (a minimal P50/P90 calculation is sketched after this list).
  • Traceability: Every data point, model assumption and adjustment is clearly explained, independently verifiable and reproducible.
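
To make the P50/P90 item above concrete: given a long archive of annual irradiation totals, P50 corresponds to the median year and P90 to the value exceeded in roughly 90% of years. The sketch below uses hypothetical annual GHI totals and shows the percentile step only; a full exceedance analysis would also fold in model uncertainty.

```python
import numpy as np

# Hypothetical annual GHI totals (kWh/m2) from a multi-decade archive
annual_ghi = np.array([1815, 1790, 1862, 1841, 1778, 1830, 1805, 1795,
                       1850, 1822, 1788, 1810, 1835, 1799, 1820], dtype=float)

p50 = np.percentile(annual_ghi, 50)   # median year
p90 = np.percentile(annual_ghi, 10)   # exceeded in ~90% of years

print(f"P50 = {p50:.0f} kWh/m2, P90 = {p90:.0f} kWh/m2")
```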

What you should avoid:

  • Mixing datasets from different providers.
  • Synthetic generation of hourly TMY from monthly averages.
  • Arbitrary regional preferences unsupported by physical validation.
  • Manual tweaking of data without scientific justification.

The future of solar energy lies not in approximations, averages or subjective adjustments, but in high fidelity to real-world physics. A single, well-calibrated dataset built on sound physical principles is a strategic advantage for any PV developer. As PV projects scale in size, complexity and financial scrutiny, the industry must retire the patchwork approach to resource modeling and prioritise physics.
