One high-quality dataset is enough: Rethinking common data practices in PV projects

By Marcel Suri, CEO, Solargis
The comparison of model outputs with ground-measured data from reference stations ensures the accuracy of solar models and reduces uncertainty across all climates. Image: Solargis.

When selecting solar resource data for PV projects, many in the solar industry still rely on outdated or questionable practices. One especially concerning practice is the use of multiple datasets—or worse, mixing them—to artificially enhance financial attractiveness for investors, banks and stakeholders.

PV developers are often offered several datasets in order to pick the one that best justifies their business case. Others go a step further, combining values from diverse sources into artificial constructions. These approaches may yield comforting numbers in the project financing stage, but they lack scientific rigor and integrity, and open the door to frustrating surprises in the future.

As a scientist, I want to address this recurring issue in our industry. The core argument I want to make is simple: one high-quality, physics-based dataset that is validated, consistent and traceable will always outperform even the most elaborate mosaic of empirical assumptions and patchwork datasets. Let me explain why.

Physics is universal

Solar radiation is governed by physical laws that apply equally in Texas, Indonesia, Patagonia and South Africa. Satellite observations and global weather models provide input data for solar radiation models in the same way everywhere on the globe.

When the modelling of the state of the atmosphere, aerosols, cloud cover and terrain features respects these physical principles, there is no need to ‘optimise’ such datasets. A globally consistent, physics-based solar dataset ensures comparable results across different regions.

To improve the accuracy of solar model outputs at a specific site, local ground measurements can be applied in a process known as site adaptation. This scientifically rigorous method fine-tunes the original time series data to better reflect the site’s unique geographical conditions, without changing the core structure of the global model.
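
How site adaptation works can be illustrated with a minimal sketch. The example below fits a simple linear correction between concurrent satellite-modelled and ground-measured GHI and applies it to the full satellite time series; the function names and synthetic data are placeholders, and production site-adaptation workflows are considerably more elaborate (strict quality control of measurements, per-component and seasonal corrections).

```python
import numpy as np

def fit_site_adaptation(sat_ghi, ground_ghi):
    """Fit a simple linear correction (slope, offset) mapping satellite-modelled GHI
    onto concurrent ground-measured GHI. Illustrative only; real site adaptation
    uses more sophisticated, quality-controlled procedures."""
    valid = ~np.isnan(sat_ghi) & ~np.isnan(ground_ghi)   # use concurrent, valid samples only
    slope, offset = np.polyfit(sat_ghi[valid], ground_ghi[valid], 1)
    return slope, offset

def apply_site_adaptation(sat_ghi, slope, offset):
    """Apply the fitted correction to the full satellite time series."""
    return np.clip(slope * sat_ghi + offset, 0.0, None)  # irradiance cannot be negative

# Hypothetical hourly data for one year (values are made up for illustration):
rng = np.random.default_rng(0)
ground = np.abs(rng.normal(450.0, 200.0, 8760))          # ground-measured GHI, W/m2
satellite = ground * 1.04 + rng.normal(0.0, 25.0, 8760)  # model with a small positive bias
slope, offset = fit_site_adaptation(satellite, ground)
adapted = apply_site_adaptation(satellite, slope, offset)
```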

Mixing datasets breeds inconsistency

Datasets built on different assumptions or methodologies often do not align. Attempting to mix GHI (Global Horizontal Irradiation) values from one source with DNI (Direct Normal Irradiation) from another and DIF (Diffuse Horizontal Irradiation) from a third breaks the fundamental physical relationship between the three components. It is akin to assembling a car from unrelated parts; each component might work independently, but they fail to perform together.

In a well-calibrated solar model, these three irradiance components form a tightly coupled system. If one changes, the others must adjust accordingly. Violating this balance compromises simulation accuracy and the integrity of the outputs.
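
The coupling follows the closure relation GHI = DIF + DNI·cos(θz), where θz is the solar zenith angle. A minimal sketch of a consistency check is shown below; the variable names, sample values and 5% threshold are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

def closure_residual(ghi, dni, dif, zenith_deg):
    """Residual of the component closure GHI = DIF + DNI * cos(zenith).
    Large residuals indicate physically inconsistent components, which is typical
    when GHI, DNI and DIF come from independently produced datasets."""
    cos_z = np.clip(np.cos(np.radians(zenith_deg)), 0.0, None)  # ignore sun below horizon
    return ghi - (dif + dni * cos_z)

# Three example timestamps (values are made up); the middle one mixes sources badly:
ghi    = np.array([600.0, 450.0, 120.0])
dni    = np.array([700.0, 300.0,   0.0])
dif    = np.array([120.0, 180.0, 118.0])
zenith = np.array([ 45.0,  60.0,  80.0])

residual = closure_residual(ghi, dni, dif, zenith)
inconsistent = np.abs(residual) > 0.05 * ghi   # arbitrary 5% tolerance for illustration
```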

Objectivity, transparency and repeatability cannot be compromised

Using a single, validated dataset ensures that all stakeholders—developers, financiers, technical advisors and operators—are working from the same foundation. The model’s assumptions can be openly inspected and validated against ground measurements, and its accuracy can be statistically evaluated via deviation metrics calculated from the time series.

This transparency supports confidence in the long-term performance expectations and financial plans.

Contrast this with ‘patchwork datasets’ subjectively built by combining monthly averages and then backfitting them into synthetic hourly profiles. Such methods may have made sense 20 years ago, when data was sparse and less reliable, but they are obsolete in today’s data-rich environment.

Avoiding ‘black magic’

Subjective tweaking of data—whether by adjusting coefficients, mixing sources or retrofitting synthetic Typical Meteorological Year (TMY) datasets—results in black-box manipulation. These shortcuts might produce appealing results in Excel, but they lack scientific rigor, transparency and reproducibility.

Worse, they foster a false sense of confidence that leads to costly underperformance and disputes. Data that has been manipulated cannot be independently validated against ground measurements, shifting decision-making from evidence-based reasoning into the realm of belief.

A physics-based dataset, by contrast, avoids subjective manipulation. It relies on transparent, verifiable models that are continuously refined and calibrated using the latest satellite observations, global weather data and quality-controlled ground measurements. Such a dataset behaves consistently under diverse conditions and incorporates safeguards to detect and flag unusual events—such as extreme weather, aerosols from wildfires or volcanic ash from large eruptions—ensuring anomalies do not go unnoticed.

The importance of model harmony

Physics-based modelling is not just about good inputs; it is about system-wide coherence. For example, adjusting the aerosol parameters in a clear-sky model affects not only the solar radiation values, but also downstream results such as PV module heating and inverter loading.

It is like adjusting one gear in a finely tuned machine; the other gears must be recalibrated too for it to run smoothly. These interdependencies require that all modelling layers, from satellite calibration through radiative transfer and cloud dynamics to electrical modelling, speak the same language of physics.

This is why using a modular but harmonised modelling platform is so crucial. It enables small improvements, like better calibration constants or updated aerosol data, to propagate through the system in a controlled, physics-consistent manner, respecting the Earth’s geographical diversity.

Stable accuracy over time and across all sites

The argument for ‘picking the best dataset’ for a region implies that no single model can perform consistently everywhere. This is fundamentally untrue if the model is built right. A high-quality physics-based system will show stable accuracy in regions as diverse as Alberta, Rajasthan and Bavaria. Minor regional deviations may occur, but they will be within quantifiable uncertainty margins. There are no wild swings that would justify swapping datasets.

Validation statistics calculated for individual locations are often mistakenly interpreted as model uncertainty. In reality, the performance and uncertainty of a solar model for a specific region can only be accurately assessed through validation at multiple representative sites. Reliable uncertainty estimates can only be provided by experts who have comprehensive knowledge of, and control over, both the solar models and the ground measurements used in validation.

In addition to geographic consistency, it is equally important that solar resource datasets remain stable and consistent over long periods of time. This stability allows for meaningful analysis of year-to-year variability and long-term trends, spanning more than 30 years in some regions.

The solar models integrate data from multiple satellite missions and from atmospheric and meteorological models, together with a high-resolution digital terrain model. All data streams are safeguarded by rigorous quality monitoring and harmonisation procedures, ensuring an uninterrupted real-time data supply.

Moreover, real-world validation is straightforward. Developers can compare any number of years of satellite-based model irradiance with data from on-site pyranometers, compute RMSE and bias metrics and objectively choose the superior model. There’s no need for guesswork or data manipulation, just science.
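
A minimal sketch of such a comparison is shown below, assuming hourly arrays of modelled and measured GHI; the names are placeholders, and a real validation would first apply strict quality control to the pyranometer record.

```python
import numpy as np

def validation_metrics(model_ghi, measured_ghi):
    """Bias and RMSE of modelled vs. ground-measured irradiance over concurrent,
    valid samples. Absolute values share the input units (e.g. W/m2); dividing by
    the mean measured value gives relative metrics in percent."""
    valid = ~np.isnan(model_ghi) & ~np.isnan(measured_ghi)
    diff = model_ghi[valid] - measured_ghi[valid]
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    mean_measured = measured_ghi[valid].mean()
    return {
        "bias": bias,
        "rmse": rmse,
        "rel_bias_pct": 100.0 * bias / mean_measured,
        "rel_rmse_pct": 100.0 * rmse / mean_measured,
    }
```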

Ultimately, an important advantage of selecting a single, consistent and validated data source for long-term financial evaluation is that the same data stream can be used for future performance monitoring and short-term forecasting.

What you should demand from your data provider:

  • Physics-based modelling: Models built on fundamental physical principles, not heuristics or legacy assumptions.
  • Transparent validation: Comprehensive benchmarking against high-quality on-site measurements.
  • High resolution: One- to 15-minute time series data better capture short-term variability and enable realistic PV system modelling.
  • Long-term coverage: Extensive archives of historical data, spanning the maximum possible timeframe, to support reliable P50/P90 values and trend analysis (see the sketch after this list).
  • Traceability: Every data point, model assumption and adjustment is clearly explained, independently verifiable and reproducible.
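
As a minimal illustration of the P50/P90 point above, the sketch below derives both values from a multi-year series of annual GHI totals, assuming interannual variability is approximately normal; this is a common simplification, and a full assessment would also fold in model and measurement uncertainty.

```python
import numpy as np

def p50_p90(annual_ghi):
    """P50 (median expectation) and P90 (value exceeded in 90% of years) of the
    long-term annual solar resource, assuming roughly normal interannual variability."""
    annual = np.asarray(annual_ghi, dtype=float)
    mean = annual.mean()
    std = annual.std(ddof=1)
    return mean, mean - 1.282 * std   # 1.282 = one-sided 90% exceedance quantile

# Hypothetical 30-year record of annual GHI in kWh/m2 (values are made up):
years = np.arange(1994, 2024)
annual_ghi = 1750.0 + 40.0 * np.sin(years / 3.0)
p50, p90 = p50_p90(annual_ghi)
```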

What you should avoid:

  • Mixing datasets from different providers.
  • Synthetic generation of hourly TMY from monthly averages.
  • Arbitrary regional preferences unsupported by physical validation.
  • Manual tweaking of data without scientific justification.

The future of solar energy lies not in approximations, averages or subjective adjustments, but in high fidelity to real-world physics. A single, well-calibrated dataset built on sound physical principles is a strategic advantage for any PV developer. As PV projects scale in size, complexity and financial scrutiny, the industry must retire the patchwork approach to resource modelling and prioritise physics.
