Introducing TZ-SAM: Solar Asset Mapper

A global solar asset dataset, powered by planetary-scale machine learning

Summary

TZ-SAM is an open-access, global, asset-level dataset of commercial- and utility-scale solar energy facilities

A collaboration between TransitionZero and Global Energy Monitor, the dataset contains ~26,000 square kilometres of solar farms across 200 countries, with a total estimated capacity of 1,055 GW, and covers three times as many smaller facilities as incumbent datasets

Through earth observation data and machine learning, we can identify small- and medium-sized solar assets at scale, while also estimating generation capacity and the date of construction

The rapid and decentralised growth of solar makes forecasting difficult. TZ-SAM will support more effective forecasting and planning by filling gaps in traditional methods of solar asset reporting

We plan to regularly update TZ-SAM, and will integrate it with Model Builder - our world-first, no-code user interface for modelling both grid capacity expansion and unit dispatch

Tracking solar growth is difficult

Solar power is the fastest-growing power generation technology in history. In 2023 alone, the world added around 380 GW of solar capacity. The International Energy Agency’s (IEA) net-zero scenario projects a substantial increase in global solar capacity, from 1,200 GW in 2023 to an estimated 4,800 GW by 2030.
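To put the IEA figures in perspective, the short calculation below works out the growth they imply. It is a rough sketch that assumes smooth compound growth between 2023 and 2030, which is our simplification and not part of the IEA scenario; the variable names are illustrative only.

```python
# Capacity figures quoted above from the IEA net-zero scenario (GW).
capacity_2023_gw = 1200.0
capacity_2030_gw = 4800.0
years = 2030 - 2023  # seven years of growth

# Assumption: smooth compound growth over the period (a simplification).
implied_annual_growth = (capacity_2030_gw / capacity_2023_gw) ** (1 / years) - 1

# Average capacity that would need to be added each year to reach the target.
mean_annual_additions_gw = (capacity_2030_gw - capacity_2023_gw) / years

print(f"Implied compound annual growth: {implied_annual_growth:.1%}")
print(f"Average annual additions: {mean_annual_additions_gw:.0f} GW")
```

Under that assumption, capacity would need to grow by roughly 22% a year, with average additions of about 514 GW a year, well above the roughly 380 GW added in 2023.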

Accurate and current facility-level data are crucial for managing intermittency, planning the grid, and identifying trade-offs with biodiversity, conservation, and land protection priorities that arise from the land-use and land-cover changes required for ongoing solar deployment. Currently available datasets of solar generating capacity do not fully meet these needs. Widely used statistics from the International Renewable Energy Agency (IRENA) are country-level, often dated and incomplete, and based on several different methodological approaches. This is problematic not only because solar is growing so quickly, but also because many solar facilities are small.

Moreover, traditional analysts struggle to keep up with the rapid pace and scale of this expansion. For instance, BloombergNEF, a highly regarded forecaster, released an updated solar installation forecast for 2023 that projects 413 GW of new capacity. This figure significantly revises a January 2022 forecast by the same analysts, which projected only 236 GW, about 43% short of the current expectation.
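The size of that revision follows directly from the two forecasts quoted above. The snippet below is a small check using only those two numbers; the function and variable names are ours.

```python
# Forecasts quoted above for new solar capacity in 2023 (GW).
earlier_forecast_gw = 236.0  # earlier BloombergNEF projection
updated_forecast_gw = 413.0  # updated BloombergNEF projection


def shortfall_fraction(earlier: float, updated: float) -> float:
    """Fraction by which the earlier forecast falls short of the updated one."""
    return (updated - earlier) / updated


shortfall = shortfall_fraction(earlier_forecast_gw, updated_forecast_gw)

# The same gap expressed as an upward revision of the earlier forecast.
upward_revision = updated_forecast_gw / earlier_forecast_gw - 1.0

print(f"Shortfall relative to the updated forecast: {shortfall:.1%}")
print(f"Upward revision relative to the earlier forecast: {upward_revision:.1%}")
```

Measured against the updated forecast, the earlier projection comes up just under 43% short; measured the other way, the forecast was revised upward by 75%.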

Earth observation and machine learning offer a potentially accurate and scalable way to fill the gaps left by traditional approaches to data collection, such as those used by IRENA, and so to address the operating and planning challenges facing the continued deployment of solar energy.

Our solution: TZ-SAM

We have developed algorithms using earth observation and machine learning to accurately identify the capacity, land area, and age of every large solar facility worldwide. Our Solar Asset Mapper, or TZ-SAM, is a global dataset of commercial- and utility-scale solar facilities, derived from satellite data and crafted through a combination of machine learning and human annotation.

Our methodology for identifying and analysing global solar farms builds on Kruitwagen, L., Story, K.T., Friedrich, J. et al. and is structured into three main phases. First, we construct a training set using known solar farms paired with satellite imagery, and then train a deep segmentation model to accurately identify the location and footprint of solar farms from these images. Once trained, this model is deployed across the Earth's land surface to create candidate solar farm polygons, which are manually refined to remove false positives.

The second phase, construction date estimation, involves applying our trained models to historical satellite imagery to determine the earliest visible date of each plant. The final phase focuses on capacity estimation. We build a training set of solar farm polygons with known capacities and develop a model that estimates capacity based on the farm's shape and geographic location. This model is then applied to our validated solar farm detections to estimate their capacities. View our full methodology documentation on Zenodo.

Our Q1 2025 dataset contains the location and shape of 102,862 assets, along with estimated capacities. We estimate the construction date for over 80% of these assets. The dataset contains around 26,353 square kilometres of solar farms across 200 countries, with a total estimated capacity of 1,055 GW. Analysis of the results found false positives to be 1%. See our methodology documentation for more information on the caveats and limitations.
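The headline totals also give a feel for the typical asset in the dataset. The sketch below derives dataset-wide averages from the three figures quoted above. These are simple ratios of the totals, not outputs of the capacity model described in our methodology, and the variable names are illustrative.

```python
# Headline totals quoted above for the Q1 2025 dataset.
total_capacity_gw = 1055.0
total_area_km2 = 26353.0
asset_count = 102862

total_capacity_mw = total_capacity_gw * 1000.0

# Dataset-wide averages (simple ratios of totals, not per-asset estimates).
mean_density_mw_per_km2 = total_capacity_mw / total_area_km2
mean_asset_capacity_mw = total_capacity_mw / asset_count
mean_asset_area_km2 = total_area_km2 / asset_count

print(f"Mean capacity density: {mean_density_mw_per_km2:.1f} MW per km2")
print(f"Mean asset capacity: {mean_asset_capacity_mw:.1f} MW")
print(f"Mean asset footprint: {mean_asset_area_km2:.2f} km2")
```

On these totals the average asset is about 10 MW on roughly a quarter of a square kilometre, or about 40 MW per square kilometre. Averages like these hide a wide spread between the largest plants and the many small ones.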

One of the distinguishing features of TZ-SAM is its coverage of a large number of small- and medium-sized assets. While it is feasible for traditional data providers to manually track the relatively small number of extremely large plants that contribute most to global capacity, this task becomes very difficult when dealing with the much larger number of smaller plants.

Top-down vs. bottom-up analysis

Globally comprehensive bottom-up datasets like TZ-SAM enable comparison and quality analysis against top-down country-level datasets such as IRENA's. Asset-level methods often show a bias toward larger facilities due to technical or practical reasons. Approximately 10% of the capacity², consisting of small commercial, industrial, and residential facilities, is typically underrepresented in bottom-up datasets. Top-down inventories, on the other hand, are derived from trade and original equipment manufacturer data for PV panels, and from aggregated permitting or regulatory data. These inventories capture existing facilities and often report greater overall capacity than asset-level datasets. Additionally, top-down methods might more accurately reflect improvements in panel efficiency over time, an aspect that may be overlooked in asset-level inventories.

Supporting more effective energy forecasting and planning

The rapid and decentralised growth of solar energy creates significant forecasting and planning challenges. TZ-SAM effectively addresses these issues through the following applications:

Solar analysts will be able to use TZ-SAM to refine their top-down forecasts of capacity.
Technical managers, planning engineers and trading analysts will be able to produce more accurate model results for operational and capital expenditure decisions.

TZ-SAM is on our product roadmap for integration into Scenario Builder, our software platform that offers users an end-to-end service for capacity expansion and unit dispatch modelling.

TransitionZero plans to release further updates to the TZ-SAM dataset in 2024. Use the download link below to access the data and to receive updates on the next release.

TZ-SAM data package includes

Solar farm locations and data dictionary in an Excel file format (3.9 MB)
Solar farm locations in a CSV file format (6.4 MB)
Solar farm polygons in a GeoPackage file format (31.5 MB)
Raw polygons of solar assets that make up a solar farm in a GeoPackage file format (440.6 MB)
A table for mapping raw polygons to their corresponding solar farms in a CSV file format (14.5 MB)
A summary PDF of documentation including data and files dictionary (1.8 MB)
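As an illustration of how the mapping table ties raw polygons to solar farms, here is a minimal sketch using only the Python standard library. The column names (raw_polygon_id, farm_id) and the sample rows are hypothetical stand-ins; the actual field names are defined in the data dictionary and may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the mapping CSV. The real column
# names are given in the data dictionary and may differ from these.
SAMPLE_MAPPING_CSV = """raw_polygon_id,farm_id
p1,farm_a
p2,farm_a
p3,farm_b
"""


def group_polygons_by_farm(csv_text: str) -> dict[str, list[str]]:
    """Collect the raw polygon IDs that belong to each solar farm."""
    farms: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        farms[row["farm_id"]].append(row["raw_polygon_id"])
    return dict(farms)


farm_to_polygons = group_polygons_by_farm(SAMPLE_MAPPING_CSV)
for farm_id, polygon_ids in farm_to_polygons.items():
    print(farm_id, len(polygon_ids), polygon_ids)
```

With the real files, the same grouping could be applied by reading the mapping CSV from disk in place of the inline sample.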

Access TZ-SAM