Xarray-SQL: Query Xarray Datasets with SQL

A lot of scientific data can be read with Xarray, a Python library for working with labeled n-dimensional arrays up to the petabyte scale. In Python, Xarray has become a de facto standard intermediate representation for n-dimensional array formats such as GeoTIFF, COG, NetCDF, GRIB, and Zarr. The xarray-sql library builds on this foundation and lets data practitioners use the Structured Query Language (SQL) to work with gridded raster data in Xarray. It also allows scientists and analysts to join tabular data (such as CSV, Excel, and Parquet) with raster data.
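
To make the idea of joining raster and tabular data concrete, here is a minimal sketch that uses only Python's standard-library sqlite3 module. It is not how xarray-sql works internally and does not use its API; the toy grid, the `stations` table, and the `grid_to_rows` helper are all hypothetical names invented for illustration. It shows only the logical view: each grid cell becomes one row, after which an ordinary SQL join applies.

```python
import sqlite3
from itertools import product

# Hypothetical toy grid: 2 times x 2 lats x 2 lons of air temperature (K).
times = ["2013-01-01", "2013-01-02"]
lats = [75.0, 72.5]
lons = [205.0, 207.5]
air = [
    [[241.0, 242.0], [243.0, 244.0]],  # first time step, indexed [lat][lon]
    [[245.0, 246.0], [247.0, 248.0]],  # second time step
]


def grid_to_rows(times, lats, lons, values):
    """Unravel a (time, lat, lon) grid into one (time, lat, lon, value) row per cell."""
    for (ti, t), (yi, lat), (xi, lon) in product(
        enumerate(times), enumerate(lats), enumerate(lons)
    ):
        yield (t, lat, lon, values[ti][yi][xi])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air (time TEXT, lat REAL, lon REAL, air REAL)")
conn.executemany(
    "INSERT INTO air VALUES (?, ?, ?, ?)", grid_to_rows(times, lats, lons, air)
)

# Hypothetical tabular data, standing in for a CSV, Excel, or Parquet file.
conn.execute("CREATE TABLE stations (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?, ?)",
    [("alpha", 75.0, 205.0), ("beta", 72.5, 207.5)],
)

# Join the tabular rows to the raster cells that share their coordinates.
station_means = conn.execute(
    """
    SELECT s.name, AVG(a.air) AS air_avg
    FROM stations AS s
    JOIN air AS a ON a.lat = s.lat AND a.lon = s.lon
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(station_means)
```

The exact-equality join works here only because the station coordinates were chosen to sit exactly on grid points; real station data would likely need a nearest-neighbor or range condition instead.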

Data and Resources

This dataset has no data

Additional Info

Field Value
Last Updated April 6, 2026, 18:51 (UTC)
Created April 6, 2026, 18:51 (UTC)
appAccessMethod Python package published on PyPI
appAudience Data Scientists and Programmers
appPrereqs Needs basic familiarity with Python, SQL, and Xarray.
appSummary <p>Xarray-SQL lets one treat Xarray Datasets as if they were SQL tables. This allows data practitioners to join gridded rasters and traditional tabular datasets together.</p><p><br></p><pre class="ql-syntax" spellcheck="false">pip install xarray-sql
</pre><p><br></p><pre class="ql-syntax" spellcheck="false">import xarray as xr
import xarray_sql as xql

ds = xr.tutorial.open_dataset('air_temperature')

# The same as a dask-sql Context; i.e. an Apache DataFusion Context.
ctx = xql.XarrayContext()
ctx.from_dataset('air', ds, chunks=dict(time=24))  # the dataset needs to be chunked!

# data is only materialized when we make a query.
result = ctx.sql('''
  SELECT
    "lat", "lon", AVG("air") as air_avg
  FROM
    "air"
  GROUP BY
    "lat", "lon"
''')
# DataFrame()
# +------+-------+--------------------+
# | lat  | lon   | air_avg            |
# +------+-------+--------------------+
# | 75.0 | 205.0 | 259.88662671232834 |
# | 75.0 | 207.5 | 259.48268150684896 |
# | 75.0 | 230.0 | 258.9192123287667  |
# | 75.0 | 275.0 | 257.07574315068456 |
# | 75.0 | 322.5 | 250.11792123287654 |
# | 75.0 | 325.0 | 250.81590068493134 |
# | 72.5 | 205.0 | 262.74933904109537 |
# | 72.5 | 207.5 | 262.5384315068488  |
# | 72.5 | 230.0 | 260.82879452054743 |
# | 72.5 | 275.0 | 257.3063321917804  |
# +------+-------+--------------------+
# Data truncated.
</pre><p><br></p>
appUrls ["https://colab.research.google.com/drive/1JAzzsmOvf5LsOz2EOFzKL6M7MtQp9s20"]
appVideos ["https://www.youtube.com/watch?v=AtB_6c-GcJE"]
creationMethod <p>Open source software that is community maintained.</p>
creatorEmail al@merose.com
creatorName Alexander Merose
creatorWebsite https://alex.merose.com
dataAuthType public
dataType Software
docsURL https://alxmrs.github.io/xarray-sql/
issueDate 2023-09-30
lang en
lastUpdateDate 2026-03-29
license other
ndp_creator_md5 6bac2b7876195a6842b7db2d3585796b
otherLicense Apache 2.0
pocEmail al@merose.com
pocName Alexander Merose
pocWebsite https://alex.merose.com
publisherEmail
publisherName PyPI
publisherWebsite https://pypi.org/project/xarray_sql/
purpose <p>Without SQL, making sense of multiple raster datasets alongside tabular datasets means manually planning each operation and diving deep into the physical structures of disparate data representations. SQL instead provides a high-level logical view of all the data. If you can declare in SQL how datasets relate to one another, then you will almost certainly be able to perform the query. This property is missing from the typical way of working with data in the Python ecosystem, where one has to "manually" manipulate dataframes according to their physical layout (in effect, "hand-written" query plans). Xarray-SQL lets users work with gridded rasters while thinking of them as tables, which we argue is a more accessible way to work with data.</p>
status submitted
theme []
updateFreq Monthly
uploadType application
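
The contrast drawn in the purpose field, between a "hand-written" query plan and a declarative SQL statement, can be illustrated with a small sketch. It uses only the Python standard library (sqlite3) rather than xarray-sql, and the toy data and variable names are hypothetical. The first half computes a per-cell time average by walking the array layout directly; the second half states the same result as a GROUP BY over rows.

```python
import sqlite3

# Hypothetical toy data laid out as air[time][lat][lon], in kelvin.
lats = [75.0, 72.5]
lons = [205.0, 207.5]
air = [
    [[241.0, 242.0], [243.0, 244.0]],
    [[245.0, 246.0], [247.0, 248.0]],
]

# "Hand-written plan": the loops must know that time is the outermost axis.
by_hand = {}
for yi, lat in enumerate(lats):
    for xi, lon in enumerate(lons):
        cells = [air[ti][yi][xi] for ti in range(len(air))]
        by_hand[(lat, lon)] = sum(cells) / len(cells)

# Declarative version: load the same cells as rows, then state the result wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air (time INTEGER, lat REAL, lon REAL, air REAL)")
conn.executemany(
    "INSERT INTO air VALUES (?, ?, ?, ?)",
    [
        (ti, lat, lon, air[ti][yi][xi])
        for ti in range(len(air))
        for yi, lat in enumerate(lats)
        for xi, lon in enumerate(lons)
    ],
)
by_sql = {
    (lat, lon): avg
    for lat, lon, avg in conn.execute(
        'SELECT "lat", "lon", AVG("air") FROM "air" GROUP BY "lat", "lon"'
    )
}

print(by_hand == by_sql)
```

If the array were stored with a different axis order, the loop version would need rewriting, while the SQL text would stay the same; that is the sense in which the query describes the logical result rather than the physical layout.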