DESI Research and Beyond

Galaxy evolution, quasar physics, and ML-driven spectral analysis on the largest spectroscopic survey to date

Research Focus

Our research draws on large public spectroscopic survey data, primarily Data Release 1 (DR1) of DESI, the largest spectroscopic survey to date, to investigate fundamental questions about galaxy evolution and quasar physics. We build Analysis-Ready Datasets (ARDs) that transform raw survey data into enriched, science-ready products, then apply those ARDs to targeted research questions.

Active Research Areas

Environmental Quenching in Cosmic Voids

Cosmic voids are vast underdense regions, the "bubbles" between filaments of the cosmic web. Galaxies in voids experience minimal environmental interactions, which makes them ideal laboratories for studying intrinsic evolution. We compare void galaxies to wall galaxies to disentangle "nature" (mass-driven) from "nurture" (environment-driven) quenching mechanisms.
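One way to make the void-versus-wall comparison concrete is to compare quenched fractions inside matched stellar-mass bins, so that a mass-driven trend is not mistaken for an environmental one. The sketch below is illustrative only: the record fields (`env`, `log_mstar`, `log_ssfr`), the bin edges, and the specific star formation rate threshold are assumptions made for the example, not values taken from our catalogs.

```python
from collections import defaultdict

# Hypothetical threshold on log10(sSFR / yr^-1) below which a galaxy counts as quenched.
LOG_SSFR_QUENCHED = -11.0


def quenched_fraction_by_mass(galaxies, bin_edges):
    """Return {(env, bin_index): quenched fraction} for every populated bin."""
    counts = defaultdict(lambda: [0, 0])  # [n_quenched, n_total]
    for g in galaxies:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= g["log_mstar"] < bin_edges[i + 1]:
                key = (g["env"], i)
                counts[key][1] += 1
                if g["log_ssfr"] < LOG_SSFR_QUENCHED:
                    counts[key][0] += 1
                break
    return {key: q / n for key, (q, n) in counts.items()}


# Toy records standing in for ARD rows (made-up numbers).
sample = [
    {"env": "void", "log_mstar": 9.5, "log_ssfr": -10.2},
    {"env": "void", "log_mstar": 9.7, "log_ssfr": -11.5},
    {"env": "wall", "log_mstar": 9.6, "log_ssfr": -11.8},
    {"env": "wall", "log_mstar": 9.8, "log_ssfr": -12.0},
    {"env": "wall", "log_mstar": 10.4, "log_ssfr": -10.0},
]
fractions = quenched_fraction_by_mass(sample, bin_edges=[9.0, 10.0, 11.0])
for (env, i), frac in sorted(fractions.items()):
    print(f"{env} galaxies, mass bin {i}: quenched fraction {frac:.2f}")
```

At fixed mass, a difference between the void and wall fractions points toward environment-driven quenching, while a trend across mass bins within one environment points toward mass-driven quenching.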

AGN Feedback and Outflow Energetics

Quasar-driven outflows may regulate galaxy growth through AGN feedback. We use semi-automated spectral fitting and Cloudy photoionization modeling to measure outflow properties at scale, producing distances, mass outflow rates, and kinetic luminosities. The goal is the first comprehensive catalog of quasar outflow energetics.
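To illustrate how these three quantities relate, the sketch below uses expressions that are common in the outflow literature: the distance follows from the definition of the ionization parameter, and the mass outflow rate assumes a thin, partially filled shell. Whether our pipeline uses exactly these forms is not stated here. The covering fraction `omega`, the mean atomic mass `MU`, and all input values are assumptions chosen for illustration.

```python
import math

C_CM_S = 2.998e10          # speed of light [cm/s]
M_PROTON_G = 1.6726e-24    # proton mass [g]
MU = 1.4                   # mean atomic mass per proton (assumed)


def outflow_distance_cm(q_h, n_h, u_h):
    """Distance R from U_H = Q_H / (4 pi R^2 n_H c)."""
    return math.sqrt(q_h / (4.0 * math.pi * C_CM_S * n_h * u_h))


def mass_outflow_rate_g_s(r_cm, n_col, v_cm_s, omega=0.2):
    """Thin partial shell: Mdot = 4 pi Omega R N_H mu m_p v."""
    return 4.0 * math.pi * omega * r_cm * n_col * MU * M_PROTON_G * v_cm_s


def kinetic_luminosity_erg_s(mdot_g_s, v_cm_s):
    """Kinetic luminosity: E_k = 0.5 * Mdot * v^2."""
    return 0.5 * mdot_g_s * v_cm_s ** 2


# Made-up inputs: ionizing photon rate [1/s], hydrogen number density [cm^-3],
# ionization parameter, hydrogen column density [cm^-2], outflow velocity [cm/s].
r_cm = outflow_distance_cm(q_h=1e57, n_h=1e4, u_h=1e-2)
mdot = mass_outflow_rate_g_s(r_cm, n_col=1e21, v_cm_s=5e8)
e_kin = kinetic_luminosity_erg_s(mdot, 5e8)
print(f"R = {r_cm:.3e} cm, Mdot = {mdot:.3e} g/s, E_k = {e_kin:.3e} erg/s")
```

In this picture the photoionization model supplies the ionization parameter and column density, and the spectral fit supplies the velocity and the density diagnostic.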

ML-Driven Anomaly Detection

With 1.6 million quasar spectra, systematic outlier detection reveals rare objects that manual inspection would miss. We use autoencoder architectures to identify statistical anomalies that may represent unusual accretion physics, rare evolutionary phases, or potentially new source types.
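The core idea, scoring each spectrum by how poorly a compressed representation reconstructs it, can be sketched in a few lines. As a deliberately simplified stand-in for a neural autoencoder, the example below fits a linear encoder and decoder with an SVD (equivalent to PCA). The toy spectra, the latent size, and the function names are all assumptions for illustration and do not describe our production architecture.

```python
import numpy as np


def fit_linear_autoencoder(spectra, n_latent):
    """Fit a linear encoder/decoder via SVD; returns the mean and basis rows."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_latent]  # basis shape: (n_latent, n_pixels)


def anomaly_scores(spectra, mean, components):
    """Mean squared reconstruction error per spectrum."""
    centered = spectra - mean
    latent = centered @ components.T   # encode
    recon = latent @ components        # decode
    return np.mean((centered - recon) ** 2, axis=1)


rng = np.random.default_rng(0)
wave = np.linspace(0.0, 1.0, 200)
# 500 "ordinary" toy spectra: random mixes of two smooth templates plus noise.
templates = np.stack([np.sin(2 * np.pi * wave), np.cos(2 * np.pi * wave)])
normal = rng.normal(size=(500, 2)) @ templates + 0.05 * rng.normal(size=(500, 200))
# One outlier carrying a narrow feature the smooth templates cannot reproduce.
odd = normal[0].copy()
odd[95:105] += 3.0
spectra = np.vstack([normal, odd])

mean, comps = fit_linear_autoencoder(spectra, n_latent=2)
scores = anomaly_scores(spectra, mean, comps)
top = int(np.argmax(scores))
print("highest-scoring spectrum index:", top)
```

A nonlinear autoencoder replaces the two matrix products with learned networks, but the ranking step is the same: sort by reconstruction error and inspect the tail.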

Value-Added Catalogs

Our ARD integrates nine DESI DR1 Value-Added Catalogs (VACs):

Category   VAC                    Purpose
Galaxy     FastSpecFit            Stellar continuum + emission lines
Galaxy     PROVABGS               Bayesian SED fitting with posteriors
Galaxy     DESIVAST               Cosmic void classifications (4 algorithms)
Galaxy     Gfinder                Halo-based group catalog
QSO        AGN/QSO                Spectral and IR classification
QSO        CIV Absorber           Intervening CIV systems
QSO        MgII Absorber          Intervening MgII systems
QSO        QMassIron              Black hole masses
QSO        Stellar Mass/EmLine    CIGALE masses and emission line properties

Methodology

We follow a three-layer enrichment model:

  1. Foundation Layer: Unified catalog with cross-match linkage, environmental classifications, quality flags
  2. Physics Layer: Derived physical quantities: Lick indices, pPXF kinematics, SED posteriors, outflow energetics
  3. AI / Embeddings Layer: Neural spectral embeddings, similarity metrics, anomaly scores

PostgreSQL serves as the materialization engine where VAC joins and derived computations occur. Final ARD products are exported to Parquet for distribution and analysis. The pipeline currently manages approximately 32 GB of catalog data in PostgreSQL and 108 GB of spectral tiles in Parquet.
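The join pattern behind the Foundation Layer can be sketched as follows. This is a minimal illustration, not our schema: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server, and the table names, column names, and values are hypothetical. It assumes each VAC can be keyed on a shared `targetid` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL instance
conn.executescript("""
CREATE TABLE base (targetid INTEGER PRIMARY KEY, z REAL);
CREATE TABLE vac_fastspecfit (targetid INTEGER PRIMARY KEY, halpha_flux REAL);
CREATE TABLE vac_desivast (targetid INTEGER PRIMARY KEY, in_void INTEGER);

INSERT INTO base VALUES (1, 0.10), (2, 0.21), (3, 0.35);
INSERT INTO vac_fastspecfit VALUES (1, 12.5), (3, 4.1);
INSERT INTO vac_desivast VALUES (1, 1), (2, 0);

-- Materialize one row per target. LEFT JOINs keep targets that a VAC lacks,
-- and a flag column records whether the match exists.
CREATE TABLE ard_foundation AS
SELECT b.targetid,
       b.z,
       f.halpha_flux,
       v.in_void,
       (f.targetid IS NOT NULL) AS has_fastspecfit
FROM base AS b
LEFT JOIN vac_fastspecfit AS f ON f.targetid = b.targetid
LEFT JOIN vac_desivast AS v ON v.targetid = b.targetid;
""")

rows = conn.execute("SELECT * FROM ard_foundation ORDER BY targetid").fetchall()
for row in rows:
    print(row)
```

Materializing the joined table once, rather than joining at query time, keeps downstream layers reading from a single wide table. The Parquet export step is not shown here.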

Upcoming Work

As DESI research matures, two newer initiatives are taking shape:

DESI-LSST Transient Anomalies

A Fink community alert broker science module for DESI-contextualized anomaly detection on the Rubin/LSST alert stream.

COSMOS-Web Anomalies

Systematic anomaly detection on COSMOS-Web DR1 imaging, exploiting tension between independent measurements to surface candidates for follow-up.