Data Repository Submission Quick Guides

Dryad — Submission Guide

Generalist repository for publishing research datasets (with DOIs and professional curation).

When to use Dryad

  • No domain-specific repository (e.g., GEO/SRA/dbGaP) fits your data.
  • You’re publishing raw and/or processed data underlying a manuscript (or meeting funder/journal requirements).
  • You want FAIR sharing (findable, accessible, interoperable, reusable) with a persistent identifier and curation.

Step-by-Step Submission

1) Prepare your data

  • Organize logically: folders, descriptive file names, consistent headers.
  • Prefer open formats: CSV, TSV, TXT, XML, JSON (proprietary files accepted but less reusable).
  • README & documentation: describe purpose, file structure, variable/abbreviation definitions; include protocols or supplementary docs.
  • Restrictions: do not include protected health information (PHI), confidential data, or sensitive species locations (unless permitted or redacted).
  • Language: provide all content in English.
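A quick pre-submission pass over the file list can catch a missing README or non-open formats early. The following is a minimal sketch, not a Dryad tool: the function name `review_file_list` is hypothetical, and the set of "open" extensions is simply the list given above.

```python
from pathlib import PurePosixPath

# Open formats named in this guide; anything else is only flagged, not rejected.
OPEN_EXTENSIONS = {".csv", ".tsv", ".txt", ".xml", ".json"}


def review_file_list(paths):
    """Report README presence and files outside the open-format list."""
    names = [PurePosixPath(p) for p in paths]
    has_readme = any(n.name.lower().startswith("readme") for n in names)
    non_open = sorted(
        str(n)
        for n in names
        if n.suffix.lower() not in OPEN_EXTENSIONS
        and not n.name.lower().startswith("readme")
    )
    return {"has_readme": has_readme, "non_open_formats": non_open}


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(review_file_list(["README.md", "data/counts.csv", "data/plate1.xlsx"]))
```

Flagged files are still accepted by Dryad; the report only suggests where an open-format copy would improve reuse.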

2) Timing your submission

  • Submit before, during, or after manuscript submission; many journals integrate directly with Dryad.
  • Embargoes: generally limited; data are released at article publication (special cases: dissertations, press embargoes).
  • Private for Peer Review: generate a secure private link for editors/reviewers; release upon acceptance.

3) Login & authentication

  • ORCID iD required for submitters (can register during submission).
  • NC State SSO: NC State affiliates can link their ORCID iD to their NC State credentials for simplified login. Learn more: ORCID at NC State.

4) Describe your dataset (metadata)

  • Title: clear, descriptive dataset title.
  • Abstract: purpose, methods, scope; include key context for reuse.
  • Contributors: authors and contributors (add ORCID iDs where available).
  • Keywords: improve discoverability.
  • Funding & acknowledgments: include grants (e.g., P30ES025128).
  • Related works: link to associated publications, preprints, or software repositories.
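Drafting the metadata offline before opening the submission form makes gaps easier to spot. The sketch below assumes a plain dictionary whose keys mirror the fields listed above; the key names and helper functions are hypothetical and do not correspond to Dryad's own schema or API.

```python
# Fields from the list above that should not be left empty (assumed key names).
CORE_FIELDS = ("title", "abstract", "contributors", "keywords", "funding")


def missing_fields(record):
    """Return core fields that are absent or empty in the draft record."""
    return [name for name in CORE_FIELDS if not record.get(name)]


def contributors_without_orcid(record):
    """Return names of contributors that have no ORCID iD recorded."""
    return [c["name"] for c in record.get("contributors", []) if not c.get("orcid")]


# Placeholder values for illustration only.
draft = {
    "title": "Example dataset title",
    "abstract": "",
    "contributors": [
        {"name": "A. Author", "orcid": "0000-0002-1825-0097"},  # ORCID's sample iD
        {"name": "B. Author"},
    ],
    "keywords": ["example"],
    "funding": ["P30ES025128"],
    "related_works": [],
}

print(missing_fields(draft))              # ['abstract']
print(contributors_without_orcid(draft))  # ['B. Author']
```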

5) Upload your files

  • Sources: upload from computer or via URL (Google Drive links are not supported).
  • Classify files: Data (raw/processed), Software (scripts/code/workflows), Supplemental Information (figures/tables/appendices).
  • Tabular checks: tabular files (CSV/TSV/XLS/XLSX) under 50 MB are automatically validated; fix flagged issues before finalizing.
  • Large files: files larger than 10 GB are allowed but may increase handling and curation time.
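The two size thresholds above can be checked before uploading. This is a rough sketch under stated assumptions: the function `triage` is hypothetical, and the limits are interpreted as decimal megabytes and gigabytes, which the guide does not specify.

```python
TABULAR_EXTENSIONS = (".csv", ".tsv", ".xls", ".xlsx")
TABULAR_VALIDATION_LIMIT = 50 * 10**6  # 50 MB, assuming decimal units
LARGE_FILE_THRESHOLD = 10 * 10**9      # 10 GB, assuming decimal units


def triage(name, size_bytes):
    """Return notes on how a file of this name and size is likely to be handled."""
    notes = []
    is_tabular = name.lower().endswith(TABULAR_EXTENSIONS)
    if is_tabular and size_bytes < TABULAR_VALIDATION_LIMIT:
        notes.append("tabular file under 50 MB: expect automatic validation")
    if size_bytes > LARGE_FILE_THRESHOLD:
        notes.append("over 10 GB: may increase handling/curation time")
    return notes
```

With real files, the size can be taken from `os.path.getsize(path)`.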

6) Review & submit

  • Verify metadata, contributors, affiliations, and grant acknowledgments.
  • Data Publishing Charge (DPC): normally required by Dryad, but waived for NC State affiliates (see benefits below).
  • Submit; Dryad performs professional curation and assigns a DOI upon publication.

Checklist

Requirement | Notes
ORCID iD | Required for submitters
Metadata | Title, abstract, keywords, contributors, funding
README / Documentation | Describe file structure, variables, abbreviations, methods
File formats | Prefer open formats (CSV/TSV/TXT/XML/JSON)
Restrictions | No PHI, confidential, or sensitive data (unless policy-compliant)
Peer-review privacy | Optional “Private for Peer Review” link
Data fees | Waived for NC State authors

NC State Benefits

  • No publishing fee: The Dryad Data Publishing Charge is covered for NC State affiliates.
  • Single sign-on: Use NC State credentials linked to your ORCID.
  • Library support: NC State Libraries can review metadata, file prep, and submissions.

NC State Library Dryad Guide

CHHE Notes

  • Cite CHHE P30 (P30ES025128) and any other relevant grants in metadata/acknowledgments.
  • If human-participant data are involved, confirm IRB approval and consider dbGaP (instead of or in addition to Dryad) before submitting.
  • Questions on formats, licensing, or compliance? Contact CHHE Data Support.

Last updated: September 2025

Gene Expression Omnibus (GEO) — Submission Guide

GEO is an NCBI repository for functional genomics data (microarray and sequencing-based assays). This guide summarizes the two submission routes and key requirements for raw and processed data, including single-cell specifics.

When to use GEO

  • Your project generated gene expression, epigenomics, or other functional genomics datasets.
  • Your journal/funder requires a domain repository. GEO works with SRA, which stores raw high-throughput sequencing (HTS) reads, while GEO itself hosts processed data and metadata files.
  • You can keep the record private until publication and share reviewer access links.

Step-by-Step Submission

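1) Choose the correct submission route

  • HTS (sequencing-based) data: raw reads go to SRA; processed files and metadata spreadsheets go to GEO.
  • Microarray and other non-HTS data: raw files, processed matrices, and metadata spreadsheets all go to GEO.
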
2) Prepare your files (by data type)

HTS — Raw reads (to SRA)
  • FASTQ files (paired reads as R1/R2; include index/barcode reads if applicable).
  • File naming: unique, consistent sample identifiers; indicate lane/read (e.g., sampleA_S1_L001_R1.fastq.gz).
  • Bulk vs. single-cell:
    • Bulk: submit demultiplexed per-sample FASTQs.
    • Single-cell (e.g., 10x Genomics, Drop-seq, inDrops): NCBI recommends keeping most data in the original multiplexed form when appropriate (retain index/cell barcodes). See HTS Raw data.
  • Optional alignments: BAM/SAM files may be uploaded to SRA if desired, but they are not considered processed data for GEO display.
  • Submit FASTQs via the SRA submission portal and record the assigned BioProject/SRA accessions to reference in your GEO metadata.
  • Large file transfer: use NCBI’s FTP or Aspera-based methods for efficient uploads.
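For paired-end runs, it can help to confirm that every R1 file has a matching R2 before starting the SRA upload. The sketch below assumes file names follow the pattern of the example above (sample, `_S<n>`, `_L<lane>`, `_R1`/`_R2`, with an optional `_001` suffix); the function `check_pairs` is hypothetical, and single-end data would be reported as unpaired.

```python
import re
from collections import defaultdict

# Pattern modeled on the example name sampleA_S1_L001_R1.fastq.gz (assumption).
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])(?:_001)?\.fastq\.gz$"
)


def check_pairs(filenames):
    """Return ((sample, lane) groups lacking R1 or R2, names that did not parse)."""
    reads = defaultdict(set)
    unparsed = []
    for name in filenames:
        m = FASTQ_NAME.match(name)
        if m is None:
            unparsed.append(name)
            continue
        reads[(m["sample"], m["lane"])].add(m["read"])
    unpaired = sorted(
        key for key, seen in reads.items() if ("R1" in seen) != ("R2" in seen)
    )
    return unpaired, unparsed
```

Index reads (`I1`/`I2`) are parsed but do not affect the pairing check.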
HTS — Processed data (to GEO)
  • Expression/quantification tables (counts, TPM/FPKM). For single-cell, provide gene × cell matrices and related annotation files.
  • Peak calling outputs for ChIP-seq/ATAC-seq (BED, narrowPeak/broadPeak), plus optional coverage tracks (bigWig/bedGraph).
  • Methylation summaries (e.g., beta values, coverage tables).
  • Note: alignment files (BAM/SAM) are not accepted as processed data for GEO records. See HTS Processed data.
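Because alignment files do not count as processed data, it is worth separating them out of the GEO upload set. A minimal sketch, assuming files are identified by extension only; the function name `split_processed_candidates` is hypothetical.

```python
# Alignment formats that GEO does not accept as processed data (per the note above).
ALIGNMENT_SUFFIXES = (".bam", ".sam")


def split_processed_candidates(filenames):
    """Split names into (candidate processed files, alignment files)."""
    keep, alignments = [], []
    for name in filenames:
        base = name.lower()
        if base.endswith(".gz"):  # look through gzip compression
            base = base[:-3]
        if base.endswith(ALIGNMENT_SUFFIXES):
            alignments.append(name)
        else:
            keep.append(name)
    return keep, alignments
```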
Single-cell specifics
  • Raw reads: retain multiplexing and index/cell barcodes when recommended (e.g., 10x/Drop-seq/inDrops); submit to SRA.
  • Processed outputs: include the standard 10x-style trio when applicable:
    • matrix.mtx (sparse counts matrix)
    • barcodes.tsv (cell barcodes)
    • features.tsv (genes/features)
    Provide any additional cell/feature annotations (cluster IDs, metadata) used in downstream analyses.
  • Document barcoding/UMI strategy, chemistry version, and demultiplexing in the README/methods.
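A common slip with the matrix trio is a mismatch between the matrix dimensions and the number of features or barcodes. The sketch below reads the size line of a Matrix Market coordinate file and compares it with the line counts of the other two files. It assumes the usual 10x layout of features as rows and barcodes as columns, and that the `.tsv` files have no header row; all function names are hypothetical.

```python
def mtx_dimensions(lines):
    """Return (rows, cols, nonzeros) from Matrix Market coordinate text."""
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")
    for line in it:
        if line.startswith("%") or not line.strip():
            continue  # skip comment and blank lines before the size line
        rows, cols, nnz = (int(x) for x in line.split()[:3])
        return rows, cols, nnz
    raise ValueError("no size line found")


def count_records(lines):
    """Count non-blank lines (one feature or barcode per line)."""
    return sum(1 for line in lines if line.strip())


def trio_is_consistent(mtx_lines, feature_lines, barcode_lines):
    """True when rows match the feature count and columns match the barcode count."""
    rows, cols, _ = mtx_dimensions(mtx_lines)
    return rows == count_records(feature_lines) and cols == count_records(barcode_lines)


# Tiny in-memory example: 3 features x 2 cells.
mtx = ["%%MatrixMarket matrix coordinate integer general", "%", "3 2 2", "1 1 5", "3 2 1"]
features = ["g1\tGeneA\tGene Expression", "g2\tGeneB\tGene Expression", "g3\tGeneC\tGene Expression"]
barcodes = ["AAAC-1", "AAAG-1"]
print(trio_is_consistent(mtx, features, barcodes))  # True
```

For files on disk, pass open text handles instead of lists, for example `gzip.open(path, "rt")` for compressed files.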
Microarray & non-HTS
  • Raw array files (e.g., CEL for Affymetrix, IDAT for Illumina) plus processed signal matrices.
  • Platform annotations, sample-level metadata, and normalized result tables as required by GEO spreadsheets.

3) Complete metadata spreadsheets

  • Download and complete the required Series and Sample metadata spreadsheets.
  • Use one spreadsheet per data type in your study (e.g., separate sheets for RNA-seq vs. ATAC-seq, if both are present).
  • Include organism, tissue/cell type, treatment/condition, library strategy, platform, experimental design, and file associations (which samples map to which raw/processed files).
  • If raw reads are already in SRA, include the corresponding BioProject/SRA accessions so GEO can link them.
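File associations are easy to get wrong when samples and files are tracked separately. As a rough sketch, the check below compares the files referenced by each sample row against the files staged for upload. The row structure and the keys `raw_files` and `processed_files` are hypothetical and are not the column names of the GEO templates.

```python
def check_file_associations(sample_rows, uploaded_files):
    """Compare files referenced in sample rows with the files staged for upload."""
    referenced = set()
    for row in sample_rows:
        referenced.update(row.get("raw_files", []))
        referenced.update(row.get("processed_files", []))
    uploaded = set(uploaded_files)
    return {
        "missing_from_upload": sorted(referenced - uploaded),
        "not_referenced": sorted(uploaded - referenced),
    }


# Hypothetical sample rows and staged files, for illustration only.
rows = [
    {"sample": "sampleA", "raw_files": ["sampleA_R1.fastq.gz"], "processed_files": ["counts.tsv"]},
    {"sample": "sampleB", "raw_files": ["sampleB_R1.fastq.gz"], "processed_files": ["counts.tsv"]},
]
staged = ["sampleA_R1.fastq.gz", "counts.tsv", "extra.txt"]
print(check_file_associations(rows, staged))
```

Both lists should be empty before submission: every referenced file is staged, and every staged file belongs to at least one sample.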

4) Submit

  • HTS: Submit raw FASTQs to SRA; upload processed files and spreadsheets to GEO. Ensure spreadsheets reference the correct SRA accessions.
  • Microarray/non-HTS: Upload spreadsheets, raw array files (CEL/IDAT), processed matrices, and supporting docs to GEO.
  • Expect a GSE series accession (and GSM sample accessions). Initial processing typically takes a few business days.

5) Privacy & peer review

  • Submissions can remain private until manuscript acceptance; generate reviewer access links for editors/referees.
  • Upon release, GEO will display processed data/metadata and link to SRA raw reads.

Checklist

Requirement | Notes
Raw data | HTS: FASTQ to SRA (paired reads + index if applicable); optional BAM/SAM to SRA. Arrays: CEL/IDAT to GEO.
Processed data | Counts/TPM/FPKM matrices; peaks (BED/narrowPeak/broadPeak); bigWig/bedGraph coverage; methylation summaries; single-cell matrices and annotations.
Metadata spreadsheets | Series/Sample spreadsheets (HTS or microarray templates); one per data type; include BioProject/SRA if available.
Single-cell | Multiplexed FASTQs when recommended; include 10x-style matrix.mtx, barcodes.tsv, features.tsv and relevant annotations.
Privacy | Keep private until publication; reviewer links supported.
Standards | MINSEQE (sequencing) / MIAME (microarray).
Transfer | Use SRA portal for FASTQs; large files via FTP/Aspera. Upload GEO files through the GEO submission interface.

CHHE Notes

  • Grant citations: Include CHHE P30 (P30ES025128) and any other relevant NIH grants.
  • Human participants: GEO accepts de-identified data; controlled-access human data must go to dbGaP.
  • Reproducibility: Attach analysis scripts/workflows (e.g., Snakemake/Nextflow, R scripts) as supplemental files and reference them in the README.

Last updated: October 2025