Data Repository Submission Quick Guides
Dryad — Submission Guide
Generalist repository for publishing research datasets (with DOIs and professional curation).
When to use Dryad
- No domain-specific repository (e.g., GEO/SRA/dbGaP) fits your data.
- You’re publishing raw and/or processed data underlying a manuscript (or meeting funder/journal requirements).
- You want FAIR sharing (findable, accessible, interoperable, reusable) with a persistent identifier and curation.
Step-by-Step Submission
1) Prepare your data
- Organize logically: folders, descriptive file names, consistent headers.
- Prefer open formats: CSV, TSV, TXT, XML, JSON (proprietary files accepted but less reusable).
- README & documentation: describe purpose, file structure, variable/abbreviation definitions; include protocols or supplementary docs.
- Restrictions: do not include protected health information (PHI), confidential data, or sensitive species locations (unless permitted/redacted).
- Language: provide all content in English.
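The format and README points above can be checked before upload with a few lines of Python. This is a minimal sketch, not a Dryad tool: the helper names `split_by_format` and `missing_readme` are hypothetical, the extension set simply mirrors the open formats listed above, and the file names are illustrative.

```python
from pathlib import Path

# Extensions mirror the open formats named in this guide; extend as needed.
OPEN_EXTENSIONS = {".csv", ".tsv", ".txt", ".xml", ".json"}


def split_by_format(filenames):
    """Return (open_format, other) lists based on file extension."""
    open_format, other = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()
        (open_format if suffix in OPEN_EXTENSIONS else other).append(name)
    return open_format, other


def missing_readme(filenames):
    """True if no file whose base name starts with 'readme' is present."""
    return not any(Path(n).stem.lower().startswith("readme") for n in filenames)


# Illustrative file list (hypothetical names).
files = ["README.txt", "data/counts.csv", "data/plate_layout.xlsx"]
open_format, other = split_by_format(files)
print("Open formats:", open_format)
print("Consider converting:", other)
print("README missing:", missing_readme(files))
```

Files in the second list are still accepted; the check only flags candidates for conversion to a more reusable format.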
2) Timing your submission
- Submit before, during, or after manuscript submission; many journals integrate directly with Dryad.
- Embargoes: generally limited; data are released at article publication (special cases: dissertations, press embargoes).
- Private for Peer Review: generate a secure private link for editors/reviewers; release upon acceptance.
3) Login & authentication
- ORCID iD required for submitters (can register during submission).
- NC State SSO: NC State affiliates can link their ORCID iD to their NC State credentials for simplified login. Learn more about ORCID at NC State.
4) Describe your dataset (metadata)
- Title: clear, descriptive dataset title.
- Abstract: purpose, methods, scope; include key context for reuse.
- Contributors: authors and contributors (add ORCID iDs where available).
- Keywords: improve discoverability.
- Funding & acknowledgments: include grants (e.g., P30ES025128).
- Related works: link to associated publications, preprints, or software repositories.
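A draft metadata record can be checked for completeness before it is entered in the submission form. The sketch below assumes a plain dictionary with hypothetical field names (`title`, `abstract`, `contributors`, `keywords`, `funding`); it does not reflect Dryad's actual form or API, and the contributor names and ORCID iD are placeholders.

```python
# Hypothetical field names that mirror the metadata items listed above.
REQUIRED_FIELDS = ("title", "abstract", "contributors", "keywords", "funding")


def missing_metadata(record):
    """List required fields that are absent or empty in a draft record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def contributors_without_orcid(record):
    """Names of contributors that have no ORCID iD recorded."""
    return [c["name"] for c in record.get("contributors", []) if not c.get("orcid")]


# Illustrative draft; names and the ORCID iD are placeholders.
draft = {
    "title": "Example dataset title",
    "abstract": "Purpose, methods, and scope of the dataset.",
    "contributors": [
        {"name": "A. Researcher", "orcid": "0000-0000-0000-0000"},
        {"name": "B. Collaborator"},
    ],
    "keywords": [],
    "funding": ["P30ES025128"],
}

print("Missing fields:", missing_metadata(draft))
print("No ORCID iD:", contributors_without_orcid(draft))
```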
5) Upload your files
- Sources: upload from computer or via URL (Google Drive links are not supported).
- Classify files: Data (raw/processed), Software (scripts/code/workflows), Supplemental Information (figures/tables/appendices).
- Tabular checks: CSV/TSV/XLS/XLSX files under 50 MB are automatically validated; fix flagged issues before finalizing.
- Large files: files over 10 GB are allowed but may increase handling and curation time.
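The two size thresholds above can be applied to a local file list ahead of time. This is a rough sketch under stated assumptions: `upload_notes` is a hypothetical helper, decimal units (1 MB = 10**6 bytes) are assumed because the guide does not say which convention applies, and the messages are illustrative rather than anything Dryad reports.

```python
import os

# Thresholds quoted in this guide; decimal units are an assumption.
TABULAR_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx"}
TABULAR_VALIDATION_LIMIT = 50 * 10**6   # 50 MB
LARGE_FILE_THRESHOLD = 10 * 10**9       # 10 GB


def upload_notes(name, size_bytes):
    """Return human-readable notes for one file (hypothetical helper)."""
    notes = []
    extension = os.path.splitext(name)[1].lower()
    if extension in TABULAR_EXTENSIONS and size_bytes < TABULAR_VALIDATION_LIMIT:
        notes.append("tabular file under 50 MB: expect automatic validation")
    if size_bytes > LARGE_FILE_THRESHOLD:
        notes.append("file over 10 GB: allow extra handling/curation time")
    return notes


def local_upload_notes(path):
    """Same check for a file on disk, using its actual size."""
    return upload_notes(path, os.path.getsize(path))


print(upload_notes("counts.csv", 2 * 10**6))
print(upload_notes("images.tar", 12 * 10**9))
```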
6) Review & submit
- Verify metadata, contributors, affiliations, and grant acknowledgments.
- Data Publishing Charge (DPC): normally required by Dryad, but waived for NC State affiliates (see benefits below).
- Submit; Dryad performs professional curation and assigns a DOI upon publication.
Checklist
Requirement | Notes |
---|---|
ORCID iD | Required for submitters |
Metadata | Title, abstract, keywords, contributors, funding |
README / Documentation | Describe file structure, variables, abbreviations, methods |
File formats | Prefer open formats (CSV/TSV/TXT/XML/JSON) |
Restrictions | No PHI, confidential, or sensitive data (unless policy-compliant) |
Peer-review privacy | Optional “Private for Peer Review” link |
Data fees | Waived for NC State authors |
NC State Benefits
- No publishing fee: The Dryad Data Publishing Charge is covered for NC State affiliates.
- Single sign-on: Use NC State credentials linked to your ORCID.
- Library support: NC State Libraries can review metadata, file prep, and submissions.
CHHE Notes
- Cite CHHE P30 (P30ES025128) and any other relevant grants in metadata/acknowledgments.
- If human-participant data are involved, confirm IRB approval and consider dbGaP (instead of or in addition to Dryad) before submitting.
- Questions on formats, licensing, or compliance? Contact CHHE Data Support.
Last updated: September 2025
Gene Expression Omnibus (GEO) — Submission Guide
GEO is an NCBI repository for functional genomics data (microarray and sequencing-based assays). This guide summarizes the two submission routes and key requirements for raw and processed data, including single-cell specifics.
When to use GEO
- Your project generated gene expression, epigenomics, or other functional genomics datasets.
- Your journal/funder requires a domain repository. GEO works with SRA for raw HTS reads and hosts processed/metadata files.
- You can keep the record private until publication and share reviewer access links.
Step-by-Step Submission
1) Choose the correct submission route
- High-throughput sequencing (HTS) (RNA-seq, miRNA-seq, ChIP-seq, ATAC-seq, RIP-seq, Hi-C, methyl-seq, bulk and single-cell): raw reads to SRA; processed data + metadata to GEO.
- Microarray & other non-HTS data: array raw files (e.g., CEL, IDAT), processed signal matrices, and metadata spreadsheets to GEO.
- Overview: GEO Submission page
2) Prepare your files (by data type)
HTS — Raw reads (to SRA)
- FASTQ files (paired reads as R1/R2; include index/barcode reads if applicable).
- File naming: unique, consistent sample identifiers; indicate lane/read (e.g., sampleA_S1_L001_R1.fastq.gz).
- Bulk vs. single-cell:
  - Bulk: submit demultiplexed per-sample FASTQs.
  - Single-cell (e.g., 10x Genomics, Drop-seq, inDrops): NCBI recommends keeping most data in the original multiplexed form when appropriate (retain index/cell barcodes). See HTS Raw data.
- Optional alignments: BAM/SAM files may be uploaded to SRA if desired, but they are not considered processed data for GEO display.
- Submit FASTQs via the SRA submission portal and record the assigned BioProject/SRA accessions to reference in your GEO metadata.
- Large file transfer: use NCBI’s FTP or Aspera-based methods for efficient uploads.
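The naming and pairing points above can be checked with a short script before upload. The sketch below is not an NCBI tool. It assumes paired-end data and a naming pattern modeled on the example sampleA_S1_L001_R1.fastq.gz (with an optional trailing `_001` chunk); the regular expression and the helper `unpaired_reads` are hypothetical and would need adjusting for other naming schemes or single-end runs.

```python
import re
from collections import defaultdict

# Pattern modeled on the example name in this guide; adjust for other schemes.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])(?:_\d{3})?\.fastq\.gz$"
)


def unpaired_reads(filenames):
    """Return ((sample, lane) keys missing R1 or R2, names that did not parse)."""
    reads = defaultdict(set)
    unparsed = []
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            unparsed.append(name)
            continue
        reads[(match["sample"], match["lane"])].add(match["read"])
    # A key is incomplete when exactly one of R1/R2 is present.
    incomplete = sorted(k for k, v in reads.items() if ("R1" in v) != ("R2" in v))
    return incomplete, unparsed


names = [
    "sampleA_S1_L001_R1.fastq.gz",
    "sampleA_S1_L001_R2.fastq.gz",
    "sampleB_S2_L001_R1.fastq.gz",  # R2 missing
    "notes.txt",
]
incomplete, unparsed = unpaired_reads(names)
print("Incomplete pairs:", incomplete)
print("Unrecognized names:", unparsed)
```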
HTS — Processed data (to GEO)
- Expression/quantification tables (counts, TPM/FPKM). For single-cell, provide gene × cell matrices and related annotation files.
- Peak calling outputs for ChIP-seq/ATAC-seq (BED, narrowPeak/broadPeak), plus optional coverage tracks (bigWig/bedGraph).
- Methylation summaries (e.g., beta values, coverage tables).
- Note: alignment files (BAM/SAM) are not accepted as processed data for GEO records. See HTS Processed data.
Single-cell specifics
- Raw reads: retain multiplexing and index/cell barcodes when recommended (e.g., 10x/Drop-seq/inDrops); submit to SRA.
- Processed outputs: include the standard 10x-style trio when applicable:
  - matrix.mtx (sparse counts matrix)
  - barcodes.tsv (cell barcodes)
  - features.tsv (genes/features)
- Document barcoding/UMI strategy, chemistry version, and demultiplexing in the README/methods.
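Before uploading a 10x-style trio, it can help to confirm that the three files agree with each other. The sketch below reads the size line of a Matrix Market file and compares it with the row counts of the two TSV files. It assumes the usual 10x layout (matrix rows are features, columns are barcodes) and uncompressed text; the function names and the tiny example contents are hypothetical.

```python
def mtx_shape(mtx_text):
    """Read (rows, cols, nonzeros) from the size line of a Matrix Market file."""
    for line in mtx_text.splitlines():
        # Header and comment lines start with '%'; the first other line is the size line.
        if line.startswith("%") or not line.strip():
            continue
        rows, cols, nonzeros = (int(value) for value in line.split())
        return rows, cols, nonzeros
    raise ValueError("no size line found")


def count_rows(tsv_text):
    """Count non-empty lines in a TSV file's text."""
    return sum(1 for line in tsv_text.splitlines() if line.strip())


def trio_is_consistent(mtx_text, features_text, barcodes_text):
    """True if matrix rows match features and matrix columns match barcodes."""
    rows, cols, _ = mtx_shape(mtx_text)
    return rows == count_rows(features_text) and cols == count_rows(barcodes_text)


# Tiny illustrative trio (made-up identifiers): 3 features x 2 cells.
mtx = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 4\n"
    "1 1 5\n2 1 1\n3 2 7\n1 2 2\n"
)
features = "GENE0001\tGeneA\nGENE0002\tGeneB\nGENE0003\tGeneC\n"
barcodes = "AAACCTGA-1\nAAACGGGT-1\n"

print(trio_is_consistent(mtx, features, barcodes))
```

For gzip-compressed files, the text can be read with Python's `gzip.open(path, "rt")` before passing it to these functions.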
Microarray & non-HTS
- Raw array files (e.g., CEL for Affymetrix, IDAT for Illumina) plus processed signal matrices.
- Platform annotations, sample-level metadata, and normalized result tables as required by GEO spreadsheets.
3) Complete metadata spreadsheets
- Download and complete the required Series and Sample spreadsheets:
  - HTS: HTS metadata
  - Microarray/non-HTS: Spreadsheet submission
- One spreadsheet per data type in your study (e.g., separate sheets for RNA-seq vs. ATAC-seq, if both are present).
- Include organism, tissue/cell type, treatment/condition, library strategy, platform, experimental design, and file associations (which samples map to which raw/processed files).
- If raw reads are already in SRA, include the corresponding BioProject/SRA accessions so GEO can link them.
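The file associations recorded in the spreadsheet can be cross-checked against the files staged for upload. This is a minimal sketch that assumes the sample-to-file mapping has already been read into a dictionary; the helper `check_file_associations` and the sample and file names are hypothetical and do not correspond to GEO's template column names.

```python
def check_file_associations(sample_files, staged_files):
    """Compare files referenced in metadata with files staged for upload.

    Returns (missing, orphaned): files referenced but not staged, and
    files staged but not referenced by any sample.
    """
    referenced = {name for files in sample_files.values() for name in files}
    staged = set(staged_files)
    missing = sorted(referenced - staged)
    orphaned = sorted(staged - referenced)
    return missing, orphaned


# Illustrative mapping (hypothetical sample and file names).
sample_files = {
    "sampleA": ["sampleA_counts.tsv", "sampleA_S1_L001_R1.fastq.gz"],
    "sampleB": ["sampleB_counts.tsv"],
}
staged_files = [
    "sampleA_counts.tsv",
    "sampleA_S1_L001_R1.fastq.gz",
    "sampleC_counts.tsv",
]

missing, orphaned = check_file_associations(sample_files, staged_files)
print("Referenced but not staged:", missing)
print("Staged but not referenced:", orphaned)
```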
4) Submit
- HTS: Submit raw FASTQs to SRA; upload processed files and spreadsheets to GEO. Ensure spreadsheets reference the correct SRA accessions.
- Microarray/non-HTS: Upload spreadsheets, raw array files (CEL/IDAT), processed matrices, and supporting docs to GEO.
- Expect a GSE series accession (and GSM sample accessions). Initial processing typically takes a few business days.
5) Privacy & peer review
- Submissions can remain private until manuscript acceptance; generate reviewer access links for editors/referees.
- Upon release, GEO will display processed data/metadata and link to SRA raw reads.
Checklist
Requirement | Notes |
---|---|
Raw data | HTS: FASTQ to SRA (paired reads + index if applicable); optional BAM/SAM to SRA. Arrays: CEL/IDAT to GEO. |
Processed data | Counts/TPM/FPKM matrices; peaks (BED/narrowPeak/broadPeak); bigWig/bedGraph coverage; methylation summaries; single-cell matrices and annotations. |
Metadata spreadsheets | Series/Sample spreadsheets (HTS or microarray templates); one per data type; include BioProject/SRA if available. |
Single-cell | Multiplexed FASTQs when recommended; include 10x-style matrix.mtx, barcodes.tsv, features.tsv and relevant annotations. |
Privacy | Keep private until publication; reviewer links supported. |
Standards | MINSEQE (sequencing) / MIAME (microarray). |
Transfer | Use SRA portal for FASTQs; large files via FTP/Aspera. Upload GEO files through the GEO submission interface. |
CHHE Notes
- Grant citations: Include CHHE P30 (P30ES025128) and any other relevant NIH grants.
- Human participants: GEO accepts de-identified data; controlled-access human data must go to dbGaP.
- Reproducibility: Attach analysis scripts/workflows (e.g., Snakemake/Nextflow, R scripts) as supplemental files and reference them in the README.
Useful Links
- GEO Submission Overview
- HTS Submission Guide (bulk & single-cell)
- HTS Metadata Requirements
- Microarray & Non-HTS Spreadsheet Submission
- SRA Submission Portal (raw FASTQ)
Last updated: October 2025