
A human atlas of chromatin accessibility during development

To generate a human cell atlas of chromatin accessibility using tissues obtained during development, we devised an improved assay for single cell profiling of chromatin accessibility based on three-level combinatorial indexing (sci-ATAC-seq3). We applied this method to 53 fetal tissue samples representing 15 organs, altogether profiling on the order of one million single cells. We leveraged cell types defined by gene expression in the same organs to annotate these data, and then built a catalog of hundreds-of-thousands of candidate gene regulatory elements exhibiting cell type-specific accessibility. This resource allows, for example, the identification of lineage-specific transcription factors, prediction of cis-regulatory interactions based on co-accessibility or calculation of cell type-specific enrichments of complex trait heritability. Together with the companion human cell atlas of gene expression during development, these data comprise a rich resource for the exploration of human biology.




Tissue Data - Seurat Objects

Each Seurat object contains both the gene-by-cell matrix used for cell type annotation (RNA assay) and the peak-by-cell matrix used for downstream analyses (peaks assay). Cell metadata includes:
nCount_RNA:number of reads in gene bodies + 2 kb upstream of TSS
nFeature_RNA:number of gene bodies + 2 kb upstream of TSS that contain at least one read
cell:The cell barcode id
sample_name:The name of the sample
donor_id:The ID of the donor
tissue:tissue type
batch:experimental batch
total:total number of reads
total_deduplicated:total number of deduplicated reads
total_deduplicated_peaks:total number of deduplicated reads that fall into peaks
total_deduplicated_tss:total number of deduplicated reads that fall into a 2 kb window centered on TSS
frip:fraction of reads in peaks
[x,y,autosomal]_chrom_window_counts:number of 5 kb windows of the X, Y and autosomal chromosomes respectively containing reads
blacklist_fraction:fraction of unique reads falling in ENCODE blacklist regions
nFeature_peaks:number of peaks that contain at least one read
nCount_peaks:number of reads in peaks
tissue_umap[1,2]:tissue UMAP coordinates based on the peaks assay
cell_type:assigned cell type
Day_of_pregnancy:estimated gestational age
Frit:fraction of reads in 2 kb window centered on TSS
RNA_snn_res.0.3:Louvain cluster ID at resolution 0.3
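
The ratio fields above can be related to the count fields with a short sketch. It assumes that frip and Frit are both computed relative to total_deduplicated; the listing does not state the denominator, so treat that as an assumption. The metadata values are made up for illustration.

```python
def fraction(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when no reads were counted."""
    return numerator / denominator if denominator else 0.0


# Hypothetical metadata row for one cell (values are illustrative only).
cell = {
    "total_deduplicated": 5000,
    "total_deduplicated_peaks": 2000,
    "total_deduplicated_tss": 1000,
}

# Assumption: both fractions use the deduplicated read total as denominator.
frip = fraction(cell["total_deduplicated_peaks"], cell["total_deduplicated"])
frit = fraction(cell["total_deduplicated_tss"], cell["total_deduplicated"])

print(f"frip={frip:.2f} frit={frit:.2f}")
```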

Sampled Data

To compare cell types across organs, up to 800 cells were randomly sampled per cell type per tissue (where fewer than 800 cells of a given cell type were present in a given tissue, all cells were taken). The Seurat object contains the peak-by-cell matrix for 86,685 cells and 1,001,437 peaks (z-score filtered master list) in the peaks assay.
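
The sampling rule described above can be sketched as follows. This is a minimal illustration, not the code used to build the object; the function name, the input layout and the seed are assumptions.

```python
import random
from collections import defaultdict


def sample_cells(cells, max_per_group=800, seed=0):
    """Sample up to max_per_group cells per (tissue, cell_type) group.

    cells: iterable of (cell_id, tissue, cell_type) tuples (hypothetical layout).
    Groups with max_per_group cells or fewer are kept whole.
    """
    groups = defaultdict(list)
    for cell_id, tissue, cell_type in cells:
        groups[(tissue, cell_type)].append(cell_id)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = {}
    for key, ids in groups.items():
        if len(ids) <= max_per_group:
            sampled[key] = list(ids)
        else:
            sampled[key] = rng.sample(ids, max_per_group)  # without replacement
    return sampled


# Hypothetical input: one large and one small group from the same tissue.
cells = [(f"cell{i}", "liver", "Hepatoblasts") for i in range(1000)]
cells += [(f"rare{i}", "liver", "Erythroblasts") for i in range(50)]

sampled = sample_cells(cells)
for key, ids in sampled.items():
    print(key, len(ids))
```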

Cell Metadata

Cell metadata for all cells, including tissue of origin, donor ID, estimated gestational age, sex, experimental batch, total number of reads, total number of deduplicated reads, total number of deduplicated reads that fall into peaks, total number of deduplicated reads that fall into a 2 kb window centered on TSS, fraction of reads in peaks (frip), fraction of reads in a 2 kb window centered on TSS (frit), number of peaks that contain at least one read (nFeature_peaks), Louvain cluster ID, UMAP coordinates from the per-tissue (not combined) UMAP visualizations, and cell type annotation.

Masterlist of peaks/regions with motif occurrences

For each region within a merged set of 1.05 M accessibility peaks, the chromosomal location, peak width and motif occurrences are provided. Motif occurrences were called for motifs in the JASPAR vertebrate motif database at a p-value threshold of 1e-7.

Specificity scores

Specificity scores calculated for each region/cell type pair using Jensen-Shannon divergence. A higher score indicates a peak that is more specific to a given cell type.
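
As an illustration of how a Jensen-Shannon-based specificity score behaves, the sketch below uses one common formulation: the accessibility of a region across cell types is normalized to a probability vector, compared with an idealized vector in which the region is accessible in a single cell type, and scored as 1 minus the square root of the divergence. The exact formula used for the provided scores is not stated here, so this definition is an assumption.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence (base 2), bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def specificity_scores(accessibility):
    """Score one region against each cell type (assumed formulation).

    accessibility: non-negative accessibility values, one per cell type.
    """
    total = sum(accessibility)
    if total == 0:
        return [0.0] * len(accessibility)
    p = [x / total for x in accessibility]
    scores = []
    for i in range(len(p)):
        # Idealized profile: accessible only in cell type i.
        ideal = [1.0 if j == i else 0.0 for j in range(len(p))]
        distance = math.sqrt(max(jsd(p, ideal), 0.0))
        scores.append(min(max(1.0 - distance, 0.0), 1.0))
    return scores


# Hypothetical region that is mostly accessible in the first cell type.
print(specificity_scores([9, 1, 0]))
```

Under this formulation, a region accessible in only one cell type scores 1 for that cell type and 0 for all others.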

Motif enrichment across cell type

For each of the 579 motifs from the JASPAR vertebrate database, the enrichment in accessible sites in each of the 54 main cell types was determined using a linear regression model. For each motif-cell type pair, the fold-change of the mean motif occurrence in sites of the given cell type relative to the rest of the dataset is reported, together with the matching Benjamini-Hochberg-adjusted p-value.
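
The two reported quantities can be illustrated with a short sketch. It does not reproduce the linear regression that produced the p-values; it only shows a fold-change of mean motif occurrence and the standard Benjamini-Hochberg step-up adjustment. Function names and input values are hypothetical.

```python
def fold_change(in_cell_type, rest):
    """Mean motif occurrence in a cell type's sites over the mean in the rest."""
    mean_in = sum(in_cell_type) / len(in_cell_type)
    mean_rest = sum(rest) / len(rest)
    if mean_rest == 0:
        return float("inf")
    return mean_in / mean_rest


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg-adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical motif occurrence (1 = motif present) per accessible site.
print(fold_change([1, 1, 0, 1], [0, 1, 0, 0]))

# Hypothetical raw p-values for four motif-cell type pairs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```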

Cicero co-accessibility scores by cell type

Comma-separated table of Cicero co-accessibility scores greater than 0.1, generated for each of 101 cell type/tissue pairs. The first two columns are the hg19 coordinates of the two tested sites. Each remaining column gives the co-accessibility scores for one cell type. NA values indicate either that the pair of sites was not tested because of insufficient depth or that the co-accessibility value was less than 0.1.
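
A table with this layout could be read as in the sketch below, which skips NA entries so that each site pair keeps only the cell types in which it was scored. The column names and the coordinate format in the example are hypothetical; check the header of the actual file before relying on them.

```python
import csv
import io


def load_coaccess(handle):
    """Parse a co-accessibility table into (site_a, site_b, scores) tuples.

    The first two columns are site coordinates; every other column is
    treated as one cell type. NA or empty entries are dropped.
    """
    reader = csv.reader(handle)
    header = next(reader)
    cell_types = header[2:]
    links = []
    for row in reader:
        scores = {}
        for name, value in zip(cell_types, row[2:]):
            if value in ("NA", ""):
                continue  # not tested, or score below the 0.1 cutoff
            scores[name] = float(value)
        links.append((row[0], row[1], scores))
    return links


# Hypothetical excerpt; real column names and coordinates may differ.
EXAMPLE = """peak1,peak2,liver_hepatoblasts,heart_cardiomyocytes
chr1_1000_1500,chr1_9000_9400,0.35,NA
chr1_1000_1500,chr1_20000_20600,0.12,0.5
"""

links = load_coaccess(io.StringIO(EXAMPLE))
for site_a, site_b, scores in links:
    print(site_a, site_b, scores)
```

Because NA covers both untested pairs and pairs scoring below 0.1, a missing value should not be read as evidence that two sites are not co-accessible.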

Cell-type specific accessibility as bigwig files

Fragment endpoints were extended 100 bp in each direction. Reads were then summed across all cells in a cell type and normalized to the total number of cells in that cell type.
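
The procedure can be sketched for a single chromosome as follows. This is a simplified per-base pileup, not the pipeline used to produce the bigwig files; the half-open window convention and the function name are assumptions.

```python
from collections import Counter


def cell_type_coverage(fragments, n_cells, extend=100):
    """Per-base coverage for one cell type on one chromosome.

    fragments: iterable of (start, end) fragment coordinates pooled from
    all cells of the cell type. Each endpoint is extended by `extend` bp
    in both directions (assumed half-open window), windows are summed,
    and the sum is divided by the number of cells.
    """
    counts = Counter()
    for start, end in fragments:
        for endpoint in (start, end):
            for pos in range(max(0, endpoint - extend), endpoint + extend):
                counts[pos] += 1
    return {pos: count / n_cells for pos, count in counts.items()}


# Hypothetical fragments pooled from a cell type with two cells.
coverage = cell_type_coverage([(500, 900), (520, 1500)], n_cells=2)
print(coverage[510], coverage.get(300, 0.0))
```

Dividing by the number of cells keeps tracks comparable between abundant and rare cell types.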