Research ArticlesTechniques

Learning-dependent chromatin remodeling highlights noncoding regulatory regions linked to autism

See allHide authors and affiliations

Sci. Signal.  16 Jan 2018:
Vol. 11, Issue 513, eaan6500
DOI: 10.1126/scisignal.aan6500

Deciphering the chromatin in learning and autism

Both development and learning are shaped through epigenetics—modifications to the DNA or chromatin that alter gene expression without changing the underlying DNA sequence. The intellectual disorder ASD (autism spectrum disorder) is idiopathic, but it is associated with changes in gene expression that alter the development of neuronal circuitry in the brain and impair some forms of learning. Using a new technique called DEScan, Koberstein et al. explored learning-induced changes to the epigenetic landscape of the hippocampus (a region critical for memory) in mice. They found that learning was mediated through changes to regulatory regions, particularly the activation of alternative promoters, of many genes associated with ASD. These findings identify an epigenetic source of one gene alteration associated with ASD and, more broadly, demonstrate a new technique for exploring learning disorders in animal models.


Autism spectrum disorder (ASD) is a prevalent neurodevelopmental disorder that is associated with genetic risk factors. Most human disease-associated single-nucleotide polymorphisms (SNPs) are not located in genes but rather are in regulatory regions that control gene expression. The function of regulatory regions is determined through epigenetic mechanisms. Parallels between the cellular basis of development and the formation of long-term memory have long been recognized, particularly the role of epigenetic mechanisms in both processes. We analyzed how learning alters chromatin accessibility in the mouse hippocampus using a new high-throughput sequencing bioinformatics strategy we call DEScan (differential enrichment scan). DEScan, which enabled the analysis of data from epigenomic experiments containing multiple replicates, revealed changes in chromatin accessibility at 2365 regulatory regions—most of which were promoters. Learning-regulated promoters were active during forebrain development in mice and were enriched in epigenetic modifications indicative of bivalent promoters. These promoters were disproportionally intronic, showed a complex relationship with gene expression and alternative splicing during memory consolidation and retrieval, and were enriched in the data set relative to known ASD risk genes. Genotyping in a clinical cohort within one of these promoters (SHANK3 promoter 6) revealed that the SNP rs6010065 was associated with ASD. Our data support the idea that learning recapitulates development at the epigenetic level and demonstrate that behaviorally induced epigenetic changes in mice can highlight regulatory regions relevant to brain disorders in patients.


Autism spectrum disorder (ASD) is a neurodevelopmental disorder of high prevalence that involves genetic risk factors. Genetic analysis has connected a high number of genetic variants to ASD (15). Studies using exome sequencing that focused on identifying de novo mutations have discovered hundreds of potential ASD-associated genes (68). So far, most approaches for understanding genetic risk factors in ASD have focused on identifying mutations in genes (or exons). However, one of the hallmark findings from the Encyclopedia of DNA Elements (ENCODE) project is that most of the single-nucleotide polymorphisms (SNPs) associated with human disease are outside of protein-coding genes (9). Genetic variants in regulatory regions, the regions of the genome that control gene expression, are also more common (10). It is likely that a large proportion of genetic variants associated with ASD will be common variants that lie within regulatory regions and, thus, remain largely unexplored.

Several examples of regulatory genetic variants were found to have functional links to ASD (1114). Reanalysis of available genome-wide association study data shows that loci associated with autism are more likely to influence gene expression in the brain (expression quantitative trait loci) (15). Focusing on regions that can regulate gene expression is a promising avenue to discover new ASD-linked genetic variants. However, it remains challenging to pinpoint regulatory regions that are likely to be important for a specific disease. In any given tissue at any given time, not all regulatory regions are able to be used or “accessible.” DNA exists within the nucleus of a cell as tightly packed chromatin, which can be unpackaged and repackaged as needed. The accessibility of chromatin is regulated through epigenetic modifications. The regulatory regions accessible in the brain, in response to a relevant stimulus, may be functionally relevant to ASD.

Conceptual parallels between the cellular basis of development and the formation of long-term memory have long been recognized. In particular, the epigenetic mechanisms used for deciding fate during development are the same as those that form the basis of long-term memory storage (16). Here, we explored whether epigenomic studies of learning and memory could guide the discovery of regulatory regions of relevance to neurodevelopmental disorders, such as ASD. We used sonication of cross-linked chromatin followed by high-throughput sequencing (HTS) [Sono-seq (17)] to determine which promoters in the mouse genome are accessible in the hippocampus after a learning experience. We developed a new bioinformatics strategy called DEScan (differential enrichment scan) to analyze the data and found that learning produces increases in chromatin accessibility at 2365 regions, most of which are transcriptional start sites (TSS), and furthermore, are alternative (intronic) promoters. Learning-regulated promoters disproportionately overlap with those active during forebrain development in mice and are enriched in epigenetic modifications that are hallmarks of bivalent promoters, typically considered to poise expression of developmental genes (18). We then combined the results of our chromatin accessibility study with learning-induced genome-wide gene expression data [RNA sequencing (RNA-seq)] that we have previously generated (19, 20). Our results suggest that the relationship between changes in chromatin state and changes in gene expression is complex, with a highly significant enrichment of learning-regulated promoters relative to alternative splicing. In addition, learning-regulated promoters were disproportionately associated with known ASD risk genes. Genotyping of common SNPs in a clinical cohort within one of these regions (SHANK3 promoter 6) revealed that SNP rs6010065 was associated with ASD. Our data support the idea that learning recapitulates development at the epigenetic level and demonstrate that behaviorally induced epigenetic changes obtained in mice can highlight functionally relevant regulatory regions for genetic association studies of brain disorders.


Learning increases chromatin accessibility genome-wide

Learning and long-term memory require transcription, protein synthesis, and epigenetic processes that regulate gene expression (2127). However, it is not yet clear which specific regulatory regions are epigenetically controlled by experience-based learning. We used Sono-seq (17) to determine which promoters in the mouse genome are regulated upon contextual fear conditioning. Sono-seq has been shown to be a useful and simple method by which to map local alterations in chromatin structure that correlate well with RNA polymerase II (RNAPol II) binding. Fear conditioning is a form of learning in which an aversive stimulus (such as a shock) is associated with a neutral context. Reexposure to the context triggers retrieval of the memory for the context-shock association, which is quantified as freezing in mice. Fear conditioning is highly reproducible among individual mice, requiring a single exposure to learn. In addition, the timeline of sensitivity for transcriptional and epigenetic inhibition is established with this test, making it an ideal learning task for our genomic study. Fear conditioning requires activity in the hippocampus, a brain region essential for long-term memory formation. Hippocampal tissue was collected at the first established sensitive period for transcriptional and epigenetic inhibition during memory consolidation, which was 30 min after fear conditioning (28, 29). Some animals were not sacrificed to confirm learning of the task, which was typical with average freezing of 55% (±10%) after reexposure to the context 24 hours later. Animals that were handled but not trained were dissected at the same time of day as controls. After tissue dissection, DNA was immediately cross-linked and prepared for Sono-seq (Fig. 1A). The results indicated that learning increases chromatin accessibility genome-wide (Fig. 1B).

Fig. 1 Learning increases chromatin accessibility in the mouse hippocampus.

(A) Outline of experimental design to measure changes in chromatin accessibility after contextual fear conditioning (FC) versus time of day matched controls (HC) using sonication of cross-linked chromatin followed by sequencing (Sono-seq) (B) The effect of learning in chromatin accessibility at transcriptional start sites (TSS) genome-wide. The plot shows normalized read counts relative to genomic position around TSS for all promoters in the mouse genome (mm9). Average per condition is shown (n = 4 samples per group). Average sequencing depth per sample: 100 million 100-bp single-end reads. The number of learning-regulated regions detected at false discovery rate (FDR) < 0.05 requiring four replicates to reproduce the peak using DEScan is shown (see figs. S1 and S2 and table S1).

To define where in the genome these changes were occurring, we developed a new bioinformatics strategy, which we termed DEScan. The main motivation behind DEScan was to leverage biological variation to ensure reproducibility. A common goal in epigenomic sequencing studies is to identify differences between conditions; however, how replicates affect the results has not been thoroughly explored. DEScan is an R programming language–integrated peak and differential caller, specifically designed to accommodate epigenomic signal of variable width and leverage biological reproducibility. DEScan has two modules (fig. S1A): a novel peak-calling module and a differential enrichment module (see Materials and Methods for further details). The first module integrates peak calling on individual samples using a Poisson model distribution and a variable-width window (without the need of a separate background sample) with evaluation of reproducibility of peak location across replicates to produce a count matrix for statistical testing. The results show that most of the peaks found in a sample are not reproducible (fig. S1B), highlighting the need for three or more biological replicates in differential epigenomic studies. The differential enrichment module in DEScan implements normalization and statistical testing in a manner analogous to RNA-seq data analysis described in other popular epigenetic sequencing differential analysis packages. The main difference being that, in DEScan, data normalization is tailored to epigenomic data. Traditional methods of normalization based on total read counts per sample do not properly distinguish the controls from fear-conditioned samples (fig. S2A) and remove the differential signal from the analysis (fig. S2, B and C). Using a method of normalization that can properly correct for unwanted variability in sequencing data, called removal of unwanted variation (RUV) (27), we were able to capture the experimental variable. Our results caution against using normalization factors derived from total read counts, pointing to a fundamental difference between transcriptomic and epigenomic data. Overall, DEScan detected 2365 learning-regulated regions at a false discovery rate (FDR) < 0.05 (table S1). Of those, 25 are decreases and 2340 are increases, confirming that the overall effect of learning is an increase in chromatin accessibility. Learning-regulated regions were closely associated with genes, with 2019 of them either overlapping the start of a known transcript or located inside a known gene. They were disproportionately biased toward alternative (not first exon) promoters and CpG islands (Fig. 2A).

Fig. 2 Learning recapitulates development at the epigenetic level and is linked to changes in gene expression.

(A) Overlap of learning-regulated regions relative to regions of accessible chromatin during development (ATAC-seq), CpG islands, alternative promoters, and histone marks associated with active (H3K4me2/3, H3K9ac, and H3K27ac) or inactive, repressed (H3K27me3) transcription. Permutation testing was performed to evaluate statistical significance. The number of overlaps observed for each set with the learning-regulated regions was compared to overlaps with 10,000 random samplings of the same number of regions from a background set of all Sono-seq peaks present in two or more samples. The P value corresponds to the proportion of permuted samples with more overlaps than the observed value; ***P < 0.001, where no permutation showed greater overlap than the observed value. Violin plots display the full distribution of permuted set overlaps with each genomic region data set, whereas circles indicate the observed value of overlaps with learning-regulated promoters. All ChIP-seq and ATAC-seq data obtained from ENCODE are derived from mouse forebrain at E12.5. CpG island and alternative promoter (altEvents) annotations obtained from the UCSC genome browser. (B) Overlap of learning-regulated regions with active histone marks in the forebrain, midbrain, and hindbrain at E12.5 relative to the overlap in the same tissues at postnatal day 0 (P0). (C) Enrichment of learning-regulated promoters relative to gene expression measured using RNA-seq after FC and retrieval of memory 24 hours after (RT), assessed as the number of overlaps between the learning-regulated regions and each set of differentially expressed or spliced set of genes (FDR < 0.05) compared to overlaps with 10,000 random samplings of genes expressed in the hippocampus. P value determined as described in (A). Violin plots display the full distribution of permuted set overlaps with each genomic region data set, whereas circles indicate the observed value of overlaps.

Learning-regulated regions are active during development, associated with differential expression, and disproportionately linked to known ASD risk genes

To further understand the functional characteristics of learning-regulated regions, we used all available data obtained from brain tissue samples from the ENCODE3 project to investigate whether these regions were enriched for specific epigenetic modifications. We found that learning-regulated regions showed significant enrichment for histone marks associated with promoting transcription [“active”: di- or trimethylated histone H3-Lys4 (H3K4me2/3), acetylated histone H3-Lys9, and -Lys27 (H3K9ac, H3K27ac)] as well as repressing transcription (“repressive”: H3K27me3) (Fig. 2A). These are characteristics of bivalent promoters; thus, herein, we refer to learning-regulated regions as learning-regulated promoters. Enrichment for active marks H3K4me2/3 and H3K9ac is significantly higher in the mouse brain at embryonic day 12.5 (E12.5) than postnatally (Fig. 2B). These results suggest that promoters induced by learning are preferentially active during development. Comparing chromatin-accessible regions in the developing mouse brain at E12.5 [obtained by assay for transposase-accessible chromatin using sequencing (ATAC-seq) through ENCODE] with chromatin-accessible regions in the mouse hippocampus as determined by our study indicated that learning-regulated promoters are also, for the most part, accessible during development (Fig. 2A).

To investigate the relationship between an increase in chromatin accessibility and differential expression after learning, we examined the overlap between our learning-regulated promoters and genome-wide gene expression data after fear conditioning we have previously defined using RNA-seq (19). The genome-wide gene expression data include gene expression measured 30 min after fear conditioning (the same time the Sono-seq samples were taken) and 30 min after a retrieval test 24 hours after. Our results showed a trend toward enrichment of learning-regulated promoters in genes differentially expressed 30 min after fear conditioning (P = 0.12). This enrichment is borderline significant for genes differentially expressed 30 min after retrieval (P = 0.05; Fig. 2C). Because there is a bias toward alternative promoters, we hypothesized that they may be driving alternative exon usage or splicing. To test this, we looked at the relationship between learning-regulated promoters and genes that show differential exon usage independently from global differential expression we previously defined (20). In this case, there was a highly significant enrichment for alternative spliced genes 30 min after fear conditioning (P < 0.001). This enrichment was also significant 30 min after retrieval, although to a lesser magnitude (P = 0.01; Fig. 2C). Overall, our data suggest that there is a relationship between chromatin organization and differential expression. The relationship is complex, in that it involves a highly significant role of learning-regulated promoters in alternative splicing as well as a moderate role in global gene expression during both memory consolidation and retrieval.

To determine whether learning-regulated promoters were associated with specific sets of genes, we conducted classical gene set enrichment analysis using the Database for Annotation Visualization and Integrated Discovery [DAVID (30)]. Only genes that directly overlapped with learning-regulated promoters were considered. We first focused on established pathways [Kyoto Encyclopedia of Genes and Genomes (KEGG)] and found that only “pathways in cancer” enriched at an FDR < 0.01. We then looked at Gene Ontology (GO; Biological Function) and found only eight GO terms enriched at FDR < 0.01: transcription, regulation of transcription, and positive regulation of transcription (all DNA-templated); positive and negative regulation of transcription from RNAPol II promoter; and nervous system development, in utero embryonic development, and positive regulation of neuron differentiation. Overall, learning-regulated promoters are associated with genes involved in transcription, brain development, and cancer. Given our results and the recent awareness that ASD and cancer share risk genes and pathways (31), we asked whether learning-regulated regions were preferentially associated with known risk genes for either disorder. We observed a greater than threefold enrichment of ASD risk genes in learning-regulated regions (Table 1 and table S2). This enrichment held true and was greater when the strength of evidence required to define a “risk gene” was increased. Genes known to cause syndromic forms of ASD were enriched 4.8-fold relative to learning-regulated promoters. Some ASD risk genes, such as Rai1, Rere, Otx1, Dlx, Pcca, and Cacna1, were associated with more than one learning-regulated promoter. All members of the SHANK family of scaffolding proteins (encoded by Shank1, Shank2, and Shank3) were associated with learning-regulated promoters and therefore likely epigenetically regulated by learning.

Table 1 Learning-regulated regions are disproportionately associated with known ASD and cancer candidate genes.

ASD candidate genes were obtained from the Simons Foundation Autism Research Initiative (SFARI) gene database. Cancer candidate genes were obtained from Statistical significance was calculated using χ2 tests comparing observed versus expected frequencies. ASD genes associated with learning-regulated promoters can be found in table S2.

View this table:

Shank3 promoter 6 is a learning-regulated promoter associated with ASD and differential isoform expression of Shank3

To further investigate the functional relevance of learning-regulated promoters, we focused on one such region associated with a well-established ASD risk gene: Shank3 promoter 6. SHANK3 is a synaptic scaffolding protein for which mutations have been found to be associated with a substantially increased risk for ASD (32). In humans, complete deletion of SHANK3 leads to Phelan-McDermid syndrome (33), a rare disease strongly associated with ASD. Our Sono-seq data showed that, although the canonical first exon promoter of Shank3 is accessible in basal conditions, accessibility of promoter 6 increased specifically in response to learning (Fig. 3A). Given that this is an internal promoter and Shank3 has a wide array of isoforms (34), we then asked whether learning affected isoform-specific expression of Shank3. We repeated our contextual fear conditioning experiment, but this time, we isolated RNA and evaluated gene expression using quantitative polymerase chain reaction (qPCR). We found that learning down-regulated expression of the major isoform of Shank3 (Shank3a), which mirrors the activity-dependent reduction of Shank3a that has been reported in vitro (Fig. 3B) (34). The mechanism that links activation of Shank3 promoter 6 and repression of Shank3a is not clear at this point. To investigate whether SHANK3 promoter 6 has any impact on ASD diagnosis, we genotyped a noncoding SNP located within this region (rs6010065) in a clinical cohort. Our results show that rs6010065 is associated with ASD (Fig. 3C). Given the uneven distribution of minor allele frequencies observed at this site based on the 1000 genomes project (, we split our cohort based on genetic background. We found that the association of the heterozygous allele with ASD was only significant in Caucasians (fig. S3A). In non-Caucasians, the same genotype was associated with learning ability (fig. S3B), although the sample is too small to draw definite conclusions.

Fig. 3 Learning increases accessibility to Shank3 promoter 6, highlighting a regulatory region linked to ASD.

(A) Chromatin accessibility at the Shank3 locus. Sono-seq signals (mapped reads) from representative samples are shown as mapped to the UCSC genome browser (mm9) for both fear-conditioned (FC) and control (HC) animals. Dark blue track shows UCSC gene model for Shank3, with filled rectangles indicating the 22 exons. The light blue region highlights a region of significant enrichment compared to controls as detected by DEScan (n = 4 per group, FDR < 0.05). Green boxes represent CpG islands. The promoters of Shank3 are shown for reference, adapted from (46). (B) Differential expression (by qPCR) of major Shank3 isoforms after fear conditioning in an independent cohort of animals, normalized to Hprt expression and quantified using the ΔΔCt method. Data are mean ± SE from n = 7 animals per condition. Only P < 0.1 is shown; by two-sided t tests. Primer pairs used to quantify Shank3 isoform expression were designed to span Shank3a exons 2 to 10, Shank3b exons 3 to 6, Shank3c exons 12 to 14, Shank3d exons 13 to 15, Shank3e exons 17 to 19, and Shank3f exon 22. (C) Association of genetic variation in promoter 6 of SHANK3 (SNP rs6010065) with ASD (blue, n = 422 patients) versus TDCs (red, n = 182 individuals) as assessed by the χ2 test per genotype relative to the sum of all other genotypes. SNP rs6010065 does not overlap SHANK3 exon 21. Further details on genotyping of patients are in fig. S3.


Here, we identified the promoters in the mouse hippocampus altered by learning after a classical conditioning task through the use of HTS technology and the development of a new bioinformatics strategy. We found that learning increases chromatin accessibility genome-wide, supporting previous findings that neuronal activation increases chromatin accessibility (35). Our results, however, provide a genome-wide map of chromatin remodeling in response to behavior. Our data reveal a complex relationship between chromatin organization and differential expression during memory consolidation and retrieval, with a likely role in regulating alternative splicing and moderate role in regulating global levels of gene expression. We also present DEScan, a new bioinformatics strategy that allows us to define the identity of the regions recruited by learning in a reproducible way. It has long been recognized that the epigenetic mechanisms involved in development are co-opted during learning. Our results further support the conceptual parallels between learning and development. We show that the promoters recruited and the epigenetic modifications within them also show parallels between learning and brain development. The overlap between learning and brain development is particularly significant for mouse E12.5, although the reason behind this is unclear. Studies have shown that specification of hippocampal fate in mice begins as early as E10.5 to E12.5 (36), raising the possibility that the high concordance observed between chromatin accessibility in forebrain at E12.5 and hippocampus in adulthood is rooted in similarities between epigenetic regulation due to learning and hippocampal fate determination. Overall, our data suggest that learning may recapitulate brain development at the epigenetic level by activating an overlapping subset of promoters associated with genes involved in transcription, brain development, and cancer. Learning-regulated regions are also disproportionately associated with ASD risk genes, further strengthening the link between brain development and long-term memory formation. This is not surprising, considering that intellectual disability, which is characterized by learning impairments, is the most common ASD comorbidity (37).

As whole-genome sequencing (WGS) becomes widespread, looking beyond the 2% of the genome that codes for genes and understanding the contribution of genetic risk factors in regulatory regions are essential for understanding the genetic contribution to any disorder. Work from the ENCODE consortium has already shown that most of the SNPs associated with human disease are located in regulatory regions (9). However, it remains challenging to pinpoint regulatory regions that are likely to be important for a specific disease. Here, we strengthen the conceptual parallels between learning and brain development and show that those parallels can be used to identify regulatory regions relevant to neurodevelopmental disorders, such as ASD. We go one step further and illustrate how that information can be exploited to identify a noncoding variant linked to ASD (rs6010065 located in SHANK3 promoter 6). Still, an open question is how the observed changes in chromatin accessibility ultimately translate into differences in gene expression that can affect phenotype. The challenges in that regard are twofold. First, changes in gene expression are expected to follow epigenetic changes, but at which point in time is uncertain and that timeline could be different for each regulatory region and gene. Second, because the changes in chromatin accessibility disproportionately affect alternative promoters, they are likely to affect isoform-specific expression (as illustrated in our Shank3 example), which is notoriously harder to quantify reliably genome-wide. With the continued WGS of ASD individuals, future studies will evaluate whether learning-regulated regions can identify novel regulatory genetic variants as well as explore the potential role of those variants in learning ability within ASD.


Mice and behavior

C57BL/6J mice were maintained under standard conditions, with food and water available ad libitum. Adult male mice 3 months of age were kept on a 12-hour light/12-hour dark cycle with lights on at 7 a.m. All behavioral and biochemical experiments were performed during the light cycle, with training starting at about 10 a.m. (Zeitbeger time 3). All procedures were approved by the University of Pennsylvania Institutional Animal Care and Use Committee. Contextual fear conditioning was performed as previously described (38). Briefly, mice were handled for 3 days before training, placed in a novel context within soundproof chambers, and allowed to explore for 148 s. A 2-s 1.5-mA footshock was delivered, and mice were kept in the novel context for an additional 30 s. Mice were removed to their home cage after training. One mouse was trained and dissected per day to allow for dissection to occur at the same circadian time every day.

DNA isolation and sequencing

Thirty minutes after fear conditioning, mice were sacrificed and the hippocampus was rapidly dissected. The hippocampus was finely chopped and placed into 500-μl 2% paraformaldehyde for 10 min to cross-link tissue. Cross-linking was stopped by the addition of 100-μl 1 M glycine, and tissue was washed 3× in ice-cold phosphate-buffered saline containing protease inhibitor cocktail (Sigma-Aldrich) and frozen at −80°C. Chromatin was prepared by dounce-homogenizing the tissue in 1 ml of cell lysis buffer [10 mM tris-HCl (pH 8.1), 10 mM NaCl, 3 mM MgCl2, and 0.5% NP-40), centrifuging at 5500g for 5 min at 4°C, removing the supernatant, and resuspending the pellet in 300 μl of chromatin immunoprecipitation (ChIP) nuclear lysis buffer [50 mM tris (pH 8.1), 5 mM EDTA, and 1% SDS]. Chromatin was sonicated using a Bioruptor sonicator with the following conditions: 30 s on and 1 min off for 15 min, add ice and cool the sonicator, and 30 s on and 1 min off for another 15 min. Chromatin was centrifuged for 5 min at 14000g at 4°C, and the supernatant was removed. Chromatin size was tested by removing an aliquot, reversing cross-links, cleaning DNA using the Qiagen MinElute PCR purification kit (Qiagen), quantifying by NanoDrop (Thermo Fisher Scientific), and running the DNA on a 1% agarose gel. Sono-seq DNA was prepared by removing 1 μg of sonicated chromatin. The next morning, 4-μl 0.5 M EDTA, 8-μl 1 M tris-HCl (pH 7.5), and 1-μl Proteinase K (Roche Diagnostics) were added, and samples were incubated for an additional hour at 65°C. Samples were purified into 30 μl using the Qiagen MinElute PCR purification kit. Library preparation for Sono-seq samples was performed using a combination of the ChIP sequencing (ChIP-seq) DNA sample prep kit and the Multiplexing Sample Preparation Oligonucleotide Kit (Illumina). End repair and adenylation were performed as described by the ChIP protocol, whereas adapter ligation was performed as described by the multiplexing protocol. A gel slice 300 ± 25 base pair (bp) was cut from the gel for size selection, the Qiagen gel extraction kit was used to purify DNA, and 23 PCR cycles were used after the PCR reactions described by the multiplexing kit. The final product was run on a gel and purified into 15 μl by the Qiagen gel extraction kit again. Sequencing libraries were quantified using the KAPA Library Quantification Kit, normalized to 10 nM, and submitted to the Penn Genome Frontiers Institute sequencing core at the University of Pennsylvania for 100-bp single-end sequencing on an Illumina HiSeq2000. The average sequence depth was 148 million reads, four independent biological replicates per condition.

HTS data analysis

Reads were mapped to the mouse genome (mm9) using Bowtie2 with default parameters (39). Mapped reads were filtered for quality using custom scripts. Reads were required to have a Bowtie2 mapping quality alignment score greater than 2 to limit multimapping reads. To avoid artifacts from clonal PCR amplification, identical reads (three or more reads of the same sequence) were removed if the surrounding reads did not also show high replication. This strategy was adopted to avoid removing identical reads that may result from repeated sampling of highly accessible chromatin regions, a possible outcome of the high level of sequencing depth used in this study. Details of sequencing, mapping, and filtering statistics are available in table S1. TSS plots were generated using NGSplot (40) of aggregate read counts over all replicates per condition. To analyze differential enrichment of Sono-seq signal between fear conditioned and home cage controls, we developed a novel bioinformatics strategy using the R statistical language we named “DEScan” (fig. S1). DEScan is optimized to perform best on epigenetic HTS data that are not defined by genomic sequence and produce broad peaks. DEScan performs peak calling in individual replicates based on a Poisson distribution using an adaptive window scan and the surrounding 5 or 10 kb of sequence as a background. Peaks in individual replicates are then aligned, and only genomic regions with a defined number of replicates containing a peak (“carriers”) are kept for further analysis. The union of the coordinates of the replicated peaks is used to set the final peak coordinates. The peak-calling module of DEScan cannot be directly compared to currently available peak-caller software for three reasons. First, DEScan was designed to accommodate epigenomic data that produce both narrow (transcription factor) and broad (chromatin accessibility and histone acetylation) signal. Hence, it implements a scan of adaptive width, not a fixed window scan as all other peak finders. Second, other peak-calling algorithms usually rely on “input” as background sample to determine significance for peak calling. DEScan does not require an input sample and calculates peak significance based on the surrounding genomic sequence (Sono-seq and input are also essentially the same data set). Although this approach could lead to increased false positives, peaks on individual samples are never considered in isolation of other biological replicates in DEScan. The last and most important difference is that DEScan requires biological replicates. It considers the reproducibility of peaks in both location and significance simultaneously. Because no current peak caller explicitly accounts for reproducibility, the results are not directly comparable. Replicated genomic regions are used to derive a count matrix using featureCounts (41). The differential module of DEScan implements a strategy similar to that used for RNA-seq differential expression analysis. In that sense, it is analogous to other available approaches used to estimate differential epigenomic regulation, with one important difference: normalization. Normalization is performed using RUV (27). We have previously shown in the context of RNA-seq that RUV normalization is essential for the accuracy and reproducibility of differential analysis for the types of data sets described in this article (19). After normalization, testing for differential enrichment between fear-conditioned and control animals was performed using edgeR (42). The results of differential Sono-seq analysis reported in this article were obtained using the following DEScan parameters: adaptive window size, 50 to 500 bp; number of carriers, 4; and FDR > 0.01. The choice of a proper k for RUV normalization was determined as previously described (19), resulting in k = 4. The DEScan manual is available in file S1.

Analysis of overlap with genomic regions was performed using mm10 ChIP-seq and ATAC-seq data derived from forebrain tissue available from ENCODE3 and CpG island and alternative promoter annotations obtained from the University of California, Santa Cruz (UCSC) genome browser. Coordinate differences between mm9 and mm10 mouse genome version were accounted for by using UCSC liftOver. Significance of the number of overlapping regions with histone marks, ATAC-seq, CpG islands, and alternative promoters was assessed through permutation testing (learning-regulated regions compared to 10,000 rounds of random resampling from replicated Sono-seq peaks) using the R/Bioconductor package regioneR. Comparisons between developmental stages were performed using ENCODE E12.5 and postnatal day 0 ChIP-seq data derived from forebrain, midbrain, and hindbrain. Significance of overlap with differentially expressed or spliced genes was assessed through permutation testing (learning-regulated regions compared to 10,000 rounds of random resampling from all genes expressed in the hippocampus). Comparisons to genes implicated in disease were performed using data sets obtained from the Simons Foundation Autism Research Initiative Human Gene list (881 total genes, 7 October 2017, updated version) and the Catalogue of Somatic Mutations in Cancer cancer gene census (561 total genes, version 82). Human HUGO Gene Nomenclature Committee symbols were then translated to Ensembl mouse gene IDs using the getLDS function from the R/Bioconductor biomaRt package, resulting in 880 and 573 mouse genes, respectively. Learning-regulated regions were annotated with the Ensembl gene ID of the nearest TSS using the R/Bioconductor package ChIPpeakAnno. Functional annotation was based on Ensembl gene IDs and performed using the database for annotation, visualization, and integrated discovery version 6.7 (DAVID, The following functional categories were used: GO biological process and KEGG pathways. Enrichment cutoff relative to background (mouse genome) = FDR < 0.01.

Quantitative reverse transcription PCR

qPCR was performed as previously described (43). RNA extraction was performed using a Qiagen RNeasy lipid tissue kit with the following modifications: RNA precipitation was performed using 100% ethanol, and washes were performed using Qiagen’s RWT buffer. Concentration and purity were quantified by NanoDrop spectrophotometry (Thermo Fisher Scientific). For quantitative real-time reverse transcription PCR (RT-PCR), reactions were prepared in 384-well optical reaction plates (ABI) with optical adhesive covers (ABI). Two technical replicates were used. Reactions were carried out in ViiA7 real-time PCR system (Invitrogen) according to the manufacturer’s instructions. For mRNA qPCR, generation of complementary DNA was carried out by the RETROscript Kit (Ambion) with 1 μg of RNA as template. Primer sequences for Shank3 isoform genes were obtained from (44). Data were normalized to Hprt before calculation of differences. Fold change was calculated using the ΔΔCt method (45). The data presented are the calculated mean for the biological replicates with n = 8 (that is, the number of mice examined). We used t tests to compare fold change values for each gene in each comparison of interest. Two-tailed P values are reported. Fos induction was used as a positive control on all qPCR runs.


Our data set consisted of 768 individuals, including both male (678) and female (90) participants, among which 554 children (age, 3.3 to 15.2 years) were affected by ASD and 214 children (age, 3.2 to 16.3 years) were typically developing controls (TDCs). Participants were recruited and diagnosed by expert clinicians at the Center for Autism Research at the Children’s Hospital of Philadelphia (CHOP). Children with known genetic conditions associated with ASD were excluded from the study. DNA samples were collected through cheek swabs and/or blood draw. All procedures were approved by the Institutional Review Board at CHOP.

Patient genotyping

DNA extraction was carried out using a detergent-based method using AMPure beads automated on a Biomek FX liquid handler instrument from Beckman Coulter. Genotyping of rs6010065 was performed using TaqMan assay from Life Technologies (catalog no. C__29723448_10). Reactions were carried out in ViiA7 real-time PCR system (Invitrogen) according to the manufacturer’s instructions. On the basis of data from the 1000 genome project (, we expected the minor allele for rs6010065 to be different in Caucasian populations relative to non-Caucasians. Accordingly, Hardy-Weinberg equilibrium was not achieved whether all individuals were analyzed together, but it was achieved whether non-Caucasians and Caucasians were analyzed separately. Self-reported ethnicity was used to stratify patients based on genetic background. Patients that did not report ethnicity were excluded from further studies. Final sample sizes were as follows: Caucasian, n = 302 (ASD) and n = 125 (TDCs); non-Caucasian: n = 67 (ASD) and n = 46 (TDCs). Tests of association were performed as χ2 test on 2 × 2 contingency tables per genotype relative to the sum of all other genotypes. P values reported are one-tailed.


Fig. S1. DEScan is a tool to perform differential analysis of data obtained from epigenetic HTS experiments with multiple biological replicates.

Fig. S2. Normalization affects detection of differences in chromatin accessibility after learning.

Fig. S3. SNP rs6010065 is differentially associated with ASD or intelligence quotient depending on genetic background.

Table S1. Detailed annotation of learning-regulated promoters.

Table S2. Details of which learning-regulated regions are associated with known ASD risk genes.

File S1. DEScan user manual.


Acknowledgments: We thank H. Schoch for helpful discussions. Funding: This work was supported by National Research Service Award training grants T32NS007413 [to L.P.; M. Robinson, principal investigator (PI)] and T32HL007953 (to M.E.W.; A.I. Pack, PI). T.A. acknowledges funding from the Brush Family Professorship, Defense Advanced Research Projects Agency 58077 LSDRP (S. Bhatnagar, PI), and R01 MH087463. H.H. and R.T.S. acknowledge funding from the Lurie Family Foundation. R.T.S. acknowledges funding from the Pennsylvania Department of Health (Student Assistance Program no. 4100047863), U54 HD86984, and RC1 MH08879. Author contributions: Study design by L.P., M.E.W., and T.A. Data collection by L.P., M.E.W., S.G.P., G.P., and C.K. Data analysis and interpretation by L.P., J.N.K., D.R., M.E.W., S.G.P., N.R.Z., B.G., R.T.S., and T.A. Manuscript preparation by L.P. and J.N.K., with editing by M.E.W., S.G.P., H.H., D.R., N.R.Z., R.T.S., and T.A. Competing interests: The authors declare that they have no competing interests. Data and materials availability: HTS data will be publicly available through Gene Expression Omnibus after manuscript acceptance: GSE98317 (reviewer’s access link: DEScan code is freely available, and instructions for download and installation can be found in the user manual (file S1).
View Abstract

Navigate This Article