Whole Exome Sequencing (WES) is a next-generation sequencing (NGS) technique that captures and sequences the protein-coding regions (exons) of the genome, collectively known as the exome. Although exons make up only 1–2% of the human genome, they are estimated to contain about 85% of known disease-causing variants.
WES offers a cost-effective and powerful method for identifying genetic variants associated with inherited diseases, cancer, and complex traits.
Purpose: Identify mutations in coding regions of genes
Target Regions: All known protein-coding exons (~20,000 genes)
Output: Variants (SNVs, indels) in exons and adjacent splice sites
Applications: Rare disease diagnosis, cancer genomics, pharmacogenomics, personalized medicine
DNA Extraction
Genomic DNA is isolated from blood, saliva, or tissue samples.
Library Preparation
DNA is fragmented, and sequencing adapters are attached to the ends of each fragment.
Exome Capture (Enrichment)
Biotinylated probes hybridize to library fragments that contain exon sequence. The probe-bound fragments are then pulled down with streptavidin-coated magnetic beads, and unbound fragments are washed away.
Amplification & Sequencing
Captured fragments are PCR-amplified and sequenced using high-throughput platforms like Illumina or MGI.
Data Analysis
Raw reads are mapped to the reference genome, and variant calling identifies single nucleotide variants (SNVs), insertions, deletions, and splice site mutations.
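The idea behind the variant calling step can be illustrated with a toy pileup. The sketch below is not how GATK or DeepVariant work internally. It assumes reads are already aligned without gaps and simply counts bases per position. The function name `call_snvs` and the thresholds `min_depth` and `min_alt_fraction` are hypothetical choices made for this illustration.

```python
from collections import Counter


def call_snvs(reference, reads, min_depth=3, min_alt_fraction=0.3):
    """Toy SNV caller: pile up gapless reads and report non-reference bases.

    reference: reference sequence as a string (0-based coordinates).
    reads: list of (start, sequence) tuples, assumed already aligned.
    Returns tuples of (position, ref_base, alt_base, alt_count, depth).
    """
    # Build a per-position count of the bases observed in the reads.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to support a call at this position
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls


# Made-up example data: two of the four reads covering position 4 carry a T.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"),
    (2, "GTTCGT"),
    (3, "TTCGTA"),
    (4, "ACGTAC"),
]
for pos, ref, alt, alt_count, depth in call_snvs(reference, reads):
    print(f"pos={pos} {ref}>{alt} support={alt_count}/{depth}")
```

Production callers add base and mapping quality, local realignment, and statistical genotype models on top of this basic counting idea, which is why a depth filter alone is not sufficient in practice.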
Rare Disease Diagnosis
Detects inherited mutations responsible for Mendelian disorders and undiagnosed conditions.
Cancer Genomics
Identifies somatic mutations in tumor DNA that drive cancer development.
Prenatal & Pediatric Testing
Screens for genetic defects in fetuses or children with developmental disorders.
Neurogenetic Disorders
Investigates causes of epilepsy, autism, intellectual disability, and neurodegeneration.
Pharmacogenomics
Determines genetic variants that affect drug metabolism and response.
Carrier Screening
Identifies individuals who carry mutations that could be passed to offspring.
High Clinical Yield
Focuses on regions most likely to harbor pathogenic variants.
Cost-Effective
Costs less per sample than Whole Genome Sequencing (WGS) while still covering the regions where most known disease-causing variants are found.
Efficient Analysis
Smaller data size makes analysis and interpretation faster and easier.
Supports Novel Variant Discovery
Unlike targeted panels, WES is not limited to a preselected set of genes or known mutations.
Customizable Coverage
Capture kits can be tailored to specific gene sets or updated databases.
| Feature | Description |
|---|---|
| Target Size | ~30–50 Mb, depending on the capture kit (1–2% of the genome) |
| Variant Types Detected | SNVs, insertions, deletions, splice site mutations |
| Read Depth | Typically 100× mean depth or higher for accurate variant calling |
| Turnaround Time | 2–6 weeks depending on pipeline |
| Databases for Annotation | ClinVar, OMIM, HGMD, dbSNP, gnomAD |
| Bioinformatics Tools | GATK, VarDict, BWA, ANNOVAR, VEP, DeepVariant |
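The target size and read depth figures in the table translate directly into a sequencing yield requirement. The sketch below works through that arithmetic. The function name `required_read_pairs` is hypothetical, and the on-target and duplicate fractions in the example are illustrative assumptions, not values from a specific capture kit.

```python
import math


def required_read_pairs(target_size_bp, mean_depth, read_length,
                        on_target_fraction, duplicate_fraction=0.0):
    """Estimate paired-end read pairs needed for a given mean on-target depth.

    Only reads that land on target and are not duplicates count toward
    depth, so the raw yield is scaled up by the usable fraction.
    """
    usable_fraction = on_target_fraction * (1.0 - duplicate_fraction)
    bases_needed = target_size_bp * mean_depth / usable_fraction
    return math.ceil(bases_needed / (2 * read_length))


# Assumed scenario: 40 Mb target, 100x mean depth, 2 x 150 bp reads,
# 70% of bases on target, 10% duplicates.
pairs = required_read_pairs(40_000_000, 100, 150, 0.70, 0.10)
print(f"approximately {pairs / 1e6:.1f} million read pairs")
```

Because capture efficiency varies between kits and samples, an estimate like this is usually treated as a lower bound rather than an exact target.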
Does Not Cover Non-Coding Regions
Misses most regulatory variants in promoters, enhancers, and deep intronic regions; coverage of UTRs varies by capture kit.
Incomplete Coverage
Some exons may have low or no coverage due to poor capture efficiency.
Misses Structural Variants
Large rearrangements, copy number variants (CNVs), and repeat expansions often go undetected.
Interpretation Complexity
Variant interpretation requires clinical correlation and expert curation.
False Positives/Negatives
Errors in mapping or calling can occur, especially in GC-rich or repetitive regions.
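The incomplete-coverage limitation above can be checked per sample by summarizing depth over each target exon. The sketch below assumes per-base depths are already available as a list and that exons are given as BED-style intervals (0-based, end-exclusive). The function names and the 20× and 90% thresholds are hypothetical choices for illustration.

```python
def exon_coverage(depths, exons, min_depth=20):
    """Return (name, mean_depth, fraction_of_bases_at_or_above_min_depth)."""
    report = []
    for name, start, end in exons:
        window = depths[start:end]
        mean_depth = sum(window) / len(window)
        covered = sum(1 for d in window if d >= min_depth) / len(window)
        report.append((name, mean_depth, covered))
    return report


def low_coverage_exons(depths, exons, min_depth=20, min_fraction=0.9):
    """Names of exons where too few bases reach the minimum depth."""
    return [
        name
        for name, _, covered in exon_coverage(depths, exons, min_depth)
        if covered < min_fraction
    ]


# Made-up depths: the middle exon mimics a poorly captured region.
depths = [120] * 10 + [5] * 10 + [60] * 10
exons = [("exon1", 0, 10), ("exon2", 10, 20), ("exon3", 20, 30)]

for name, mean_depth, covered in exon_coverage(depths, exons):
    print(f"{name}: mean depth {mean_depth:.0f}x, {covered:.0%} of bases >= 20x")
print("low coverage:", low_coverage_exons(depths, exons))
```

A report like this makes it explicit which exons cannot support a confident negative result, so a missing variant call there is not mistaken for the absence of a variant.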
QC & Trimming: FastQC, Trimmomatic
Alignment: BWA-MEM, Bowtie2
Variant Calling: GATK HaplotypeCaller, FreeBayes, DeepVariant
Annotation: ANNOVAR, VEP, SnpEff
Visualization: IGV, UCSC Genome Browser
Interpretation: ClinVar, OMIM, HGMD, InterVar
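One way these tools might be chained is sketched below as a list of command lines built in Python. Nothing is executed; the commands are only assembled and printed. All file names are hypothetical, `samtools` is an added assumption (it is not in the list above), and a real pipeline would typically include further steps such as read trimming, duplicate marking, and base quality recalibration.

```python
import shlex


def build_wes_commands(sample, ref_fasta, fastq_r1, fastq_r2,
                       targets_bed, threads=4):
    """Assemble command lines for a minimal WES run (hypothetical file names)."""
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # bwa expects a literal backslash-t between read group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Quality control on the raw reads.
        ["fastqc", fastq_r1, fastq_r2],
        # Alignment; in practice the output is piped into the sort step.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         ref_fasta, fastq_r1, fastq_r2],
        # Coordinate-sort the alignments read from standard input.
        ["samtools", "sort", "-o", bam, "-"],
        ["samtools", "index", bam],
        # Variant calling restricted to the capture targets.
        ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", bam,
         "-O", vcf, "-L", targets_bed],
        # Annotation with a locally installed VEP cache.
        ["vep", "-i", vcf, "-o", f"{sample}.annotated.vcf",
         "--vcf", "--cache"],
    ]


commands = build_wes_commands(
    "sample1", "reference.fa",
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "targets.bed",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell-quoting problems if the commands are later run with `subprocess.run`, and restricting the caller to the target BED file avoids spending time on off-target reads.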