A flexible, open-access workflow to facilitate the identification of cell clones that have desirable CRISPR-Cas9-induced gene edits.
This only needs to be done once. Follow these detailed installation steps before running the analysis.
The active environment you are currently using is shown in parentheses () or brackets [] at the beginning of your command prompt, for example:
(geneditid) $
If it is not geneditid, activate it before running the next steps:
conda activate geneditid
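If you script the steps that follow, a guard can stop early when the wrong environment is active. This is a minimal sketch, assuming the environment is named geneditid as above; it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and check_geneditid_env is a hypothetical name, not part of GenEditID.

```shell
# Minimal sketch: check that the geneditid conda environment is active.
# `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment name.
# check_geneditid_env is a hypothetical helper, not part of GenEditID.
check_geneditid_env() {
  local wanted="${1:-geneditid}"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
    echo "ok: '$wanted' environment is active"
    return 0
  fi
  echo "warning: active environment is '${CONDA_DEFAULT_ENV:-none}'; run: conda activate $wanted" >&2
  return 1
}
```

For example, check_geneditid_env && ./scripts/start_webapp.sh only starts the web application when the expected environment is active.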
cd ~/GenEditID/
./scripts/start_webapp.sh
Open http://localhost:8080 in your web browser.
A GEPID folder has been created on your local disk in GenEditID/PROJECTS/ to store data files associated with the project (e.g. fastq sequencing files and all output files from amplicount and/or protein analysis). Replace GEPID with the identifier of the project (e.g. GEP00001) in all steps below.
Click on the 'GEPID' of the project created in the table of projects below and go to the Setup tab.
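Because GEPID has to be replaced in every path below, a small helper can build the project folder path from the identifier. This is a sketch assuming the default ~/GenEditID location and identifiers made of GEP followed by digits, as in the GEP00001 example; project_dir and the GENEDITID_HOME override are hypothetical names.

```shell
# Hypothetical helper: print the project folder for a GEPID identifier,
# assuming the default layout ~/GenEditID/PROJECTS/<GEPID>.
# GENEDITID_HOME is a hypothetical override for a non-default install location.
project_dir() {
  local gepid="$1"
  # Accept GEP followed by one or more digits (e.g. GEP00001); reject anything else.
  case "$gepid" in
    GEP*[!0-9]* | GEP) gepid="" ;;
    GEP*) ;;
    *) gepid="" ;;
  esac
  if [ -z "$gepid" ]; then
    echo "error: '$1' does not look like a project identifier (e.g. GEP00001)" >&2
    return 1
  fi
  echo "${GENEDITID_HOME:-$HOME/GenEditID}/PROJECTS/$gepid"
}
```

For example, cd "$(project_dir GEP00001)/fastq" moves into the fastq folder of project GEP00001.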
Save all fastq files related to your project into the fastq folder of the GEPID project folder. In the commands below, replace GEPID in GenEditID/PROJECTS/GEPID with the identifier of the project.
Combine paired-end reads by merging or joining them to generate .fqjoin.gz files for amplicount analysis. File names for paired-end reads should end in *.s_1.r_1.fq.gz and *.s_1.r_2.fq.gz.
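Before combining reads, it can help to confirm that every read-1 file has its read-2 mate under the naming convention above. This is a minimal sketch; check_fastq_pairs is a hypothetical name, and it does not report read-2 files that lack a read-1 partner.

```shell
# Minimal sketch: check that every read-1 file in a fastq folder has a read-2
# mate, using the *.s_1.r_1.fq.gz / *.s_1.r_2.fq.gz naming convention.
# Returns non-zero if no read-1 file is found or any mate is missing.
check_fastq_pairs() {
  local dir="${1:-.}" r1 r2 found=0 missing=0
  for r1 in "$dir"/*.s_1.r_1.fq.gz; do
    # If nothing matches, the pattern is left as-is and does not exist.
    [ -e "$r1" ] || continue
    found=$((found + 1))
    r2="${r1%.s_1.r_1.fq.gz}.s_1.r_2.fq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing mate: $(basename "$r2")" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$found read-1 file(s), $missing without a read-2 mate"
  [ "$found" -gt 0 ] && [ "$missing" -eq 0 ]
}
```

For example, run check_fastq_pairs ~/GenEditID/PROJECTS/GEPID/fastq (with GEPID replaced) before either of the read-combining scripts.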
To join reads (fastq-join needs to be installed):
cd ~/GenEditID/PROJECTS/GEPID/fastq
~/GenEditID/scripts/run_joinreads.sh
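Both read-combining options call an external tool that must already be installed. A minimal sketch to confirm a tool is on PATH before starting a script, using the POSIX command -v builtin; require_tools is a hypothetical name.

```shell
# Minimal sketch: report whether each named tool is on PATH.
# Returns non-zero if any tool is missing.
require_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "not found: $tool (install it before running the script)" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, require_tools fastq-join && ~/GenEditID/scripts/run_joinreads.sh, or require_tools seqkit && ~/GenEditID/scripts/run_mergereads.sh.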
To merge reads (seqkit needs to be installed):
cd ~/GenEditID/PROJECTS/GEPID/fastq
~/GenEditID/scripts/run_mergereads.sh
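Whichever option is used, a quick count can show whether combined files were produced. This sketch assumes one .fqjoin.gz file is written per read pair in the same folder; the exact output file names are not stated here, so it only compares counts, and count_combined is a hypothetical name.

```shell
# Minimal sketch: compare the number of combined *.fqjoin.gz files with the
# number of read pairs, assuming one combined file is written per pair.
count_combined() {
  local dir="${1:-.}" pairs joined
  # tr strips the padding that some wc implementations add.
  pairs=$(find "$dir" -maxdepth 1 -name '*.s_1.r_1.fq.gz' | wc -l | tr -d ' ')
  joined=$(find "$dir" -maxdepth 1 -name '*.fqjoin.gz' | wc -l | tr -d ' ')
  echo "$joined combined file(s) for $pairs read pair(s)"
  [ "$pairs" -gt 0 ] && [ "$joined" -eq "$pairs" ]
}
```

For example, run count_combined . from inside the fastq folder after the script finishes.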
amplicount analysis
Once the project submission spreadsheet (step 1) has been loaded and the combined paired-end reads have been generated (step 2), the uploaded amplicon sequences are used to automatically generate a GenEditID/PROJECTS/GEPID/amplicount_config.csv file that enables downstream analysis, as well as GenEditID/PROJECTS/GEPID/amplicount_config_tsearch.csv if a targeted search was submitted. Note that this requires association with a reference genome that is in the GenEditID/data/reference/ folder (see the detailed information about the reference genome files). This allows the amplicount tool to be run, which generates an amplicount.csv file. Sequences associated with each amplicon are counted and quality controlled to discard low-frequency and low-quality reads.
Replace GEPID with the identifier of the project.
cd ~/GenEditID/
source venv/bin/activate
cd PROJECTS/GEPID
geneditid_run_amplicount
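Since amplicount depends on the automatically generated configuration, a check that it exists in the current project folder can save a failed run. This is a minimal sketch using the file names given above; check_amplicount_config is a hypothetical name.

```shell
# Minimal sketch: confirm the auto-generated configuration file exists (and is
# non-empty) in a project folder before starting amplicount.
check_amplicount_config() {
  local dir="${1:-.}"
  if [ ! -s "$dir/amplicount_config.csv" ]; then
    echo "missing or empty: $dir/amplicount_config.csv" >&2
    return 1
  fi
  echo "found: amplicount_config.csv"
  # The targeted-search configuration only exists if a targeted search was submitted.
  if [ -s "$dir/amplicount_config_tsearch.csv" ]; then
    echo "found: amplicount_config_tsearch.csv"
  fi
}
```

For example, from the project folder: check_amplicount_config && geneditid_run_amplicount.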
Turn off sleep mode on your computer for analysis to run smoothly.
To change the abundance threshold used to discard low-frequency sequences (by default, it is set to 60 reads):
geneditid_run_amplicount --abundance=10
To change the quality threshold used to discard low-quality reads (by default, it is set to 10):
geneditid_run_amplicount --quality=5
geneditid_run_amplicount --reverse
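When both thresholds are changed, a small wrapper can keep the values explicit and save the run output to a log. This is a sketch under assumptions: it assumes --abundance and --quality can be passed together in one call, leaves out --reverse, and run_amplicount, DRY_RUN and amplicount_run.log are hypothetical names.

```shell
# Hypothetical wrapper around geneditid_run_amplicount: passes explicit
# abundance and quality thresholds (defaults 60 and 10, as stated above) and
# writes all output to amplicount_run.log (a hypothetical log name).
# Assumes both options can be combined in a single call.
# Set DRY_RUN=1 to print the command instead of running it.
run_amplicount() {
  local abundance="${1:-60}" quality="${2:-10}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "geneditid_run_amplicount --abundance=$abundance --quality=$quality"
    return 0
  fi
  geneditid_run_amplicount --abundance="$abundance" --quality="$quality" \
    > amplicount_run.log 2>&1
}
```

Running DRY_RUN=1 run_amplicount 10 5 first shows the exact command; run_amplicount 10 5 then runs it, and tail -f amplicount_run.log in a second terminal follows progress.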
View the results of the analysis in GenEditID/PROJECTS/GEPID/amplicount.csv and GenEditID/PROJECTS/GEPID/amplicount_tsearch.csv (if a targeted search was submitted).
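For a quick look at a result file from the terminal, a sketch that prints its header line and the number of data rows. It makes no assumption about the column names; summarise_csv is a hypothetical name.

```shell
# Minimal sketch: print the header line and the number of data rows of a CSV file.
summarise_csv() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  echo "columns: $(head -n 1 "$file")"
  # awk counts a final line even without a trailing newline; subtract the header.
  echo "rows: $(awk 'END { print NR - 1 }' "$file")"
}
```

For example, summarise_csv amplicount.csv from inside the project folder.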
If you wish to change the weighting score given to each consequence, edit the GenEditID/python/geneditid/consequences.csv file (keeping it in CSV format when you save it) before visualising your results.
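Because consequences.csv changes the scoring, keeping a copy of the original and checking the edited file's layout can help. This is a minimal sketch: the field check splits on commas only, so it assumes no quoted fields containing commas and will also flag blank lines; backup_file and check_csv_fields are hypothetical names.

```shell
# Minimal sketch: back up a file with a timestamped copy before editing it.
backup_file() {
  local file="$1" copy
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  copy="$file.$(date +%Y%m%d-%H%M%S).bak"
  cp -p "$file" "$copy"
  echo "$copy"
}

# Report lines whose comma-separated field count differs from the header;
# exit status is 1 if any line differs.
check_csv_fields() {
  awk -F, 'NR == 1 { n = NF }
           NF != n { print "line " NR ": " NF " fields, expected " n; bad = 1 }
           END { exit bad }' "$1"
}
```

For example, run backup_file consequences.csv before editing and check_csv_fields consequences.csv after saving.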
./scripts/start_webapp.sh
GenEditID/PROJECTS/GEPID/geneditid_plots/coverage.html
GenEditID/PROJECTS/GEPID/geneditid_plots/impacts.html
GenEditID/PROJECTS/GEPID/geneditid_plots/koscores.html
GenEditID/PROJECTS/GEPID/geneditid_plots/targeted_search.html
(if a targeted search was submitted)
Please make sure your question has not already been answered by reading the closed issues first.
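To see which of the plot files listed above exist for a project, a sketch that prints the path of each one found in its geneditid_plots folder. The file names come from this guide; list_plots is a hypothetical name.

```shell
# Minimal sketch: print the path of each GenEditID plot file found for a project.
list_plots() {
  local dir="${1:-.}/geneditid_plots" name
  for name in coverage impacts koscores targeted_search; do
    if [ -f "$dir/$name.html" ]; then
      echo "$dir/$name.html"
    fi
  done
}
```

For example, list_plots ~/GenEditID/PROJECTS/GEP00001 | while read -r f; do xdg-open "$f"; done opens each plot in the default browser on Linux (use open instead of xdg-open on macOS).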