A flexible, open-access workflow to facilitate the identification of cell clones that have desirable CRISPR-Cas9-induced gene edits.
This only needs to be done once. Follow these detailed installation steps before running the analysis.
The active environment you are currently using is shown in parentheses () or brackets [] at the beginning of your command prompt.
(geneditid) $
If not, activate it using conda activate geneditid before running the next steps.
cd ~/GenEditID/
./scripts/start_webapp.sh
Go to http://localhost:8080
A GEPID folder has been created on your local disk in GenEditID/PROJECTS/ to store data files associated with the project (e.g. fastq sequencing files and all output files from ampli_count and/or protein analysis). Replace GEPID with the identifier of the project (e.g. GEP00001) in all steps below.
Click on the ‘GEPID’ of the project created in the table of projects below and go to the Setup tab.
Save all fastq files related to your project into the fastq folder of the GEPID project folder. In the code, replace GEPID in GenEditID/PROJECTS/GEPID with the identifier of the project.
Combine paired-end reads by merging or joining reads to generate .fqjoin.gz files for ampli_count analysis. File formats for paired-end reads should end in *.s_1.r_1.fq.gz and *.s_1.r_2.fq.gz.
To join reads (fastq-join needs to be installed):
cd ~/GenEditID/PROJECTS/GEPID/fastq
~/GenEditID/scripts/run_joinreads.sh
To merge reads (seqkit needs to be installed):
cd ~/GenEditID/PROJECTS/GEPID/fastq
~/GenEditID/scripts/run_mergereads.sh
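Before joining or merging, it can help to confirm that every sample has both read files. The following is a minimal sketch, not part of GenEditID: the helper names `find_unpaired` and `check_fastq_dir` are hypothetical, and the only thing taken from the steps above is the expected file suffixes.

```python
from pathlib import Path

# Suffixes expected for paired-end read files, as stated in the step above.
R1_SUFFIX = ".s_1.r_1.fq.gz"
R2_SUFFIX = ".s_1.r_2.fq.gz"


def find_unpaired(filenames):
    """Return the sorted sample prefixes that have only one of the two read files."""
    r1 = {name[:-len(R1_SUFFIX)] for name in filenames if name.endswith(R1_SUFFIX)}
    r2 = {name[:-len(R2_SUFFIX)] for name in filenames if name.endswith(R2_SUFFIX)}
    # Symmetric difference: prefixes present in exactly one of the two sets.
    return sorted(r1 ^ r2)


def check_fastq_dir(fastq_dir):
    """Hypothetical helper: list unpaired sample prefixes found in a fastq folder."""
    names = [path.name for path in Path(fastq_dir).iterdir() if path.is_file()]
    return find_unpaired(names)
```

For example, `check_fastq_dir(Path.home() / "GenEditID/PROJECTS/GEPID/fastq")` (with GEPID replaced by the project identifier) returns an empty list when every sample has both files.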
amplicount analysis
The project submission spreadsheet (step 1) has been loaded and the combined paired-end reads generated (step 2). The uploaded amplicon sequences are used to automatically generate a GenEditID/PROJECTS/GEPID/amplicount_config.csv file that enables downstream analysis, as well as GenEditID/PROJECTS/GEPID/amplicount_config_tsearch.csv if a targeted search was submitted. Note that this requires association with a reference genome that is in the GenEditID/data/reference/ folder (detailed information about the reference genome files). The amplicount tool can then be run to generate an amplicount.csv file. Sequences associated with each amplicon are counted and quality controlled to discard low frequency and low quality reads.
Replace GEPID with the identifier of the project.
cd ~/GenEditID/
source venv/bin/activate
cd PROJECTS/GEPID
geneditid_run_amplicount
Turn off sleep mode on your computer for analysis to run smoothly.
geneditid_run_amplicount --abundance=10
By default, it is set to 60 reads.
geneditid_run_amplicount --quality=5
By default, it is set to 10.
geneditid_run_amplicount --reverse
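If the analysis is launched from a script rather than typed by hand, the options above can be assembled programmatically. This is a sketch only: the function names are hypothetical, and it assumes that the three options shown above can be combined in a single call, which the steps above do not state.

```python
import subprocess


def build_amplicount_command(abundance=None, quality=None, reverse=False):
    """Build the argument list for geneditid_run_amplicount.

    Options left as None are omitted, so the tool falls back to its own
    defaults (60 reads for abundance and 10 for quality, per the text above).
    """
    cmd = ["geneditid_run_amplicount"]
    if abundance is not None:
        cmd.append(f"--abundance={abundance}")
    if quality is not None:
        cmd.append(f"--quality={quality}")
    if reverse:
        cmd.append("--reverse")
    return cmd


def run_amplicount(project_dir, **options):
    """Hypothetical wrapper: run the tool from inside the project folder.

    Assumes the environment that provides geneditid_run_amplicount is active.
    """
    return subprocess.run(build_amplicount_command(**options), cwd=project_dir, check=True)
```

Building the argument list separately from running it makes it easy to print and review the exact command before a long analysis starts.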
View the output of the analysis in the result files GenEditID/PROJECTS/GEPID/amplicount.csv and GenEditID/PROJECTS/GEPID/amplicount_tsearch.csv (if a targeted search was submitted).
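A quick way to confirm that a result file was written and is not empty is to read its header and count its rows. The sketch below makes no assumption about the column names in amplicount.csv; the function names are hypothetical and only the Python standard library is used.

```python
import csv


def summarise_rows(lines):
    """Return (header, number_of_data_rows) for an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader, [])  # empty list if the input has no lines at all
    n_rows = sum(1 for _ in reader)
    return header, n_rows


def summarise_csv(path):
    """Hypothetical helper: summarise a result file such as amplicount.csv."""
    with open(path, newline="") as handle:
        return summarise_rows(handle)
```

A row count of zero would suggest that no reads passed the filters, or that the analysis did not finish.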
If you wish to change the weighting score given to each consequence, you can do so by editing the GenEditID/python/geneditid/consequences.csv file (save it as a csv file) before visualising your results.
./scripts/start_webapp.sh
GenEditID/PROJECTS/GEPID/geneditid_plots/coverage.html
GenEditID/PROJECTS/GEPID/geneditid_plots/impacts.html
GenEditID/PROJECTS/GEPID/geneditid_plots/koscores.html
GenEditID/PROJECTS/GEPID/geneditid_plots/targeted_search.html (if targeted search submitted)
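To check that the plot files listed above were generated, a short script can report any that are missing. This is a minimal sketch with a hypothetical function name; the file names are the ones listed above, and targeted_search.html is only expected when a targeted search was submitted.

```python
from pathlib import Path

# Plot files listed above; the last one exists only for targeted searches.
PLOT_FILES = ["coverage.html", "impacts.html", "koscores.html", "targeted_search.html"]


def missing_plots(project_dir, targeted_search=False):
    """Return the names of expected plot files not found in geneditid_plots."""
    plots_dir = Path(project_dir) / "geneditid_plots"
    expected = PLOT_FILES if targeted_search else PLOT_FILES[:-1]
    return [name for name in expected if not (plots_dir / name).is_file()]
```

An empty list means all expected plots are present in the project folder.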
To check that your question has not already been answered, please read the closed issues first.