RNASq Quick Start
Once PyCharm is installed and the basic RNASq Python files are in place, you’re ready to run your first analysis. The file exptparams.py is where you and your colleagues define every experiment that has been run, whether recent or earlier. Each entry specifies details about the experiment, including, critically, the sequences of the adapters you ligated onto your RNA.
The sequencing instrument reads from the 5′ adapter, through your RNA sequence, and into the 3′ adapter, so each sequence in the results contains all three parts. The first thing you will do is “trim” away the 5′ and 3′ adapters, leaving you with just the RNA sequences.
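To make the idea of trimming concrete, here is a minimal sketch in plain Python. It is not RNASq's own code: the function name trim_adapters and the example read are hypothetical, and it assumes exact adapter matches, whereas a real trimmer may tolerate mismatches. The adapter sequences are the ones that appear in the sample output later in this guide.

```python
def trim_adapters(read, adapter5, adapter3):
    """Return the insert between the two adapters, or None if either is missing.

    Illustrative only: uses exact string matching and takes the first
    occurrence of each adapter.
    """
    start = read.find(adapter5)
    if start == -1:
        return None  # no 5' adapter found
    insert_start = start + len(adapter5)
    end = read.find(adapter3, insert_start)
    if end == -1:
        return None  # no 3' adapter found after the 5' adapter
    return read[insert_start:end]


# Hypothetical read: 5' adapter + insert + 3' adapter
read = "TACTAT" + "GGCATCGGTAGAGG" + "TGGAA"
insert = trim_adapters(read, "TACTAT", "TGGAA")
print(insert)  # GGCATCGGTAGAGG
```

A read that lacks either adapter returns None here, which corresponds to the reads that trimming discards.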
So let’s say we have an experiment that we have set up as Expt['5N']. The command to import that data into a Python variable, Dset, would be:
Dset = Expt['5N'].import_dataset()
The resulting output would look like:
>Expt['5N'].import_dataset() # [Enz] = 0.50 uM, [DNA] = 2.00 uM, for 5.0 min at T=37.0 C
5N_S3_L001_R1_001.fastq.gz, Trgt=GGNNNNNGTAGAGGTGAAGATTTA (24mer) isTemplate=False
>> 36026 sequences imported
The output reports back some of the information that was initially supplied in exptparams.py, and tells you that the file contained 36,026 sequences. You would next take that data set and trim away the adapters:
Dset = Dset.trimAdaptors(None,None)
The output would look like:
Trim Adapters >> 24752 of 36026 (68.7%) returned
(29270) 5'-TACTAT-(26447)-TGGAA-3' (32379) 10228(28.4%) failed adapter criteria
804(2.2%) PrmDmr, 0(0.0%) 5'dup, 3309(9.2%) 3'dup
48(0.1%) A, 89(0.2%) C, 43(0.1%) G, 62(0.2%) T
The top line tells us that after trimming, we are left with 24,752 valid RNA sequences. In this case, ‘valid’ means the sequence met all three of the following criteria:
- the sequence contained both 5′ and 3′ adapters
- the sequence was not a primer dimer (5′ adapter ligated directly to a 3′ adapter)
- the sequence was not just a single base
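The three criteria above can be sketched as a small classifier. This is an illustration under stated assumptions, not the RNASq implementation: classify_read and the example reads are hypothetical, adapters are matched exactly, and "single base" is taken to mean that exactly one nucleotide sits between the adapters.

```python
from collections import Counter


def classify_read(read, adapter5, adapter3):
    """Assign a read to one category, following the three validity criteria."""
    start = read.find(adapter5)
    if start == -1:
        return "failed adapter criteria"
    insert_start = start + len(adapter5)
    end = read.find(adapter3, insert_start)
    if end == -1:
        return "failed adapter criteria"
    insert = read[insert_start:end]
    if insert == "":
        return "primer dimer"  # 5' adapter ligated directly to 3' adapter
    if len(insert) == 1:
        return "single base"   # assumption: one nucleotide between adapters
    return "valid"


# Hypothetical reads, one per category
reads = [
    "TACTATGGCATCGGTAGAGGTGGAA",  # both adapters around a real insert
    "TACTATTGGAA",                # adapters back to back
    "TACTATCTGGAA",               # a single C between the adapters
    "GGCATCGGTAGAGGTGGAA",        # 5' adapter missing
]
tally = Counter(classify_read(r, "TACTAT", "TGGAA") for r in reads)
for category, count in tally.items():
    print(f"{category}: {count}")
```

Each category in the tally corresponds to one kind of count in the trimming report.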
Looking more closely, we can see that 29,270 sequences had the (TACTAT) 5′ adapter sequence, 32,379 had the (TGGAA) 3′ adapter sequence, and 26,447 had both (but some of those were primer dimers or single bases). This information can be useful in debugging incorrectly specified parameters or ligation workups gone wrong.
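As a quick sanity check when debugging, the counts in the sample report can be reconciled by hand. The sketch below copies the numbers from the output above and assumes that the three rejection categories (failed adapter criteria, primer dimers, single bases) do not overlap. The guide does not state this explicitly, but the arithmetic is consistent with it. The 5'dup and 3'dup counts are left out because they are not among the validity criteria listed above.

```python
# Numbers copied from the sample trimming report above
total = 36026
returned = 24752
failed_adapter = 10228
primer_dimers = 804
single_base = {"A": 48, "C": 89, "G": 43, "T": 62}

# Assumption: the rejection categories are mutually exclusive
rejected = failed_adapter + primer_dimers + sum(single_base.values())

print(rejected)                    # 11274
print(total - rejected)            # 24752, matching the reported count
print(f"{returned / total:.1%}")   # 68.7%
```

If these numbers fail to reconcile on your own data, that is a hint to recheck the adapter sequences specified in exptparams.py.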