Set up exptparams - Martin Lab

Before (or while) doing an RNA Seq run, add your experiments to the exptparams.py file. This file contains a Python dictionary with an entry for every sequencing run in the Data directory.

Each entry holds some essential information, such as the name of the sequencing data file, along with many details about the experiment. Please include everything.

In the example below, three variables are defined for convenience. A number of separate experiments were sequenced on the same day and used the same reaction conditions and promoter sequence, so we defined 'einfo' once and then reused it across multiple experiment definitions. Similarly, all runs use the same 3' adapter/primer.


einfo = Exptinfo('8/8/16', 'Aruni', 0.5, 2.0, 5, 37, '', 'AATTAATACGACTCACTATA')
adptr3 = "TGGAA"
primer3 = 'TGGAATTCTCGGGTGCCAAGG'

Expts['5N'] = Seqsetup('5N_S3_L001_R1_001.fastq.gz', 'GGNNNNNGTAGAGGTGAAGATTTA',
       'TACTAT', adptr3, einfo, 'WT Enz, randomized N +3 to +7', False,
       {'Tmplt': 'TAAATCTTCACCTCTACNNNNNCCTATAGTGAGTCGTATTAATT',
        'NTmpl': 'AATTAATACGACTCACTATAGG',
        '5Prmr': 'GTTCAGAGTTCTACAGTCCGACGATCTACTAT',
        '3Prmr': primer3,
        'AlignSeq': 'AGAGG', 'Index1': 'GCCAAT'},
       {'Keywords': 'random, end',
        'Description': 'Upstream sequence allows loop back',
        'QCode': '5N',
        'PF': 6.9370,
        'Run Date': '11/14/2017'})

An explanation of the parameters:

5N_S3_L001_R1_001.fastq.gz - the full name of the Illumina data file
'GGNNNNNGTAGAGGTGAAGATTTA' - the expected/encoded RNA sequence (note that in this experiment, template bases encoding +3 to +7 were randomized).
'TACTAT' - the last six bases of the 5' adapter (this could instead be the last 5 or 7 bases; a shorter match risks false hits, while a longer one may miss some good reads)
adptr3 - a variable (see above) containing the first bases of the 3' adapter (five bases, "TGGAA", in this example)
einfo - experimental information (must be specified using the Exptinfo function call)
'WT Enz, randomized N +3 to +7' - a textual description of the experiment
False - set this to True only if this is not a transcription run (most typically it might be the sequencing of a DNA template)
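
The role of the two adapter fragments can be illustrated with a short sketch. It assumes the adapters are located by exact string matching, which may not be how the lab's analysis code actually works; the function extract_insert and the read below are made up for illustration.

```python
def extract_insert(read, adapter5_tail, adapter3_head):
    """Return the bases between the 5' adapter tail and the 3' adapter head.

    Hypothetical helper: returns None if either adapter fragment is missing.
    """
    start = read.find(adapter5_tail)
    if start == -1:
        return None
    start += len(adapter5_tail)
    end = read.find(adapter3_head, start)
    if end == -1:
        return None
    return read[start:end]


# Made-up read: 5' adapter tail + RNA-derived insert + 3' primer
insert = 'GGACGTAGTAGAGGTGAAGATTTA'
read = 'TACTAT' + insert + 'TGGAATTCTCGGGTGCCAAGG'
recovered = extract_insert(read, 'TACTAT', 'TGGAA')
print(recovered)

# Chance that a fixed k-base sequence matches at one random position is 1/4**k,
# which is why a shorter adapter fragment gives more false hits.
for k in (5, 6, 7):
    print(k, 4 ** -k)
```

A longer fragment lowers the chance of a random match but, under exact matching, also rejects any read with a sequencing error inside the fragment.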

The next definition is "multi-line" and extends from { to }. It is a Python dictionary that contains sequences relevant to this experiment. None of the entries is essential (you can simply omit any of them), but they are very valuable to include and, if present, 'AlignSeq' is used as a default by a number of functions. All sequences should be entered 5' to 3'.


{'Tmplt': 'TAAATCTTCACCTCTACNNNNNCCTATAGTGAGTCGTATTAATT',
 'NTmpl': 'AATTAATACGACTCACTATAGG',
 '5Prmr': 'GTTCAGAGTTCTACAGTCCGACGATCTACTAT',
 '3Prmr': primer3,
 'AlignSeq': 'AGAGG', 'Index1': 'GCCAAT'},

Tmplt - the DNA template strand used in transcription
NTmpl - the DNA nontemplate strand used in transcription
5Prmr - the 5' primer sequence (the end should match the 5' adapter sequence above)
3Prmr - the 3' primer sequence (the beginning should match the 3' adapter sequence)
AlignSeq - the default internal sequence used to align sequences. This is useful for 5' and 3' end heterogeneity analyses.
Index1 - someone remind me what this is!!!
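
Before adding an entry, it can help to check that its sequences agree with one another. The sketch below uses the values from the example above. The primer/adapter checks follow the descriptions in the list; the strand checks (NTmpl and the RNA against the reverse complement of Tmplt) are an assumption inferred from the example values, not a documented rule. The names revcomp and checks are made up for this illustration.

```python
# Complement table; N (randomized base) stays N
COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


adapter5_tail = 'TACTAT'
adptr3 = 'TGGAA'
rna = 'GGNNNNNGTAGAGGTGAAGATTTA'
seqs = {'Tmplt': 'TAAATCTTCACCTCTACNNNNNCCTATAGTGAGTCGTATTAATT',
        'NTmpl': 'AATTAATACGACTCACTATAGG',
        '5Prmr': 'GTTCAGAGTTCTACAGTCCGACGATCTACTAT',
        '3Prmr': 'TGGAATTCTCGGGTGCCAAGG',
        'AlignSeq': 'AGAGG'}

coding = revcomp(seqs['Tmplt'])  # same sense as the nontemplate strand
checks = {
    "5Prmr ends with the 5' adapter bases": seqs['5Prmr'].endswith(adapter5_tail),
    "3Prmr begins with the 3' adapter bases": seqs['3Prmr'].startswith(adptr3),
    "NTmpl matches the start of revcomp(Tmplt)": coding.startswith(seqs['NTmpl']),
    "revcomp(Tmplt) ends with the expected RNA": coding.endswith(rna),
    "AlignSeq occurs in the expected RNA": seqs['AlignSeq'] in rna,
}
for label, ok in checks.items():
    print('OK  ' if ok else 'FAIL', label)
```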

For these entries, please use exactly the names shown (to the left of the colon). Python is case-sensitive, so capitalization matters. You can also add other sequences here. Give them names that make sense to you. You can even reference them later in your programming.

The next specification is also a dictionary, and is a place to store other information about the experiment. QCode should match precisely the key you used in setting up Expts['5N'] above (here, '5N'). The other entries are self-explanatory (except for PF; I can't now remember what it is!). As above, you can add your own entries to this dictionary. Give them names that you'll remember.



{'Keywords': 'random, end',
 'Description': 'Upstream sequence allows loop back',
 'QCode': '5N',
 'PF': 6.9370,
 'Run Date': '11/14/2017'}
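
Because QCode has to match the Expts key exactly, a quick check can catch typos and capitalization slips. This is only a sketch: info_by_key is a hypothetical plain dictionary standing in for the information stored with each experiment, mismatched_qcodes is a made-up helper, and 'Gel Lane' is an invented example of a user-added entry.

```python
# Hypothetical stand-in: Expts key -> that experiment's information dictionary
info_by_key = {
    '5N': {'Keywords': 'random, end',
           'Description': 'Upstream sequence allows loop back',
           'QCode': '5N',
           'PF': 6.9370,
           'Run Date': '11/14/2017',
           'Gel Lane': 4},  # invented user-added entry
}


def mismatched_qcodes(info_by_key):
    """Return the keys whose 'QCode' entry is missing or differs from the key."""
    return [key for key, info in info_by_key.items()
            if info.get('QCode') != key]


problems = mismatched_qcodes(info_by_key)
print(problems if problems else 'All QCodes match their keys')
```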

====================================

For advanced users only:

In your Python programming, you can access the above dictionary entries as follows:

x = Dset.SeqsUsed['Tmplt']
x = Dset.dData['Description']

Remember that you can test whether an entry exists before using it:

if 'Tmplt' in Dset.SeqsUsed:
    x = Dset.SeqsUsed['Tmplt']
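
For optional entries, the standard dictionary method get() is an alternative to the membership test. In this sketch, Dset is a hypothetical stand-in object built with SimpleNamespace so the example runs on its own (the real data set object comes from the lab's code), and 'Index2' is an invented key used only to show the missing case.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a loaded data set object
Dset = SimpleNamespace(
    SeqsUsed={'Tmplt': 'TAAATCTTCACCTCTACNNNNNCCTATAGTGAGTCGTATTAATT',
              'AlignSeq': 'AGAGG'},
    dData={'Description': 'Upstream sequence allows loop back',
           'QCode': '5N'},
)

# Membership test, as in the text above
if 'Tmplt' in Dset.SeqsUsed:
    template = Dset.SeqsUsed['Tmplt']
else:
    template = None

# get() returns None (or a fallback you supply) when the key is absent
index2 = Dset.SeqsUsed.get('Index2')
keywords = Dset.dData.get('Keywords', '')

print(template, index2, repr(keywords))
```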