Periodicity plot


This tool applies the “periodicity plot” technique described in Mrázek, J. (2010) J. Bacteriol. 192, 3763-3772. See this paper for additional information and examples of application.


INPUT

The program reads a nucleotide sequence in the standard GenBank or FASTA format. It expects a single contiguous sequence in the input file. Sequence characters other than A, C, G, T (or U) are treated as N (unknown nucleotide).
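As an illustration of the character handling described above, here is a minimal Python sketch. It is not the program's own code. The function name normalize_sequence is hypothetical, and mapping U to T is an assumption; the text only states that U is accepted. Whitespace and line numbering are assumed to have been removed already.

```python
def normalize_sequence(raw: str) -> str:
    """Uppercase the sequence and replace every character other than
    A, C, G, T with N. U is mapped to T (an assumption, see lead-in)."""
    out = []
    for ch in raw.upper():
        if ch == "U":
            ch = "T"
        out.append(ch if ch in ("A", "C", "G", "T") else "N")
    return "".join(out)


example = normalize_sequence("acguRYn-t")
print(example)  # ACGTNNNNT
```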


DATA ANALYSIS

Step 1:

A spacing histogram $N(s)$ is created. $N(s)$ denotes the number of times a pair of the specified sequence motifs (see “Method” below) occurs at a distance $s$ from each other, measured between the first nucleotides of the two motifs.
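The counting in Step 1 can be sketched as follows. This is an illustrative Python version, not the program's implementation. It assumes that overlapping motif occurrences are all counted and that a pair may combine any two motifs from the selected set. The names motif_starts and spacing_histogram are hypothetical.

```python
def motif_starts(seq, motifs):
    """Positions (0-based) at which any of the motifs begins."""
    return [i for i in range(len(seq))
            if any(seq.startswith(m, i) for m in motifs)]


def spacing_histogram(seq, motifs, s_max):
    """Return a list hist where hist[s] is the number of motif pairs whose
    first nucleotides are s positions apart, for s = 1..s_max.
    hist[0] is unused and stays 0."""
    is_start = [False] * len(seq)
    for i in motif_starts(seq, motifs):
        is_start[i] = True
    hist = [0] * (s_max + 1)
    for i in range(len(seq)):
        if not is_start[i]:
            continue
        # Only look at partner positions that lie inside the sequence.
        for s in range(1, min(s_max, len(seq) - 1 - i) + 1):
            if is_start[i + s]:
                hist[s] += 1
    return hist


# "AA" starts at positions 0, 4 and 8: two pairs at s = 4, one at s = 8.
demo_hist = spacing_histogram("AACCAACCAA", ["AA"], 8)
print(demo_hist)  # [0, 0, 0, 0, 2, 0, 0, 0, 1]
```

The double loop costs on the order of L times s_max operations, which is acceptable for a sketch but slow for whole genomes.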

Step 2:

The values $N(s)$ are converted to odds ratios $N^*(s) = N(s)/E(s)$. $E(s)$ are the expected counts estimated as $E(s) = p^2 N_0(s)$, where $N_0(s)$ signifies the number of times a pair of any nucleotides A, C, G, or T is found at the distance $s$ from each other. Note that under normal circumstances $N_0(s) \approx L - s$ ($L$ being the length of the analyzed sequence). $p$ is the probability of finding the selected pattern at any given position in the sequence. For example, $p = f$ for the “AT” method, $p = f^2/2$ for the “A2T2” method, and $p = 5f^4/16$ for the “AT4” method, where $f$ is the A+T content of the sequence at hand.
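A hedged Python sketch of Step 2 follows. It assumes that A and T are equally frequent (each with frequency f/2, where f is the A+T content) and that positions are independent. It also assumes that the expected count is p squared times the number of valid position pairs, approximated by L - s. The function names are hypothetical, and the program itself may compute these quantities differently.

```python
import re


def pattern_probability(method, f):
    """Probability p of finding the selected pattern at a given position,
    assuming A and T each occur with frequency f/2 (an assumption)."""
    half = f / 2.0
    if method == "AT":                       # A or T
        return f
    m = re.fullmatch(r"A(\d)T(\d)", method)  # e.g. A2T2: AA or TT
    if m and m.group(1) == m.group(2):
        return 2 * half ** int(m.group(1))
    m = re.fullmatch(r"AT(\d)", method)      # e.g. AT4: k+1 patterns of length k
    if m:
        k = int(m.group(1))
        return (k + 1) * half ** k
    raise ValueError("unknown method: " + method)


def odds_ratios(hist, p, length):
    """Observed over expected counts, with the expected count taken as
    p^2 * (length - s). Index 0 is unused and set to 0.0."""
    return [0.0] + [hist[s] / (p * p * (length - s))
                    for s in range(1, len(hist))]


demo_p = pattern_probability("AT4", 0.6)            # 5 * 0.3**4
demo_ratios = odds_ratios([0, 0, 0, 0, 2], 0.5, 10)  # E(4) = 0.25 * 6
print(demo_p, demo_ratios[4])
```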

Step 3:

The 3-bp periodic signal arising from biased codon usage in genes is subsequently removed from the odds ratios $N^*(s)$ with a 3-bp sliding window average, yielding $N_1(s) = [N^*(s-1) + N^*(s) + N^*(s+1)]/3$.
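The effect of the 3-bp window can be seen in a short Python sketch. It assumes a centered window and leaves the two end points unchanged; both choices are assumptions, and the name smooth3 is hypothetical.

```python
def smooth3(values):
    """Centered 3-point moving average. The first and last values have no
    complete window and are copied unchanged (an assumption)."""
    out = list(values)
    for s in range(1, len(values) - 1):
        out[s] = (values[s - 1] + values[s] + values[s + 1]) / 3.0
    return out


# A pure period-3 signal is flattened to its mean in the interior.
demo_smooth = smooth3([1, 2, 3, 1, 2, 3, 1, 2, 3])
print(demo_smooth[1:-1])  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

Any window of three consecutive values covers exactly one full 3-bp period, so a period-3 component averages out while slower periodicities (such as 10-11 bp) are only mildly attenuated.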

Step 4:

In some genomes, the plot has a strong decreasing slope resulting from local variation in the A+T content. This slope is eliminated by subtracting a parabolic regression from the histogram, yielding $N_2(s) = N_1(s) - (As^2 + Bs + C)$, where the parameters $A$, $B$, and $C$ define the parabola fitted to the smoothed histogram $N_1(s)$ by the least squares method.
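A parabolic least-squares fit and its subtraction can be sketched in plain Python as below. This is one standard way to do it (solving the 3x3 normal equations), shown under the assumption that the fit uses all points passed in; the program's own numerical method is not documented here. The names solve3, fit_parabola, and detrend are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def fit_parabola(xs, ys):
    """Least-squares coefficients (A, B, C) of y = A*x^2 + B*x + C."""
    sx = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^k
    sy = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[sx[4], sx[3], sx[2]],
              [sx[3], sx[2], sx[1]],
              [sx[2], sx[1], sx[0]]]
    return solve3(normal, [sy[2], sy[1], sy[0]])


def detrend(xs, ys):
    """Subtract the fitted parabola from ys."""
    a, b, c = fit_parabola(xs, ys)
    return [y - (a * x * x + b * x + c) for x, y in zip(xs, ys)]


demo_xs = list(range(1, 11))
demo_ys = [0.5 * x * x - 2 * x + 3 for x in demo_xs]
demo_coeffs = fit_parabola(demo_xs, demo_ys)   # close to (0.5, -2, 3)
demo_resid = detrend(demo_xs, demo_ys)         # close to all zeros
```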

Step 5:

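A section of the detrended plot $N_2(s)$ between the values $s_{\min}$ and $s_{\max}$ is converted to a power spectrum by Fourier transform. The power spectrum $Q(P)$ measures the strength of a periodic signal corresponding to the period $P$. It is defined as $Q(P) = \left| \sum_{s=s_{\min}}^{s_{\max}} N_2(s)\, e^{2\pi i s/P} \right|^2$, where $i$ is the imaginary unit.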
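The following Python sketch evaluates a power spectrum directly at a list of chosen periods. It assumes the spectrum is the squared modulus of the sum of the detrended values multiplied by exp(2*pi*i*s/P) over the selected spacing range, with no additional scaling factor. The set of periods to evaluate is also an assumption, and the name power_spectrum is hypothetical.

```python
import cmath
import math


def power_spectrum(values, s_min, s_max, periods):
    """Return {P: |sum_{s=s_min..s_max} values[s] * exp(2*pi*i*s/P)|^2}.
    values is indexed by the spacing s."""
    spectrum = {}
    for period in periods:
        total = sum(values[s] * cmath.exp(2j * math.pi * s / period)
                    for s in range(s_min, s_max + 1))
        spectrum[period] = abs(total) ** 2
    return spectrum


# A cosine with a 10-bp period, sampled over 100 spacings (10 full periods).
demo_signal = [math.cos(2 * math.pi * s / 10) for s in range(130)]
demo_spectrum = power_spectrum(demo_signal, 30, 129, [5, 10])
print(demo_spectrum)
```

In this example the power at P = 10 is (100/2)^2 = 2500 up to rounding, while the power at P = 5 is essentially zero, because the window covers a whole number of periods.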

Step 6:

To allow comparisons among sequences of different properties, the power spectrum $Q(P)$ is subsequently normalized to average 1 over a desired range of periods (5-20 bp), yielding $Q^*(P)$ (Figure 1b). We refer to the function $Q^*(P)$ as the “periodicity plot”.
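The normalization can be sketched in Python as dividing every spectrum value by the mean taken over the periods that fall in the 5-20 bp range. Averaging over the sampled periods (rather than, for example, integrating over P) is an assumption, and the name normalize_spectrum is hypothetical.

```python
def normalize_spectrum(spectrum, p_low=5.0, p_high=20.0):
    """Scale a {period: power} mapping so that its mean over the periods
    within [p_low, p_high] equals 1."""
    in_range = [q for p, q in spectrum.items() if p_low <= p <= p_high]
    mean_q = sum(in_range) / len(in_range)
    return {p: q / mean_q for p, q in spectrum.items()}


# The mean over 5, 10 and 20 bp is 3.0; the value at 25 bp lies outside
# the range but is scaled by the same factor.
demo_norm = normalize_spectrum({5.0: 1.0, 10.0: 5.0, 20.0: 3.0, 25.0: 9.0})
print(demo_norm)
```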


USER-DEFINED PARAMETERS


Method

Specifies the sequence motifs whose spacing is analyzed. Available options are:

“AT” A or T

“A2T2” AA or TT

“A3T3” AAA or TTT

“A4T4” AAAA or TTTT

“A5T5” AAAAA or TTTTT

“AT2” AA, AT or TT

“AT3” AAA, AAT, ATT or TTT

“AT4” AAAA, AAAT, AATT, ATTT or TTTT

“AT5” AAAAA, AAAAT, AAATT, AATTT, ATTTT or TTTTT

“AT6” AAAAAA, AAAAAT, AAAATT, AAATTT, AATTTT, ATTTTT or TTTTTT
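The option names follow a regular pattern, which the following Python sketch uses to generate the motif sets listed above. It is an illustration only; the function name motifs_for is hypothetical and the program may store these lists explicitly.

```python
import re


def motifs_for(method):
    """Return the list of motifs for a method name such as
    "AT", "A3T3" or "AT4"."""
    if method == "AT":
        return ["A", "T"]
    m = re.fullmatch(r"A(\d)T(\d)", method)
    if m and m.group(1) == m.group(2):
        k = int(m.group(1))
        return ["A" * k, "T" * k]            # runs of k A's or k T's
    m = re.fullmatch(r"AT(\d)", method)
    if m:
        k = int(m.group(1))
        # A run of A's followed by a run of T's, total length k.
        return ["A" * (k - j) + "T" * j for j in range(k + 1)]
    raise ValueError("unknown method: " + method)


print(motifs_for("AT4"))  # ['AAAA', 'AAAT', 'AATT', 'ATTT', 'TTTT']
```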


Spacing range

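The lower and upper limits of the spacing $s$, $s_{\min}$ and $s_{\max}$, which delimit the section of the plot that is converted to a power spectrum (see Step 5 above).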


OUTPUT


*.perplot.ps

Graphical output in PostScript format (can be converted to PDF). Includes three of the plots defined in the data analysis steps (see above).

*.perplot

Same data as in *.perplot.ps, but as a tab-delimited table.

*.perplot-indices

Plain text file that shows the MaxQ and PMaxQ indices. MaxQ is the maximum value of the normalized power spectrum $Q^*(P)$ over the given range of periods $P$ (5-20 bp). PMaxQ is the period $P$ that yielded MaxQ.
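Given a normalized spectrum stored as a mapping from period to value, the two indices could be extracted as in this Python sketch. The data layout and the function name max_q_indices are assumptions made for illustration.

```python
def max_q_indices(normalized, p_low=5.0, p_high=20.0):
    """Return (MaxQ, PMaxQ): the largest normalized spectrum value within
    [p_low, p_high] and the period at which it occurs."""
    candidates = [p for p in normalized if p_low <= p <= p_high]
    best_period = max(candidates, key=lambda p: normalized[p])
    return normalized[best_period], best_period


# The value at 30 bp is larger but lies outside the 5-20 bp range.
demo_indices = max_q_indices({5.0: 0.4, 10.5: 2.6, 20.0: 0.9, 30.0: 7.0})
print(demo_indices)  # (2.6, 10.5)
```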

log.txt

The console output from running the program. Check it for any error and/or warning messages.