**Periodicity plot**

**This tool applies the “periodicity plot” technique described in Mrázek, J. (2010) J. Bacteriol. 192, 3763-3772. See this paper for additional information and examples of application.**

INPUT

The program reads a nucleotide sequence in the standard GenBank or FASTA format. It expects a single contiguous sequence in the input file. Sequence characters other than A, C, G, T (or U) are treated as N (unknown nucleotide).
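The character handling described above can be sketched as follows. This is a minimal illustration, not the program's own parser: the function names are hypothetical, only FASTA input is covered, and folding lower case to upper case and mapping U to T are assumptions.

```python
def normalize(seq):
    """Upper-case the sequence, map U to T, and replace anything else by N."""
    out = []
    for ch in seq.upper():
        if ch == "U":          # assumption: RNA input is treated like DNA
            ch = "T"
        out.append(ch if ch in "ACGT" else "N")
    return "".join(out)


def read_single_fasta(text):
    """Return the normalized sequence from FASTA text holding one record."""
    lines = text.splitlines()
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) != 1:
        raise ValueError("expected exactly one sequence in the input")
    residues = "".join(
        ln.strip() for ln in lines if ln and not ln.startswith(">")
    )
    return normalize(residues)
```

For example, `read_single_fasta(">x\nacgu\nRYAN\n")` returns `"ACGTNNAN"`.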

DATA ANALYSIS

Step 1:

A spacing histogram N(s) is created. N(s) denotes the number of times a pair of specified sequence motifs (see “Method” below) occurs at the distance s from each other (distance between the first nucleotides of each motif).
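A minimal sketch of the histogram, assuming that any motif of the selected set may occupy either position of a pair and that distances are counted between motif start positions. The function names are hypothetical and the quadratic-time loop is for illustration only.

```python
def motif_starts(seq, motifs):
    """Return a 0/1 list marking the positions where any motif begins."""
    hits = [0] * len(seq)
    for i in range(len(seq)):
        if any(seq.startswith(m, i) for m in motifs):
            hits[i] = 1
    return hits


def spacing_histogram(seq, motifs, s_limit):
    """N(s) for s = 1..s_limit; index 0 of the returned list is unused."""
    hits = motif_starts(seq, motifs)
    counts = [0] * (s_limit + 1)
    for s in range(1, s_limit + 1):
        # Count position pairs (i, i + s) that both start a motif.
        counts[s] = sum(hits[i] * hits[i + s] for i in range(len(hits) - s))
    return counts
```

For the sequence `"AATTAA"` with motifs `("AA", "TT")`, motifs start at positions 0, 2 and 4, so `spacing_histogram("AATTAA", ("AA", "TT"), 4)` gives `[0, 0, 2, 0, 1]`.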

Step 2:

The values $N(s)$ are converted to odds ratios $N_1(s) = N(s)/E(s)$. $E(s)$ are the expected counts estimated as $E(s) = p^2 N_{\mathrm{tot}}(s)$, where $N_{\mathrm{tot}}(s)$ signifies the number of times a pair of any nucleotides A, C, G, or T is found at the distance *s* from each other. Note that under normal circumstances $N_{\mathrm{tot}}(s) = L - s$ (*L* being the length of the analyzed sequence). *p* is the probability of finding the selected pattern at any given position in the sequence. For example, taking A and T as equally frequent, $p = f$ for the “AT” method, $p = 2(f/2)^2$ for the “A2T2” method, and $p = 5(f/2)^4$ for the “AT4” method, where $f$ is the A+T content of the sequence at hand.
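The conversion to odds ratios might look like the sketch below. It assumes that the expected count is the squared pattern probability times the number of position pairs with two known nucleotides, and that A and T are equally frequent, so that a set of `n_motifs` motifs of length `motif_len` has probability `n_motifs * (f / 2) ** motif_len`. The function name and signature are hypothetical.

```python
def odds_ratios(seq, counts, n_motifs, motif_len):
    """Divide each N(s) in counts by its expected value E(s)."""
    known = [1 if ch in "ACGT" else 0 for ch in seq]
    # f: A+T content among the known nucleotides
    f = (seq.count("A") + seq.count("T")) / sum(known)
    # p: probability of finding any motif of the set at a given position
    p = n_motifs * (f / 2) ** motif_len
    ratios = [0.0] * len(counts)
    for s in range(1, len(counts)):
        # Pairs of known nucleotides at distance s (L - s when there are no Ns)
        pairs = sum(known[i] * known[i + s] for i in range(len(seq) - s))
        expected = p * p * pairs
        ratios[s] = counts[s] / expected if expected > 0 else 0.0
    return ratios
```

With `seq = "AATTAA"` and the “A2T2” counts `[0, 0, 2, 0, 1]`, the A+T content is 1, the pattern probability is 0.5, and the result is `[0.0, 0.0, 2.0, 0.0, 2.0]`.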

Step 3:

The 3-bp periodic signal arising from biased codon usage in genes is subsequently removed with a 3-bp sliding window average, yielding $N_2(s) = [N_1(s-1) + N_1(s) + N_1(s+1)]/3$, where $N_1(s)$ denotes the odds ratios from Step 2.
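A sketch of the smoothing step, assuming a centered window. How the program treats the two end points is not stated here; leaving them unchanged is an assumption, and the function name is hypothetical.

```python
def smooth3(values):
    """Centered 3-point moving average; the two end points are left unchanged."""
    out = list(values)
    for s in range(1, len(values) - 1):
        out[s] = (values[s - 1] + values[s] + values[s + 1]) / 3
    return out
```

A pure 3-bp signal is flattened completely in the interior: `smooth3([3.0, 0.0, 0.0, 3.0, 0.0, 0.0, 3.0])` gives `[3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0]`.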

Step 4:

In some genomes, the plot has a strong decreasing slope resulting from local variance in the A+T content. This slope is eliminated by subtracting a parabolic regression from the histogram, yielding $N_3(s) = N_2(s) - (As^2 + Bs + C)$, where the parameters A, B, and C define the parabola fitted to $N_2(s)$ (the smoothed histogram from Step 3) by the least squares method.
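The parabolic detrending can be sketched in plain Python by solving the least-squares normal equations with Cramer's rule. The function names are hypothetical, and the range of *s* over which the program fits the parabola is not specified here, so the caller chooses it; a library least-squares routine would be the more robust choice for long ranges.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares A, B, C for y ~ A*x**2 + B*x + C via the normal equations."""
    sx = [sum(x ** k for x in xs) for k in range(5)]        # sums of x^0 .. x^4
    sy = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[sx[4], sx[3], sx[2]],
         [sx[3], sx[2], sx[1]],
         [sx[2], sx[1], sx[0]]]
    v = [sy[2], sy[1], sy[0]]
    d = det3(m)
    coeffs = []
    for k in range(3):                                      # Cramer's rule
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = v[r]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


def detrend(xs, ys):
    """Subtract the fitted parabola from ys and return the residuals."""
    a, b, c = fit_parabola(xs, ys)
    return [y - (a * x * x + b * x + c) for x, y in zip(xs, ys)]
```

Data that lie exactly on a parabola are reduced to zero (up to rounding), so only deviations from the slow trend remain for the Fourier step.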

Step 5:

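A section of the detrended plot $N_3(s)$ between values $s_{\min}$ and $s_{\max}$ is converted to a power spectrum $Q(P)$ by Fourier transform. The power spectrum measures the strength of a periodic signal corresponding to the period $P$. It is defined as $Q(P) = \left| \sum_{s=s_{\min}}^{s_{\max}} N_3(s)\, e^{2\pi i s / P} \right|^2$, where $i$ is the imaginary unit.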
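A sketch of the power spectrum at a single period, assuming it is the squared modulus of the Fourier sum over the chosen spacing range with no additional scaling factor. The function name is hypothetical, and `values` is assumed to be indexed directly by *s*.

```python
import cmath


def power_spectrum(values, s_min, s_max, period):
    """Squared modulus of the Fourier sum of values[s] for s_min <= s <= s_max."""
    total = sum(
        values[s] * cmath.exp(2j * cmath.pi * s / period)
        for s in range(s_min, s_max + 1)
    )
    return abs(total) ** 2
```

For a pure cosine of period 10 bp sampled at s = 1..100, the sum at P = 10 equals 50, so the power is 2500, while other periods such as P = 7 give much smaller values.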

Step 6:

To allow comparisons among sequences of different properties, the power spectrum is subsequently normalized to average 1 over a desired range of periods (5-20 bp), yielding $Q^*(P)$ (Figure 1b). We refer to the function $Q^*(P)$ as the “periodicity plot”.
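The normalization amounts to dividing every power-spectrum value by the mean over the evaluated periods. In the sketch below the function names are hypothetical and the 0.1-bp period step is an assumption; the resolution used by the program is not stated here.

```python
def period_grid(p_min=5.0, p_max=20.0, step=0.1):
    """Evenly spaced periods from p_min to p_max (the step is an assumption)."""
    n = int(round((p_max - p_min) / step))
    return [p_min + k * step for k in range(n + 1)]


def normalize_to_unit_mean(q_values):
    """Scale the power spectrum so that its average over the grid equals 1."""
    mean_q = sum(q_values) / len(q_values)
    return [q / mean_q for q in q_values]
```

For example, `normalize_to_unit_mean([1.0, 2.0, 3.0])` gives `[0.5, 1.0, 1.5]`.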

USER-DEFINED PARAMETERS

Method

Specifies the sequence motifs whose spacing is analyzed. Available options are:

“AT” A or T

“A2T2” AA or TT

“A3T3” AAA or TTT

“A4T4” AAAA or TTTT

“A5T5” AAAAA or TTTTT

“AT2” AA, AT or TT

“AT3” AAA, AAT, ATT or TTT

“AT4” AAAA, AAAT, AATT, ATTT or TTTT

“AT5” AAAAA, AAAAT, AAATT, AATTT, ATTTT or TTTTT

“AT6” AAAAAA, AAAAAT, AAAATT, AAATTT, AATTTT, ATTTTT or TTTTTT
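The naming scheme above is regular enough to expand programmatically. The helper below is a hypothetical illustration that reproduces the listed motif sets; it does not validate names outside this list.

```python
def motifs_for(method):
    """Expand a method name such as "AT", "A3T3" or "AT4" into its motifs."""
    if method == "AT":
        return ["A", "T"]
    if method.startswith("AT"):
        # "ATk": a run of A followed by a run of T, total length k
        k = int(method[2:])
        return ["A" * (k - j) + "T" * j for j in range(k + 1)]
    # "AkTk": a homopolymer run of k A's or k T's
    k = int(method[1:method.index("T")])
    return ["A" * k, "T" * k]
```

For example, `motifs_for("AT4")` returns `["AAAA", "AAAT", "AATT", "ATTT", "TTTT"]` and `motifs_for("A3T3")` returns `["AAA", "TTT"]`.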

Spacing range

The values $s_{\min}$ and $s_{\max}$ (see Step 5 above).

OUTPUT

*.perplot.ps

Graphical output in PostScript format (can be converted to PDF). Includes three of the plots defined under DATA ANALYSIS above.

*.perplot

Same data as in *.perplot.ps but in a tab-delimited table.

*.perplot-indices

Plain text file, which shows the MaxQ and PMaxQ indices. MaxQ is the maximum value of $Q^*(P)$ over the given range of periods *P* (5-20 bp). PMaxQ is the period *P* that yielded MaxQ.
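Given the periods and the corresponding normalized power-spectrum values, the two indices could be extracted as in this sketch. The function name is hypothetical, and ties are resolved in favor of the shortest period, which is an assumption.

```python
def max_q_indices(periods, q_star):
    """Return (MaxQ, PMaxQ): the largest value and the period where it occurs."""
    best = max(range(len(q_star)), key=q_star.__getitem__)
    return q_star[best], periods[best]
```

For example, `max_q_indices([5.0, 10.0, 15.0], [0.5, 2.0, 0.5])` returns `(2.0, 10.0)`.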

log.txt

The console output from running the program. Check it for any error and/or warning messages.