Periodicity scan


This tool applies the “periodicity scan” technique described in Mrázek, J. (2010) J. Bacteriol. 192, 3763-3772. See this paper for additional information and examples of applications.


INPUT

The program reads a nucleotide sequence in the standard GenBank or FASTA format. It expects a single contiguous sequence in the input file. All letters other than A, C, G, T (or U) in the DNA sequence are treated as N (unknown nucleotide).
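The input handling described above can be sketched as a small normalization step. This is a minimal sketch, not the program's code: it assumes that letters are compared case-insensitively, that U is read as T, and that whitespace and digits (such as the position numbers in a GenBank ORIGIN block) are simply skipped. The function name normalize_sequence is hypothetical.

```python
def normalize_sequence(raw: str) -> str:
    """Map a raw sequence string onto the alphabet A, C, G, T, N.

    Assumptions (not confirmed by the program): case-insensitive letters,
    U read as T, whitespace and digits skipped as layout characters.
    """
    out = []
    for ch in raw.upper():
        if ch.isspace() or ch.isdigit():
            continue  # layout characters, e.g. GenBank ORIGIN line numbers
        if ch == "U":
            out.append("T")  # RNA input: U is treated as T
        elif ch in "ACGT":
            out.append(ch)
        else:
            out.append("N")  # any other character (R, Y, K, ...) becomes N
    return "".join(out)


print(normalize_sequence("acgu RYk\n 61 gatc"))  # ACGTNNNGATC
```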


DATA ANALYSIS

Periodicity scan applies the periodicity plot technique in a sliding-window mode. A window of a given size is moved along the analyzed sequence, and a periodicity plot (the function Q*(P)) is generated for the section of the analyzed sequence within the current window. In the final display, the horizontal axis shows the position of the window center in the analyzed sequence, the vertical axis shows the period P, and the Q* value is shown as a level of gray: the darker the spot, the stronger the periodic signal near the given position in the analyzed sequence and at the given period P. See the periodicity plot description for more details.
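The sliding-window bookkeeping can be sketched as follows. This sketch assumes that windows start at the first base, advance by the step, and are kept only when they fit entirely inside the sequence; per_window is a hypothetical placeholder for the periodicity plot computation, which is not reproduced here.

```python
from typing import Callable, List, Tuple


def sliding_windows(seq_len: int, window: int, step: int) -> List[Tuple[int, int, float]]:
    """Return (start, end, center) for every window; 0-based, end exclusive.

    Assumption: only windows lying entirely inside the sequence are kept.
    """
    result = []
    start = 0
    while start + window <= seq_len:
        center = start + (window - 1) / 2  # 0-based position of the window center
        result.append((start, start + window, center))
        start += step
    return result


def scan(seq: str, window: int, step: int,
         per_window: Callable[[str], object]) -> List[Tuple[float, object]]:
    """Apply per_window to each window and key the result by the window center,
    which becomes the horizontal coordinate of the final display."""
    return [(center, per_window(seq[s:e]))
            for s, e, center in sliding_windows(len(seq), window, step)]


print(sliding_windows(10, 4, 3))  # [(0, 4, 1.5), (3, 7, 4.5), (6, 10, 7.5)]
```

For example, scan("ACGTACGTAC", 4, 3, lambda w: w.count("A")) pairs each window center with a toy per-window value; in the real tool that value would be the periodicity plot of the window.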


USER-DEFINED PARAMETERS


Method (same as in periodicity plot)

Specifies the sequence motifs whose spacing is analyzed. Available options are:

“AT” A or T

“A2T2” AA or TT

“A3T3” AAA or TTT

“A4T4” AAAA or TTTT

“A5T5” AAAAA or TTTTT

“AT2” AA, AT or TT

“AT3” AAA, AAT, ATT or TTT

“AT4” AAAA, AAAT, AATT, ATTT or TTTT

“AT5” AAAAA, AAAAT, AAATT, AATTT, ATTTT or TTTTT

“AT6” AAAAAA, AAAAAT, AAAATT, AAATTT, AATTTT, ATTTTT or TTTTTT
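The motif families listed above follow two regular patterns: "AkTk" means only the homopolymers of length k, while "ATk" means every run of A followed by a run of T with total length k. A minimal sketch of expanding a Method name and locating its motifs; the function names are hypothetical, and counting overlapping occurrences is an assumption about how the program locates motifs.

```python
import re
from typing import List


def motif_set(method: str) -> List[str]:
    """Expand a Method name into its motifs.

    "AT": A or T.  "AkTk" (e.g. "A3T3"): A*k or T*k only.
    "ATk" (e.g. "AT4"): A*(k-j) + T*j for j = 0..k, i.e. AAAA, AAAT, ..., TTTT.
    """
    if method == "AT":
        return ["A", "T"]
    m = re.fullmatch(r"A(\d)T\1", method)
    if m:
        k = int(m.group(1))
        return ["A" * k, "T" * k]
    m = re.fullmatch(r"AT(\d)", method)
    if m:
        k = int(m.group(1))
        return ["A" * (k - j) + "T" * j for j in range(k + 1)]
    raise ValueError(f"unknown method: {method}")


def motif_positions(seq: str, motifs: List[str]) -> List[int]:
    """0-based start positions of all motif occurrences (overlaps counted; an assumption)."""
    hits = set()
    for motif in motifs:
        for i in range(len(seq) - len(motif) + 1):
            if seq.startswith(motif, i):
                hits.add(i)
    return sorted(hits)


print(motif_set("AT4"))  # ['AAAA', 'AAAT', 'AATT', 'ATTT', 'TTTT']
print(motif_positions("AAATTTCAAT", motif_set("AT3")))  # [0, 1, 2, 3, 7]
```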


Spacing range (same as in periodicity plot)

The minimum and maximum spacing between motifs considered in the analysis (see the periodicity plot description).
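What a spacing range restricts can be illustrated by tallying the distances between motif occurrences that fall inside it. This is a generic illustration only, not the Q* statistic of the periodicity plot; d_min and d_max are hypothetical names for the two spacing-range parameters, and the positions are assumed to be motif start positions.

```python
from collections import Counter
from typing import List


def spacing_counts(positions: List[int], d_min: int, d_max: int) -> Counter:
    """Count distances between pairs of motif positions with d_min <= d <= d_max."""
    counts = Counter()
    pos = sorted(positions)
    for i, p in enumerate(pos):
        for q in pos[i + 1:]:
            d = q - p
            if d > d_max:
                break  # positions are sorted, so later ones are even farther away
            if d >= d_min:
                counts[d] += 1
    return counts


print(sorted(spacing_counts([0, 10, 21, 30], 10, 21).items()))
# [(10, 1), (11, 1), (20, 1), (21, 1)]
```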


Window size and step

The size of the sliding window and the step by which the window is moved. Note that the online version of the program limits the number of windows analyzed in a single run to 5000. In other words, the analyzed sequence cannot be longer than 5000 × step.
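The limit reduces to simple arithmetic: for a given step the maximum length is 5000 × step, and conversely a sequence of length L needs a step of at least L/5000, rounded up. A minimal sketch with hypothetical function names; the 4.6 Mb length in the example is only an illustrative value.

```python
MAX_WINDOWS = 5000  # per-run window limit of the online version (from this documentation)


def max_sequence_length(step: int) -> int:
    """Longest sequence accepted for a given step: 5000 x step."""
    return MAX_WINDOWS * step


def min_step(seq_len: int) -> int:
    """Smallest integer step that keeps seq_len within the limit (ceiling division)."""
    return -(-seq_len // MAX_WINDOWS)


print(max_sequence_length(500))  # 2500000
print(min_step(4_600_000))       # 920
```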


Black/white

These parameters determine how the Q* values are converted into shades of grey for plotting. Values at or above the Black threshold are displayed in black, values at or below the White threshold are displayed in white, and values in between are displayed in gradually changing shades of grey.
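One way to implement such a conversion is a linear ramp between the two thresholds, expressed as a PostScript gray level (1.0 is white and 0.0 is black, as used by the setgray operator). This is a sketch under assumptions: the interpolation is taken to be linear, white is assumed to be lower than black (stronger signal is darker), and the parameter names are hypothetical.

```python
def gray_level(value: float, white: float, black: float) -> float:
    """Map a Q* value to a PostScript gray level (1.0 = white, 0.0 = black).

    Assumptions: white < black, and shades change linearly in between.
    """
    if value <= white:
        return 1.0
    if value >= black:
        return 0.0
    return 1.0 - (value - white) / (black - white)


print(gray_level(2.0, white=1.0, black=3.0))  # 0.5
```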


OUTPUT


*.perscan.ps

Graphical output in PostScript format (can be converted to PDF). The large plot at the top is the main periodicity scan output. It displays the Q* values as a function of the period P and the window position in the analyzed sequence. The two additional plots at the bottom display basic statistics on the distribution of the periodic signal in the analyzed sequence. The plot on the left shows the percentage of windows that have their maximum Q* at the period shown on the abscissa (at a resolution of 0.1 bp), regardless of the value of that maximum. In the plot on the right, the cyan lines show the percentage of windows in which Q* at the given period P exceeds a given cutoff, regardless of whether it is the maximum or not. The magenta lines show the percentage of windows in which Q* exceeds a higher cutoff, etc.
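The two bottom plots can be sketched as follows, assuming each window's result is stored as a hypothetical {period: Q*} mapping with periods on a 0.1 bp grid. The cutoff is a free parameter in this sketch, and the strict comparison (Q* > cutoff) is an assumption.

```python
from collections import Counter
from typing import Dict, List

# Each window is represented as a hypothetical {period P: Q*} mapping.
Window = Dict[float, float]


def max_period_histogram(windows: List[Window]) -> Dict[float, float]:
    """Bottom-left plot: percentage of windows whose maximum Q* lies at each
    period (0.1 bp bins), ignoring how large that maximum is."""
    counts = Counter(round(max(w, key=w.get), 1) for w in windows)
    return {p: 100.0 * c / len(windows) for p, c in sorted(counts.items())}


def percent_above(windows: List[Window], cutoff: float) -> Dict[float, float]:
    """Bottom-right plot: percentage of windows with Q* > cutoff at each period,
    whether or not that period holds the window's maximum."""
    periods = sorted(windows[0])
    return {p: 100.0 * sum(w[p] > cutoff for w in windows) / len(windows)
            for p in periods}


wins = [{10.4: 2.0, 10.5: 3.0}, {10.4: 5.0, 10.5: 1.0},
        {10.4: 0.1, 10.5: 0.2}, {10.4: 1.0, 10.5: 4.0}]
print(max_period_histogram(wins))  # {10.4: 25.0, 10.5: 75.0}
print(percent_above(wins, 2.0))    # {10.4: 25.0, 10.5: 50.0}
```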

*.pescan

Tabulated data used to generate the main periodicity scan plot (the large plot in *.perscan.ps).

*.pescan.stats

Tabulated data used to generate the two additional plots, plus some values not displayed in the plots. The “Ave FT” column shows the mean value across all windows. This value can appear as “nan” (“not a number”) if some sections of the analyzed sequence were masked out.
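Why a single masked section can be enough to produce nan is easy to illustrate, under the assumption that masked windows carry NaN values and that the mean is a plain arithmetic average. The function names are hypothetical, and the second function is only an option for users reanalyzing the table, not something the program is known to do.

```python
import math
from typing import List


def mean_over_windows(values: List[float]) -> float:
    """Plain mean across windows; a NaN from any window propagates to the result."""
    return sum(values) / len(values)


def mean_skipping_nan(values: List[float]) -> float:
    """Alternative for post-hoc analysis: average only the windows that are not NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


values = [1.0, float("nan"), 3.0]  # middle window assumed masked out
print(mean_over_windows(values))   # nan
print(mean_skipping_nan(values))   # 2.0
```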

*.perscan-indices

Plain text file listing the Max2, PMax2, Max3, PMax3, MaxMax, and PMaxMax indices. Max2 is the maximum, over the given range of periods, of the percentage of windows represented by the cyan lines (i.e., the height of the highest cyan line in the bottom right display). PMax2 is the period that yields Max2. Max3 and PMax3 are defined analogously (the highest blue line in the bottom right display). MaxMax is the highest fraction of windows that have their maximum Q* within P±0.2 (the maximum sum of five consecutive values in the bottom left display), and PMaxMax is the period that yields MaxMax.
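The indices can be sketched from the plotted series, assuming each series is a hypothetical {period: percentage} mapping on contiguous 0.1 bp bins. Since P±0.2 spans five bins, MaxMax is the best sum of five consecutive values; in this sketch only windows of five that lie fully inside the tabulated range are considered, which is an assumption about edge handling.

```python
from typing import Dict, Optional, Tuple


def max_and_period(percent_by_period: Dict[float, float]) -> Tuple[float, float]:
    """Highest value and the period giving it (e.g. Max2/PMax2 from the cyan series)."""
    period = max(percent_by_period, key=percent_by_period.get)
    return percent_by_period[period], period


def max_max(percent_by_period: Dict[float, float]) -> Tuple[float, Optional[float]]:
    """MaxMax/PMaxMax: best sum over P-0.2..P+0.2, i.e. five consecutive 0.1 bp bins."""
    periods = sorted(percent_by_period)
    values = [percent_by_period[p] for p in periods]
    best_sum, best_period = float("-inf"), None
    for i in range(2, len(values) - 2):
        s = sum(values[i - 2:i + 3])  # bins P-0.2, P-0.1, P, P+0.1, P+0.2
        if s > best_sum:
            best_sum, best_period = s, periods[i]
    return best_sum, best_period


hist = {10.0: 0, 10.1: 5, 10.2: 10, 10.3: 20, 10.4: 10, 10.5: 5, 10.6: 0, 10.7: 0}
print(max_and_period(hist))  # (20, 10.3)
print(max_max(hist))         # (50, 10.3)
```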

log.txt

The console output from running the program. Check it for any error or warning messages.


DATA POSTPROCESSING


The postprocessing options allow comparing the PerScan results with the sequence annotation. They are only applicable if the input file was in the standard GenBank format and included the features table (annotation). Note that the prokaryotic genome sequences stored locally on our server are all in GenBank format. The user specifies the period range [Pmin, Pmax] and the MaxQ cutoff Q0. The postprocessing step then identifies all annotated genes and other features that overlap with windows whose MaxQ is higher or lower than Q0 (as chosen by the user) for periods P in the range [Pmin, Pmax]. A gene is considered to overlap with a qualifying window if either of its ends is located within a window that satisfies the given conditions.

The postprocessing step can be repeated with different parameters. Each run generates two output files with .txt and .xls extensions (see below). The file names are generated automatically and embed the input parameters. The output files are added to the PerScan results directory.
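The selection rule can be sketched in two steps: keep the windows whose MaxQ passes the cutoff in the chosen direction, then keep the genes with at least one end inside a kept window. This sketch assumes each window already carries its MaxQ for the chosen period range, that coordinates are 1-based and inclusive, and that comparisons with Q0 are strict; all names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[int, int, float]  # (start, end, MaxQ within [Pmin, Pmax]); 1-based, inclusive
Gene = Tuple[str, int, int]      # (name, start, end); 1-based, inclusive


def select_windows(windows: List[Window], q0: float, higher: bool) -> List[Window]:
    """Keep windows with MaxQ above Q0 (higher=True) or below Q0 (higher=False)."""
    return [w for w in windows if (w[2] > q0 if higher else w[2] < q0)]


def overlapping_genes(genes: List[Gene], windows: List[Window]) -> List[str]:
    """Genes with at least one end located inside a qualifying window."""
    def inside(pos: int) -> bool:
        return any(start <= pos <= end for start, end, _ in windows)
    return [name for name, start, end in genes if inside(start) or inside(end)]


wins = [(1, 1000, 4.2), (501, 1500, 1.1), (1001, 2000, 3.5)]
genes = [("geneA", 900, 1200), ("geneB", 2100, 2500), ("geneC", 50, 3000)]
print(overlapping_genes(genes, select_windows(wins, 3.0, higher=True)))   # ['geneA', 'geneC']
print(overlapping_genes(genes, select_windows(wins, 3.0, higher=False)))  # ['geneA']
```

In the second call geneC is not reported even though it spans the whole low-MaxQ window, because neither of its ends falls inside that window.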


User-defined parameters:


MaxQ cutoff Q0

Period range [Pmin, Pmax]

Choice to identify genes with higher or lower MaxQ than the cutoff


Postprocessing output

*.txt

This is a filtered features table from the original GenBank input file. Only features that satisfy the specified criteria are included. All information pertaining to the qualifying features is copied into the output exactly as listed in the GenBank file.

*.xls

A more user-friendly list of genes with the most relevant information extracted from the GenBank file, in a tab-delimited text format suitable for Excel. Only the following GenBank features are included in this file: CDS, rRNA, tRNA, tmRNA, ncRNA, and misc_RNA. Features consisting of several segments (e.g., CDS for multi-exon genes) are treated as a single block starting at the first base of the first segment and ending at the last base of the last segment.
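The collapsing of multi-segment features can be sketched by reading the segments of a GenBank location string in order and keeping the start of the first and the end of the last. This is a minimal sketch, not the program's parser: the partial-end markers < and > are ignored, and remote references to other accessions (whose names contain digits) are not handled. The function name is hypothetical.

```python
import re
from typing import Tuple

# One segment: optional '<', start, then optionally '..', optional '>', end.
SEGMENT = re.compile(r"<?(\d+)(?:\.\.>?(\d+))?")


def feature_span(location: str) -> Tuple[int, int]:
    """Collapse a multi-segment GenBank location into one block running from the
    first base of the first segment to the last base of the last segment."""
    segments = SEGMENT.findall(location)
    if not segments:
        raise ValueError(f"no coordinates in location: {location!r}")
    first_start = int(segments[0][0])
    last_start, last_end = segments[-1]
    return first_start, int(last_end or last_start)  # single-base segment: end == start


print(feature_span("join(100..250,400..>520)"))                  # (100, 520)
print(feature_span("complement(join(1200..1350,1500..1610))"))   # (1200, 1610)
```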