RepEx: Repeat Extractor for Biological Sequences
The RepEx package requires the following utilities to run successfully. If
one or more of them is missing, RepEx may fail to run correctly. The minimum
version required is listed in parentheses after each utility. These versions,
or any later versions, should ensure the proper execution of RepEx.
- make (GNU make 3.79.1)
- perl (PERL 5.6.0)
- sh (GNU sh 1.14.7)
- csh (tcsh 6.10.00)
- g++ (GNU gcc 2.95.3)
- sed (GNU sed 3.02)
- awk (GNU awk 3.0.4)
- ar (GNU ar 2.9.5)
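As a quick pre-installation check, the list above can be tested against the current PATH. The following is a minimal sketch and not part of the RepEx distribution; the helper name missing_tools is hypothetical, and the script only checks that each utility is present, not that it meets the minimum version.

```python
import shutil

# Utilities named in the requirements list above.
REQUIRED_TOOLS = ["make", "perl", "sh", "csh", "g++", "sed", "awk", "ar"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing utilities:", ", ".join(missing))
    else:
        print("All required utilities were found on PATH.")
```

Versions still need to be confirmed by hand, for example with each utility's own version flag.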
RepEx can be run on almost any Linux-based operating system, provided the utilities listed above are installed properly.
Sufficient memory and disk space are necessary, and the amounts required vary
with input size. Monitor your disk and memory usage, because insufficient
capacity will result in incorrect or missing output. In general, 512 MB of
RAM and 1 GB of disk space are sufficient.
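Because a shortage of disk space can produce incorrect or missing output, it may be worth checking free space before a run. The sketch below is illustrative only: the function name has_enough_disk is hypothetical, and the 1 GB threshold is simply the general figure quoted above, so larger inputs may need more. Memory is not checked here.

```python
import shutil

# 1 GB, the general disk-space figure quoted above (an assumption for
# typical inputs; larger inputs may require more).
MIN_FREE_DISK_BYTES = 1 * 1024**3


def has_enough_disk(path=".", required_bytes=MIN_FREE_DISK_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    if has_enough_disk("."):
        print("Free disk space meets the general 1 GB guideline.")
    else:
        print("Warning: less than 1 GB of free disk space is available.")
```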
It is possible to port the toolkit to any system with a C++ compiler, but
this has not been tested and will not be supported. If you do so, you may
need to alter the Makefile to direct 'make' to your native compiler and other
system resources.
For Mac OS X, the Mac development kit must be downloaded and installed. This
kit includes 'gcc', 'ar', and 'make', which are necessary for building
RepEx. RepEx is not supported on any Mac operating system other than OS X.
For Windows users, Cygwin or another Unix-like environment with a command-line
interface can be installed along with the utilities listed above.
repex [options]
Program options
---------------
-m   Type of molecular sequence(s): DNA [n] or Protein [p]. {Default: [n]}
-t   Type of repeat to be extracted: Inverted [i], Palindrome [ip], Mirror [m], or Everted [e]. Mirror and everted repeats do not exist in protein sequences, so -t m and -t e will not work for protein input. {Default: [i]}
-l   Minimum length of repeats to be extracted [positive integer]. Caution: values below about 15 can significantly increase the number of spurious matches and therefore the runtime. {Default: [20]}
-s   Spacer interval, i.e., the number of bases or residues between the repeat pattern and its copy: All [a], Local [l] (within 100 bases), Global [g] (outside 100 bases), or Manual. For the manual option, give the spacer length (x) preceded by the appropriate letter (greater: g, lesser: l, equal: e), i.e., -s [gx or lx or ex]. {Default: [l] for DNA and [a] for proteins}
-c   Class of repeat to be extracted: Identical [i], Degenerative [d], or both [b]. {Default: [i]}
-f   Input file path.
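The constraints stated in the option descriptions can be checked before RepEx is launched. The following is a minimal sketch of a wrapper that validates the options and assembles an argument list. It is not part of RepEx: the function name build_repex_command and the file name genome.fasta are hypothetical, and it assumes that each flag and its value are passed as separate arguments (as in "-t m") and that the executable is named repex, as in the usage line.

```python
import re

MOLECULE_TYPES = {"n", "p"}           # -m: DNA or protein
REPEAT_TYPES = {"i", "ip", "m", "e"}  # -t: inverted, palindrome, mirror, everted
REPEAT_CLASSES = {"i", "d", "b"}      # -c: identical, degenerative, both
# -s: all, local, global, or a manual value such as g250, l50, e10
SPACER_PATTERN = re.compile(r"^(a|l|g|[gle][0-9]+)$")


def build_repex_command(input_path, molecule="n", repeat_type="i",
                        min_length=20, spacer=None, repeat_class="i"):
    """Return an argv list for repex after checking the documented constraints."""
    if molecule not in MOLECULE_TYPES:
        raise ValueError("-m must be 'n' or 'p'")
    if repeat_type not in REPEAT_TYPES:
        raise ValueError("-t must be one of 'i', 'ip', 'm', 'e'")
    if molecule == "p" and repeat_type in {"m", "e"}:
        raise ValueError("-t m and -t e are not available for protein sequences")
    if not isinstance(min_length, int) or min_length < 1:
        raise ValueError("-l must be a positive integer")
    if spacer is None:
        # Documented defaults: local for DNA, all for proteins.
        spacer = "l" if molecule == "n" else "a"
    if not SPACER_PATTERN.match(spacer):
        raise ValueError("-s must be a, l, g, or g/l/e followed by a length")
    if repeat_class not in REPEAT_CLASSES:
        raise ValueError("-c must be one of 'i', 'd', 'b'")
    return ["repex", "-m", molecule, "-t", repeat_type, "-l", str(min_length),
            "-s", spacer, "-c", repeat_class, "-f", input_path]


if __name__ == "__main__":
    # Mirror repeats in DNA with a spacer greater than 250 bases.
    print(" ".join(build_repex_command("genome.fasta", repeat_type="m",
                                       spacer="g250")))
```

The returned list can be passed to subprocess.run once RepEx is installed. Building a list rather than a single string avoids shell quoting problems when the input path contains spaces.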
Dr. K. Sekar
Associate Professor
Department of Computational and Data Sciences
#341, 2nd Floor, Old CES Building
Indian Institute of Science
Bangalore 560 012
INDIA
E-mail: sekar@iisc.ac.in