* "Reducing Alignment Time Complexity of Ultra-large Sets of Sequences"
* Authors: A. Rubio-Largo, L. Vanneschi, M. Castelli, and M. A. Vega-Rodriguez
* e-mail: arl@unex.es
 
USAGE:
   DAC.sh <input-fasta> <aligner> <output-fasta>


** NOTE: DAC.sh requires:
   1. The UCLUST algorithm, provided by USEARCH v9.2 (32-bit version), "Ultra-fast sequence analysis": http://drive5.com/usearch/
   2. mlpack 2.0.3 (http://www.mlpack.org/files/mlpack-2.0.3.tar.gz)

1. UCLUST algorithm (USEARCH v9.2, 32-bit)
==========================================

The UCLUST algorithm divides a set of sequences into clusters; the USEARCH clustering commands cluster_fast and cluster_smallmem are both based on it. Each cluster is defined by one sequence, known as the centroid or representative sequence, and every sequence in the cluster must have identity with the centroid above a given threshold. The identity threshold (T) can therefore be viewed as the radius of a cluster. In earlier versions of USEARCH, centroids were called seed sequences; that term is no longer used, to avoid confusion with alignment seeds (matching words) in algorithms such as BLAST and UBLAST.
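The greedy centroid idea can be illustrated with a short Python sketch. It is not USEARCH's implementation and not what DAC.sh runs: real UCLUST uses word-based (k-mer) heuristics to decide which centroids to align against and has its own identity definition. Here the hypothetical helpers `global_identity` and `greedy_cluster` assume a naive global alignment (match +1, mismatch -1, gap -1), define identity as identical columns divided by alignment length, and try every existing centroid in creation order; `threshold` plays the role of T.

```python
# Minimal sketch of UCLUST-style greedy centroid clustering (illustrative only).
# Assumptions, not taken from USEARCH: identity = identical columns / alignment
# length from a naive global alignment (match +1, mismatch -1, gap -1), and every
# existing centroid is tried in creation order (UCLUST uses k-mer heuristics instead).


def _sub(x: str, y: str) -> int:
    """Substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def global_identity(a: str, b: str) -> float:
    """Fraction of identical columns in a Needleman-Wunsch alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] - 1,  # a[i-1] against a gap
                              score[i][j - 1] - 1)  # b[j-1] against a gap
    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]):
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - 1:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 1.0


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Map each centroid label to the labels of its members (centroid included)."""
    clusters: dict[str, list[str]] = {}
    # Longer sequences are processed first, so they tend to become centroids.
    for label in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for centroid in clusters:
            if global_identity(seqs[label], seqs[centroid]) >= threshold:
                clusters[centroid].append(label)  # within radius T of this centroid
                break
        else:
            clusters[label] = [label]  # no centroid close enough: start a new cluster
    return clusters


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTGGGGCC"}
print(greedy_cluster(seqs, 0.9))  # → {'s1': ['s1', 's2'], 's3': ['s3']}
```

Here s2 differs from s1 in one of ten columns (identity 0.9), so it joins s1's cluster; raising the threshold to 0.95 would make s2 a centroid of its own, which is the sense in which T acts as a cluster radius.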

Download USEARCH v9.2 (32-bit) from http://www.drive5.com/usearch/download.html and place the executable in ./bin.

Edgar, R.C. (2010) Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19), 2460-2461. doi: 10.1093/bioinformatics/btq461


2. mlpack: a scalable C++ machine learning library
==================================================

mlpack is an intuitive, fast, scalable C++ machine learning library, meant to be
a machine learning analog to LAPACK. It aims to implement a wide array of
machine learning methods and functions as a "Swiss army knife" for machine
learning researchers.

Download mlpack 2.0.3 (the version DAC.sh requires) from http://www.mlpack.org/files/mlpack-2.0.3.tar.gz.


