
minerva & minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers

We present a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets. We also provide four interfaces: R (minerva), and Python, MATLAB/Octave and C++ wrappers (minepy). The C engine shared by these interfaces substantially reduces the large memory requirement of the original Java implementation, making MINE applicable to large, high-throughput -omics datasets.

The family of Maximal Information-based Nonparametric Exploration (MINE) statistics, including the Maximal Information Coefficient (MIC), was recently introduced by Reshef et al. (2011) for the fast exploration of two-variable relationships in high-dimensional data sets. MINE consists of the algorithms for computing four measures of dependence between two variables, namely MIC, the Maximum Asymmetry Score (MAS), the Maximum Edge Value (MEV) and the Minimum Cell Number (MCN), and is designed to satisfy the properties of generality and equitability. The MINE suite was hailed as a real breakthrough in the data mining of complex biological data (Speed, 2011), but it has also drawn criticism (Simon and Tibshirani, 2012; Gorfine et al., 2012). Many groups worldwide have already proposed its use for exploratory data analysis in computational biology, from network interaction dynamics to virus ranking (Weiss et al., 2012; Das et al., 2012; Anderson et al., 2012; Karpinets et al., 2012; Faust and Raes, 2012). However, applying MINE.jar to all pairs of features in large datasets is currently limited by its memory requirements and computing time (Miller, 2012). Native parallelization of MINE tasks is also needed to speed up typical analyses in functional genomics and metagenomics, for example when MIC is used as a substitute for Pearson correlation in network studies. Motivated by these considerations, we propose a C implementation of the MINE algorithms together with four interfaces: R (minerva), and Python, MATLAB/Octave and C++ (minepy).

Download and Documentation

Download Supplementary Information

Supplementary Material (PDF)

Bibliography

Anderson, T., Laegreid, W., Cerutti, F., Osorio, F., Nelson, E., Christopher-Hennings, J., and Goldberg, T. (2012). Ranking viruses: measures of positional importance within networks define core viruses for rational polyvalent vaccine development. Bioinformatics, 28(12), 1624–1632.

Das, J., Mohammed, J., and Yu, H. (2012). Genome-scale analysis of interaction dynamics reveals organization of biological networks. Bioinformatics, 28(14), 1873–1878.

Faust, K. and Raes, J. (2012). Microbial interactions: from networks to models. Nature Rev Microbiol, 10, 538–550.

Gorfine, M., Heller, R., and Heller, Y. (2012). Comment on "Detecting Novel Associations in Large Data Sets". Preprint, available at the website http://iew3.technion.ac.il/~gorfinm/files/science6.pdf.

Karpinets, T., Park, B., and Uberbacher, E. (2012). Analyzing large biological datasets with association networks. Nucleic Acids Research, first published online May 25, 2012, gks403.

Miller, S. (2012). Putting Information Relationships to the Test. Blog post, available at the website http://www.information-management.com/blogs/MIC-MINE-predictive-big-data-Harvard-R-10022590-1.html.

Reshef, D., Reshef, Y., Finucane, H., Grossman, S., McVean, G., Turnbaugh, P., Lander, E., Mitzenmacher, M., and Sabeti, P. (2011). Detecting novel associations in large data sets. Science, 334(6062), 1518–1524.

Simon, N. and Tibshirani, R. (2012). Comment on "Detecting novel associations in large data sets" by Reshef et al., Science, Dec 16, 2011. Preprint, available at the website http://www-stat.stanford.edu/~tibs/reshef/comment.pdf.

Speed, T. (2011). A Correlation for the 21st Century. Science, 334(6062), 1502–1503.

Weiss, J., Karma, A., MacLellan, W. R., Deng, M., Rau, C., Rees, C., Wang, J., Wisniewski, N., Eskin, E., Horvath, S., Qu, Z., Wang, Y., and Lusis, A. (2012). "Good enough solutions" and the genetics of complex diseases. Circ Res, 111(4), 493–504.