Associations uncovered within The Cancer Genome Atlas using RF-ACE, in a joint effort with the Institute for Systems Biology and Tampere University of Technology, can be viewed in Regulome Explorer, an interactive web application developed for exploring associations across molecular features spanning the human genome. With the help of Techila and Golem, the CPU-intensive but embarrassingly parallel computation was distributed across a collection of 1000 CPUs, cutting computation time from years to days.
RF-ACE is an efficient implementation of a robust machine learning algorithm for uncovering multivariate associations from large and diverse data sets, using either classification or regression tree ensembles. RF-ACE natively handles numerical and categorical data with missing values, and it copes gracefully with potentially large numbers of noninformative features by using artificial contrast features, bootstrapping, and p-value estimation. RF-ACE builds on the well-known Random Forest (RF) and Gradient Boosting Trees (GBT) algorithms and is closely related to the extended version proposed by Eugene Tuv, George Runger, and Alexander Borisov of Intel and Kari Torkkola of Amazon in the Journal of Machine Learning Research.
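The idea behind artificial contrast features, bootstrapping, and p-value estimation can be illustrated with a minimal sketch. This is not RF-ACE's actual code or interface. To keep the example short and dependency-free, it assumes absolute Pearson correlation as a stand-in for tree-ensemble importance, and the names abs_corr and contrast_p_values are hypothetical. In each bootstrap round, every feature is compared against a permuted copy of itself (its contrast), which has the same distribution but no association with the target. The fraction of rounds in which the contrast scores at least as high as the real feature gives an empirical p-value.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance
    score (an assumption; RF-ACE derives importance from tree ensembles)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / (sxx * syy) ** 0.5


def contrast_p_values(features, target, n_rounds=200, seed=0):
    """Empirical p-value per feature: how often a permuted contrast copy
    scores at least as high as the real feature across bootstrap rounds."""
    rng = random.Random(seed)
    n = len(target)
    exceed = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Bootstrap: resample row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [target[i] for i in idx]
        for name, column in features.items():
            x = [column[i] for i in idx]
            # Contrast: same values, shuffled to break any link to the target.
            contrast = x[:]
            rng.shuffle(contrast)
            if abs_corr(contrast, y) >= abs_corr(x, y):
                exceed[name] += 1
    # Add-one correction keeps the estimate strictly above zero.
    return {name: (count + 1) / (n_rounds + 1) for name, count in exceed.items()}


# Synthetic data (illustrative only): one informative and one noise feature.
data_rng = random.Random(1)
signal = [data_rng.gauss(0, 1) for _ in range(200)]
noise = [data_rng.gauss(0, 1) for _ in range(200)]
target = [2 * s + data_rng.gauss(0, 0.5) for s in signal]

p = contrast_p_values({"signal": signal, "noise": noise}, target)
for name, value in p.items():
    print(name, round(value, 4))
```

The informative feature should receive a small p-value because its shuffled contrast almost never matches its score, while the noise feature is often matched or beaten by its own contrast. Swapping abs_corr for an importance measure taken from a tree ensemble would bring the sketch closer to the approach described above.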