Title: Scalable Information Gain Variant on Spark Cluster for Rapid Quantification of Microarray
Abstract: Microarray technology is an emerging technology in genetic research that researchers often use to monitor the expression levels of genes in a given organism. Microarray experiments have a wide range of applications in the health care sector. The colossal amount of raw gene expression data often leads to computational and analytical challenges, including feature selection and classification of the dataset into the correct group or class. In this paper, a mutual information feature selection method based on the Spark framework (sf-MIFS) is proposed to determine the pertinent features. After feature selection, classifiers based on the Spark framework, namely Logistic Regression (sf-LoR) and Naive Bayes (sf-NB), are applied to classify the microarray datasets. A detailed comparative analysis of execution time and accuracy is presented for the proposed feature selection and classification methods on the Spark framework and on a conventional system.
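To make the feature selection idea concrete, the following is a minimal single-machine sketch of scoring features by their mutual information with the class label and keeping the top-scoring ones. It is not the sf-MIFS implementation, whose details the abstract does not give. It assumes expression values have already been discretized, it ranks by relevance to the class only, and the names `mutual_information`, `top_k_features`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Estimate I(X; Y) in bits from paired discrete observations."""
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, labels, k):
    """Return the indices of the k columns with the highest MI score."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): three discretized "genes" over six samples.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1],  # tracks the class exactly
    [0, 1, 0, 1, 0, 1],  # close to independent of the class
    [0, 0, 1, 1, 1, 1],  # partially informative
]
selected = top_k_features(genes, labels, k=2)
print(selected)
```

Because each feature is scored independently of the others, the scoring step is the part that a cluster framework could distribute across workers, one feature (or block of features) per task.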