GENE SELECTION USING MULTIPLE QUEEN COLONIES IN LARGE SCALE MACHINE LEARNING
A. Sampath Kumar, P. Vivekanandan
The volume of data in bioinformatics research has increased tremendously. Processes are now digitized and high-throughput devices are available at lower cost, so data volumes are rising everywhere. For example, a single sequenced human genome is approximately 200 gigabytes in size. The growth of big data technologies reinforces this trend through reduced computing expenses and higher analytics throughput. Technologies that capture big data, such as automated genome sequencers, are becoming less expensive and more effective, giving rise to a new era of big data in bioinformatics. The development of microarray technology has likewise supplied large volumes of data in many fields, which has been especially useful in the prediction and diagnosis of cancer. Because the genes extracted from microarrays are rife with noise, the task is to select the genes that are related to cancer so that the disease can be classified precisely. For efficient feature selection in the Hadoop framework, a new feature selection algorithm has been suggested using Correlation based Feature Selection (CFS), Genetic Algorithm (GA) and Honey Bee Mating Optimization (HBMO). These techniques help reduce the problem dimension and noise and improve algorithm speed by removing irrelevant or superfluous features. It has be
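To illustrate how a correlation-based filter can reject irrelevant or redundant genes, the following is a minimal sketch of the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean absolute feature-class correlation and r_ff the mean absolute feature-feature correlation. It is not the authors' Hadoop implementation: the function names, the use of Pearson correlation, and the toy gene columns are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(features, labels):
    """CFS merit of a feature subset (a list of columns) against class labels."""
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    # Mean absolute feature-feature correlation (0 when there is one feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: three "gene" columns measured over six samples.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]  # tracks the class
gene_b = [1.1, 1.0, 1.0, 3.0, 3.2, 2.8]  # tracks the class, redundant with gene_a
gene_c = [2.0, 0.5, 1.5, 1.0, 2.1, 0.6]  # mostly noise

print("gene_a alone:    ", cfs_merit([gene_a], labels))
print("gene_a + gene_b: ", cfs_merit([gene_a, gene_b], labels))
print("gene_a + gene_c: ", cfs_merit([gene_a, gene_c], labels))
```

In this sketch, adding the noisy column lowers the merit relative to the informative gene alone, which is the behaviour that lets a search procedure such as a GA or HBMO use the merit as a fitness score when exploring gene subsets.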