LGVH - Support Vector Machines

Computational Forecasting for Acute Respiratory Infectious Disease Dynamics

The study of infectious disease behavior has been a scientific concern for many years, as early identification of outbreaks provides great advantages, including timely implementation of public health measures to limit the spread of an epidemic. We propose a methodology that merges the predictions of i) a computational model with machine learning, ii) a projection model, and iii) a proposed smoothed endemic channel calculation. The predictions are made on weekly acute respiratory infection (ARI) data obtained from epidemiological reports in Mexico, along with the usage of key terms in the Google search engine. The results obtained with this methodology were compared with state-of-the-art techniques, resulting in reduced RMPSE and MAPE metrics and achieving a MAPE of 21.7%. This methodology could be extended to detect and raise alerts on possible outbreaks of ARI as well as of other seasonal infectious diseases.

- Download Raw Acute Respiratory Infection (ARI) dataset as pdf here

- Download Raw Acute Respiratory Infection (ARI) dataset as MS Excel here

Support Vector Machine Artificial Intelligence Algorithm in KIR Applications

Killer-cell Immunoglobulin-like Receptors (KIR) determine disease causation by modifying natural killer (NK) cell responses to viral incursions and malignant cells. Analyzing how KIR proteins do this requires several levels of complexity to be considered. KIR genes are encoded in a 150 kb region within the 1 Mb Leukocyte Receptor Complex on chromosome 19. All 17 KIR genes are polymorphic, and more than 600 alleles have been shown to exist to date (http://www.ebi.ac.uk/ipd/kir/). KIR interact with another diverse set of immune proteins, known as HLA, which are also polymorphic. The result of these KIR:HLA interactions determines the way in which the immune system responds to viruses and malignant cells. Despite the accumulating evidence for KIR involvement in diseases, the complexity of the KIR system of genes and proteins and of their biological interactions has limited the adoption of clinical algorithms designed for medical decision support systems (DSS). On this page we describe the algorithm and supporting data sets that have enabled us to demonstrate the potential of support vector machines (SVM) for classifying individuals into different disease risk groups based solely on KIR gene content.
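To make "classification based solely on KIR gene content" more concrete, the following is a minimal Python sketch (not the authors' MATLAB code). It assumes each individual is encoded as a presence/absence vector over KIR genes and that similarity between individuals is measured with a Gaussian kernel of width sigma, in the form exp(-||x - y||^2 / (2 sigma^2)). The gene subset, the two profiles, and the helper names encode and gaussian_kernel are illustrative assumptions, not data or functions from the study.

```python
import math

# Gene names are real KIR loci, but this subset and the presence/absence
# profiles below are made up for illustration; they are not study data.
GENES = ["KIR2DL1", "KIR2DL2", "KIR2DL3", "KIR2DS1", "KIR3DL1", "KIR3DS1"]


def encode(present_genes):
    """Turn a set of gene names into a 0/1 vector ordered like GENES."""
    return [1 if gene in present_genes else 0 for gene in GENES]


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel, assumed form exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two hypothetical individuals that differ in exactly two genes.
individual_a = encode({"KIR2DL1", "KIR2DL3", "KIR3DL1"})
individual_b = encode({"KIR2DL1", "KIR2DL2", "KIR3DL1"})

print(individual_a)
print(gaussian_kernel(individual_a, individual_a, sigma=1.0))
print(gaussian_kernel(individual_a, individual_b, sigma=1.0))
```

With binary vectors the squared distance is simply the number of genes in which two individuals differ, so a smaller sigma makes the kernel treat even one differing gene as a large dissimilarity, while a larger sigma smooths over such differences.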

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details (www.gnu.org/licenses/gpl.htm).

Please e-mail any comments and feedback to: cuevas@uaslp.mx or cuevastello@gmail.com

Required toolboxes

All simulations and experiments were carried out with MATLAB® scripts. We use two external toolboxes, which are freely available:

a) MATLAB® Support Vector Machine Toolbox: Gunn, S. (1998). Support vector machines for classification and regression. Technical report, University of Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo.

Note: Once the toolbox has been downloaded, please replace the corresponding toolbox files with those supplied within this ZIP file: svm_upgrade.zip.

b) Genetic Algorithm Toolbox for MATLAB: Chipperfield, A. J., Fleming, P. J., Pohlheim, H., and Fonseca, C. M. (1996). Genetic Algorithm Toolbox for use with MATLAB. Automatic Control and Systems Engineering, University of Sheffield, 1.2 edition. http://www.shef.ac.uk/acse/research/ecrg/gat.html

Supporting scripts

Links to our MATLAB scripts can be found below. Note: most browsers will not display these scripts, so please use "save link" to download them to your desktop.

- SGA_svm_genes_real_data.m
This script is used to estimate the SVM parameters C and sigma. It requires the two toolboxes mentioned above.

- objfun_svm_genes.m
This script implements the objective function of our genetic algorithm. This function is used in SGA_svm_genes_real_data.m

- svm_classify_artificial_data.m
This script performs the classification on artificial data. Only training data is used here.

- svm_classify_artifical_data_test_mut.m
This script also performs classification on artificial data, but it additionally generates test data, distinct from the training data, to measure the performance of the SVM on unseen data.

- svm_classify_real_data_change_test_data.m
With this script we carry out the classification task on real data. The size of the test data used to measure the performance can also be modified.

- Real data set

- Artificial data set

- Download all of these files as Zip archive here
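The scripts above use a genetic algorithm to estimate the SVM parameters C and sigma. The Python sketch below illustrates the general idea only and is not a port of SGA_svm_genes_real_data.m or objfun_svm_genes.m. The search bounds, the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), and every function name are assumptions chosen for illustration. In particular, stand_in_objective is a hypothetical placeholder for an SVM validation error, so the sketch runs without any SVM library.

```python
import math
import random

# Search bounds are illustrative assumptions, not values from the original scripts.
BOUNDS = {"C": (0.1, 1000.0), "sigma": (0.1, 10.0)}


def stand_in_objective(c, sigma):
    """Hypothetical stand-in for an SVM validation error; lower is better.

    A real objective would train an SVM with (c, sigma) and return its
    misclassification rate. This one is a smooth bowl with its minimum
    at c = 10, sigma = 2.
    """
    return (math.log10(c) - 1.0) ** 2 + (sigma - 2.0) ** 2


def clip(value, low, high):
    return max(low, min(high, value))


def random_individual(rng):
    # Sample C on a log scale because it spans several orders of magnitude.
    c_low, c_high = BOUNDS["C"]
    s_low, s_high = BOUNDS["sigma"]
    c = 10 ** rng.uniform(math.log10(c_low), math.log10(c_high))
    return (c, rng.uniform(s_low, s_high))


def tournament(population, scores, rng, size=3):
    # Pick `size` random contestants and return the one with the lowest score.
    contestants = rng.sample(range(len(population)), size)
    return population[min(contestants, key=lambda i: scores[i])]


def crossover(a, b, rng):
    # Arithmetic crossover: the child is a random blend of both parents.
    w = rng.random()
    return tuple(w * x + (1 - w) * y for x, y in zip(a, b))


def mutate(individual, rng, rate=0.2):
    c, sigma = individual
    if rng.random() < rate:
        c *= 10 ** rng.gauss(0.0, 0.3)  # multiplicative step for C
    if rng.random() < rate:
        sigma += rng.gauss(0.0, 0.5)    # additive step for sigma
    return (clip(c, *BOUNDS["C"]), clip(sigma, *BOUNDS["sigma"]))


def run_ga(objective, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best objective value seen in each generation
    for _ in range(generations):
        scores = [objective(*ind) for ind in population]
        best = population[min(range(pop_size), key=lambda i: scores[i])]
        history.append(min(scores))
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            parent_1 = tournament(population, scores, rng)
            parent_2 = tournament(population, scores, rng)
            children.append(mutate(crossover(parent_1, parent_2, rng), rng))
        population = children
    scores = [objective(*ind) for ind in population]
    best_index = min(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_index])
    return population[best_index], history


best_params, best_history = run_ga(stand_in_objective)
print("best (C, sigma):", best_params)
print("best objective value:", best_history[-1])
```

Because the best individual is always carried over unchanged, the best objective value can never get worse from one generation to the next. To use this pattern with real data, the placeholder objective would be replaced by a function that trains an SVM and returns an error estimate measured on data held out from training.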

Copyright (C) 2010 Juan C. Cuevas-Tello, Christian A. Garcia-Sepulveda