The study of infectious disease behavior has been a scientific concern for many years,
as early identification of outbreaks provides great advantages, including timely implementation
of public health measures to limit the spread of an epidemic. We propose a methodology that
merges the predictions of i) a computational model with machine learning, ii) a projection model,
and iii) a proposed smoothed endemic channel calculation. The predictions are made on weekly
acute respiratory infection (ARI) data obtained from epidemiological reports in Mexico, along with
the usage of key terms in the Google search engine. The results obtained with this methodology
were compared with state-of-the-art techniques, resulting in reduced RMPSE and MAPE metrics
and achieving a MAPE of 21.7%. This methodology could be extended to detect and raise alerts on
possible outbreaks of ARI as well as of other seasonal infectious diseases.
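For readers who want to reproduce the error metrics, the following is a minimal Python sketch of how MAPE is commonly computed, together with a root mean squared percentage error, which is assumed here to be what the RMPSE figure refers to. The function names and the three weekly counts are invented for illustration; they are not the ARI data, and this is not the authors' code.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmspe(actual, predicted):
    """Root mean squared percentage error, in percent (assumed definition)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * math.sqrt(total / len(actual))


# Made-up weekly case counts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print("MAPE  (%):", round(mape(actual, predicted), 3))
print("RMSPE (%):", round(rmspe(actual, predicted), 3))
```

Both metrics divide by the observed value, so weeks with zero reported cases would need to be excluded or handled separately.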
- Download Raw Acute Respiratory Infection (ARI) dataset as pdf here
- Download Raw Acute Respiratory Infection (ARI) dataset as MS Excel here
Killer-cell Immunoglobulin-like Receptors (KIR) determine disease causation
by modifying Natural killer (NK) cell responses to viral incursions and
malignant cells. The analysis of the way that KIR proteins do this requires
several levels of complexity to be considered. KIR genes are encoded in
a 150 kb long region within the 1 Mb Leukocyte Receptor Complex on chromosome
19. All of the 17 KIR genes are polymorphic, and more than 600 alleles have
been shown to exist to date (http://www.ebi.ac.uk/ipd/kir/). KIR interact
with yet another diverse set of immune proteins known as HLA, which are also
polymorphic. The result of these KIR:HLA interactions determines the way
in which the immune system responds to viruses and malignant cells. Despite
the accumulating evidence for KIR involvement in diseases, the complexity
of the KIR system of genes and proteins and of their biological interactions
has limited the adoption of clinical algorithms with medical decision support
systems (DSS) in mind. On this page we describe the algorithm and supporting
data sets that have enabled us to demonstrate the potential of support
vector machines (SVM) for classifying individuals into different disease
risk groups based solely on KIR gene content.
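To make "classification based solely on KIR gene content" concrete, here is a small Python sketch (the authors' own scripts are in MATLAB). It assumes that each individual is encoded as a presence/absence vector over a gene panel and that the SVM uses a Gaussian (RBF) kernel, which the sigma parameter mentioned further down suggests. The gene panel subset, the two support vectors, their coefficients and the bias are all hypothetical values chosen for illustration, not results from the data sets on this page.

```python
import math

# Hypothetical subset of KIR genes; 1 = gene present, 0 = gene absent.
GENES = ["KIR2DL1", "KIR2DL2", "KIR2DS1", "KIR3DL1", "KIR3DS1"]


def rbf_kernel(x, z, sigma):
    """Gaussian kernel between two gene-content vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, coefficients, bias, sigma):
    """Standard SVM decision value: sum_i coef_i * K(sv_i, x) + bias.

    Each coefficient stands for alpha_i * y_i of a trained model.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, sigma)
        for sv, c in zip(support_vectors, coefficients)
    )


# Illustrative "trained" model (made-up numbers, not fitted to any data).
support_vectors = [[1, 1, 0, 1, 0], [1, 0, 1, 1, 1]]
coefficients = [1.0, -1.0]
bias = 0.0
sigma = 1.0

individual = [1, 1, 0, 1, 0]
score = svm_decision(individual, support_vectors, coefficients, bias, sigma)
label = "risk group A" if score > 0 else "risk group B"
print(dict(zip(GENES, individual)), "->", label)
```

With binary vectors the squared distance in the kernel is simply the number of genes at which two individuals differ, so sigma controls how quickly similarity decays with each differing gene.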
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation. This program is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details (www.gnu.org/licenses/gpl.htm).
Please e-mail any comments and feedback to: cuevas@uaslp.mx or cuevastello@gmail.com
All simulations and experiments were carried out with MATLAB® scripts.
We use two external toolboxes, which are freely available:
a) MATLAB® Support Vector Machine Toolbox: Gunn, S. (1998). Support vector
machines for classification and regression. Technical report, University
of Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo.
Note: Once the toolbox has been downloaded, please replace the corresponding
toolbox files with those supplied within this ZIP file: svm_upgrade.zip.
b) Genetic Algorithm Toolbox for MATLAB: Chipperfield, A. J., Fleming, P.
J., Pohlheim, H., and Fonseca, C. M. (1996). Genetic Algorithm Toolbox for
use with MATLAB. Automatic Control and Systems Engineering, University of
Sheffield, 1.2 edition. http://www.shef.ac.uk/acse/research/ecrg/gat.html
Links to our MATLAB scripts can be found below. Note: most browsers will
not display these scripts; please use "save link" to download them to your desktop.
- SGA_svm_genes_real_data.m
Used to estimate the SVM parameters C and sigma. This script requires
the two toolboxes mentioned above.
- objfun_svm_genes.m
This script represents the objective function of our genetic algorithm.
This function is used in SGA_svm_genes_real_data.m.
- svm_classify_artificial_data.m
This script performs the classification on artificial data. Only training
data is used here.
- svm_classify_artifical_data_test_mut.m
This script is also for classification on artificial data, but test data
is generated to measure the performance of the SVM on unseen data, which
differs from the training data.
- svm_classify_real_data_change_test_data.m
With this script we carry out the classification task on real data. The size
of the test data used to measure the performance can also be modified.
- Real data set
- Artificial data set
- Download all of these files as a ZIP archive here
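The script descriptions above mention a genetic algorithm that estimates the SVM parameters C and sigma by minimizing an objective function. The Python sketch below illustrates that general idea only; it is a simplified real-valued genetic algorithm (elitism, tournament selection, blend crossover, Gaussian mutation), not the Sheffield toolbox's algorithm and not a port of SGA_svm_genes_real_data.m. The function `stand_in_objective` is a hypothetical placeholder with a known minimum; in practice it would be replaced by a classification error measured on held-out data. Searching in log10 space and the chosen bounds are also assumptions.

```python
import random


def stand_in_objective(log_c, log_sigma):
    """Hypothetical placeholder for an SVM error measure.

    Its minimum is at log10(C) = 1, log10(sigma) = 0 by construction.
    """
    return (log_c - 1.0) ** 2 + (log_sigma - 0.0) ** 2


def genetic_search(objective, bounds, pop_size=20, generations=40,
                   mutation_sd=0.3, seed=0):
    """Minimize `objective` over a box with a small real-valued GA."""
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    def fitness(individual):
        return objective(*individual)

    # Random initial population inside the bounds.
    population = [
        tuple(rng.uniform(low, high) for low, high in bounds)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        next_population = ranked[:2]  # elitism: keep the two best unchanged
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = min(rng.sample(ranked, 3), key=fitness)
            parent_b = min(rng.sample(ranked, 3), key=fitness)
            weight = rng.random()
            # Blend crossover followed by Gaussian mutation, kept in bounds.
            child = tuple(
                clip(weight * a + (1.0 - weight) * b
                     + rng.gauss(0.0, mutation_sd), low, high)
                for a, b, (low, high) in zip(parent_a, parent_b, bounds)
            )
            next_population.append(child)
        population = next_population

    best = min(population, key=fitness)
    return best, fitness(best)


# Assumed search box for log10(C) and log10(sigma).
bounds = [(-3.0, 3.0), (-3.0, 3.0)]
(best_log_c, best_log_sigma), best_value = genetic_search(
    stand_in_objective, bounds
)
print("C =", 10 ** best_log_c, "sigma =", 10 ** best_log_sigma,
      "objective =", best_value)
```

Because the two best individuals are carried over unchanged each generation, the best objective value never gets worse from one generation to the next.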
Copyright (C) 2010 Juan C. Cuevas-Tello, Christian A. Garcia-Sepulveda