BMC Bioinformatics. 2013; 14: 170.
Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods
Siow-Wee Chang
1Bioinformatics and Computational Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
2Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Sameem Abdul-Kareem
2Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Amir Feisal Merican
1Bioinformatics and Computational Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
Rosnah Binti Zain
3Department of Oral Pathology and Oral Medicine and Periodontology, Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
Received 2012 Nov 7; Accepted 2013 May 21.
Abstract
Background
Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for even the most skilful clinician to arrive at an accurate prognosis using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers.
Results
In the first phase of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second phase, the models with the features selected from each feature selection method are tested on the proposed classifiers. Four types of classifiers are chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with the 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis.
Conclusions
The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate their potential to become a significant prognostic signature in oral cancer studies.
Keywords: Oral cancer prognosis, Clinicopathologic, Genomic, Feature selection, Machine learning
Background
Various machine learning methods have been applied in the diagnosis or prognosis of cancer research, such as artificial neural networks, fuzzy logic, genetic algorithms, support vector machines and other hybrid techniques [1,2]. From the medical perspective, diagnosis is to identify a disease by its signs and symptoms, while prognosis is to predict the outcome of the disease and the condition of the patient, i.e. whether the patient will survive or recover from the illness or vice versa. In some studies, researchers have proven that machine learning methods could generate a more accurate diagnosis or prognosis compared to traditional statistical methods [2].
Usually, clinicopathologic data or genomic data are used in research involving either diagnosis or prognosis. Currently, some research has shown that prognosis results are more accurate when both clinicopathologic and genomic data are used. Examples of these are the work in [3] on diffuse large B-cell lymphoma (DLBCL), the works in [4,5] on breast cancer, [6-10] on oral cancer, and [11] on bladder cancer. However, the number of published articles on research that combines both clinicopathologic and genomic data is small compared to that using just clinicopathologic data [2]. In the oral cancer domain, [6] used machine learning techniques in oral cancer susceptibility studies. They proposed a hybrid adaptive system inspired by learning classifier systems, decision trees and statistical hypothesis testing. The dataset included both demographic data and 11 types of genes. Their results showed that the proposed algorithm outperformed the other algorithms of Naive Bayes, C4.5, neural network and XCS (an evolution of Holland's Learning Classifier System). However, they did not validate their results against the traditional statistical methods. [7] focused on the 5-year overall survival in a group of oral squamous cell carcinoma (OSCC) patients and investigated the effects of demographic data, clinical data, genomic data and human papillomavirus on the prognostic outcome. They used statistical methods for the prediction, and their results showed that the 5-year overall survival was 28.6% and highlighted the influence of p53 immunoexpression, age and anatomic localization on OSCC prognosis. However, in that research, no machine learning methods were used or compared. Other oral cancer research, done by [8,9], was on oral cancer recurrence. A Bayesian network was used and compared with ANN, SVM, decision tree, and random forests.
They used a multitude of heterogeneous data which included clinical, imaging, tissue and blood genomic data. They built a separate classifier for each type of data and combined the best performing classification schemes. They claimed that they had achieved an accuracy of 100% with the combination of all types of data and showed that the prediction accuracy was best when using all types of data. However, more than 70 markers were required for their final combined classifier.
For the genomic domain, [12] showed that p63 overexpression is associated with poor prognosis in oral cancer. Their study showed that cases with diffuse p63 expression were more aggressive and poorly differentiated and related to a poorer prognosis, these findings supporting the use of p63 as an additional marker for diagnostic use in oral SCC. In [13], immunohistochemical analysis of protein expression for p53, p63 and p73 was performed on 40 samples of well-differentiated human buccal squamous-cell carcinomas, with 10 specimens of normal buccal mucosa employed as controls. Their results indicated that both p73 and p63 may be involved in the development of human buccal squamous-cell carcinoma, perhaps in concert with p53. Similar results were obtained by [14], who showed that in head and neck squamous carcinomas (HNSC), p63 was the most frequently expressed (94.7%), followed by p73 (68.4%) and p53 (52.6%). Their study indicated that p63 and p73 expression may represent an early event in HNSC tumorigenesis and that p73 and p63 may function as oncogenes in the development of these tumors.
In this research, an oral cancer prognostic model is developed. The research used a real-world oral cancer dataset collected locally at the Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Malaysia. The model takes both clinicopathologic and genomic data in order to investigate the impact of each marker, or combination of markers, on the accuracy of the prognosis of oral cancer. Five feature selection methods are proposed with the objectives of reducing the number of input variables to avoid over-fitting and of finding an optimum feature subset for oral cancer prognosis. This is followed by the classification procedures, which are used to classify the status of the patient after 1-3 years of diagnosis (alive or dead). Four classification methods, from both machine learning and statistical methods, are tested and compared. The objective of this research is to show that the prognosis is better when both clinicopathologic and genomic markers are used, and to identify the key markers for oral cancer prognosis using the hybrid of feature selection and machine learning methods.
Methods
The framework for the oral cancer prognostic model is shown in Figure 1. Clinicopathologic variables from the OCRCC database and genomic variables from immunohistochemistry (IHC) staining are fed into the model. Basically, there are three main parts to the oral cancer prognostic model: wet-laboratory testing for the genomic variables, the feature selection methods and the classification models. This research was approved by the Medical Ethics Committee, Faculty of Dentistry, University of Malaya.
Clinicopathologic data
A total of 31 oral cancer cases were selected from the Malaysian Oral Cancer Database and Tissue Bank System (MOCDTBS) coordinated by the Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya. The selection was based on the completeness of the clinicopathologic data, the availability of tissues and the availability of data (some data were not available for use due to medical confidentiality issues).
The selected cases were based on the oral cancer cases seen in the Faculty of Dentistry, University of Malaya and Hospital Tunku Ampuan Rahimah, Klang, a Malaysian government hospital, from the year 2003 to 2007. These cases were diagnosed and followed up, and the data were recorded in the standardised forms prepared by the MOCDTBS. Subsequently, the MOCDTBS transcribed all the data from paper to an electronic version stored in the database. All the cases selected were diagnosed as squamous cell carcinomas (SCC). Table 1 shows the 1 to 3-year survival for these 31 cases.
Table 1
Duration of follow-up | Survival | No | % |
---|---|---|---|
1-year | Survive | 27 | 87.1 |
Dead | 4 | 12.9 | |
Lost to follow-up | 0 | 0.0 | |
2-year | Survive | 19 | 61.3 |
Dead | 10 | 32.3 | |
Lost to follow-up | 2 | 6.5 | |
3-year | Survive | 17 | 54.8 |
Dead | 11 | 38.7 | |
Lost to follow-up | 3 | 9.7
Basically, three types of data are available for each oral cancer case, namely, social demographic data (risk factors, ethnicity, age, occupation, marital status and others), clinical data (type of lesion, size of lesion, main site, clinical neck node, etc.), and pathological data (pathological TNM, neck node metastasis, bone invasion, tumour thickness, etc.). Pathological data were obtained from the biopsy reports before and after surgical procedures. In this research, we refer to the clinical and pathological data as clinicopathologic data. Based on discussions with two oral cancer clinicians, Prof. Rosnah Binti Zain and Dr Thomas George Kallarakkal, 15 key variables were identified as important prognostic factors of oral cancer. The selected clinicopathologic variables are listed in Table 2(a).
Table 2
(a) Clinicopathologic variables | ||
---|---|---|
Name | Description | Name & parameters of membership function |
Age | Age at diagnosis | 1 - 40-50, 2 - >50-60, 3 - >60-70, 4 - >70 |
Eth | Ethnicity | 1 - Malay, 2 - Chinese, 3 - Indian |
Gen | Gender | 1 - Male, 2 - Female |
Smoke | Smoking habit | 1 - Yes, 2 - No |
Drink | Alcohol drinking habit | 1 - Yes, 2 - No |
Chew | Quid chewing habit | 1 - Yes, 2 - No |
Site | Main site of tumour | 1 - Buccal mucosa, 2 - tongue |
3 - floor, 4 - others | ||
Subtype | Subtype and differentiation for SCC | 1 - Well differentiated |
2 - moderately differentiated | ||
3 - poorly differentiated | ||
Inv | Invasion front | 1 - Non-cohesive, 2 - cohesive |
Node | Neck nodes | 1 - Negative, 2 - positive |
PT | Pathological tumour staging | 1 - T1, 2 - T2, 3 - T3, 4 - T4 |
PN | Pathological lymph nodes | 1 - N0, 2 - N1, 3 - N2A, 4 - N2B |
Stage | Overall stage | 1 - I, 2 - II, 3 - III, 4 - IV |
Size | Size of tumour | 1 - 0-2 cm, 2 - >2-4 cm, 3 - >4-6 cm, 4 - >6 cm |
Treat | Type of treatment | 1 - Surgery only |
2 - Surgery + Radiotherapy | ||
3 - Surgery + Chemotherapy | ||
(b) Genomic variables | ||
Name | Description | Name & parameters of membership function |
p53 | Tumour suppressor gene | 1 - negative, 2 - positive |
p63 | Tumour suppressor gene | 1 - negative, 2 - positive |
Genomic data
Two genomic variables were identified through literature studies and discussions with oral pathologists from the Department of Oral Pathology and Oral Medicine and Periodontology, Faculty of Dentistry, University of Malaya. Both of these variables are tumour suppressor genes, namely, p53 and p63. p53 is the marker most frequently associated with head and neck cancers [7,15]. p53 is called the "Guardian of the genome"; it is important in maintaining genomic stability, progression of the cell cycle, cellular differentiation, DNA repair and apoptosis. It is difficult to demonstrate p53 protein in normal tissues using immunohistochemistry procedures due to its high catabolic rate; however, mutated p53 exhibits a much lower catabolic rate and accumulates in the cells [15]. In addition, the p63 gene, a homolog of p53, is located in chromosome 3q21-29, and its amplification has been associated with prognostic outcome in oral cancer [11,16]. The p63 gene is highly expressed in the basal or progenitor layers of many epithelial tissues.
The cases selected were the same as in the clinicopathologic data. Immunohistochemistry (IHC) staining was performed on the selected formalin-fixed paraffin-embedded oral cancer tissues to obtain the results for the selected genomic variables. The archival formalin-fixed paraffin-embedded tissues were obtained from the Oral Pathology Diagnostic Laboratory, Faculty of Dentistry, University of Malaya. The tissues containing the tumour were cored, re-embedded and made into Tissue Macroarray blocks (TMaA). Sections 4 μm thick were cut from the resulting TMaA blocks and placed on poly-L-lysine-coated glass slides for IHC staining. The samples mounted on the glass slides were then ready for IHC staining. In this research, the Dako Real™ EnVision™ Detection Kit was used. In total, 15 TMaA slides with 31 oral cancer cases were stained. Two types of antibodies were used, namely Monoclonal Mouse Anti-Human p53 protein, clone 318-6-11, for p53 and Monoclonal Mouse Anti-Human p63 protein, clone 4A4, for p63.
The results of the staining were analyzed and the images were captured using an image analyzer system which included a Nikon Eclipse E400 microscope with a CFI Plan Fluor 40X objective for measurements, a QImaging Evolution digital colour cooled camera with 5.0 megapixels, a personal computer (Pentium 4, 2.5 GHz, 2 GB RAM) and MediaCybernetics Image-Pro Plus version 6.3 image analysis software. Each slide was first examined under the microscope with a lower objective, that is, the 4X objective. Cases were considered sufficient for evaluation if there were tumour cells present in the sections. Next, the slide was divided into 20 grid cells, numbered accordingly from left to right. A simple randomization program was used to generate random numbers. For each case, five tumour-representative areas were selected. If a number fell on a non-tumour-representative area, the next number (cell) was chosen, until all five areas were selected. Next, the five selected areas were examined under the microscope using a higher objective, that is, the 40X objective, and the images were captured. The percentage of positive nuclear cells for each area was counted and the average over the five areas was calculated. The staining result is considered positive if there is more than 10% positive nuclear staining, in accordance with the practice used in previous studies [7,17]. Figure 2 shows the flowchart for the IHC results analysis and scoring process. The results obtained from the IHC staining were combined with the clinicopathologic variables and served as the inputs for the feature selection module. The combined dataset was further divided into two groups: Group 1 with clinicopathologic variables (15 variables) only and Group 2 with both the clinicopathologic and genomic variables (17 variables).
We need to emphasize that the genomic variables were obtained from the same corresponding cases from which the clinicopathologic variables (Group 1) were obtained. Thus, if the clinicopathologic variables were those of Case 1, then the genomic variables were from the same case.
Feature selection
In this research, the purpose of feature selection is to find an optimal number of features for the small sample of oral cancer prognosis data. Five feature selection methods have been selected and compared: Pearson's correlation coefficient (CC) [18] and Relief-F [19] as the filter approaches, genetic algorithm (GA) [20,21] as the wrapper approach, and CC-GA and ReliefF-GA as the hybrid approaches.
Genetic algorithm (GA)
In the feature subset selection problem, a solution is a specific feature subset that can be encoded as a string of binary digits (bits). Each feature is represented by a binary digit of 0 or 1. For example, in the oral cancer prognosis dataset, if the solution is the string 011001000010000 of 15 binary digits, it indicates that features 2, 3, 6, and 11 are selected as the feature subset [21]. The initial population is generated randomly to select subsets of variables. In the proposed GA feature selection method, if the features in a subset are all different, the subset is included in the initial population. If not, it is regenerated until an initial population of the desired size for the feature subset (n-input model) is created.
The fitness function used in this method is a classifier that discriminates between two groups, alive and dead after 3 years of diagnosis. The mean square error rate of the classification is calculated using a 5-fold cross-validation. The fitness value is the final mean square error rate obtained. The subset of variables with the lowest error rate is selected. Figure 3 shows the flowchart and the criteria used for the GA feature selection approach.
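The encoding and wrapper loop can be sketched as below. This is an illustrative toy GA, not the authors' implementation: it uses elitist selection plus a swap mutation that keeps the subset size fixed, and `fitness` stands in for the classifier's 5-fold cross-validation mean square error rate.

```python
import random

def decode(bits):
    """1-based indices of selected features, e.g. '011001000010000' -> [2, 3, 6, 11]."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]

def random_subset(n_features, n_select, rng):
    """A random individual with exactly n_select bits set (one n-input model)."""
    chosen = set(rng.sample(range(n_features), n_select))
    return "".join("1" if i in chosen else "0" for i in range(n_features))

def mutate(bits, rng):
    """Swap one selected feature for an unselected one, keeping subset size fixed."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    zeros = [i for i, b in enumerate(bits) if b == "0"]
    out = list(bits)
    out[rng.choice(ones)], out[rng.choice(zeros)] = "0", "1"
    return "".join(out)

def ga_select(fitness, n_features=15, n_select=3, pop_size=20, generations=100, seed=0):
    """Minimise fitness (e.g. the 5-fold CV mean square error rate) over feature subsets."""
    rng = random.Random(seed)
    pop = [random_subset(n_features, n_select, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                 # lower error rate is better
        survivors = pop[: pop_size // 2]      # keep the best half (elitism)
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=fitness)
```

With a real dataset, `fitness` would train the chosen classifier on each fold and return the averaged error; here any callable taking a bitstring works.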
Pearson's correlation coefficient (CC)
Pearson's correlation coefficient, r, is used to see whether the values of two variables are associated. In this research, r is calculated for each feature input and the features with the highest r are selected. For example, for the 3-input model, the top three inputs with the highest r values are selected. This is repeated for the n-input models for both Group 1 and Group 2.
Relief-F
Relief-F is an extension of the original Relief algorithm which is able to deal with noisy and incomplete datasets as well as multi-class problems. The key idea of Relief is to estimate attributes according to how well their values distinguish among instances that are near to each other [18]. In this research, each feature input is ranked and weighted using the k-nearest neighbours classification, with k = 1. The top features with large positive weights are selected for both groups of the dataset.
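A minimal sketch of the Relief idea with one nearest hit and one nearest miss per instance (k = 1), using a simple overlap distance for the categorical inputs; this simplification is ours, not the authors' exact Relief-F implementation.

```python
def relief_weights(X, y):
    """Simplified Relief with one nearest hit/miss per instance (k = 1).

    X: list of feature vectors (categorical values coded as numbers).
    y: class labels. Returns one weight per feature; larger positive
    weights mean the feature separates the classes better.
    """
    n_feat = len(X[0])
    w = [0.0] * n_feat

    def dist(a, b):
        # overlap distance: count of attributes with differing values
        return sum(1 for u, v in zip(a, b) if u != v)

    for i, xi in enumerate(X):
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(xi, X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(xi, X[j]))  # nearest miss
        for f in range(n_feat):
            # reward disagreement with the miss, penalise disagreement with the hit
            w[f] += (1 if xi[f] != X[m][f] else 0) - (1 if xi[f] != X[h][f] else 0)
    return [v / len(X) for v in w]
```

On a toy dataset where the first feature determines the class and the second is noise, the first feature receives the larger weight.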
Pearson's correlation coefficient and genetic algorithm (CC-GA)
This hybrid feature selection approach consists of two stages: the first is a filter approach which calculates the correlation coefficient, r, and the second is the wrapper approach of GA. In the first stage, ten features with the highest r are selected and fed into the second stage, the GA approach. The procedures of the GA are the same as those described in the previous section.
Relief-F and genetic algorithm (ReliefF-GA)
This hybrid feature selection approach consists of two stages: the first is the filter approach of Relief-F, and the second is the wrapper approach of GA. In the first stage, ten features with the highest weights are selected and fed into the second stage, the GA approach. In the second stage, the n-input model is selected for both Group 1 and Group 2.
Selection of n-input models
Before the implementation of the feature selection methods, a simple GA was run to find the optimal number of inputs (n-input model) from the 17 inputs of clinicopathologic and genomic data. The numbers of inputs with a lower mean square error rate were chosen. The error rate for each n-input model is shown in Table 3, which shows that for Group 1 there are four models with the lowest error rate of 0.3871: the 3-input, 4-input, 5-input, and 6-input models. Meanwhile, for Group 2, the model with the lowest error rate is the 3-input model, with an error rate of 0.2581. For comparison purposes, the numbers of inputs between 3-input and 7-input were chosen; hence n is set as n = 3, 4, 5, 6, 7 for the feature selection methods.
Table 3
Group 1 | Group 2 | |
---|---|---|
1-input | 0.3881 | 0.3626 |
2-input | 0.4193 | 0.2903 |
3-input | 0.3871 | 0.2581 |
4-input | 0.3871 | 0.2903 |
5-input | 0.3871 | 0.3226 |
6-input | 0.3871 | 0.3548 |
7-input | 0.4571 | 0.3548 |
8-input | 0.4839 | 0.4194 |
9-input | 0.5161 | 0.4516 |
Classification
Next, the data with the n selected features are fed into the classification models. The final output is the classification accuracy for oral cancer prognosis, which classifies the patients as alive or dead in the years following diagnosis using the optimum feature subset. Four classification methods were experimented with and their results later compared; these are ANFIS, artificial neural network (ANN), support vector machine and logistic regression.
In order to obtain accurate estimated results, cross-validation (CV) was used. CV provides an unbiased estimation; however, CV presents high variance with small samples in some studies [22]. In this research, a 5-fold cross-validation was implemented with each of the classifiers. 5-fold cross-validation was chosen over the commonly used 10-fold cross-validation due to the small sample size; it leaves more instances for validation and has lower variance [23]. In 5-fold cross-validation, the 31 samples of oral cancer prognosis data were divided into 5 subsets of near-equal size and trained five times, each time leaving one subset out as validation data.
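A minimal sketch of the fold construction, assuming a shuffled index split (the paper does not specify the shuffling); with 31 samples the subsets are necessarily only near-equal in size (7, 6, 6, 6, 6).

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k folds of near-equal size (31 -> 7, 6, 6, 6, 6)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validate(folds):
    """Yield (train, validation) index pairs, one per held-out fold."""
    for f in range(len(folds)):
        val = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, val
```

Each classifier is then fitted on the train indices and scored on the validation indices, and the five scores are averaged.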
Adaptive neuro-fuzzy inference system (ANFIS)
ANFIS implements the Takagi-Sugeno fuzzy inference system. The details of ANFIS can be found in [24,25].
In the input layer, the number of inputs is defined by n, with n = 3, 4, 5, 6, 7. In the input membership (inputmf) layer, the number of membership functions is defined by m_i, with i = 2, 3, 4. The rules generated are based on the number of inputs and the number of input membership functions, represented as (m_2^n1 × m_3^n2 × m_4^n3) rules, in which n_1, n_2, and n_3 represent the numbers of inputs with m_i membership functions respectively, and n_1 + n_2 + n_3 = n. For example, in the ANFIS with 3 inputs x, y, and z, in which input x has 2 membership functions, input y has 2 membership functions, and input z has 4 membership functions, the number of rules generated is (2^2 × 3^0 × 4^1) = 16 rules, as shown in Figure 4.
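As a quick check of the rule-count arithmetic, a full Takagi-Sugeno rule base has one rule per combination of input membership functions, so its size is just the product of the per-input membership-function counts; the helper below is an illustrative sketch, not from the paper.

```python
def anfis_rule_count(mf_counts):
    """Number of rules in a full Takagi-Sugeno rule base: one rule per
    combination of input membership functions, i.e. the product of the
    per-input counts (equivalent to m2^n1 * m3^n2 * m4^n3)."""
    rules = 1
    for m in mf_counts:
        rules *= m
    return rules
```

For the worked example (inputs with 2, 2 and 4 membership functions) this gives 2 × 2 × 4 = 16 rules.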
The rules generated feed the output membership functions, which are computed as the summation of the contributions from each rule towards the overall output. The output is the survival status, either alive or dead after 3 years of diagnosis. The output is set as 1 for dead and −1 for alive; the pseudo-code is as below:
if output ≥ 0
then set output = 1, classify as dead
else (output < 0)
then set output = −1, classify as alive
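In code, this thresholding step is a one-liner; the sketch below is illustrative (the function name and tuple return are ours, not from the paper).

```python
def classify_survival(anfis_output):
    """Map the crisp ANFIS output to the survival label:
    output >= 0 -> dead (1); output < 0 -> alive (-1)."""
    if anfis_output >= 0:
        return 1, "dead"
    return -1, "alive"
```

Note that the boundary value 0 falls on the "dead" side, matching the ≥ in the pseudo-code.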
The membership functions were obtained according to the categorical variables that were set through the discussions with two oral cancer clinicians, as mentioned in the section Clinicopathologic data. The type of membership function used was the Gaussian, and the names and parameters of the membership functions for each input variable are shown in Table 2(a) for the clinicopathologic variables and Table 2(b) for the genomic variables. Each ANFIS was run for five epochs for the optimum result.
Artificial neural network (ANN)
The ANN employed in this research is the multi-layered feed-forward (FF) neural network, which is the most common type of ANN [26]. The FF neural network was trained using the Levenberg-Marquardt algorithm. In this research, one hidden layer with five neurons was used and the FF neural network was run for 5 epochs (this configuration achieved the best results). The training stopped when there was no improvement in the mean squared error for the validation set.
Support vector machine (SVM)
For the purpose of this research, a widely used SVM tool, LIBSVM [27], was used. There are two steps involved in LIBSVM: (i) the dataset is trained to obtain a model and (ii) the model is used to predict the data of the testing dataset. The details of LIBSVM can be found in [27,28]. A linear kernel was used in this research.
Logistic regression (LR)
Logistic regression (LR) was selected as the benchmark test for the statistical methods. LR is the most commonly used statistical method for the prediction of diagnosis and prognosis in medical research. LR predicts a relationship between the response variable y and the input variables x_i [29]. In this research, multiple logistic regression is used.
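Multiple logistic regression predicts the probability of the outcome through the logistic function, p = 1 / (1 + exp(−(b_0 + Σ b_i·x_i))). A minimal sketch of the prediction step follows; the coefficients here are placeholders, not fitted values from the paper.

```python
from math import exp

def predict_proba(x, coef, intercept):
    """P(y = 1 | x) under a multiple logistic regression model:
    p = 1 / (1 + exp(-(b0 + sum(bi * xi))))."""
    z = intercept + sum(b * v for b, v in zip(coef, x))
    return 1.0 / (1.0 + exp(-z))
```

A probability above 0.5 (i.e. z ≥ 0) would be classified as the positive outcome; fitting the coefficients is done by maximum likelihood in any standard statistics package.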
Experiment
The oral cancer dataset with 3-year prognosis was used in this experiment. First, the oral cancer prognosis dataset was divided into two groups: Group 1 consisted of clinicopathologic variables only (15 variables) and Group 2 consisted of clinicopathologic and genomic variables (17 variables). Next, the feature selection methods were implemented on both groups to select the key features for the n-input models. Lastly, the classifiers with 5-fold cross-validation were tested on the n-input models. The results obtained from the 5-fold cross-validation were averaged in order to produce the overall performance of each algorithm. The measures used to compare the performance of the proposed methods were sensitivity, specificity, accuracy and the area under the Receiver Operating Characteristic (ROC) curve (AUC).
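The first three performance measures come straight from the confusion matrix; a minimal sketch, taking "dead" as the positive class (our assumption, since the paper does not state which class is positive).

```python
def prognosis_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for a binary prognosis,
    with 1 = dead (positive class) and 0 = alive (negative class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

The AUC additionally requires the classifiers' continuous scores rather than the hard labels, so it is computed from the ROC curve separately.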
Results
Group 1 (clinicopathologic variables)
Table 4 shows the features selected by the proposed feature selection methods for Group 1. Next, the n-input models generated from each feature selection method were tested with the proposed classification methods. Table 5 shows the classification results for ANFIS, ANN, SVM and LR.
Table 4
Method | Feature subset selected |
---|---|
GA | 
3-input | Gen,Smo,PN |
4-input | Dri,Inv,PN,Size |
5-input | Dri,Node,PT,PN,Size |
6-input | Age,Gen,Smo,Inv,PT,Size |
7-input | Age,Eth,Chew,Inv,Node,PN,Size |
CC | 
3-input | Age,Inv,PN |
4-input | Age,Gen,Inv,PN |
5-input | Age,Gen,Inv,PN,Size |
6-input | Age,Gen,Inv,PN,Sta,Size |
7-input | Age,Gen,Dri,Inv,PN,Sta,Size |
ReliefF | 
3-input | Eth,Dri,Sta |
4-input | Age,Eth,Dri,Sta |
5-input | Age,Eth,Dri,Sta,Tre |
6-input | Age,Eth,Gen,Dri,Sta,Tre |
7-input | Age,Eth,Gen,Dri,PT,Sta,Tre |
CC-GA | 
3-input | PT,PN,Sta |
4-input | Dri,Inv,PN,Size |
5-input | Age,Gen,Inv,PN,Size |
6-input | Gen,Dri,Node,PT,PN,Sta |
7-input | Gen,Dri,Chew,Inv,Node,PN,Size |
ReliefF-GA | 
3-input | Gen,Inv,Node |
4-input | Gen,Dri,Inv,Node |
5-input | Gen,Dri,Inv,Node,PT |
6-input | Eth,Gen,Dri,Inv,Node,PT |
7-input | Age,Eth,Gen,Smo,Dri,Node,Tre |
Table 5
Feature selection | 3-input | 4-input | 5-input | 6-input | 7-input |||||
---|---|---|---|---|---|---|---|---|---|---|
% | AUC | % | AUC | % | AUC | % | AUC | % | AUC | |
ANFIS | ||||||||||
GA | 70.95 | 0.66 | 67.42 | 0.61 | 64.76 | 0.63 | 58.57 | 0.55 | 57.62 | 0.54 |
CC | 58.10 | 0.53 | 74.76 | 0.70 | 51.43 | 0.43 | 57.62 | 0.50 | 64.29 | 0.58 |
ReliefF | 61.43 | 0.53 | 50.59 | 0.50 | 58.10 | 0.50 | 64.29 | 0.54 | 64.29 | 0.54 |
CC-GA | 44.76 | 0.44 | 67.62 | 0.57 | 63.81 | 0.55 | 64.29 | 0.54 | 57.62 | 0.52 |
ReliefF-GA | 67.14 | 0.55 | 60.48 | 0.59 | 67.62 | 0.59 | 51.90 | 0.47 | 64.76 | 0.57 |
ANN ||||||||||
GA | 45.52 | 0.53 | 52.43 | 0.53 | 45.05 | 0.47 | 48.38 | 0.52 | 45.33 | 0.50 |
CC | 54.48 | 0.61 | 53.57 | 0.59 | 51.29 | 0.58 | 51.29 | 0.51 | 52.33 | 0.53 |
ReliefF | 51.52 | 0.48 | 41.62 | 0.47 | 46.05 | 0.49 | 46.05 | 0.48 | 44.10 | 0.48 |
CC-GA | 49.24 | 0.51 | 49.48 | 0.52 | 46.67 | 0.49 | 48.29 | 0.49 | 50.48 | 0.51 |
ReliefF-GA | 50.24 | 0.55 | 52.86 | 0.59 | 56.76 | 0.58 | 47.00 | 0.51 | 50.05 | 0.54 |
SVM ||||||||||
GA | 60.95 | 0.53 | 61.43 | 0.51 | 58.10 | 0.48 | 58.10 | 0.46 | 61.43 | 0.49 |
CC | 60.95 | 0.53 | 60.95 | 0.53 | 58.10 | 0.46 | 51.43 | 0.41 | 51.43 | 0.41 |
ReliefF | 54.29 | 0.44 | 50.95 | 0.42 | 51.43 | 0.42 | 48.10 | 0.40 | 50.95 | 0.45 |
CC-GA | 63.81 | 0.55 | 61.43 | 0.51 | 58.10 | 0.46 | 58.10 | 0.48 | 58.10 | 0.49 |
ReliefF-GA | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 54.76 | 0.46 |
LR ||||||||||
GA | 64.29 | 0.56 | 67.62 | 0.60 | 64.76 | 0.55 | 68.10 | 0.64 | 64.29 | 0.60 |
CC | 64.29 | 0.56 | 60.48 | 0.57 | 67.62 | 0.61 | 67.62 | 0.61 | 64.29 | 0.58 |
ReliefF | 50.59 | 0.44 | 50.59 | 0.44 | 48.10 | 0.39 | 41.43 | 0.34 | 44.29 | 0.39 |
CC-GA | 67.62 | 0.57 | 67.62 | 0.60 | 61.43 | 0.51 | 70.95 | 0.72 | 64.76 | 0.67 |
ReliefF-GA | 54.29 | 0.54 | 51.43 | 0.52 | 61.43 | 0.62 | 47.62 | 0.55 | 48.10 | 0.51 |
From Table 5, it can be seen that ANFIS with the CC-4-input model obtained the best accuracy of 74.76% and an AUC of 0.70. For the ANN results, the model with the highest accuracy is the ReliefF-GA-5-input model, with an accuracy of 56.76% and an AUC of 0.58. For the SVM classifier, the models with the best accuracy are the ReliefF-GA-3-input to 6-input models, with an accuracy of 64.29% and an AUC of 0.50. As for the LR classification, the best model is the CC-GA-6-input model, with an accuracy of 70.95% and an AUC of 0.72. The results obtained from both ANN and SVM showed low accuracy (56.76% and 64.29% respectively) and low AUC (0.58 and 0.50 respectively), indicating that these two are not suitable classifiers to use for Group 1.
Group 2 (clinicopathologic and genomic variables)
The same experiments were carried out on Group 2, which is the combination of clinicopathologic and genomic variables. The selected features for each n-input model are listed in Table 6. Table 6 shows that almost all the feature selection methods included a genomic variable as one of the key features, except for the ReliefF-3-input and ReliefF-4-input models.
Table 6
Method | Feature subset selected |
---|---|
GA | |
3-input | Inv,Node,p63 |
4-input | Gen,Inv,Size,p53 |
v-input | Age,PT,PN,Size,p53 |
6-input | Historic period,PT,PN,Size,Tre,p53 |
7-input | Age,Eth,Smo,PT,PN,Size,p53 |
CC | |
3-input | Inv,PN,p63 |
four-input | Age,Inv,PN,p63 |
5-input | Age,Gen,Inv,PN,p63 |
half dozen-input | Age,Gen,Inv,PN,Size,p63 |
7-input | Historic period,Gen,Inv,PN,Size,p53,p63 |
ReliefF | |
3-input | Historic period,Eth,Dri |
iv-input | Age,Eth,Dri,Tre |
5-input | Historic period,Eth,Dri,Tre,p53 |
half dozen-input | Age,Eth,Dri,Tre,p53,p63 |
7-input | Age,Eth,Gen,Dri,Tre,p53,p63 |
CC-GA | |
3-input | Inv,Node,p63 |
4-input | Gen,Inv,Size,p53 |
5-input | Age,Dri,PN,Size,p53 |
6-input | Gen,Inv,Node,PN,Size,p53 |
7-input | Gen,Dri,Inv,Node,PN,Size,p53 |
ReliefF-GA | |
3-input | Dri,Inv,p63 |
4-input | Dri,Inv,Tre,p63 |
5-input | Age,Gen,Smo,Dri,p63 |
6-input | Age,Gen,Smo,Dri,Inv,p63 |
7-input | Age,Eth,Inv,Sta,Tre,p53,p63 |
For Group 2 using the ANFIS classification (Table 7), there are five models with an accuracy of above 70%; these are, namely, GA-3-input, CC-GA-3-input, CC-GA-4-input, ReliefF-GA-3-input and ReliefF-GA-4-input. The best results were obtained from the ReliefF-GA-3-input and ReliefF-GA-4-input models with an accuracy of 93.81% and an AUC of 0.90; the features selected for the ReliefF-GA-3-input are drink, invasion and p63, while the features selected for the ReliefF-GA-4-input are drink, invasion, treatment and p63 (refer to Table 6).
Table 7
Feature selection | 3-input | 4-input | 5-input | 6-input | 7-input | |||||
---|---|---|---|---|---|---|---|---|---|---|
% | AUC | % | AUC | % | AUC | % | AUC | % | AUC | |
ANFIS | ||||||||||
GA | 74.76 | 0.74 | 67.62 | 0.70 | 41.90 | 0.40 | 58.57 | 0.58 | 35.71 | 0.36 |
CC | 58.10 | 0.48 | 58.10 | 0.52 | 51.90 | 0.48 | 48.57 | 0.46 | 61.90 | 0.59 |
ReliefF | 54.29 | 0.47 | 44.29 | 0.38 | 48.10 | 0.53 | 67.14 | 0.62 | 67.14 | 0.62 |
CC-GA | 74.76 | 0.70 | 70.48 | 0.71 | 54.76 | 0.57 | 61.43 | 0.61 | 64.29 | 0.65 |
ReliefF-GA | 93.81 | 0.90 | 93.81 | 0.90 | 65.71 | 0.63 | 64.76 | 0.62 | 68.10 | 0.67 |
ANN | ||||||||||
GA | 45.14 | 0.50 | 51.48 | 0.55 | 45.81 | 0.49 | 46.14 | 0.50 | 47.71 | 0.51 |
CC | 46.24 | 0.46 | 49.38 | 0.49 | 46.14 | 0.50 | 57.38 | 0.58 | 55.48 | 0.57 |
ReliefF | 40.62 | 0.48 | 43.24 | 0.49 | 47.71 | 0.50 | 49.48 | 0.51 | 48.76 | 0.50 |
CC-GA | 49.38 | 0.52 | 53.90 | 0.60 | 47.05 | 0.52 | 44.76 | 0.48 | 55.19 | 0.57 |
ReliefF-GA | 84.62 | 0.83 | 73.38 | 0.75 | 48.00 | 0.52 | 51.57 | 0.53 | 45.86 | 0.47 |
SVM | ||||||||||
GA | 74.76 | 0.70 | 54.76 | 0.51 | 70.95 | 0.65 | 60.95 | 0.55 | 50.95 | 0.42 |
CC | 64.76 | 0.55 | 64.76 | 0.55 | 64.76 | 0.55 | 67.62 | 0.56 | 67.62 | 0.62 |
ReliefF | 54.29 | 0.44 | 54.29 | 0.44 | 44.29 | 0.36 | 48.10 | 0.46 | 34.76 | 0.28 |
CC-GA | 74.76 | 0.70 | 54.76 | 0.51 | 61.43 | 0.50 | 58.10 | 0.54 | 61.43 | 0.57 |
ReliefF-GA | 74.76 | 0.70 | 71.43 | 0.68 | 74.76 | 0.70 | 74.43 | 0.66 | 54.76 | 0.53 |
LR | ||||||||||
GA | 74.76 | 0.70 | 63.81 | 0.64 | 67.14 | 0.57 | 54.76 | 0.43 | 54.29 | 0.47 |
CC | 71.43 | 0.67 | 71.43 | 0.67 | 61.43 | 0.59 | 68.10 | 0.65 | 61.43 | 0.59 |
ReliefF | 50.59 | 0.45 | 48.10 | 0.39 | 48.10 | 0.41 | 44.76 | 0.43 | 41.43 | 0.41 |
CC-GA | 74.76 | 0.70 | 63.81 | 0.64 | 60.48 | 0.61 | 64.29 | 0.63 | 60.48 | 0.54 |
ReliefF-GA | 74.76 | 0.70 | 74.76 | 0.70 | 71.43 | 0.68 | 58.10 | 0.55 | 61.43 | 0.60 |
As shown in Table 7, the feedforward neural network together with the ReliefF-GA-3-input model achieved the best result with an accuracy of 84.62% and an AUC of 0.83. For SVM classification, the results are generally better in Group 2 when compared to Group 1 (Table 5), with some exceptions (GA-3-input, GA-7-input, CC-GA-4-input, ReliefF-5-input and ReliefF-7-input). The best accuracy in Group 2 is obtained by the GA-3-input, CC-GA-3-input, ReliefF-GA-3-input and ReliefF-GA-5-input models with an accuracy of 74.76% and an AUC of 0.70, whereas for LR classification in Group 2, GA-3-input, CC-GA-3-input, ReliefF-GA-3-input and ReliefF-GA-4-input achieved the best classification accuracy of 74.76% and an AUC of 0.70.
Comparison of the results for Group 1 and Group 2
This section discusses and compares the results generated from the different classification methods for both Group 1 and Group 2. Table 8 summarizes the best accuracy for each n-input model based on the feature selection method for Group 1 and Group 2. The summary is also depicted in the graphs shown in Figure 5 and Figure 6 respectively.
Table 8
Feature selection method | Group 1 | Group 2 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
n-input model | n-input model | |||||||||
3 | 4 | 5 | 6 | 7 | 3 | 4 | 5 | 6 | 7 | |
GA | 70.95 | 67.62 | 64.76 | 68.10 | 64.29 | 74.76 | 67.62 | 70.95 | 60.95 | 54.29 |
CC | 64.29 | 74.76 | 67.62 | 67.62 | 64.29 | 71.43 | 71.43 | 64.76 | 68.10 | 67.62 |
ReliefF | 61.43 | 50.59 | 58.10 | 64.29 | 64.29 | 54.29 | 54.29 | 48.10 | 67.14 | 67.14 |
CC-GA | 67.62 | 67.62 | 63.81 | 70.95 | 64.76 | 74.76 | 70.48 | 61.43 | 64.29 | 64.29 |
ReliefF-GA | 67.14 | 64.29 | 67.62 | 64.29 | 64.76 | 93.81 | 93.81 | 74.76 | 74.43 | 68.10 |
For Group 1 (Figure 5), the correlation coefficient (CC) feature selection method performed better than the other methods, with the highest accuracy of 74.76% in the 4-input model. There are three models that achieved accuracies above 70%; the other two are GA-3-input and CC-GA-6-input. The ReliefF feature selection method obtained the worst results when compared to the other methods.
As regards Group 2 (Figure 6), the ReliefF-GA feature selection method outperformed the others in all the n-input models, with the highest accuracy of 93.81%. There are ten models with accuracies above 70% as shown in Table 8; this confirms that Group 2, which includes genomic variables, achieved higher accuracy with feature selection methods. In addition, most of the models with higher accuracy are the lower-input models with 3 or 4 inputs only.
Next, Table 9 lists the best accuracy by classification method and the graphs are depicted in Figures 7 and 8 for Group 1 and Group 2 respectively.
Table 9
Feature selection method | Group 1 | Group 2 | ||||||
---|---|---|---|---|---|---|---|---|
Classification method | Classification method | |||||||
ANFIS | ANN | SVM | LR | ANFIS | ANN | SVM | LR | |
GA | 70.95 | 52.43 | 61.43 | 68.10 | 74.76 | 51.48 | 74.76 | 74.76 |
CC | 74.76 | 54.48 | 60.95 | 67.62 | 61.90 | 57.38 | 67.62 | 71.43 |
ReliefF | 64.29 | 51.52 | 54.29 | 50.59 | 67.14 | 49.48 | 54.29 | 50.59 |
CC-GA | 67.62 | 50.48 | 63.81 | 70.95 | 74.76 | 55.19 | 74.76 | 74.76 |
ReliefF-GA | 67.62 | 56.76 | 64.29 | 61.43 | 93.81 | 84.62 | 74.76 | 74.76 |
From Figure 7, ANFIS performed the best in Group 1 when compared to the other classification methods for all the feature selection methods except the CC-GA method. All the classification methods performed worst with the ReliefF feature selection method except for ANN. ANN had the lowest accuracy rate compared to the other methods.
Meanwhile, in Group 2 as shown in Figure 8, ANFIS outperformed the other classification methods except with the CC feature selection method. The best accuracy is achieved by ANFIS with the ReliefF-GA method, with an accuracy of 93.81% (Table 9). In general, all classification methods performed better with the CC-GA and ReliefF-GA hybrid feature selection methods, except for SVM and LR. As with Group 1, ANN had the lowest classification rate except with the ReliefF-GA method. Overall, the performance of the classification methods is better in Group 2 as compared to Group 1. Table 10 summarizes the best models with their selected features for both Group 1 and Group 2. All the models with an accuracy of 70% and above are selected.
Table 10
Accuracy | AUC | Classification method | Selected features |
---|---|---|---|---|
Group 1 | ||||
CC-3-input | 74.76 | 0.70 | ANFIS | Age,Inv,PN |
GA-3-input | 70.95 | 0.66 | ANFIS | PT,PN,Sta |
CC-GA-6-input | 70.95 | 0.73 | LR | Gen,Dri,Node,PT,PN,Sta |
Group 2 | ||||
ReliefF-GA-3-input | 93.81 | 0.90 | ANFIS | Dri,Inv,p63 |
ReliefF-GA-4-input | 93.81 | 0.90 | ANFIS | Dri,Inv,Tre,p63 |
ReliefF-GA-iii-input | 84.62 | 0.83 | ANN | Dri,Inv,p63 |
GA-3-input | 74.76 | 0.74 | ANFIS | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | ANFIS | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | SVM | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | LR | Inv,Node,p63 |
ReliefF-GA-3-input | 74.76 | 0.70 | SVM | Dri,Inv,p63 |
ReliefF-GA-3-input | 74.76 | 0.70 | LR | Dri,Inv,p63 |
ReliefF-GA-4-input | 74.76 | 0.70 | LR | Dri,Inv,Tre,p63 |
ReliefF-GA-5-input | 74.76 | 0.70 | SVM | Age,Gen,Smo,Dri,p63 |
ReliefF-GA-6-input | 74.43 | 0.66 | SVM | Age,Gen,Smo,Dri,Inv,p63 |
ReliefF-GA-4-input | 73.38 | 0.75 | ANN | Dri,Inv,Tre,p63 |
ReliefF-GA-4-input | 71.43 | 0.68 | SVM | Dri,Inv,Tre,p63 |
ReliefF-GA-5-input | 71.43 | 0.68 | LR | Age,Gen,Smo,Dri,p63 |
CC-3-input | 71.43 | 0.67 | LR | Inv,PN,p63 |
CC-4-input | 71.43 | 0.67 | LR | Age,Inv,PN,p63 |
CC-GA-4-input | 70.48 | 0.71 | ANFIS | Gen,Inv,Size,p53 |
From Table 10, the models with the highest accuracy are ReliefF-GA-3-input and ReliefF-GA-4-input from Group 2 with ANFIS classification, with an accuracy of 93.81% and an AUC of 0.90. The features selected are Drink, Invasion and p63, and Drink, Invasion, Treatment and p63 respectively. This is followed by the ReliefF-GA-3-input model from Group 2 with ANN classification, with an accuracy of 84.62% and an AUC of 0.83. Most of the best models are generated from the ReliefF-GA feature selection method; this indicates that the features selected by this method are the optimum features for the oral cancer prognosis dataset.
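The accuracy and AUC figures reported in the tables can be reproduced from raw predictions with their standard definitions: accuracy is the percentage of correctly classified cases, and AUC is the rank-based probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch on illustrative labels (not the study's data):

```python
def accuracy(y_true, y_pred):
    """Percentage of correctly predicted labels."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Rank-based AUC: probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On real predictions, `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities.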
Discussions
The results shown meet the objective of this research, namely, the classification performance is much better with the inclusion of genomic variables in Group 2. From the results in Table 10, the best feature selection method for oral cancer prognosis is ReliefF-GA with ANFIS classification. This shows that ANFIS is the optimum classification tool for oral cancer prognosis.
Since there are two top models with the same accuracy, the simpler one, the ReliefF-GA-3-input model with ANFIS classification, is recommended for further work in similar research; the optimum subset of features is Drink, Invasion and p63. These findings confirm those of some previous studies. Alcohol consumption has always been considered a risk factor and one of the reasons for the poor prognosis of oral cancer [30-33]. Walker D et al. [34] showed that the depth of invasion is one of the most important predictors of lymph node metastasis in tongue cancer. The different studies by [35-38] discovered a significant link between the depth of invasion and oral cancer survival. As regards p63, [12-14] showed that p63 overexpression is associated with poor prognosis in oral cancer.
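The ReliefF stage of the recommended ReliefF-GA hybrid ranks features before the genetic algorithm searches over subsets. A minimal sketch of Relief-style weighting for a binary outcome, simplified to a single nearest hit and nearest miss per sample (the full ReliefF of Kononenko [19] averages over k neighbours and handles multi-class data):

```python
import numpy as np

def relief(X, y):
    """Single-neighbour Relief weighting for a binary outcome: a feature
    gains weight when it separates each sample from its nearest miss
    better than from its nearest hit."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # never pick the sample itself
        same = np.flatnonzero(y == y[i])       # candidates for nearest hit
        diff = np.flatnonzero(y != y[i])       # candidates for nearest miss
        hit = same[np.argmin(dists[same])]
        miss = diff[np.argmin(dists[diff])]
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

# Toy demo: feature 0 tracks the outcome, feature 1 is noise,
# so feature 0 should receive the larger Relief weight.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([y + 0.01 * rng.standard_normal(20),
                     rng.standard_normal(20)])
weights = relief(X, y)
```

In the hybrid, such weights would prune or seed the variable pool before the GA's subset search; the demo data here are synthetic, not the study's dataset.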
A comparison between the current methodology and the other methods in the literature was carried out and is shown in Table 11. Nevertheless, direct comparisons cannot be performed since different datasets were employed in each case. In this comparison, we considered studies which utilized at least both clinical and genomic data types in oral cancer. In general, the proposed methodology exhibits superior results compared to the other methods, except the work done by [8,9], which claimed to achieve an accuracy of 100%. However, they employed different classifiers for different sources of data, and more than 70 markers were required for their final combined classifier. A significant advantage of our proposed methodology is that only three optimum markers are needed with a single classifier, for both clinicopathologic and genomic data types, to obtain a high-accuracy outcome. It is hoped that the proposed methodology can expedite the decision support stage for oral cancer clinicians and better predict the survival rate of oral cancer patients based on the three markers only.
Table 11
Author | Sample size | Accuracy (%) |
---|---|---|
Passaro et al. [6] | 124 patients, 231 controls | 74-79 |
Oliveira et al. [7] | 500 | 5-year survival of 28.6% |
Exarchos et al. [8] | 41 | 100 |
Exarchos et al. [9] | 86 | 100 |
Dom et al. [10] | 84 patients, 87 controls | 82 |
Current work | 31 | 93.81 |
A common problem associated with medical datasets is small sample size. It is time-consuming and costly to obtain a large number of samples in medical research, and the samples are usually inconsistent, incomplete or noisy in nature. The small sample size problem is more visible in oral cancer research since oral cancer is not one of the top ten most common cancers in Malaysia, hence there are not many cases. For example, in Peninsular Malaysia, there were just 1,921 new oral cancer cases from 2003 to 2005 [39] and 592 new oral cancer cases in the year 2006 [40], as compared to breast cancer, where the incidence between 2003 and 2005 was 12,209 [39] and the incidence for 2006 was 3,591 [40]. Out of these oral cancer cases, some patients were lost to follow-up, and some patients sought treatment in other private hospitals; thus, their information was not available for this research. Another reason for the small sample size is medical confidentiality issues. This can be viewed from two aspects, namely, patients and clinicians. Some patients do not wish to reveal any information about their diseases to others, and are not willing to donate their tissues for research or educational purposes. As for clinicians, some may not want to share patients' data with others, especially those from non-medical fields, while some do not keep their medical records in the correct medical form. From the available cases, some patients' clinicopathologic data were incomplete, some tissues were missing due to improper management and some were duplicated cases. Due to this, the number of cases that can actually be used for this research is very limited.
In order to overcome the problem of small sample size, we employed feature selection methods on our dataset to choose the optimum feature subsets based on the correlations of the input and output variables. The selected features were fed into the proposed classifiers, and the results showed that the ReliefF-GA-ANFIS prognostic model is suitable for small-sample-size data with the proposed optimum feature subset of drink, invasion and p63.
Significance testing
The significance test used in this research was the Kruskal-Wallis test. Kruskal-Wallis is a non-parametric test to compare samples from two or more groups, and it returns a p-value. For this research, we wanted to test whether there is any statistically significant difference between the accuracy results generated for the 3-input model of Group 2 by the different feature selection methods. Thus, the null hypothesis is set as: H0 = there is no difference between the results of the different feature selection models. If the p-value computed from the test is 0.05 or less, H0 is rejected, which means there is a difference between the results of the different feature selection methods. The p-value generated was 0.0312, which is less than 0.05; this means H0 is rejected and there is a significant difference between the feature selection methods.
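The statistic behind this test is computed by pooling all accuracy values, ranking them, and comparing mean ranks across the groups. A minimal sketch of the H statistic, with tie correction omitted and hypothetical per-run accuracies (not the study's actual results):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic: pool all observations, rank them,
    and compare rank sums across groups (no tie correction; assumes
    all values are distinct)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 * s / (n * (n + 1)) - 3 * (n + 1)

# Hypothetical per-run accuracies for three feature selection methods
# (illustration only; not the study's cross-validation folds).
h = kruskal_h([93.8, 92.1, 90.5], [74.8, 70.9, 71.4], [54.3, 58.1, 61.4])
```

In practice, `scipy.stats.kruskal(*groups)` returns both the H statistic and the p-value used for the 0.05 decision rule.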
Validation testing
In this section, the best model, the ReliefF-GA-3-input model, is compared with other models built from random permutations of three inputs. The purpose is to validate that the features selected by the ReliefF-GA method are the optimum subset for oral cancer prognosis. In addition, the full-input model (the model with all 17 variables) is tested as well in order to verify that the reduced model can achieve the same or better results than the full model. In this testing, the classification method used is ANFIS due to its best performance in the previous section, and the results are tabulated in Table 12.
Table 12
Models | ANFIS | |
---|---|---|
% | AUC | |
Random permutation model | ||
Age, Inv, p63 | 64.76 | 0.63 |
Eth, Dri, p53 | 57.14 | 0.49 |
PT, PN, Sta | 58.10 | 0.51 |
Gen, Node, Tre | 70.95 | 0.59 |
Eth, Gen, Sub | 39.05 | 0.32 |
Dri, p53, p63 | 80.48 | 0.70 |
Age, p53, p63 | 67.14 | 0.67 |
Gen, Dri, Inv | 54.76 | 0.55 |
Site, Inv, Size | 32.86 | 0.28 |
Age, Chew, Size | 48.10 | 0.41 |
Full model | ||
Full model with ANFIS | N.A.* | N.A.* |
Full model with ANN | 42.90 | 0.47 |
Full model with SVM | 54.76 | 0.46 |
Full model with LR | 54.76 | 0.59 |
*N.A. - Results not available due to an over-fitting problem, as the rule base generated was too large.
Table 12 presents the results from different permutations of the 3-input models using ANFIS and those of the full model with all 17 variables using the different classification methods. The three inputs were generated randomly, and the best accuracy obtained was 80.48% with an AUC of 0.70; the features selected were Drink, p53 and p63. The results of the full model are not promising, and the results of the full model using ANFIS could not be generated due to over-fitting problems, as the rule base generated was too large.
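The random 3-input benchmark models of Table 12 can be generated by sampling variable subsets. A minimal sketch, assuming the 17 variable abbreviations are those appearing across Tables 6, 10 and 12 (this variable list is reconstructed from the tables, not taken from a methods listing):

```python
import random

# Reconstructed from the abbreviations used in the paper's tables (assumption).
VARIABLES = ["Age", "Eth", "Gen", "Smo", "Chew", "Dri", "Site", "Size",
             "Node", "Inv", "PT", "PN", "Sta", "Tre", "Sub", "p53", "p63"]

def random_three_input_models(n_models, seed=0):
    """Draw random 3-variable subsets for the permutation benchmark."""
    rng = random.Random(seed)
    return [sorted(rng.sample(VARIABLES, 3)) for _ in range(n_models)]

models = random_three_input_models(10)
```

Each sampled subset would then be fed to the ANFIS classifier and its accuracy compared against the ReliefF-GA-selected subset {Dri, Inv, p63}.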
Finally, the selected features were tested on the oral cancer dataset for 1-year and 2-year prognosis with ANFIS classification, and the results are very promising, with an accuracy for 1-year prognosis of 93.33% and for 2-year prognosis of 84.29%, as compared to the 3-year prognosis of 93.81%. The results are shown in Table 13.
Table 13
Oral cancer prognosis | Accuracy (%) | AUC |
---|---|---|
1-year | 93.33 | 0.90 |
2-year | 84.29 | 0.77 |
3-year | 93.81 | 0.90 |
Findings
The analyses and findings from this research are:
(i) The performance of Group 2 (clinicopathologic and genomic variables) is better than that of Group 1 (clinicopathologic variables). This is in accordance with the objective of this research, showing that the prognostic result is more accurate with the combination of clinicopathologic and genomic markers.
(ii) The model with the best accuracy is the ReliefF-GA-3-input model with the ANFIS classification model from Group 2, and the Kruskal-Wallis test showed a significant difference as compared to the 3-input models of GA, CC, ReliefF and CC-GA.
(iii) The optimum subset of features for oral cancer prognosis is drink, invasion and p63, and this finding is in accordance with similar studies in the literature.
(iv) The ANFIS classification model achieved the best accuracy in oral cancer prognosis when compared to the artificial neural network, support vector machine and the statistical method of logistic regression.
(v) The prognostic result is more accurate with fewer inputs in comparison with the full model.
In summary, the hybrid model of ReliefF-GA-ANFIS with the 3-input features of drink, invasion and p63 achieved the best accuracy. Through the identification of fewer markers for oral cancer prognosis, it is hoped that this will assist clinicians in carrying out prognostic procedures, and thus help them make a more accurate prognosis in a shorter time and at lower cost. Furthermore, the results of this research help patients and their families plan their future and lifestyle through a more reliable prognosis.
Conclusions
In this research, we presented a prognostic system using a hybrid of feature selection and machine learning methods for the purpose of oral cancer prognosis based on clinicopathologic and genomic markers. In conclusion, the hybrid model of ReliefF-GA-ANFIS resulted in the best accuracy (accuracy = 93.81%, AUC = 0.90) with the selected features of drink, invasion and p63. The results proved that the prognosis is more accurate when using the combination of clinicopathologic and genomic markers. However, more tests and experiments need to be done in order to further verify the results obtained in this research. Our future work includes increasing the sample size of the dataset by obtaining more medical samples, thus making it closer to the real population, and including more genomic markers in our study.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
SWC developed the prognostic model, performed the experiments and drafted the manuscript. SWC and SAK conceived the study and contributed to the experimental design. AFM and RBZ contributed to the analysis and interpretation of the oral cancer prognostic dataset. All authors read and approved the final manuscript.
Acknowledgment
This study was supported by the University of Malaya Research Grant (UMRG) with the project number RG026-09ICT. The authors would like to thank Dr Mannil Thomas Abraham from the Tunku Ampuan Rahimah Hospital, Ministry of Health, Malaysia, Dr Thomas George Kallarakkal from the Department of Oral Pathology and Oral Medicine and Periodontology, the staff from the Oral & Maxillofacial Surgery department, the Oral Pathology Diagnostic Laboratory, the OCRCC, the Faculty of Dentistry, and the ENT department, Faculty of Medicine, University of Malaya for the preparation of the dataset and the related information and documents for this project.
References
- Lisboa PJ, Taktak AFG. The Use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw. 2006;19:408–415. doi: 10.1016/j.neunet.2005.10.007. [PubMed] [CrossRef] [Google Scholar]
- Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer Informatics. 2006;2:59–78. [PMC free article] [PubMed] [Google Scholar]
- Futschik ME, Sullivan M, Reeve A, Kasabov N. Prediction of clinical behaviour and treatment for cancers. Appl Bioinformatics. 2003;2(3 Suppl):S53–S58. [PubMed] [Google Scholar]
- Gevaert O, Smet FD, Timmerman D, Moreau D, Moor BD. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics. 2006;22(14):e184–e190. doi: 10.1093/bioinformatics/btl230. [PubMed] [CrossRef] [Google Scholar]
- Sun Y, Goodison S, Li J, Liu L, Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007;23(1):30–37. doi: 10.1093/bioinformatics/btl543. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
- Passaro A, Baronti F, Maggini V. Exploring relationships between genotype and oral cancer development through XCS. New York, USA: GECCO'05; 2005. [Google Scholar]
- Oliveira LR, Ribeiro-Silve A, Costa JPO, Simoes AL, Di Matteo MAS, Zucoloto S. Prognostic factors and survival analysis in a sample of oral squamous cell carcinoma patients. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology. 2008;106(5):685–695. doi: 10.1016/j.tripleo.2008.07.002. [PubMed] [CrossRef] [Google Scholar]
- Exarchos K, Goletsis Y, Fotiadis D. Multiparametric Decision Support System for the Prediction of Oral Cancer Reoccurrence. IEEE Trans Inf Technol Biomed. 2011;16(6):1127–1134. [PubMed] [Google Scholar]
- Exarchos K, Goletsis Y, Fotiadis D. A multiscale and multiparametric approach for modeling the progression of oral cancer. BMC Med Inform Decis Mak. 2012;12:136–150. doi: 10.1186/1472-6947-12-136. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
- Dom RM, Abdul-Kareem S, Abidin B, Jallaludin RLR, Cheong SC, Zain RB. Oral cancer prediction model for Malaysian sample. Austral-Asian Journal of Cancer. 2008;7(4):209–214. [Google Scholar]
- Catto JWF, Abbod MF, Linkens DA, Hamdy FC. Neuro-fuzzy modeling: an accurate and interpretable method for predicting bladder cancer progression. J Urol. 2006;175:474–479. doi: 10.1016/S0022-5347(05)00246-6. [PubMed] [CrossRef] [Google Scholar]
- Muzio LL, Santarelli A, Caltabiano R, Rubini C, Pieramici T, Trevisiol L. p63 overexpression associates with poor prognosis in head and neck squamous cell carcinoma. Hum Pathol. 2005;36:187–194. doi: 10.1016/j.humpath.2004.12.003. [PubMed] [CrossRef] [Google Scholar]
- Chen YK, Huse SS, Lin LM. Differential expression of p53, p63 and p73 proteins in human buccal squamous-cell carcinomas. Clin Otolaryngol Allied Sci. 2003;28(5):451–455. doi: 10.1046/j.1365-2273.2003.00743.x. [PubMed] [CrossRef] [Google Scholar]
- Choi H-R, Batsakis JG, Zhan F, Sturgis E, Luna MA, El-Naggar AK. Differential expression of p53 gene family members p63 and p73 in head and neck squamous tumorigenesis. Hum Pathol. 2002;33(2):158–164. doi: 10.1053/hupa.2002.30722. [PubMed] [CrossRef] [Google Scholar]
- Mehrotra R, Yadav S. Oral squamous cell carcinoma: etiology, pathogenesis and prognostic value of genomic alterations. Indian J Cancer. 2006;43(2):60–66. doi: 10.4103/0019-509X.25886. [PubMed] [CrossRef] [Google Scholar]
- Thurfjell N, Coates PJ, Boldrup L, Lindgren B, Bäcklund B, Uusitalo T, Mahani D, Dabelsteen E. Function and Importance of p63 in Normal Oral Mucosa and Squamous Cell Carcinoma of the Head and Neck. Current Research in Head and Neck Cancer. 2005;62:49–57. [PubMed] [Google Scholar]
- Zigeuner R, Tsybrovskyy O, Ratschek M, Rehak P, Lipsky K, Langner C. Prognostic impact of p63 and p53 in upper urinary tract transitional cell carcinoma. Adult Urology. 2004;63(6):1079–1083. doi: 10.1016/j.urology.2004.01.009. [PubMed] [CrossRef] [Google Scholar]
- Rosner B. Fundamentals of Biostatistics. 6. California: Thomson Higher Education; 2006. [Google Scholar]
- Kononenko I. ECML-94: Proceedings of the European Conference on Machine Learning: 1994. Catania, Italy: Springer; 1994. Estimating Attributes: Analysis and Extension of RELIEF; pp. 171–182. [Google Scholar]
- Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Boston: Addison-Wesley Longman; 1989. [Google Scholar]
- Siow-Wee C, Kareem SA, Kallarakkal TG, Merican AF, Abraham MT, Zain RB. Feature Selection Methods for Optimizing Clinicopathologic Input Variables in Oral Cancer Prognosis. Asia Pacific Journal of Cancer Prevention. 2011;12(10):2659–2664. [PubMed] [Google Scholar]
- Efron B. Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc. 1983;78(382):316–330. doi: 10.1080/01621459.1983.10477973. [CrossRef] [Google Scholar]
- Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005;21(15):3301–3307. doi: 10.1093/bioinformatics/bti499. [PubMed] [CrossRef] [Google Scholar]
- Jang JSR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern. 1993;23(3):665–685. doi: 10.1109/21.256541. [CrossRef] [Google Scholar]
- Jang JSR. Input Selection for ANFIS Learning. Fifth IEEE International Conference on Fuzzy Systems vol. 2. 1996. pp. 1493–1499.
- Gershenson C. Artificial Neural Networks for Beginners. Formal Computational Skills Teaching Package, COGS, University of Sussex; 2001. [Google Scholar]
- Chih-Chung C, Chih-Jen L. LIBSVM : A library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2:27:21–27:27. [Google Scholar]
- Chih-Wei H, Chang C-C, Lin C-J. Technical Report. Taiwan: National Taiwan University; 2010. A Practical Guide to Support Vector Machines. [Google Scholar]
- Ross SM. Introductory Statistics. 3. New York, USA: Academic Press, Elsevier; 2010. [Google Scholar]
- Jefferies S, Foulkes WD. Genetic mechanisms in squamous cell carcinoma of the head and neck. Oral Oncol. 2001;37:115–126. doi: 10.1016/S1368-8375(00)00065-8. [PubMed] [CrossRef] [Google Scholar]
- Leite ICG, Koifman S. Survival analysis in a sample of oral cancer patients at a reference hospital in Rio de Janeiro, Brazil. Oral Oncol. 1998;34(1998):347–352. [PubMed] [Google Scholar]
- Reichart PA. Identification of risk groups for oral precancer and cancer and preventive measures. Clin Oral Invest. 2001;5:207–213. doi: 10.1007/s00784-001-0132-5. [PubMed] [CrossRef] [Google Scholar]
- Zain RB, Ghazali N. A review of epidemiological studies of oral cancer and precancer in Malaysia. Annals of Dentistry University of Malaya. 2001;8:50–56. [Google Scholar]
- Walker D, Boey G, McDonald L. The pathology of oral cancer. Pathology. 2003;35(5):376–383. doi: 10.1080/00310290310001602558. [PubMed] [CrossRef] [Google Scholar]
- Asakage T, Yokose T, Mukai K, Tsugane S, Tsubono Y, Asai G, Ebihara S. Tumor thickness predicts cervical metastasis in patients with stage I/II carcinoma of the tongue. Cancer. 1998;82:1443–1448. doi: 10.1002/(SICI)1097-0142(19980415)82:8<1443::AID-CNCR2>3.0.CO;2-A. [PubMed] [CrossRef] [Google Scholar]
- Giacomarra V, Tirelli G, Papanikolla L, Bussani R. Predictive factors of nodal metastases in oral cavity and oropharynx carcinomas. Laryngoscope. 1999;109:795–799. doi: 10.1097/00005537-199905000-00021. [PubMed] [CrossRef] [Google Scholar]
- Morton R, Ferguson C, Lambie N, Whitlock R. Tumor thickness in early tongue cancer. Arch Otolaryngol Head Neck Surg. 1994;120:717–720. doi: 10.1001/archotol.1994.01880310023005. [PubMed] [CrossRef] [Google Scholar]
- Williams J, Carlson G, Cohen C, Derose P, Hunter S, Jurkiewicz M. Tumor angiogenesis as a prognostic factor in oral cavity tumors. Am J Surg. 1994;168:373–380. doi: 10.1016/S0002-9610(05)80079-0. [PubMed] [CrossRef] [Google Scholar]
- Gerard LCC, Rampal S, Yahaya H. Third Report of the National Cancer Registry: Cancer Incidence in Malaysia (2005). National Cancer Registry, Ministry of Health Malaysia. 2005.
- Omar ZA, Ali ZM, Tamin NSI. Malaysian Cancer Statistics - Data and Figures, Peninsular Malaysia 2006. National Cancer Registry, Ministry of Health Malaysia. 2006.
Articles from BMC Bioinformatics are provided here courtesy of BioMed Central
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673908/