BMC Bioinformatics. 2013; 14: 170.
Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods
Siow-Wee Chang
1Bioinformatics and Computational Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
2Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Sameem Abdul-Kareem
2Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Amir Feisal Merican
1Bioinformatics and Computational Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
Rosnah Binti Zain
3Department of Oral Pathology and Oral Medicine and Periodontology, Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
Received 2012 Nov 7; Accepted 2013 May 21.
Abstract
Background
Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for even the most skilful clinician to arrive at an accurate prognosis using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers.
Results
In the first phase of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second phase, the models with the features selected from each feature selection method are tested on the proposed classifiers. Four types of classifiers are chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with the 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis.
Conclusions
The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate their potential to become a significant prognostic signature in oral cancer studies.
Keywords: Oral cancer prognosis, Clinicopathologic, Genomic, Feature selection, Machine learning
Background
Various machine learning methods have been applied in the diagnosis or prognosis of cancer research, such as artificial neural networks, fuzzy logic, genetic algorithms, support vector machines and other hybrid techniques [1,2]. From the medical perspective, diagnosis is to identify a disease by its signs and symptoms, while prognosis is to predict the outcome of the disease and the condition of the patient, i.e. whether the patient will survive or recover from the illness or vice versa. In some studies, researchers have proven that machine learning methods could generate a more accurate diagnosis or prognosis compared to traditional statistical methods [2].
Usually, clinicopathologic data or genomic data are used in research involving either diagnosis or prognosis. Currently, some research has shown that prognosis results are more accurate when both clinicopathologic and genomic data are used. Examples of these are the work in [3] on diffuse large B-cell lymphoma (DLBCL), the works in [4,5] on breast cancer, [6-10] on oral cancer, and [11] on bladder cancer. However, the number of published articles on research that combines both clinicopathologic and genomic data is small compared to that using just clinicopathologic data [2]. In the oral cancer domain, [6] used machine learning techniques in oral cancer susceptibility studies. They proposed a hybrid adaptive system inspired by learning classifier systems, decision trees and statistical hypothesis testing. The dataset included both demographic data and 11 types of genes. Their results showed that the proposed algorithm outperformed the other algorithms of Naive Bayes, C4.5, neural network and XCS (an evolution of Holland's Learning Classifier System). However, they did not validate their results against the traditional statistical methods. [7] focused on the 5-year overall survival in a group of oral squamous cell carcinoma (OSCC) patients and investigated the effects of demographic data, clinical data, genomic data and human papillomavirus on the prognostic outcome. They used statistical methods for the prediction, and their results showed that the 5-year overall survival was 28.6% and highlighted the influence of p53 immunoexpression, age and anatomic localization on OSCC prognosis. However, in that research, no machine learning methods were used or compared. Other oral cancer research, done by [8,9], was on oral cancer recurrence. A Bayesian network was used and compared with ANN, SVM, decision tree, and random forests.
They used a multitude of heterogeneous data which included clinical, imaging, tissue and blood genomic data. They built a separate classifier for each type of data and combined the best performing classification schemes. They claimed that they had achieved an accuracy of 100% with the combination of all types of data and showed that the prediction accuracy was best when using all types of data. However, more than 70 markers were required for their final combined classifier.
For the genomic domain, [12] showed that p63 overexpression is associated with poor prognosis in oral cancer. Their study showed that cases with diffuse p63 expression were more aggressive and poorly differentiated and related to a poorer prognosis, these findings supporting the use of p63 as an additional marker for diagnostic use in oral SCC. In [13], immunohistochemical analysis of protein expression for p53, p63 and p73 was performed on 40 samples of well-differentiated human buccal squamous-cell carcinomas, with 10 specimens of normal buccal mucosa employed as controls. Their results indicated that both p73 and p63 may be involved in the development of human buccal squamous-cell carcinoma, perhaps in concert with p53. Similar results were obtained by [14], who showed that in head and neck squamous carcinomas (HNSC), p63 was the most frequently expressed (94.7%), followed by p73 (68.4%) and p53 (52.6%). Their study indicated that p63 and p73 expression may represent an early event in HNSC tumorigenesis and that p73 and p63 may function as oncogenes in the development of these tumors.
In this research, an oral cancer prognostic model is developed. The research used a real-world oral cancer dataset collected locally at the Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Malaysia. The model takes both clinicopathologic and genomic data in order to investigate the impact of each marker, or combination of markers, on the accuracy of the prognosis of oral cancer. Five feature selection methods are proposed with the objectives of reducing the number of input variables to avoid over-fitting and of finding an optimum feature subset for oral cancer prognosis. This is followed by the classification procedures, which are used to classify the status of the patient after 1-3 years of diagnosis (alive or dead). Four classification methods, from both machine learning and statistical methods, are tested and compared. The objective of this research is to show that the prognosis is better when both clinicopathologic and genomic markers are used, and to identify the key markers for oral cancer prognosis using the hybrid of feature selection and machine learning methods.
Methods
The framework for the oral cancer prognostic model is shown in Figure 1. Clinicopathologic variables from the OCRCC database and genomic variables from immunohistochemistry (IHC) staining are fed into the model. Basically, there are three main parts to the oral cancer prognostic model: wet-laboratory testing for the genomic variables, the feature selection methods and the classification models. This research was approved by the Medical Ethics Committee, Faculty of Dentistry, University of Malaya.
Clinicopathologic data
A total of 31 oral cancer cases were selected from the Malaysian Oral Cancer Database and Tissue Bank System (MOCDTBS) coordinated by the Oral Cancer Research and Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya. The selection was based on the completeness of the clinicopathologic data, the availability of tissues and the availability of data (some data were not available for use due to medical confidentiality issues).
The selected cases were based on the oral cancer cases seen in the Faculty of Dentistry, University of Malaya and Hospital Tunku Ampuan Rahimah, Klang, a Malaysian government hospital, from the year 2003 to 2007. These cases were diagnosed and followed up, and the data were recorded in the standardised forms prepared by the MOCDTBS. Subsequently, the MOCDTBS transcribed all the data from paper to an electronic version stored in the database. All the cases selected were diagnosed as squamous cell carcinomas (SCC). Table 1 shows the 1 to 3-year survival for these 31 cases.
Table 1
Duration of follow-up | Survival | No | % |
---|---|---|---|
1-year | Survive | 27 | 87.1 |
Dead | 4 | 12.9 | |
Lost to follow-up | 0 | 0.0 | |
2-year | Survive | 19 | 61.3 |
Dead | 10 | 32.3 | |
Lost to follow-up | 2 | 6.5 | |
3-year | Survive | 17 | 54.8 |
Dead | 11 | 38.7 | |
Lost to follow-up | 3 | 9.7
Basically, three types of data are available for each oral cancer case, namely, social demographic data (risk factors, ethnicity, age, occupation, marital status and others), clinical data (type of lesion, size of lesion, main site, clinical neck node, etc.), and pathological data (pathological TNM, neck node metastasis, bone invasion, tumour thickness, etc.). Pathological data were obtained from the biopsy reports before and after surgical procedures. In this research, we refer to the clinical and pathological data as clinicopathologic data. Based on discussions with two oral cancer clinicians, Prof. Rosnah Binti Zain and Dr Thomas George Kallarakkal, 15 key variables were identified as important prognostic factors of oral cancer. The selected clinicopathologic variables are listed in Table 2(a).
Table 2
(a) Clinicopathologic variables | ||
---|---|---|
Name | Description | Name & parameters of membership function |
Age | Age at diagnosis | 1 - 40-50, 2 - >50-60, 3 - >60-70, 4 - >70 |
Eth | Ethnicity | 1 - Malay, 2 - Chinese, 3 - Indian |
Gen | Gender | 1 - Male, 2 - Female |
Smoke | Smoking habit | 1 - Yes, 2 - No |
Drink | Alcohol drinking habit | 1 - Yes, 2 - No |
Chew | Quid chewing habit | 1 - Yes, 2 - No |
Site | Main site of tumour | 1 - Buccal mucosa, 2 - tongue |
3 - floor, 4 - others | ||
Subtype | Subtype and differentiation for SCC | 1 - Well differentiated |
2 - moderately differentiated | ||
3 - poorly differentiated | ||
Inv | Invasion front | 1 - Non-cohesive, 2 - cohesive |
Node | Neck nodes | 1 - Negative, 2 - positive |
PT | Pathological tumour staging | 1 - T1, 2 - T2, 3 - T3, 4 - T4 |
PN | Pathological lymph nodes | 1 - N0, 2 - N1, 3 - N2A, 4 - N2B |
Stage | Overall stage | 1 - I, 2 - II, 3 - III, 4 - IV |
Size | Size of tumour | 1 - 0-2 cm, 2 - >2-4 cm, 3 - >4-6 cm, 4 - >6 cm |
Treat | Type of treatment | 1 - Surgery only |
2 - Surgery + Radiotherapy | ||
3 - Surgery + Chemotherapy | ||
(b) Genomic variables | ||
Name | Description | Name & parameters of membership function |
p53 | Tumour suppressor gene | 1 - negative, 2 - positive |
p63 | Tumour suppressor gene | 1 - negative, 2 - positive |
Genomic data
Two genomic variables were identified through literature studies and discussions with oral pathologists from the Department of Oral Pathology and Oral Medicine and Periodontology, Faculty of Dentistry, University of Malaya. Both of these variables are tumour suppressor genes, namely, p53 and p63. p53 is the marker most frequently associated with head and neck cancers [7,15]. p53 is called the "Guardian of the genome"; it is important in maintaining genomic stability, progression of the cell cycle, cellular differentiation, DNA repair and apoptosis. It is difficult to demonstrate p53 protein in normal tissues using immunohistochemistry procedures due to its high catabolic rate; however, mutated p53 exhibits a much lower catabolic rate and accumulates in the cells [15]. In addition, the p63 gene, a homolog of p53, is located in chromosome 3q21-29, and its amplification has been associated with prognostic outcome in oral cancer [11,16]. The p63 gene is highly expressed in the basal or progenitor layers of many epithelial tissues.
The cases selected were the same as in the clinicopathologic data. Immunohistochemistry (IHC) staining was performed on the selected formalin-fixed paraffin-embedded oral cancer tissues to obtain the results for the selected genomic variables. The archival formalin-fixed paraffin-embedded tissues were obtained from the Oral Pathology Diagnostic Laboratory, Faculty of Dentistry, University of Malaya. The tissues containing the tumour were cored, re-embedded and made into Tissue Macroarray blocks (TMaA). Sections 4 μm thick were cut from the resulting TMaA blocks and placed on poly-L-lysine-coated glass slides for IHC staining. The samples mounted on the glass slides were then ready for IHC staining. In this research, the Dako Real™ EnVision™ Detection Kit was used. In total, 15 TMaA slides with 31 oral cancer cases were stained. Two types of antibodies were used, namely Monoclonal Mouse Anti-Human p53 protein, clone 318-6-11, for p53 and Monoclonal Mouse Anti-Human p63 protein, clone 4A4, for p63.
The results of the staining were analyzed and the images were captured using an image analyzer system which included a Nikon Eclipse E400 microscope with a CFI Plan Fluor 40X objective for measurements, a QImaging Evolution digital colour cooled camera with 5.0 megapixels, a personal computer (Pentium 4, 2.5 GHz, 2 GB RAM) and MediaCybernetics Image-Pro Plus version 6.3 image analysis software. Each slide was first examined under the microscope with a lower objective, that is, the 4X objective. Cases were considered sufficient for evaluation if there were tumour cells present in the sections. Next, the slide was divided into 20 grid cells, numbered accordingly from left to right. A simple randomization program was used to generate random numbers. For each case, five tumour-representative areas were selected. If a number fell on a non-tumour-representative area, the next number (cell) was chosen, until all five areas were selected. Next, the five selected areas were examined under the microscope using a higher objective, that is, the 40X objective, and the images were captured. The percentage of positive nuclear cells for each area was counted and the average over the five areas was calculated. The staining result is considered positive if there is more than 10% positive nuclear staining, in accordance with the practice used in previous studies [7,17]. Figure 2 shows the flowchart for the IHC results analysis and scoring process. The results obtained from the IHC staining were combined with the clinicopathologic variables and served as the inputs for the feature selection module. The combined dataset was further divided into two groups: Group 1 with clinicopathologic variables (15 variables) only and Group 2 with both the clinicopathologic and genomic variables (17 variables).
We need to emphasize that the genomic variables were obtained from the same corresponding cases from which the clinicopathologic variables (Group 1) were obtained. Thus, if the clinicopathologic variables were those of Case 1, then the genomic variables were from the same case.
Feature selection
In this research, the purpose of feature selection is to find an optimal number of features for the small sample of oral cancer prognosis data. Five feature selection methods have been selected and compared: Pearson's correlation coefficient (CC) [18] and Relief-F [19] as the filter approaches, genetic algorithm (GA) [20,21] as the wrapper approach, and CC-GA and ReliefF-GA as the hybrid approaches.
Genetic algorithm (GA)
In the feature subset selection problem, a solution is a specific feature subset that can be encoded as a string of binary digits (bits). Each feature is represented by a binary digit of 0 or 1. For example, in the oral cancer prognosis dataset, if the solution is the string 011001000010000 of 15 binary digits, it indicates that features 2, 3, 6, and 11 are selected as the feature subset [21]. The initial population is generated randomly to select subsets of variables. In the proposed GA feature selection method, if the features in a subset are all different, the subset is included in the initial population. If not, it is regenerated until an initial population of the desired size for the feature subset (n-input model) is created.
The fitness function used in this method is a classifier that discriminates between two groups, alive and dead after 3 years of diagnosis. The mean square error rate of the classification is calculated using a 5-fold cross-validation. The fitness value is the final mean square error rate obtained. The subset of variables with the lowest error rate is selected. Figure 3 shows the flowchart and the criteria used for the GA feature selection approach.
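The encoding and wrapper loop can be sketched as below. This is an illustrative toy GA, not the authors' implementation: it uses elitist selection plus a swap mutation that keeps the subset size fixed, and `fitness` stands in for the classifier's 5-fold cross-validation mean square error rate.

```python
import random

def decode(bits):
    """1-based indices of selected features, e.g. '011001000010000' -> [2, 3, 6, 11]."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]

def random_subset(n_features, n_select, rng):
    """A random individual with exactly n_select bits set (one n-input model)."""
    chosen = set(rng.sample(range(n_features), n_select))
    return "".join("1" if i in chosen else "0" for i in range(n_features))

def mutate(bits, rng):
    """Swap one selected feature for an unselected one, keeping subset size fixed."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    zeros = [i for i, b in enumerate(bits) if b == "0"]
    out = list(bits)
    out[rng.choice(ones)], out[rng.choice(zeros)] = "0", "1"
    return "".join(out)

def ga_select(fitness, n_features=15, n_select=3, pop_size=20, generations=100, seed=0):
    """Minimise fitness (e.g. the 5-fold CV mean square error rate) over feature subsets."""
    rng = random.Random(seed)
    pop = [random_subset(n_features, n_select, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                 # lower error rate is better
        survivors = pop[: pop_size // 2]      # keep the best half (elitism)
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=fitness)
```

With a real dataset, `fitness` would train the chosen classifier on each fold and return the averaged error; here any callable taking a bitstring works.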
Pearson's correlation coefficient (CC)
Pearson's correlation coefficient, r, is used to see whether the values of two variables are associated. In this research, r is calculated for each feature input and the features with the highest r are selected. For example, for the 3-input model, the top three inputs with the highest r values are selected. This is repeated for the n-input models for both Group 1 and Group 2.
Relief-F
Relief-F is an extension of the original Relief algorithm which is able to deal with noisy and incomplete datasets as well as multi-class problems. The key idea of Relief is to estimate attributes according to how well their values distinguish among instances that are near to each other [18]. In this research, each feature input is ranked and weighted using the k-nearest neighbours classification, with k = 1. The top features with large positive weights are selected for both groups of the dataset.
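A minimal sketch of the Relief idea with one nearest hit and one nearest miss per instance (k = 1), using a simple overlap distance for the categorical inputs; this simplification is ours, not the authors' exact Relief-F implementation.

```python
def relief_weights(X, y):
    """Simplified Relief with one nearest hit/miss per instance (k = 1).

    X: list of feature vectors (categorical values coded as numbers).
    y: class labels. Returns one weight per feature; larger positive
    weights mean the feature separates the classes better.
    """
    n_feat = len(X[0])
    w = [0.0] * n_feat

    def dist(a, b):
        # overlap distance: count of attributes with differing values
        return sum(1 for u, v in zip(a, b) if u != v)

    for i, xi in enumerate(X):
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(xi, X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(xi, X[j]))  # nearest miss
        for f in range(n_feat):
            # reward disagreement with the miss, penalise disagreement with the hit
            w[f] += (1 if xi[f] != X[m][f] else 0) - (1 if xi[f] != X[h][f] else 0)
    return [v / len(X) for v in w]
```

On a toy dataset where the first feature determines the class and the second is noise, the first feature receives the larger weight.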
Pearson's correlation coefficient and genetic algorithm (CC-GA)
This hybrid feature selection approach consists of two stages: the first is a filter approach which calculates the correlation coefficient, r, and the second is the wrapper approach of GA. In the first stage, ten features with the highest r are selected and fed into the second stage, the GA approach. The procedures of the GA are the same as those described in the previous section.
Relief-F and genetic algorithm (ReliefF-GA)
This hybrid feature selection approach consists of two stages: the first is the filter approach of Relief-F, and the second is the wrapper approach of GA. In the first stage, ten features with the highest weights are selected and fed into the second stage, the GA approach. In the second stage, the n-input model is selected for both Group 1 and Group 2.
Selection of n-input models
Before the implementation of the feature selection methods, a simple GA was run to find the optimal number of inputs (n-input model) from the 17 inputs of clinicopathologic and genomic data. The numbers of inputs with a lower mean square error rate were chosen. The error rate for each n-input model is shown in Table 3, which shows that for Group 1 there are four models with the lowest error rate of 0.3871: the 3-input, 4-input, 5-input, and 6-input models. Meanwhile, for Group 2, the model with the lowest error rate is the 3-input model, with an error rate of 0.2581. For comparison purposes, the numbers of inputs between 3-input and 7-input were chosen; hence n is set as n = 3, 4, 5, 6, 7 for the feature selection methods.
Table 3
Group 1 | Group 2 | |
---|---|---|
1-input | 0.3881 | 0.3626 |
2-input | 0.4193 | 0.2903 |
3-input | 0.3871 | 0.2581 |
4-input | 0.3871 | 0.2903 |
5-input | 0.3871 | 0.3226 |
6-input | 0.3871 | 0.3548 |
7-input | 0.4571 | 0.3548 |
8-input | 0.4839 | 0.4194 |
9-input | 0.5161 | 0.4516 |
Classification
Next, the data with the n selected features are fed into the classification models. The final output is the classification accuracy for oral cancer prognosis, which classifies the patients as alive or dead in the years following diagnosis using the optimum feature subset. Four classification methods were experimented with and their results later compared; these are ANFIS, artificial neural network (ANN), support vector machine and logistic regression.
In order to obtain accurate estimated results, cross-validation (CV) was used. CV provides an unbiased estimation; however, CV presents high variance with small samples in some studies [22]. In this research, a 5-fold cross-validation was implemented with each of the classifiers. 5-fold cross-validation was chosen over the commonly used 10-fold cross-validation due to the small sample size; it leaves more instances for validation and has lower variance [23]. In 5-fold cross-validation, the 31 samples of oral cancer prognosis data were divided into 5 subsets of near-equal size and trained five times, each time leaving one subset out as validation data.
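A minimal sketch of the fold construction, assuming a shuffled index split (the paper does not specify the shuffling); with 31 samples the subsets are necessarily only near-equal in size (7, 6, 6, 6, 6).

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k folds of near-equal size (31 -> 7, 6, 6, 6, 6)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validate(folds):
    """Yield (train, validation) index pairs, one per held-out fold."""
    for f in range(len(folds)):
        val = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, val
```

Each classifier is then fitted on the train indices and scored on the validation indices, and the five scores are averaged.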
Adaptive neuro-fuzzy inference system (ANFIS)
ANFIS implements the Takagi-Sugeno fuzzy inference system. The details of ANFIS can be found in [24,25].
In the input layer, the number of inputs is defined by n, with n = 3, 4, 5, 6, 7. In the input membership (inputmf) layer, the number of membership functions is defined by m_i, with i = 2, 3, 4. The rules generated are based on the number of inputs and the number of input membership functions, represented as (m_2^n1 × m_3^n2 × m_4^n3) rules, in which n_1, n_2, and n_3 represent the numbers of inputs with m_i membership functions respectively, and n_1 + n_2 + n_3 = n. For example, in the ANFIS with 3 inputs x, y, and z, in which input x has 2 membership functions, input y has 2 membership functions, and input z has 4 membership functions, the number of rules generated is (2^2 × 3^0 × 4^1) = 16 rules, as shown in Figure 4.
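As a quick check of the rule-count arithmetic, a full Takagi-Sugeno rule base has one rule per combination of input membership functions, so its size is just the product of the per-input membership-function counts; the helper below is an illustrative sketch, not from the paper.

```python
def anfis_rule_count(mf_counts):
    """Number of rules in a full Takagi-Sugeno rule base: one rule per
    combination of input membership functions, i.e. the product of the
    per-input counts (equivalent to m2^n1 * m3^n2 * m4^n3)."""
    rules = 1
    for m in mf_counts:
        rules *= m
    return rules
```

For the worked example (inputs with 2, 2 and 4 membership functions) this gives 2 × 2 × 4 = 16 rules.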
The rules generated feed the output membership functions, which are computed as the summation of the contributions from each rule towards the overall output. The output is the survival status, either alive or dead after 3 years of diagnosis. The output is set as 1 for dead and −1 for alive; the pseudo-code is as below:
if output ≥ 0
then set output = 1, classify as dead
else (output < 0)
then set output = −1, classify as alive
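In code, this thresholding step is a one-liner; the sketch below is illustrative (the function name and tuple return are ours, not from the paper).

```python
def classify_survival(anfis_output):
    """Map the crisp ANFIS output to the survival label:
    output >= 0 -> dead (1); output < 0 -> alive (-1)."""
    if anfis_output >= 0:
        return 1, "dead"
    return -1, "alive"
```

Note that the boundary value 0 falls on the "dead" side, matching the ≥ in the pseudo-code.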
The membership functions were obtained according to the categorical variables that were set through the discussions with two oral cancer clinicians, as mentioned in the section Clinicopathologic data. The type of membership function used was the Gaussian, and the names and parameters of the membership functions for each input variable are shown in Table 2(a) for the clinicopathologic variables and Table 2(b) for the genomic variables. Each ANFIS was run for five epochs for the optimum result.
Artificial neural network (ANN)
The ANN employed in this research is the multi-layered feed-forward (FF) neural network, which is the most common type of ANN [26]. The FF neural network was trained using the Levenberg-Marquardt algorithm. In this research, one hidden layer with five neurons was used and the FF neural network was run for 5 epochs (this configuration achieved the best results). The training stopped when there was no improvement in the mean squared error for the validation set.
Support vector machine (SVM)
For the purpose of this research, a widely used SVM tool, LIBSVM [27], was used. There are two steps involved in LIBSVM: (i) the dataset is trained to obtain a model and (ii) the model is used to predict the data of the testing dataset. The details of LIBSVM can be found in [27,28]. A linear kernel was used in this research.
Logistic regression (LR)
Logistic regression (LR) was selected as the benchmark test for the statistical methods. LR is the most commonly used statistical method for the prediction of diagnosis and prognosis in medical research. LR predicts a relationship between the response variable y and the input variables x_i [29]. In this research, multiple logistic regression is used.
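Multiple logistic regression predicts the probability of the outcome through the logistic function, p = 1 / (1 + exp(−(b_0 + Σ b_i·x_i))). A minimal sketch of the prediction step follows; the coefficients here are placeholders, not fitted values from the paper.

```python
from math import exp

def predict_proba(x, coef, intercept):
    """P(y = 1 | x) under a multiple logistic regression model:
    p = 1 / (1 + exp(-(b0 + sum(bi * xi))))."""
    z = intercept + sum(b * v for b, v in zip(coef, x))
    return 1.0 / (1.0 + exp(-z))
```

A probability above 0.5 (i.e. z ≥ 0) would be classified as the positive outcome; fitting the coefficients is done by maximum likelihood in any standard statistics package.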
Experiment
The oral cancer dataset with 3-year prognosis was used in this experiment. First, the oral cancer prognosis dataset was divided into two groups: Group 1 consisted of clinicopathologic variables only (15 variables) and Group 2 consisted of clinicopathologic and genomic variables (17 variables). Next, the feature selection methods were implemented on both groups to select the key features for the n-input models. Lastly, the classifiers with 5-fold cross-validation were tested on the n-input models. The results obtained from the 5-fold cross-validation were averaged in order to produce the overall performance of each algorithm. The measures used to compare the performance of the proposed methods were sensitivity, specificity, accuracy and the area under the Receiver Operating Characteristic (ROC) curve (AUC).
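The first three performance measures come straight from the confusion matrix; a minimal sketch, taking "dead" as the positive class (our assumption, since the paper does not state which class is positive).

```python
def prognosis_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for a binary prognosis,
    with 1 = dead (positive class) and 0 = alive (negative class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

The AUC additionally requires the classifiers' continuous scores rather than the hard labels, so it is computed from the ROC curve separately.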
Results
Group 1 (clinicopathologic variables)
Table 4 shows the features selected by the proposed feature selection methods for Group 1. Next, the n-input models generated from each feature selection method were tested with the proposed classification methods. Table 5 shows the classification results for ANFIS, ANN, SVM and LR.
Table 4
Method | Feature subset selected |
---|---|
GA | 
3-input | Gen,Smo,PN |
4-input | Dri,Inv,PN,Size |
5-input | Dri,Node,PT,PN,Size |
6-input | Age,Gen,Smo,Inv,PT,Size |
7-input | Age,Eth,Chew,Inv,Node,PN,Size |
CC | 
3-input | Age,Inv,PN |
4-input | Age,Gen,Inv,PN |
5-input | Age,Gen,Inv,PN,Size |
6-input | Age,Gen,Inv,PN,Sta,Size |
7-input | Age,Gen,Dri,Inv,PN,Sta,Size |
ReliefF | 
3-input | Eth,Dri,Sta |
4-input | Age,Eth,Dri,Sta |
5-input | Age,Eth,Dri,Sta,Tre |
6-input | Age,Eth,Gen,Dri,Sta,Tre |
7-input | Age,Eth,Gen,Dri,PT,Sta,Tre |
CC-GA | 
3-input | PT,PN,Sta |
4-input | Dri,Inv,PN,Size |
5-input | Age,Gen,Inv,PN,Size |
6-input | Gen,Dri,Node,PT,PN,Sta |
7-input | Gen,Dri,Chew,Inv,Node,PN,Size |
ReliefF-GA | 
3-input | Gen,Inv,Node |
4-input | Gen,Dri,Inv,Node |
5-input | Gen,Dri,Inv,Node,PT |
6-input | Eth,Gen,Dri,Inv,Node,PT |
7-input | Age,Eth,Gen,Smo,Dri,Node,Tre |
Table 5
Feature selection | 3-input | 4-input | 5-input | 6-input | 7-input |||||
---|---|---|---|---|---|---|---|---|---|---|
% | AUC | % | AUC | % | AUC | % | AUC | % | AUC | |
ANFIS | ||||||||||
GA | 70.95 | 0.66 | 67.42 | 0.61 | 64.76 | 0.63 | 58.57 | 0.55 | 57.62 | 0.54 |
CC | 58.10 | 0.53 | 74.76 | 0.70 | 51.43 | 0.43 | 57.62 | 0.50 | 64.29 | 0.58 |
ReliefF | 61.43 | 0.53 | 50.59 | 0.50 | 58.10 | 0.50 | 64.29 | 0.54 | 64.29 | 0.54 |
CC-GA | 44.76 | 0.44 | 67.62 | 0.57 | 63.81 | 0.55 | 64.29 | 0.54 | 57.62 | 0.52 |
ReliefF-GA | 67.14 | 0.55 | 60.48 | 0.59 | 67.62 | 0.59 | 51.90 | 0.47 | 64.76 | 0.57 |
ANN ||||||||||
GA | 45.52 | 0.53 | 52.43 | 0.53 | 45.05 | 0.47 | 48.38 | 0.52 | 45.33 | 0.50 |
CC | 54.48 | 0.61 | 53.57 | 0.59 | 51.29 | 0.58 | 51.29 | 0.51 | 52.33 | 0.53 |
ReliefF | 51.52 | 0.48 | 41.62 | 0.47 | 46.05 | 0.49 | 46.05 | 0.48 | 44.10 | 0.48 |
CC-GA | 49.24 | 0.51 | 49.48 | 0.52 | 46.67 | 0.49 | 48.29 | 0.49 | 50.48 | 0.51 |
ReliefF-GA | 50.24 | 0.55 | 52.86 | 0.59 | 56.76 | 0.58 | 47.00 | 0.51 | 50.05 | 0.54 |
SVM ||||||||||
GA | 60.95 | 0.53 | 61.43 | 0.51 | 58.10 | 0.48 | 58.10 | 0.46 | 61.43 | 0.49 |
CC | 60.95 | 0.53 | 60.95 | 0.53 | 58.10 | 0.46 | 51.43 | 0.41 | 51.43 | 0.41 |
ReliefF | 54.29 | 0.44 | 50.95 | 0.42 | 51.43 | 0.42 | 48.10 | 0.40 | 50.95 | 0.45 |
CC-GA | 63.81 | 0.55 | 61.43 | 0.51 | 58.10 | 0.46 | 58.10 | 0.48 | 58.10 | 0.49 |
ReliefF-GA | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 54.76 | 0.46 |
LR ||||||||||
GA | 64.29 | 0.56 | 67.62 | 0.60 | 64.76 | 0.55 | 68.10 | 0.64 | 64.29 | 0.60 |
CC | 64.29 | 0.56 | 60.48 | 0.57 | 67.62 | 0.61 | 67.62 | 0.61 | 64.29 | 0.58 |
ReliefF | 50.59 | 0.44 | 50.59 | 0.44 | 48.10 | 0.39 | 41.43 | 0.34 | 44.29 | 0.39 |
CC-GA | 67.62 | 0.57 | 67.62 | 0.60 | 61.43 | 0.51 | 70.95 | 0.72 | 64.76 | 0.67 |
ReliefF-GA | 54.29 | 0.54 | 51.43 | 0.52 | 61.43 | 0.62 | 47.62 | 0.55 | 48.10 | 0.51 |
From Table 5, it can be seen that ANFIS with the CC-4-input model obtained the best accuracy of 74.76% and an AUC of 0.70. For the ANN results, the model with the highest accuracy is the ReliefF-GA-5-input model, with an accuracy of 56.76% and an AUC of 0.58. For the SVM classifier, the models with the best accuracy are the ReliefF-GA-3-input to 6-input models, with an accuracy of 64.29% and an AUC of 0.50. As for the LR classification, the best model is the CC-GA-6-input model, with an accuracy of 70.95% and an AUC of 0.72. The results obtained from both ANN and SVM showed low accuracy (56.76% and 64.29% respectively) and low AUC (0.58 and 0.50 respectively), indicating that these two are not suitable classifiers to use for Group 1.
Group 2 (clinicopathologic and genomic variables)
The same experiments were carried out on Group 2, which is the combination of clinicopathologic and genomic variables. The selected features for each n-input model are listed in Table 6. Table 6 shows that almost all the feature selection methods included a genomic variable as one of the key features, except for the ReliefF-3-input and ReliefF-4-input models.
Table 6
Method | Feature subset selected |
---|---|
GA | |
3-input | Inv,Node,p63 |
4-input | Gen,Inv,Size,p53 |
v-input | Age,PT,PN,Size,p53 |
6-input | Historic period,PT,PN,Size,Tre,p53 |
7-input | Age,Eth,Smo,PT,PN,Size,p53 |
CC | |
3-input | Inv,PN,p63 |
four-input | Age,Inv,PN,p63 |
5-input | Age,Gen,Inv,PN,p63 |
half dozen-input | Age,Gen,Inv,PN,Size,p63 |
7-input | Historic period,Gen,Inv,PN,Size,p53,p63 |
ReliefF | |
3-input | Historic period,Eth,Dri |
iv-input | Age,Eth,Dri,Tre |
5-input | Historic period,Eth,Dri,Tre,p53 |
half dozen-input | Age,Eth,Dri,Tre,p53,p63 |
7-input | Age,Eth,Gen,Dri,Tre,p53,p63 |
CC-GA | |
3-input | Inv,Node,p63 |
4-input | Gen,Inv,Size,p53 |
5-input | Age,Dri,PN,Size,p53 |
6-input | Gen,Inv,Node,PN,Size,p53 |
7-input | Gen,Dri,Inv,Node,PN,Size,p53 |
ReliefF-GA | |
3-input | Dri,Inv,p63 |
4-input | Dri,Inv,Tre,p63 |
5-input | Age,Gen,Smo,Dri,p63 |
6-input | Age,Gen,Smo,Dri,Inv,p63 |
7-input | Age,Eth,Inv,Sta,Tre,p53,p63 |
For Group 2 using the ANFIS classification (Table 7), there are five models with an accuracy of above 70%; these are, namely, GA-3-input, CC-GA-3-input, CC-GA-4-input, ReliefF-GA-3-input and ReliefF-GA-4-input. The best results were obtained from the ReliefF-GA-3-input and ReliefF-GA-4-input models with an accuracy of 93.81% and an AUC of 0.90; the features selected for the ReliefF-GA-3-input are drink, invasion and p63, while the features selected for the ReliefF-GA-4-input are drink, invasion, treatment and p63 (refer to Table 6).
Table 7
Feature selection | 3-input | 4-input | 5-input | 6-input | 7-input | |||||
---|---|---|---|---|---|---|---|---|---|---|
% | AUC | % | AUC | % | AUC | % | AUC | % | AUC | |
ANFIS | ||||||||||
GA | 74.76 | 0.74 | 67.62 | 0.70 | 41.90 | 0.40 | 58.57 | 0.58 | 35.71 | 0.36 |
CC | 58.10 | 0.48 | 58.10 | 0.52 | 51.90 | 0.48 | 48.57 | 0.46 | 61.90 | 0.59 |
ReliefF | 54.29 | 0.47 | 44.29 | 0.38 | 48.10 | 0.53 | 67.14 | 0.62 | 67.14 | 0.62 |
CC-GA | 74.76 | 0.70 | 70.48 | 0.71 | 54.76 | 0.57 | 61.43 | 0.61 | 64.29 | 0.65 |
ReliefF-GA | 93.81 | 0.90 | 93.81 | 0.90 | 65.71 | 0.63 | 64.76 | 0.62 | 68.10 | 0.67 |
ANN | ||||||||||
GA | 45.14 | 0.50 | 51.48 | 0.55 | 45.81 | 0.49 | 46.14 | 0.50 | 47.71 | 0.51 |
CC | 46.24 | 0.46 | 49.38 | 0.49 | 46.14 | 0.50 | 57.38 | 0.58 | 55.48 | 0.57 |
ReliefF | 40.62 | 0.48 | 43.24 | 0.49 | 47.71 | 0.50 | 49.48 | 0.51 | 48.76 | 0.50 |
CC-GA | 49.38 | 0.52 | 53.90 | 0.60 | 47.05 | 0.52 | 44.76 | 0.48 | 55.19 | 0.57 |
ReliefF-GA | 84.62 | 0.83 | 73.38 | 0.75 | 48.00 | 0.52 | 51.57 | 0.53 | 45.86 | 0.47 |
SVM | ||||||||||
GA | 74.76 | 0.70 | 54.76 | 0.51 | 70.95 | 0.65 | 60.95 | 0.55 | 50.95 | 0.42 |
CC | 64.76 | 0.55 | 64.76 | 0.55 | 64.76 | 0.55 | 67.62 | 0.56 | 67.62 | 0.62 |
ReliefF | 54.29 | 0.44 | 54.29 | 0.44 | 44.29 | 0.36 | 48.10 | 0.46 | 34.76 | 0.28 |
CC-GA | 74.76 | 0.70 | 54.76 | 0.51 | 61.43 | 0.50 | 58.10 | 0.54 | 61.43 | 0.57 |
ReliefF-GA | 74.76 | 0.70 | 71.43 | 0.68 | 74.76 | 0.70 | 74.43 | 0.66 | 54.76 | 0.53 |
LR | ||||||||||
GA | 74.76 | 0.70 | 63.81 | 0.64 | 67.14 | 0.57 | 54.76 | 0.43 | 54.29 | 0.47 |
CC | 71.43 | 0.67 | 71.43 | 0.67 | 61.43 | 0.59 | 68.10 | 0.65 | 61.43 | 0.59 |
ReliefF | 50.59 | 0.45 | 48.10 | 0.39 | 48.10 | 0.41 | 44.76 | 0.43 | 41.43 | 0.41 |
CC-GA | 74.76 | 0.70 | 63.81 | 0.64 | 60.48 | 0.61 | 64.29 | 0.63 | 60.48 | 0.54 |
ReliefF-GA | 74.76 | 0.70 | 74.76 | 0.70 | 71.43 | 0.68 | 58.10 | 0.55 | 61.43 | 0.60 |
As shown in Table 7, the feedforward neural network together with the ReliefF-GA-3-input model achieved the best result with an accuracy of 84.62% and an AUC of 0.83. For SVM classification, the results are generally better in Group 2 when compared to Group 1 (Table 5), with some exceptions (GA-3-input, GA-7-input, CC-GA-4-input, ReliefF-5-input and ReliefF-7-input). The best accuracy in Group 2 is obtained by the GA-3-input, CC-GA-3-input, ReliefF-GA-3-input and ReliefF-GA-5-input models with an accuracy of 74.76% and an AUC of 0.70, whereas for LR classification in Group 2, GA-3-input, CC-GA-3-input, ReliefF-GA-3-input and ReliefF-GA-4-input achieved the best classification accuracy of 74.76% and an AUC of 0.70.
Comparison of the results for Group 1 and Group 2
This section discusses and compares the results generated from the different classification methods for both Group 1 and Group 2. Table 8 summarizes the best accuracy for each n-input model based on the feature selection method for Group 1 and Group 2. The summary is also depicted in the graphs shown in Figure 5 and Figure 6 respectively.
Table 8
Feature selection method | Group 1 | Group 2 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
n-input model | n-input model | |||||||||
3 | 4 | 5 | 6 | 7 | 3 | 4 | 5 | 6 | 7 | |
GA | 70.95 | 67.62 | 64.76 | 68.10 | 64.29 | 74.76 | 67.62 | 70.95 | 60.95 | 54.29 |
CC | 64.29 | 74.76 | 67.62 | 67.62 | 64.29 | 71.43 | 71.43 | 64.76 | 68.10 | 67.62 |
ReliefF | 61.43 | 50.59 | 58.10 | 64.29 | 64.29 | 54.29 | 54.29 | 48.10 | 67.14 | 67.14 |
CC-GA | 67.62 | 67.62 | 63.81 | 70.95 | 64.76 | 74.76 | 70.48 | 61.43 | 64.29 | 64.29 |
ReliefF-GA | 67.14 | 64.29 | 67.62 | 64.29 | 64.76 | 93.81 | 93.81 | 74.76 | 74.43 | 68.10 |
For Group 1 (Figure 5), the correlation coefficient (CC) feature selection method performed better than the other methods, with the highest accuracy of 74.76% in the 4-input model. There are three models that achieved accuracies above 70%; the other two are GA-3-input and CC-GA-6-input. The ReliefF feature selection method obtained the worst results when compared to the other methods.
As regards Group 2 (Figure 6), the ReliefF-GA feature selection method outperformed the others in all the n-input models, with the highest accuracy of 93.81%. There are ten models with accuracies above 70% as shown in Table 8; this confirms that Group 2, which includes genomic variables, achieved higher accuracy with feature selection methods. In addition, most of the models with higher accuracy are the lower-input models with 3 or 4 inputs only.
Next, Table 9 lists the best accuracy by classification method and the graphs are depicted in Figures 7 and 8 for Group 1 and Group 2 respectively.
Table 9
Feature selection method | Group 1 | Group 2 | ||||||
---|---|---|---|---|---|---|---|---|
Classification method | Classification method | |||||||
ANFIS | ANN | SVM | LR | ANFIS | ANN | SVM | LR | |
GA | 70.95 | 52.43 | 61.43 | 68.10 | 74.76 | 51.48 | 74.76 | 74.76 |
CC | 74.76 | 54.48 | 60.95 | 67.62 | 61.90 | 57.38 | 67.62 | 71.43 |
ReliefF | 64.29 | 51.52 | 54.29 | 50.59 | 67.14 | 49.48 | 54.29 | 50.59 |
CC-GA | 67.62 | 50.48 | 63.81 | 70.95 | 74.76 | 55.19 | 74.76 | 74.76 |
ReliefF-GA | 67.62 | 56.76 | 64.29 | 61.43 | 93.81 | 84.62 | 74.76 | 74.76 |
From Figure 7, ANFIS performed the best in Group 1 when compared to the other classification methods for all the feature selection methods except the CC-GA method. All the classification methods performed worst with the ReliefF feature selection method except for ANN. ANN had the lowest accuracy rate compared to the other methods.
Meanwhile, in Group 2 as shown in Figure 8, ANFIS outperformed the other classification methods except with the CC feature selection method. The best accuracy is achieved by ANFIS with the ReliefF-GA method, with an accuracy of 93.81% (Table 9). In general, all classification methods performed better with the CC-GA and ReliefF-GA hybrid feature selection methods, except for SVM and LR. As with Group 1, ANN had the lowest classification rate except with the ReliefF-GA method. Overall, the performance of the classification methods is better in Group 2 as compared to Group 1. Table 10 summarizes the best models with their selected features for both Group 1 and Group 2. All the models with an accuracy of 70% and above are selected.
Table 10
Accuracy | AUC | Classification method | Selected features |
---|---|---|---|---|
Group 1 | ||||
CC-3-input | 74.76 | 0.70 | ANFIS | Age,Inv,PN |
GA-3-input | 70.95 | 0.66 | ANFIS | PT,PN,Sta |
CC-GA-6-input | 70.95 | 0.73 | LR | Gen,Dri,Node,PT,PN,Sta |
Group 2 | ||||
ReliefF-GA-3-input | 93.81 | 0.90 | ANFIS | Dri,Inv,p63 |
ReliefF-GA-4-input | 93.81 | 0.90 | ANFIS | Dri,Inv,Tre,p63 |
ReliefF-GA-iii-input | 84.62 | 0.83 | ANN | Dri,Inv,p63 |
GA-3-input | 74.76 | 0.74 | ANFIS | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | ANFIS | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | SVM | Inv,Node,p63 |
CC-GA-3-input | 74.76 | 0.70 | LR | Inv,Node,p63 |
ReliefF-GA-3-input | 74.76 | 0.70 | SVM | Dri,Inv,p63 |
ReliefF-GA-3-input | 74.76 | 0.70 | LR | Dri,Inv,p63 |
ReliefF-GA-4-input | 74.76 | 0.70 | LR | Dri,Inv,Tre,p63 |
ReliefF-GA-5-input | 74.76 | 0.70 | SVM | Age,Gen,Smo,Dri,p63 |
ReliefF-GA-6-input | 74.43 | 0.66 | SVM | Age,Gen,Smo,Dri,Inv,p63 |
ReliefF-GA-4-input | 73.38 | 0.75 | ANN | Dri,Inv,Tre,p63 |
ReliefF-GA-4-input | 71.43 | 0.68 | SVM | Dri,Inv,Tre,p63 |
ReliefF-GA-5-input | 71.43 | 0.68 | LR | Age,Gen,Smo,Dri,p63 |
CC-3-input | 71.43 | 0.67 | LR | Inv,PN,p63 |
CC-4-input | 71.43 | 0.67 | LR | Age,Inv,PN,p63 |
CC-GA-4-input | 70.48 | 0.71 | ANFIS | Gen,Inv,Size,p53 |
From Table 10, the models with the highest accuracy are ReliefF-GA-3-input and ReliefF-GA-4-input from Group 2 with ANFIS classification, with an accuracy of 93.81% and an AUC of 0.90. The features selected are Drink, Invasion and p63, and Drink, Invasion, Treatment and p63 respectively. This is followed by the ReliefF-GA-3-input model from Group 2 with ANN classification, with an accuracy of 84.62% and an AUC of 0.83. Most of the best models are generated from the ReliefF-GA feature selection method; this indicates that the features selected by this method are the optimum features for the oral cancer prognosis dataset.
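The accuracy and AUC figures reported in the tables can be reproduced from raw predictions with their standard definitions: accuracy is the percentage of correctly classified cases, and AUC is the rank-based probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch on illustrative labels (not the study's data):

```python
def accuracy(y_true, y_pred):
    """Percentage of correctly predicted labels."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Rank-based AUC: probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On real predictions, `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities.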
Discussions
The results shown meet the objective of this research, namely, the classification performance is much better with the inclusion of genomic variables in Group 2. From the results in Table 10, the best feature selection method for oral cancer prognosis is ReliefF-GA with ANFIS classification. This shows that ANFIS is the optimum classification tool for oral cancer prognosis.
Since there are two top models with the same accuracy, the simpler one, the ReliefF-GA-3-input model with ANFIS classification, is recommended for further work in similar research; the optimum subset of features is Drink, Invasion and p63. These findings confirm those of some previous studies. Alcohol consumption has always been considered a risk factor and one of the reasons for the poor prognosis of oral cancer [30-33]. Walker D et al. [34] showed that the depth of invasion is one of the most important predictors of lymph node metastasis in tongue cancer. The different studies by [35-38] discovered a significant link between the depth of invasion and oral cancer survival. As regards p63, [12-14] showed that p63 overexpression is associated with poor prognosis in oral cancer.
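The ReliefF stage of the recommended ReliefF-GA hybrid ranks features before the genetic algorithm searches over subsets. A minimal sketch of Relief-style weighting for a binary outcome, simplified to a single nearest hit and nearest miss per sample (the full ReliefF of Kononenko [19] averages over k neighbours and handles multi-class data):

```python
import numpy as np

def relief(X, y):
    """Single-neighbour Relief weighting for a binary outcome: a feature
    gains weight when it separates each sample from its nearest miss
    better than from its nearest hit."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # never pick the sample itself
        same = np.flatnonzero(y == y[i])       # candidates for nearest hit
        diff = np.flatnonzero(y != y[i])       # candidates for nearest miss
        hit = same[np.argmin(dists[same])]
        miss = diff[np.argmin(dists[diff])]
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

# Toy demo: feature 0 tracks the outcome, feature 1 is noise,
# so feature 0 should receive the larger Relief weight.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([y + 0.01 * rng.standard_normal(20),
                     rng.standard_normal(20)])
weights = relief(X, y)
```

In the hybrid, such weights would prune or seed the variable pool before the GA's subset search; the demo data here are synthetic, not the study's dataset.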
A comparison between the current methodology and the other methods in the literature was carried out and is shown in Table 11. Nevertheless, direct comparisons cannot be performed since different datasets were employed in each case. In this comparison, we considered studies which utilized at least both clinical and genomic data types in oral cancer. In general, the proposed methodology exhibits superior results compared to the other methods, except the work done by [8,9], which claimed to achieve an accuracy of 100%. However, they employed different classifiers for different sources of data, and more than 70 markers were required for their final combined classifier. A significant advantage of our proposed methodology is that only three optimum markers are needed with a single classifier, for both clinicopathologic and genomic data types, to obtain a high-accuracy outcome. It is hoped that the proposed methodology can expedite the decision support stage for oral cancer clinicians and better predict the survival rate of oral cancer patients based on the three markers only.
Table 11
Author | Sample size | Accuracy (%) |
---|---|---|
Passaro et al. [6] | 124 patients, 231 controls | 74-79 |
Oliveira et al. [7] | 500 | 5-year survival of 28.6% |
Exarchos et al. [8] | 41 | 100 |
Exarchos et al. [9] | 86 | 100 |
Dom et al. [10] | 84 patients, 87 controls | 82 |
Current work | 31 | 93.81 |
A common problem associated with medical datasets is small sample size. It is time-consuming and costly to obtain a large number of samples in medical research, and the samples are usually inconsistent, incomplete or noisy in nature. The small sample size problem is more visible in oral cancer research since oral cancer is not one of the top ten most common cancers in Malaysia, hence there are not many cases. For example, in Peninsular Malaysia, there were just 1,921 new oral cancer cases from 2003 to 2005 [39] and 592 new oral cancer cases in the year 2006 [40], as compared to breast cancer, where the incidence between 2003 and 2005 was 12,209 [39] and the incidence for 2006 was 3,591 [40]. Out of these oral cancer cases, some patients were lost to follow-up, and some patients sought treatment in other private hospitals; thus, their information was not available for this research. Another reason for the small sample size is medical confidentiality issues. This can be viewed from two aspects, namely, patients and clinicians. Some patients do not wish to reveal any information about their diseases to others, and are not willing to donate their tissues for research or educational purposes. As for clinicians, some may not want to share patients' data with others, especially those from non-medical fields, while some do not keep their medical records in the correct medical form. From the available cases, some patients' clinicopathologic data were incomplete, some tissues were missing due to improper management and some were duplicated cases. Due to this, the number of cases that can actually be used for this research is very limited.
In order to overcome the problem of small sample size, we employed feature selection methods on our dataset to choose the optimum feature subsets based on the correlations of the input and output variables. The selected features were fed into the proposed classifiers, and the results showed that the ReliefF-GA-ANFIS prognostic model is suitable for small-sample-size data with the proposed optimum feature subset of drink, invasion and p63.
Significance testing
The significance test used in this research was the Kruskal-Wallis test. Kruskal-Wallis is a non-parametric test to compare samples from two or more groups, and it returns a p-value. For this research, we wanted to test whether there is any statistically significant difference between the accuracy results generated for the 3-input model of Group 2 by the different feature selection methods. Thus, the null hypothesis is set as: H0 = there is no difference between the results of the different feature selection models. If the p-value computed from the test is 0.05 or less, H0 is rejected, which means there is a difference between the results of the different feature selection methods. The p-value generated was 0.0312, which is less than 0.05; this means H0 is rejected and there is a significant difference between the feature selection methods.
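The statistic behind this test is computed by pooling all accuracy values, ranking them, and comparing mean ranks across the groups. A minimal sketch of the H statistic, with tie correction omitted and hypothetical per-run accuracies (not the study's actual results):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic: pool all observations, rank them,
    and compare rank sums across groups (no tie correction; assumes
    all values are distinct)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 * s / (n * (n + 1)) - 3 * (n + 1)

# Hypothetical per-run accuracies for three feature selection methods
# (illustration only; not the study's cross-validation folds).
h = kruskal_h([93.8, 92.1, 90.5], [74.8, 70.9, 71.4], [54.3, 58.1, 61.4])
```

In practice, `scipy.stats.kruskal(*groups)` returns both the H statistic and the p-value used for the 0.05 decision rule.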
Validation testing
In this section, the best model, the ReliefF-GA-3-input model, is compared with other models built from random permutations of three inputs. The purpose is to validate that the features selected by the ReliefF-GA method are the optimum subset for oral cancer prognosis. In addition, the full-input model (the model with all 17 variables) is tested as well in order to verify that the reduced model can achieve the same or better results than the full model. In this testing, the classification method used is ANFIS due to its best performance in the previous section, and the results are tabulated in Table 12.
Table 12
Models | ANFIS | |
---|---|---|
% | AUC | |
Random permutation model | ||
Age, Inv, p63 | 64.76 | 0.63 |
Eth, Dri, p53 | 57.14 | 0.49 |
PT, PN, Sta | 58.10 | 0.51 |
Gen, Node, Tre | 70.95 | 0.59 |
Eth, Gen, Sub | 39.05 | 0.32 |
Dri, p53, p63 | 80.48 | 0.70 |
Age, p53, p63 | 67.14 | 0.67 |
Gen, Dri, Inv | 54.76 | 0.55 |
Site, Inv, Size | 32.86 | 0.28 |
Age, Chew, Size | 48.10 | 0.41 |
Full model | ||
Full model with ANFIS | N.A.* | N.A.* |
Full model with ANN | 42.90 | 0.47 |
Full model with SVM | 54.76 | 0.46 |
Full model with LR | 54.76 | 0.59 |
*N.A. - Results not available due to an over-fitting problem, as the rule base generated was too large.
Table 12 presents the results from different permutations of the 3-input models using ANFIS and those of the full model with all 17 variables using the different classification methods. The three inputs were generated randomly, and the best accuracy obtained was 80.48% with an AUC of 0.70; the features selected were Drink, p53 and p63. The results of the full model are not promising, and the results of the full model using ANFIS could not be generated due to over-fitting problems, as the rule base generated was too large.
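The random 3-input benchmark models of Table 12 can be generated by sampling variable subsets. A minimal sketch, assuming the 17 variable abbreviations are those appearing across Tables 6, 10 and 12 (this variable list is reconstructed from the tables, not taken from a methods listing):

```python
import random

# Reconstructed from the abbreviations used in the paper's tables (assumption).
VARIABLES = ["Age", "Eth", "Gen", "Smo", "Chew", "Dri", "Site", "Size",
             "Node", "Inv", "PT", "PN", "Sta", "Tre", "Sub", "p53", "p63"]

def random_three_input_models(n_models, seed=0):
    """Draw random 3-variable subsets for the permutation benchmark."""
    rng = random.Random(seed)
    return [sorted(rng.sample(VARIABLES, 3)) for _ in range(n_models)]

models = random_three_input_models(10)
```

Each sampled subset would then be fed to the ANFIS classifier and its accuracy compared against the ReliefF-GA-selected subset {Dri, Inv, p63}.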
Finally, the selected features were tested on the oral cancer dataset for 1-year and 2-year prognosis with ANFIS classification, and the results are very promising, with an accuracy for 1-year prognosis of 93.33% and for 2-year prognosis of 84.29%, as compared to the 3-year prognosis of 93.81%. The results are shown in Table 13.
Table 13
Oral cancer prognosis | Accuracy (%) | AUC |
---|---|---|
1-year | 93.33 | 0.90 |
2-year | 84.29 | 0.77 |
3-year | 93.81 | 0.90 |
Findings
The analyses and findings from this research are:
(i) The performance of Group 2 (clinicopathologic and genomic variables) is better than that of Group 1 (clinicopathologic variables). This is in accordance with the objective of this research, showing that the prognostic result is more accurate with the combination of clinicopathologic and genomic markers.
(ii) The model with the best accuracy is the ReliefF-GA-3-input model with the ANFIS classification model from Group 2, and the Kruskal-Wallis test showed a significant difference as compared to the 3-input models of GA, CC, ReliefF and CC-GA.
(iii) The optimum subset of features for oral cancer prognosis is drink, invasion and p63, and this finding is in accordance with similar studies in the literature.
(iv) The ANFIS classification model achieved the best accuracy in oral cancer prognosis when compared to the artificial neural network, support vector machine and the statistical method of logistic regression.
(v) The prognostic result is more accurate with fewer inputs in comparison with the full model.
In summary, the hybrid model of ReliefF-GA-ANFIS with the 3-input features of drink, invasion and p63 achieved the best accuracy. Through the identification of fewer markers for oral cancer prognosis, it is hoped that this will assist clinicians in carrying out prognostic procedures, and thus help them make a more accurate prognosis in a shorter time and at lower cost. Furthermore, the results of this research help patients and their families plan their future and lifestyle through a more reliable prognosis.
Conclusions
In this research, we presented a prognostic system using a hybrid of feature selection and machine learning methods for the purpose of oral cancer prognosis based on clinicopathologic and genomic markers. In conclusion, the hybrid model of ReliefF-GA-ANFIS resulted in the best accuracy (accuracy = 93.81%, AUC = 0.90) with the selected features of drink, invasion and p63. The results proved that the prognosis is more accurate when using the combination of clinicopathologic and genomic markers. However, more tests and experiments need to be done in order to further verify the results obtained in this research. Our future work includes increasing the sample size of the dataset by obtaining more medical samples, thus making it closer to the real population, and including more genomic markers in our study.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
SWC developed the prognostic model, performed the experiments and drafted the manuscript. SWC and SAK conceived the study and contributed to the experimental design. AFM and RBZ contributed to the analysis and interpretation of the oral cancer prognostic dataset. All authors read and approved the final manuscript.
Acknowledgment
This study was supported by the University of Malaya Research Grant (UMRG) with the project number RG026-09ICT. The authors would like to thank Dr Mannil Thomas Abraham from the Tunku Ampuan Rahimah Hospital, Ministry of Health, Malaysia, Dr Thomas George Kallarakkal from the Department of Oral Pathology and Oral Medicine and Periodontology, the staff from the Oral & Maxillofacial Surgery department, the Oral Pathology Diagnostic Laboratory, the OCRCC, the Faculty of Dentistry, and the ENT department, Faculty of Medicine, University of Malaya for the preparation of the dataset and the related information and documents for this project.
References
- Lisboa PJ, Taktak AFG. The Use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw. 2006;19:408–415. doi: 10.1016/j.neunet.2005.10.007. [PubMed] [CrossRef] [Google Scholar]
- Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer Informatics. 2006;2:59–78. [PMC free article] [PubMed] [Google Scholar]
- Futschik ME, Sullivan M, Reeve A, Kasabov N. Prediction of clinical behaviour and treatment for cancers. Appl Bioinformatics. 2003;2(3 Suppl):S53–S58. [PubMed] [Google Scholar]
- Gevaert O, Smet FD, Timmerman D, Moreau D, Moor BD. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics. 2006;22(14):e184–e190. doi: 10.1093/bioinformatics/btl230. [PubMed] [CrossRef] [Google Scholar]
- Sun Y, Goodison S, Li J, Liu L, Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007;23(1):30–37. doi: 10.1093/bioinformatics/btl543. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
- Passaro A, Baronti F, Maggini V. Exploring relationships between genotype and oral cancer development through XCS. New York, USA: GECCO'05; 2005. [Google Scholar]
- Oliveira LR, Ribeiro-Silve A, Costa JPO, Simoes AL, Di Matteo MAS, Zucoloto S. Prognostic factors and survival analysis in a sample of oral squamous cell carcinoma patients. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology. 2008;106(5):685–695. doi: 10.1016/j.tripleo.2008.07.002. [PubMed] [CrossRef] [Google Scholar]
- Exarchos K, Goletsis Y, Fotiadis D. Multiparametric Decision Support System for the Prediction of Oral Cancer Reoccurrence. IEEE Trans Inf Technol Biomed. 2011;16(6):1127–1134. [PubMed] [Google Scholar]
- Exarchos K, Goletsis Y, Fotiadis D. A multiscale and multiparametric approach for modeling the progression of oral cancer. BMC Med Inform Decis Mak. 2012;12:136–150. doi: 10.1186/1472-6947-12-136. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
- Dom RM, Abdul-Kareem S, Abidin B, Jallaludin RLR, Cheong SC, Zain RB. Oral cancer prediction model for Malaysian sample. Austral-Asian Journal of Cancer. 2008;7(4):209–214. [Google Scholar]
- Catto JWF, Abbod MF, Linkens DA, Hamdy FC. Neuro-fuzzy modeling: an accurate and interpretable method for predicting bladder cancer progression. J Urol. 2006;175:474–479. doi: 10.1016/S0022-5347(05)00246-6. [PubMed] [CrossRef] [Google Scholar]
- Muzio LL, Santarelli A, Caltabiano R, Rubini C, Pieramici T, Trevisiol L. p63 overexpression associates with poor prognosis in head and neck squamous cell carcinoma. Hum Pathol. 2005;36:187–194. doi: 10.1016/j.humpath.2004.12.003. [PubMed] [CrossRef] [Google Scholar]
- Chen YK, Huse SS, Lin LM. Differential expression of p53, p63 and p73 proteins in human buccal squamous-cell carcinomas. Clin Otolaryngol Allied Sci. 2003;28(5):451–455. doi: 10.1046/j.1365-2273.2003.00743.x. [PubMed] [CrossRef] [Google Scholar]
- Choi H-R, Batsakis JG, Zhan F, Sturgis E, Luna MA, El-Naggar AK. Differential expression of p53 gene family members p63 and p73 in head and neck squamous tumorigenesis. Hum Pathol. 2002;33(2):158–164. doi: 10.1053/hupa.2002.30722. [PubMed] [CrossRef] [Google Scholar]
- Mehrotra R, Yadav S. Oral squamous cell carcinoma: etiology, pathogenesis and prognostic value of genomic alterations. Indian J Cancer. 2006;43(2):60–66. doi: 10.4103/0019-509X.25886. [PubMed] [CrossRef] [Google Scholar]
- Thurfjell N, Coates PJ, Boldrup L, Lindgren B, Bäcklund B, Uusitalo T, Mahani D, Dabelsteen E. Function and Importance of p63 in Normal Oral Mucosa and Squamous Cell Carcinoma of the Head and Neck. Current Research in Head and Neck Cancer. 2005;62:49–57. [PubMed] [Google Scholar]
- Zigeuner R, Tsybrovskyy O, Ratschek M, Rehak P, Lipsky K, Langner C. Prognostic impact of p63 and p53 in upper urinary tract transitional cell carcinoma. Adult Urology. 2004;63(6):1079–1083. doi: 10.1016/j.urology.2004.01.009. [PubMed] [CrossRef] [Google Scholar]
- Rosner B. Fundamentals of Biostatistics. 6. California: Thomson Higher Education; 2006. [Google Scholar]
- Kononenko I. ECML-94: Proceedings of the European Conference on Machine Learning: 1994. Catania, Italy: Springer; 1994. Estimating Attributes: Analysis and Extension of RELIEF; pp. 171–182. [Google Scholar]
- Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Boston: Addison-Wesley Longman; 1989. [Google Scholar]
- Siow-Wee C, Kareem SA, Kallarakkal TG, Merican AF, Abraham MT, Zain RB. Feature Selection Methods for Optimizing Clinicopathologic Input Variables in Oral Cancer Prognosis. Asia Pacific Journal of Cancer Prevention. 2011;12(10):2659–2664. [PubMed] [Google Scholar]
- Efron B. Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc. 1983;78(382):316–330. doi: 10.1080/01621459.1983.10477973. [CrossRef] [Google Scholar]
- Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005;21(15):3301–3307. doi: 10.1093/bioinformatics/bti499. [PubMed] [CrossRef] [Google Scholar]
- Jang JSR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern. 1993;23(3):665–685. doi: 10.1109/21.256541. [CrossRef] [Google Scholar]
- Jang JSR. Input Selection for ANFIS Learning. Fifth IEEE International Conference on Fuzzy Systems vol. 2. 1996. pp. 1493–1499.
- Gershenson C. Artificial Neural Networks for Beginners. Formal Computational Skills Teaching Package, COGS, University of Sussex; 2001. [Google Scholar]
- Chih-Chung C, Chih-Jen L. LIBSVM : A library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2:27:21–27:27. [Google Scholar]
- Chih-Wei H, Chang C-C, Lin C-J. Technical Report. Taiwan: National Taiwan University; 2010. A Practical Guide to Support Vector Machines. [Google Scholar]
- Ross SM. Introductory Statistics. 3. New York, USA: Academic Press, Elsevier; 2010. [Google Scholar]
- Jefferies S, Foulkes WD. Genetic mechanisms in squamous cell carcinoma of the head and neck. Oral Oncol. 2001;37:115–126. doi: 10.1016/S1368-8375(00)00065-8. [PubMed] [CrossRef] [Google Scholar]
- Leite ICG, Koifman S. Survival analysis in a sample of oral cancer patients at a reference hospital in Rio de Janeiro, Brazil. Oral Oncol. 1998;34(1998):347–352. [PubMed] [Google Scholar]
- Reichart PA. Identification of risk groups for oral precancer and cancer and preventive measures. Clin Oral Invest. 2001;5:207–213. doi: 10.1007/s00784-001-0132-5. [PubMed] [CrossRef] [Google Scholar]
- Zain RB, Ghazali N. A review of epidemiological studies of oral cancer and precancer in Malaysia. Annals of Dentistry University of Malaya. 2001;8:50–56. [Google Scholar]
- Walker D, Boey G, McDonald L. The pathology of oral cancer. Pathology. 2003;35(5):376–383. doi: 10.1080/00310290310001602558. [PubMed] [CrossRef] [Google Scholar]
- Asakage T, Yokose T, Mukai K, Tsugane S, Tsubono Y, Asai G, Ebihara S. Tumor thickness predicts cervical metastasis in patients with stage I/II carcinoma of the tongue. Cancer. 1998;82:1443–1448. doi: 10.1002/(SICI)1097-0142(19980415)82:8<1443::AID-CNCR2>3.0.CO;2-A. [PubMed] [CrossRef] [Google Scholar]
- Giacomarra V, Tirelli G, Papanikolla L, Bussani R. Predictive factors of nodal metastases in oral cavity and oropharynx carcinomas. Laryngoscope. 1999;109:795–799. doi: 10.1097/00005537-199905000-00021. [PubMed] [CrossRef] [Google Scholar]
- Morton R, Ferguson C, Lambie N, Whitlock R. Tumor thickness in early tongue cancer. Arch Otolaryngol Head Neck Surg. 1994;120:717–720. doi: 10.1001/archotol.1994.01880310023005. [PubMed] [CrossRef] [Google Scholar]
- Williams J, Carlson G, Cohen C, Derose P, Hunter S, Jurkiewicz M. Tumor angiogenesis as a prognostic factor in oral cavity tumors. Am J Surg. 1994;168:373–380. doi: 10.1016/S0002-9610(05)80079-0. [PubMed] [CrossRef] [Google Scholar]
- Gerard LCC, Rampal S, Yahaya H. Third Report of the National Cancer Registry: Cancer Incidence in Malaysia (2005). National Cancer Registry, Ministry of Health Malaysia. 2005.
- Omar ZA, Ali ZM, Tamin NSI. Malaysian Cancer Statistics - Data and Figures, Peninsular Malaysia 2006. National Cancer Registry, Ministry of Health Malaysia. 2006.
Articles from BMC Bioinformatics are provided here courtesy of BioMed Central
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673908/