Machine Learning vs Deep Learning – A Case Study Approach

The debate over preferring deep learning to machine learning has been going on for a while among data scientists, especially since Google open-sourced TensorFlow in 2015. Deep learning algorithms are outperforming machine learning approaches in some imaging applications, such as face recognition and object detection. Deep Learning (DL) algorithms are like black boxes, whereas Machine Learning (ML) algorithms are like art: one needs to be innovative to handcraft the features from the images to build the ML models.

Deep learning algorithms need a large amount of data (images) to learn the patterns, but in many domains it is not easy to acquire more data. Whenever there is not enough data, we can use a machine learning approach. In the ML approach, a programmer needs to perform feature extraction. For example, given tumor images, the job of the programmer is to extract numeric features such as area, diameter, volume, smoothness, and edge patterns from the tumor images. To extract those features, one needs both programming and domain knowledge. In the deep learning approach, we simply feed more and more tumor images to the model, and the features are learned by the model on its own. The role of the programmer is minimal in the case of deep learning, as the feature-extraction process is not explicit.

In this article, we take you through a step-by-step implementation of an image-processing case study using the ML approach. The example we discuss is predicting the malignancy of the nodules (tumors) present inside the lung region.

Diagnosing lung cancer at an early stage is critical yet uncertain. The lung is a sponge-like tissue. A biopsy, the process of taking tissue from the lung to examine its cancerous nature, is painful, and accurately taking tissue from a small abnormal tissue cluster inside a lung is a challenging task. Physicians will not recommend a biopsy without reliable evidence of lung cancer. Physicians analyse the CT scan of a patient and, if they suspect any symptoms of lung cancer, direct the patient to undergo another CT scan after a span of 3, 6, 9, 12 or 18 months, based on the patient's smoking habits and environmental conditions. After analysing the CT images at these intervals, and based on the level of disease progression, a biopsy is performed.

The qualitative analysis performed by physicians may vary from one expert to another due to human factors and the huge number of images they need to analyse, which is time-consuming and requires a trained eye. Hence, it has become essential to develop an intelligent computerised model to analyse CT scans and identify any malignancy present. The objective of this case study is to segment the region of interest from the image and extract features to build a machine learning model that detects the malignant nature of lung nodules from CT scan images.

The database provided for this work consists of CT scans from 50 patients. Each CT scan contains 60 to 250 cross-sectional CT images. The small white tissue clusters inside the lung parenchyma region are known as 'nodules'. These nodules may be potential indicators of lung cancer (though most nodules are benign).

MACHINE LEARNING APPROACH

We know that machine learning algorithms need tabular numeric data as input. We need to extract features from all the suspected nodules in the lung parenchyma region. Our region of interest for diagnosing the cancer is only the nodules present inside the lung, not the whole lung region. Therefore, the first step is to segment this region of interest from the CT scan. The following section of the script loads the image and performs segmentation. Many segmentation techniques are reported in the literature; in this work we used the Otsu threshold-based segmentation technique.

import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load the CT slice as a grayscale image
img = cv2.imread('lung_img.jpg', 0)
cv2.imshow("Lung Image", img)

# Otsu's method computes the threshold value automatically; the
# brighter tissue regions become white in the binary output
th, otsu_img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.waitKey(0)  # needed after cv2.imshow calls to render the windows

In the thresholded output above, other tissue regions are segmented as white along with the nodules. Morphological processing is performed to remove those surrounding white regions. Different morphological techniques such as erosion, dilation, inversion, opening, closing and filling can be applied to obtain the region of interest alone from the segmented images. The choice of morphological technique depends on the output of the initial segmentation process. In this example, we used the clear-border operation to remove the unwanted white regions touching the image border and keep only the nodules.

from skimage.segmentation import clear_border

# Remove white regions connected to the image border
img2 = clear_border(otsu_img)
cv2.imshow("clear_border", img2)

We can also perform a different sequence of morphological operations to obtain the lung masks and the nodules, as shown below:

# Invert the Otsu output so the lung parenchyma becomes the foreground
img3 = 255 - otsu_img
cv2.imshow("Inverted_image", img3)

# Remove white regions connected to the image border, leaving the lung masks
img4 = clear_border(img3)
cv2.imshow("clear_border1", img4)

# Closing with a large elliptical structuring element fills holes
# (including the nodule locations) inside the lung masks
se_fill = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
img4_fill = cv2.morphologyEx(img4, cv2.MORPH_CLOSE, se_fill)
cv2.imshow("filled_image", img4_fill)

# Opening smooths the lung mask boundaries
se_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
img4_open = cv2.morphologyEx(img4_fill, cv2.MORPH_OPEN, se_open)
cv2.imshow("Open_image", img4_open)

# Apply the lung mask to the original image to isolate the parenchyma
paren = img & img4_open
cv2.imshow("Parenchyma", paren)

# Bright regions inside the parenchyma are the candidate nodules
thos, nod_th = cv2.threshold(paren, 100, 255, cv2.THRESH_BINARY)
cv2.imshow("Nodules", nod_th)

# A small opening removes isolated noise pixels
se_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
nodules = cv2.morphologyEx(nod_th, cv2.MORPH_OPEN, se_open)
cv2.imshow("Final_Nodules", nodules)
cv2.waitKey(0)

After segmenting all the nodules from the lung CT images, different features need to be computed for each nodule. Generally, features are computed in three categories: shape, texture and colour. In this work we computed the shape-based features.

# Label the connected components (candidate nodules) in the binary image
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(nodules)

# Component areas; label 0 is the background and is skipped
sizes = stats[1:, -1]
min_size = 15

# Keep only components with at least min_size pixels
nodules1 = np.zeros(labels.shape, dtype='uint8')
for i in range(0, nlabels - 1):
    if sizes[i] >= min_size:
        nodules1[labels == i + 1] = 255
cv2.imshow("Final_Nodules1", nodules1)

# Re-label the filtered image and compute nine shape features per nodule
nlabels1, labels1, stats1, centroids1 = cv2.connectedComponentsWithStats(nodules1)
all_nod = []
feat = np.zeros([nlabels1 - 1, 9], dtype='float')
for i in range(1, nlabels1):
    # Isolate the i-th nodule in its own 512x512 image
    nod1 = np.zeros([512, 512], dtype='uint8')
    pos = np.where(labels1 == i)
    nod1[pos] = nodules1[pos]
    cv2.imshow("nod1", nod1)
    all_nod.append(nod1)

    x, y, w, h = cv2.boundingRect(nod1)
    feat[i - 1, 0] = float(w) / h  # aspect ratio
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    cc, hierarchy = cv2.findContours(nod1.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    area = cv2.contourArea(cc[0])
    feat[i - 1, 1] = area  # area
    rect_area = w * h
    feat[i - 1, 2] = float(area) / rect_area  # extent
    hull = cv2.convexHull(cc[0])
    hull_area = cv2.contourArea(hull)
    feat[i - 1, 3] = hull_area  # hull area
    feat[i - 1, 4] = float(area) / hull_area  # solidity
    feat[i - 1, 5] = np.sqrt(4 * area / np.pi)  # equivalent diameter
    [(x, y), (MA, ma), angle] = cv2.fitEllipse(cc[0])
    feat[i - 1, 6] = angle  # orientation of the fitted ellipse

# Centroid coordinates of each nodule (background row excluded)
feat[:, 7] = centroids1[1:, 0]
feat[:, 8] = centroids1[1:, 1]

After executing the above script, nine shape features are extracted for the 18 nodules segmented from one CT scan:

| Nodule | Aspect Ratio | Area | Extent | Hull Area | Solidity | Equi-diameter | Angle | Centroid_X | Centroid_Y |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 30.5 | 0.376543 | 35.5 | 0.859155 | 6.23168 | 150.249 | 279.659 | 189.854 |
| 2 | 0.348315 | 788.5 | 0.285792 | 1845 | 0.427371 | 31.6852 | 159.649 | 335.058 | 248.16 |
| 3 | 0.5 | 11 | 0.34375 | 13.5 | 0.814815 | 3.74241 | 13.3688 | 358.421 | 212.526 |
| 4 | 0.9375 | 1041.5 | 0.271224 | 2109.5 | 0.493719 | 36.4154 | 141.711 | 190.555 | 264.887 |
| 5 | 0.7 | 42 | 0.6 | 43.5 | 0.965517 | 7.31273 | 13.31 | 203.741 | 239.519 |
| 6 | 0.833333 | 11.5 | 0.383333 | 12 | 0.958333 | 3.82652 | 139.736 | 216.5 | 244.611 |
| 7 | 0.833333 | 71.5 | 0.595833 | 74 | 0.966216 | 9.54131 | 156.121 | 323.244 | 247.081 |
| 8 | 0.625 | 326.5 | 0.3265 | 614 | 0.531759 | 20.389 | 159.57 | 229.711 | 286.905 |
| 9 | 1.5 | 29.5 | 0.546296 | 31 | 0.951613 | 6.12867 | 81.095 | 166.575 | 277.8 |
| 10 | 1.28571 | 22 | 0.349206 | 29 | 0.758621 | 5.29257 | 119.901 | 255.094 | 278.969 |
| 11 | 0.5 | 52 | 0.530612 | 55.5 | 0.936937 | 8.13686 | 18.0453 | 175.104 | 296.403 |
| 12 | 1 | 11.5 | 0.319444 | 11.5 | 1 | 3.82652 | 45 | 333.722 | 292.722 |
| 13 | 0.666667 | 65.5 | 0.303241 | 99 | 0.661616 | 9.13221 | 162.512 | 365.172 | 302.483 |
| 14 | 0.702703 | 484.5 | 0.503638 | 576.5 | 0.840416 | 24.8372 | 9.18256 | 159.629 | 318.956 |
| 15 | 1 | 21 | 0.259259 | 24 | 0.875 | 5.17088 | 136.903 | 384.129 | 317.387 |
| 16 | 0.714286 | 18 | 0.514286 | 18.5 | 0.972973 | 4.78731 | 20.5219 | 320.923 | 357.962 |
| 17 | 0.833333 | 14 | 0.466667 | 14.5 | 0.965517 | 4.22201 | 28.568 | 336.905 | 371.476 |
| 18 | 1.75 | 11 | 0.392857 | 12 | 0.916667 | 3.74241 | 110.04 | 359 | 370.5 |

Out of these 18 nodules, nodule 14 was identified as cancerous by the medical expert. Hence, that nodule has been labelled '1' while the other nodules have been labelled '0'. This labelling step can be expressed in code, as sketched below.
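Here is a minimal sketch of the labelling step. It assumes pandas is available and that feat is the 18 x 9 feature matrix computed by the script above; the column names and the output file name are our own hypothetical choices.

import pandas as pd

# Column names for the nine shape features computed above (our naming)
cols = ['aspect_ratio', 'area', 'extent', 'hull_area', 'solidity',
        'equi_diameter', 'angle', 'centroid_x', 'centroid_y']

scan_df = pd.DataFrame(feat, columns=cols)

# Nodule 14 (row index 13) is malignant; all others are benign
scan_df['label'] = 0
scan_df.loc[13, 'label'] = 1

# Persist the per-scan feature table for later concatenation
# (hypothetical file name)
scan_df.to_csv('patient_01_features.csv', index=False)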

Similarly, the nodule segmentation and feature extraction process can be carried out on scan images from different patients, and by concatenating all this information we can form a structured supervised dataset to which any machine learning classification algorithm can be applied to build a predictive model, as sketched below.
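To make the model-building step concrete, here is a minimal sketch under a few assumptions: the per-patient CSV file names are hypothetical, scikit-learn is used, and a random forest stands in for "any classification algorithm".

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Concatenate the hypothetical per-patient feature tables into one dataset
files = ['patient_01_features.csv', 'patient_02_features.csv']  # ...one per scan
data = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

X = data.drop(columns=['label'])
y = data['label']

# Stratify so the rare malignant class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))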

The skill of deriving structured data from unstructured data is a vital step in building a predictive model. We hope this article gives you hands-on experience in converting images into structured data.
