The debate over preferring deep learning to machine learning has been going on among data scientists for a while, especially since Google open-sourced TensorFlow in 2015. Deep learning algorithms are outperforming classical machine learning approaches in some imaging applications, such as face recognition and object detection. Deep Learning (DL) algorithms behave like black boxes, whereas classical Machine Learning (ML) algorithms are more of an art: one needs to be innovative to handcraft the features from the images to build ML models.
Deep learning algorithms need more data (images) to learn patterns, but in many domains it is not easy to acquire more data. When there is not enough data, we can use a machine learning approach, in which the programmer performs feature extraction explicitly. For example, given tumor images, the programmer's job in the ML approach is to extract numeric features such as area, diameter, volume, smoothness and edge patterns from them. Extracting those features requires both programming and domain knowledge. In the Deep Learning approach, we simply feed more and more tumor images to the model, and the model learns the features on its own. The programmer's role is minimal in Deep Learning because the feature extraction process is not explicit.
In this article, we take you step by step through an image processing case study using the ML approach. The example predicts the malignancy of nodules (tumors) present inside the lung region.
Diagnosing lung cancer at an early stage is critical but difficult. The lung is a spongy tissue. Taking a tissue sample from the lung to examine whether it is cancerous, a procedure known as a biopsy, is painful, and accurately sampling a small abnormal tissue cluster inside the lung is challenging. Physicians therefore do not recommend a biopsy without reliable evidence of lung cancer. Instead, they analyse the patient's CT scan and, if they suspect signs of lung cancer, ask the patient to undergo another CT scan after 3, 6, 9, 12 or 18 months, depending on the patient's smoking habits and environmental conditions. A biopsy is performed only after analysing the CT images taken at different intervals and assessing the level of disease progression.
The qualitative analysis performed by physicians may vary from one expert to another because of human factors and the large number of images they need to analyse, which is time-consuming and requires a trained eye. Hence, it has become essential to develop an intelligent computerized model that analyses CT scans to identify malignancy, if present. The objective of this case study is to segment the region of interest from the image and extract features to build a machine learning model that detects malignancy in the lung from CT scan images.
The database provided for this work consists of CT scans from 50 patients. Each CT scan contains 60 to 250 cross-sectional CT images. The small white tissue clusters inside the lung parenchyma are known as 'nodules'. These nodules may be potential indicators of lung cancer (but most nodules are generally benign!).
MACHINE LEARNING APPROACH
Machine learning algorithms need tabular numeric data as input, so we need to extract features from all the suspected nodules in the lung parenchyma. Our region of interest for diagnosing cancer is only the nodules inside the lung, not the whole lung region. Therefore, the first step is to segment this region of interest from the CT scan. The following script loads the image and performs the segmentation. Many segmentation techniques have been reported in the literature; in this work we used Otsu's threshold-based segmentation technique.
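import cv2
import numpy as np

# Load the CT slice as an 8-bit grayscale image
img = cv2.imread('lung_img.jpg', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError('lung_img.jpg could not be read')
cv2.imshow('Lung Image', img)

# Otsu's method picks the threshold automatically (the 0 passed here is ignored);
# pixels above it become 255 (white), the rest 0 (black)
th, ostu_img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow('Otsu threshold', ostu_img)
cv2.waitKey(0)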
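Otsu's method, which cv2.THRESH_OTSU applies, chooses the threshold t that maximizes the between-class variance of the gray-level histogram, sigma_b^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))), where omega(t) is the fraction of pixels at or below t, mu(t) is their cumulative mean and mu_T is the global mean. The NumPy sketch below computes this on a synthetic bimodal image rather than a CT slice. The function name otsu_threshold is hypothetical, and its tie-breaking on flat stretches of the variance curve may differ from OpenCV's, so the numeric threshold can differ even when the resulting foreground/background split is the same.

```python
import numpy as np


def otsu_threshold(img: np.ndarray) -> int:
    """Return the gray level t that maximizes Otsu's between-class variance.

    Pixels <= t form one class and pixels > t the other, matching how
    cv2.THRESH_BINARY sends values above the threshold to 255.
    Assumes an 8-bit (0..255) grayscale image.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)            # weight of the class <= t
    mu = np.cumsum(prob * levels)      # cumulative mean up to t
    mu_total = mu[-1]                  # global mean

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


# Synthetic bimodal image: dark background (40) with one bright 10x10 blob (200).
img = np.full((64, 64), 40, dtype=np.uint8)
img[20:30, 20:30] = 200
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
```

Any t from 40 up to 199 separates these two gray levels equally well; the sketch returns the first maximizer.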
In the thresholded image, other tissue regions are segmented as white alongside the nodules. Morphological processing is used to remove these surrounding white regions. Different morphological techniques, such as erosion, dilation, inversion, opening, closing and hole filling, can be applied to isolate the region of interest in the segmented images, and the appropriate choice depends on the output of the initial segmentation. In this example, we used scikit-image's clear_border function, which removes every white region connected to the image border, to discard the unwanted white regions and keep only the nodules.
from skimage.segmentation import clear_border

# Remove every white region that touches the image border
img2 = clear_border(ostu_img)
cv2.imshow('clear_border', img2)
cv2.waitKey(0)
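To show what clear_border does, here is a minimal pure-NumPy sketch: a breadth-first flood fill seeded from the foreground pixels on the image edge, after which everything it reached is set to background. The name clear_border_sketch is hypothetical, the sketch assumes 8-connectivity, and it is not scikit-image's implementation, which also offers options such as a border buffer.

```python
from collections import deque

import numpy as np


def clear_border_sketch(binary: np.ndarray) -> np.ndarray:
    """Zero out every foreground region (non-zero pixels) touching the border.

    Hypothetical helper: 8-connected breadth-first flood fill seeded from
    the foreground pixels that lie on the image edge.
    """
    fg = binary != 0
    edge = np.zeros_like(fg)
    edge[0, :] = edge[-1, :] = True
    edge[:, 0] = edge[:, -1] = True

    reached = fg & edge                      # border-touching pixels found so far
    queue = deque(map(tuple, np.argwhere(reached)))
    rows, cols = fg.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and fg[nr, nc] and not reached[nr, nc]:
                    reached[nr, nc] = True
                    queue.append((nr, nc))

    out = binary.copy()
    out[reached] = 0                         # remove everything connected to the edge
    return out


# Demo: one blob touches the top-left corner, one sits in the interior.
demo = np.zeros((10, 10), dtype=np.uint8)
demo[0:3, 0:3] = 255
demo[5:7, 5:7] = 255
cleared = clear_border_sketch(demo)
```

Only the interior blob survives; in the actual pipeline the library function is the better choice.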
Alternatively, we can apply a sequence of morphological operations to obtain the lung mask first and then the nodules, as shown below:
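# Invert the Otsu output so the dark lung fields (air) become white
img3 = 255 - ostu_img
cv2.imshow('Inverted_image', img3)
# Drop white regions connected to the border (typically the air around the body)
img4 = clear_border(img3)
cv2.imshow('clear_border1', img4)
# Closing (dilation then erosion) fills holes in the lung mask, including those left by nodules
se_fill = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
img4_fill = cv2.morphologyEx(img4, cv2.MORPH_CLOSE, se_fill)
cv2.imshow('filled_image', img4_fill)
# Opening (erosion then dilation) removes small spurious white regions
se_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
img4_open = cv2.morphologyEx(img4_fill, cv2.MORPH_OPEN, se_open)
cv2.imshow('Open_image', img4_open)
# Keep the original gray levels only inside the lung mask (the parenchyma)
paren = img & img4_open
cv2.imshow('Parenchyma', paren)
# Bright structures inside the parenchyma are nodule candidates
thos, nod_th = cv2.threshold(paren, 100, 255, cv2.THRESH_BINARY)
cv2.imshow('Nodules', nod_th)
# A small opening removes isolated noise pixels
se_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
nodules = cv2.morphologyEx(nod_th, cv2.MORPH_OPEN, se_open)
cv2.imshow('Final_Nodules', nodules)
cv2.waitKey(0)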
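Closing and opening are built from two primitives: erosion keeps a pixel only if the whole structuring element fits inside the foreground around it, and dilation sets a pixel if the structuring element touches any foreground pixel. Opening (erosion then dilation) therefore deletes foreground regions the structuring element cannot fit into, while closing (dilation then erosion) fills holes and gaps it cannot fit into. The sketch below assumes a square k x k structuring element instead of the elliptical ones used in the script, and treats pixels outside the image as background, which differs from OpenCV's default border handling; the function names are hypothetical.

```python
import numpy as np


def _window_views(mask: np.ndarray, k: int):
    """Yield the k*k shifted views of `mask` covering a square k x k window.

    Pixels outside the image are padded as background (False).
    """
    mask = np.asarray(mask, dtype=bool)
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    rows, cols = mask.shape
    for dr in range(k):
        for dc in range(k):
            yield padded[dr:dr + rows, dc:dc + cols]


def erode(mask, k=3):
    """Keep a pixel only if every pixel in its k x k window is foreground."""
    out = np.ones(np.shape(mask), dtype=bool)
    for view in _window_views(mask, k):
        out &= view
    return out


def dilate(mask, k=3):
    """Set a pixel if any pixel in its k x k window is foreground."""
    out = np.zeros(np.shape(mask), dtype=bool)
    for view in _window_views(mask, k):
        out |= view
    return out


def opening(mask, k=3):
    return dilate(erode(mask, k), k)


def closing(mask, k=3):
    return erode(dilate(mask, k), k)


# A 5x5 blob plus an isolated speck: opening removes only the speck.
spots = np.zeros((20, 20), dtype=bool)
spots[5:10, 5:10] = True
spots[15, 15] = True
opened = opening(spots)

# A 6x6 blob with a one-pixel hole: closing fills the hole.
holed = np.zeros((10, 10), dtype=bool)
holed[2:8, 2:8] = True
holed[4, 4] = False
closed = closing(holed)
```

This is why the script uses a large (21 x 21) element for the lung mask, where whole nodule-sized holes must be filled, and a small (3 x 3) element for the final cleanup, where only single-pixel noise should go.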
After segmenting the nodules from the lung CT images, features need to be computed for each nodule. Features generally fall into three categories: 1. Shape, 2. Texture and 3. Colour. In this work we computed shape-based features.
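# Label connected white regions; stats has one row per label (row 0 = background)
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(nodules)
sizes = stats[1:, cv2.CC_STAT_AREA]  # pixel area of each foreground component
min_size = 15  # components smaller than this (in pixels) are treated as noise
nodules1 = np.zeros(labels.shape, dtype='uint8')
for i in range(0, nlabels - 1):
    if sizes[i] >= min_size:
        nodules1[labels == i + 1] = 255
cv2.imshow('Final_Nodules1', nodules1)
cv2.waitKey(0)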
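In the filtering step, sizes is the cv2.CC_STAT_AREA column of stats, which is also its last column. The same filter can be written without a Python loop by building a boolean keep-table indexed by label. In this sketch the helper name is hypothetical, and the areas are recomputed with np.bincount, which counts the pixels per label just as CC_STAT_AREA does.

```python
import numpy as np


def remove_small_components(labels: np.ndarray, min_size: int) -> np.ndarray:
    """Return a uint8 mask (0/255) keeping components with area >= min_size.

    `labels` is an integer label image like the one returned by
    cv2.connectedComponentsWithStats: 0 is background, 1..n are components.
    """
    areas = np.bincount(labels.ravel())   # areas[i] = pixel count of label i
    keep = areas >= min_size
    keep[0] = False                       # never keep the background
    return np.where(keep[labels], 255, 0).astype(np.uint8)


# Label 1 covers 20 pixels, label 2 only 3 pixels.
lab = np.zeros((10, 10), dtype=np.int32)
lab[0:4, 0:5] = 1
lab[8, 0:3] = 2
kept = remove_small_components(lab, min_size=15)
```

With min_size=15, label 1 is kept and label 2 is dropped.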
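# Relabel the filtered mask so each remaining nodule has its own label
nlabels1, labels1, stats1, centroids1 = cv2.connectedComponentsWithStats(nodules1)
all_nod = []
# One row per nodule, nine columns:
# 0 aspect ratio, 1 area, 2 extent, 3 hull area, 4 solidity,
# 5 equivalent diameter, 6 ellipse angle, 7 centroid x, 8 centroid y
feat = np.zeros([nlabels1 - 1, 9], dtype='float')
for i in range(1, nlabels1):
    # Isolate nodule i in its own mask of the same size as the image
    nod1 = np.zeros_like(nodules1)
    pos = np.where(labels1 == i)
    nod1[pos] = nodules1[pos]
    all_nod.append(nod1)

    x, y, w, h = cv2.boundingRect(nod1)
    feat[i-1, 0] = float(w) / h  # aspect ratio

    # RETR_EXTERNAL returns only the outer boundary, so cc[0] is the nodule outline;
    # [-2] selects the contour list under both OpenCV 3.x and 4.x return signatures
    cc = cv2.findContours(nod1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    area = cv2.contourArea(cc[0])
    feat[i-1, 1] = area
    rect_area = w * h
    feat[i-1, 2] = float(area) / rect_area  # extent
    hull = cv2.convexHull(cc[0])
    hull_area = cv2.contourArea(hull)
    feat[i-1, 3] = hull_area  # convex hull area
    feat[i-1, 4] = float(area) / hull_area if hull_area > 0 else 0.0  # solidity
    feat[i-1, 5] = np.sqrt(4 * area / np.pi)  # equivalent diameter
    # fitEllipse needs at least 5 contour points
    (ex, ey), (MA, ma), angle = cv2.fitEllipse(cc[0])
    feat[i-1, 6] = angle  # ellipse orientation in degrees

# Centroids are indexed by label; skip row 0 (background)
feat[:, 7] = centroids1[1:, 0]
feat[:, 8] = centroids1[1:, 1]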
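The shape features follow simple formulas: aspect ratio = w / h, extent = contour area / bounding-box area, solidity = contour area / convex-hull area, and equivalent diameter = sqrt(4A / pi), the diameter of a circle with the same area A. cv2.contourArea computes the area with Green's formula, which for a polygon reduces to the shoelace formula. The NumPy sketch below applies these formulas to a hand-made L-shaped polygon so the numbers can be checked by hand. The helper names are hypothetical, the convex hull uses Andrew's monotone chain rather than OpenCV's routine, and the bounding box is taken as max minus min of the vertex coordinates, whereas cv2.boundingRect counts whole pixels (max minus min plus one), so extent values are not directly comparable with the script's.

```python
import math

import numpy as np


def polygon_area(pts: np.ndarray) -> float:
    """Shoelace formula for a simple polygon given as an (N, 2) array of x, y."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def convex_hull(pts: np.ndarray) -> np.ndarray:
    """Andrew's monotone-chain convex hull; returns the hull vertices in order."""
    p = sorted(map(tuple, pts))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return np.array(lower[:-1] + upper[:-1], dtype=float)


def shape_features(pts: np.ndarray) -> dict:
    """Shape features from a closed contour (hypothetical helper)."""
    w = pts[:, 0].max() - pts[:, 0].min()
    h = pts[:, 1].max() - pts[:, 1].min()
    area = polygon_area(pts)
    hull_area = polygon_area(convex_hull(pts))
    return {
        "aspect_ratio": w / h,
        "area": area,
        "extent": area / (w * h),
        "hull_area": hull_area,
        "solidity": area / hull_area,
        "equi_diameter": math.sqrt(4 * area / math.pi),
    }


# L-shaped outline: a 4 x 4 square with a 2 x 2 corner removed.
l_shape = np.array([(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)], dtype=float)
feats = shape_features(l_shape)
```

For this L-shape the area is 12, the hull area is 14, the extent is 12/16 = 0.75 and the solidity is 12/14, about 0.857; a solidity below 1 signals a concave, more irregular outline.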
Running the above script extracts nine shape features for each of the 18 nodules segmented from one CT scan.
Of these 18 nodules, nodule 14 was identified as cancerous by a medical expert. Hence, that nodule is labelled '1' and the other nodules are labelled '0'.
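Labelling amounts to appending a class column to the feature matrix. The sketch below assumes that the expert's nodule numbers are 1-based and follow the row order of feat (row i-1 holds component label i in the script); that numbering is an assumption, the helper name add_labels is hypothetical, and the zero-filled array is only a stand-in for real features.

```python
import numpy as np


def add_labels(feat: np.ndarray, malignant: set) -> np.ndarray:
    """Append a 0/1 label column; `malignant` holds 1-based nodule numbers."""
    y = np.zeros(len(feat), dtype=feat.dtype)
    for n in malignant:
        y[n - 1] = 1
    return np.column_stack([feat, y])


# Stand-in for the (18, 9) feature array produced by the script.
demo_feat = np.zeros((18, 9))
table = add_labels(demo_feat, malignant={14})
```

Each row of table now holds nine features followed by the class label.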
Similarly, nodule segmentation and feature extraction can be carried out on other patients' scans, and concatenating all of this information forms a structured, supervised dataset to which any machine learning classification algorithm can be applied to build a predictive model.
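The article leaves the choice of classifier open. As one possible sketch, per-patient tables (features plus a final 0/1 label column) can be stacked while recording which patient each nodule came from, so the train/test split can be made by patient; nodules from the same scan are correlated, and letting one patient appear on both sides of the split tends to give optimistic results. The nearest-centroid classifier is used only because it fits in a few lines of NumPy; in practice a library classifier (for example from scikit-learn) would be the usual choice. All names and the toy numbers are hypothetical, not data from this study.

```python
import numpy as np


def build_dataset(per_patient: dict):
    """Stack per-patient tables into X, y and a patient-id array.

    Each value is an (n_i, d + 1) array whose last column is the 0/1 label.
    """
    X_parts, y_parts, g_parts = [], [], []
    for pid, tbl in per_patient.items():
        X_parts.append(tbl[:, :-1])
        y_parts.append(tbl[:, -1].astype(int))
        g_parts.append(np.full(len(tbl), pid))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(g_parts)


class NearestCentroid:
    """Standardize features, then predict the class whose mean is closest."""

    def fit(self, X, y):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0          # constant features: avoid /0
        Z = (X - self.mean_) / self.std_
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([Z[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = (X - self.mean_) / self.std_
        dist = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


# Toy tables (2 features + label), NOT real nodule data.
per_patient = {
    "p1": np.array([[1.0, 1.0, 0], [1.0, 2.0, 0], [9.0, 9.0, 1]]),
    "p2": np.array([[2.0, 1.0, 0], [8.0, 9.0, 1]]),
    "p3": np.array([[1.0, 1.5, 0], [9.0, 8.0, 1]]),
}
X, y, groups = build_dataset(per_patient)

# Split by patient so no patient's nodules appear in both train and test.
test_mask = np.isin(groups, ["p3"])
model = NearestCentroid().fit(X[~test_mask], y[~test_mask])
pred = model.predict(X[test_mask])
```

Standardizing first keeps features with large ranges, such as area, from dominating the distance.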
The skill of deriving structured data from unstructured data is the vital step in building a predictive model. We hope this article gives you hands-on experience in converting images into structured data.