*Author for correspondence E-mail: eswarijp@gmail.com
Development of pancreatic CT – scan image dataset and retrieval process for diagnosis
K Jayaprakash1* and R Anandan2
1Department of Biotechnology, 2Department of Computer Science & Engineering, Karpaga Vinayaga College of Engineering
& Technology, Madhuranthagam, Chennai 603 308, India
Received 09 December 2011; revised 31 July 2012; accepted 01 August 2012
This study presents feature analysis of medical CT scan images, creation of a data bank and development of a data mining technique. A dataset of 50 known digital CT scan images of the pancreas, with their clinical diagnoses, was compiled. Textural features (energy, entropy, contrast, homogeneity and correlation) were calculated statistically for all images in the MATLAB numerical environment. Gray level co-occurrence matrices were used to interpret the 2D digital images. The data were compared and mean values were maintained for retrieving the image that matches a query image. Results are discussed along with future innovation and scope in medical digital image diagnosis.
Keywords: Biomedical image retrieval system, Development of digital image database & data mining, Feature extraction & CBIR in medical image
Introduction
Digital image feature extraction using statistical design and content-based image retrieval (CBIR) have been employed in medical imaging science. CBIR is a computational procedure that combines the physical principles of images with statistical analysis. A considerable gap in knowledge remains in this area, although a large number of systems have been implemented and the technologies explained. Enser1 has elaborated on image archives, various indexing methods and common search tasks using text-based queries on annotated images. Gupta & Jain2 highlighted the past, present and future of the retrieval process for medical images. Tang et al3 presented a well-documented account of medical image retrieval systems in current use. Although many different techniques have been adapted for medical image retrieval, CBIR is the most widely considered. However, a lacuna still exists in the general application of image retrieval systems and programming tools4. CBIR has proved worthwhile as it is mainly based on image feature extraction, feature storage, feature comparison and a query interface5. Existing image processing tools (Khoros/Cantata/VisiQuest1, Insight Toolkit (ITK)2, Visualization Toolkit (VTK)3 or ImageJ4) are available for feature extraction and comparison, but do not have a supporting system for generating a data bank. A processing tool that is not coupled with a storage system may therefore cause complications and lose new information. Further, the end user (clinician) may find retrieval algorithms difficult to operate. This study presents abdominal CT scan digital images of the human pancreas.
Experimental Section
Out of 800 cases of intra-abdominal human pancreas CT scan images collected from various clinical and scan centers, 50 selected image samples were used for statistical interpretation. The cases were diagnosed by an experienced histopathologist, and the diseased conditions were recorded.
Feature Extraction
Gray level co-occurrence matrices (GLCM) were used for estimation of image properties as per the reported method6. Statistical and structural calculations were done as follows. The $G \times G$ GLCM $P_d$ is defined for a displacement vector $d = (dx, dy)$. Entry $(i, j)$ of $P_d$ is the number of occurrences of the pair of grey levels $i$ and $j$ that are a distance $d$ apart:

$$P_d(i, j) = \left|\{((r, s), (t, v)) : I(r, s) = i,\ I(t, v) = j\}\right|$$

where $(r, s), (t, v) \in N \times N$, $(t, v) = (r + dx, s + dy)$, and $|\cdot|$ is the cardinality of a set. From the co-occurrence matrix, useful texture features (energy, entropy, contrast, homogeneity and correlation) were calculated6-8 (Table 1). Here $\mu_x$ and $\mu_y$ are the means and $\sigma_x$ and $\sigma_y$ the standard deviations of $P_d(x)$ and $P_d(y)$, where $P_d(x) = \sum_j P_d(x, j)$ and $P_d(y) = \sum_i P_d(i, y)$. Owing to the intensive computation involved, only $d = 1$ with angles 0°, 45°, 90° and 135° was considered. The syntax used for calculating the GLCM of an image was Glcm = graycomatrix(I, 'NumLevels', 8, 'G', [5], offset).
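The definition above can be sketched in plain Python. This is an illustrative sketch, not the MATLAB code used in the study. It assumes the image is already quantised to integer grey levels 0 to `levels - 1`, that the four angles map to the (row, column) offsets conventionally used by `graycomatrix`, and that entropy is taken over the normalised GLCM with a base-2 logarithm. The names `OFFSETS`, `glcm` and `texture_features` are hypothetical.

```python
import math

# Assumed (row, col) offsets for d = 1 at 0, 45, 90 and 135 degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, offset):
    """Count pairs of grey levels (i, j) separated by `offset`."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            t, v = r + dr, c + dc
            if 0 <= t < rows and 0 <= v < cols:
                counts[image[r][c]][image[t][v]] += 1
    return counts


def texture_features(counts):
    """Compute the five texture features from a GLCM of raw counts."""
    total = sum(map(sum, counts))
    n = len(counts)
    # Normalise counts to joint probabilities p(i, j).
    cells = [(i, j, counts[i][j] / total) for i in range(n) for j in range(n)]
    mu_x = sum(i * p for i, j, p in cells)
    mu_y = sum(j * p for i, j, p in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * p for i, j, p in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * p for i, j, p in cells))
    cov = sum((i - mu_x) * (j - mu_y) * p for i, j, p in cells)
    return {
        "energy": sum(p * p for _, _, p in cells),
        "entropy": -sum(p * math.log2(p) for _, _, p in cells if p > 0),
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + abs(i - j)) for i, j, p in cells),
        # Correlation is undefined for a constant image (zero deviation).
        "correlation": cov / (sd_x * sd_y) if sd_x * sd_y else float("nan"),
    }


# A small 4-level test image (illustrative values, not study data).
image = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
features_by_angle = {
    angle: texture_features(glcm(image, 4, offset))
    for angle, offset in OFFSETS.items()
}
```

Averaging each feature over the four angles would give one mean value per feature per image, which is one way to obtain the mean values mentioned in the text.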
Minimum Distance Approach
Differences between the textural feature values of the query sample image and those of each image stored in the dataset were calculated. Difference values were determined for all features (energy, entropy, contrast, homogeneity and correlation) and sorted in ascending order.
The image with the lowest difference (nearest to zero) was selected. This was done by a stepwise procedure.
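A minimal Python sketch of this selection follows, assuming each image is represented by a dictionary of its five feature values. The function names and all numeric values are hypothetical and serve only to illustrate the ranking; they are not data from the study.

```python
FEATURES = ("energy", "entropy", "contrast", "homogeneity", "correlation")


def rank_by_difference(query, database, feature):
    """Return (absolute difference, image number) pairs in ascending order."""
    diffs = [
        (abs(query[feature] - stored[feature]), number)
        for number, stored in database.items()
    ]
    return sorted(diffs)


def nearest_per_feature(query, database):
    """Map each feature to the image number with the lowest difference."""
    return {f: rank_by_difference(query, database, f)[0][1] for f in FEATURES}


# Illustrative feature values only.
database = {
    1: dict(energy=0.30, entropy=2.10, contrast=0.50, homogeneity=0.80, correlation=0.90),
    2: dict(energy=0.25, entropy=2.40, contrast=0.62, homogeneity=0.75, correlation=0.95),
}
query = dict(energy=0.26, entropy=2.38, contrast=0.52, homogeneity=0.76, correlation=0.94)
best = nearest_per_feature(query, database)
```

Here `best` names, for every feature, the stored image whose value lies nearest to that of the query.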
Results and Discussion
Feature Extraction
A database consisting of 50 pancreas images was compiled (Table 2). The clinical diagnosis of each image was ascertained by a pathologist. Feature properties of all 50 pancreas CT scan images were calculated for the different angle orientations. Statistical mean values were worked out for each feature.
Each structural feature of the query image was compared with the same numerical property of every image in the constructed database. The image (numbered 1-50) in the database with the lowest difference is selected automatically and may be regarded as the closest structural analogue of the query image.
Algorithm (Retrieval Procedure)
Step 0: Create a database with images and their details.
Step 1: Start from the first image of the database and proceed with Step 2 until the last image is reached.
Step 2: Extract features for the image in 4 directions (0°, 45°, 90° and 135°) with the graycomatrix syntax.
Step 3: While the stopping condition is false, do Step 4.
Step 4: Select a query image and proceed with Step 2.
Step 5: Repeat Step 4 until the stopping condition is true.
Step 6: If the query image matches a database image under the minimum distance algorithm, the resultant image is retrieved; otherwise the query image is added to the database as a new image for future analysis.
Step 7: Display the resultant image and the details of its diagnosis.
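The decision in Step 6 can be sketched as follows. The procedure does not state when a query counts as matched, so this sketch assumes a hypothetical `tolerance` on the mean absolute difference over all features; the function name `retrieve`, the default tolerance and the returned tuple are illustrative choices only.

```python
def retrieve(query, database, tolerance=0.05):
    """Return (image number, added) for a query feature dictionary.

    `added` is False when a stored image lies within `tolerance` of the
    query, and True when the query was stored as a new database entry.
    """
    best_number, best_distance = None, float("inf")
    for number, stored in database.items():
        # Mean absolute difference over all features (an assumed metric).
        distance = sum(abs(query[k] - stored[k]) for k in query) / len(query)
        if distance < best_distance:
            best_number, best_distance = number, distance
    if best_distance <= tolerance:
        return best_number, False
    # No stored image is close enough: keep the query for future analysis.
    new_number = max(database, default=0) + 1
    database[new_number] = dict(query)
    return new_number, True
```

Returning the image number lets the caller look up and display the stored diagnosis, as in Step 7.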
Table 1: Texture features determined
Sl no  Feature  Description  Formula adopted
1  Energy  Measures number of repeated pairs. Energy is expected to be high if occurrence of repeated pixel pairs is high. Energy is 1 for a constant image.
2  Entropy  Statistical measure of randomness that can be used to characterize texture of input image. Entropy is expected to be high if gray levels are distributed randomly throughout the image.
3  Contrast  Returns a measure of intensity contrast between a pixel and its neighbour over the whole image. Range = [0, (size(GLCM)-1)^2]. Contrast is expected to be low if gray levels of each pixel pair are similar. Contrast is 0 for a constant image.
4  Homogeneity  Measures local homogeneity of a pixel pair. Homogeneity is expected to be large if gray levels of each pixel pair are similar. Range = [0, 1].
5  Correlation  Returns a measure of how correlated a pixel is to its neighbour over the whole image. Range = [-1, 1]. Correlation is 1 or -1 for a perfectly positively or negatively correlated image. Correlation is expected to be high if gray levels of pixel pairs are highly correlated.

Table 2: Database of 50 human pancreatic CT scan images and their diagnosed clinical condition
Image order  Clinical condition
1  Pseudocyst compresses stomach and spleen
2  Cystadenocarcinoma of the pancreas
3  Normal
4  Cystadenocarcinoma of the pancreas
5  Cystadenocarcinoma of the pancreas
6  Normal
7  Carcinoma of tail of pancreas
8  Normal
9  Cystic fibrosis with dilated bowel
10  Normal
11  Pancreatic cancer
12  Stone in distal common bile duct
13  Normal
14  Cystadenoma pancreas
15  Chronic pancreatitis
16  Pancreatic cancer invades splenic vein
17  Adenocarcinoma of pancreas
18  VHL with multiple renal carcinomas with metastases to pancreas
19  Normal
20  Pancreatic cancer with vessel encasement
21  Pancreatic cancer with vessel encasement and liver metastases
22  Lymphoma infiltrates pancreas
23  Splenic artery aneurysm simulates a pancreatic mass
24  Pancreatic cystadenoma
25  Invasive pancreatic cancer recurrence
26  Small cell carcinoma of the pancreas with implants
27  Islet cell tumor of the pancreas
28  Cystic fibrosis (fat replaced pancreas)
29  Invasive pancreatic adenocarcinoma
30  Carcinomatosis with implants
31  Pancreatic cancer presents as an abdominal aortic aneurysm
32  Serous cystadenoma
33  Acute pancreatitis
34  Adenocarcinoma of the tail of pancreas
35  Pancreatic cancer invades splenic vein
36  Mucinous carcinoma of pancreas
37  Retained sponge simulates a pancreatic pseudocyst
38  Duodenal perforation with free air s/p an ERCP
39  Lymphoma infiltrates pancreas
40  Cystadenoma of tail of pancreas
41  Annular pancreas
42  Carcinoma of tail of pancreas
43  Normal
44  Infected pseudocysts
45  Carcinoma of pancreas invades splenic artery & vein & results in splenic infarction
46  Cystadenoma of pancreas
47  Hamoudi tumor
48  Normal
49  Acute pancreatitis
50  Cystic fibrosis involves pancreas

Fig. 1: Human pancreas CT scan image: a) Query pancreas image; and b) Resultant image retrieved from database (33rd image in order)

The query image used to test the efficiency of the present retrieval process is shown in Fig. 1a. Its structural features were worked out in the MATLAB environment. Entering the query image data into the dataset of 50 pancreas images automatically brought out the nearest matching image. This was the 33rd image among those scanned, indicating that the 33rd image most closely matches the query image (Fig. 1b). The textural features of the resultant image (Fig. 1b) were automatically compared with those of the query image and the minimum difference values were calculated (Table 3). The query image was observed to differ very little from the resultant image in entropy, contrast, homogeneity and correlation. At the 45° angle orientation, entropy and correlation showed the minimum difference between the first sample image and the resultant image (no. 33) (0.002377 and 0.0000004, respectively). At 90°, contrast, homogeneity and correlation showed the minimum differences, recorded as 0.005491, 0.000641 and 0.00000042, respectively. Likewise, at 135°, the minimum differences occurred in entropy and correlation, with values of 0.00573 and 0.00000125 (Table 3).
These results demonstrate that image no. 33, the retrieved image, may be considered structurally equivalent to the query pancreas scan image. The resultant image (no. 33) appeared 3 times at the 0° angle orientation, 2 times at 45°, 3 times at 90° and again 2 times at 135° (Table 4).
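The occurrence counts can be checked with a short tally over the entries of Table 4. The sketch below copies the best-matching image numbers from that table; the names `TABLE_4` and `most_frequent` are hypothetical, and the tally is an illustration rather than the authors' implementation.

```python
from collections import Counter

# Best-matching image number per feature and angle, copied from Table 4.
TABLE_4 = {
    0: {"contrast": 45, "energy": 36, "entropy": 33, "homogeneity": 33, "correlation": 33},
    45: {"contrast": 46, "energy": 21, "entropy": 33, "homogeneity": 26, "correlation": 33},
    90: {"contrast": 33, "energy": 21, "entropy": 25, "homogeneity": 33, "correlation": 33},
    135: {"contrast": 45, "energy": 21, "entropy": 33, "homogeneity": 26, "correlation": 33},
}


def most_frequent(matches):
    """Return (image number, count) for the image matched most often."""
    return Counter(matches.values()).most_common(1)[0]


votes = {angle: most_frequent(row) for angle, row in TABLE_4.items()}
# votes == {0: (33, 3), 45: (33, 2), 90: (33, 3), 135: (33, 2)}
```

Image 33 is the most frequent match at every angle, which agrees with the last column of Table 4.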
A significant point is that the sample image has the minimum distance at all 4 angle orientations for correlation (Table 4). The present minimum distance approach may therefore be a suitable computational tool for retrieving medical images from a dataset and for diagnosing query images. Image retrieval was also appreciably faster than with other software such as VTK3. The accuracy of the present system is substantiated by other biochemical parameters, clinical symptoms and earlier known cases.
Computational determination of textural features and selection of the most probable matching image saves time. It can also help the clinician to initiate treatment procedures. Many similar studies6,9-12 have been conducted on abnormalities of various anatomical tissues using image characteristics. However, these works are based on characterization of image structural features by computer analysis and are not case specific.
This study and the development of the pancreas image database offer a new approach to understanding image recognition using texture properties. Currently, the most active areas of image retrieval research appear to be the detection of features and topics within images using automated annotation methods13.
Conclusions
A minimum absolute difference approach for a CBIR system has been tested for efficiency. A database consisting of 50 pancreas images and their 5 textural properties has been created. A sample query image has been tested to establish this concept. An odd image, which does not match any image, is entered automatically into the dataset as a new image. This can improve the efficiency of the retrieval system and help to develop a large data bank. Future scope in this area lies in state-of-the-art 3D visual perception and in increasing the speed of retrieval.
Acknowledgements
The authors thank the Director, Advisor, Principal and Dean of Karpaga Vinayaga College of Engineering and Technology, Chennai 603 308, India, for their encouragement.
References
1 Enser P G B, Pictorial information retrieval, J Doc, 51 (1995) 126-170.
2 Gupta A & Jain R, Visual information retrieval, Commun ACM, 40 (1997) 70-79.
3 Tang L H Y, Hanka R & Ip H H S, A review of intelligent content-based indexing and browsing of medical images, Health Inform J, 5 (1999) 40-49.
4 Carson C, Belongie S, Greenspan H & Malik J, Blobworld: image segmentation using expectation-maximization and its application to image querying, IEEE Trans Pattern Anal Machine Intell, 24 (2002) 1026-1038.
5 Montagnat J, Breton V & Magnin I E, Using grid technologies to face medical image analysis challenges, in Proc Third IEEE/ACM Int Symp on Cluster Computing and Grid, 2003, 588-593.
6 Haralick R, Statistical and structural approaches to texture, Proc IEEE, 67 (1979) 786-804.
7 Belongie S, Malik J & Puzicha J, Shape matching and object recognition using shape contexts, IEEE Trans Pattern Anal Machine Intell, 24 (2002) 502-522.
8 Gonzalez R C, Woods R E & Eddins S L, Digital image processing using MATLAB (Pearson Education) 2005.
9 Miles K, Functional computer tomography in oncology, Eur J Cancer, 38 (2002) 2079-2084.
10 Wurflinger T, Stockhausen J, Meyer-Ebrecht D & Bocking A, Automatic co-registration, segmentation and classification for multimodal cytopathology, in Proc Med Inform Europe Conf (St Malo, France) 2003.
11 Antani S, Long L R & Thoma G R, Bridging the gap: enabling CBIR in medical applications, in Proc 21st Int Symp on Comput based Med Syst (University of Jyvaskyla, Finland) 2008.
12 Smietanski J, Tadeusiewicz R & Luczynska E, Texture analysis in perfusion images of prostate cancer: a case study, Int J Appl Math Comput Sci, 20 (2010) 149-156.
13 Yavlinsky A & Ruger S, Efficient re-indexing of automatically annotated image collections using keyword combination, in Proc SPIE: Multimedia Content Access: Algorithms and Systems, vol 6506, 2007.

Table 3: Difference between first sample image and resultant image (33rd image)
Sl no.  Feature      0°        45°        90°        135°
1       Energy       0.004292  0.000574   0.000384   0.000437
2       Entropy      0.001128  0.002377   0.006979   0.00573
3       Contrast     0.011776  0.008851   0.005491   0.002728
4       Homogeneity  0.000292  0.001818   0.000641   0.00482
5       Correlation  0.000304  0.0000004  0.0000042  0.00000125

Table 4: Result of image retrieval process for different orientations
Degree  Contrast  Energy  Entropy  Homogeneity  Correlation  Max no. of occurrences / image no.
0       45        36      33       33           33           3/33
45      46        21      33       26           33           2/33
90      33        21      25       33           33           3/33
135     45        21      33       26           33           2/33