Classification of Fish Species with Image Data Using K-Nearest Neighbor

Classification is a technique that many of us encounter in everyday life, and it is increasingly applied to many types of data and cases. In computer science, classification has been developed to facilitate human work; one application is classifying fish species. The number of fish species in the world is so large that many people still have difficulty telling them apart. This study therefore classifies fish species using the K-Nearest Neighbor method. Four types of fish are used, with 160 images in total. The purpose of the study was to test the K-Nearest Neighbor method for classifying fish species based on color, texture, and shape features. In the tests, the highest accuracy, 77.50%, was obtained with K = 7, and the second-highest, 76.88%, with K = 10. Based on these results, it can be concluded that the K-Nearest Neighbor method classifies fish species reasonably well, and that the results could be improved by adding variables, adding more data, and using other types of fish.


I. INTRODUCTION
The ocean covers 361 million km² and the land 149 million km², so the ocean makes up 71% and the land 29% of the earth's surface. By extent and location, the seas are divided into oceans, marginal seas, and inland (mediterranean) seas [1].
Fish are an important part of biodiversity and one of the most widespread groups of organisms in the world. They have recently been categorized into 6 classes, 62 orders, 540 families, and about 27,683 species [2,3]. With so many types of fish, many of them naturally share the same shape, color, and even size. Morphological identification has succeeded in describing nearly one million species on earth through classification and species identification [4,5]. Species classification nevertheless faces four difficulties. First, differences in individuals, gender, and geography, together with phenotypic plasticity and genetic variability, can lead to misclassification [6]. Second, ecological damage and human activities that harm the fishery environment make it difficult to collect fish specimens [7,8]. Third, some fish show different shapes, patterns, colors, and sizes even though they belong to the same species. Finally, taxonomic knowledge is needed to diagnose errors in classification [9].
Based on the brief explanation above, the authors are interested in researching fish classification using the KNN method. Four types of fish will be classified: Black Sea Sprat, Gilt Head Bream, Horse Mackerel, and Red Mullet. These fish live in the high seas, so many people are still unfamiliar with them. The dataset consists of 160 images sourced from Kaggle, each with dimensions of 1024 × 768 pixels.

II. RESEARCH METHODS
Several studies have used the K-Nearest Neighbor algorithm. Kaharudin et al. (2019) classified types of spices in Indonesia based on shape, color, and texture features using the K-Nearest Neighbor algorithm; the accuracy reached 84% using 7 test scenarios [10].
Andayani et al. (2018) classified three types of fish in the Scombridae family using the Probabilistic Neural Network method, with an accuracy of 89.65% using 112 training images and 29 testing images [11].
Montalbo et al. (2019) classified fish species on Verde Island using a Deep Convolutional Neural Network (DCNN) model that achieved an accuracy of 99%. The images were augmented by flipping, rotating, cropping, enlarging, and shifting to provide more robust features for classification [12].
Alsmadi et al. (2020) surveyed fish classification techniques. The survey also reviewed the use of databases such as Fish4-Knowledge (F4K), knowledge databases, the Global Information System (GIS) on Fishes, and other FC databases. Pre-processing methods, feature extraction techniques, and classifiers were collected from recent work to improve understanding of their characteristics and to guide the direction of research [13].
Jin et al. (2021) proposed a classification approach based on DNA coding that combines an Elastic Net-Stacked Autoencoder (EN-SAE) with Kernel Density Estimation (KDE), named the ESK model. The ESK model can accurately classify fish from different families based on DNA [14]. Adebayo et al. (2016) classified fish based on physical form through image processing, feature extraction, and classification. Fish feature vectors were obtained by Singular Value Decomposition (SVD) of the fish images. Testing with an Artificial Neural Network (ANN) on 36 fish images gave an accuracy of 94% [15].

Research Methodology
The research method consists of 6 stages. The first is a literature study to find literature and references for the research from books, journals, proceedings, and other sources. The second is data collection, carried out by searching for the dataset to be used in this study; the data were sourced from Kaggle.com [16]. The third is image pre-processing, in which the collected images are cleaned before further processing: the background is changed to black, and the dimensions of every image are adjusted to 1024 × 768 pixels. Removing the background ensures that only the values of the fish object are extracted, while adjusting the dimensions ensures that the values are extracted from the same number of pixels in every image. The fourth is feature extraction, using an application built with MATLAB. At this stage three groups of features are taken: color features, consisting of RGB; texture features, consisting of Contrast, Correlation, Energy, and Homogeneity; and shape features, consisting of Eccentricity and Metric. The extracted features are saved to a CSV file. The fifth is analysis, in which testing is carried out with the WEKA application using 10-fold cross-validation and several K values (numbers of nearest neighbors): K = 1, K = 3, K = 5, K = 7, K = 9, and K = 10. The sixth is drawing a conclusion from the final accuracy of the test results.
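The analysis stage can be sketched in a few lines of Python. This is only an illustration of 10-fold cross-validation over several K values, not the WEKA procedure used in the study: the feature values are synthetic, the helper names (`make_toy_data`, `knn_predict`, `cross_val_accuracy`) are hypothetical, the folds are not stratified, and ties in the vote (possible for even K such as 10) simply go to the label seen first among the nearest neighbors.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=10, seed=0):
    """Fraction of rows classified correctly when each row is tested once."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        test_idx = order[f::folds]          # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / len(X)


def make_toy_data(seed=1):
    """Synthetic stand-in for the CSV: 4 classes x 40 rows, 3 features each."""
    rng = random.Random(seed)
    centers = {
        "Black Sea Sprat": (0.0, 0.0, 0.0),
        "Gilt Head Bream": (10.0, 0.0, 0.0),
        "Horse Mackerel": (0.0, 10.0, 0.0),
        "Red Mullet": (0.0, 0.0, 10.0),
    }
    X, y = [], []
    for label, center in centers.items():
        for _ in range(40):
            X.append([c + rng.uniform(-1.0, 1.0) for c in center])
            y.append(label)
    return X, y


X, y = make_toy_data()
results = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7, 9, 10)}
for k, acc in results.items():
    print(f"K = {k:2d}: {acc:.2%}")
```

Because the synthetic classes are far apart, every K value classifies them perfectly here; with real image features the classes overlap, which is why the accuracy depends on K.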

III. RESULT AND ANALYSIS
3.1 Dataset
The dataset consists of 160 images, with 40 images for each class (type of fish). Each image has dimensions of 1024 × 768 pixels.
Here are some examples of the data used. The images were taken with a digital camera at a distance of 30-50 cm from the object.

Pre-processing
The pre-processing stage starts by removing the background from the image, so that the values extracted from the image are not influenced by the background of the object and the analysis can be expected to be more accurate.
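The effect of this step can be illustrated with a minimal Python sketch. It assumes that a boolean mask marking the fish pixels is already available; the text does not say how the background was detected, so the mask, the tiny 2 × 3 image, and the function name `remove_background` are all hypothetical.

```python
def remove_background(image, mask):
    """Return a copy of `image` with every pixel outside `mask` set to black.

    `image` is a list of rows of (R, G, B) tuples and `mask` is a list of
    rows of booleans with the same shape (True = pixel belongs to the fish).
    """
    black = (0, 0, 0)
    return [
        [pixel if keep else black for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical 2 x 3 image: bright background pixels around darker fish pixels.
image = [
    [(200, 180, 160), (90, 60, 40), (210, 190, 170)],
    [(205, 185, 165), (95, 65, 45), (100, 70, 50)],
]
mask = [
    [False, True, False],
    [False, True, True],
]

cleaned = remove_background(image, mask)
for row in cleaned:
    print(row)
```

After this step only the fish pixels carry non-zero values, so later feature extraction can ignore the black pixels.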

Feature Extraction
In the feature extraction process, the features taken are color, texture, and shape features. Color features are taken using the RGB formula. Texture features are taken using the GLCM (Gray Level Co-Occurrence Matrix) method: the color image must first be converted into a grayscale image, after which the GLCM values can be taken. The values taken for the texture features are Contrast, Correlation, Energy, and Homogeneity. The shape features are taken using the eccentricity and metric formulas. Before the shape features are retrieved, the grayscale image must first be converted into a Filling Holes and Area Open image.
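The four texture features can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB application used in the study: it counts only horizontally adjacent pixel pairs, it expects the image to be already quantized to a small number of gray levels, and it uses the common textbook definitions of the four statistics. The 4 × 4 image and the function names are hypothetical.

```python
import math


def glcm(gray, levels):
    """Normalized co-occurrence matrix of horizontally adjacent pixel pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in gray:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def texture_features(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM.

    Correlation is undefined (division by zero) for a constant image.
    """
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    correlation = sum((i - mu_i) * (j - mu_j) * v
                      for i, j, v in cells) / (sd_i * sd_j)
    return {
        "contrast": contrast,
        "correlation": correlation,
        "energy": energy,
        "homogeneity": homogeneity,
    }


# Hypothetical 4 x 4 grayscale image with 4 gray levels (0..3).
gray = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

matrix = glcm(gray, levels=4)
features = texture_features(matrix)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Contrast grows when neighboring pixels differ strongly, while energy and homogeneity grow when the image is uniform, so the four values together summarize how smooth or patterned the fish surface is.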

Analysis Using K-Nearest Neighbor
To classify fish species using K-Nearest Neighbor, the process is first modeled to provide an overview of how the classification is carried out.
The following is an example of applying K-Nearest Neighbor to determine the type of fish. The example uses only 4 data from the 4 types of fish and only 3 variables, namely red, contrast, and eccentricity.
1. Determine the number of closest neighbors (K value). In this example, the classification result is based on K = 1, meaning that it is based on the single closest neighbor.
2. Calculate the distance between the training data and the testing data. The distance can be calculated with several methods; this study uses the Euclidean distance to measure the proximity of the testing data to all existing training data. The formula used can be seen in equation 3, where x_i and y_i are the values of the i-th feature of the testing data and the training data, and n is the number of features:
$$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{3} $$
First, the Euclidean distance between the testing data and the first training data is calculated, and the same is done for the remaining training data. Based on the Euclidean distances between the testing data and the four training data, the smallest distance is to the fourth data, namely 1.07762, so with K = 1 it can be concluded that the testing data is classified as Red Mullet.
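The two steps of this example can be sketched in Python. The feature values below are hypothetical placeholders chosen so that the distances are easy to check by hand; they are not the values used in the study, and only the variable names (red, contrast, eccentricity) and the choice K = 1 come from the text.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences, feature by feature."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (red, contrast, eccentricity) values, one row per fish type.
training = [
    ("Black Sea Sprat", (123.0, 4.5, 0.9)),
    ("Gilt Head Bream", (126.0, 8.5, 0.9)),
    ("Horse Mackerel", (132.0, 5.5, 0.9)),
    ("Red Mullet", (121.0, 0.5, 0.9)),
]
testing = (120.0, 0.5, 0.9)

# Step 2: distance from the testing data to every training row.
distances = [(euclidean_distance(testing, features), label)
             for label, features in training]
for distance, label in distances:
    print(f"{label}: {distance:.5f}")

# Step 1 fixed K = 1, so the prediction is the label of the closest row.
nearest_distance, predicted = min(distances)
print("Predicted:", predicted)
```

For K greater than 1, the same list of distances would be sorted and the most frequent label among the K smallest distances would be taken instead of the single minimum.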

Implementation
Testing is done using the WEKA application, for the classification process, WEKA provides fairly complete
Some of the test results obtained with WEKA can be seen in the image below. The percentage of correct classifications for all tests is shown in the following diagram.
Figure 9. Test results diagram
Based on the diagram above, the K value with the highest accuracy is K = 7, at 77.50%, followed by K = 10, at 76.88%.
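The two reported percentages can be related to the 160 images with a short calculation. The counts of correctly classified images used below (124 and 123) are not reported in the text; they are inferred here on the assumption that accuracy is the number of correctly classified images divided by the total.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified instances."""
    return 100.0 * correct / total


TOTAL_IMAGES = 160  # 4 fish types x 40 images each

# Inferred counts (an assumption, not reported values).
acc_k7 = accuracy_percent(124, TOTAL_IMAGES)
acc_k10 = accuracy_percent(123, TOTAL_IMAGES)

print(f"K = 7:  {acc_k7:.3f}%")
print(f"K = 10: {acc_k10:.3f}%")
```

Under this assumption, the difference between the two best K values amounts to a single image out of 160.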

IV. CONCLUSIONS
Based on the tests above, it can be concluded that the K-Nearest Neighbor method classifies fish types fairly well based on color, texture, and shape, with an accuracy of 77.50%. Further research is expected to use other features or other classification methods, as well as more training data, so that the accuracy improves.