Data Science With Excel

Sigit Setiyanto, Ismail Setiawan


Data science proceeds through several stages, one of which is data preparation. At this stage, dirty data is cleaned so that it is ready for modeling. Many applications make this kind of data processing more convenient. One of them is Microsoft Excel, which can process data so that it is ready for modeling. Excel has limits, however: a worksheet holds at most 1,048,576 rows and 16,384 columns. For data of no more than about 1 million rows, Excel can still handle the preparation using features such as error detection, duplicate removal, correction of error values, outlier detection, missing-data handling, and data validation. This study demonstrates some of these features with examples of their use.
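The checks named in the abstract can be sketched outside Excel to make the ideas concrete. The following is a minimal Python illustration, not the authors' method: the function names are hypothetical, the two limits are the worksheet limits quoted above, and the 1.5 × IQR rule is one common outlier convention assumed here for the example.

```python
from statistics import quantiles

# Worksheet limits quoted in the abstract.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(n_rows, n_cols):
    """Return True if a table of this size fits on one worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_cells(rows):
    """Return (row, column) positions of cells that are None or empty strings."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if value is None or value == ""
    ]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

For example, `fits_in_excel(1_000_000, 10)` is true, while a table with 1,048,577 rows exceeds the row limit and would need to be split or handled by another tool.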





