Research Article

Novel Approach for Detecting the Number of Columns of a Résumé

Year 2025, Volume: 12 Issue: 1, 127 - 153, 26.03.2025
https://doi.org/10.54287/gujsa.1636051

Abstract

In recruitment processes, manually reviewing résumés is highly time-consuming. To reduce the cost of these reviews, Information Extraction tasks have been introduced to extract the structure of the document and the personal information it contains. However, there is no consensus on a standard résumé structure: each résumé has its own distinctive layout, number of columns, and text properties, which makes accurate extraction highly challenging. This study addresses one part of this problem. We focus on estimating the number of columns in a résumé because, in our experience with downstream processing, knowing the number of columns facilitates separating the main sections of the résumé and, in turn, analyzing its finer subsections. We employ the coordinates of the text blocks that make up a résumé, hypothesizing that these coordinates carry information on the number of columns. We define the problem in a clustering context and propose novel clustering approaches dedicated to finding the number of columns in a résumé by separating the text block coordinates. The experiments are conducted on a dataset of résumés from real applicants in two languages: Turkish and English. The results reveal that hybrid approaches that use the intermediate methods perform better than the individual methods. Furthermore, these findings could be extended to any unstructured textual data in any language and document format.
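To make the clustering framing concrete, the following is a minimal sketch, not the authors' method: it groups the left edges of text blocks with a simple one-dimensional gap rule and counts the resulting groups as columns. The `Block` layout, the function name `estimate_column_count`, and the parameters `min_gap_ratio` and `min_blocks_per_column` are all assumptions introduced here for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical text block: (x0, y0, x1, y1) bounding box in page units.
Block = Tuple[float, float, float, float]


def estimate_column_count(
    blocks: Sequence[Block],
    page_width: float,
    min_gap_ratio: float = 0.15,      # assumed threshold, not from the paper
    min_blocks_per_column: int = 2,   # assumed filter for stray blocks
) -> int:
    """Estimate the number of columns by gap-clustering block left edges."""
    if not blocks:
        return 0

    # Sort the left x-coordinates so that clusters are contiguous runs.
    lefts = sorted(block[0] for block in blocks)
    min_gap = min_gap_ratio * page_width

    # Start a new cluster whenever two neighbouring left edges are far apart.
    clusters: List[List[float]] = [[lefts[0]]]
    for x in lefts[1:]:
        if x - clusters[-1][-1] > min_gap:
            clusters.append([x])
        else:
            clusters[-1].append(x)

    # Ignore clusters that are too small to be a column (e.g. a lone date).
    columns = [c for c in clusters if len(c) >= min_blocks_per_column]
    return max(1, len(columns))


# Usage: six blocks whose left edges sit near x=50 and x=320 on a 600-unit page.
two_column = [
    (50, 700, 280, 720), (50, 650, 280, 690), (51, 600, 280, 640),
    (320, 700, 550, 720), (322, 650, 550, 690), (320, 600, 550, 640),
]
print(estimate_column_count(two_column, page_width=600))  # 2
```

A fixed gap threshold is the simplest possible choice and will likely fail on heavily indented or irregular layouts, which is presumably where the dedicated and hybrid clustering approaches described in the abstract become relevant.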

There are 35 citations in total.

Details

Primary Language: English
Subjects: Computing Applications in Life Sciences
Journal Section: Information and Computing Sciences
Authors

Yavuz Bali 0000-0003-0621-3069

Günce Keziban Orman 0000-0003-0402-8417

Sultan Nezihe Turhan 0000-0001-9763-0882

Publication Date: March 26, 2025
Submission Date: February 10, 2025
Acceptance Date: March 12, 2025
Published in Issue: Year 2025, Volume: 12 Issue: 1

Cite

APA Bali, Y., Orman, G. K., & Turhan, S. N. (2025). Novel Approach for Detecting the Number of Columns of a Résumé. Gazi University Journal of Science Part A: Engineering and Innovation, 12(1), 127-153. https://doi.org/10.54287/gujsa.1636051