USING COMPUTER VISION AND NLP TECHNOLOGIES FOR INTELLIGENT DOCUMENT PROCESSING IN ELECTRONIC DOCUMENT MANAGEMENT SYSTEMS

I. V. Pronenko

doi:10.58168/MIST2026_1068-1073

Section: SECTION 2. COMPUTER-AIDED DESIGN SYSTEMS

Proceedings: MODELING INFORMATION SYSTEMS AND TECHNOLOGIES – 2026: Proceedings of the International Scientific and Practical Conference dedicated to the 5th anniversary of the Faculty of Computer Science and Technology, Voronezh, March 30, 2026

UDC 004

I. V. Pronenko ¹

Author and publication information

Authors:

1. Voronezh State University of Forestry and Technologies named after G.F. Morozov

Type:

Conference article

DOI:

https://doi.org/10.58168/MIST2026_1068-1073

Pages:

from 1068 to 1073

Published:

30.06.2026

Language:

Russian

Keywords:

intelligent document processing, electronic document management, computer vision, natural language processing, optical character recognition, LayoutLMv3, multimodal models, transformers

Abstract and keywords

Abstract:
This article examines how computer vision and natural language processing (NLP) technologies are applied to build intelligent document processing (IDP) systems for electronic document management. It analyzes transformer-based multimodal architectures (LayoutLMv3, DocLLM, and UDOP) that jointly encode a document's text, image, and spatial layout. Quantitative benchmark results are presented for data extraction from forms and receipts, document classification, and visual question answering. The economic feasibility of adopting IDP solutions is substantiated with market statistics for 2023–2025, and key limitations and promising directions for further development are identified.
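As an illustration of what jointly encoding text and spatial layout involves at the input level, the sketch below pairs each OCR word with its bounding box rescaled to a 0–1000 grid. That grid is the convention used by common LayoutLM-family implementations; it is assumed here and is not taken from the article. The names `LayoutToken`, `normalize_box`, and `build_layout_tokens` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1): left, top, right, bottom
Box = Tuple[int, int, int, int]


@dataclass
class LayoutToken:
    """One OCR word together with its page-normalized bounding box."""
    text: str
    bbox: Box


def normalize_box(box: Box, page_width: int, page_height: int,
                  scale: int = 1000) -> Box:
    """Rescale a pixel box to a 0..scale grid (assumed LayoutLM-style convention)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    # Integer arithmetic keeps the result deterministic across platforms.
    return (
        scale * x0 // page_width,
        scale * y0 // page_height,
        scale * x1 // page_width,
        scale * y1 // page_height,
    )


def build_layout_tokens(words: List[str], boxes: List[Box],
                        page_width: int, page_height: int) -> List[LayoutToken]:
    """Pair every word with its normalized box; inputs must be aligned one to one."""
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")
    return [
        LayoutToken(word, normalize_box(box, page_width, page_height))
        for word, box in zip(words, boxes)
    ]
```

A layout-aware model would consume such tokens alongside the page image. The OCR step that produces `words` and `boxes` is outside the scope of this sketch.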
