This post grew out of my curiosity about the performance of Paddle-OCR, a versatile optical character recognition (OCR) toolkit. Paddle-OCR offers a range of features for extracting text from images and documents, and I was eager to explore both its capabilities and its limitations.
The input is image data from Chapter 1 of the English-1 book (Council of Higher Secondary Education, Odisha, Bhubaneswar, for the +2 examination), on which I run text detection.
For each detected line of text, the line number, bounding box, recognized text, and confidence score are captured in a Python dictionary.
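To make that dictionary concrete, here is a minimal sketch of how it could be built. It assumes the result shape returned by `ocr()` in the PaddleOCR 2.x releases, where each page is a list of `[box, (text, confidence)]` entries and `box` holds four `[x, y]` corner points. The helper name `lines_to_dict`, the key names, and the sample values are my own placeholders for illustration, not real OCR output from the book.

```python
from typing import Any, Dict, Sequence


def lines_to_dict(page: Sequence[Sequence[Any]]) -> Dict[int, Dict[str, Any]]:
    """Map 1-based line numbers to bounding box, text and confidence.

    `page` is assumed to follow the PaddleOCR 2.x layout:
    [[box, (text, confidence)], ...] with box = four [x, y] corner points.
    """
    captured: Dict[int, Dict[str, Any]] = {}
    for line_number, (box, (text, confidence)) in enumerate(page, start=1):
        captured[line_number] = {
            "bounding_box": [[float(x), float(y)] for x, y in box],
            "text": text,
            "confidence": float(confidence),
        }
    return captured


# Hand-written placeholder data in the assumed shape (not real OCR output).
sample_page = [
    [[[10.0, 12.0], [200.0, 12.0], [200.0, 40.0], [10.0, 40.0]],
     ("placeholder line one", 0.98)],
    [[[10.0, 50.0], [320.0, 50.0], [320.0, 78.0], [10.0, 78.0]],
     ("placeholder line two", 0.91)],
]

ocr_lines = lines_to_dict(sample_page)
for number, entry in ocr_lines.items():
    print(number, entry["text"], entry["confidence"])
```

With a real run, `sample_page` would be replaced by the first page of the OCR result, for example `PaddleOCR(lang="en").ocr(image_path, cls=True)[0]` in the 2.x API I am assuming here. Keying by line number keeps the reading order explicit, which helps when comparing the output against the printed page.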
config_dict = {
    "alpha": 1.0,
    "benchmark": False,
    "beta": 1.0,
    "cls_batch_num": 6,
    "cls_image_shape": '3, 48, 192',
    "cls_model_dir": '/Users/prajendr/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer',
    "cls_thresh": 0.9,
    "cpu_threads": 10,
    "crop_res_save_dir": './output',
    "det": True,
    "det_algorithm": 'DB',
    "det_box_type": 'quad',
    "det_db_box_thresh": 0.6,
    "det_db_score_mode": 'fast',
    "det_db_thresh": 0.3,
    "det_db_unclip_ratio": 1.5,
    "det_east_cover_thresh": 0.1,
    "det_east_nms_thresh": 0.2,
    "det_east_score_thresh": 0.8,
    "det_limit_side_len": 960,
    "det_limit_type": 'max',
    "det_model_dir": '/Users/prajendr/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer',
    "det_pse_box_thresh": 0.85,
    "det_pse_min_area": 16,
    "det_pse_scale": 1,
    "det_pse_thresh": 0,
    "det_sast_nms_thresh": 0.2,
    "det_sast_score_thresh": 0.5,
    "draw_img_save_dir": './inference_results',
    "drop_score": 0.5,
    "e2e_algorithm": 'PGNet',
    "e2e_char_dict_path": './ppocr/utils/ic15_dict.txt',
    "e2e_limit_side_len": 768,
    "e2e_limit_type": 'max',
    "e2e_model_dir": None,
    "e2e_pgnet_mode": 'fast',
    "e2e_pgnet_score_thresh": 0.5,
    "e2e_pgnet_valid_set": 'totaltext',
    "enable_mkldnn": False,
    "fourier_degree": 5,
    "gpu_mem": 500,
    "help": '==SUPPRESS==',
    "image_dir": None,
    "image_orientation": False,
    "ir_optim": True,
    "kie_algorithm": 'LayoutXLM',
    "label_list": ['0', '180'],
    "lang": 'en',
    "layout": True,
    "layout_dict_path": None,
    "layout_model_dir": None,
    "layout_nms_threshold": 0.5 …