Final Year Project (FYP) demonstration of parsing booking documents
Donut-booking-gradio is a tool for extracting text from scanned booking documents. It was built as a proof of concept for a Final Year Project (FYP) on document parsing and text extraction. The application uses an AI model to read scanned or image-based booking documents and return their text in digital form, so the content can be searched, copied, and processed further.
• Live Interface: Provides a real-time interface for uploading and processing documents.
• Multi-Page Support: Capable of handling documents with multiple pages or scanned images.
• Text Extraction: Extracts text from scanned booking documents using AI; accuracy depends on scan quality.
• Customizable Settings: Allows users to adjust settings for better extraction accuracy.
• Export Options: Enables users to export extracted text for further use.
• User-Friendly Design: Designed with a simple and intuitive user interface.
• Cross-Platform Compatibility: Works seamlessly across different operating systems.
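The export feature listed above could look roughly like the following. This is a minimal sketch, not the tool's actual code: the function name `render_export`, the two formats (`txt` and `json`), and the JSON layout are all assumptions made for illustration, since the page does not say which export formats are offered.

```python
import json
from typing import List


def render_export(page_texts: List[str], fmt: str = "txt") -> str:
    """Render extracted page texts as an exportable string.

    Hypothetical helper: the format names and JSON structure are
    assumptions, not the tool's documented behaviour.
    """
    if fmt == "txt":
        # Plain text: pages separated by a blank line.
        return "\n\n".join(page_texts)
    if fmt == "json":
        # JSON: one entry per page, numbered from 1.
        payload = {
            "pages": [
                {"page": number, "text": text}
                for number, text in enumerate(page_texts, start=1)
            ]
        }
        return json.dumps(payload, ensure_ascii=False, indent=2)
    raise ValueError(f"unsupported export format: {fmt!r}")
```

The returned string can then be written to a file or offered as a download; keeping rendering separate from file handling makes the helper easy to test.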
1. What file formats does donut-booking-gradio support?
Donut-booking-gradio supports common image formats such as JPG and PNG, as well as PDF files.
2. How accurate is the text extraction?
The accuracy depends on the quality of the scanned document. Clear images yield better results, while blurry or distorted scans may reduce accuracy.
3. Can I process multiple pages at once?
Yes, the application supports multi-page documents, so you can extract text from every page of a document in a single run.
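The file-format and multi-page answers above can be illustrated with a short sketch. Everything here is hypothetical: `SUPPORTED_EXTENSIONS` is inferred from the FAQ (with `.jpeg` added as an assumed alias for JPG), and `extract_page` is a stand-in for whatever model call the tool actually makes. The sketch handles pages one after another; the page does not say how the tool schedules them internally.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumption: extensions inferred from the FAQ; the tool's real list may differ.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the assumed supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def extract_pages(
    pages: Iterable[bytes],
    extract_page: Callable[[bytes], str],
) -> List[str]:
    """Apply a per-page extractor to every page, preserving page order.

    `extract_page` is a placeholder for the real text-extraction call.
    """
    return [extract_page(page) for page in pages]


if __name__ == "__main__":
    # Stand-in extractor for demonstration only: decodes bytes as text.
    fake_pages = [b"Booking ref: 123", b"Guest: A. Example"]
    if is_supported("booking.pdf"):
        for number, text in enumerate(
            extract_pages(fake_pages, lambda page: page.decode("utf-8")), start=1
        ):
            print(f"page {number}: {text}")
```

Passing the extractor in as a parameter keeps the page loop independent of any particular model, which also makes it possible to test the loop without loading one.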