Does your online marketplace deal with contracts, receipts, and invoices?
Then you probably need a better tool for automated invoice scanning and data capture that will unchain you from the paper routine.
In this article, we introduce one such data extraction tool, Xtracta, and share our experience of integrating it into a B2C marketplace for facilities management.
Xtracta - automatic data recognition tool
Let's get straight to the point.
The motto of Xtracta is:
OCR but not as you know it
Xtracta is automated data entry software that captures data from any type of document.
It accepts common formats such as PDF, DOC, XLS, JPG, PNG, and email, and automatically extracts the information you need.
How is it different from common OCR?
Unlike traditional OCR, there is no need to create a template for each document design. Xtracta is powered by artificial intelligence and machine learning. It learns from each document it processes and automatically creates fields for an unlimited number of documents in different languages.
Here is what you get by integrating Xtracta into your software
- Multi-field capture
Tell the engine what information you want to capture and it will automatically pull the appropriate vendor details, amounts, dates, payment information (card or cash), taxes, and line items.
- Different input channels
You can submit invoices by email, bulk-upload them over SFTP, or use the mobile app, the web portal with drag and drop, or API upload.
- Great mobile experience
The Xtracta API scans the receipt and captures the data in seconds.
- Receipt stitching
Multiple images can be stitched together into one receipt.
- Mobile app and image capture SDK
You can use the Xtracta mobile app or create your own with the Image Capture SDK.
Xtracta has a global customer base with over 500,000 users each day and 10 million pages processed per month.
The project we used it in deals with plenty of invoices and documents coming in through different channels.
In addition, different companies have their own invoice layouts, so traditional OCR wasn't the right choice, as it requires a separate template for each layout. That's why we needed to automate invoice capture and make it faster and more efficient.
Let's dive right into the ins and outs of the Xtracta integration process.
Xtracta integration (our experience)
First of all, let's see how Xtracta works within our project.
The project we are working on deals with invoices from different service providers.
These invoices may arrive in different ways. For example, a user may upload them directly to the application. In that case, we would need to build separate functionality and UI. But there is a simpler and faster way: users send the invoices to a dedicated email address.
Users don't have to spend time signing up for the application; all they need to do is email the invoice.
Consequently, we had to choose a package that would listen for incoming email. We picked mail-listener2.
This package turned out not to be fully suitable for our project, so we had to modify it slightly.
For example, if the user sends approximately 10 MB of documents, attachment processing takes a long time because mail-listener2 does it synchronously. The problem we ran into was that Galaxy dropped our container because it was using too much processor time.
We forked the package and fixed the problems. In case you face similar issues, you can find the updated package here.
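To illustrate the general idea (this is not the code of the fork), attachments can be handed off asynchronously with a cap on how many are in flight at once, so that one large email does not monopolize the process. In this minimal TypeScript sketch, `Attachment`, `processAttachments`, and the `handle` callback are hypothetical names; `handle` stands in for whatever uploads a file to the data capture service.

```typescript
// Hypothetical shape of an email attachment; not taken from mail-listener2.
interface Attachment {
  filename: string;
  content: Uint8Array;
}

// Hypothetical callback that does the real work, e.g. an upload.
type AttachmentHandler = (attachment: Attachment) => Promise<void>;

// Process attachments with at most `concurrency` handlers running at a time.
// Returns the file names whose handler failed, so they can be retried later.
async function processAttachments(
  attachments: Attachment[],
  handle: AttachmentHandler,
  concurrency = 2
): Promise<string[]> {
  const failed: string[] = [];
  let next = 0; // index of the next attachment to pick up

  async function worker(): Promise<void> {
    while (next < attachments.length) {
      const attachment = attachments[next++];
      try {
        // Awaiting here yields to the event loop between attachments.
        await handle(attachment);
      } catch (_err) {
        failed.push(attachment.filename);
      }
    }
  }

  // Start at least one worker, and never more than there are attachments.
  const workerCount = Math.max(1, Math.min(concurrency, attachments.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return failed;
}
```

A failing attachment does not stop the rest of the batch; its file name is simply reported back to the caller.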
We get a notification when an email arrives in the inbox. After this, the email attachments are automatically extracted and sent to Xtracta.
Check out technical documentation about Xtracta API here.
After Xtracta returns the captured invoice data, we try to link our internal entities, such as client, supplier, and property, with the invoice document.
Xtracta offers such an option, but the data doesn't always match, so we decided to do the linking ourselves.
Our task was to match names/titles from the invoice with the names/titles in our database.
We use MongoDB for our application, so we first decided to use MongoDB text search, which is good for address matching. However, it matches only whole words.
For example, if the name of the client is “client name” but Xtracta has returned “client nam” (similar cases happen quite often), the text search won't match these strings.
We ran into this with an invoice from Purity Water Company.
Xtracta recognized the name as “Turity Ater Mpany”, which also illustrates the problem with font recognition: leading letters are misread or lost (“T” instead of “P”).
So we went with a mix of text search and $regex, the MongoDB operator that provides regular expression capabilities for pattern matching strings in queries.
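Below is a minimal TypeScript sketch of the idea, not the exact rules used in the project. The field name `name` and all helper names are assumptions made for illustration, as is the heuristic of ignoring the first character of longer words, which is where the misreads in the example above occurred.

```typescript
// Escape characters that have a special meaning in regular expressions.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a captured name into a tolerant pattern: split it into words,
// drop the first character of words longer than three characters
// (assumed to be the least reliable one), and allow anything in between.
function buildNamePattern(captured: string): string {
  return captured
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => escapeRegex(token.length > 3 ? token.slice(1) : token))
    .join(".*");
}

// First attempt: MongoDB text search (requires a text index, whole words only).
function textSearchFilter(captured: string) {
  return { $text: { $search: captured } };
}

// Fallback: case-insensitive $regex on the hypothetical `name` field.
function regexFilter(captured: string) {
  return { name: { $regex: buildNamePattern(captured), $options: "i" } };
}
```

With this pattern, “Turity Ater Mpany” becomes `urity.*ter.*pany`, which matches “Purity Water Company”, and “client nam” becomes `lient.*nam`, which matches “client name”. An unanchored, case-insensitive regex cannot use an index efficiently, so it makes sense to try the text search first and fall back to the regex only when nothing is found.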
After the invoices are captured and matched, we return them to the users of our app, who can review and approve them, editing the data and adding new fields where needed.
Conclusion
We truly hope that this review of one of the data extraction and processing tools was useful for you. If you want to use it for your online marketplace or have any questions about the Xtracta integration process, feel free to drop us a few lines.
TechHub