Data Extraction
Docutain's Data Extraction / Data Capture SDK for Windows can extract data from imported documents.
Initialization
- Follow the Getting started guide
- Initialize the Docutain SDK for Windows as described here
Set Analyze Configuration
If you do not need to analyze for BIC, PaymentState, SEPACreditor or GiroCode, you can skip this section and jump directly to Get the detected data.
If you want to analyze for BIC, PaymentState, SEPACreditor or GiroCode, you need to set the analyze configuration accordingly.
This needs to be done before the document is imported and analyzed. The best place to do this is right after initializing the SDK.
// ...
// Initialize the SDK first, then set the Analyze Configuration
// ...
AnalyzeConfiguration analyzeConfiguration = new Docutain.SDK.Windows.AnalyzeConfiguration();
analyzeConfiguration.ReadBIC = true; //defaults to false
analyzeConfiguration.ReadPaymentState = true; //defaults to false
analyzeConfiguration.ReadSEPACreditor = true; //defaults to false
analyzeConfiguration.ReadGiroCode = true; //defaults to false
if (!DocutainSDK.SetAnalyzeConfiguration(analyzeConfiguration))
{
    // Setting the configuration failed, query the reason
    string error = DocutainSDK.GetLastError();
}
Get the detected data
In order to get the detected data of the imported document, call the following line of code:
// ...
// import file
// ...
string jsonData = Document.Analyze();
The detected data is returned as a JSON string. Depending on how you configured your AnalyzeConfiguration after initializing the SDK (see Set Analyze Configuration), the structure differs in the values IBAN, Bank, PaymentState and SEPACreditor, as the following examples show.
When reading BIC is deactivated (default behaviour), you will get a value IBAN which contains all detected IBANs:
{
  "Address": {
    "Name1": "Verbandsgemeindeverwaltung Nastätten",
    "Name2": "",
    "Name3": "",
    "Zipcode": "56352",
    "City": "Nastätten",
    "Street": "Postfach",
    "Phone": "06772 802 0",
    "CustomerId": "",
    "IBAN": ["DE76570928000208303503", "DE41510500150710030316"]
  },
  "Date": "2020-01-24",
  "Amount": "940.84",
  "InvoiceId": "20/VA06894/0009500",
  "Reference": "RNr:20/VA06894/0009500 vom 24.01.2020"
}
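The result is plain JSON, so it can be read with any JSON library. The following is a minimal sketch using System.Text.Json (built into .NET Core 3.0 and later, available as a NuGet package for .NET Framework). The class AnalyzeResultReader and its methods are hypothetical helpers, not part of the Docutain SDK. The sketch assumes that Amount uses a dot as decimal separator and that Date is formatted as yyyy-MM-dd, as in the example above, and parses both culture-independently so that the result does not depend on the regional settings of the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper, not part of the Docutain SDK.
public static class AnalyzeResultReader
{
    // Returns all strings in "Address.IBAN", or an empty list if the array is missing.
    public static List<string> ReadIbans(string jsonData)
    {
        var ibans = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(jsonData);
        JsonElement root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty("Address", out JsonElement address) &&
            address.ValueKind == JsonValueKind.Object &&
            address.TryGetProperty("IBAN", out JsonElement ibanArray) &&
            ibanArray.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in ibanArray.EnumerateArray())
            {
                if (item.ValueKind == JsonValueKind.String)
                {
                    ibans.Add(item.GetString() ?? string.Empty);
                }
            }
        }
        return ibans;
    }

    // Parses "Amount" with the invariant culture; returns null if missing or not a number.
    public static decimal? ReadAmount(string jsonData)
    {
        string text = ReadTopLevelString(jsonData, "Amount");
        return decimal.TryParse(text, NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                                CultureInfo.InvariantCulture, out decimal amount)
            ? amount
            : (decimal?)null;
    }

    // Parses "Date" as yyyy-MM-dd (assumed format); returns null if missing or malformed.
    public static DateTime? ReadDate(string jsonData)
    {
        string text = ReadTopLevelString(jsonData, "Date");
        return DateTime.TryParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out DateTime date)
            ? date
            : (DateTime?)null;
    }

    // Returns the string value of a top-level property, or "" if it is absent or not a string.
    private static string ReadTopLevelString(string jsonData, string name)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonData);
        JsonElement root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty(name, out JsonElement value) &&
            value.ValueKind == JsonValueKind.String)
        {
            return value.GetString() ?? string.Empty;
        }
        return string.Empty;
    }
}
```

Passing the string returned by Document.Analyze() to AnalyzeResultReader.ReadIbans would yield the detected IBANs as a list. Returning null for a missing or unparsable amount or date leaves the decision of how to handle incomplete results to the caller.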
When reading BIC is activated, you will get a value Bank instead of IBAN. It contains a list of BIC/IBAN pairs, one for each bank account detected:
{
  "Address": {
    "Name1": "Verbandsgemeindeverwaltung Nastätten",
    "Name2": "",
    "Name3": "",
    "Zipcode": "56352",
    "City": "Nastätten",
    "Street": "Postfach",
    "Phone": "06772 802 0",
    "CustomerId": "",
    "Bank": [
      {"BIC": "GENODE51DIE", "IBAN": "DE76570928000208303503"},
      {"BIC": "NASSDE55XXX", "IBAN": "DE41510500150710030316"}
    ]
  },
  "Date": "2020-01-24",
  "Amount": "940.84",
  "InvoiceId": "20/VA06894/0009500",
  "Reference": "RNr:20/VA06894/0009500 vom 24.01.2020"
}
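The Bank list can be read in the same way. The following sketch, again based on System.Text.Json, returns the entries as (BIC, IBAN) tuples. BankDetailsReader is a hypothetical helper and not part of the Docutain SDK. It assumes that either member of an entry may be missing and maps a missing member to an empty string.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the Docutain SDK.
public static class BankDetailsReader
{
    // Returns the (BIC, IBAN) pairs found in "Address.Bank", or an empty list if absent.
    public static List<(string Bic, string Iban)> ReadBankDetails(string jsonData)
    {
        var result = new List<(string Bic, string Iban)>();
        using JsonDocument doc = JsonDocument.Parse(jsonData);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("Address", out JsonElement address) ||
            address.ValueKind != JsonValueKind.Object ||
            !address.TryGetProperty("Bank", out JsonElement bank) ||
            bank.ValueKind != JsonValueKind.Array)
        {
            return result;
        }

        foreach (JsonElement entry in bank.EnumerateArray())
        {
            if (entry.ValueKind != JsonValueKind.Object)
            {
                continue; // skip anything that is not a {"BIC": ..., "IBAN": ...} object
            }
            result.Add((GetStringOrEmpty(entry, "BIC"), GetStringOrEmpty(entry, "IBAN")));
        }
        return result;
    }

    // Returns the string value of a member, or "" if it is missing or not a string.
    private static string GetStringOrEmpty(JsonElement obj, string name)
    {
        if (obj.TryGetProperty(name, out JsonElement value) &&
            value.ValueKind == JsonValueKind.String)
        {
            return value.GetString() ?? string.Empty;
        }
        return string.Empty;
    }
}
```

Because the result contains either IBAN or Bank depending on the configuration, an application that may run with both settings would need to check for both values.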
When reading the payment state is activated, you will get a value PaymentState which contains either Paid or ToBePaid:
{
  "Address": {
    "Name1": "Verbandsgemeindeverwaltung Nastätten",
    "Name2": "",
    "Name3": "",
    "Zipcode": "56352",
    "City": "Nastätten",
    "Street": "Postfach",
    "Phone": "06772 802 0",
    "CustomerId": "",
    "Bank": [
      {"BIC": "GENODE51DIE", "IBAN": "DE76570928000208303503"},
      {"BIC": "NASSDE55XXX", "IBAN": "DE41510500150710030316"}
    ]
  },
  "Date": "2020-01-24",
  "Amount": "940.84",
  "InvoiceId": "20/VA06894/0009500",
  "Reference": "RNr:20/VA06894/0009500 vom 24.01.2020",
  "PaymentState": "ToBePaid"
}
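In application code it can be convenient to map the two PaymentState strings to an enum. The sketch below does this with System.Text.Json. The enum PaymentStatus and the class PaymentStateReader are hypothetical names, not part of the Docutain SDK. The Unknown member is an assumption that covers results where PaymentState is absent (for example because reading it was not activated) or holds an unexpected value.

```csharp
using System.Text.Json;

// Hypothetical enum, not part of the Docutain SDK.
public enum PaymentStatus
{
    Unknown,
    Paid,
    ToBePaid
}

// Hypothetical helper, not part of the Docutain SDK.
public static class PaymentStateReader
{
    // Maps the top-level "PaymentState" string to PaymentStatus.
    // Returns Unknown if the value is missing or is neither "Paid" nor "ToBePaid".
    public static PaymentStatus ReadPaymentState(string jsonData)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonData);
        JsonElement root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty("PaymentState", out JsonElement state) &&
            state.ValueKind == JsonValueKind.String)
        {
            switch (state.GetString())
            {
                case "Paid":
                    return PaymentStatus.Paid;
                case "ToBePaid":
                    return PaymentStatus.ToBePaid;
            }
        }
        return PaymentStatus.Unknown;
    }
}
```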
When reading the SEPA creditor is activated, you will get a value SEPACreditor. Some invoices specify a SEPA creditor that is not the same as the sender of the document. If you are using the data extraction for photo payment, for example, you should activate reading of the SEPA creditor, because in that case you need the recipient of the payment rather than the sender of the document.
{
  "Address": {
    "Name1": "DB Fernverkehr AG",
    "Name2": "",
    "Name3": "",
    "Zipcode": "60643",
    "City": "Frankfurt am Main",
    "Street": "BahnCard-Service",
    "Phone": "0302970",
    "CustomerId": "",
    "Bank": [
      {"BIC": "PBNKDEFFXXX", "IBAN": "DE02100100100152517108"}
    ]
  },
  "Date": "2024-10-14",
  "Amount": "244.00",
  "InvoiceId": "2023174086",
  "Reference": "RNr:2023174086 vom 14.10.2024",
  "PaymentState": "ToBePaid",
  "SEPACreditor": "DB Vertrieb GmbH"
}
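For a photo payment form, the name of the payment recipient could be chosen as sketched below: SEPACreditor is used when it is present and not empty, otherwise Name1 of the address. The fallback to Name1 is an assumption made for this example, not documented SDK behaviour, and PaymentRecipientReader is a hypothetical helper that is not part of the Docutain SDK.

```csharp
using System.Text.Json;

// Hypothetical helper, not part of the Docutain SDK.
public static class PaymentRecipientReader
{
    // Prefers the top-level "SEPACreditor"; falls back to "Address.Name1" (assumption).
    // Returns "" if neither value is present.
    public static string ReadPaymentRecipient(string jsonData)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonData);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
        {
            return string.Empty;
        }

        string creditor = GetTrimmedString(root, "SEPACreditor");
        if (creditor.Length > 0)
        {
            return creditor;
        }

        if (root.TryGetProperty("Address", out JsonElement address) &&
            address.ValueKind == JsonValueKind.Object)
        {
            return GetTrimmedString(address, "Name1");
        }
        return string.Empty;
    }

    // Returns the trimmed string value of a member, or "" if it is missing or not a string.
    private static string GetTrimmedString(JsonElement obj, string name)
    {
        if (obj.TryGetProperty(name, out JsonElement value) &&
            value.ValueKind == JsonValueKind.String)
        {
            return (value.GetString() ?? string.Empty).Trim();
        }
        return string.Empty;
    }
}
```

For the example above, this sketch would select DB Vertrieb GmbH rather than DB Fernverkehr AG.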
When GiroCode reading is enabled and a valid GiroCode is found on the document, the SDK returns the extracted GiroCode payment data instead of performing a full document analysis. This can significantly reduce processing time, especially for larger documents.
If no valid GiroCode is found, the SDK automatically falls back to the standard recognition process.
In both cases, the returned result contains all payment-relevant information.