Juan Brekes
## Scripts

- `batch_upload.py`: Provides functionality for batch uploading multiple invoices for processing. It takes a directory of PDF files as input, processes them in batches, and updates the BigQuery table with the extracted data. It uses the asynchronous batch processing capabilities of Google Document AI.
- `streaming_upload.py`: Similar to `batch_upload.py`, but parses and uploads invoices one at a time. It provides a simpler interface for processing individual documents by specifying the document location directly.

## How it works

Each script sends invoices to a Google Document AI processor. The `extract_data()` function within the scripts handles the transformation of the processor's output into rows, which are then passed to the `load_to_bigquery()` function. This function inserts the data into the specified BigQuery dataset and table.
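To make the transformation step concrete, here is a minimal sketch of what an `extract_data()`-style helper might do. The entity field names (`type_`, `mention_text`) follow the Document AI response shape, but the entities are modeled as plain dicts here so the sketch stays self-contained; the invoice field names are placeholders, not taken from the scripts.

```python
def extract_data(entities):
    """Flatten a list of Document AI-style entities into one row dict.

    `entities` stands in for `document.entities` from a Document AI
    invoice-processor response; each item is modeled as a simple dict
    so the example runs without the google-cloud client libraries.
    """
    row = {}
    for entity in entities:
        # Document AI reports an entity type (e.g. "invoice_id") and the
        # raw text it found for that field in the document.
        row[entity["type_"]] = entity["mention_text"].strip()
    return row

# Hypothetical entities as a processor might return them for one invoice.
entities = [
    {"type_": "invoice_id", "mention_text": "INV-001 "},
    {"type_": "total_amount", "mention_text": "199.99"},
]
print(extract_data(entities))
# {'invoice_id': 'INV-001', 'total_amount': '199.99'}
```

A row dict in this shape can be handed directly to the BigQuery client (e.g. via `insert_rows_json`), which is presumably what `load_to_bigquery()` does with it.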
## Configuration (`batch_upload.py` and `streaming_upload.py`)

- `project_id`: Google Cloud project ID.
- `location`: Location where the Document AI processor is deployed.
- `processor_id`: ID of the Document AI processor.
- `credentials_path`: Path to the service account credentials JSON file.
- `dataset_id`: BigQuery dataset ID.
- `table_id`: BigQuery table ID.
- `gcs_input_prefix`: Google Cloud Storage prefix for input documents.
- `gcs_output_uri`: Google Cloud Storage URI for output documents.
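The variables above might be collected in a small configuration block like the following. Every value shown is a placeholder, not a real project identifier; substitute your own before running either script.

```python
# Example configuration -- all values below are placeholders.
project_id = "my-gcp-project"
location = "us"  # region where the Document AI processor is deployed
processor_id = "abcdef1234567890"
credentials_path = "credentials/service_account.json"

# BigQuery destination for the extracted invoice data.
dataset_id = "invoices"
table_id = "parsed_invoices"

# Cloud Storage locations (used by the asynchronous batch flow).
gcs_input_prefix = "gs://my-bucket/invoices/input/"
gcs_output_uri = "gs://my-bucket/invoices/output/"
```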