Parse invoice PDF data and upload to Google BigQuery
Posted Aug 26, 2024

This project includes two scripts: batch_upload.py and streaming_upload.py.

- batch_upload.py: Provides batch uploading of multiple invoices for processing. It takes a directory of PDF files as input, processes them in batches using the asynchronous batch-processing capabilities of Google Document AI, and updates the BigQuery table with the extracted data.
- streaming_upload.py: Similar to batch_upload.py, but parses and uploads invoices one at a time. It provides a simpler interface for processing an individual document by specifying the document location directly.

Document AI returns structured entities for each parsed invoice; the extract_data() function within the scripts handles the transformation of those entities into table rows. The rows are then passed to the load_to_bigquery() function, which inserts the data into the specified BigQuery dataset and table.

Both scripts (batch_upload.py and streaming_upload.py) are configured with the following parameters:

- project_id: Google Cloud project ID.
- location: Location where the Document AI processor is deployed.
- processor_id: ID of the Document AI processor.
- credentials_path: Path to the service account credentials JSON file.
- dataset_id: BigQuery dataset ID.
- table_id: BigQuery table ID.
- gcs_input_prefix: Google Cloud Storage prefix for input documents.
- gcs_output_uri: Google Cloud Storage URI for output documents.