Bulk Data Insertion API (BDIA)

Introduction#

The Bulk Data Insertion API (BDIA) is an Adobe Analytics service that lets us send data and server calls from devices that are not connected to the internet into Adobe Analytics. Server calls are collected into batch files and uploaded, which makes it possible to ingest current as well as historical data.

The Bulk Data Insertion API supports server-side ingestion of tracking calls as CSV-formatted data, instead of tracking through the client-side implementation library AppMeasurement.js.

The difference between the Data Insertion API (DIA) and the Bulk Data Insertion API (BDIA) is that BDIA ingests and processes multiple server calls at once and accepts CSV file uploads for bulk imports. DIA handles one server call per request: properly formatted Data Insertion XML is submitted to the Data Insertion URL over HTTP POST or HTTP GET.

Prerequisites#

For a basic working demo (batch upload), the high-level prerequisites are:

  • The report suite is timestamp-enabled or timestamp-optional.
  • An Adobe IO API integration (this demo uses a JWT-based integration).
  • A formatted CSV file for batch ingestion (explained below).

CSV File Format (BDIA)#

The Bulk Data Insertion API supports batch file processing. The columns below are the minimum required for uploading server calls. Real production data will carry many more dimensions and metrics; the sample file shared below contains only the required columns plus eVar4/prop4 and event1 for testing.

Minimum Required Columns

  • One visitor identification column: visitorID, marketingCloudVisitorID, IPAddress, or customerID.[customerIDType].id with customerID.[customerIDType].isMCSeed set to 1.
  • reportSuiteID
  • userAgent
  • timestamp (supports Unix epoch timestamps - https://www.epochconverter.com/)
  • At least one of the following dimensions:
    • pageURL
    • pageName
    • linkType with linkName or linkURL
    • queryString that includes pageURL, pageName, or linkType as query string parameters with values
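
The column rules above can be checked before a file is uploaded. The sketch below is one way to do that, assuming the CSV header uses exactly the column names listed. The function name `missing_requirements` is hypothetical and not part of any Adobe library, and the check looks only at the header: it does not verify that isMCSeed is set to 1 in each row or that queryString carries the expected parameters.

```python
import re

# Column names taken from the list above; the helper itself is illustrative.
VISITOR_ID_COLUMNS = {"visitorID", "marketingCloudVisitorID", "IPAddress"}
ALWAYS_REQUIRED = ("reportSuiteID", "userAgent", "timestamp")
# Matches customerID.<type>.id, e.g. customerID.crm.id (the type is user-defined)
CUSTOMER_ID_PATTERN = re.compile(r"^customerID\.([^.]+)\.id$")


def missing_requirements(header):
    """Return a list of problems; an empty list means the header passes."""
    columns = set(header)
    problems = []

    for name in ALWAYS_REQUIRED:
        if name not in columns:
            problems.append("missing column: " + name)

    has_visitor_id = bool(columns & VISITOR_ID_COLUMNS)
    for name in columns:
        match = CUSTOMER_ID_PATTERN.match(name)
        # A customer ID only counts when its isMCSeed companion column is present
        if match and "customerID.{}.isMCSeed".format(match.group(1)) in columns:
            has_visitor_id = True
    if not has_visitor_id:
        problems.append("missing a visitor identification column")

    # linkType only counts when paired with linkName or linkURL
    has_link = "linkType" in columns and bool(columns & {"linkName", "linkURL"})
    has_dimension = bool(columns & {"pageURL", "pageName", "queryString"}) or has_link
    if not has_dimension:
        problems.append("missing pageURL, pageName, linkType or queryString")

    return problems
```

Passing the first row of the CSV (for example from `csv.reader`) to `missing_requirements` gives a quick pre-flight check before compressing and uploading.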

Below is a sample file for reference -
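
A file with only the minimum columns can also be generated with the standard library. This is a sketch with made-up values: the report suite ID, visitor IDs, user agent and URLs are placeholders, the column order is an assumption, and the helper name `build_sample_csv` is hypothetical. It is not the sample file used in this demo.

```python
import csv
import io
from datetime import datetime, timezone

COLUMNS = ["timestamp", "marketingCloudVisitorID", "reportSuiteID",
           "userAgent", "pageName", "pageURL"]


def build_sample_csv(start, count):
    """Return CSV text with `count` rows, one second apart, starting at `start`."""
    epoch = int(start.timestamp())  # seconds since 1970-01-01 UTC
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for i in range(count):
        writer.writerow({
            "timestamp": epoch + i,
            "marketingCloudVisitorID": "placeholder-visitor-{}".format(i),  # placeholder
            "reportSuiteID": "placeholder.rsid",                             # placeholder
            "userAgent": "bdia-demo-agent",                                  # placeholder
            "pageName": "demo page {}".format(i),
            "pageURL": "https://example.com/page/{}".format(i),
        })
    return buffer.getvalue()


sample = build_sample_csv(datetime(2022, 6, 14, tzinfo=timezone.utc), 2)
print(sample)
```

Using a timezone-aware `datetime` keeps the epoch conversion independent of the machine's local time zone.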

Python Client for Adobe Analytics API Authentication#

For the demo, I used the Python client for Adobe Analytics API 2.0, available in the Adobe Analytics docs Git repo (https://github.com/AdobeDocs/analytics-2.0-apis/tree/main/resources/python).

Steps For Batch Upload Process

  1. Install dependencies using the requirements.txt file:
pip install -r requirements.txt
  2. Update the config.ini file with your Adobe IO credentials. The values below are required to authenticate and use the Analytics API:
  • API Key (Client ID)
  • Technical account ID
  • Organization ID
  • Client secret
  • private.key
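
Before running the client, it can help to confirm that none of these values were left blank. The sketch below uses `configparser` from the standard library. The `[default]` section name and every key name except `apikey` (which appears in the upload code in this post) are assumptions, so match them to the config.ini that ships with the repo. The function name `find_blank_keys` is hypothetical.

```python
import configparser

# Hypothetical key names: only "apikey" is confirmed by this post.
REQUIRED_KEYS = ("apikey", "technical_account_id", "org_id", "secret", "key_path")


def find_blank_keys(config_text, section="default"):
    """Return the required keys that are missing or empty in the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(section, key, fallback="").strip()]


# Placeholder credentials for illustration only
example = """
[default]
apikey = abc123
technical_account_id = tech@techacct.adobe.com
org_id = ORG@AdobeOrg
secret =
key_path = private.key
"""
print(find_blank_keys(example))
```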
  3. Run the ims_client.py file (more details in the Adobe docs) to authenticate with the API. It creates a JWT and generates the access_token used to call the API endpoints.

  4. Once the access_token is generated, we can use the script below to compress the CSV file. The Bulk Data Insertion API uses gzip compression.

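# Handles CSV compression to GZ
import pandas as pd

bulk_data_insertion_file = "data_insertion_upload_v5.csv"
# Read the CSV without treating any column as the index
df = pd.read_csv(bulk_data_insertion_file, index_col=None)
print(df.head())  # quick preview of the rows about to be uploaded
# Write the same rows back out as a gzip-compressed CSV
bulk_data_insertion_file = bulk_data_insertion_file + ".gz"
df.to_csv(bulk_data_insertion_file, compression='gzip', index=False)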
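
If pandas is not otherwise needed, the same compression step can be done with the standard library alone. This is an alternative to the script above, not the method used in the demo. It copies the file bytes unchanged, so no value is re-parsed on the way. The function name `gzip_file` is hypothetical.

```python
import gzip
import shutil


def gzip_file(source_path):
    """Write a gzip copy of source_path next to it and return the new path."""
    target_path = source_path + ".gz"
    with open(source_path, "rb") as source, gzip.open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)  # streams in chunks, so large files are fine
    return target_path
```

For example, `bulk_data_insertion_file = gzip_file("data_insertion_upload_v5.csv")` yields the same ".gz" file name that the pandas version produces.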
  5. Once the compressed (".gz") CSV file is ready, we can use the code block below to batch-upload it for server call processing in Adobe Analytics using BDIA.
import requests

# access_token, config and global_company_id come from the authentication step above
headers = {
    "accept": "application/json",
    "Authorization": "Bearer {}".format(access_token),
    "x-api-key": config["apikey"],
    "x-proxy-global-company-id": global_company_id,
    "x-adobe-vgid": "visitorIDGroup90"
}

# Open the gzip file in binary mode; the with-block closes it after the upload
with open(bulk_data_insertion_file, 'rb') as gz_file:
    files = {'file': gz_file}
    response = requests.post('https://analytics-collection.adobe.io/aa/collect/v1/events', headers=headers, files=files)

print(response.json())

  6. Below is a sample response for a successful upload:
  {
   'file_id': 'xxxx-409a-b794-36f9exxxxxxx',
   'visitor_group_id': 'visitorIDGroup90',
   'size': 330,
   'received_date': 1655203539,
   'rows': 8,
   'invalid_rows': 0,
   'upload_name': 'data_insertion_upload_v5.csv.gz',
   'status': 'File received, awaiting processing',
   'status_code': 'UPLOADED',
   'processing_log': 'Processing complete: 8 rows will be submitted.  No invalid rows.\n',
   'idempotency_key': 'xxxx-409a-b794-36f9exxxxxxx'
    }
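
The response can also be checked in code rather than by eye. The sketch below assumes the upload returns a dictionary shaped like the sample above and treats a status_code of 'UPLOADED' with zero invalid_rows as success. Other status codes may exist and are not covered here. The function name `summarize_upload` is hypothetical.

```python
def summarize_upload(payload):
    """Return (ok, message) for a parsed upload response."""
    status_code = payload.get("status_code")
    rows = payload.get("rows", 0)
    invalid_rows = payload.get("invalid_rows", 0)

    if status_code != "UPLOADED":
        return False, "unexpected status_code: {!r}".format(status_code)
    if invalid_rows:
        return False, "{} of {} rows were invalid".format(invalid_rows, rows)
    return True, "{} rows accepted in file {}".format(rows, payload.get("file_id"))


# Fields copied from the sample response above
sample_response = {
    "file_id": "xxxx-409a-b794-36f9exxxxxxx",
    "rows": 8,
    "invalid_rows": 0,
    "status_code": "UPLOADED",
}
print(summarize_upload(sample_response))
```

In the upload script, this would be called as `summarize_upload(response.json())`.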

Reviewing Workspace Reports#

Once the Bulk Data Insertion API completes processing, we can analyze the collected server calls by building Workspace reports, just as we would with data collected client-side by the AppMeasurement library.

All the dimensions and metrics work the same way as with client-side data collection. The best part is that we can also append historical data for analysis.


©2020-2024 abhinavpuri.com