Dynamic Google Analytics API using Python

Problem: Power BI has a Google Analytics connector but one can not apply filters to the download such as date range, event category etc. This article will solve the problem by dynamically downloading data using Google Analytics API based on user’s requirements.

Software Used: Python and Google Analytics Query Explorer

Solution: OAuth method is used for Google Analytics authentication, therefore google API library needs to be installed in Python. To use OAuth, you need a client_secret.json that can be downloaded from console.developers.google.com (Refer https://developers.google.com/analytics/devguides/config/mgmt/v3/quickstart/service-py and https://developers.google.com/analytics/devguides/config/mgmt/v3/quickstart/installed-py to download client_secret.json)

> pip install googleapiclient

Before you start programming, you should familiarize yourself with Google Analytics Query Explorer. It generates a query URI which results in a JSON response object. Login to https://ga-dev-tools.appspot.com/query-explorer/#report-start with your google credentials. After Login, you can see all the google analytics accounts you have access to and select one from the drop down menu.

If you scroll down the page, then you will see the API Query URI. Our goal is to generate this URI using Python. You need to understand the different components of the query. and play around with tool so that you know what results you are expecting from the program.

We will advise you to read https://developers.google.com/analytics/devguides/reporting/core/v3/reference#ids which will help you to understand the type of operators allowed to use.

There are five python files :
1. GADataDownload.py
2. GoogleAuthentication.py
3. URITemplate.py
4. AccessToken.py
5. SaveCSVFile.py

GADataDownload.py contains the main() function and other files determines the different components of the program.

First step is to make an authentication call to Google and store a file that contains an access token which is an important component of the final query URI. Please note libraries used in Google API official guide are deprecated. You need to create a python file called GoogleAuthentication.py. This will create credentials by using client_secrets.json downloaded earlier and build a service object. get_service_request() function generates a .dat file which contains an access token. get_first_profile_id() function uses service object to get view id for the authorized users.

import argparse
import httplib2
from googleapiclient.discovery import build
from oauth2client import tools, client, file


def get_service_request(api_name, api_version, scope, client_secret):

    parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter, parents=[tools.argparser])
    flags = parser.parse_args([])

    flow = client.flow_from_clientsecrets(client_secret, scope=scope, message=tools.message_if_missing(client_secret))

    storage = file.Storage(api_name + '.dat')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
    http = credentials.authorize(http=httplib2.Http())

    service = build(api_name, api_version, http=http)

    return service


def get_first_profile_id(service):
    accounts = service.management().accounts().list().execute()
    if accounts.get('items'):
        account = accounts.get('items')[0].get('id')

        properties = service.management().webproperties().list(accountId=account).execute()

        if properties.get('items'):
            prop = properties.get('items')[0].get('id')
            profiles = service.management().profiles().list(accountId=account,webPropertyId=prop).execute()

            if profiles.get('items'):
                return profiles.get('items')[0].get('id')

    return None

Secon step is to create AccessToken.py which will return the access_token stored in the analytics.dat file.

import json


def get_access_token():
    with open('analytics.dat') as json_file:
        data = json.load(json_file)
        access_token = data["access_token"]
    return access_token

Third Step is the heart of the program where you need to generate a query URI that takes parameters such as view_id, start_date, end_date, metrics, dimensions and filters. URITemplate.py will be created at this step. You can alter the function with the number of parameters you want to fulfill your requirement. generate_uri() function will use the access token returned by the previous step. By default, number of records returned by Google Analytics is 1000; to increase the number of records you can alter the value of max-results which can go to a maximum of 10000. Function uses urllib.parse.quote_plus() to encode the values into URL supported format. Basically, query_uri is a string that contains all the components of the URI. return_uri() function will accept all the parameter values and return the query. It might be confusing that it is not necessary to create two separate functions as everything can be incorporated into one. We have tried to somewhat implement SOLID principles which defines that every function must hold a single responsibility.

import urllib.parse
from AccessToken import get_access_token


def generate_uri(scope, view_id, start_date, end_date, metrics, dimensions, filters):
    access_token = get_access_token()
    max_results = 10000
    query_uri = scope \
                + 'ids=' + urllib.parse.quote_plus(view_id) \
                + '&start-date=' + start_date \
                + '&end-date=' + end_date \
                + '&metrics=' + urllib.parse.quote_plus(','.join(metrics)) \
                + '&dimensions=' + urllib.parse.quote_plus(','.join(dimensions)) \
                + '&filters=' + urllib.parse.quote_plus(','.join(filters)) \
                + '&max-results=' + str(max_results) \
                + '&access_token=' + access_token
    return query_uri


def return_uri(view_id, start_date, end_date, metrics, dimensions, filters):
    scope = 'https://www.googleapis.com/analytics/v3/data/ga?'
    query_uri = generate_uri(scope, view_id, start_date, end_date, metrics, dimensions, filters)
    return query_uri

Last step will be to save the data from the JSON response to a .csv file and create a file called SaveCSVFile.py. You can notice from the JSON Response that there are several array objects such as columnHeaders, Rows, id, query, etc.

So, you need to explicitly mention the array name that needs to be written in the .csv file. ColumnHeaders and Rows have different data format, thus it is needed to write them separately into the file. so, you need to open the file first to write column header values and then append the same file with the data rows.

import requests
import json
import csv


def save_output_file(uri, file_name):
    file_path = '<Enter you file location>' + file_name
    request = requests.get(uri)
    json_data = json.loads(request.text)
    column_header = json_data["columnHeaders"]
    header_list = []
    for header in range(len(column_header)):
        header_list.append(column_header[header]["name"])
    write_file = open(file_path, 'w', newline='')
    csv_writer = csv.writer(write_file)
    csv_writer.writerow(header_list)
    write_file.close()

    ga_data = json_data["rows"]
    append_file = open(file_path, 'a', newline='')
    row_write = csv.writer(append_file)
    for i in ga_data:
        row_write.writerow(i)
    append_file.close()

Finally, you need to create the main python file that calls all the functions named GADataDownload.py.

from URITemplate import return_uri
from GoogleAuthentication import get_service, get_first_profile_id
from SaveCSVFile import save_output_file


def main():
    scope = ['https://www.googleapis.com/auth/analytics.readonly']
    service = get_service('analytics', 'v3', scope, 'client_secret.json')
    get_first_profile_id(service)

    uri = return_uri('ga:<enter your view-id>', '2daysAgo', 'yesterday',
                    ['ga:totalEvents', 'ga:uniqueEvents'],
                    ['ga:adDistributionNetwork', 'ga:adwordsCampaignID', 'ga:adwordsAdGroupID',
                     'ga:deviceCategory', 'ga:eventLabel', 'ga:date'],
                    ['ga:eventCategory=='<enter the event category value>';ga:eventAction=='<enter the event action value>';ga:sourceMedium=='<enter the source value>'])
    file_name = '<enter the file name>'
    save_output_file(uri, file_name)


if __name__ == '__main__':
    main()

Voila, you have the desired data saved in a .csv file at the location provided. You can always save the data directly to the database or a blob, alter the SaveCSVFile.py accordingly.

Note: When you run the program for the first time then you have to allow some permissions. You will see the following pop-ups:

Choose the account you want to access the Google Analytics API through.

Click on Allow and then you will receive a message says authentication flow has been completed.

Similarly, you will get a notification on Python Output Console that authentication needs to be completed.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: