Using Google Sheets to collaborate is a great way to get work accomplished if you and your collaborator(s):
- have different schedules and upload data at different times,
- collect different data that needs to be aggregated in the same place,
- want to work from the same data, or
- don't use a formal database for data collection.
As an active data scientist, I would prefer to have data captured in a queryable database, but sometimes that isn't an option, especially when the data comes from a one-off experiment. Luckily, Google Sheets has an API to which I can connect and programmatically manipulate data. This blog post will walk through some of these actions using Python.
Before jumping into the code, I'm presuming you have some familiarity with Google's service account system. To quote from Google's cloud documentation, "A service account is a special type of Google account intended to represent a non-human user that needs to authenticate and be authorized to access data in Google APIs." You can read more about it here. If you don't have a service account, you won't be able to access data from a spreadsheet.
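When you create a service account, Google hands you a JSON key file. As a rough sketch (all values below are placeholders), its shape looks like this:

# Rough shape of a service account key file (placeholder values):
service_account_info = {
    "type": "service_account",
    "project_id": "your-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "sheets-bot@your-project.iam.gserviceaccount.com",
    # ...plus private_key_id, client_id, auth_uri, token_uri, and so on
}

This is exactly the structure we'll hand to the credentials loader later, after pulling it out of AWS.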
Once your service account is set up, we can get started by installing some packages. First we'll install the google-api-python-client package, which lets us call Google's APIs from Python. Next, we'll install the dataframe manipulation package, pandas, followed by boto3, AWS's Python client library, which we'll use to fetch our secret account credentials from AWS. The json and os modules, used for data formatting and reading environment variables, ship with Python's standard library, so there's nothing to install for those. Finally, we'll install the faker package, which will generate some nice dummy data to experiment on while testing out the Sheets API.
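If you don't already have the third-party packages, a pip command along these lines should pull them in:

pip install google-api-python-client pandas boto3 faker

With everything installed, here are the imports we'll need.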
from google.oauth2 import service_account
from googleapiclient.discovery import build
import pandas as pd
import boto3
import json
import os
from faker import Faker
Next, we need to create a new spreadsheet to which we'll connect using the block of code below. Type sheets.new into your browser and a new Google Sheets workbook will be created. Be sure to give it a title, and share the workbook with your service account's email address so it's authorized to read and write the data. You'll next want to grab the workbook ID from the URL bar; it's the long string between /d/ and /edit.
I placed mine into an environment variable and passed it to my notebook environment.
Alright, let's connect to the Sheets API. We'll need to retrieve our service account credentials; I saved mine in AWS's Secrets Manager service. Using boto3, I can pass these credentials to the Sheets service, authorizing the connection.
# Pull the service account credentials out of AWS Secrets Manager
sm = boto3.client("secretsmanager")
secret_json = sm.get_secret_value(SecretId="trimble-service-account-google-apis")[
    "SecretString"
]
secret_data = json.loads(secret_json)

# Build an authorized Sheets client with read/write scope
scopes = ["https://www.googleapis.com/auth/spreadsheets"]
creds = service_account.Credentials.from_service_account_info(
    secret_data, scopes=scopes
)
service = build("sheets", "v4", credentials=creds)
sheets = service.spreadsheets()

# The workbook ID grabbed from the URL bar earlier
sheet_id = os.environ["SHEET_ID"]
The first thing we're going to do is add data to the spreadsheet. We'll use the faker module to generate some random data. We can call the profile() method, which creates a "person" with various information including a Social Security number, address, name, etc. It's already formatted correctly for incorporation into a pandas dataframe. Let's generate the data and visualize it by calling the head() method on the dataframe.
fake = Faker()
profile_data = [fake.profile() for _ in range(100)]
df = pd.DataFrame(profile_data)
df.head()
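If you're curious which fields profile() generates, you can peek at the dictionary keys; the columns we select below (job, company, blood_group, username) are all in there:

print(sorted(fake.profile().keys()))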
Let's select a subset of the data to get a manageable set. Below I selected four columns I'll use to demonstrate uploading to Google Sheets.
upload_df = df[['job', 'company', 'blood_group', 'username']]
upload_df.head()
To start uploading values to Sheets, we need to create a list of lists: each row of data is a list, and we combine the rows into an outer list for uploading. We then have some parameters to pass to the API. These include range, which accepts either the name of the sheet receiving the data or a range of cells like A1:B10; for our case, I'll use the sheet name, Sheet1. Another parameter is value_input_option, which tells Sheets either to parse the values as a user would have entered them or to store the information as is. I want it represented as is, so I'll pass RAW as the input.
values = [upload_df.columns.values.tolist()]  # header row
values.extend(upload_df.values.tolist())      # one list per row of data

data = [
    {"range": "Sheet1", "values": values}
]
batch_update = {
    "value_input_option": "RAW",
    "data": data,
}
response = (
    sheets.values()
    .batchUpdate(spreadsheetId=sheet_id, body=batch_update)
    .execute()
)
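For contrast, here's a minimal sketch (writing to a hypothetical target cell) of the other input option, USER_ENTERED, which asks Sheets to parse values as though a user typed them, so date-like strings become dates and strings starting with = become live formulas:

# Hypothetical example: USER_ENTERED makes Sheets parse the value,
# so this string is stored as a working formula rather than text.
sheets.values().update(
    spreadsheetId=sheet_id,
    range="Sheet1!F1",
    valueInputOption="USER_ENTERED",
    body={"values": [["=COUNTA(A2:A101)"]]},
).execute()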
If you navigate back to the workbook in your browser, you'll see that the data has made its way into the sheet just as it was in your dataframe. Next, we'll extract the data we just uploaded to demonstrate getting information from a sheet.
We'll first need to describe the range of cells from which we'll be pulling data. Google Sheets uses A1 notation: you first provide the sheet name, followed by an ! and then a range of cells. Below I have Sheet1!A1:E101, where the extra row compensates for the column headers. This will extract every value in columns A through E and rows 1 through 101.
_r = "Sheet1!A1:E101"
Because the data in the response comes back as a list of lists, we can pass it directly to pandas' pd.DataFrame() constructor.
# Pull the values back down and load them straight into a dataframe
new_data = pd.DataFrame(
    sheets.values()
    .get(spreadsheetId=sheet_id, range=_r, majorDimension="ROWS")
    .execute()["values"]
)
new_data.head()
The column headers come back as ordinary data values and form the first row of the dataframe. We can promote that row to the column names; the code below accomplishes this for us. Finally, we can visualize the data in a workable fashion.
new_data.columns = new_data.iloc[0]
new_data.drop(index=[0], inplace=True)
new_data.head()
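Alternatively, you can split the header off before constructing the dataframe, which skips the drop-and-rename step:

# Same retrieval, but promote the header row while building the dataframe
raw = (
    sheets.values()
    .get(spreadsheetId=sheet_id, range=_r, majorDimension="ROWS")
    .execute()["values"]
)
new_data = pd.DataFrame(raw[1:], columns=raw[0])
new_data.head()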
Working with the Google Sheets API can be difficult at times because the documentation isn't the best. However, once you get it down, it is extremely useful for collaborating with colleagues who are not as fluent in data or code.