Building my digital bookshelf using Notion and Google Books APIs
Enriching Notion's reading list template with metadata on each book using Google Books APIs
APIs Used: Google Books API, Notion API
Date Completed: December 30, 2021
Date Started: December 27, 2021
Final Product: notion.so/d2b8521392574593afdd8735d166c943
Project Goal: Updating my Notion Reading list with Book’s metadata from Google Books API
Status: Completed
github: github.com/deepagoyal/Notion-Bookshelf
Created Date : December 30, 2021
Written By : Deepa Goyal
Notion provides a very thorough reading list template that once I started using, I got hooked to it. I have been reviewing my reading list. As the list of my finished books grew, I kept adding more to my list. I also enjoy using this template to take notes from the books that I’m reading. Having it all in one place has allowed me to revisit them more often than I used to.
The view on Notion’s template gallery looks neat with images and titles. It has the classic iTunes library look that I find just perfect.
But then I looked at my Reading List, I didn’t have any images unless I was manually uploading them. Ugh!
To get to the aesthetically pleasing view I would have to do much more data entry work which isn’t something I’m excited about. So I decided to write a python script to do it for me.
As I started looking into it, I found out that Goodreads API has been deprecated as of December 2020, which is a shame because I have seen a few good projects people have done using those. The only other free books API I could find was Google books API.
Here in project you would see the code that ultimately made my Reading list look like a Book shelf.
Google’s API was fairly straight forward to use. But I was quite surprised that of all the data it provides, it doesn’t have a link to the google books page for the given book.
Getting started with the Notion API(about:blank#Getting-started-with-the-Notion-API)
- Finding the Database ID and Page ID - This is pretty deep in the page, but very important to get started. The database ID is 32 character alphanumeric part of your URL to the database.
- Sharing the page you want your App to access with the App - Notion API’s get started page has a good guide with gifs showing how to create an App, share your page with the app to get started. I somehow missed this page at first and found some posts on stack overflow where other people had too. So don’t forget to share your page with the app.
- Query database vs Retrive database - I struggled with Retrieve database endpoint trying to parse the data but it was not complete and parsing it was too many steps. Query database endpoint was much more useful for and intuitive.
- Dealing with datatypes of the properties - The data is nested too deep and to update it, you have to build the vocabulary for it.
I had tremendous fun doing this project. The only thing I wish for was if Notion provided a way to import this Jupyter notebook into a notion page so that I could do all my work here and publish it as a blog using notion. When i tried to import this page as HTML, i had to do a lot of manual edits in notion, when i imported it as markdown, code wasn't imported or formatted properly.
import pandas as pd
import requests
import json
import math
Google Books Api search based on search term(about:blank#Google-Books-Api-search-based-on-search-term)
In [2]:
#google search for the book using google books API
def google_book_search(search_terms):
url = 'https://www.googleapis.com/books/v1/volumes?q='
response = requests.get(url+search_terms, verify = False)
# load data using Python JSON module
r = response.content
data = json.loads(r)
# Normalizing data
df = pd.json_normalize(data, record_path =['items'])
first_row = df.iloc[0]
first_row_df = df.iloc[:1]
first_row_df['search_term']=search_terms
return first_row_df;
Parameters(about:blank#Parameters)
In [3]:
secret_key = '<your-secret-key-here>'
database_id = '<get-database-id-from-your-database-url>'
Query database(about:blank#Query-database)
In [4]:
def query_databases(secret_key, database_id):
url = "https://api.notion.com/v1/databases/"+database_id+'/query'
#print(url)
payload={'id' : database_id}
headers = {
'Notion-Version': '2021-05-13',
#"Content-Type": "application/json",
'Authorization': 'Bearer '+secret_key
}
response = requests.request("POST", url, headers=headers, data=payload, verify=False)
print(f"The response code is {response.status_code}")
if response.status_code != 200:
raise ApiError(f'Response Status: {response.status_code}')
else:
return response.json()
Get data from Reading List database in Notion
In [5]:
res = query_databases(secret_key, database_id)
Results has the data we need(about:blank#Results-has-the-data-we-need)
In [6]:
#dict_keys(['object', 'results', 'next_cursor', 'has_more'])
res.get('results')
Get property data(about:blank#Get-property-data)
In [7]:
#results are in list format, this has all the items from the database, validate the number of items to make sure all items were read successfully
results = res.get('results')
df = pd.DataFrame([])
print(type(df))
print(len(list(results)))
Put data into data frame across all items in database(about:blank#Put-data-into-data-frame-across-all-items-in-database)
In [8]:
i=0
while i < len(results):
#each items in results to iterate and convert into dataframe
page_properties = results[i].get('properties')
page_keys = page_properties.keys()
page_values = page_properties.values()
#get page URL
page_properties['page_url'] = results[i].get('url')
#create dataframe
df_line = pd.DataFrame(list([page_values]), columns=list(page_keys))
df = df.append(df_line)
i+=1
df = df.reset_index()
df.shape
df
Cleaning data column by column
In [9]:
#make copy of our dataframe
Notion_data=df
print(f'Shape of dataframe:{Notion_data.shape}')
print(Notion_data.columns)
Link data(about:blank#Link-data)
In [10]:
def clean_link(value):
try:
if type(value) is dict:
value = (value.get('url'))
except:
value = 'error'
return value
Name Data(about:blank#Name-Data)
In [11]:
#df['Name'].iloc[1].get('title')[0].get('plain_text')
def clean_name(value):
try:
if len(value.get('title')) != 0:
ret_=value.get('title')[0].get('plain_text')
else:
ret_= None #value.get('title')
except:
ret_ = 'error'
return ret_
Author data(about:blank#Author-data)
In [12]:
#Notion_data['Author'].iloc[1].get('multi_select')[0].get('name')
def clean_author(value):
if type(value) is dict:
try:
if len(value.get('multi_select')) != 0:
ret_=value.get('multi_select')[0].get('name')
else:
ret_= None #value.get('title')
except:
ret_ = 'error'
else:
ret_ = value
return ret_
Status data(about:blank#Status-data)
In [13]:
#not being used
def clean_status(value):
if type(value) is dict:
try:
ret_= value.get('select').get('name')
except:
ret_= 'error'
else:
ret_ = value
return ret_
Update dataframe using Lambda Functions(about:blank#Update-dataframe-using-Lambda-Functions)
In [14]:
Notion_data['_Name']=Notion_data['Name'].apply(lambda x:clean_name(x))
Notion_data['_Link']=Notion_data['Link'].apply(lambda x:clean_link(x))
Notion_data['_Author']=Notion_data['Author'].apply(lambda x:clean_author(x))
Notion_data['page_id']=Notion_data['page_url'].apply(lambda x: x[-32:])
Final dataframe with Notion Data(about:blank#Final-dataframe-with-Notion-Data)
In [15]:
Notion_data
Out[15]:
List of page IDs for all items in database(about:blank#List-of-page-IDs-for-all-items-in-database)
In [16]:
Notion_data['page_id']
Get google books API data for the Books in my Reading List
Create a list of books for google api search(about:blank#Create-a-list-of-books-for-google-api-search)
In [17]:
my_list = list(filter(None, Notion_data['_Name']))
In [18]:
my_list_no_nan = []
for i in my_list:
try:
if pd.isnull(i)!=True:
#print('is null')
my_list_no_nan+=[i]
#else:
#print('null')
#my_list_no_nan += [i]
except:
print('except')
print(len(my_list_no_nan))
print(my_list_no_nan)
Run list through goole api to get data about the books(about:blank#Run-list-through-goole-api-to-get-data-about-the-books)
In [19]:
#my_reading_list = ['Never Split The Difference', "Seeing what others don't", '48 Laws of Power']
#search_terms = 'Never Split The Difference'
info = pd.DataFrame()
for books in my_list_no_nan:
print(books)
info = info.append(google_book_search(books))
Data we received from Google Books API(about:blank#Data-we-received-from-Google-Books-API)
In [20]:
info.columns
In [21]:
print(info.shape)
google_data = info[['selfLink', 'volumeInfo.title',
'volumeInfo.subtitle', 'volumeInfo.authors', 'volumeInfo.publisher',
'volumeInfo.publishedDate', 'volumeInfo.description','volumeInfo.pageCount', 'volumeInfo.categories',
'volumeInfo.imageLinks.smallThumbnail','volumeInfo.imageLinks.thumbnail','saleInfo.country', 'saleInfo.retailPrice.amount',
'saleInfo.retailPrice.currencyCode', 'search_term'
]]
Reset index on google_data dataframe(about:blank#Reset-index-on-google_data-dataframe)
In [22]:
google_data = (google_data).reset_index()
In [32]:
google_data['volumeInfo.title']
Dataset from Notion
In [23]:
Notion_data
Dataset from Google API
In [24]:
google_data
In [25]:
look_up = pd.merge(Notion_data, google_data, left_on= ['_Name'], right_on= ['search_term'], how = 'left')
Update notion with google data(about:blank#Update-notion-with-google-data)
https://developers.notion.com/reference/page#property-value-object lists the properties and how to format them
In [26]:
#Update a property on a page based on property type
def update_page(page_id, property_name, property_type, property_value):
url = f"https://api.notion.com/v1/pages/{page_id}"
if property_type =='date':
property_payload = {
"start": property_value
}
elif property_type =='url':
property_payload = property_value
elif property_type =='number':
property_payload = property_value
elif property_type =='rich_text':
property_payload = [{
"type": "text",
"text": {
"content": property_value
}
}]
elif property_type == 'select':
property_payload = {
"name": property_value
}
payload = json.dumps({
"properties": {
property_name: {
property_type: property_payload
}
}
})
#print(payload)
headers = {
'Content-Type': 'application/json',
'Notion-Version': '2021-05-13',
'Authorization': f'Bearer {secret_key}'
}
response = requests.request("PATCH", url, headers=headers, data=payload,verify = False)
print(response.status_code)
#print(response.json)
return response.json
In [27]:
def update_properties(property_name, property_type, data_column):
page_id = look_up['page_id'][i]
property_value = look_up[data_column][i]
update_page(page_id, property_name, property_type, property_value)
Update Publishing Dates using Google data(about:blank#Update-Publishing-Dates-using-Google-data)
In [290]:
#Update publishing dates
property_name = 'Publishing/Release Date'
property_type = "date"
data_column = 'volumeInfo.publishedDate'
for i in look_up['page_id'].index:
#if math.isnan(look_up['Publishing/Release Date'][i]):
update_properties(property_name, property_type, data_column)
Update Links to Google Books Links(about:blank#Update-Links-to-Google-Books-Links)
In [288]:
#Update Links
property_name = 'Link'
property_type = "url"
data_column = 'selfLink'
for i in look_up['page_id'].index:
# if math.isnan(look_up['Link'][i]):
update_properties(property_name, property_type, data_column)
Update Publisher Names(about:blank#Update-Publisher-Names)
In [309]:
#Update Publishers
property_name = 'Publisher'
property_type = "select"
data_column = 'volumeInfo.publisher'
for i in look_up['page_id'].index:
# if math.isnan(look_up['Link'][i]):
update_properties(property_name, property_type, data_column)
Update number of pages(about:blank#Update-number-of-pages)
In [295]:
#Update Number of pages
property_name = 'Pages'
property_type = "number"
data_column = 'volumeInfo.pageCount'
for i in look_up['page_id'].index:
# if math.isnan(look_up['Link'][i]):
update_properties(property_name, property_type, data_column)
Update Summaries(about:blank#Update-Summaries)
In [307]:
#Update Number of Summary
property_name = 'Summary'
property_type = "rich_text"
data_column = 'volumeInfo.description'
for i in look_up['page_id'].index:
# if math.isnan(look_up['Link'][i]):
update_properties(property_name, property_type, data_column)
Update Icons for pages to thumbnail images from Google Boooks(about:blank#Update-Icons-for-pages-to-thumbnail-images-from-Google-Boooks)
In [30]:
#Update icons for pages
data_column = 'volumeInfo.imageLinks.smallThumbnail'
def update_page_icon(i, data_column, icon_or_cover):
page_id = look_up['page_id'][i]
property_value = look_up[data_column][i]
url = f"https://api.notion.com/v1/pages/{page_id}"
payload = json.dumps({icon_or_cover: {
"type": "external",
"external": {
"url" : property_value
}}})
print(payload)
headers = {
'Content-Type': 'application/json',
'Notion-Version': '2021-05-13',
'Authorization': f'Bearer {secret_key}'
}
response = requests.request("PATCH", url, headers=headers, data=payload,verify = False)
print(response.status_code)
print(response.json)
In [29]:
#Update icons for pages
for i in look_up['page_id'].index:
update_page_icon(i, data_column, 'icon')
In [31]:
#Update cover for pages
for i in look_up['page_id'].index:
update_page_icon(i, data_column, 'cover')
This should have your Reading list notion page updated with Icons, Covers along with Publisher names, published dates, page counts, etc. You can toggle to Gallery view and you have your very own gorgeous Notion bookshelf.