Building my digital bookshelf using Notion and Google Books APIs

Building my digital bookshelf using Notion and Google Books APIs

Enriching Notion's reading list template with metadata on each book using Google Books APIs

APIs Used: Google Books API, Notion API

Date Completed: December 30, 2021

Date Started: December 27, 2021

Final Product: notion.so/d2b8521392574593afdd8735d166c943

Project Goal: Updating my Notion Reading list with Book’s metadata from Google Books API

Status: Completed

github: github.com/deepagoyal/Notion-Bookshelf

Created Date : December 30, 2021

Written By : Deepa Goyal

Notion provides a very thorough reading list template that once I started using, I got hooked to it. I have been reviewing my reading list. As the list of my finished books grew, I kept adding more to my list. I also enjoy using this template to take notes from the books that I’m reading. Having it all in one place has allowed me to revisit them more often than I used to.

https://i.ibb.co/F3PFTdw/Brand-building-blog-images.png

The view on Notion’s template gallery looks neat with images and titles. It has the classic iTunes library look that I find just perfect.

https://i.ibb.co/1vdbbJP/screencapture-notion-so-Reading-List-14918425560f4ec5b303314a29eb53e1-2021-12-30-11-51-48.png

But then I looked at my Reading List, I didn’t have any images unless I was manually uploading them. Ugh!

https://i.ibb.co/qxcwT6p/Screen-Shot-2021-12-30-at-12-01-16-PM.png

To get to the aesthetically pleasing view I would have to do much more data entry work which isn’t something I’m excited about. So I decided to write a python script to do it for me.

https://i.ibb.co/YL2369d/ezgif-2-c9f9365b25.gif

As I started looking into it, I found out that Goodreads API has been deprecated as of December 2020, which is a shame because I have seen a few good projects people have done using those. The only other free books API I could find was Google books API.

Here in project you would see the code that ultimately made my Reading list look like a Book shelf.

https://i.ibb.co/dkqj9Nk/Screen-Shot-2021-12-30-at-12-18-25-PM.png

Google’s API was fairly straight forward to use. But I was quite surprised that of all the data it provides, it doesn’t have a link to the google books page for the given book.

Getting started with the Notion API(about:blank#Getting-started-with-the-Notion-API)

  1. Finding the Database ID and Page ID - This is pretty deep in the page, but very important to get started. The database ID is 32 character alphanumeric part of your URL to the database.

https://i.ibb.co/3zjKJ3Z/Screen-Shot-2021-12-30-at-12-30-09-PM.png

  1. Sharing the page you want your App to access with the App - Notion API’s get started page has a good guide with gifs showing how to create an App, share your page with the app to get started. I somehow missed this page at first and found some posts on stack overflow where other people had too. So don’t forget to share your page with the app.
  2. Query database vs Retrive database - I struggled with Retrieve database endpoint trying to parse the data but it was not complete and parsing it was too many steps. Query database endpoint was much more useful for and intuitive.
  3. Dealing with datatypes of the properties - The data is nested too deep and to update it, you have to build the vocabulary for it.

I had tremendous fun doing this project. The only thing I wish for was if Notion provided a way to import this Jupyter notebook into a notion page so that I could do all my work here and publish it as a blog using notion. When i tried to import this page as HTML, i had to do a lot of manual edits in notion, when i imported it as markdown, code wasn't imported or formatted properly.

import pandas as pd
import requests
import json
import math

Google Books Api search based on search term(about:blank#Google-Books-Api-search-based-on-search-term)

In [2]:

#google search for the book using google books API
def google_book_search(search_terms):
    url = 'https://www.googleapis.com/books/v1/volumes?q='
    response = requests.get(url+search_terms, verify = False)
    # load data using Python JSON module
    r = response.content
    data = json.loads(r)
    # Normalizing data
    df = pd.json_normalize(data, record_path =['items'])
    first_row = df.iloc[0]
    first_row_df = df.iloc[:1]
    first_row_df['search_term']=search_terms
    return first_row_df;

Parameters(about:blank#Parameters)

In [3]:

secret_key = '<your-secret-key-here>'
database_id = '<get-database-id-from-your-database-url>'

Query database(about:blank#Query-database)

In [4]:

def query_databases(secret_key, database_id):
    url = "https://api.notion.com/v1/databases/"+database_id+'/query'
    #print(url)

    payload={'id' : database_id}
    headers = {
        'Notion-Version': '2021-05-13',
        #"Content-Type": "application/json",
        'Authorization': 'Bearer '+secret_key
    }

    response = requests.request("POST", url, headers=headers, data=payload, verify=False)
    print(f"The response code is {response.status_code}")
    if response.status_code != 200:
        raise ApiError(f'Response Status: {response.status_code}')
    else:
        return response.json()

Get data from Reading List database in Notion

In [5]:

res = query_databases(secret_key, database_id)

Results has the data we need(about:blank#Results-has-the-data-we-need)

In [6]:

#dict_keys(['object', 'results', 'next_cursor', 'has_more'])
res.get('results')

Get property data(about:blank#Get-property-data)

In [7]:

#results are in list format, this has all the items from the database, validate the number of items to make sure all items were read successfully
results = res.get('results')
df = pd.DataFrame([])
print(type(df))
print(len(list(results)))

Put data into data frame across all items in database(about:blank#Put-data-into-data-frame-across-all-items-in-database)

In [8]:

i=0
while i < len(results):
#each items in results to iterate and convert into dataframe
    page_properties = results[i].get('properties')
    page_keys = page_properties.keys()
    page_values = page_properties.values()

    #get page URL
    page_properties['page_url'] = results[i].get('url')

    #create dataframe
    df_line = pd.DataFrame(list([page_values]), columns=list(page_keys))
    df = df.append(df_line)
    i+=1
df = df.reset_index()
df.shape    
df

Cleaning data column by column

In [9]:

#make copy of our dataframe
Notion_data=df
print(f'Shape of dataframe:{Notion_data.shape}')
print(Notion_data.columns)

Link data(about:blank#Link-data)

In [10]:

def clean_link(value):
    try:
        if type(value) is dict:
            value = (value.get('url'))
    except:
        value = 'error'
    return value

Name Data(about:blank#Name-Data)

In [11]:

#df['Name'].iloc[1].get('title')[0].get('plain_text')
def clean_name(value):
    try:
        if len(value.get('title')) != 0:
            ret_=value.get('title')[0].get('plain_text')
        else:
            ret_= None #value.get('title')
    except:
        ret_ = 'error'
    return ret_

Author data(about:blank#Author-data)

In [12]:

#Notion_data['Author'].iloc[1].get('multi_select')[0].get('name')
def clean_author(value):
    if type(value) is dict:
        try:
            if len(value.get('multi_select')) != 0:
                ret_=value.get('multi_select')[0].get('name')
            else:
                ret_= None #value.get('title')
        except:
            ret_ = 'error'
    else:
        ret_ = value
    return ret_

Status data(about:blank#Status-data)

In [13]:

#not being used
def clean_status(value):
    if type(value) is dict:
        try:
            ret_= value.get('select').get('name')
        except:
            ret_= 'error'
    else:
        ret_ = value
    return ret_

Update dataframe using Lambda Functions(about:blank#Update-dataframe-using-Lambda-Functions)

In [14]:

Notion_data['_Name']=Notion_data['Name'].apply(lambda x:clean_name(x))
Notion_data['_Link']=Notion_data['Link'].apply(lambda x:clean_link(x))
Notion_data['_Author']=Notion_data['Author'].apply(lambda x:clean_author(x))
Notion_data['page_id']=Notion_data['page_url'].apply(lambda x: x[-32:])

Final dataframe with Notion Data(about:blank#Final-dataframe-with-Notion-Data)

In [15]:

Notion_data

Out[15]:

List of page IDs for all items in database(about:blank#List-of-page-IDs-for-all-items-in-database)

In [16]:

Notion_data['page_id']

Get google books API data for the Books in my Reading List

Create a list of books for google api search(about:blank#Create-a-list-of-books-for-google-api-search)

In [17]:

my_list = list(filter(None, Notion_data['_Name']))

In [18]:

my_list_no_nan = []
for i in my_list:
    try:
        if pd.isnull(i)!=True:
            #print('is null')
            my_list_no_nan+=[i]
        #else:
            #print('null')
            #my_list_no_nan += [i]
    except:
        print('except')

print(len(my_list_no_nan))
print(my_list_no_nan)

Run list through goole api to get data about the books(about:blank#Run-list-through-goole-api-to-get-data-about-the-books)

In [19]:

#my_reading_list = ['Never Split The Difference', "Seeing what others don't", '48 Laws of Power']
#search_terms = 'Never Split The Difference'

info = pd.DataFrame()
for books in my_list_no_nan:
    print(books)
    info = info.append(google_book_search(books))

Data we received from Google Books API(about:blank#Data-we-received-from-Google-Books-API)

In [20]:

info.columns

In [21]:

print(info.shape)
google_data = info[['selfLink', 'volumeInfo.title',
       'volumeInfo.subtitle', 'volumeInfo.authors', 'volumeInfo.publisher',
       'volumeInfo.publishedDate', 'volumeInfo.description','volumeInfo.pageCount', 'volumeInfo.categories',
       'volumeInfo.imageLinks.smallThumbnail','volumeInfo.imageLinks.thumbnail','saleInfo.country', 'saleInfo.retailPrice.amount',
       'saleInfo.retailPrice.currencyCode', 'search_term'
     ]]

Reset index on google_data dataframe(about:blank#Reset-index-on-google_data-dataframe)

In [22]:

google_data = (google_data).reset_index()

In [32]:

google_data['volumeInfo.title']

Dataset from Notion

In [23]:

Notion_data

Dataset from Google API

In [24]:

google_data

In [25]:

look_up = pd.merge(Notion_data, google_data, left_on=  ['_Name'], right_on= ['search_term'], how = 'left')

Update notion with google data(about:blank#Update-notion-with-google-data)

https://developers.notion.com/reference/page#property-value-object lists the properties and how to format them

In [26]:

#Update a property on a page based on property type 
def update_page(page_id, property_name, property_type, property_value):
    url = f"https://api.notion.com/v1/pages/{page_id}"

    if property_type =='date':
        property_payload = {
            "start": property_value
        }
    elif property_type =='url':
        property_payload = property_value
    elif property_type =='number':
        property_payload = property_value
    elif property_type =='rich_text':
        property_payload = [{
            "type": "text",
            "text": {
                "content": property_value
        }
        }]
    elif property_type == 'select':
        property_payload = {
            "name": property_value
        }


    payload = json.dumps({
  "properties": {
    property_name: {
      property_type: property_payload
    }
  }
})
    #print(payload)   
    headers = {
      'Content-Type': 'application/json',
      'Notion-Version': '2021-05-13',
      'Authorization': f'Bearer {secret_key}'
    }

    response = requests.request("PATCH", url, headers=headers, data=payload,verify = False)
    print(response.status_code)
    #print(response.json)
    return response.json

In [27]:

def update_properties(property_name, property_type, data_column):
    page_id = look_up['page_id'][i]
    property_value = look_up[data_column][i]
    update_page(page_id, property_name, property_type, property_value)

Update Publishing Dates using Google data(about:blank#Update-Publishing-Dates-using-Google-data)

In [290]:

#Update publishing dates
property_name = 'Publishing/Release Date'
property_type = "date"
data_column = 'volumeInfo.publishedDate'

for i in look_up['page_id'].index:
#if math.isnan(look_up['Publishing/Release Date'][i]):
    update_properties(property_name, property_type, data_column)

Update Links to Google Books Links(about:blank#Update-Links-to-Google-Books-Links)

In [288]:

#Update Links
property_name = 'Link'
property_type = "url"
data_column = 'selfLink'

for i in look_up['page_id'].index:
   # if math.isnan(look_up['Link'][i]):
    update_properties(property_name, property_type, data_column)

Update Publisher Names(about:blank#Update-Publisher-Names)

In [309]:

#Update Publishers
property_name = 'Publisher'
property_type = "select"
data_column = 'volumeInfo.publisher'

for i in look_up['page_id'].index:
   # if math.isnan(look_up['Link'][i]):
    update_properties(property_name, property_type, data_column)

Update number of pages(about:blank#Update-number-of-pages)

In [295]:

#Update Number of pages
property_name = 'Pages'
property_type = "number"
data_column = 'volumeInfo.pageCount'

for i in look_up['page_id'].index:
   # if math.isnan(look_up['Link'][i]):
    update_properties(property_name, property_type, data_column)

Update Summaries(about:blank#Update-Summaries)

In [307]:

#Update Number of Summary
property_name = 'Summary'
property_type = "rich_text"
data_column = 'volumeInfo.description'

for i in look_up['page_id'].index:
   # if math.isnan(look_up['Link'][i]):
    update_properties(property_name, property_type, data_column)

Update Icons for pages to thumbnail images from Google Boooks(about:blank#Update-Icons-for-pages-to-thumbnail-images-from-Google-Boooks)

In [30]:

#Update icons for pages
data_column = 'volumeInfo.imageLinks.smallThumbnail'

def update_page_icon(i, data_column, icon_or_cover):

    page_id = look_up['page_id'][i]
    property_value = look_up[data_column][i]

    url = f"https://api.notion.com/v1/pages/{page_id}"

    payload = json.dumps({icon_or_cover: {
        "type": "external",
            "external": {
                "url" : property_value
      }}})
    print(payload)

    headers = {
      'Content-Type': 'application/json',
      'Notion-Version': '2021-05-13',
      'Authorization': f'Bearer {secret_key}'
    }

    response = requests.request("PATCH", url, headers=headers, data=payload,verify = False)
    print(response.status_code)
    print(response.json)

In [29]:

#Update icons for pages
for i in look_up['page_id'].index:
    update_page_icon(i, data_column, 'icon')

In [31]:

#Update cover for pages
for i in look_up['page_id'].index:
    update_page_icon(i, data_column, 'cover')

This should have your Reading list notion page updated with Icons, Covers along with Publisher names, published dates, page counts, etc. You can toggle to Gallery view and you have your very own gorgeous Notion bookshelf.

https://i.ibb.co/dkqj9Nk/Screen-Shot-2021-12-30-at-12-18-25-PM.png