Exploring the data on physical vs digital titles in our Singapore Libraries
As a physical book person, I always wonder if I should borrow more digital library books, given how easy it is to do so. Since I have access to the NLB API and some time on my hands, I wanted to explore the physical and digital titles in our Singapore libraries from a library user's perspective. In this post, I will share this data exploration exercise and the insights I found.
Caveats
Using an API meant for an app or website to analyse book availability presents a few issues. Firstly, I can only get book loan data as of the time of my API calls, rather than historical loan records. As I extracted the data on 13th Dec, 2023, this single snapshot of book loans can easily be skewed by factors like seasonality (the holiday season affecting how library users borrow their library reads). Because of this, it is important to remember that my analysis is not necessarily representative.
The API data structure also made the analysis rather tricky, as the same book in different formats and editions has different IDs (BRNs / ISBNs), so it was impossible to match them through a common ID for comparison. In the end, I used book titles to match books across formats, but even that required quite a bit of manual cleaning, such as fixing certain spelling differences and shortening the book titles to make matching easier.
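To illustrate what title-based matching can look like, here is a minimal Python sketch. It assumes each record is a dict with hypothetical `title` and `format` keys (not the actual NLB API response schema), and the normalisation rules (keep the text before a subtitle separator, lowercase, strip punctuation, drop a leading "the") are my own assumptions. It does not replace the manual fixes for spelling differences mentioned above.

```python
import re
import string
from collections import defaultdict


def normalise_title(title: str) -> str:
    """Reduce a title to a short matching key (assumed, illustrative rules)."""
    # Keep only the main title: cut at the first ':', '(', '[' or '/'.
    main = re.split(r"[:(\[/]", title, maxsplit=1)[0]
    main = main.lower()
    # Remove punctuation, then collapse repeated whitespace.
    main = main.translate(str.maketrans("", "", string.punctuation))
    main = " ".join(main.split())
    # Drop a leading "the" so "The Daily Stoic" and "Daily Stoic" match.
    if main.startswith("the "):
        main = main[4:]
    return main


def group_by_title(records):
    """Group records (hypothetical 'title'/'format' keys) by normalised title."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise_title(rec["title"])].append(rec)
    return dict(groups)


records = [
    {"title": "Think Again: The Power of Knowing What You Don't Know", "format": "ebook"},
    {"title": "Think again / Adam Grant", "format": "physical"},
    {"title": "Think Again (Audiobook)", "format": "audiobook"},
]
groups = group_by_title(records)
print(sorted(groups))  # ['think again']
```

Aggressive rules like these can also merge genuinely different books that share a main title, which is one reason spot checks remain necessary.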
Because of the manual data cleaning, I could only sample a selection of books from popular non-fiction authors for my analysis: Malcolm Gladwell, Ryan Holiday, Nassim Nicholas Taleb, Annie Duke, Nate Silver and Adam Grant. I also decided to drop non-English and summary titles.
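The sampling rules above can be expressed as a simple filter. This is a sketch under stated assumptions: the `author`, `language` and `title` keys are hypothetical field names, and treating any title containing "summary" as a summary book is a crude keyword heuristic of my own, not NLB metadata.

```python
SAMPLED_AUTHORS = {
    "Malcolm Gladwell", "Ryan Holiday", "Nassim Nicholas Taleb",
    "Annie Duke", "Nate Silver", "Adam Grant",
}


def keep_record(rec: dict) -> bool:
    """Return True if a record belongs in the sample (hypothetical fields)."""
    if rec.get("author") not in SAMPLED_AUTHORS:
        return False
    # Keep English-language records only.
    if rec.get("language", "").lower() not in {"english", "eng"}:
        return False
    # Keyword heuristic to drop summary titles.
    return "summary" not in rec.get("title", "").lower()


records = [
    {"author": "Adam Grant", "language": "English", "title": "Hidden Potential"},
    {"author": "Adam Grant", "language": "Chinese", "title": "Think Again"},
    {"author": "Adam Grant", "language": "English", "title": "Summary of Think Again"},
    {"author": "Someone Else", "language": "English", "title": "Other Book"},
]
sample = [r for r in records if keep_record(r)]
print([r["title"] for r in sample])  # ['Hidden Potential']
```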
Lastly, to check the accuracy of my data extraction, processing and analysis, I randomly spot-checked some of the numbers from the API against the search results on the NLB website and the Libby app.
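If you want random spot checks like these to be reproducible, one option is to draw them with a seeded random number generator. The function name, the default `k=10` and the seed are arbitrary choices for this sketch; the checks themselves still have to be done by hand on the NLB website and Libby.

```python
import random


def pick_spot_checks(titles, k=10, seed=2023):
    """Pick up to k distinct titles at random for manual verification.

    Sorting first gives a stable input order (and accepts sets), and the
    fixed seed means the same titles are picked on every run.
    """
    rng = random.Random(seed)
    pool = sorted(titles)
    return rng.sample(pool, min(k, len(pool)))


titles = ["think again", "hidden potential", "daily stoic", "skin in the game", "black swan"]
checks = pick_spot_checks(titles, k=3)
print(checks)
```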
Where possible, I manually corrected data from the NLB API that was obviously erroneous.
Despite my best efforts to improve the data accuracy, I unfortunately cannot vouch for the full accuracy of the results I get from the NLB API. Did I mention how bad the NLB API is?
Analysis — Sample of Books
Those interested in exploring the processed data can find it here. Figs 5 and 6 show all the books by the authors I sampled from our Singapore libraries. Note that I have shortened their titles to make it easier to compare them across formats.
Books not stocked by our libraries will not show up in my analysis, as tracking them down would have been too much effort for me, but anyone interested in how easy it is to borrow any book by a particular author may want to consider doing so. However, some interesting titles did still show up in this analysis, even though their quantities are limited.
Analysis — Total Copies
My API extractions found 1,316 audiobooks, 630 physical books and 1,630 electronic books in our Singapore libraries from these authors.
Given the ease of providing digital copies, I did expect our libraries to have more digital than physical copies, but I did not expect the difference to be more than double. In fact, some digital titles are available in really large quantities, such as Adam Grant’s “Think Again” and “Hidden Potential”, Ryan Holiday’s “The Daily Stoic” and Nassim Nicholas Taleb’s “Skin in the Game”.
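To put “more than double” into numbers, the format totals reported above can be compared directly. This is plain arithmetic on those three figures; grouping audiobooks and e-books together as “digital” is my framing for this sketch.

```python
# Copy totals for the sampled authors, as reported above (13 Dec 2023 snapshot).
totals = {"audiobook": 1316, "physical": 630, "ebook": 1630}

digital = totals["audiobook"] + totals["ebook"]
for fmt in ("audiobook", "ebook"):
    print(f"{fmt}: {totals[fmt] / totals['physical']:.2f}x physical")
print(f"all digital: {digital / totals['physical']:.2f}x physical")
# audiobook: 2.09x physical
# ebook: 2.59x physical
# all digital: 4.68x physical
```

Each digital format on its own already outnumbers physical copies by more than two to one.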
I cannot imagine any library stocking 100 physical copies of a single book. Interestingly, the title with the most physical copies I found is Adam Grant’s “Hidden Potential”, which also seems like a lot to me, given that we have only 26 libraries across Singapore. Conversely, none of the titles in my sample had more physical than digital copies, as shown in Fig 9.
In fact, if you want to read “Give and Take”, “Fooled By Randomness” or “The Black Swan” from our Singapore libraries, you will have to read their digital copies.
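A per-title comparison like this one can be sketched as follows, assuming titles have already been matched and copies summed per format. The dict layout and format keys are my assumptions, and the counts in the example are made up purely to exercise the code; they are not the NLB numbers.

```python
from typing import Dict, List, Tuple


def classify_titles(counts: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[str]]:
    """Return (digital-only titles, titles with more physical than digital copies).

    `counts` maps a matched title to per-format copy counts, e.g.
    {"physical": 3, "ebook": 40, "audiobook": 25}; missing formats count as 0.
    """
    digital_only, more_physical = [], []
    for title, c in counts.items():
        physical = c.get("physical", 0)
        digital = c.get("ebook", 0) + c.get("audiobook", 0)
        if physical == 0 and digital > 0:
            digital_only.append(title)
        elif physical > digital:
            more_physical.append(title)
    return sorted(digital_only), sorted(more_physical)


# Illustrative numbers only, not actual library holdings.
example = {
    "title a": {"physical": 0, "ebook": 12, "audiobook": 5},
    "title b": {"physical": 4, "ebook": 30},
    "title c": {"physical": 6, "ebook": 2, "audiobook": 1},
}
print(classify_titles(example))  # (['title a'], ['title c'])
```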
Analysis — Author Specific Patterns
I also found some interesting patterns from specific authors that I want to share.
Physical copies of Nassim Taleb’s older books are generally not available in our libraries (“Skin in the Game” is his most recent book). My hypothesis is that his older books were so popular in the early 2000s that they were heavily borrowed, have since been removed from our library shelves because they are no longer in readable condition, and have been replaced with digital copies. “The Bed of Procrustes” is one of his older books, but as it is just a series of philosophical ramblings (I have read it), it is quite different from his usual content and was most probably not as popular or as heavily borrowed.
From my list of authors, Annie Duke may be the less well-known one, but she has written a series of books on decision making, drawing on her background as a poker player. While our libraries stock many of her decision-making books, her poker-related books are not stocked by our Singapore libraries (the missing “Annie Duke” title could be a data issue that I decided not to tackle). Maybe our libraries did not want to stock her poker books and be seen as encouraging gambling? I could look for other gambling (vice) titles in our libraries to disprove this hypothesis, which is something I may explore in another analysis!
On domain knowledge
I chose to analyse books by authors I know because I believe that analytics done in a vacuum (without domain knowledge) will miss a lot of nuances in the data.
Part I Conclusion + Next Post
This analysis is getting too long, so I am breaking it up into another post. My initial idea of a simple data investigation began to sprawl across many tasks. Getting the data into a state that I could analyse was rather painful, due to a mix of the nature of the data and the unfortunately dirty and at times missing data from the API. I tried to make the best of the data, and I hope anyone reading found this analysis interesting.
So far, my analysis seems to indicate that borrowing digital copies over physical copies from our Singapore libraries will give you more access to interesting non-fiction titles. This has got me wondering whether I should explore switching back to reading digital copies.
In my next post, I will go beyond total copies and dive into the demand for these books, using their availability in our libraries. This is because even though our libraries provide more digital copies of each book, overwhelming demand for these digital books may still make them difficult to borrow. I will also touch on further considerations and limitations in understanding book demand using the NLB API data that I have.
A better way to find physical library books
If you finished reading my post about such a geeky topic, I assume you are a hardcore library user. And if you borrow a lot of physical books from our Singapore libraries (like me, for now at least), the web app I built may make it easier for you to check the availability of physical library books at each library location.
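For the curious, the core idea of showing availability by location can be sketched as a simple grouping step. This is not the web app's actual code: the `branch` and `status` keys, the exact "Available" status string and the placeholder branch names are all assumptions for illustration.

```python
from collections import defaultdict


def available_by_branch(items):
    """Count copies currently on the shelf at each branch.

    `items` is a list of dicts with hypothetical 'branch' and 'status' keys;
    only items whose status is exactly "Available" are counted. Branches are
    ordered by most available copies first, then by name.
    """
    counts = defaultdict(int)
    for item in items:
        if item["status"] == "Available":
            counts[item["branch"]] += 1
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


items = [
    {"branch": "Branch A", "status": "Available"},
    {"branch": "Branch A", "status": "On Loan"},
    {"branch": "Branch B", "status": "Available"},
    {"branch": "Branch B", "status": "Available"},
]
print(available_by_branch(items))  # {'Branch B': 2, 'Branch A': 1}
```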
I built this tool for myself as I found finding books in our libraries frustrating, and I decided to release it to the public for anyone who wants a better experience finding physical books in our libraries too. I am also adding improvements to my web app, so if you visited it in the past but have not done so recently, do give it another try! Feel free to send me any feedback as well!
If you have not already, feel free to connect with me on LinkedIn, where I share more about my tech and data related stuff!