Judging a Book By Its Cover

Machine learning automates the process of creating clickable bookshelf images for effortless sharing of your current reading list.

Automated book identification (📷: James' Coffee Blog)

Those who were involved in the computing scene of the early- to mid-1990s will never forget the hype that surrounded the so-called multimedia computers of the day. As graphical and sound capabilities rapidly advanced, and optical media drives offering a seemingly endless amount of storage (~650 MB!) became widely available, user interface designers started getting … creative. Experimental interfaces ditched the traditional desktop environment for less abstract representations. One might instead navigate through a home and click on a stack of papers on a desk to open a word processor, for example.

This trend proved to be short-lived, as it was an incredibly inefficient way to operate a computer, not to mention a horrible waste of precious CPU cycles and memory. Fast forward about 30 years, and the old saying “everything old is new again” is playing out once more, but this time with some modern updates that may actually make this style of interface useful.

James, over at James' Coffee Blog, was looking for an interesting way to show others what he was reading and provide links to more information about each book. Rather than posting the traditional list of text links, James wanted to provide an image of his bookshelf, with each book clickable.

Segment Anything refines the results from Grounding DINO (📷: James' Coffee Blog)

Sure, this could be done with a simple HTML image map, but manual work is so last decade. Who wants to define all of those polygons by hand? James certainly did not, so he used machine learning to do the work for him. Starting with an image of the bookshelf, the Grounding DINO model was used to locate the book spines, and the resulting detections were then fed into Segment Anything for refinement.
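The write-up does not include James' exact code, but a minimal sketch of this detect-then-segment step, using the Hugging Face transformers wrappers for Grounding DINO and Segment Anything, might look something like the following. The model checkpoints, the text prompt, and the thresholds are assumptions made for illustration, and the post-processing argument names can vary between transformers releases.

```python
# Sketch: detect book spines with Grounding DINO, then refine each box with SAM.
# Checkpoints, prompt text, and thresholds are illustrative assumptions.
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    AutoModelForZeroShotObjectDetection,
    SamModel,
    SamProcessor,
)

image = Image.open("bookshelf.jpg").convert("RGB")

# Step 1: zero-shot detection of book spines with Grounding DINO.
dino_processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
dino_model = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny"
)

inputs = dino_processor(images=image, text="a book spine.", return_tensors="pt")
with torch.no_grad():
    outputs = dino_model(**inputs)

detections = dino_processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],  # (height, width)
)[0]
boxes = detections["boxes"]  # one (x0, y0, x1, y1) box per detected spine

# Step 2: refine each bounding box into a tight mask with Segment Anything.
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam_model = SamModel.from_pretrained("facebook/sam-vit-base")

sam_inputs = sam_processor(image, input_boxes=[boxes.tolist()], return_tensors="pt")
with torch.no_grad():
    sam_outputs = sam_model(**sam_inputs)

masks = sam_processor.image_processor.post_process_masks(
    sam_outputs.pred_masks,
    sam_inputs["original_sizes"],
    sam_inputs["reshaped_input_sizes"],
)[0]  # one mask per box, at the original image resolution

print(f"Found {len(boxes)} candidate book spines")
```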

With each book located, images of the spines were passed into GPT-4 with Vision, along with a prompt directing it to find the title and author's name. That data was then sent to the Google Books API to perform a search, which returned a link where more information about the book could be found. Finally, each link was embedded in a JavaScript onclick handler within an SVG.
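Again, James' own implementation isn't reproduced here, but the lookup-and-link step could be sketched roughly as below. The OpenAI model name, the prompt wording, and the SVG markup are assumptions for illustration; only the general flow (vision model, then Google Books, then a clickable SVG region) follows the article.

```python
# Sketch: identify a spine crop with a vision-capable GPT-4 model, look the book up
# on Google Books, and emit a clickable SVG region. Model, prompt, and markup are
# illustrative assumptions, not James' actual code.
import base64
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def identify_spine(crop_path: str) -> str:
    """Ask the model for the title and author visible on a spine crop."""
    with open(crop_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT-4 variant
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What book is on this spine? Reply with 'title by author' only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


def lookup_book(query: str) -> str | None:
    """Search the Google Books API and return a link to the first match, if any."""
    resp = requests.get(
        "https://www.googleapis.com/books/v1/volumes",
        params={"q": query, "maxResults": 1},
        timeout=10,
    )
    items = resp.json().get("items", [])
    return items[0]["volumeInfo"].get("infoLink") if items else None


def svg_region(box: tuple[int, int, int, int], link: str) -> str:
    """Emit a transparent SVG rectangle that opens the book's page when clicked."""
    x0, y0, x1, y1 = box
    return (
        f'<rect x="{x0}" y="{y0}" width="{x1 - x0}" height="{y1 - y0}" '
        f'fill="transparent" onclick="window.open(\'{link}\')" />'
    )


if __name__ == "__main__":
    query = identify_spine("spine_0.jpg")
    link = lookup_book(query)
    if link:
        print(svg_region((10, 20, 60, 300), link))
```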

Taken together, the pieces of this approach can automatically turn virtually any image of a bookshelf into a clickable version that directs the user to more information about each book. Of course, there are caveats, such as when a title is not clearly visible or the Google Books API does not know about a particular book. But in any case, this project demonstrates an interesting and effortless way to show others what you have been reading lately.

