Episode 6: Clay Tablets and Metadata Overwriting

This episode starts with a story about clay tablets and builds a connection all the way up to a modern metadata issue.

The story is about how a team of researchers used cutting-edge mathematics to trace the trade itineraries and sales transactions left behind by Assyrian merchants 4,000 years ago. Using this information, the team was able to determine the locations of cities that had been lost to time, and to find correlations between the sizes of those ancient cities and the same cities today. This is a fascinating look at how data can change the way you look at issues. You can read the paper here: https://www.nber.org/system/files/working_papers/w23992/w23992.pdf

This story is a great example of what can happen when you take a more data-centric approach to an issue. As many publishers have learned over the last decade, taking better control of your data is imperative to your success. The best way to do that is with a title performance monitoring tool.

One important issue that impacts publishers every day, often without them even knowing it, is data overwriting. Joshua talks extensively about this issue, and provides listeners with five suggestions to help them more effectively deal with their book data being overwritten.

Looking at only part of the picture can prove costly, and lacking access to the right data when you need it can leave you feeling vulnerable. Firebrand’s Eloquence on Alert platform is the most powerful title performance monitoring tool available. See how your marketing events relate to your sales rank, reviews, and more. Get real data, and real insights, into what’s happening with your books on book retail sites. Learn more and sign up for a demo at eloquenceonalert.com.


I’d like to start the episode this week by telling you a story. About 4,000 years ago, the city of Kanesh, located in the middle of Anatolia in modern-day Turkey, was a center for trade. During excavations of this ancient city, archaeologists have found more than 23,000 cuneiform texts that were inscribed by Assyrian merchants during the Bronze Age. These tablets mostly consist of business letters, shipment documents, accounting records, seals, and contracts. A typical passage from the clay tablets reads something like this.

From Durhumit until Kaneš I incurred expenses of 5 minas of refined (copper), I spent 3 minas of copper until Wahšhušhana, I acquired and spent small wares for a value of 4 shekels of silver.

Now, to most archaeologists, the mere mention of these towns and settlements is exciting in its own right, because while some of these towns, like Kanesh, have been found and excavated, others, like Durhumit, have not. To that end, archaeologists normally take a qualitative approach to proposing potential locations for ancient cities. In some cases, those qualitative approaches lead to multiple ideas of where a city might have been. And those qualitative approaches are helped a lot by having the names of these cities show up in different tablets.

But a team of researchers including one historian and a few economists had a different idea. What if they analyzed the quantitative data that was contained in the tablets instead? Could digging into that information in a more detailed way help them understand more about the ancient Near East?

So, in the text I just quoted, for example, you have a record of three separate cargo shipments: Durhumit to Kaneš, Kaneš to Wahšhušhana, and Durhumit to Wahšhušhana. This team of researchers translated and parsed the data from 12,000 of those clay tablets, extracting information on the merchants’ trade itineraries, the cities they visited, their travel costs, the complexities of their travels, and more. Then they analyzed that data and used it to compare those trade itineraries to one another. And this, along with some very ingenious math and an understanding of the history of ancient Anatolia, made it possible for them to postulate the locations of some of the cities mentioned in the tablets that had never been found by archaeologists. In some cases where the standard qualitative approach had led to multiple potential city locations, these new quantitative findings confirmed one of those proposals over all the others.

The researchers were also able to provide some new data on how cities grow over time, and they were able to link all of that to the centrality of trade routes and their geographic limitations. I asked the lead researcher, Gojko Barjamovic, about the paper, and he said that he is most interested in how the study highlights a remarkable depth and continuity of commercial structures over a given space through a very long period of time. To quote the conclusion of the paper,

“Despite a gap of 4000 years, we find that ancient economic size predicts the income and population of corresponding regions in present-day Turkey. We argue that the persistence of cities’ fortunes in the very long run can best be explained by their strategic position in the network of natural trade routes.”

So, if you want to read more about the research or read the paper, I’ll actually link to the paper in the show notes.

So why am I telling you this story? What is the reason for bringing this up at the beginning of the BookSmarts Podcast? Well, first, I just think it’s really cool. I’m an archaeology nerd, a history nerd. I studied history in college, and I really like these kinds of stories.

But second, I love this story specifically, because it provides a clear example of what can happen when you take a more data-centric approach to an issue. You know, in publishing, as in archaeology and a bunch of other disciplines, there’s a tendency to read the tea leaves and try to assess the potential success of a book, a marketing campaign, or another project of some kind based on qualitative data instead of quantitative data. We also tend to look at the “individual tablets” and try to decipher their mysteries instead of looking at the bigger picture and analyzing the patterns.

So this is why I’m such a huge advocate for title performance monitoring tools. There’s a lot of quantitative data about your books out there just waiting to be collected and analyzed, and a solid title performance monitoring tool will help you filter through all of that data and find the issues and the potential opportunities that can lead to better sales.

Metadata Overwriting

One of the most problematic data issues that we see in the publishing industry, and one of the most important reasons to use a title performance monitoring tool, is data overwriting. So that’s what I’d like to touch on in today’s episode. When you’re not looking, your product data can change without notice and in ways that you would never expect. This is because product information is not really intended to be static. It’s constantly changing. It’s impacted by forces beyond your control, for sure, and by the decisions of lots of other companies and other people. You would think that the manufacturer of a product would be the one in control of that product’s data. But that’s just not the case. Everyone else seems to think the data is theirs: cover images get changed, descriptions are overwritten, inconsistent copy is sent out, old metadata is sent out, and more.

I’ve seen publishers lose control of their metadata for a lot of different reasons, and most of the time, it’s something that they had no knowledge would happen in the first place and they were surprised by it in the moment.

One publisher that I talked to a few years ago ran headlong into this issue. An old distributor in the UK sent a major US retailer product data for hundreds of that publisher’s titles. It was a mistake on the distributor’s part, and not normally something that would have been a major problem. However, the retailer saw that data come in, gave that distributor control over those hundreds of titles, and started sending it orders for new copies of the books they needed. The distributor, of course, rejected those orders, but the retailer never checked with the publisher to confirm that the vendor of record was supposed to change in the first place. It took a lot of effort on the publisher’s part to get those titles returned to them as the supplier and get those orders fulfilled.

Another publisher that I talked with had a similar problem when a major retailer misapplied their hardcover prices to their paperback books, causing all of those paperback books to lose their buy buttons.

So, data overwriting like this is actually fairly common, as are a lot of other problems, like book product pages just disappearing for no apparent reason. Publishers are supposed to be the source of data for their books, but it’s not uncommon for retailers to acquire data from a lot of other places. There are wholesalers out there that send out data. Data aggregators are one of the bigger problems: major aggregators will send out data to retail sites, or have retail sites subscribe to their data feeds, and that data can be changed or out of date, because publishers aren’t sending their data directly to the aggregators themselves. There are also other retailers, because some retailers will scrape the sites of other retailers to try to pull information they don’t have. There are third-party sellers that will provide information. This is an especially big problem on Amazon, where a third-party seller will set up their own page about your book, write their own description or grab an older description they copied from somewhere else, and put their own image into that listing. And so you’ll have this data that’s overwriting yours. You’ll even see authors and other sources sometimes doing the same thing, where they’re writing things about their book and that information somehow makes it onto the product pages.

So what can you do? What’s the solution to the problem? I have five suggestions for what you can do when your metadata is changing without you knowing it.

1. Get your data under control internally

So the first thing is to get your data under control internally. This is a fairly obvious one; I think most publishers know this already. If you don’t have a solid metadata management system or strategy internally, then making that a priority is going to be your first step. You have to have your own data under control, and be able to handle your own details internally, before you can trust anything that’s going out from your shop into the world to be correct and up to date. So if you don’t have some sort of database or management system for your data, I highly recommend that you get that in place. And it doesn’t matter if you’re really big or really small. Having some sort of database, or at the very least a spreadsheet, with all your data in one place is really, really important for setting your company up for success in the future.

2. Make your data desirable

The second suggestion is that you make your data desirable. So why does my cat ignore me and then go beg at my wife’s feet? Well, it’s because she’s the one in the kitchen cooking the chicken, and the cat wants the chicken. So if we create better metadata than the data aggregators and the other people who might be sending data about our books, if the data we’re sending is better, more fleshed out, and more reliable, then retailers will have fewer reasons to override it. Now, that doesn’t mean they won’t, it just means they’ll have fewer reasons to do so. And remember, retailers are your partners; their entire existence revolves around selling things. If they can do that better with data that you provide, then it won’t benefit them to go elsewhere. This is where, again, focusing on the quality of the data you have, making your data better than everybody else’s, and putting more information into your metadata, your product data, is really important.

3. Send out your data yourself

My third suggestion is that you send out the data yourself, if you’re able to. If you have the requisite account types with the different retail partners, then it is best to be in control of your own destiny. If you can manage that data feed yourself, then by all means do that. However, I know a lot of publishers, probably the majority of publishers nowadays, work with a distributor, and that’s fine. Working with a distributor is not a problem. The question is always whether you understand what their delivery mechanisms and timelines are. A lot of publishers just kind of throw it into the black box and don’t think about it very much. But it’s helpful to understand what the process is for your data. How often does your distributor send out your data? Is that different for different retailers, or for different formats of your book? Is there a deadline for you to get data changes to them before the delivery happens? If they’re sending out data every day, what time does it go out? If they’re sending it out once a week, what day and what time does it go out? That way you know how often and when you need to give them your data, so you can ensure that it will be ingested into their system and properly sent out. This is even more important if you have an emergency; you need to know how quickly that data can get out to a retailer. So, if the price was set at 99 cents for a hardcover book, how can I get that changed very quickly across all the different sites to correct the wrong data? It’s very helpful to have that in place if you can.

4. Send a data update monthly

So my fourth suggestion is that you send an update of your data monthly. If you’re having issues with data overwriting and just can’t seem to get past it, you might try sending out a full data feed once every month or so to partners who will accept those types of feeds. This is something we do at Firebrand all the time; we’ve found that it can really nip these issues in the bud in quite a few situations. Again, if your distributor is in charge of your data, then talk to them about setting up something like that. There are limitations as to which partners will take these kinds of monthly feeds. But a full data feed, a full refresh of all your data, can go in and overwrite anything that was overwriting your stuff before, and can sometimes be the fix that you need.

5. Watch your data actively

And then my fifth suggestion, and you can probably guess where this one comes from, is to watch your data actively. The problem for most publishers, I think, is that their data is more like Schrödinger’s cat: Is it changing or is it not changing? Until you observe it in the real world, it’s kind of both; you really can’t know for sure. Often, in a lot of publishing houses, this job falls to an intern or someone fairly low on the totem pole in marketing. But it’s impossible for that one person to manually check all of your titles every single day, looking for issues, looking for opportunities, looking for things that need to be fixed or for data that’s being overwritten. If you assume that it takes someone a minimum of 10 minutes to check each title across all the sites and make sure everything looks good, they’re going to be lucky to keep a close eye on 40 titles, nothing close to the hundreds that I’m assuming you probably have in your backlist alone. So this is where a title performance monitoring tool is going to be really helpful, to help you watch your titles, be more efficient in that process, and bring more of that data to the forefront. I always say it’s better to use your head than to break your back, and I think this is a prime example of that in real life. Take advantage of tools that will help you watch your data.
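To put some rough numbers behind that 40-title figure, here’s a quick back-of-the-envelope sketch. The 10-minutes-per-title figure comes from above; the amount of real checking time in a workday is my own assumption, not a measured value:

```python
# Back-of-the-envelope: how many titles can one person manually check per day?
# Assumptions: 10 minutes per title (as discussed above) and roughly 6.5 hours
# of actual checking time in an 8-hour workday (the rest goes to meetings,
# email, and breaks -- an assumed figure, not a measurement).
MINUTES_PER_TITLE = 10
CHECKING_MINUTES_PER_DAY = 6.5 * 60  # about 390 minutes of real checking time

titles_per_day = int(CHECKING_MINUTES_PER_DAY // MINUTES_PER_TITLE)
print(f"Titles one person can check per day: {titles_per_day}")  # prints 39
```

Even with fairly generous assumptions, one person tops out around 40 titles a day, which is why a backlist in the hundreds needs automated monitoring rather than manual checks.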

A Recap

So I’ve been thinking about and talking about publishing data for a long time. And I’ve become a really big advocate over the years for a more data-informed approach to the business of publishing. And the BookSmarts Podcast is just one of my attempts to spread some of those thoughts more broadly, and to have an opportunity to talk with some very smart people about related topics. And over the last four episodes of this podcast, we’ve talked with some extremely smart people. I hope that those conversations have been informative to all of you, and have given you some food for thought regarding your own publishing business. Just in case you’ve missed some of those episodes, I’m gonna give you a brief recap of some of the things we’ve talked about over the course of the last four episodes here on the BookSmarts Podcast.

So in Episode Two, Brian O’Leary and I talked about how US publishers lead the world in intellectual property creation (so, we’re creating a lot of content) and how foreign rights sales are becoming more and more important for publishers to engage in. This change also means that publishers who have their available rights clearly defined and searchable, preferably in some sort of database and not just written down on old paper contracts in a filing cabinet, will have a much easier time selling those rights when someone comes looking, or marketing those rights to potential foreign publishers. So, if you’re interested in that topic, I highly recommend you listen to Episode Two.

Then in Episode Three, Guy LeCharles Gonzalez and I talked about how creating a community and building direct-to-consumer sales can be one of the most important and impactful changes that a publisher can make, taking you from being a data-driven publisher to being a data-informed one. I think Guy has some really interesting thoughts about being data informed, so we dove into that quite a bit. And direct-to-consumer sales, I think, are extremely important for publishers, especially now, coming out of the pandemic and seeing just how important it is for a publisher to have more control over their own destiny; direct-to-consumer sales are a great way to do that. So Episode Three, that conversation with Guy, is great.

Then in Episode Four, I talked with Rachel Noorda and Kathi Inman Berens about how so much of what we know about book discovery doesn’t really work the way you might think. Consumers are discovering content that they want to consume, whether it’s books or audiobooks or other content, in a lot of different ways. And they’re often consuming that content while they’re multitasking, and not just “I’m driving to work” or “I’m on my treadmill.” It’s multitasking like “I’m cooking in the kitchen,” or “I’m walking somewhere and reading a book, an actual physical book.” So, there are lots of different ways that multitasking works. The interesting data that these two researchers published in the Panorama Project report, I think, should be required reading for everyone on your marketing team. It’ll totally revolutionize how you think about marketing, and especially book discovery. So that’s Episode Four.

And then in our last episode, Episode Five, I talked with Ian Lamont about Amazon and Facebook advertising, and how important it is not to put all of your eggs in just one basket. He stressed the importance of watching your advertising programs carefully, tracking the data from those programs to see what’s working and what’s not, and being willing to try out new things to see if they can help. In case you didn’t see it, Ian tweeted just a few days after the podcast went live a couple weeks ago and said that he had decided to try out video advertising on Amazon. He was impressed to see that he made $125 in sales on a $2 ad spend. Now, that kind of ROI may not be normal, but trying out new things can sometimes lead to big impact. And I think Ian is one of the smartest people when it comes to this topic, because he knows how Amazon and Facebook advertising work: he does it every day. He digs into those systems. So, I highly recommend Episode Five.

If you missed any of those episodes, I would love for you to go back and listen to them. I’m trying to keep these about 20 minutes long, and keep them as informative and as actionable as I can for you.

So if you have other topic ideas or suggestions for the podcast, please let me know. I would love to hear ideas that you have, topics that you want me to dig into, people that you think I should be interviewing. I would love to hear that. So you can email me at joshua@firebrandtech.com if you have suggestions like that.

Also, wherever you listen to this podcast, I would appreciate it if you would give it a review. It helps make the podcast more visible to more people. And tell your colleagues and friends about it, too.

I hope this is an interesting podcast for you to be listening to, and we appreciate you coming around. We’ll be back again in a couple of weeks with another episode, and until then, thanks for joining and for getting smarter about your books.