Nicholas

“Nobody wanted to do this work”: How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries

Nicholas

Tim McAleer is a producer at Ken Burns’s Florentine Films who is responsible for the technology and processes that power their documentary production. Rather than using AI to generate creative content, Tim has built custom AI-powered tools that automate the most tedious parts of documentary filmmaking: organizing and extracting metadata from tens of thousands of archival images, videos, and audio files. In this episode, Tim demonstrates how he’s transformed post-production workflows using AI to make vast archives of historical material actually usable and searchable. What you’ll learn: - How Tim built an AI system that automatically extracts and embeds metadata into archival images and footage - The custom iOS app he created that transforms chaotic archival research into structured, searchable data - How AI-powered OCR is making previously illegible historical documents accessible - Why Tim uses different AI models for different tasks (Claude for coding, OpenAI for images, Whisper for audio) - How vector embeddings enable semantic search across massive documentary archives - A practical approach to building custom AI tools that solve specific workflow problems - Why AI is most valuable for automating tedious tasks rather than replacing creative work — Brought to you by: Brex—The intelligent finance platform built for founders — Where to find Tim McAleer: Website: https://timmcaleer.com/ LinkedIn: https://www.linkedin.com/in/timmcaleer/Where to find Claire Vo: ChatPRD: https://www.chatprd.ai/ Website: https://clairevo.com/ LinkedIn: https://www.linkedin.com/in/clairevo/ X: https://x.com/clairevoIn this episode, we cover: (00:00) Introduction to Tim McAleer (02:23) The scale of media management in documentary filmmaking (04:16) Building a database system for archival assets (06:02) Early experiments with AI image description (08:59) Adding metadata extraction to improve accuracy (12:54) Scaling from single scripts to a complete REST API (15:16) Processing video with frame sampling and audio transcription (19:10) Implementing vector embeddings for semantic search (21:22) How AI frees up researchers to focus on content discovery (24:21) Demo of “Flip Flop” iOS app for field research (29:33) How structured file naming improves workflow efficiency (32:20) “OCR Party” app for processing historical documents (34:56) The versatility of different app form factors for specific workflows (40:34) Learning approach and parallels with creative software (42:00) Perspectives on AI in the film industry (44:05) Prompting techniques and troubleshooting AI workflows — Tools referenced: • Claude: https://claude.ai/ • ChatGPT: https://chat.openai.com/ • OpenAI Vision API: https://platform.openai.com/docs/guides/vision • Whisper: https://github.com/openai/whisper • Cursor: https://cursor.sh/ • Superwhisper: https://superwhisper.com/ • CLIP: https://github.com/openai/CLIP • Gemini: https://deepmind.google/technologies/gemini/Other references: • Florentine Films: https://www.florentinefilms.com/ • Ken Burns: https://www.pbs.org/kenburns/ • Muhammad Ali documentary: https://www.pbs.org/kenburns/muhammad-ali/The American Revolution series: https://www.pbs.org/kenburns/the-american-revolution/ • Archival Producers Alliance: https://www.archivalproducersalliance.com/genai-guidelines • Exif metadata standard: https://en.wikipedia.org/wiki/Exif • Library of Congress: https://www.loc.gov/ — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [redacted email].

Published
Published Nov 17, 2025
Uploaded
Uploaded Jun 12, 2026
File type
Podcast
Queried
0

Full transcript

Showing the full transcript for this episode.

AI-generated transcript with timestamped sections.

0:00-1:31

[00:00] How did you think about what problems there were to solve in AI relative to your job and the people that you work with? And why did you start where you started? Post-production is like a technical mess of media management. You have many different file types. You have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. [00:20] So it ends up being hundreds of hours of footage, tens of thousands of photos. The data management piece when you're dealing with all that different stuff is the mess that I have used AI to tackle. My goal was to automate this. [00:33] For years, this has been manual data entry. Automate away toil. That's what we want to do. No one was going to make me this app. And so the ability to make an extremely specific app that makes a workflow on my team and my company easier. It's been an unbelievable moment. [00:52] Welcome back to How I AI. I'm Clara Vo, product leader and AI obsessive here on a mission to help you build better with these new tools. [01:03] Quarantine Films, who's responsible for the technology and processes that bring these amazing films to life. Instead of focusing on how AI can create [01:12] creative for these films. We're actually going to talk about how Tim uses AI to build software products that make his post-production and research team's lives [01:22] a lot better. [01:23] If you're working with images, video, sound, or just a lot of data, [01:29] This episode is a great one for you.

1:31-3:01

[01:31] Let's get to it. [01:32] This episode is brought to you by Brex. If you're listening to the show, you already know AI is changing how we work in real practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issues, expenses are filed, and fraud is stopped in real time without you having to think about it. [02:02] Add Brex's banking solution with a high yield treasury account. And you've got a system that helps you spend smarter, move faster and scale with confidence. One in three startups in the U.S. already runs on Brex. You can too at brex.com slash how I AI. [02:23] Tim, welcome to How I AI. I'm excited to have you here. [02:26] Thank you for having me. What I love about what we're going to talk about today is you work in a very interesting and creative industry, putting out amazing content. And we're going to talk a little bit about how AI is impacting the creation side of things. But you've actually used AI to smooth out some of the challenges you've had on the production and post-production side of things. So I'm curious, how did you think about what problems there were to solve in the world? [02:55] in AI relative to your job and the people that you work with? And why did you start where you started?

3:01-4:31

[03:01] Yeah, I think most of the flashiest use cases of AI in creation or media and entertainment right now are often in like, [03:10] generating full video content or images or whatever it is, [03:14] But post-production specifically is like a technical mess of media management. [03:20] Especially in nonfiction, you have like many different file types, right? And you have images, you have archival footage that you're gathering, you [03:27] live footage that you may have filmed out in the field, interviews, transcripts. And so like, [03:32] the data management piece when you're dealing with all that different stuff is the mess that i have used ai to tackle and i think that the sort of like ai as a tool versus ai for generation [03:45] is even more immediately applicable in our field at the moment. [03:49] Well, and I have a very, you know, very simple, humble little podcast, but even for us, [03:54] we create a lot of research and longer content and we're editing it down. I'm just curious, [04:00] with documentaries and nonfiction work [04:04] What do you think the ratio is of media captured, researched and archived to actually publish? Because that will maybe give us a sense of how much of this you have to grapple with to get a good piece of content on the end. [04:16] We have a thing in our industry called a shooting ratio. And so you can imagine in like a fiction series or, you know, like a sitcom on air, [04:23] I don't quite know what those shooting ratios would be, but you're working with a script, and so you're going to have a slightly lower ratio. [04:29] In documentary, it can get quite high, like,

4:31-6:07

[04:31] I can tell you that we made a series about [04:34] Muhammad Ali a few years ago is an eight hour show. [04:37] We gathered 20,000 still images in the database of just stills. [04:42] I think it was over 100 hours of footage because he had a lot of fights and that kind of thing. [04:47] News footage. And then we also filmed, I want to say like 35 interviews for the piece. [04:53] So it ends up being like hundreds of hours of footage [04:56] tens of thousands of photos. And that's just like, that's one example of, you know, a particularly famous individual, but that tends to be what it looks like for our shows. [05:04] So that's what you have to manage, make searchable, make usable by the entire production team. And you got inspired by ChatGPT and some of these early AI tools to do some of that. So you want to hop in and show us what, you know, the first use case is? [05:21] Absolutely. So I'm going to start by kind of just showing you the like end result before I go right to like how I got here. So on any film that we work on, we end up having some kind of database, right? So this is a database where you can see the still images we've gathered. [05:37] You can see there's a footage section, a music section, anything that might go into the film. [05:41] And all the kind of stuff you might expect to see, right? Descriptions, tags, a date on the thing where we got it from. [05:48] Some more technical detail is also going to appear over here. [05:51] In any event, [05:53] My goal was to automate this. For years, this has been manual data entry. And so I remember vividly, [06:00] I'm going to jump into cursor now. But I do remember like when I first started doing this, it was ChatGPT. I remember ChatGPT,

6:07-7:39

[06:07] added image upload. And it was this insane day for us. I was like in the office with my colleague, Clark, and we were just like throwing images at it. [06:15] and seeing kind of the quality of the output, like it was this... [06:19] an aha moment where it was like, oh my God, this thing can see. And how could we harness this text generation, right, to use it for our database entry? So I'm going to-- [06:30] simulate that like the starting point [06:33] And then we'll jump to where we're at today. [06:36] But... [06:37] Essentially, what it looked like at the beginning was we would throw something into GPT and we would say like, hey, can you describe this? [06:42] and it would hallucinate a little bit. [06:45] But it was so tempting to figure out a way to harness that, that [06:49] I started essentially like writing little Python scripts with chat GPT. And at that time it was like VS code on one monitor and GPT on another and [06:57] And I'm going to, all right, I'm just going to go ahead and demo what that kind of looked like. [07:02] I'm going to speak my prompts if that's okay. I use this tool called Super Whisperer. [07:07] because it kind of cleans up my off-the-cuff dictation. So I have an image here of a nice... [07:15] "Street in Somewhere America," maybe mid-20th century, [07:18] We're going to see what kind of description we get from AI. [07:21] Thank you. [07:21] All right. Write me a script that submits the JPEG at the root of this workspace. [07:28] to open AI for description. I want just a general visual description of what we can see in the image [07:35] Any API credentials you need are in a text file at the root

7:39-9:23

[07:39] of the folder. [07:41] And... [07:42] What we can see here is that everything I just said got funneled through this app called Super Whisper. So it got funneled through a prompt that itself... [07:50] is cleaning up my like messy vibe coating. [07:54] I think it's clean enough, so we're going to go ahead and submit it. [07:56] And I see you're using Claude 4-5 Sonnet. Is that by choice or by default? Yeah. [08:02] That is because I'm on a podcast right now, to be honest. Like, I think this is a very easy task for AI. [08:08] I could keep it on auto for this, right? [08:10] I will say I switch between various Claude models depending upon the difficulty. And I do try and be cheap and stay on auto if I know that I'm asking for easy stuff. Okay, so you're giving us a little bit of quality control here. Yeah, I don't want it to mess up. We're live on air. [08:28] All right, so... [08:30] It's telling me that I need to install some requirements. My guess is I have those requirements. It's got a submit image. [08:36] Script, let's see what it did. [08:39] Here we go. It's running. Submitting this image to OpenAI for analysis. [08:44] What kind of... [08:46] What kind of description will we get? [08:48] There we go. This image depicts a small... [08:51] Rural Main Street from what appears to be the mid-20th century, we had guessed that, [08:55] There are a series of wooden storefronts, each with signs indicating there are local businesses. Okay, so this is great. And this is kind of what we were getting in those early days of GPT image upload. But the problem here is like you're making a film, you want to know what rural Main Street, what town are we in, what is the exact year. And you can't really just go with this kind of generic description. So a lot of times we happen to know that images come with embedded metadata.

9:23-10:55

[09:23] And you know, if you're using your iPhone camera today, you know that [09:26] Maybe there's some metadata like GPS data, that kind of stuff. [09:29] but archival images will often come with whatever notes people have scribbled onto them over time. [09:34] And so [09:36] I'm going to now, I'm going to iterate on this one time and say... [09:40] I want you to add a step to this script. [09:42] I want to scrape any available metadata from the file first. [09:46] and append that to the prompt. [09:48] The goal here is that [09:50] we are using any available metadata as like a source of truth for what this image actually is and not just guessing. [09:57] - And so just repeating that while this is running, what you're saying is, for this particular use case, you're working with a set of archival photos [10:06] from sources that have embedded [10:09] probably additional layers of metadata into it that you can read that give more information, which is different than [10:15] you know, scanning something or taking something off your phone, which I think we're going to look at a bit later. So you're trying to harness the structured metadata off this file. [10:26] which if you go back to the tab that shows the image, we can't see with our human eyes. But our agent friends can read with its robot brain. And you're using that information to then... [10:41] upgrade this script that is going to do all this AI analysis for you. [10:46] That's exactly right. And so in this case, it's going to be embedded metadata. I... [10:51] I happen to know this is an image from Library of Congress. There's going to be some metadata on it.

10:55-12:23

[10:55] But it could also be something on the web. Like where this eventually goes to is like, okay, I know that there's a website with information. [11:01] may not be in the file, but hey, how about you go and scrape the web [11:05] Gather anything you can know about this because [11:07] Ultimately, this is a journalistic endeavor. These shows get fact-checked. We want everything going into our database to be, you know, [11:15] true and verifiable information. [11:17] All right, so let's see how it did when it added that metadata. [11:21] check. So we can see in the console it did a little bit of a scrape [11:26] It looks messy as hell, but somewhere in here we can see stuff like, yeah... [11:30] archival information [11:35] And it's now going to use that... [11:37] And what we've generally found is that when you add those guardrails, when you give it [11:41] information you know to be true about the image, it relies on that so much more than just what it can see. [11:46] Like, [11:47] you know, AI really wants to perform for us. It really wants to do a good job. And so when you give it, [11:52] the tools and information to kind of write a better description, it's going to [11:56] It's going to be able to get there. [11:57] And I want to call out something. So we talked about using the anthropic Claude models in particular for the actual coding of the script, but you're relying on the open AI models for the image analysis. Why open AI versus any other models that like stick with the one that you love, or it was the first one that did a good job for you, or do you feel like it's particularly good at image analysis? I'm curious why you select those different models for different

12:27-14:01

[12:27] they had a vision preview on their API. They did it before Claude. And like, [12:31] I had built up enough of an infrastructure using that API call that it was like the switching costs were too much, you know? Yep. [12:39] Alright, so let's see what we got this time. [12:41] It's much more detailed. It is. It's much more detailed. So the image shows a street scene on the main street of Cascade, Idaho. There we go. We know where it is now. [12:50] Captured in 1941 by photographer Russell Lee. We've got photo credits. [12:54] All right. So... [12:56] This is a great example of like you add the guardrails and you're going to get more detail, but you're also just going to get facts right before. I don't know if it's still up here somewhere yet before it was a small rural Main Street. [13:07] Now it is the main street of Cascade, Idaho. [13:10] And so we can imagine this getting duplicated [13:13] in various ways, right? This image has embedded metadata. Maybe it's a website that we're going and gathering it from. But effectively, this is where it all started. It started with a single Python script that I was running on my computer, and I was like, this is awesome. [13:26] My database software is like, it's advanced enough. [13:29] to call external scripts. You can kind of use any database to do this, you know, Airtable, whatever. [13:35] But you just need something that has an API, and that can call an external script or webhook or something. [13:40] So this is where we started. [13:42] And now I'm going to switch my screen share to a remote machine machine. [13:47] like a little Mac mini that I have running in my office. [13:50] And what this [13:51] you know, it's hard to, at this moment, it's a more complex cursor workspace you can see. Maybe I'll [13:57] bop into the rules basically what this is is a rest api

14:01-15:31

[14:01] so that every image file, video file, music file, anything that ends up in that database that we looked at at the beginning, [14:09] pings off of this REST API for all kinds of different [14:13] like metadata tasks. [14:15] If I, if I, [14:17] pop into the jobs folder here for a second, you can... [14:19] We could zero in on like basically what we were just doing, but the current iteration of it. So I call it auto log, 'cause the process of writing this in for years, [14:30] The manual data entry is called logging, so it's not the cleverest name, but it fits. [14:35] And you got a five step process here. Basically, first, we're going to gather the info. [14:40] meaning like file specs, you know, how big the image is, is it a JPEG, is it a TIFF? We're going to copy the file to our server, we're going to name it our ID number. [14:48] We're going to parse it for metadata. Is there any metadata? If there is, [14:52] Great, but either way, we're going to look for more information on the web in this step four here, scrape URL. [14:57] And then once we know everything we could possibly know about that image, [15:01] we're going to generate a description for it. And when you imagine how this might work for video, well, like video is itself, it's just... [15:08] 24 images in a second, plus some audio. And so basically this just gets scaled up to deal with video files too. [15:16] Are you using the same model for video files? Are you taking them extracting the stills and putting them through open AI or using a different model? [15:24] I use a different model for, so I have to the, the video files requires like two levels. [15:30] Most video...

15:32-17:06

[15:32] like AI models out there seem to do basically some version of frame sampling. So it could be extremely expensive if you were sending all 24 images every second. [15:42] to an API, right? So I pull at five-second intervals because I'm cheap. [15:47] Some others maybe pull in a smarter way, maybe at like lighting changes or something like that. Like there's different ways of thinking about the frame sampling. [15:55] So for the frame captions themselves, I will use a cheap model. I'll use like a nano, GPT-5 nano. But then for the, and I can go in and show you a prompt here, which maybe illustrates this. [16:06] I have frame prompts [16:09] which basically ask for just like a prompt of an individual still image extracted from video. [16:15] But then I have a... [16:17] larger parent prompt [16:19] You can see that my prompts have gotten slightly more sophisticated over time. Basically, what this does is it sends... [16:26] Every single frame that we've extracted from a video file, it extends... [16:31] Anything like any of the audio we've transcribed from that video file. [16:35] It packages it up into this elaborate prompt [16:38] And it sends it to a reasoning model. [16:40] And the purpose of that is to say, like, these are all the video events that we have observed in this video. [16:46] Here is like a massive text file of data. Tell me what you think is happening in the video. [16:52] Got it. [16:53] Yeah, maybe tip from one of our other How I AI guests, but I found that the Gemini... [17:00] The Gemini models are quite good with video. It's actually what we use to do our...

17:06-18:36

[17:06] podcast raw recording to [17:09] both highlight stills and a blog post that I put out. I process them through the Gemini models and have had a lot of success. [17:16] And it just pulls out like the stills that might be... It automatically pulls interesting stills. It actually gives me interesting stills plus five seconds or like plus five seconds plus minus five or minus five seconds because sometimes the guest and I are looking... [17:31] ridiculous. Yeah, yeah, yeah, of course. So tip to anybody out there with video who hasn't tried the Gemini models, I find those particularly good for this use case. You might have just, you know, added something to our little roadmap here. There you go. [17:44] - Well, [17:46] And so, and then I'm curious about the audio side of things. So I kind of, you know, I've, I play with the Gemini models for video, uh, [17:52] This still makes tons of sense to me. Tell us a little bit about the audio side of things. So the audio is also, now I feel like I'm an open AI shill. Everything I'm using is open AI. And I think except for the coding, which is interesting, [18:06] But I think it's just habit. [18:08] I use Whisper for audio. So like Whisper is an incredible open source model for speech to text detection. [18:14] Even the like medium sized model does a pretty good job. And what I do is, and I can pop back into the database software maybe to like illustrate this. [18:23] What I do is I extract, you can see like frames pulled every five seconds. And there's a caption associated with each frame. And then there's, this is a shot of an alligator in a swamp. So he doesn't have any audio. He wasn't talking.

18:36-20:06

[18:36] But I basically pull audio at five second increments so that when we send those like video events up to the reasoning model, [18:44] We are... [18:45] Sending a full transcript, but we're sending it like kind of like pegged to the moment in the video that it happened. [18:50] If that makes sense. [18:52] So the transcription is all happening on my back end over here. [18:56] Everything, like, I think I could probably open up the console and see, like, [19:00] There we go. Like someone just sent a job through not that long ago. [19:04] Like I can kind of come in here and see what my colleagues are doing as they ping my API all day long. [19:10] Great. And so your [19:13] pairing [19:14] a snapshot image every five seconds from a video [19:17] The five-second transcript of the audio speech-to-text via whisperer [19:23] Metadata, if you have it, parsing that all together and then getting a very robust, [19:29] description and analysis of the content that you have available in back in this tool that you're using to archive log manage all all your assets. [19:38] Yeah. And like I said, that tool could be kind of agnostic. Like you could do it in a Google sheet if that's, you know, if that's what you like. But I like this. We've been using it for a while. Everything we just talked about is how we kind of get to like metadata that we can read, right? Like generative metadata. [19:54] That is, A, we know it's accurate because it's kind of been put on these guardrails by our metadata extraction steps. And then also it provides this like nice visual for us. We can see what this thing is at a glance.

20:07-21:48

[20:07] But the next step of this, now that you have this API running in the background, is you can generate [20:12] something that maybe I can't read, but the AI can read pretty well, which is vector embeddings. [20:17] So I'll jump back to stills for this because I think it's a maybe an easier illustration of it. [20:23] Every asset in our database gets put through two modes of embedding. So we'll send the thumbnail through and run it against an open source model. I use clip for this. [20:35] And I'll generate an embedding off of that. And then we'll send the description through... [20:40] I use, again, an OpenAI text model for this and get an embedding for that. And then we'll fuse them. [20:47] And the purposes of that is that so now we have like the ability to discover things semantically, like prior to this. And I think in a lot of film production today, you're working with exact text search, you know, like if that description says dog, but you know, somebody wrote in puppy, you're not finding that image. And so [21:05] This has been kind of the most exciting part of it, not necessarily where I knew it was going when it started, [21:11] Like I was just excited to generate a description, right? But now the ability to discover semantically is, I think, you know, the most robust part of the system. [21:21] So what I love about this, I mean, a couple of things is one, you've really pushed [21:28] Every step of the way, you know, you could have stopped at like, we got good descriptions or we got like destruction metadata out. And now I have a script that runs it. [21:35] You could have stopped at images only, but you took it to video and video and audio. You could have stopped at structured data only, but you went to embeddings to get semantic search. So I love just the breadth of applicability of the AI in this process.

21:48-23:23

[21:48] But what I probably love more is I doubt this was anybody's favorite part of their job. Like, I doubt it was anybody's favorite part of their job to be like, I'm going to go read some Library of Congress. It used to be my job. So I can tell you firsthand, not my favorite part. [22:05] And it's also like, I think the best argument I have for all the work I've done creating this system is that like the same people who used to write this data... [22:14] were the ones who are responsible for doing the research. So you've now freed them up to just look more, right? Yeah. [22:19] Like maybe now we could gather 25,000 still images for the Muhammad Ali project because you have that much more time. You're not just like copy and pasting stuff off a website to put it in this form, you know? [22:29] Well, and you probably get to select from this big archive of data [22:35] better assets to use in your content because they're more discoverable because you have more confidence in the source and the content of that data so I bet it up levels at the end of the day the quality at at the end because you have just much more data to work off of. [22:51] 100%. I mean, a real quick example of that too is like, I'm going to use Abe Lincoln here, which is maybe not the best use of this image. But [22:59] Embeddings enable us to find things in ways we never would have thought to find them before. So like I have a button down here. [23:05] or when I click it, what it basically is going to do is reverse image search within our own collection. [23:10] So if I'm an editor and I like an image, and this is going to take a while because I'm not on site, but if I like an image, I can click the find similar button and it's just going to go and find every image that kind of has that vibe.

23:23-24:54

[23:23] You can see here we have a duplicate of this one. But then there you go. It recognized the man and it started pulling in. [23:29] other portraits. [23:31] This episode is brought to you by Brex. If you're listening to the show, you already know AI is changing how we work in real practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issues, expenses are filed, [24:01] Add Brex's banking solution with a high yield treasury account. And you've got a system that helps you spend smarter, move faster and scale with confidence. One in three startups in the U.S. already runs on Brex. You can too at brex.com slash how I AI. [24:21] I love this. Okay, so this is more of your archival and footage data, but you capture a lot of stuff in... [24:30] the field where people are not sitting in front of cursor or their desktop, I'm looking through these assets. And I know that you use some vibe coding and a creative approach to get more information about those assets. Could you walk us through that? [24:44] Yeah, so... [24:46] The next use case is an app that I developed for archival research in the field. So I think that we really pride ourselves on like...

24:54-26:29

[24:54] turning over every rock on not just relying on what's digitized and available online, and going and visiting physical archives. [25:02] The process of visiting a physical archive [25:05] is basically... [25:07] You have a bunch of folders that you pull ahead of time. You arrive there and your goal is just to snap like low-res resolution iPhone snaps of everything you can possibly get. And so you're snapping the front of the image and you're snapping the back of the image. Because the back is typically where there's going to be like... [25:23] a scrolled description or maybe like an accession number, an ID number that the archive has added themselves. And so this process used to look like you show up at the archive, you take iPhone snaps for two days, you get back to the office, you have the messiest camera roll you've ever had. [25:39] You cannot actually pair your fronts to your backs because it just got out. Somehow it got out of order along the way. And so the goal was basically to make that process like a little better. So I... [25:51] I vibe coded this iOS app to deal with this problem. And I, I, I tend to just like speak in screens like the way I, maybe it's because I'm a visual person. Like the way I deal with it is I just think like, [26:03] Okay, I see a screen that does this and a screen that does this. I imagine a button that does this. [26:08] And the purpose of this was basically like, I want people to be able to create collections [26:12] for each folder they're capturing. I want them to be able to snap a front and a back [26:17] like the flip side of the image so that they can [26:21] Easily associate those, so the file names associate them. And I want to immediately transcribe any information on the back and embed it into the original image.

26:29-27:59

[26:29] So now I have this app called Flip Flop. [26:32] I ask ChatGPT at the end of my dog walk to generate some kind of specs doc or requirement doc. [26:38] It pretty much does it in one go. If you chat with it for 30 minutes, you know, you can get a lot done. [26:43] And then I fed this [26:45] PRD to Claude Code. [26:47] And it this one, it like it, it didn't build it in one shot, but it certainly built the UI in one shot. And so I guess maybe we should just jump into like the actual app. Yeah, let's do it. [26:58] So flip-flop, [27:00] which is my cute little name for it, is... [27:03] basically designed to capture those fronts and backs that I was talking about. [27:07] So you have three screens here. You've got a collection screen where you're gonna create your folders. You've got a capture screen where you're gonna take your images. And I'll just quickly highlight this part [27:16] which is where you kind of have your AI processing options. So... [27:21] I allow people to define a separate prompt for what I call the flip side of damage, the front. [27:26] and the flop side of the image, the back. And so in this example, I'm gonna show you some photos of my dog now. [27:31] And, uh... [27:32] The flop side of the image is going to have some text on it, so [27:35] Our prompts here are really just designed to get a decent caption from the image and to transcribe any text that we see on the back end. [27:43] So let's create a new collection. We're going to call it how [27:46] Bye. [27:47] A, I, that's good enough. [27:50] There's also an option here to add more context. You know, the AI loves context. And so maybe if you're [27:55] You know, you can imagine if you're digitizing an entire collection of

27:59-29:32

[27:59] you know, someone's personal letters or someone's portrait photographs, you would add that kind of thing here. But for now, we're just going to create a collection. [28:09] tap into that collection. [28:11] And capture. So here we go. [28:14] It's a screen share within a screen share. [28:17] We're gonna not care about the glare too much. I'm gonna capture the front side of this image of my dog Tony's third birthday. [28:24] I now have the option to add notes if that's what I want to do, or I could just add a flopside to the image. [28:31] right here. [28:33] And... [28:34] When I complete that, [28:36] It will have, because it's lightning fast already. [28:40] Sent it up to OpenAI for a description. [28:43] and embedded it, and this is the really crucial thing, 'cause you just saw the first system I had, embedded it in the image metadata itself. [28:49] So the flop details have the transcription, Tony's third birthday. [28:53] And all of that will show up in the what we call EXIF metadata, which is just the image metadata standard. [28:59] Got it. And just for people that may be passed by, instead of simply generating kind of the text description and storing that in a database relative to the original image you took, you actually now have this structured metadata on the image file itself, which again, like what a pain. [29:17] - Oh, a giant pane. - To do. - A pane to do manually. And so now anytime anybody uses one of these images, even if they don't have, [29:26] access to this app even now that that image is embedded with that metadata.

29:32-31:03

[29:32] 100%. So you could pull this onto any computer or any app, anything that can read underlying [29:38] and it's going to be able to see that this was Tony's third birthday. And so that's structured metadata in the sense that [29:44] We've now structured the actual information about the image. But the other thing that's really crucial, honestly, [29:49] is that we've structured [29:51] the files themselves, right? So you can see they're getting named in a particular way. And so we've moved from like camera roll mess to, [29:58] to like files that are going to sort [30:00] in your computer that you're going to be able to import cleanly. You're going to be able to distinguish easily what's the front of the image, what's the back of the image. [30:07] And that has, I think, been the other unlock. Like I had two colleagues out in the field [30:13] couple weeks ago, and they came back with 1,400 images. And I don't think that's only because... [30:19] They were able to use Flip Flop to capture it, but I think Flip Flop is certainly making the process easier since they've gotten back. [30:25] The thing that I want to call out for folks, maybe a general takeaway here is... [30:31] These AI models are so good. [30:34] with files and code can do a lot of stuff with files. And a lot of the people we talk to, you know, markdown is the file type du jour these days, which is, you know, just like a specially formatted text document. But if you start to look at other file types and really understand what can be [30:54] put in a particular file type, you can actually discover some pretty interesting things you can do with a combination of

31:03-32:48

[31:03] AI and coding to make those files much more useful for your use case. So this is one of these takeaways where I'm like, I haven't thought about [31:11] like what can be embedded in an image file or what can be embedded in a video file. [31:17] And [31:18] Even just having, you know, ChatGPT or one of your general models say, hey, I'm working with an image. [31:23] "How can I load it up with as much context and specificity as possible? What's available to me?" And then using that as a jumping off point, [31:32] for what you do is a pretty interesting use case of AI. - I didn't even know, like I'm very familiar with stills, underlying metadata fields, but I didn't really know what was available in audio or what was available in video files. [31:44] And I just sort of I go into Cursor and I ask, right, like now where you have a music workflow, which we're not going to look at, but like where we embed artist album kind of like licensing data into any music we consider for a film. [31:56] And I didn't know that there was a metadata field we could just store that in. But of course there is, you know, somebody thought of this a long time ago. [32:03] Yep. Amazing. Okay. We have one last use case, which, um, mom, if you're listening, I think you're going to like this one. My mom's a genealogist. So, uh, I think she's going to like this, this use case, but let's show it first. And then I'll call out mama where I think you can use it. Okay. All right. So you can imagine in our films, we work with a lot of documents and we're not always interested in the entire document. Sometimes like we just want to transcribe [32:32] Maybe part of it. Maybe we want to translate and transcribe part of it. Like take this newspaper document, for instance, like maybe the Arkansas State News is the article we're interested in. That's the transcript we want to be searchable. That's what our editor might want to consider for the film.

32:48-34:20

[32:48] We can't just like put this in Adobe Acrobat and OCR the whole thing. It's like, it's not going to work. [32:54] And even more than that, like the quality of the image would not work with most OCR engines, you know. So AI is really good at OCR of old documents. It's really good at handwriting. It's pretty good at translation, too. So I built and we're not going to get into the building necessarily, but this is. [33:12] This is one of the few Xcode builds I had to do. So this is a Swift build. [33:16] a little Mac menu bar app. [33:18] It's called OCR Party. [33:20] which stems from the fact that we're just... [33:22] OCRing part of the image. You got to have fun with these things. And let's see. [33:28] We're gonna open up that newspaper in OCR Party, [33:31] We're going to get like a little preview window. Yeah. [33:34] So let's say actually what we want is Coolidge seeks peace in the world. So let's zoom in a little bit. [33:41] Let's... [33:41] open up our [33:43] cropping tool. [33:44] This little thing down here is basically a choice between Mac OS vision and an AI API call. And the purpose of that is because sometimes people don't... [33:54] Sometimes people don't trust AI, you might have heard. And so I built that in as an option, essentially. I would think the AI option gets used more, but nevertheless, [34:03] Now you're gonna select just the part of this article you care about or this paper that you care about. And you can see there's like a crease in the paper, there's a weird black mark here. [34:13] But you can imagine we submit this for OCR. [34:16] Thank you. [34:17] Now we have just that text that we pulled.

34:20-35:50

[34:20] We're also calling out for our editors, like where on the page they're going to be able to find it if they want to sort of zoom in on it, crop to that particular article. [34:29] And I can't exactly remember what text we were looking at, but [34:32] It certainly completed those sentences where there was a black marker. Yeah. Right. So AI was able to kind of infer... [34:39] to the best of our ability, what that sentence might have said. And, you know, if this ends up in a film... [34:45] I could guarantee it would get fact-checked later, but for the purposes of gathering documents, thousands of documents, [34:51] This ability to kind of like precisely OCR is, is, it's been a nice little unlock for us. [34:56] One thing I also want to make sure people take away from this episode is we've seen basically three form factors of apps. So, yes, they've all used AI, but you've been able to swap between sort of like a Python API service that gets called by another software application or database, a iOS app that, you know, you can run on your phone and then like a little desktop toolbar widget. [35:26] engineering. [35:27] is like if you have basic software engineering practices and then you know enough to be dangerous like [35:33] Yeah, you can you can vibe code and, you know, a Swift, Swift app to run on on your local desktop. Just a hyper specific app. Yeah. No one was going to make me this app. And so the ability to make like an extremely specific app that makes a workflow. [35:48] on my team and my company easier,

35:50-37:18

[35:50] It's been an unbelievable moment. Yeah, I would say the TAM for this app is like you. Yeah, yeah, yeah. I mean, I think I could sell it to like two colleagues. Well, and then my mom. So what I was going to tell you is my mom is a genealogist for the Daughters of the American Revolution, of which I am one. Fun fact on Claire. Oh, no way. And she does the lineage tracing. And do you know how many times she screenshots something and is like, can you read this cursive? Like, what in the world? [36:20] is this name. And it's like, you know, one name and a big... [36:24] a big image. And so I do think AI's, and I'm like, yeah, I'm going to drop this in a chat GPT and I'll tell you what I think it says. And I think it's ability to read handwriting. [36:34] old typefaces, kind of understand the nuances of spelling and things like that are just really, really interesting for these sort of research use cases. [36:44] Yeah, we didn't look at a handwritten doc here, but that is definitely something happening at our company, like the ability to read letters that we could not read before and also just other languages. Right. And then we immediately have that text to you have letters written in some kind of cursive scrawl from the 17th century that is now translated to English and made legible for you. [37:05] Amazing. Well, we've seen three great use cases. I am sure you are the hero on the team for this kind of stuff, because I can imagine, again, people might be tired of hearing me talk about AI, but thank you.

37:35-39:06

[37:35] zooming in and squinting at the text to try to get it the most accurate as possible. Trying to, you know, automate away painful processes, right? Not the things people liked. [37:46] - Automate away toil, that's what we wanna do. Okay, well we're gonna do a couple lightning round questions. I'm gonna get you. [37:53] out of here to, you know, go digitize a thousand more or more images. So the first thing I want to ask you about is just your approach to learning. It seems like from what I'm seeing, you're pretty fearless about, you know, you're going to be a little bit more about learning. I'm going to be a little bit more about learning. [38:06] new technologies, new things. I think this moment is such a critical moment for upskilling and learning. How do you think about [38:13] learning in this moment. [38:15] I think that one of the reasons that I find like tools like Cursor or Cloud Code kind of intuitive is to me, there's a parallel with creative software. So like at various moments in my career, I have been deep in Photoshop or deep in Adobe Premiere, Avid Media Composer, whatever it is. And those softwares are so complex. They are like a maze of tool menus and you end up on Reddit and on YouTube. [38:39] doing your research, trying to just like figure out how to accomplish the thing. [38:44] And I think that that's essentially what a lot of these tools are today, too. Like I've been on Cursor YouTube and Cursor Reddit and learned... [38:52] tips and tricks on like from the vibe coding people of the internet. [38:56] And, you know, I think it sort of starts from knowing what could be done or what's possible. And the like path to get there is is swifter than ever before.

39:06-40:41

[39:06] What I like about this, I started sort of my fascination with technology in these creative tools. I will like this is like pre Photoshop where I would go and how can I make my text look like liquid gold? And I would follow these like five step, you know, you know. [39:23] graphics, tools, tutorials. And what I love about this moment in vibe coding or AI assisted engineering is coding feels so much more creative. [39:35] than technical, where these tools feel really like creation engines to me more than functional tools to write code. And so I love that parallel because it's what's made me so excited about technology my entire career. And I think it's why I'm so excited. [39:52] leaned in this moment, like activates that same feeling of like, Oh, now I can do can make this thing that I didn't think I could make before. [39:59] I think that there are a lot of people too in my industry who have a kind of creative brain and creative approach to these things that would, you know, maybe like looking at a cursor window right now when you have no idea what it is, is a little scary. But I actually think that they are... [40:12] more well-suited for the work than they might know. [40:15] Well, let's talk a little bit about your industry, because I know that the film and creative world is deeply skeptical of AI. Sometimes we wade into the waters of AI video generation on this podcast and get a little feedback. And I totally understand. I have family that's in the creative industry. I'm curious, you know, what's your point of view of AI, particularly in the film world? What are you excited about and where do you think these...

40:41-42:12

[40:41] kind of concerns are really warranted? And then where do you think the most practical applications are? [40:46] I think today it's like sort of where we started at the top. The practical applications are more in like tooling than they are in creation. But I do think that like the creation is going to get there. Like today I play with I play with all the generative video models. Like how can I not? They're super fun. [41:03] They are not like at professional grade quality yet. Like the amount of time you spend throwing tokens at even the highest end video models, you're not going to be able to match your shots that well. You're not going to be able to match the footage you shot yourself that well. And so I don't think they're there yet, but like I'll be honest, they're going to get there. [41:21] I think that like they are still exciting to me, but I would separate a couple of things. Like in the nonfiction world, I think people should be careful. Like I think we should not be generating archival footage. We should not be trying to fool our viewers into thinking that there was video in 1750, you know? And I think that that's the part that's like a little scary. And then of course there's the job displacement aspect of things. I think people are scared if you film stuff for a living. [41:49] you're definitely scared that like that, [41:52] You're going to be able to just use text to generate that same video you used to shoot. So... [41:57] I don't know how to like, I don't think anybody has like good answers to that part of it, but [42:02] My approach has certainly just been like jump in and learn the tools. Like they are... [42:08] they are gonna be here whether we want them to be or not. And, uh,

42:12-43:42

[42:12] I think that they have a lot of practical benefits today that are less scary. Yeah. The best advice I can give to people, and I have – [42:20] Of all the spaces, and I'll say this honestly, of all the spaces I have the most job displacement concern... [42:26] it's in video generation for [42:29] non [42:31] non-archival non-documentary commercial use cases um you just you just see how [42:37] it could be very applicable. And [42:39] The best advice that I can give to people in this moment is the more you learn the tools, the better off you will be. [42:47] Whether or not, you know, whether or not you love where the tools are taking us as an industry or as a culture, you know, [42:52] knowledge is power. And so the more you learn and understand one, you can identify opportunities where it does add value, even in your creative process. And two, [43:04] you're going to be differentiated in the market from a job perspective because you're going to have a more robust sense of what's available in your industry. And I think that stands for people in your industry. I think it stands for people in my industry and technology. So I just say there is no harm in learning this stuff. [43:19] Absolutely. I also think that like there's a place in the process for it, which allows you like a place to learn without thinking it needs to end up in the final product, right? Like you can use video models for storyboarding all day. [43:29] you can maybe prove whether or not that shoot is worth spending that money on. Now you've learned how to use the video models a little bit and you know, [43:36] You haven't necessarily displaced anyone, but you've made your production a little bit more efficient, a little smarter. Maybe you've shot better...

43:43-45:17

[43:43] footage as a result of it, you know? Yes. But we're not, we're not generating fake archival footage of like Genghis Khan. We are not, we are not doing that. Definitely not doing that. And I'm like PBS, which is where most of our films end up, have a lot of guidelines around that. And I think that's a good thing. [43:58] But it's the other stuff. It's commercial. It's visual effects. Like a lot of stuff is going to get easier. And so it's coming one way or another. Great. Well, last question. Have to ask you when, you know, you're on your dog walk with ChatGPT doing voice mode and it's not listening to you or not giving you what you want. What is your personal prompting technique, especially because you use voice? Like I'm willing to type things to AI. I don't know if I'd be willing to say them. So what's your technique here? [44:28] - It's different when you have to say it out loud. I am super nice to the AI. I can vividly remember the one time I was mean to it. I'm nice to the AI. I don't know where this is going. I'm gonna be nice to all the models. What I do is like, for lack of a better, [44:41] way of describing it, I just start over. Like I will, I know that a lot of these things have ways of like consolidating the context window now and sort of summarizing, but I will ask for what I call like a resume work prompt. [44:53] I'll be like, this isn't working. I want to resume work later with another AI dev. Can you give me a prompt with everything they'll need to know? [45:00] And typically what you'll find is that that prompt shows you where it was off. [45:04] You know, like in its summarization of what it was doing, I'll be like, oh, see, like I wasn't asking for that. That's that's why we were not communicating. And then I'll take that resume work prompt. I'll prune it a little bit.

45:17-46:47

[45:17] pop it into another chat. [45:18] And then, you know, you'll find that you wish you hadn't beat your head against the wall with the previous chat for 20 minutes. [45:24] I am also team be polite to your AI, but then again, you hurt the one you love the most. And I've found myself occasionally… [45:31] getting testy. And you know, when I stopped being mean to AI is when reasoning really started to show and I could see it reasoning how upset I was. It'll be like the user is mad at me right now. User is really frustrated with me right now. I need to totally rethink. Go sweet, sweet baby AI. I'm sorry. Apologize. I'm not that mad at you. Okay. So create a, you know, return to progress prompt, really get the summary, take that to understand if there was some misunderstanding, [46:01] then just start fresh. That's great. Well, Tim, this has been super fun so much for me to learn. I have tons of ideas, even just for my day-to-day life about how I can use. I have kids, so I probably have 30,000 images. Let me know if your mom wants the OCR party. I will. She'll love it. Okay, mom, I have gotten you your first Vibe Coded app direct from the podcast source. Tim, where can we find you and how can we be helpful? [46:24] Yeah, I'm not that active on social, to be honest, but I am on LinkedIn. You can find me on there. I have a website that is itself a fun Vibe Code project. So you can find me at timmacleary.com. [46:36] I have a little chat bot there, the GP Tim. You can go chat with him, learn a little bit more about me and my work. And then other than that, I would say tune in to Florentine Films upcoming production. We have it at a...

46:47-47:34

[46:47] A series about the American Revolution coming out in November. So on your local PBS station. [46:53] My kids are obsessed with the American Revolution. So everybody walks in. Sounds like it's in the family. Yeah, we will be big fans. Tim, this has been great. Thank you so much. And thanks for joining How I AI. [47:05] Thank you for having me. [47:15] You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.

Want to learn more?