Most of us would have heard of interesting use cases of generative AI in creating art. After all, that is a significant bulk of the buzz with AI today. But have you heard anything about using AI to analyze art? This is the space that today’s guest lives in. In this episode, Eathan Janney is joined by David Stork, a distinguished scientist, author, and innovator with significant contributions to various fields like machine learning, pattern recognition, computer vision, and artificial intelligence. David is a prolific author covering a diversity of topics, including computational optics and image analysis of fine art. David is putting the final touches on his upcoming book titled Pixels and Paintings. Join the conversation and find out what AI can do to help us better appreciate and understand art, how David is optimistic about the future based on evidence from his own experience, and when and when not to use computers to learn stuff. Tune in!
- Deep neural networks have revolutionized AI and machine learning by enabling the analysis and interpretation of large data sets, such as images.
- AI can assist art scholars in analyzing paintings by providing insights into lighting analysis, pose estimation, and other aspects of art analysis.
- The use of AI in art analysis does not replace the expertise of art scholars but enhances their understanding and interpretation of fine art.
- Symbolic mathematics and computer-assisted techniques allow for more efficient and accurate solving of complex mathematical problems.
- Individuals should be intentional about leveraging the capabilities of computers while also developing their own skills and understanding.
- “AI can assist art scholars in addressing problems they’ve never encountered before and provide more accurate insights than human connoisseurs.”
- “The use of AI in art analysis allows for the analysis of thousands of paintings in a few minutes, providing valuable insights into artistic trends and techniques.”
- “More and more mathematics is going to be done symbolically on a computer, just the way, like dividing large numbers. You don’t do that by hand anymore, right?”
- “Be very intentional about what you want to leave to the computer and what you want to learn, and don’t ignore either side of it.”
- “Symbolic mathematics is dealing with the symbols. It’s just extraordinary development. I just love this stuff.”
AI’s Journey: From Object Recognition To Artistic Appreciation With Dr. David G. Stork
This is Dr. David G. Stork, adjunct professor at Stanford University and author of Pixels And Paintings: Foundations Of Computer-Assisted Connoisseurship. I’m on the Edge of AI, a show curated especially for the connoisseurs of the finest AI knowledge. Keep reading.
Here’s what’s to come on this episode’s journey. Find out what AI can do to help us better appreciate and understand art, how our guest is optimistic about the future based on evidence from his experience, and finally, when and when not to use computers to learn stuff. Follow some more on this episode.
AI travelers, it’s going to be quite a voyage. I’m driving this ship with my unique perspective as a polymath. I’ve ventured into the realms of music, art, science, and business. I go deep. That means I’m a Neuroscience PhD but I’m also a registered piano technician. I’m also a Cofounder at Edge Of Company. We empower tech and cultural pioneers through top-notch endeavors like this very AI show spaceship. Our guest will help me guide you through uncharted territories, where we will unravel the mysteries of AI and push the boundaries of its impact. Are you ready to chart a course for innovation? Anchors aweigh, readers.
In this episode, our guest is David G. Stork, a distinguished scientist, author, and innovator with significant contributions to various fields, among them machine learning, pattern recognition, computer vision, and artificial intelligence. Dr. Stork holds a BS in Physics from MIT with a thesis under Edwin Land, then president and CEO of the Polaroid Corporation, and his MS and PhD in physics from the University of Maryland.
Dr. Stork’s pioneering work spans academia, industry, and entrepreneurship. He has held faculty positions at Wellesley and Swarthmore Colleges, as well as Clark, Boston, and Stanford Universities teaching disciplines ranging from Physics to Computer Science. His expertise extends to diverse domains, including Electrical Engineering, Statistics, Neuroscience, Psychology, and Art, as well as Art History.
A prolific author and researcher, David has 220 plus peer-reviewed scholarly works and 8 books to his name, covering a diversity of topics, including computational optics and image analysis of fine art. Additionally, Dr. Stork holds 64 US patents. He is a fellow of many respected organizations, such as the Institute for Electrical and Electronics Engineers and other organizations in the fields of optics and imaging.
At the time of this interview, David and I are both fellows at the 2023 Leonardo Djerassi Residency in Woodside, California. It’s five weeks of precious time and space to develop our work at the intersection of arts and sciences. While here, David is putting the final touches on his book titled Pixels And Paintings: Foundations Of Computer-Assisted Connoisseurship. David, welcome.
Thank you very much. It’s a great pleasure to be here in such a gorgeous environment.
We were talking about this since day one of the residency, sitting outside in the beautiful mountain countryside and thinking, “We could probably film this out there.” It turns out the lighting is not so consistent. We get to look at a beautiful view of the mountains. We’re sitting in the library of the artist’s residence area but we’re still having a lot of fun and it’s a beautiful setting.
One of the reasons I thought what a great opportunity to have you on the show as one of our foundational episodes as well is because you’ve been viewing this whole thing from not the very beginning but along the way, you’ve probably seen this thing, from places where they might not even call it as the predecessor to AI. I’d love to get a little bit of an insight into where your exposure to what you see as the foundations of what we’re calling AI. Where did that start?
I won’t go into the history before I participated. I worked at the Center for Adaptive Systems at Boston University, which was one of the leading centers for neural-inspired computation and biologically relevant neural networks. I was a professor there for two years. I was working on things like computer models for lip reading, putting together the sound and the sight for speech recognition. When I came out to Stanford, I worked in the lab of David Rumelhart, who’s most famous for developing the backpropagation learning algorithm for three-layer neural networks.
At that time, we knew some of the limitations and so forth. We all envisioned and imagined a day when we would get what are now called deep networks, and here we are. It’s been a long, slow road of increasing theoretical understanding of neural systems and, most importantly, getting massive data sets, as I’m sure you and your audience know, and the computing power to lead to systems like ChatGPT, which is the presenting case, DALL-E, and many other deep neural network systems.
The readership that we’re aiming for here is going to be an interesting diversity. We’re going to have probably some folks that are building some of the tech in the background like we’re talking about. Some of them may be using the tech for the entrepreneurial purposes of an organization or movement of some sort. We’d love to give insights to the average person who’s intellectually curious about what’s going on here and get the context here. That history piece is useful. For a little bit of vocab, maybe one of our first vocab words here on the program is deep network.
Deep neural networks or deep learning.
What makes it deep? Why was that not a capacity at the time you were working on the lip reading, for example?
To step back one step further, there was an intellectual competition between two general approaches towards AI. One is expert systems where you would write down lots of rules, for instance, linguistic rules. The other was machine learning or statistical learning and statistical pattern recognition, where instead of trying to put down rules, you gave lots of examples and let the system learn the statistical relationships between them.
If you had to give a prize for who wins, it’s the statistics. Those are proven far more accurate, reliable, extendable, and so forth. That’s where deep neural networks come from. The basic idea is that you would have layers of very simple processors, call them neurons or nodes, where you would put input, be it an image or sound, and so forth.
You take weighted sums of the values of these inputs to get small groupings of these. You get more and more layers. In the end, you’ll get an output, maybe two neurons. One says, “A cat is visible,” and the other, “A cat is not visible.” The real question was how you train the weights of the connections between them. My host and colleague, David Rumelhart, with McClelland, came up with the backpropagation learning algorithm, which is calculus: how do you take an error at the output and change the weights so the next time that same input is put in, you’re more likely to get the correct answer?
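As a rough illustration of the idea just described (not code from the episode), here is a minimal sketch of a tiny network with one hidden layer and a single backpropagation step. The weights, the input, the learning rate, and the function names are all made up for the example; real systems use many more layers and far more data.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    # Each hidden unit takes a weighted sum of the inputs, then squashes it.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # The output unit takes a weighted sum of the hidden activations.
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output: derivative of squared error through the sigmoid.
    delta_out = (output - target) * output * (1.0 - output)
    # Chain rule: push the output error back to each hidden unit.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    # Nudge every weight against its share of the error.
    new_w_out = [w_out[j] - lr * delta_out * hidden[j] for j in range(len(hidden))]
    new_w_hidden = [
        [w_hidden[j][i] - lr * delta_hidden[j] * x[i] for i in range(len(x))]
        for j in range(len(hidden))
    ]
    return new_w_hidden, new_w_out

# Hypothetical toy input ("pixels") and target (1.0 means "a cat is visible").
x = [1.0, 0.0, 1.0]
target = 1.0
w_hidden = [[0.1, -0.2, 0.3], [-0.3, 0.2, 0.1]]
w_out = [0.2, -0.1]

_, before = forward(x, w_hidden, w_out)
w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)

error_before = (before - target) ** 2
error_after = (after - target) ** 2
print(error_before, error_after)
```

After the update, the same input produces an output closer to the target, which is the "more likely to get the correct answer next time" behavior described above.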
One of the problems was you wanted to have many layers so that you could do things like translation invariance in image analysis. If I have a cat here, I might be able to recognize it but what if the cat’s over here? It is simplest to do this by having many layers. Extending the backpropagation algorithm to these many layers seemed rather difficult. I won’t talk about the real mathematics behind it. You either give it enough patterns, examples of cats in different positions, or the scientist, knowing we want translation invariance, imposes it.
You put it in the architecture itself. If you can recognize something here, you should be able to recognize it here. You put that in as constraints. With that plus access to billions of pictures of rooms, people, horses, cats, and so forth, we can train these networks to be highly accurate, and they’re rivaling, and in some cases surpassing, human performance on very difficult pattern recognition problems.
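One common way to build translation invariance into an architecture is to slide the same small set of weights across every position and then keep only the strongest response. The sketch below shows that idea in one dimension; the "cat" pattern, the kernel, and the function names are hypothetical stand-ins, and this is not necessarily the design of any system mentioned in the interview.

```python
def convolve_1d(signal, kernel):
    # Slide the same kernel (shared weights) across every position.
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def detect(signal, kernel):
    # Global max pooling: keep the strongest response, wherever it occurred.
    return max(convolve_1d(signal, kernel))

kernel = [1, 2, 1]                       # feature detector tuned to the pattern 1, 2, 1
cat_left = [1, 2, 1, 0, 0, 0, 0, 0]      # pattern at the left edge
cat_right = [0, 0, 0, 0, 0, 1, 2, 1]     # same pattern at the right edge
no_cat = [0, 0, 0, 0, 0, 0, 0, 0]

print(detect(cat_left, kernel), detect(cat_right, kernel), detect(no_cat, kernel))
```

Because the weights are shared, the detector responds identically whether the pattern is "here" or "over here", without needing separate training examples for each position.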
I can break it down in probably even oversimplified versions but tell me if this sounds about right. Let’s imagine I have a piece of graph paper. I’ve colored in the pixels of the graph paper, put it that way, the cells, to look like a cat to me. You’re saying, “I have a translation of whatever that image is into another image on another piece of graph paper,” to the point where at the end if it’s all black, it’s a cat, and if it’s all white, it’s not a cat.
It’s like that. You don’t need to have a whole large number of output neurons if your goal is just a cat or not a cat. It’s either 1 or 2 neurons.
Maybe it narrows it down to a smaller frame every time and it’s either a black or white pixel.
The key thing to modify in that excellent accurate description is that at each layer, one of those cells on your graph paper doesn’t necessarily represent a point in the field of view but a particular feature, like a vertical line in a place or a horizontal line, or as they get higher in the network, curves or groupings, more abstract, harder to interpret features and groupings of features. That’s it in a nutshell.
What’s useful for readers is they are trying to figure out, “What are they doing behind the scenes?” They somehow built an actual brain for a robot. The technology you’re describing, even though we described it for an image of a cat, you can use a similar technology to underlie something like ChatGPT, which is a text-based type of thing.
You put in the text as the input. The desired output is the next set of words, sometimes groups of words but for simplicity, the next word in a sentence. You give it a part of a sentence and the output is, “Tell me all the possible words that could go next. Which is most likely to come?” ChatGPT puts together groupings of words, phrases, and statistical correlations between words and then does an accurate job of producing what the next word would be and then you can do it again. This generates whole paragraphs.
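To make "predict the next word, then do it again" concrete, here is a deliberately tiny sketch that counts which word follows which in a made-up sentence and then generates text greedily. This is only an analogy: ChatGPT uses a deep network over much longer contexts, not a table of word-pair counts, and the corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    # Count, for every word, which words follow it and how often.
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, length):
    # Repeatedly append the most likely next word.
    words = [start]
    for _ in range(length):
        nxt = most_likely_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

corpus = "the cat sat on the mat and the cat slept on the mat"
counts = train_bigrams(corpus)
print(most_likely_next(counts, "on"))
print(generate(counts, "sat", 2))
```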
There was a TED Talk by a scientist. This is probably a good many years ago. He set up his whole house full of cameras as his baby was learning to talk. He got cameras and audio. He captured all the development of, “Wa, wawa, wata, water,” and all of that. It also showed that there was a statistical connection between where the child was when he was using and learning these specific words. For example, he learned how to say water in the kitchen because there was proximity to the actual water. The statistical probability is, “In the kitchen, I’m going to say water.” Every time you return, you say it more. The interesting thing about that is that ChatGPT seems to be learning to talk like we do.
Yes, but there’s a longer debate between cognitive neuroscientists and linguists. Noam Chomsky and Steven Pinker are strong proponents of the view that there are innate structures ahead of time. We’ve evolved to have certain structures. Pure statistics will not explain certain types of errors and rates of learning that humans show when they’re learning a language.
The real question for your audience is, “Does it matter?” Suppose we had that structure and we knew what that was. We could train these things faster. However, if you just care about having a ChatGPT or other text generation system, another approach would be to flood the system with billions of examples, and it will abstract out and learn most of these kinds of regularities that surely come innate with us.
I saw an interview with Noam Chomsky, who was dismissive of ChatGPT. In a way, it’s statistics but it’s pretty impressive. We talked a little bit about it and referred to it but you worked for several years for a camera company.
Ricoh as chief scientist.
You mentioned the lipreading system. We alluded to it but how does that compare with what we’re doing and what you were doing back then?
We were doing a number of projects. One was the infinite memory multifunction machine. If you think of copiers and faxes, wouldn’t it be nice to store everything that ever came through your office? The question is, “How do you find it? How do you search for it? How do you organize it,” and so forth. There’s AI associated with that. I started a group there on computational sensing and imaging. I later took those ideas to Rambus Corporation. That’s the idea that you can design.
Typically, for many years, people have designed optical systems to get the highest quality optical image. There’d still be some problems with it but then you could fix those with some image processing. Maybe if your goal was to recognize a face, you would do computer vision on the end. The computational imaging revolution or approach is saying, “Maybe we don’t need to get a high-quality traditional optical image if we get the information that’s useful for the task at hand.” Looking for the color of skin, for instance, that’s going to be helpful in finding faces.
That’s how they do it.
There’s skin, features, and so forth. I worked for many years on simplifying the optics in very special ways so that the optical image that gets captured contains the information that’s relevant for the task at hand, like recognizing a face. Or imagine you’re reading vertical barcodes. You don’t care to get the image sharp up and down. You just need sharp left to right. You would design different optical systems. We designed a very small camera that didn’t even use lenses. It used what’s called a diffraction grating, a structured piece of metal where the light would come through and diffract in very special ways that had mathematical properties that we could undo in the processing.
One feature of what you were doing, if we go back to what folks are doing and it’s more complicated than this, but there’s a basic principle that gets to be applied over and over again and many layers with the AI, people are using these deep systems, versus some of the things that you were setting up. People still have to set up things like this. Where you had to meticulously go through and think, “How would I read lips? There’s an image and auditory component.” You were talking about this previously outside of the interview. In some ways, you were either not using the auditory component or considering it separately to do the lip reading.
You have auditory processing, taking the sound that you hear from the speaker, as well as the video. The question is, “How do you put those together?” The simplest way is to get the acoustic signal and make a classification. Let’s say we’re just dealing with recognizing numbers. Did he say 5, 6, or 7 and then you can get one from the vision and watch his lips, get an estimate there, put them together, and say, “Given those two, what’s the most likely?”
Those are the separate audio and video. That does not take advantage of the fact that when you’re doing certain lip motions, you’re going to get certain sounds and learning those earlier that they’re features that are audio-visual. Instead of having a separate audio and a separate video, you make a hybrid system where the features themselves have some audio aspects and video aspects and then do that. That turns out to be a little more accurate.
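The simpler of the two approaches, classifying from audio and from video separately and then combining the estimates, can be sketched as below. The probabilities are invented, the function name is hypothetical, and multiplying the two estimates assumes the audio and video evidence are independent given the spoken digit; the hybrid audio-visual features described above go beyond this.

```python
def fuse(audio_probs, video_probs):
    # Multiply per-class probabilities and renormalize so they sum to 1.
    combined = {
        label: audio_probs[label] * video_probs[label] for label in audio_probs
    }
    total = sum(combined.values())
    return {label: p / total for label, p in combined.items()}

# Hypothetical classifier outputs for one utterance.
audio = {"five": 0.5, "six": 0.4, "seven": 0.1}   # the ear slightly prefers "five"
video = {"five": 0.2, "six": 0.6, "seven": 0.2}   # the lips suggest "six"

fused = fuse(audio, video)
best = max(fused, key=fused.get)
print(best, fused)
```

Here neither channel alone is decisive, but together they favor "six", which is the "given those two, what’s the most likely?" step.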
I can talk about lip reading for a long time but there are some very cool aspects to it. The utterances that are difficult to distinguish by ear are easiest by eye and vice versa. For instance, ba and pa are visually indistinguishable. They’re called visemes, like phonemes. They’re very easy to distinguish acoustically. There’s something called VOT, the Voice Onset Time, the delay between the burst sound and the vocal folds going, “Pa,” they’re together. “Pah,” there’s a delay. It’s very easy. There are examples and vice versa. They’re interesting.
A visual acoustic phenomenon like the McGurk effect is where you take a video of someone saying, “Ma,” but the sound that is spliced with it is the sound of someone saying, “Ga,” a so-called back-consonant. If you’re watching it, you hear, “Da.” It is mixing in. It averages the back, the front, and the middle. It’s cool. You can have this thing going again. You hear it saying, “Da, da, da.” You close your eyes so you don’t get the visual and you hear it say, “Ga, ga ga. Da, da, da,” by opening and closing your eyes. Our computer system had the same perception.
What’s interesting about the way that you were analyzing things is you learned how complicated the human perception system is. You don’t realize that but you’re integrating auditory and visual. You’re even processing some other levels of things.
Context and that hits you in speech. There’s something called the Van Santen Effect. It’s best to explain by an experiment. If I have an utterance like a wheel on the axle but imagine I put in noise at the beginning, you will hear the W. It’s not what could go there. It sounds like a W, even though the prime is much later but if I say, “Heel on the shoe,” you say, “I heard an H. It’s the heel on the shoe.” There’s context and knowledge at many layers. It’s very fascinating.
To demonstrate where things are different in some of the models that are being used, there’s a possibility we imagined. You could train like a speech recognition system using this multi-layer deep network system. Nobody would have to think through those details necessarily.
Someone at Johns Hopkins said, “Every time I fire a linguist and hire a statistician, my accuracy goes up.” In the era when large data sets are available, it doesn’t matter that much which algorithm you use. Whoever has the most data wins.
Here’s a question that’s probably asked all too much and you may or may not have an opinion on. We didn’t talk about this previously so you could wave it off. Any thoughts about what consciousness is and whether AI has access to it?
This is the hard problem. What people are interested in is the hard problem. How do the intangible subjective qualia of the red of that book arise from the material processing of brains or computational systems? I don’t have an answer to that. The best philosophers don’t. I’m hard-pressed to imagine even a candidate answer. Imagine I could tell you everything about the brain like, “This neuron is firing at this rate. We can do that. I see red.” Imagine we do that in decades of research.
When we get that, does that answer it? I don’t think so. It’s still another qualitatively different thing from the processing itself. I urge people to work on it. I don’t see how they’re ever going to solve it. I will read every paper that comes out that claims to solve it but I also don’t think it matters that much because it’s the processing and the actions that people are going to be using, whether or not it’s conscious. I don’t know. I don’t spend much time on that. Be leery of people who think they have an answer to it.
In your experience, what has been the most significant development in AI, particularly in the areas of machine learning?
My friends Yann LeCun, Geoff Hinton, and the other pioneers in deep learning have revolutionized AI and machine learning. We can deal with data sets that involve one billion images of faces, for instance, a whole scale expanding beyond anything we dealt with when I started in three-layer neural networks back with David Rumelhart. That is the most important.
These systems have more layers.
Many dozens. Sometimes over 100.
To give context, it’s not necessarily one billion layers.
ChatGPT has something like 1.7 billion free parameters. Think of them as the connection weights. It’s a very large network. This is too technical but from a traditional statistician’s viewpoint, when you have degrees of freedom, you traditionally need a certain number of training patterns per degree of freedom. Somehow that doesn’t seem to be the case with these large networks. There are redundancy and biases built into these networks. That means that, as large as our training sets are, there aren’t many patterns per weight, yet the constraints are there. How these systems work is not fully understood.
It is like you’re trying the system and it works but you don’t exactly know why.
People often worry about interpretable AI: “Can we get this AI to explain itself?” I am not that interested in it because, frankly, humans aren’t very good at explaining. I teach my class at Stanford on Pattern Recognition. I start with the case of recognizing a chair, “What is a chair?” Someone will say, “It has four legs and a back.” I have a wonderful Taschen book called 1000 Chairs. I don’t even have to look. I open it and say, “I’ll bet that doesn’t conform to your definition. It only has a single leg. It’s hanging from above.” The question is, “You can recognize it’s a chair but how do you do it?”
The whole point is if I pushed you against the wall, explain what a chair is that would apply to every chair that everyone would agree as a chair, you can’t do it. You get too vague. It’s called functionalism. You say, “A chair is something that supports sitting.” “Yes. How can I tell whether something supports sitting?” I understand for legal reasons it’s helpful to know when a machine makes a mistake, how you can debug it, and so forth. It has interesting scientific aspects to it but it’s not a problem I’m going to work on. When I build my systems, I don’t worry, “Will it be able to explain why it can recognize this painter versus that painter?” Not really.
In essence, in the end, AI seems like it’d be better than us in terms of speed, especially scope. At the same time, it’s the limitations of doing what we can do on our limitations. We can only define chairs the way that we can.
We can give billions of examples of chairs even though we can’t explain what it is about that image that makes me understand that it’s a chair. The computer is properly trained with enough data. I’m not so worried that it won’t be able to say, “It’s because of the conjunction of this curve with that color and this material.” I’ll leave that to someone else.
We haven’t talked about this yet but there are things that scare people about AI. You could think about this in a scary way or maybe in a good way. What do you think about AI replacing jobs? How much are we going to see that?
You should talk to an expert on that rather than me. I do think that there are going to be many repetitive tasks that get more replaced by AI. The real question, and I’m not the expert, is, “Will that free people to be much more productive using these tools or will it put them out of work?” I’m not expert enough in that realm to inform your readers.
Only time will tell.
I want to talk about art.
That leads me to my next question. I would love to talk about your book. As I’ve talked to you about it, it’s something you’re very passionate about. You’ve written textbooks. Not that you’re not passionate about those subjects but this feels like a personally rewarding project. Maybe you could walk me through it.
The overarching mission and vision is to use computation, computer vision, and machine learning AI as a tool to help art scholars understand and interpret fine art paintings and drawings. Whenever I talk to art scholars, they get very defensive very quickly. I allay their concerns by saying, “No. You will still make the decision. A microscope is a tool for biologists and a telescope for an astronomer. These computational tools will be able to help you address problems you’ve never done before and perceive more accurately than even the best human connoisseurs, scholars, or people off the street. For instance, lighting analysis.”
I’m going to be talking about lighting analysis in our open house here at Djerassi. Also, my work on Vermeer. It turns out humans are not very good at looking at an image and saying, “Where the light is coming from.” That’s why many so-called tampered photographs and fake photographs get by us. Someone will take a separate picture of Angelina and Brad. They segment Brad out and paste it in with Angelina. It will look perfectly good to most people but there’s no guarantee that the lighting directions on them in the individual photos are the same.
Hany Farid and others have developed techniques, such as the occluding contour algorithm, that are far more accurate than any human at detecting these inconsistencies. My colleagues and I were the first to ever apply these to paintings to look at, for instance, the accuracy of the lighting in the works of Vermeer like Girl With A Pearl Earring.
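The core idea behind estimating lighting from an occluding contour can be sketched as a small least-squares problem: along the outline of an object the surface normal lies in the image plane, so the measured brightness at each contour point constrains the direction of the light. The version below is a simplified illustration under stated assumptions (a single distant light, a matte surface, a constant ambient term, only lit contour points), run on synthetic data; the function names are hypothetical and this is not the published implementation.

```python
import math

def solve_3x3(m, v):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def estimate_light_direction(normals, intensities):
    # Least-squares fit of: intensity = nx * Lx + ny * Ly + ambient
    rows = [(nx, ny, 1.0) for nx, ny in normals]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, intensities)) for i in range(3)]
    lx, ly, ambient = solve_3x3(ata, atb)
    return math.degrees(math.atan2(ly, lx)), ambient

# Synthetic contour: light arrives from 30 degrees, ambient brightness 0.1.
true_light = (math.cos(math.radians(30)), math.sin(math.radians(30)))
normals = [
    (math.cos(math.radians(a)), math.sin(math.radians(a)))
    for a in range(-40, 101, 20)      # contour points that all face the light
]
intensities = [nx * true_light[0] + ny * true_light[1] + 0.1 for nx, ny in normals]

angle, ambient = estimate_light_direction(normals, intensities)
print(round(angle, 3), round(ambient, 3))
```

Running the same fit on two figures in one image and comparing the recovered angles is the kind of consistency check described above.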
No one’s saying an artwork is better if the lighting is consistent. We’re understanding the artwork as it is. Many artworks have very inconsistent lighting, like those of the surrealist René Magritte, for instance, which we’ve worked on. You say, “This cast shadow is here.” It’s not making normative judgments. It’s not a bad painting because of that but these computer techniques can show it in ways that even a good connoisseur can’t by eye.
That’s one of the many examples of the kinds of tools we’re developing to help art scholars analyze paintings. I’m working on this on one of our papers. One of my students, Jean Paik Chow, worked on estimating the pose in portraits. You have the angles of the head, the roll, the tilt, and the yaw. Art scholars are interested in this because it is part of the aesthetic import of a painting. Is the person looking straight at you? Is it a profile view like this and so forth? They’ve been doing it rather informally, “It looks like a three-quarters view and stuff like this.” Not only can our system estimate all three angles very accurately but it can do it on thousands of paintings in a few minutes.
If you’re interested in how portrait pose has changed over the last hundreds of years, you could either take a few years and have a grad student go through them one by one, or you could use our software and get these plots very quickly. You can see how it changes in Rembrandt’s time over his life. You can even tell whether an artist is right or left-handed. If I’m a portrait painter and I’m doing a self-portrait, and I have my mirror, if I’m right-handed, I’m going to put my canvas over here. If I’m left-handed, it’s like this. There’s going to be a bias.
Out of 11,000, we can find which artists are right and left-handed. That’s not Earth-shattering history. Everybody went, “Was this artist left-handed?” Leonardo was left-handed. We’re working on adding gender identification. Are the portrait poses of women different from men over time? All those kinds of questions. It’s the tool. The art scholar says, “I’m interested in this.” Here we have a tool that says, “You can do it on 11,000 paintings in 5 minutes.” Get used to it, and there are yet others.
Any other particular chapters, topics, or things that you’re interested in calling out?
My deepest interest in AI and art centers on semantics. There’s a branch of traditional AI called semantic image analysis, where you take a photograph and the AI will produce a caption. There’s a man on a bicycle. He’s pushing his daughter and there’s a mountain in the background. That’s great. It’s quite remarkable that you can train a system to do this. That’s just describing the surface level or what is depicted. It’s not addressing the problem of why the artist made it or the message or the meaning of the painting. That’s where art is so much richer than natural photographs or cell phone photographs.
Our first work was on Dutch vanitas paintings. In the Dutch golden age, from 1650 to 1700, roughly, there were many still lifes that would have a skull and a candle with the flame out. You could see the smoke. Also, a book, a musical instrument, and so forth. Traditional AI, and I’ve run these systems on it, will be able to say, “There’s a skull,” but it misses why the artist made this, which is to impart the lesson of vanitas. “Don’t concern yourselves with the pleasures of this life. Be prepared. Live a humble sober life so that you will have eternal salvation in heaven afterward.”
These were called painted sermons. Everyone understood them. Everyone was educated in the Netherlands. AI can’t solve those yet. We’re getting AI to figure this out for a very simple and restricted class. We are decades from being able to take a painting, get a scan of it, and have a two-paragraph answer to, “What does this Picasso mean?” It’s culturally dependent and there are multiple meanings but some are more reasonable than others. Some go to what the artists explicitly say. Those are the kinds of things we’re after.
It would also be interesting if you could get it one day. We’ll get there and see how far where you could say, “Here’s his background and what he thinks about.” What does this mean to him?
It’s in my book. There is a work on emotion but is this painting upsetting or does it make you interested and so forth? The AI can learn from a training set. In other words, if you see lots of paintings with bright colors, you’ll think of energy. An AI can learn that reasonably well. That’s a categorization like, “This is a painting that involves fear or something like that.”
I’m much more interested in a deeper meaning rather than the classification. That’s why art is so much more interesting to me than photographs. Yes, there are some art photographs that come to the fore but the billions of photos on the internet that I’ve been taking with my cell phone don’t address that level of depth of the greatest artworks like Las Meninas and things like that.
I want to go on to our next segment, AI Wants To Know, in a moment but I thought I’d share with you because we had a pre-interview conversation and we discussed this. I was curious if ChatGPT could have any insight into one of your paintings from its understanding of the context. Intentionally, I was like, “Harmen Steenwijck Vanity of Human Life Allegory Painting.” Look what ChatGPT said, “I’m sorry but I couldn’t find any specific information about a painting titled Harmen Steenwijck Vanity of Human Life Allegory. It’s possible that you may have misspelled the artist’s names.”
This is what it said, “Without more specific details, I can’t provide a detailed analysis. However, I can give you a general understanding of allegorical paintings and the concept of vanitas. It is often associated with symbolic representations and the transience of life and the futility of earthly pursuits. Allegory paintings are works of art that use symbolic imagery.” It’s very interesting.
When we developed our AI system, we trained it on the texts that were specifically addressing these paintings. We had them translated from old Dutch and then modern ones in English. The words skull and mortality were often in the same sentence. We trained what’s called a knowledge network between certain objects and concepts. ChatGPT may have that. I’m going to put it in accurately and see how far it goes. It’s a good start. It’s a foundation. We want to train it with more art knowledge, context, and things like this. We’re moving.
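The idea of linking objects to concepts because their words share a sentence can be sketched in a few lines. This is a minimal illustration only, not the system described above: the vocabularies, the toy sentences, and the function name `cooccurrence_counts` are all invented for the example, and a real knowledge network would be trained on the actual art-historical texts.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical vocabularies and toy sentences, invented for illustration only.
OBJECTS = {"skull", "candle", "book"}
CONCEPTS = {"mortality", "time", "knowledge"}

TEXT = (
    "The skull is a reminder of mortality. "
    "The snuffed candle marks the passing of time. "
    "A skull beside a candle speaks of mortality and time. "
    "The book stands for worldly knowledge."
)

def cooccurrence_counts(text, objects, concepts):
    """Count how often each (object, concept) pair shares a sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        # Every object word in the sentence is linked to every concept word in it.
        for pair in product(words & objects, words & concepts):
            counts[pair] += 1
    return counts

edges = cooccurrence_counts(TEXT, OBJECTS, CONCEPTS)
for (obj, concept), n in sorted(edges.items()):
    print(f"{obj} -> {concept}: {n}")
```

The counts act as edge weights between objects and concepts; pairs that never share a sentence simply have no edge.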
Any final advice for a common person on how to best embrace this coming AI wave? He’s a scholar. He’s not a life coach.
Be fascinated by it. Other people are better placed to talk about the dangers of these and there are real dangers associated with this, unintended consequences, like ways things can go wrong. I do look forward to Chuck Schumer’s legal approaches towards regulating and so forth. The bigger danger is bad actors. Even if we are responsible in the academy, AI corporations, and so forth, there are bad actors out there. We need to use AI. The only way I see to fight deepfakes and disinformation is with an arms race.
The best scholars are working on how to detect and root out the kinds of things that we know are going to be coming. You probably saw the Deepfake Zelenskyy videos where he says, “Fellow Ukrainians, put down your arms. We’re done.” When that starts getting out, there are real dangers. There are much more versed scholars in that realm than me.
The next segment is going to be a little bit of fun. It’s called AI Wants to Know. Some of our fellow residents here helped me tweak the questions so it’s extra fun. AI is curious and so are we. These are ten quick questions designed to uncover the intriguing mysteries that AI loves to comprehend but can’t quite grasp. It’s a snack break in our journey. Keep the answers quick but the safety belt sign is also off. Let’s explore more about who you are and what makes you tick. One, what’s the first thing you ever remember being proud of?
I played music as a child. There were a few times when I was proud of being able to complete a very simple Bach piece as a kid on the piano. I wiggled my fingertips.
We didn’t get to cover that much at all but you’re a percussionist. You played it at the orchestra level. Question number two, what do you need help with that you wish you didn’t?
I love to write. I find it immensely fulfilling but it takes so much longer. Every sentence in my 781-page book has been edited at least ten times. If I could speed that up, I would love to have that.
Question number three, what do others often look to you for help with?
I teach at Stanford. Everyone in my class is like, “How do I get an A?” I’m teaching Fourier Transforms. They’re going to say they want to learn from me on those kinds of stuff or Computational Symbolic Mathematics. I teach a course on using Mathematica to solve. As a teacher, I could go on for all the courses I’ve ever taught.
What I can provide to a small group of people is how to integrate science and the arts because I’ve been working on this for decades. It’s not obvious. When it’s appropriate, how you talk, the different languages, how you answer questions that matter to art scholars in ways that maybe the computer scientists can’t, and things like that. That’s how I got in here.
Question number four, what do you treasure most about your human abilities?
The ability to look at a painting for sometimes hours and see, understand, be fulfilled, be enriched, and be challenged more. It all has something to do with my eyes, vision, and seeing.
Question number five, throughout your whole life, what is the most consistent thing about you?
Fascination with the problem of seeing, recognition, pattern recognition, and the mystery of how we do this. Many people have had this little epiphany at age seven like, “How do I recognize that’s a car? There’s an image of a car on my retina but that’s not a car. Who’s looking at that?” No, that doesn’t work. Once you confront that like, “If it’s not that, then what,” you have to work in this field. It grabs you. That appears in all of my technical work, starting as an undergraduate at some level.
Question number six, throughout your whole life, what has changed the most?
I am a great fan and follower of Steven Pinker, who’s shown that everything’s getting better. Violence has gone down. Freedom has gone up, freedom of speech, and all these kinds of things. I’m glad that those trends are continuing. I’m disappointed that not enough people are as optimistic as I am about the future, including the hard problems like global warming, authoritarianism in the US, and so forth. I’m incredibly optimistic. I’m not sure that answers your question. Things are getting better.
Question number seven, what do you find strangest about reality?
It was Hermann Weyl who said, “The unreasonable effectiveness of Mathematics in describing the real world.” We can make sense of it. We keep learning more. It makes sense that we’re getting a coherent picture. Yes, there are hard problems like string theory, dark matter, cognition, and how-I-see-red kinds of things. There are plenty of unsolved problems. Where did life come from? It is amazing that this inexorable application of scientific methods toward increasing our knowledge works.
I taught a little bit of statistics myself. I remember teaching about normal distribution. There is that bump and there’s an equation for that bump. It’s got pi. I’m teaching these undergrads in Psychology and I can tell they’re not that impressed but I’m like, “Come on. Wouldn’t that be crazy if you discovered that there’s a normal distribution of things statistically happening in these numbers?”
It happens with many things like IQ.
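For reference, the equation for that bump is the standard normal density, written here in conventional notation that is not from the conversation itself: \mu is the mean and \sigma is the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The \pi sits in the normalizing factor, which is what makes the area under the curve equal to one.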
A few more questions on this list. Number eight, when, most recently, do you remember feeling alive?
It’s when I drank my orange juice. Looking out on this incredible vista, I feel alive all the time, except when I’m asleep.
Question number nine, what do you think is your most unique trait?
My thesis advisor said, “Your greatest asset is your greatest weakness. You have very broad intellectual interests,” and you alluded to it. I’ve had faculty positions in ten different departments or programs. That’s rare. Being a fellow of the Optical Society and SPIE is a wide range. It’s the breadth and not shallow. It’s finding the problems that are deep that require understanding from disparate fields.
If you want to talk about the meaning of paintings, you got to know Art History. You have to have studied Art History. I have this many books on art history at home. I’ve spent tens of thousands of hours in museums, as well as all the computer stuff. It’s not, “A little bit of this.” It’s a lot of this that requires knowledge from around. I know other people like this but I would say that’s one of my good aspects.
Question number ten, if you weren’t human, what would you be?
I’m a seaplane pilot and glider pilot and I love soaring so some sort of hawk.
That concludes our AI Wants To Know segment. The next segment is AI Leaders and Influence. This allows you to highlight some leading individuals, projects, and organizations that might influence you. Could you tell us maybe leaders in the world of AI, whom you’d love to see come on the show?
It’s more the projects than the individuals. There are names associated with them but you have a whole group. OpenAI, Google, Facebook, and all these groups have very strong, powerful, interesting, and productive leaders and groups. I am most interested in images and image analysis. Fei-Fei Li from Stanford was one of the first who collected these massive data sets that made these things available. I won’t go through all of them with the who’s who.
It becomes political. Some people haven’t realized how much has been going on behind the scenes at places like Google, Facebook, and Amazon. I remember when I was studying, I was surprised but I thought it was interesting seeing some of the students I knew at Columbia who were studying Theoretical Neuroscience or something going on to work for DeepMind. DeepMind got acquired by Google.
One of my students worked there.
It’s been developing for a long time. Segment number four is AI Resource List. This is a chance for you to share resources you might utilize in AI. You’re very deep into technical stuff. It could be websites, applications, books, podcasts, or learning tools. Do you have any ideas that come to mind?
There are some good online courses, very good books, and open software that you can use. Anything that I use would be too specific to my interest in art analysis for the average reader.
Tell us then. Let’s say I’m interested in getting into what you’re doing. What would be a book you’d recommend?
It’s my book, Pixels And Paintings: Foundations Of Computer-assisted Connoisseurship. I have 50 papers on computer analysis of art. My conference is called CVAA, Computer Vision and Analysis of Art. Look at the papers there. There are a couple of groups around the world that are doing very interesting work, like Ahmed Elgammal at Rutgers, Ingrid Daubechies at Duke, and a few others in Europe and Germany. That’s if you want to work on art. More people should do it.
Before meeting you, I wasn’t very aware of this whole field. There are a lot of interesting things going on. A lot of research that’s led by interest falls back into all these practical applications, I can imagine.
I should say how I got interested in using my deep interest in pattern recognition and then applied it to art. I come from a family steeped in the arts. I won’t go through them all but it runs from my great-grandfather, who was a court painter to Crown Prince Rudolf in Austria, down to my little sister, who was chief calligrapher in the White House under Bill Clinton, with lots of artists in between.
I was more interested in science and how you put those together. Around the year 2000, a very famous British American artist, David Hockney, came up with a bold and very controversial theory that some Renaissance painters secretly projected images onto their canvas, traced them, and then filled in paint. I was invited to do some technical analysis at a very large two-day symposium at the New York Institute for the Humanities.
I came with an open mind and thought, “This is interesting,” and then, analysis after analysis, simulation after simulation, all the arguments evaporated. Art scholars came up to me and said, “This is fascinating. I didn’t know you could do this. Have you thought about this? What about this? I have this problem.” That’s how it expanded. That theory is dead as a doornail, that Jan van Eyck secretly built a projector to do the Arnolfini Portrait. It’s not true but it led to the development of all these techniques, especially in lighting, perspective, optical analysis, and so forth. That’s what my book is all about.
The last segment is AI Tips. Any more cool ways you use AI or you see people using, we might not have explored, things that people may not have realized are possible or available through AI, or anything like that?
I’ll start with the negative. I prevent my students from using ChatGPT on their exams, any papers, or anything like this. Some professors say, “You can use it. Just let me know.” I want people to express things in their ways. There’s more prohibition than how to use it. We’re using it all the time. Call centers, what gets mailed to you, and what advertisements get served are all AI. In movies, it’s everywhere. All those thousands of warriors who are battling on the field are all done with generative AI. It’s everywhere. I have no other special things to impart, I’m afraid.
You raised an interesting tip. In a previous conversation, there are multiple sides to this that you could highlight. There is value in learning how to do something even if you can have someone or something else do it for you.
This is a very interesting thing because I teach a course at Stanford called Computational Symbolic Mathematics. It’s using the computer to do the calculations that allow us during class to focus on how you pose a problem versus how you do the calculations to get the final answer, like Calculus. There are all these integrations by parts and partial fractions, the trig substitutions, and the whole host of techniques. More and more, we can let a computer do that kind of calculation. The question is, “How much do we benefit by spending all this time and learning to do a technique that the computer will do?”
I’m of two minds on this. Yes, if it were no cost, you should learn how to do this. At the expense of other things, given that you have finite time, it’s not so obvious anymore. You have kinds of problems that you can address by using symbolic mathematics. I use Mathematica, which is a superb software framework for computing. The kinds of mathematical problems you can solve like the symbolic ones, not the numeric ones, are astounding.
If I had this when I was an undergraduate, I’d probably be doing General Relativity. I took General Relativity Gravitation and Cosmology in grad school. I would do pages of calculation. On page three, I missed a minus sign or there was a factor of two somewhere. It would take hours trying to find where that error is. Once you know that the answer should be this, you have some but now you do it with symbolic mathematics. It allows you to focus on, “What do the equations mean? Let me try different things.”
For instance, when I’m teaching my course at Stanford on Fourier Transforms, many of the mathematical steps will be done by computer. In other words, Fourier Transforms bracket this function. “Let’s understand this and look at the limits. Let’s try how it depends on this parameter.” You don’t have to do it all by hand. At least those students will have gone through calculation by hand so it’s not quite a fair comparison but more and more Mathematics is going to be done symbolically on a computer like dividing large numbers. You don’t do that by hand anymore.
When we say symbolically, it’s graphs and images that help explain that concept.
It’s not necessarily graphs but what’s the integral of the sine of x squared? You can’t do this numerically but you can do it symbolically. There are algorithms that know how to do the transformations to get rid of these integral signs and get a final symbolic answer or differential equations. The second derivative of f with respect to x, plus x times the first derivative. Find f of x.
You can’t do that numerically but you can do it symbolically. When I first saw that, I knew exactly where I was sitting in grad school when someone first showed me this and it was magic to me. You think of computers dealing with numbers but this is dealing with symbols. This is a long time ago but it’s an extraordinary development. I love this stuff.
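The two spoken examples can be tried in any computer algebra system. The sketch below uses the open-source SymPy library rather than Mathematica, which is a substitution made here for illustration. It also rests on two assumptions: that "the sine of x squared" means sin(x^2), and that the differential equation, which was left unfinished as spoken, is set equal to zero.

```python
import sympy as sp

x = sp.symbols("x")
f = sp.Function("f")

# Assumed reading of the spoken integral: sin(x^2), not (sin x)^2.
antiderivative = sp.integrate(sp.sin(x**2), x)

# Assumed completion of the spoken equation: f''(x) + x * f'(x) = 0.
ode = sp.Eq(f(x).diff(x, 2) + x * f(x).diff(x), 0)
solution = sp.dsolve(ode, f(x))

print(antiderivative)  # a symbolic expression involving a Fresnel integral
print(solution)        # a symbolic general solution for f(x)
```

Both results come back as formulas rather than numbers, so they can be differentiated, plotted, or examined for how they depend on a parameter.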
I’ll summarize this. You can correct me if I’m wrong but as AI evolves, be very intentional about what you want to leave to the computer and what you want to learn. Don’t ignore either side of it. Don’t ignore the fact that you could gain a much deeper insight into something by letting the computer assist you. Don’t necessarily write your paper with ChatGPT because you’re going to miss out on developing your space. The final question here is, where can the readers learn more about you and the projects you’re working on?
I have a Wikipedia page, which is not very up-to-date. Go to Google Scholar and download my papers, Academia.edu, or any of those places. Also, my conference, CVAA, or Computer Vision and Analysis of Art. Wait for my book to come out in September 2023.
It’s time for another safe landing here. On behalf of our guest and the entire crew, I’d like to thank you for choosing to voyage with us. We wish you a safe and enjoyable continuation of your journey. When you come back aboard, make sure to bring a friend. Our starship is always ready for more adventures. Head to Spotify or iTunes. Rate us and share your thoughts.
Your support and feedback mean the world to us. Don’t forget to visit EdgeOfAI.xyz to learn more. Connect with us on all major social platforms by typing @Edgeof_AI. Enjoy the exciting conversations happening online. Before we set off, mark your calendars for our next voyage, where we’ll continue to unravel the mysteries and advancements in AI. Until then. We’ll see you next time.