Hello and welcome to Deep Learning Applications in Computer Vision. My name is Joanna Fleming and I'm a professor at the University of Colorado Boulder. Together, we will explore deep learning applications in computer vision. But first, in Lecture 1, we will do an introduction to computer vision. What is computer vision? What does the field concern itself with? Then we're going to dive a little bit into one of the tasks that computer vision is trying to solve: recognition. Enjoy. This is our first lecture and we're going to jump right in. You've probably been asking yourself: what is computer vision? Of course you asked yourself, when you decided to take this course, what is computer vision? What am I going to learn about? Well, computer vision is a hot area of research, and it has been around since the 1960s, which seems like a long time ago. Computer vision is focused on analyzing images and finding meaning in them. We are going to write this down: we want to analyze digital images, and we're going to underline that for a second. What do we mean by digital images? A digital image could be, for example, just one image, but it's important to look at all the kinds of data computer vision works with. It could be a sequence of images, which is basically a video: a video is a sequence of images. It could be multiple views of a scene. Imagine, for example, that you have a scene and you either have views from multiple cameras, from multiple viewpoints, or you have one camera that you move around the scene, taking multiple pictures. You can think of a sequence of images, or a video, as multidimensional data. We can also have multidimensional data from a different source: from a 3D scanner, for example, or from a medical imaging system such as a CT or MRI scanner. You can think about it that way.
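To make "multidimensional data" concrete, here is a minimal sketch, assuming NumPy is available, of how these different kinds of digital images are typically represented as arrays. All of the shapes are made-up examples, not values from the lecture:

```python
import numpy as np

# A single grayscale image: a 2-D array of pixel intensities (height x width).
image = np.zeros((480, 640), dtype=np.uint8)

# A color image adds a channel axis: (height x width x 3) for RGB.
color_image = np.zeros((480, 640, 3), dtype=np.uint8)

# A video is a sequence of images: (frames x height x width x channels).
video = np.zeros((30, 480, 640, 3), dtype=np.uint8)

# A CT or MRI volume is a stack of 2-D slices: (depth x height x width).
ct_volume = np.zeros((120, 512, 512), dtype=np.int16)

print(image.ndim, color_image.ndim, video.ndim, ct_volume.ndim)
```

The same algorithms often generalize across these cases precisely because each one is just an array with one more (or one fewer) axis.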
There are many ways to acquire digital images and to look at data from one image or from multiple images. Another thing we're going to write on this first page, because it's important to remember, is that computer vision is an interdisciplinary field. Sometimes you have to work with people from other disciplines, as in the example with the medical images: if I'm trying to write an algorithm that needs to detect a certain thing or a certain condition, I'm going to need the expertise of doctors or biologists to figure out how to write my algorithm. Now, I wrote here as a summary that computer vision is focused on extracting information from images, but some people think of computer vision as the entire pipeline, all the way from acquiring, processing, and pre-processing digital images to analyzing and interpreting them; it's all computer vision. Now, a question I get a lot: are we trying to write computer vision algorithms that mimic what our human eyes and our human visual cortex can do? Well, it's yes and no here. On one hand, I can tell you that we are trying to learn from what the human visual cortex does; there's a pun there, because we are introducing learning to mimic how we learn as humans. As a human, you look at an object and you recognize it because you have seen it multiple times. If it's a new one, you quickly make a mental note that it is a new X, say a new kind of turtle that you haven't seen before. This learning that humans do, the way they remember things they have seen, comes not just from your experiences from birth; it comes from genetics. It comes from your parents, it comes from your ancestors. If we encounter a familiar person or object, we are able to see only a small portion, a glimpse of that person or that object, and our brain fills in the rest; we know exactly that we are looking at that familiar object. Unfortunately, even though computer vision is one of the hottest areas in computer science and artificial intelligence research, it still has limits.
It cannot yet compete with the power of the human eye. We'll get to the areas of interest, the sub-disciplines of computer vision, but I want to touch on one more thing. When we talk about machine vision, we are basically talking about computer algorithms that extract information from images. But what makes a machine vision system what it is? It's that we use the information extracted from images to make decisions. We have the image, we have the output of the computer vision algorithm, and that output is used to make a yes-or-no decision: if this, do that; if not, do something else. That's what basically powers machine vision. Now, we said we're going to look at areas of interest. What problems is computer vision trying to solve? The one we're going to talk about today, probably the first task, the first sub-domain of computer vision, is recognition. Recognition: there's a car in this image. We're going to talk about this a little more in a second. Let's look at a few others. Tracking an object: tracking means segmenting, figuring out which portion, which pixels in an image belong to a certain object, and then looking at the same thing in the next image, and the next image in a video. Basically, segmenting the object and then tracking that object from image to image throughout the sequence. Along with it, we have motion estimation and motion analysis. If I am running a self-driving car, then I have a system that needs to analyze the objects seen in every frame that comes from my cameras, actually multiple cameras, not just one. Then I need to figure out what speed these objects have. Am I coming close to crashing into something? Is someone, or some object, coming towards me? Estimation of motion and analysis of motion; that's really cool. Another area of interest we're going to talk about is pose estimation.
There is a person in this image, there are multiple people in this image, but are they running? Are they playing soccer? Are they dancing? That's part of pose estimation. Sometimes you also see this as recognition of a certain activity, so not necessarily an object, but an activity. Let's see other sub-domains of computer vision. Scene reconstruction: can we take a drone, fly it around a scene, take multiple images from multiple viewpoints, and use all the data from these images to figure out a 3D reconstruction of the scene? Along with it, we also have 3D scene modeling. If I fly my drone around some old church in France and take pictures all around it, from all directions and at all heights, or maybe there is some sculpture here, the Venus de Milo, and I take my camera and move it all around it, can I come up with a 3D model of that church, of that sculpture? Then can I do it again in 10 years and find out how much that object has degraded, how it has changed? Another area, visual servoing, also involves drones. Can I take a drone with a camera or two and fly it around an area and basically do surveillance? But not surveillance in the sense of asking whether I see a certain person or detect a certain object in those images; rather, can I build a map of that area as I am surveying the zone? Can I build a map and, at the same time, know where my drone, my camera, is localized on that map, so that I keep building the map in multiple directions as I keep flying and surveying the area? This problem of mapping while localizing is known as simultaneous localization and mapping, or SLAM, and it's actually a really hard task. We're going to add one more here that maybe, just maybe, you haven't heard of or weren't thinking about: image restoration. If I have some old images that have lost some information, lost some quality, can I use computer vision algorithms to restore that information?
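Earlier we said that what turns a computer vision algorithm into a machine vision system is the decision step applied to the algorithm's output. A toy sketch of that idea, where the function name, the score, and the threshold are all hypothetical and the vision algorithm itself is assumed to have already run:

```python
def inspect_part(defect_score, threshold=0.5):
    """Toy machine-vision decision rule.

    A vision algorithm has already produced defect_score
    (0.0 = clearly clean, 1.0 = certainly defective); this
    step turns it into a yes-or-no action on the factory line.
    """
    if defect_score >= threshold:
        return "reject"   # if this, do that...
    return "accept"       # ...if not, do something else

print(inspect_part(0.9))  # a part the algorithm flags as defective
print(inspect_part(0.1))  # a part that passes inspection
```

The vision algorithm does the hard work of producing the score; the machine vision system is the whole loop, from camera to decision to action.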
There are lots of cool domains for computer vision, and we're going to look first at one of them, recognition, along with some of its applications. Recognition, in a nutshell, is asking yourself the question: does my image contain a certain something? That could be a certain object, it could be a certain feature, or, as we were mentioning, it could be a certain activity, and we'll give you some examples. Let's say we're going to look at an image, and by look, I mean analyze it with a computer vision algorithm. We want to find out first: is there a car in this image? If we are doing recognition and classification, our algorithm is going to answer yes: yes, there is an object of class car. That is recognition and classification. If you want to go a little further, to identification, then I'd like to be able to say: yes, there is a car in this image, and it is a Toyota Sienna van. Even more than that, can I localize the Toyota Sienna van in the image? Can I say: look right here, and put a bounding box around my object? There, that's my van. So: recognizing that there is an object of the class car in the image; going further and saying that car is a Toyota Sienna van; and giving the box that surrounds the boundaries of this object. Now, a different form of recognition, if you think about it, is detection, and I'm going to give you an example. Let's say I want to see whether I detect in an image a certain type of cell or a certain condition. Does it exist? Do I detect it in this image? Usually detection is faster, because it often relies on finding just a fragment of the entire object. So: recognition and classification; going further, identifying and localizing; and detecting a specific type of object or a specific condition. Here are some applications, many of which I'm sure you're very familiar with, for example, face detection. You see a lot of applications of face detection.
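The three levels we just walked through, classification, identification, and localization, can be pictured as one structured result, and localization quality is commonly scored with intersection-over-union (IoU) between a predicted box and a reference box. A minimal sketch; the result record and its field values are made-up illustrations, not output from any real detector:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    A standard way to score how well a predicted bounding box
    localizes an object against a reference box: 1.0 means a
    perfect match, 0.0 means no overlap at all.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlap rectangle (which may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# The three levels of recognition for the van example, as one record
# (all values here are hypothetical):
detection = {
    "class": "car",                   # classification: an object of class car
    "identity": "Toyota Sienna van",  # identification: which specific car
    "box": (120, 80, 420, 260),       # localization: bounding box (x1, y1, x2, y2)
}

print(iou(detection["box"], (130, 90, 430, 270)))  # overlap with a nearby box
```

Detection benchmarks typically count a predicted box as correct when its IoU with the ground-truth box exceeds some threshold, often 0.5.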
The picture I included is from a paper that I will post as a reading: the Viola and Jones algorithm from 2001. A lot has been improved in face detection since then, but it is still a good paper that we should probably study. Another type of recognition, optical character recognition, recognizes digits and characters, from multiple alphabets in fact; it doesn't have to be our Arabic numerals, for example. There are a lot of applications of OCR; one of them, included here as a reading, is recognizing license plates and writing in speed camera images, for example. Now let's look at something a little bit different, another recognition task: smile detection. Smile detection is actually a very, very interesting application. I included a link to the reading: can artificial intelligence recognize human facial emotions? One way people think about smile detection is as a form of pose detection: if I look at the position of the eyes, the eyebrows, frowns in the face, the shape of the mouth, those are all what we call pose. Now, why would anybody want to detect smiles? Well, it's a special task in facial expression analysis that has a lot of important potential applications. For one of them, imagine you took a whole batch of photos, but you only want to keep the ones with everyone smiling, or where everyone, in general, has a positive emotion. Or think about an application in analyzing user experience: you visited my museum, you stopped in front of these two exhibits, but this one gave you a more positive emotional response; I can do that with all the people passing in front of the two exhibits and draw some conclusions about how successful my exhibits were. Another one that you might not think about much is patient monitoring. So: photo selection, analyzing user experience, and patient monitoring.
If my patient shows positive emotions, if my patient is relaxed, smiling, laughing, I can draw the conclusion, either in telemedicine or in observations in the hospital, that my patient is feeling better. Very cool applications of smile detection. Here's another application of object recognition: in supermarkets. Not only in the Amazon Go stores, but the Amazon Go stores are the poster child for this application. Can I have a store where I do not need staff, where all I need is a lot of cameras equipped with advanced computer vision algorithms? As customers come in, I can track them. I can track the objects they pick up from the shelves. I can register whether they put them in their cart or put them back on the shelf. As they leave the store, I know exactly what is in their shopping cart, I know automatically what to charge to their account, and they can go home with peace of mind that they have paid for exactly what they bought. It's still a work in progress, but it's making inroads, and I'm sure we're going to see it more and more. Now, this is a very interesting story that I absolutely had to share with you. On the left side is the cover of a National Geographic magazine from June 1985. Award-winning photographer Steve McCurry took the photo in 1984; he photographed a 12-year-old girl in a refugee camp in Pakistan. She instantly became one of the world's most famous, most recognizable faces, yet no one knew who she was. Her image appeared on magazines, on books, on posters, on merchandise. She had no idea about it. The photographer was so intrigued by how this image of this girl seized the imagination of so many people around the world. He credited the success of the photo to the girl's eyes: they were so expressive, you could see the pain, but also resilience and strength, and of course beauty. Steve McCurry wanted to find the girl.
He basically wanted to give her credit for the success of the photo. It turns out it took him 18 years to find her. He found the Afghan girl, and he was absolutely certain that this was the same person, but he needed some way to prove it. So he turned to computer vision: to the specialists and the software that existed at that time, iris scanning technology, one of the biometric identification techniques used by the US FBI. Automatic iris recognition. I included a link where you can go to John Daugman's page; he was the one who performed the iris scan, and you will find both the scans and information about the algorithm behind them. It's really a fascinating story. So here is another form of recognition, automatic iris recognition: looking at the patterns, the lines in the iris of a person. Absolutely fascinating. One more thing I wanted to bring to your attention. I'm sure you're already familiar with many applications of recognition in mobile apps. I'll give you one example that does multiple forms of recognition: Google Lens. We're talking about the Google Lens app. It can definitely identify text and characters, but it also does something really cool: it can translate the text if needed. So if you find yourself in a foreign country, you can point your camera at the menu of a restaurant and the app will overlay the English translation on it, which I think is really, really cool. But it doesn't recognize just text, just characters. It can also identify a plant or an animal. Say you find yourself on a hike and you see some new plant and you wonder what it is; you can point your phone at it and Google Lens will do its best to find what plant you are looking at, or what kind of dog you saw at the park the other day. There are many, many more applications of computer vision recognition, and we'll talk about more sub-domains of computer vision next time.
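Coming back to the Viola and Jones face detector mentioned earlier: much of its speed comes from the integral image (also called a summed-area table), which lets the sum of pixels in any rectangle be computed with at most four lookups, no matter how big the rectangle is. A minimal pure-Python sketch of that building block, using a tiny made-up 3x3 image:

```python
def integral_image(img):
    """Build a summed-area table: ii[y][x] holds the sum of all pixels
    above and to the left of (x, y), inclusive. Viola and Jones use
    this so that each rectangular Haar-like feature costs only a
    handful of lookups instead of a full pixel sum."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x1, y1, x2, y2):
    """Sum of pixels in the rectangle (x1, y1)..(x2, y2), inclusive,
    via at most four lookups into the integral image."""
    total = ii[y2][x2]
    if x1 > 0:
        total -= ii[y2][x1 - 1]
    if y1 > 0:
        total -= ii[y1 - 1][x2]
    if x1 > 0 and y1 > 0:
        total += ii[y1 - 1][x1 - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

This constant-time rectangle sum is what makes it practical for the detector to evaluate thousands of candidate features at every position and scale in an image.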
This concludes our first lecture. We learned about computer vision as a field of study, and we took a closer look at the task of recognition and some of its applications, many of which are things that we encounter and are exposed to every day in our lives. Join us next time as we take a look at motion analysis and scene recognition. Thank you.