Even if you have a great data request, the data requires some work. It's never as clean or as organized as you want. That's not a surprise. Here you can see the framework that we talked about, that data by itself is useless. It's like going to the grocery store and just buying a bunch of groceries. That by itself is useless. We are working toward insights and recommendations, doing something with the data, but at the very beginning of this process we are collecting the data, cleansing it, and organizing it. I'd say first, make sure that you have enough of the right data to either prove your hypothesis (the check mark) or disprove your hypothesis (the X). Make sure you have enough of it to do the job. The job is determining whether or not your original idea made sense. The worst case is, I don't know. The worst case is, huh, I better go back and ask for more data. Huh, I don't know what I have. That's not a very good job of shopping for ingredients. I would like to use the analogy of cooking. This is really the preparation step. For those of you who are foodies and like food and cooking and eating, this is called "mise en place". That's French for getting everything ready, putting everything in its place. "Mise en place", the food version, means you have all the pots and pans and oils and spices, and you cut and clean and you measure everything and it's ready to go. You've maybe seen cooking shows where everything is in little bitty bowls, perfectly measured out. All that's left to do is cook. That's what we're trying to do, but the data version of that. On the bottom, what you'll see is making sure that you have the right quantity and quality of data, reformatting it so it's easy to work with, and then testing it out a little. See what you've got, exploring. This is where you being a little bit curious, you being a little bit rigorous, you being a little bit more professional will save you a lot of drama later on. Why is this important? 
You can read it here at the bottom. Would you start cooking a fancy dinner without checking the ingredients? Of course not. If your boss was coming over to dinner or somebody that you really respect or somebody that you're going on a date with, you are going to check and double-check those ingredients to make sure that you have enough of them and that they're fresh. That's what we're doing right now. I'm going to list out many things. I am not going to explain how you do each one. There are multiple Coursera courses on this, on Excel fluency, on data analysis, things like that. As a consultant, I will say if there are things that I mention over the next three minutes that seem foreign or you're not quite sure about, do some research. One of my favorite websites is Exceljet, E-X-C-E-L-J-E-T. They have many free videos that explain this in three or four minutes each. Wonderful resource. I'm going to run through this somewhat quickly to give you an idea of what "mise en place" looks like with data. First, understand the boundaries. How big is this dataset? How many values? What's the smallest, the biggest, the average? What's an outlier? Make sure that if you have numbers, you can calculate with them. Sometimes when you import data, it comes in as text. When you try to add up the column, it doesn't work. Reformat it so it's saying exactly what you want. One example for me as a teacher at Emory: I download the roster, the list of my students, and it lists them last name, comma, first name. For me, that's not really useful because when I call on my students, I typically call on them by their first name. Having it in that format, it's not great, so I use some Excel to change that. Reformat it into the way that makes the most sense for you. Take out outliers. Let's say you have a list of students and let's say their age as an example. It wouldn't make sense if the age of one of the students was 250 years old. That's impossible. Nobody that I know of lives to 250 years. 
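The checks described so far (how many values, the smallest, the biggest, the average, numbers that arrive as text, and the "last name, first name" roster) can also be sketched in code. The speaker does this in Excel; the following is only a minimal Python sketch of the same idea, not the speaker's method. The rows, the field names, and the helper functions `to_number`, `first_name_first`, and `profile` are all hypothetical, made up for illustration.

```python
import statistics

# Hypothetical roster rows, as they might arrive from an export:
# names are "Last, First" and ages are text, not numbers.
raw_rows = [
    {"name": "Smith, Jordan", "age": "21"},
    {"name": "Chen, Riley", "age": " 23 "},
    {"name": "Patel, Sam", "age": "250"},
    {"name": "Lopez, Dana", "age": "n/a"},
]


def to_number(text):
    """Convert imported text to a float; return None if it is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def first_name_first(name):
    """Turn 'Last, First' into 'First Last'; leave other formats unchanged."""
    if "," not in name:
        return name.strip()
    last, first = name.split(",", 1)
    return f"{first.strip()} {last.strip()}"


def profile(values):
    """Summarize the boundaries of a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


# Build a cleaned copy; raw_rows itself is left untouched.
cleaned = [
    {"name": first_name_first(row["name"]), "age": to_number(row["age"])}
    for row in raw_rows
]

# Only rows with a usable number go into the summary.
ages = [row["age"] for row in cleaned if row["age"] is not None]
summary = profile(ages)

print(cleaned[0]["name"])
print(summary)
```

In this made-up roster, the summary is where the problem shows up: the maximum age comes back as 250, which is exactly the kind of value the boundary check is meant to surface.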
That data point is wrong. I need to take that data point out, or else the average, the distribution, or any other calculation that I do is going to be wrong. Quick tip: whenever you start doing "mise en place" and slicing and dicing and cleaning the data, always keep the original somewhere separate and safe, because you want to be able to audit what you did and what you changed. Also, if you take out that outlier, the 250-year-old student, put it in a separate tab and call it outliers, so you can remind yourself what you took out and why. Then finally, if there are any calculation errors, you clearly need to fix those. A couple more. Wow, these are my favorite. If I ask you, what is your level of Excel fluency from 1 to 10? Ten is like a Jedi expert. One is like a baby who doesn't know anything. Even if you're at a five, just the middle of Excel fluency, you should know pivot, filter, and sort. You should be able to use a pivot table all day, every day. That's a five out of 10 in the consulting world. If me saying that makes you feel a little uncomfortable, that's okay. You just need to practice. Make sure that you have clear data definitions, so that once you start connecting different datasets with a key (they call it cross walking), you're very clear on what's happening. Two last thoughts. Keep going. It's not easy to know when you should stop. One, think about how much time you have. Number two, think about where you are in the project. Is this at the very beginning, where you have more time later? Or is this a critical analysis that has to be done to perfection by tomorrow? A lot of that will help you to gauge how much you should continue tweaking, iterating, and continuously improving this thing. Then, as I said earlier, you want to make sure that you save the original. Here is one exercise I sometimes do when we do consulting training and stuff like that. 
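Three of the habits above (keep the original safe, move outliers to their own "tab" with a note on why, and connect datasets with a key) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the speaker does it in Excel. The data, the `student_id` key, the 0 to 120 age bounds, and the helper names `split_outliers` and `crosswalk` are all hypothetical.

```python
import copy

# Hypothetical data: a student list and a second dataset keyed by student_id.
students = [
    {"student_id": "S1", "name": "Jordan Smith", "age": 21},
    {"student_id": "S2", "name": "Riley Chen", "age": 23},
    {"student_id": "S3", "name": "Sam Patel", "age": 250},
]
grades = {"S1": "A", "S2": "B+"}  # no entry for S3

# Assumed plausibility bounds; pick limits that fit your own data.
MIN_AGE, MAX_AGE = 0, 120


def split_outliers(rows, field, low, high):
    """Return (kept, outliers) as new lists; the input rows are not modified."""
    kept, outliers = [], []
    for row in rows:
        row_copy = dict(row)
        if low <= row_copy[field] <= high:
            kept.append(row_copy)
        else:
            # Record why the row was removed, like a note in an "outliers" tab.
            row_copy["reason"] = f"{field} outside {low}-{high}"
            outliers.append(row_copy)
    return kept, outliers


def crosswalk(rows, lookup, key, new_field):
    """Attach values from lookup by key; rows with no match get None."""
    return [{**row, new_field: lookup.get(row[key])} for row in rows]


# Snapshot taken before any cleaning, standing in for the saved original.
original = copy.deepcopy(students)

kept, outliers = split_outliers(students, "age", MIN_AGE, MAX_AGE)
joined = crosswalk(kept, grades, "student_id", "grade")

print(outliers)
print(joined)
```

Returning new lists instead of deleting rows in place is a design choice that mirrors the advice in the talk: the original stays auditable, and every removed row carries its reason with it.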
How would you describe the data collection process using these cooking analogies? This is what I would say. You need to know what data you're looking for and whom to ask for it. If you can't find exactly what you need, be able to pivot and find similar things, and then as necessary go back to the original person and ask again. You need to validate what you have, that you have enough quantity and quality of data to prove or disprove your hypothesis. Then some cleaning, some reformatting, some cross walking, some slicing and dicing of the data so that you are ready to cook, to do analysis. Some key takeaways. What I would say is for new hires, everything that I have listed here, you need to know how to do it to the point where you could teach it to somebody else. Hands down, get really good at this. Exceljet is a great resource. But if you Google any one of these commands, you can watch a five-minute video at 1.5 speed that will explain everything to you. Always take a step back and think big picture, what are we trying to do? One very smart, crafty, clever way to do that is to look at old deliverables. What are the analyses that were used on a previous project that was very similar? Then once you know that, it's like looking at the picture in a recipe book. Now you know what that cake should look like, and you can reverse engineer the different data you need to make it happen. Also, as a new person, don't be afraid to ask your manager. Now, don't ask your manager one question, then another question, like that. You don't want to bother them every 10 minutes for the whole day. Group your questions. Do the work, write it down, do the work, write it down. Then once you have four or five questions or you get stuck, then ask your manager to help you get out of that mess. The last one I would say is consulting is a lot about communication. Communication is not just with the client, it's also with your manager. 
If your manager likes you, if your manager thinks you do great work, your life will be better. I'm telling you. Share your findings. When you do an analysis or you clean some data and you find out more about it, share it with your manager. If there are issues, escalate them. Ring the fire bell as soon as possible. The reason is your manager wants you to be successful. But if you're struggling and you don't let your manager know that, then shame on you and shame on them. We want to work smarter and lazier, and we also want to reduce drama. The last thing you want to do is find out at the end of the project that something was wrong with the data that you had. That's almost like baking a cake and then later on finding out that the milk that you put into the cake was spoiled. A lot of effort and it's going to taste terrible. In conclusion, when you think about data cleansing, just think about it like cooking from a recipe. Keep it simple, make sure you use common sense, and continually make the process better and more efficient going forward.