In this tutorial, we're going to get into really interesting and managerially relevant issues, such as: where does the money come from, and how is revenue distributed across segments? So the first thing I'm going to do, as always, is execute everything we've done so far, which takes us up to line 140. That's where we are.

We are going to run a select statement of a new kind. Here it is. We're going to select, for each customer, the total amount of purchases made in 2015. We name that value revenue 2015; it's just a quick name we're going to give it. We extract it from data. And of course, because we are only interested in purchases made in 2015, we're only going to apply that computation to those purchases where the year of purchase is 2015. And because we want that figure to be grouped, and specific to each customer, we are going to group by 1, meaning we group by the first variable that appears in that statement.

Let's execute that, and we have generated a new data frame, called revenue 2015. Now, here is the issue. Revenue 2015 only contains 5,398 observations, or lines. Why so? Well, that's a regular issue with select statements. Believe me, if you deal with SQL statements, you'll have that issue over and over and over. Only those customers who have made at least one purchase in 2015 will appear in the output, and in this case, only about 5,400 customers made a purchase in 2015. Of course we'd like to compute revenue for everyone, not only those active customers, but that's a different story. We'll get to that.

If you look at the summary of these figures, the minimum yearly purchase per customer is $5. It would be zero if we included everyone, but we only include those who made a purchase in 2015. The average is $88, and the maximum is $4,500. So, how can we merge those customers from 2015 with their revenue? Well, the wrong way to do that would be to do this.
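A minimal sketch of the select statement and the naive merge described here, assuming the sqldf package is installed. The toy data, the column names (customer_id, purchase_amount, year_of_purchase, segment), and the underscored spellings revenue_2015 and customers_2015 are assumptions for illustration, not taken from the tutorial's script.

```r
library(sqldf)  # runs SQL statements against R data frames

# Toy purchase history; column names and values are assumptions
data <- data.frame(
  customer_id      = c(10, 10, 20, 30, 30),
  purchase_amount  = c(30, 50, 45, 20, 15),
  year_of_purchase = c(2015, 2015, 2014, 2015, 2013)
)

# Total 2015 purchases per customer. GROUP BY 1 groups by the first
# selected column, which is customer_id here.
revenue_2015 <- sqldf("SELECT customer_id,
                              SUM(purchase_amount) AS revenue_2015
                       FROM data
                       WHERE year_of_purchase = 2015
                       GROUP BY 1")

# Customer 20 bought nothing in 2015, so it has no row in the result
print(revenue_2015)

# Toy customer list, one row per customer (hypothetical segments)
customers_2015 <- data.frame(
  customer_id = c(10, 20, 30),
  segment     = c("active high value", "warm low value", "active low value")
)

# The wrong way: a default merge keeps only customers found in BOTH frames
actual <- merge(customers_2015, revenue_2015)
print(nrow(actual))  # 2 rows, although customers_2015 has 3
```

In this toy example, customer 20 silently disappears from actual, which is the same effect as the 5,398-row result in the tutorial.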
You set actual equal to a merge of those customers with the revenue they generated. Except that the variable you've just created, actual, has exactly 5,398 observations. You've just erased from your data, or at least you are not taking into account, all those customers who haven't made any purchase. How can we make sure that customers who haven't made any purchase are still in our statistics?

Well, what we are going to do is use the exact same function, merge, but we'll add one option here, all.x = TRUE. This is the data frame x, this is the data frame y. So when you say all.x, it means that any line that appears in the left data frame should also appear in the final output. And by the way, the reason merge works here is that customers 2015 and revenue 2015 both have customer ID as a key. So merge can match each and every line from one data frame to the other.

If we use that option, all.x = TRUE, then the actual data set contains 18,417 observations. We have one line for each and every individual in the original data set. Except that, in revenue 2015, either we have a value, such as 80 or 45, or we don't have any value, and we have NA, for not available. But obviously, these missing revenues are actually equal to 0. And so we are going to select from revenue 2015 all those values which are NA, using is.na, a function in R, and replace them with the value 0.

Once we have that, the actual data set makes much more sense. For each and every customer, such as customer ID 10, we have recency, first purchase, frequency, amount, the segment to which they belong, and how much revenue they generated in 2015. You could compute the average revenue generated by each customer, grouped by segment, and that's actually what we're going to do here. We're going to compute an aggregate of the revenue in 2015, grouped by the segment to which customers belong.
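The left merge, the NA replacement, and the per-segment aggregate can be sketched in base R as follows. The toy data frames, their values, and the underscored names are assumptions for illustration; only merge, all.x, is.na, and aggregate come from the tutorial.

```r
# Toy inputs; names and values are assumptions for illustration
customers_2015 <- data.frame(
  customer_id = c(10, 20, 30, 40),
  segment     = c("active high value", "warm low value",
                  "active low value", "inactive")
)
revenue_2015 <- data.frame(
  customer_id  = c(10, 30),
  revenue_2015 = c(80, 20)
)

# all.x = TRUE keeps every row of the left data frame (customers_2015)
# and fills revenue_2015 with NA where no match exists on the right
actual <- merge(customers_2015, revenue_2015, all.x = TRUE)

# Customers without a 2015 purchase generated zero revenue
actual$revenue_2015[is.na(actual$revenue_2015)] <- 0

# Average 2015 revenue per segment
avg_by_segment <- aggregate(x   = actual$revenue_2015,
                            by  = list(segment = actual$segment),
                            FUN = mean)
print(avg_by_segment)
```

Replacing NA with 0 before averaging matters: mean returns NA if any input is NA, and dropping the NA rows instead would overstate the average revenue of the segment.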
And we're going to compute the mean of that revenue per segment. And as you can see here, if you are inactive, cold, or warm, you haven't generated any revenue in 2015, because by definition you haven't purchased anything in 2015; otherwise, you wouldn't qualify as warm, cold, or inactive. But if you are active, you have made some purchases in 2015, and depending on which segment you belong to, that value might be pretty low or rather high. If you're an active high value customer, on average you've generated $323 of revenue; if you're a new active customer, on average almost $80.

So, as a quick summary, we've computed the revenue generated in 2015 by each customer who made a purchase, which we saved in a data frame called revenue 2015. Then we merged our complete list of customers with the revenue generated by some of them, making sure, with that option, that we keep all the customers in the original data set, the one on the left. Then we replaced all the NA values with 0, and computed the average revenue in 2015, grouped by segment, and that's what we got.

Well, the statistics we've run are interesting, but they are a bit obvious. Obviously, if you are an inactive customer, by definition you haven't generated any revenue in the year you were inactive. What we'd like to do now is estimate how much revenue you can expect next year from a customer who is active today. Of course, we don't know that. We don't know the future, we don't know exactly what's going to happen, but the one thing we can do is go back to the past, and look at how much revenue we got in 2015 from customers who were active in 2014. And that's the next step of this analysis.

So what we'll do is merge the revenue generated in 2015, as before, but this time with the customer list of 2014. And so we're going to look at how much revenue has been generated by each customer, based on the segment they were in a year ago.
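A small sketch of that forward-looking merge in base R, under the same assumptions as before: the data frames are toy data, and the names customers_2014, revenue_2015, forward, and segment_2014 are illustrative spellings rather than confirmed identifiers.

```r
# Segments as they stood at the end of 2014 (toy data, assumed names)
customers_2014 <- data.frame(
  customer_id = c(10, 20, 30, 40, 50),
  segment     = c("active high value", "inactive", "warm high value",
                  "inactive", "cold")
)

# Revenue observed one year later, only for customers who bought in 2015
revenue_2015 <- data.frame(
  customer_id  = c(10, 20, 30),
  revenue_2015 = c(250, 6, 110)
)

# Keep every 2014 customer, then treat missing 2015 revenue as zero
forward <- merge(customers_2014, revenue_2015, all.x = TRUE)
forward$revenue_2015[is.na(forward$revenue_2015)] <- 0

# Average 2015 revenue, grouped by the segment held in 2014
r <- aggregate(x   = forward$revenue_2015,
               by  = list(segment_2014 = forward$segment),
               FUN = mean)
print(r)
```

In this toy data one inactive customer comes back and spends 6 while the other spends nothing, so the inactive segment averages 3. That is the kind of mix behind a small but non-zero expected revenue for a dormant segment.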
And that's why I call it forward. Forward, as the segment in 2014 will tell us how much revenue was generated in 2015 by these customers. I create the data frame, again replace all the NA values with 0, and compute and show the average revenue for each segment.

What does it mean? It means, for instance, that an inactive customer in 2014 generated, on average, about $3 of revenue in 2015. Many of them generated nothing and remained inactive. Some of them became active again and spent some money. And these two groups average out to $3. Cold goes to $6. And look at that: a warm high value customer generated, on average, close to $114 in 2015, a year later. That is interesting, since a warm high value customer is actually worth a lot more money than, for instance, a new active customer, who has just made one purchase and might not remain active a year from now.

Now let's look at the data from a different angle. What I'm going to do here is reorder these figures in decreasing order, and show you which customers are the most profitable a year later. Active high value customers come first with $254, warm high value customers come second with $114, and look at that, new active customers only come fourth, with an expected revenue of only $31 in the coming year. And if you plot that, as you can see, all these are segments in 2014, and the bars represent how much revenue you could have expected from a customer in each of those segments a year later, in 2015.

Now interestingly, losing one active high value customer is like losing six new active customers here. You lose six times more money by losing the potential sales from one customer in that segment, compared to one customer in that one. And so, from a managerial perspective, it's really interesting to understand that a customer in the active high value segment, over the next 12 months, is worth six times more than, say, a new active customer.
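The reordering and the bar chart can be sketched in base R like this. The data frame holds only the rounded figures quoted in the narration; segments whose values are not quoted are left out, so the ranks here will not match the full table. The names r, segment_2014, and revenue_2015 are assumed spellings.

```r
# Rounded averages quoted in the narration; other segments are omitted
r <- data.frame(
  segment_2014 = c("inactive", "cold", "warm high value",
                   "active high value", "new active"),
  revenue_2015 = c(3, 6, 114, 254, 31)
)

# Sort segments by expected next-year revenue, highest first
r <- r[order(r$revenue_2015, decreasing = TRUE), ]
print(r)

# One bar per 2014 segment, bar height = average revenue a year later
barplot(r$revenue_2015, names.arg = as.character(r$segment_2014))
```

Sorting before plotting puts the most valuable segments on the left of the chart, which makes the gaps between segments easier to read.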
So, in terms of resource allocation, and how much money you want to invest in keeping good relationships with certain customers, depending on which segment they belong to, that kind of information is priceless.