In this module we'll talk about sessions, and in this particular video we'll talk about session state. Sessions play a huge role in authentication and authorization when you're talking about web or mobile applications.

Now, one of the problems is that servers really want to be stateless. This may not make sense to those of you who haven't worked with them, but it is absolutely one of the great standards that we all strive for. A lot of it has to do with being able to test everything very effectively and ensure stability, especially under higher loads. The problem, of course, is that human activities are stateful. If you have a subscription to a magazine, you expect the magazine to be available each time you visit, and you expect the site to remember that you subscribed. If your subscription expires, it should remember when you renew. Similarly, a shopping cart collects the items you wish to buy, and that's stateful because it remembers all that. Once you perform the actual purchase, it empties the shopping cart so you can buy different stuff.

The role of a session is to capture that state. One way of thinking about it is that the actual front-end service doesn't maintain state, but there are back-end supports, such as a database, that do maintain the state. The session state incorporates the state elements for the shopping cart or the subscription. In a sense, the session captures who you are, what you may do, and what you've been doing.

There are session frameworks built into a lot of software development environments for the web and for mobile. They all have certain basic features, which is what we'll focus on here. First of all, there's this notion of the anonymous session. You come in and start using a site. Typically, it will create a session ID for you, even if you don't log in or anything. Sometimes these are called session IDs, sometimes session tokens.
The point is, whenever the client contacts the server, it provides its session ID so that the server can tie that client to whatever activity it has been performing prior to that. For example, an anonymous user could have a language preference or a font-size preference. Maybe they collect things in a shopping cart or a wish list, even if they don't log in.

Authenticated sessions. An authenticated session ties a user ID reliably to a session ID, and of course a user ID ties to a distinct individual who's supposed to be uniquely identifiable within the application system. In a sense, once you've tied this user ID to a session ID, the session ID becomes an authentication secret, and what that means is it has to be kept secret. The session ID ties to your identity, and it gets used as a parameter whether you're using an API or going through HTTP. Session IDs mustn't be sniffed or intercepted, because then people can take over your session, and they have to be impractical to guess. If you think about it, that means it had better be a big number. We say it should be cryptographically random, with at least 64 bits of entropy. Remember, we talked about entropy in one of the earlier courses in this specialization.

There's also the hijacking risk, which we call sidejacking: if somebody has your session ID, they might be able to use it to masquerade as you, or at least snoop on what you're doing. Something I mentioned in the previous module is that up until around 2010 or so, Facebook did not by default use encryption to protect data transfer between browsers and the server. Say you sat in a coffee shop where a lot of people were using the internet and Facebook and such. Back then, a typical coffee shop would not have encryption on the point-to-point connections between users.
If you had the right software, you could essentially grab somebody's session ID from Facebook as they were sending their packets back and forth, and then eavesdrop on their Facebook session and see who they talked to. This completely bypasses any privacy settings they might have, because you are literally masquerading as the victim whose session ID you took.

How do we create a session ID? Typically they're formatted in a name-equals-value structure, so you'll have some name for the session ID; maybe it's just the word ID. In a lot of the frameworks that name is predefined, but if you can select a different identifier for the session ID, that's probably useful. It makes it a little bit harder for attackers to figure out what they're looking for. The value part is the actual ID.

Now, the contents of the ID. It's a field of bits. You can think of it as a number, or you can think of it as just opaque bits, but you don't want to embed sensitive data in it. So don't put permissions in there, or user IDs, or personally identifiable information. Make sure it's unique and randomly chosen, because you don't want any duplicate session IDs at a given time. The recommendation is that it should be at least 128 bits in length, with, remember, at least 64 bits of entropy in there, and that you use a cryptographically secure random number generator.

Now that I've mentioned cryptographically secure versus other random number generators, here's the difference. Take most random number generators in programming environments; for example, let's say you're using Excel and you tell Excel to give you a random number. It has what we might call a statistically random number generator in there, which means that somebody ran some experiments and found that statistically the numbers are random. On the other hand, that doesn't prove that there's not some lower-level pattern in there. Those are the patterns that crypto attackers go after.
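As a rough sketch of that recommendation in Python: the standard library's `secrets` module draws on the operating system's cryptographically secure generator, so a 128-bit ID could be produced along these lines. The function name `new_session_id` and the choice of hex encoding are just assumptions for the example, not something a particular framework prescribes.

```python
import secrets

SESSION_ID_BYTES = 16  # 16 bytes = 128 bits, the minimum length recommended above


def new_session_id() -> str:
    """Return a fresh 128-bit session ID as 32 hex characters.

    secrets.token_hex uses the OS-provided secure random source,
    unlike the general-purpose `random` module, which is only
    statistically random and must not be used for secrets.
    """
    return secrets.token_hex(SESSION_ID_BYTES)


if __name__ == "__main__":
    print(new_session_id())
```

The ID carries no user ID, permissions, or other meaning; it is just opaque random bits that the server can look up.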
Even though a generator like that is statistically random, that doesn't protect it against guessing attacks by cryptographically knowledgeable attackers. You want a cryptographically secure random number generator choosing your identifiers. Sometimes these cryptographically secure generators are provided by the operating system. You have to check and make sure what it is you're using.

Permissive versus strict. Now, when you create a session ID, the first question is who creates it. If it's the server, that's one thing; the server is essentially going to be managing the session anyway. And then there's the client. In a permissive session, the server really doesn't care where the session ID comes from. It will take whatever session ID the client provides, and if it doesn't match an active session, it creates a new one. Then there are what we call strict sessions. They're a bit more secure in that the server will only accept an existing session ID that it generated itself. The idea is that you have much stronger control over what becomes a session ID, and you can assure yourself that IDs aren't going to be chosen in a way that makes them easy to guess. It's also recommended that you don't mix both types on a single site. If you're going to do permissive sessions, don't pretend you do strict sessions as well.

How does the server process a session ID? Well, first of all, it's information provided by the client, so you have to assume it's hostile. Now, you might say, "Okay, I've written my client software, so it's very secure." That's nice, but it doesn't take into account the fact that an attacker could sniff your data transactions, figure out what types of messages you're sending, and send their own, with their own contents in them. So you assume it's hostile. A good first step is to insist on a standard data format: we're going to have a 128-bit number. It is literally a number, so we can treat it as a number.
One way of doing this is to do some kind of conversion on it, to change it from one number format to another, and then start using it. For example, you might say, it's going to be provided to me in base64, and I'm going to shrink it down into a raw bit field. If there's non-base64 data in there, or non-hexadecimal if that's your format, you can throw it out. The idea is that you're getting rid of the risks of cross-site scripting or command injection.

Now, there's another point, which is that you don't want to use the same session ID constantly. You want to renew the session ID, replacing it with a new value, whenever the authentication changes or whenever privileges change. If Bob logs in, logs out, and then logs in again, or tries to log in again with the same session ID, you generate a new one for that new login. This makes it harder to transfer state that belonged to one user to the next user. That's essentially the point of doing a renewal. This also assumes that you have a way of managing authentications in this context, because sometimes you'll want the renewal to discard the previous authentication and reauthenticate, and sometimes you won't. We may talk about that a bit in a moment.

Session management. It should be easy to share the IDs between the client and the server. This is essentially the issue of the client and the server sending the ID information back and forth, so what do we need in order to make sure that session management works well? First, sniff-proof ID sharing, like with SSL. Sessions almost certainly have access control information associated with them, because they usually have a user ID associated if the person has authenticated. There should also be an expiration date and time, or some way of determining that the session will expire after some period of time. The typical practice now is actually to specify a max age: you say how long you want that session to last.
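The server-side handling described so far could be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not a production design: `SessionStore`, `parse_session_id`, the 30-minute `MAX_AGE_SECONDS`, and the use of standard base64 are all hypothetical choices made for the example. The store only accepts IDs it generated itself (the strict style), converts the client's value into raw bytes before using it, issues a fresh ID on login, and expires sessions after a maximum age.

```python
import base64
import secrets
import time

ID_BYTES = 16            # 128-bit session IDs
MAX_AGE_SECONDS = 1800   # assumed policy for this sketch: 30 minutes


def parse_session_id(text):
    """Convert client-supplied base64 into raw bytes, or return None.

    Anything that is not exactly a base64-encoded 16-byte value is thrown
    out before it can reach lookups, logs, or templates.
    """
    if not isinstance(text, str) or len(text) != 24:  # 16 bytes -> 24 base64 chars
        return None
    try:
        raw = base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    return raw if len(raw) == ID_BYTES else None


class SessionStore:
    """Strict sessions: only IDs created by this store are ever accepted."""

    def __init__(self):
        self._sessions = {}  # raw ID bytes -> {"user": ..., "expires": ...}

    def create(self, user=None, now=None):
        now = time.time() if now is None else now
        raw = secrets.token_bytes(ID_BYTES)
        self._sessions[raw] = {"user": user, "expires": now + MAX_AGE_SECONDS}
        return base64.b64encode(raw).decode("ascii")

    def lookup(self, text, now=None):
        now = time.time() if now is None else now
        raw = parse_session_id(text)
        session = self._sessions.get(raw) if raw is not None else None
        if session is None:
            return None              # unknown or malformed ID: never adopted
        if session["expires"] <= now:
            del self._sessions[raw]  # the server, not the client, enforces expiry
            return None
        return session

    def renew(self, old_text, user, now=None):
        """On login, logout, or privilege change: drop the old ID, issue a new one."""
        raw = parse_session_id(old_text)
        if raw is not None:
            self._sessions.pop(raw, None)
        return self.create(user=user, now=now)
```

In this sketch a renewal starts from a clean session; whether anonymous state such as a shopping cart should carry over to the new ID is a policy decision the example deliberately leaves out.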
With a max age, when the session runs out, you reauthenticate or you renew, whatever your policy says. The point is that a lot of these controls are enforced by the server, not the client.

How do we share session IDs and properties between client and server? One approach is cookies. Another approach is URL parameters; in other words, you essentially just add it onto the URL. Another approach is URL arguments in a GET request. Cookies are basically arguments too, although they travel in the request headers, so they don't actually show up in the URL. Then there are body arguments, say on a POST operation. You can also use hidden form fields, proprietary HTTP headers, and so on. I went over that quickly, but as a practical matter, people tend to use cookies.

Now let's talk about stealing a session. Session fixation is the notion of stealing a session by essentially giving the client a session ID to use. The attacker creates a session ID and gives it to the victim's client, the client sends it to the server, and the server adopts it as its session ID. Since the attacker already has the ID, the attacker can then use the victim's session. This only works with permissive sessions. That's one way. The other way is sidejacking, which is actually intercepting or guessing the session ID, like the Facebook example from earlier.

SSL/TLS for sessions. Now, this gets down to the question of, well, if you're using SSL and you're encrypting your connections, must you encrypt everything? The answer is, it's the safest thing to do. If you need to encrypt anything, it's best to encrypt everything. A lot of this has to do with the fact that if you have some material in plaintext and some in ciphertext, there's a risk that you might get switched from encrypted to plaintext without realizing it, and share the session ID across one of the plaintext connections.
That's one of the reasons we talk about HTTP Strict Transport Security, which says, "Everything we do on this server is encrypted." Now, plaintext may be a little more efficient, but you essentially pay for it in a lot of security risks.

As I mentioned earlier, cookies are the preferred way of sharing session IDs. Using URLs instead of cookies is a danger because people want to save and share URLs. They'll save them in bookmarks or send them through e-mail. Another advantage is that the cookie design used in modern browsers provides additional attributes. So you can have the expiration time. You can have access control detail. You can put a user ID in there, or at least associate one with it. Remember, these are cases where the server sends the client information the server has determined, and the server is not going to use those cookie details on the return path. It's not that the client is telling the server, "This is when I want this to expire." It's the server telling the client.
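To make the cookie attributes concrete, here is a small sketch using Python's standard `http.cookies` module to build the value of a `Set-Cookie` header. The cookie name `id`, the helper name `session_cookie_header`, the 30-minute default, and the one-year HSTS duration are assumptions for illustration only. `Secure` and `HttpOnly` are standard cookie attributes that tell the browser to send the cookie only over encrypted connections and to keep it away from page scripts.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str, max_age: int = 1800) -> str:
    """Build a Set-Cookie header value carrying the session ID.

    The attributes are instructions from the server to the browser; the
    browser sends back only the name=value pair, so the server still has
    to enforce expiry on its own side.
    """
    cookie = SimpleCookie()
    cookie["id"] = session_id
    cookie["id"]["max-age"] = max_age   # how long the browser should keep it
    cookie["id"]["secure"] = True       # only send over encrypted connections
    cookie["id"]["httponly"] = True     # hide from scripts running in the page
    cookie["id"]["path"] = "/"
    return cookie["id"].OutputString()


# HTTP Strict Transport Security: ask browsers to use HTTPS for every request
# to this site for the next year (duration chosen arbitrarily for the sketch).
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000; includeSubDomains")


if __name__ == "__main__":
    print("Set-Cookie:", session_cookie_header("0123456789abcdef0123456789abcdef"))
    print("%s: %s" % HSTS_HEADER)
```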