Hi there. My name is Jude Lancaster and I'm a Cybersecurity Specialist with IBM. Today, we're going to talk about a security information and event management system: the concepts behind it, some of the benefits of using one, a little bit about optimization, and the capabilities of a SIEM. We have three learning objectives today: to explore and understand the key terms around what a SIEM is, to explore the roles of a SIEM on a network, and to understand the role of a SIEM in a modern security operations center. So let's get started. A SIEM, or security information and event management system, is really a data aggregation, search, and reporting system. It takes a lot of information from your network environment, consolidates it, and makes that data available in a format that is easily accessible and easily readable by a human, and it categorizes that data, so it's all laid out at your fingertips to make things easier to understand. Here are some key terms around what we're going to talk about today as far as SIEM goes: log collection, normalization, correlation, aggregation, and then reporting. These are all important when you consider them in the context of what a SIEM is. A SIEM basically collects logs and other security-related documentation for analysis. Logs, if you don't know, are really records of what happens on a device like a firewall, or a web proxy, or any other device that is providing network security, or in an application. Typically, any application has a log file that will tell you exactly what happened in that application. The core function of a SIEM is to manage your network security by monitoring network flows and events. Events are things that happen within an application or on a hardware device. 
Then the SIEM consolidates those log events and the network flow data, pulling this from thousands of different devices: endpoints, applications, network hardware, anything that is touching the network. It then uses advanced analytics to normalize and correlate that data and helps identify security offenses that might require investigation. A SIEM really takes two different approaches: it can take a rules-based approach, or it can employ statistical correlation to establish relationships between log entries. Then it will capture that log event and network flow data in near real time, and apply analytics to it to reveal security offenses in the network. There are several different ways that a SIEM can be deployed or consumed, if you will. It can be on-prem, through software or appliances in your own datacenter; it can be in a cloud environment, where I go to a web browser and log into a hosted environment; or it can be provided by what is called an MSSP, a managed security services provider, who would host it for a specific company and allow you to log on, similar to a cloud environment. So let's talk about events and flows. Events are typically a log of a specific action, like a user login, or a firewall permit or denial, that occurs at a specific time. The event is logged at that time, and then that device or application pushes the event to the SIEM, and the SIEM processes it to decide whether this is normal behavior or abnormal behavior. Flows are records of network activity between two hosts, and that connectivity can last for a few seconds or for days, depending on the activity within the session. So if you're transferring a large file, it will last longer than a very short burst of communication, like an instant message or an email being sent. 
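To make the event-versus-flow distinction concrete, here is a minimal sketch in Python. The field names and record shapes are hypothetical simplifications; a real SIEM such as QRadar stores many more properties per record.

```python
from dataclasses import dataclass

# Hypothetical, simplified records -- real SIEM products store many more fields.

@dataclass
class Event:
    """A log of a specific action at a point in time."""
    timestamp: float   # when the action occurred
    source: str        # device or application that logged it
    action: str        # e.g. "user_login", "firewall_permit", "firewall_deny"
    outcome: str       # e.g. "success", "denied"

@dataclass
class Flow:
    """A record of network activity between two hosts over a session."""
    src_ip: str
    dst_ip: str
    start: float            # session start time (seconds)
    end: float              # session end time -- seconds to days later
    bytes_transferred: int

# A short burst (an instant message) vs. a long session (a large file upload).
im = Flow("10.0.0.5", "203.0.113.9", start=0.0, end=2.0, bytes_transferred=1_200)
upload = Flow("10.0.0.5", "198.51.100.7", start=0.0, end=5_400.0,
              bytes_transferred=2_000_000_000)
print(im.end - im.start, upload.end - upload.start)  # 2.0 5400.0
```

The point of the sketch is simply that an event is a single timestamped action, while a flow summarizes an entire session between two hosts, however long it lasts.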
As for the hosts in that network activity: one could be a PC on your network that's talking to a hosted machine, like an Internet page, or transferring a file up to a cloud service like Dropbox or Box; anything like that is considered a network communication between hosts. Another example might be downloading multiple files, images, or videos, which might last for 5-10 seconds, or watching a Netflix movie, which might last for a few hours. Those are all network sessions, and those flows are records that capture the amount of time and what was transferred between the two hosts. When we talk about data collection in the context of a SIEM, we're talking about the process of collecting those flows and logs from different sources and putting them into a common repository or database that the SIEM will analyze in order to determine whether something is normal or whether it's anomalous or abnormal. It can be performed by sending data directly into the SIEM, or by using external devices that collect that raw data from a source, aggregate it, and then move it on demand, on a schedule, or however the SIEM operator wants them to. Now, to be frank, the SIEM data is going to be more valuable if it's real-time: if the data is pulled directly from the device into the SIEM at the time it happens, you get better and much faster information, so that analysis can occur in real time and provide the SOC analyst with the data needed to determine whether the behavior going on is anomalous. How much data to pull in, and how often, is really governed by a couple of different things: the amount of CPU that has been assigned to the SIEM application (or, if you're using an appliance, to that SIEM appliance), the same with memory and storage capacity, and of course the license associated with your SIEM is going to make a difference as well. 
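The trade-off between real-time forwarding and scheduled collection can be sketched as follows. Everything here is hypothetical (the class and function names are mine, not any vendor's API); it just illustrates why batched collection delivers the same records later than direct forwarding does.

```python
import time

# Hypothetical sketch: two strategies for getting log records into a SIEM.
# Real-time forwarding sends each record as it happens; a scheduled collector
# buffers records and ships them in batches, trading freshness for less load.

class SIEM:
    def __init__(self):
        self.received = []

    def ingest(self, record, arrived_at):
        self.received.append((record, arrived_at))

def realtime_forward(siem, record):
    siem.ingest(record, arrived_at=time.time())  # analysis can start immediately

class ScheduledCollector:
    def __init__(self, siem, interval_s=300):
        self.siem = siem
        self.interval_s = interval_s  # records may arrive up to this much later
        self.buffer = []

    def collect(self, record):
        self.buffer.append(record)    # held locally until the next flush

    def flush(self):
        now = time.time()
        for record in self.buffer:    # the whole batch arrives at flush time
            self.siem.ingest(record, arrived_at=now)
        self.buffer.clear()

siem = SIEM()
realtime_forward(siem, "user_login")          # visible to the SIEM right away
collector = ScheduledCollector(siem, interval_s=300)
collector.collect("firewall_permit")
collector.collect("firewall_deny")
collector.flush()                             # batch lands only on the schedule
print(len(siem.received))  # 3
```

With the collector, the two firewall records only reach the SIEM at flush time, which is exactly the delay the transcript warns about when it says real-time data lets analysis happen faster.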
Most SIEMs are licensed on the concept of what we call EPS, events per second, and FPM, flows per minute. So the number of network flows per minute and the number of events per second is how most SIEM providers or systems license their SIEM. Then, of course, the number of sources being pulled into the SIEM is going to be important as well: the more sources you bring in, the more EPS you're going to consume, and the more flows you're going to consume as well. So those are all factors to consider when you're sizing how many resources to assign to your SIEM. Let's talk about normalization. Normalization is the process that takes raw data and puts it into a format that can be read by a SOC analyst, with things like the IP address, the queue identification (QID), and any other data needed to provide information that is usable. It involves parsing that raw event data and preparing the data for display so it's more readable, and it allows for predictable and consistent storage of all the records. So regardless of the system the data is being pulled from, it gets normalized into a format that is readable. I can see things like the IP address, the machine name if that's available, and the username if that's available; it gives me more information about what's going on in the environment. The other thing we should talk about is licensing and license throttling. Most SIEMs, as I mentioned before, will be licensed by the number of EPS and the number of flows, and license throttling monitors the number of incoming events and manages the input queue against the EPS or flow licensing. So if I go over that limit, the events coming in might be throttled, or might be queued until I go back under my license threshold, or, in the event of too many at one time, they may actually be dropped and put directly into storage, or just dropped altogether; it all depends on the system and how it's configured. 
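Here is a rough sketch of how EPS license throttling might behave. The class and its parameters are hypothetical, not any vendor's actual implementation: events over the per-second license limit go into a holding queue, and if the queue is also full, they are dropped.

```python
from collections import deque

class EpsThrottle:
    """Hypothetical sketch of EPS license throttling: events over the
    per-second license limit are queued; once the queue is full, further
    events in that second are dropped."""

    def __init__(self, eps_license, queue_size):
        self.eps_license = eps_license
        self.queue_size = queue_size
        self.queue = deque()
        self.processed = []
        self.dropped = 0

    def tick(self, incoming_events):
        """Handle one second's worth of incoming events."""
        budget = self.eps_license
        # Events queued in earlier seconds are processed first.
        while self.queue and budget > 0:
            self.processed.append(self.queue.popleft())
            budget -= 1
        for ev in incoming_events:
            if budget > 0:
                self.processed.append(ev)
                budget -= 1
            elif len(self.queue) < self.queue_size:
                self.queue.append(ev)   # throttled until under the license threshold
            else:
                self.dropped += 1       # burst too large: event is lost

t = EpsThrottle(eps_license=3, queue_size=2)
t.tick(["e1", "e2", "e3", "e4", "e5", "e6"])  # 3 processed, 2 queued, 1 dropped
t.tick([])                                    # queue drains under the license limit
print(len(t.processed), t.dropped)            # 5 1
```

The usage at the bottom shows the sizing point from the transcript: a burst of six events against a 3 EPS license means one event never makes it into analysis at all.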
So those are all considerations to think about when you look at the number of sources you bring in. Another concept that we talk about within SIEM is what we call coalescing. With coalescing, events are parsed and then coalesced based on common attributes across events. QRadar is IBM's SIEM product, so the examples we're going to use in this presentation are all based on QRadar. Event coalescing occurs after three events have been found with matching properties within a 10-second period. So if I find three such events coming from the same machine, I'm going to coalesce them into a single event, essentially combining those matching properties into the same event. It's processed in order to normalize it, and really to keep the system from displaying so much information that it becomes more difficult to sort and analyze. When we talk about coalescing and how we combine those events, there are five properties that are evaluated. If the five properties match and we have three events within a 10-second period, they will all be combined into the same event. Those properties are the queue identification, or QID, the source IP, the destination IP, the destination port, and the username. Those are all things that go into what we call event coalescing within QRadar. To get into a little more detail on coalescing: events are parsed, and when we find common attributes across those events, we normalize that data into those fields. We'll get more data than just those five things we show on the screen, but those are the attributes that go into what makes up coalescing. So when we have three events in a 10-second period and all five of those things match (the QID, the source IP, the destination IP, the destination port, and the username), they are all going to be coalesced into a single event for the SOC analyst to look at and be able to make sense of. Now let's talk about what offenses really are. 
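The coalescing rule described above can be sketched in a few lines. This is a simplified approximation of the behavior, not QRadar's actual implementation: events matching on all five properties within the 10-second window are shown individually up to the threshold of three, after which further matches are only counted.

```python
WINDOW_S = 10    # coalescing window from the transcript
THRESHOLD = 3    # matching events before further ones are folded together

def coalesce(events):
    """events: dicts with 'time', 'qid', 'src_ip', 'dst_ip', 'dst_port', 'user'.
    Returns (events to display, open coalescing windows with counts)."""
    shown = []
    windows = {}  # five-property key -> {"first_time": ..., "count": ...}
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["qid"], ev["src_ip"], ev["dst_ip"], ev["dst_port"], ev["user"])
        win = windows.get(key)
        if win and ev["time"] - win["first_time"] <= WINDOW_S:
            win["count"] += 1
            if win["count"] > THRESHOLD:
                continue          # fourth and later matches only bump the count
            shown.append(ev)
        else:
            windows[key] = {"first_time": ev["time"], "count": 1}
            shown.append(ev)
    return shown, windows

# Five events with identical QID, source IP, destination IP, destination
# port, and username, all inside one 10-second window.
matching = [{"time": t, "qid": 1, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.9",
             "dst_port": 22, "user": "alice"} for t in range(5)]
shown, _ = coalesce(matching)
print(len(shown))  # 3 -- the fourth and fifth events were folded into the count
```

The SOC analyst sees one coalesced line of activity with a count of five instead of five separate rows, which is exactly the noise reduction the transcript describes.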
So offenses are anomalous behavior that the system flags as, "Hey, this is something that doesn't make sense or is outside of the norm, so you should probably take a look at it." Those data points can come from lots of different sources: security devices, servers and mainframes, network activity, data activity if I'm accessing a database, application activity if I'm using a chat application or Microsoft Office 365, configuration info on a system, vulnerability and threat data, which is very important to a SIEM, and then users and identity. One thing that might be anomalous is a user logging in from multiple locations within a very short period of time. You might see someone's user ID trying to log in from the United States, then from India, then from Romania, within a few seconds or a few minutes. That's obviously anomalous behavior, because it couldn't happen logically. So those are things that might turn into what we would call an offense. All those things are put into event correlation: logs, flows, IP addresses, geolocation, as in the example I mentioned before. Then that activity is baselined to look for anomalies, and those anomalies are put into an offense. The offense is what the SOC analyst will look at to decide whether further investigation is needed. A good SIEM will try to tune out the noise of false positives, anomalous-looking behavior that really isn't an issue, and provide what we call a true offense: something that can be and should be investigated. The challenge many organizations have when leveraging a SIEM is that all this data comes in and filters into too many offenses that are not true offenses, which then can't be investigated in a timely fashion, and so, unfortunately, the SIEM just becomes noise in the environment. 
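The multi-location login example is often called an "impossible travel" rule, and a simple version of it can be sketched directly. This is an illustrative approximation, not any SIEM's actual correlation logic, and the speed threshold is an assumption I'm making for the example.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical sketch of one correlation rule: flag "impossible travel" --
# the same user ID logging in from locations too far apart for the elapsed
# time between the logins.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

MAX_SPEED_KMH = 1000  # assumed threshold: roughly airliner speed

def impossible_travel(login_a, login_b):
    """Each login is (unix_time_s, lat, lon). True if the pair is anomalous."""
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([login_a, login_b])
    hours = max((t2 - t1) / 3600, 1e-9)  # avoid dividing by zero
    implied_speed = haversine_km(lat1, lon1, lat2, lon2) / hours
    return implied_speed > MAX_SPEED_KMH

# A login from the United States, then from Romania 60 seconds later:
# the implied speed is physically impossible, so this is an offense candidate.
us = (0, 38.9, -77.0)    # Washington, DC
ro = (60, 44.4, 26.1)    # Bucharest
print(impossible_travel(us, ro))  # True
```

A real deployment would baseline each user's normal locations rather than rely on a single fixed threshold, but the sketch shows how geolocation plus timestamps turn raw login events into a candidate offense.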
So the whole goal of the SIEM is to tune it well enough that, out of all the data coming in, I'm only looking at offenses that really need to be investigated: the true offenses.