Our lesson today will cover creating results tables and data visualizations. Let's start with a short overview of results tables and data visualizations and how they are intended to be used. Results tables provide estimates of indicators stratified by relevant covariates. They're intended to be used to report detailed findings on all survey coverage indicators. Detailed results tables serve as the basis for identifying key trends and comparisons, synthesizing findings, and ultimately making recommendations for improving health programs. Once you have created results tables, you can begin to create data visualizations to summarize key findings from the results. Data visualizations should be used in addition to tables, not in lieu of tables. Ideally, every results table should be self-explanatory. This six-part table structure can help to ensure you create a self-explanatory results table. Each results table should have the following six parts: a table title at the top of the table, column headers going across the top of the table, row stubs going down the left side of the table, data filling in the middle of the table, and then footnotes at the bottom of the table, along with source information just under the footnotes. Let's talk about each of these table parts in a little bit more detail. The table title should give a clear and accurate description of the data. It should answer the three questions what, where, and when. The title should tell you the topic and the specific population group being described by the data in the table. Column headers at the top of the table should describe how the information is categorized in each column of the table and provide any relevant metadata, which, for example, includes the unit of measurement, the time period, or the geographic area. Row stubs in the first column of the table show the different ways the data are divided into categories, and for coverage surveys, this is generally based on population characteristics.
Footnotes at the bottom of the table may provide any additional information needed to understand and use the data correctly, for example, additional definitions. The source line at the bottom of the table should provide the source of the data, for example, the organization that produced the data and the data collection method. Then finally, the data will take up the majority of the table, and we'll talk next in more detail about some general principles for presenting your data. Thinking about the data, first let's talk about the data elements in a table. Each data table should include the following columns: a column for the point estimate, a column for the standard error or 95 percent confidence interval, and a column for the unweighted number of observations or the N. You can also include some additional optional table elements, such as the weighted number of observations or a column with an asterisk to denote statistically significant differences or indicators. But you do want to be careful not to overcrowd tables. We want to ensure the minimum required elements are present without overcrowding tables with too much information, and therefore making it difficult for readers to understand the content. On the right, we have an example results table, which shows the percentage of women who attended at least four antenatal care visits, and we can see here which columns have been included in this data table. We see a column for the point estimate or the percentage, columns for the standard error and the 95 percent confidence interval, and columns for both the unweighted and weighted N. We can see in this example, we have all of the required data elements as well as several additional optional data table elements included in this table. Now, let's talk about rounding and number formatting of data. In terms of rounding, percentages should be rounded to the nearest tenth of a percent. For example, 23.15 percent rounds to 23.2 percent.
Numbers should be rounded to the nearest unit. For example, 1,215.2 rounds to 1,215. In terms of formatting, we generally present values with the same number of decimal places in all cells of a column. We can see in the example table on the right, percentages have been rounded to the nearest tenth, and numbers have been rounded to the nearest unit. There are some special cases that should be denoted in tables. For example, some indicators are calculated among a sub-group of respondents. In some cases, once the sub-group is further divided across background characteristic categories, there may be too few cases for the percentages to be reliable. The minimum number of cases for a coverage indicator to be considered reliable is based on the unweighted number of cases, and the general rule of thumb is a minimum of 50 cases. Estimates where the number of cases falls below the 50-case threshold should be clearly indicated in results tables. There are multiple approaches for doing this. You can use a specific font color. You can place estimates in parentheses, or put an asterisk next to the estimate. Any of these approaches is acceptable as long as the notation is consistent across tables, and a note is added to the bottom of the table explaining the notation that is used in that table. Some reports use the same notation for all estimates where there are fewer than 50 cases, while others make a distinction between estimates with 25-49 cases and estimates with fewer than 25 cases. Both of these approaches are valid. For example, the DHS places estimates with 25-49 cases in parentheses, but replaces estimates with fewer than 25 cases with an asterisk. This is one approach that can be implemented. Another data issue that may need to be addressed in tables is missing data.
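Before going further into missing data, the rounding and small-sample rules just described can be sketched in code. This is only a minimal Python illustration, not part of the DHS or any survey toolkit: the function names (round_half_up, format_percent, format_count, format_estimate) are hypothetical, and the case counts in the usage example are made up. It assumes the DHS-style notation mentioned above, with parentheses for 25-49 unweighted cases and an asterisk for fewer than 25. It uses the decimal module because Python's built-in round can turn 23.15 into 23.1 rather than 23.2, due to binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round half up to `places` decimals (hypothetical helper).

    Converting through str() avoids binary floating-point surprises.
    """
    quantum = Decimal(10) ** -places  # places=1 -> Decimal('0.1')
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


def format_percent(value):
    """Percentages: nearest tenth, always shown with one decimal place."""
    return str(round_half_up(value, 1))


def format_count(value):
    """Numbers of cases: nearest unit, with a thousands separator."""
    return f"{int(round_half_up(value, 0)):,}"


def format_estimate(percent, unweighted_n):
    """Apply the assumed DHS-style small-sample notation.

    fewer than 25 unweighted cases -> '*' (estimate suppressed)
    25-49 unweighted cases         -> estimate in parentheses
    50 or more unweighted cases    -> estimate shown as-is
    """
    if unweighted_n < 25:
        return "*"
    text = format_percent(percent)
    if unweighted_n < 50:
        return f"({text})"
    return text


if __name__ == "__main__":
    # Illustrative values only; the case counts are invented.
    print(format_percent(23.15))        # nearest tenth
    print(format_count(1215.2))         # nearest unit
    print(format_estimate(87.1, 30))    # 25-49 cases
    print(format_estimate(50.0, 10))    # fewer than 25 cases
    print(format_estimate(62.34, 120))  # 50 or more cases
```

Whatever notation is chosen, the same function would be applied to every table so the notation stays consistent, and the table footnote would still need to explain it.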
Different household surveys have varying approaches to handling missing data in tables. There are many options for doing this. The RADAR survey doesn't present missing data below five percent, while DHS has a different, more detailed approach. If you're interested in learning more about the DHS methods for handling missing data, here are some resources for reading more about that approach. You can look this up in the Tabulation Plan for the DHS Final Report, as well as the Guide to DHS Statistics. There may be occasion to include other symbols in a table. Feel free to choose your own as needed, and be sure to explain any notation in the footnotes of the table. Here are a few examples of additional table symbols and notations used in DHS tables. DHS tables may include symbols such as NA for not applicable, U for no information, square brackets for truncated or censored data, parentheses for estimates based on a small number of cases, an asterisk for estimates based on too few cases to show, and 0.0 percent for less than 0.05 percent. Let's look at an example table from the Sierra Leone 2019 DHS to examine all the parts of the table. First, Step 1: read the title and subtitle. The title says, prevalence and treatment of symptoms of ARI, and the subtitle says, among children under age five, percentage who had symptoms of acute respiratory infection, ARI, in the two weeks preceding the survey, and among children with symptoms of ARI in the two weeks preceding the survey, percentage for whom advice or treatment was sought, according to background characteristics, Sierra Leone DHS 2019. From the title, we know the table is about two separate groups of children: all children under age five, and all children under age five with symptoms of ARI. Now for Step 2: look at the row stubs and the column headers. We can see the row stubs include background characteristics such as age in months, sex, district, mother's education, and wealth quintile.
The column headers show that we have several different point estimates, as well as the number of children for which these point estimates are being calculated. The point estimates include: percentage of children under age five with symptoms of ARI, percentage of children under age five with symptoms of ARI for whom advice or treatment was sought, and percentage for whom treatment was sought the same or the next day. Now let's look at the footnotes for Step 3. The footnotes include a note that the figures in parentheses are based on 25-49 unweighted cases, while an asterisk indicates that a figure is based on fewer than 25 unweighted cases and has been suppressed. In addition, we see the note that NC equals no cases. Finally, Step 4, let's look at the data. What columns are included? Well, we see point estimates and an unweighted number of observations have been provided in our table. If we ask ourselves, how have percentages and numbers been rounded? We can see percentages have been rounded to the nearest tenth of a percent and numbers have been rounded to the nearest unit. If we ask, how have estimates with too few cases to be reliable been handled? We can see based on the footnote and by examining the table estimates, that estimates based on 25-49 observations are in parentheses, while those based on fewer than 25 observations have been replaced with an asterisk. Just a note for future reference, when parentheses or asterisks are used in a table, the explanation will always be noted under the table. If you see there are no parentheses or asterisks in a table, you can proceed with confidence that enough cases were included in all categories that the data are reliable. Now let's try to answer a few questions using the data from this table from Sierra Leone. Our first question, what percentage of children age 36-47 months with ARI symptoms had advice or treatment sought?
Looking at our table, we can see that the age category 36-47 months is the second-to-last age category, and the third column in the table shows the percentage of children with ARI symptoms for whom advice or treatment was sought, and so we see it's 87.1 percent. We notice this percentage is in parentheses because there are between 25 and 49 unweighted children in this category. This tells us that readers should use this number with caution as it may not be reliable. Looking at our second question, what percentage of children age 48-59 months with ARI symptoms had advice or treatment sought? This is the last age category and again, that third column in our table, and we can see there's no number in this cell. We only see an asterisk. This is because there are fewer than 25 unweighted children in this age category. The results for this group are therefore not reported. This tells us that this subgroup is too small and therefore, the data are not reliable. Finally, our third question, can we assess treatment-seeking by district? Well, if we look at the data by district, we can see only two districts have more than 25 cases. Most of the districts have too few cases to even show results for this indicator; therefore, no comparison by district can be made as there are too few children with ARI symptoms in each district to provide meaningful data. Now that we have covered the basics of creating results tables, here are a few tips to keep in mind when planning for results table creation for your survey. A tabulation plan should be prepared in relation to the study objectives and your audience. Try to tailor your tables to your audience. For example, if you have a less technical audience, there's really no need for high-level complex statistics. People likely won't need it, and they won't be able to understand the takeaway messages. In this case, a simpler table would be more appropriate for this audience.
It's helpful to construct dummy tables before you start your analysis so that you know what you want to come out of the analysis. Also, share your dummy tables with stakeholders ahead of time before generating the final tables. This will save you time as stakeholders can provide feedback on the plan for analysis before you carry out the analysis and create the final set of tables. Consistent formatting across tables is really helpful for readers to be able to understand all of the tables in the report. Finally, table creation takes time, lots of time, so allocate sufficient time to create them. Now, we're going to move on to talking about data visualization. Before we get started, I'd like to thank and acknowledge the National Evaluation Platform or NEP project as much of the data visualization content was developed by the NEP project as part of the curriculum on data interpretation and reporting. What is data visualization? Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Ultimately, data visualization is the process of displaying data to tell a story, and there are many ways to visualize data. Many people ask, especially once we have so many results tables, why visualize data? What's the benefit? Well, here are a few reasons to visualize data. Data are more persuasive as graphs compared to tables. In addition, human brains more rapidly process visuals compared to text. Developing appropriate data visualizations is a useful advocacy and communications tool. Finally, data visualization through charts, maps, and illustrations allows us to see and explore data and ultimately use them to make decisions. When thinking about how to make a data visualization, it's important to remember there's no one right data visualization.
The method of data visualization depends on your key message. Dr. Brian Zikmund-Fisher and a team of other researchers at the University of Michigan have conducted several studies exploring different approaches to visualizing health data. They made a list of 16 scenarios where an individual would need to interpret information to gauge a health risk, for example, comparison of different medications. They asked graphic designers to create several different visualizations for each scenario and asked regular individuals to assess the visualizations based on clarity. This quote is based on the team's findings, "There is no one way to present data. It really depends on your communication goal. While the visualization needs to represent the data truthfully, it also needs to be purpose-driven." But one should also be aware that changing how the data is presented can drastically change its message. Depending on how the data is formatted, the same number can increase alarm or allay fears. It is a critical consideration and as communicators, we have a responsibility to take ownership of how we present data. We've developed a relatively simple three-step approach to making good data visualizations. Here we'll present the three broad steps and then walk through each of them in a bit more detail, providing some additional resources along the way for those interested in taking a deeper dive into the various aspects of data visualization. The three steps for making good data visualizations are, Step 1, identify your key message, Step 2, choose an appropriate type of visualization based on your key message, and Step 3, create your data visualization and apply the data visualization checklist. Let's start with Step 1, identify the key message. Let's first talk about the difference between a result and a key message. Results are what our study found. They are objective facts from our analyses such as survey results, numbers, and tables.
Results are large in number and can be hard to quickly understand as they are complex and numerous. Imagine giving someone a Stata log file. These are our results, but what do they actually mean? We synthesize and understand our results to come up with our key message. The key messages might not be obvious at first, and they require manipulation and close examination of the results to get them out. By taking those distilled results and contextualizing and interpreting them, we finally can come up with our key messages. We need to translate results into key findings to help convince our audience why they should care about our results. Key messages are important because they help you focus your communication and frame your argument. They also help avoid overloading your audience with information. A key message is a piece of information that you want your audience to remember. This is really the so what. A key message is generally short, so approximately 1-2 sentences, and it should be able to be shared in less than 20 seconds. Your key message should be relevant, and it should be easy to understand, meaning the language should be concise and appropriate for the audience. Importantly, your key message should be memorable. The audience should be able to remember this statement and become curious to learn more or to take action. Once you have developed a key message, one way to check if it meets the guidelines for a good key message is to ask yourself the following questions from this key message checklist. Is my key message no more than two sentences? Is my key message as concise as possible? Does my key message motivate the audience to ask for more information? Can I repeat my key message without looking at notes? Then state your key message to a colleague. Wait a few minutes. Can your colleague repeat the key message? If you're able to meet all of these key message checklist requirements, chances are you've created a good key message.
If you're looking for some tips for constructing a key message, the ABT method may be one place to start. You'll definitely want to use your tables or your preliminary figures to help build your key message. But in addition, the ABT method, adapted from a TED Talk by Randy Olson, is one approach to help construct a key message. The ABT method, which stands for and, but, therefore, helps create a story with connections, conflict, and a resolution. The general formula for the construction of an ABT key message is, this happened or this data exists, and this happened or this data exists, but this other data or condition complicates our understanding or causes a problem, and therefore, this analysis was performed or this resolution was reached to resolve the problem or understand the circumstances. Let's look at an example of creating a key message, applying the ABT method. Here we have a graph of total fertility from 1997-2011 by urban and rural areas in Mozambique using DHS data. We're going to take a look at the trends in fertility over time, overall, and by urban and rural areas to create a key message using the ABT method. Here's our example key message. Total fertility rate, TFR, stayed relatively constant between 1997 and 2011 and TFR has decreased in urban areas during this time period, but the difference between TFR in urban and rural areas has increased over time. Therefore, there is a need to ensure that family planning programs target individuals in both urban and rural settings. Once you have a key message, you can move to Step 2, choosing an appropriate data visualization. The most appropriate choice for a data visualization is based on the story you want to tell and your data. Based on your key message, what do you want your visualization to show? The data visualization decision tree aims to help you decide the best data visualization type based on the data you have and the key message you are trying to convey.
The decision tree is available as a reading associated with this lesson so you can examine the details further. However, I do want to point out briefly the types of questions you can ask yourself to get to a decision on the type of visual to use. The first question to ask is do I want to compare numbers, or do I have one number that I really want people to remember, or do I have words to share? Let's look, for example, down the far left side of the decision tree. If we want to compare numbers, for example, then we further have to ask ourselves, do we want to show how two or more numbers are the same or different? If so, then you further have to ask if you're comparing two numbers or more than two numbers. Then you should arrive at a list of potential types of data visualizations suitable for that purpose. The data visualization decision tree includes 13 different types of data visualizations which demonstrates the many different ways data can be visualized. This list of visualizations on the left is part of the data visualization decision tree reading. You have this as a resource if you want to read more about each one in detail. The 13 data visualizations include some that are very common and you may be familiar with, such as bar charts and pie charts, as well as some less common data visualizations. Here on the right, I have a few examples of some potentially less common visualization types. On the top row, left side, we have a back-to-back bar chart. Back-to-back bar charts are two bar charts that are aligned back to back. The purpose is not to highlight specific values, but distributions which are helpful when you want to assess symmetry, such as this graph which shows the proportion of marriage among males and females in Tanzania. On the top row, right side, we have a dot plot, which is also sometimes called a dumbbell plot. Dot plots are easy and quick to read and are a great option for emphasizing gaps between numbers. 
We often see dot plots used for equity graphs, such as this plot, which shows the difference in ANC4 coverage between the poorest and richest population quintiles in Tanzania. On the bottom left, we have a line graph, which is used to show changes over time. Generally, we use line graphs to show how coverage of interventions or disease prevalence changes over time. This line graph shows how coverage of various malaria interventions has changed over time. Finally, on the bottom right, we have a map, which is used to show how data vary across geographical boundaries. This map shows coverage of insecticide-treated nets and indoor residual spraying by region in Tanzania at two time periods. One additional point to make is about visualizing uncertainty. It's important to consider your audience when deciding if and how to include uncertainty in your data visualizations. If you display variability in your graph, as in the figure shown here, you must be able to explain it. In addition, many audiences don't understand confidence intervals and there is a risk that your graph will confuse your audience. However, uncertainty is also an important concept and key to being able to correctly interpret your findings. You must carefully consider this when deciding how to present and explain uncertainty in data visualizations. Finally, once you have chosen and created an appropriate data visualization, you can move to Step 3, applying the data visualization checklist. The data visualization checklist provides some additional guidance and key areas to check to make sure your data visualization meets some minimum standards for a good data visualization. These include, overall, does the visualization highlight an important finding or conclusion? Is the type of visualization selected appropriate for the data and key message? In terms of text, is there a descriptive title? Are the axes labeled? Is the font readable when printed and on a presentation screen?
Are data labeled directly when possible? Do the axes start at 0? Is there any unnecessary clutter on the axes? For example, more labels than necessary. For color, do the colors match the government institution or organization? Do the colors highlight key patterns, and are the colors readable when printed in black and white and on a presentation screen? Finally, lines. If gridlines are present, are they faint? Are tick marks present? Except for line graphs, they're usually unnecessary. This data visualization checklist is also part of the data visualization decision tree reading. It can be accessed for further reading through that resource. Following these three key steps should provide a pathway to developing useful data visualizations to accompany your survey results.