Okay, you got your tree job back. You requested a certain number of genes, and you didn't get all of those genes. Let's take a quick look. We're going to discuss in this video how to fix that problem. I'll go once again into Jobs, which opens my jobs page, and then I'll click on the job I wanted to see. This guy here: highlight the row, click on the "View" icon. This opens the landing page for the job. I want to look at this report, so you highlight it and you click "View". Here's my tree, and here it says I requested 10 genes and only got five. It gives me some warnings, and it says that several of my sequences are identical. I may want to do a cleanup on that. Here's another one that's identical. But if you scroll down to the bottom of the report page, it will show you strategies to increase the single-copy gene number. It says if I remove these two genomes, I should be able to boost my output. I'm going to show you how to do this, but first let's see what they are. I copy the ID, I do a Control-F find, I put the genome ID in there, and you can see that this HHAN genome is causing a lot of the issues in the tree. Then there's this guy, this genome ID. I copy it. I paste it. Oh, it's another one. This is a different copy of that. How do I fix these things? How can I improve the tree? How can I get rid of those things? Do I have to add all of these things at once? That would be a nightmare. I'm going to show you how we can do this, so I'm going to open a new tab and go back to PATRIC. This time, I'm going to go into my workspaces and click on my genome groups. I have a bunch of genome groups. I use them all the time. They're the easiest way to assemble data and launch jobs, and so I want to go to the ones I've recently created. I'll double-click on this and this will show me the most recent ones. The problems, going back to this first tab, were with these Riesia genomes.
I'm going to go back to that group I created here and click on this, and when I select this group, there are a number of things I can do with it. But I want to do the group view, the genome group view, so I click on this. This is a pretty clever way of looking at all your data. It's all assembled. It's the same thing as any [inaudible] landing page. It's showing you, for your group, all the data that's there. But we want to go into the genomes. Now, I can [inaudible] things out right here if I want to, but I'm always a little bit nervous about that, so I want to create a new group and adjust the new group and not remove my favorite ones. I click here next to Genome Name, the checkbox immediately to the left of that, to select all of them. I'm going to create a new group. You click on the "Group" icon, and it pops up this window. You click on this down arrow next to Existing Group and say New Group. I'm going to call it Riesia May2020 adjusted, and add. Once I click the "Add" button for that, it tells me I've made it. I know this is a bit tedious, but we have to go back into workspaces, click on "Genome Groups", and click on "Created" to see the most recent one. Now this is the one that I'm going to do all my cleanup on. I highlight it and I look at the group view. Here is all the metadata. Let's go to Genomes, the second tab there. Now, I have two tabs open because I want to look back and see what the problems were. One of them is with this guy, this HPNA with that number. That genome ID number, I can put that directly in the filter box and click "Filter", and then that's the genome that we were having problems with. Let's just verify that it's HPNA, so now I click on this checkbox immediately to the left of the name. The downstream functions include Remove. Let's get rid of it. I click on that and it asks me to make sure I'm sure. Yes, I want to get rid of it, and it's gone. But let's check to make sure it's gone.
We can remove these keywords, and now that beautiful little genome is gone. Who else was the problem? We've got to scroll all the way down to the bottom. Another one that was a problem is this guy. I copy that, I go into my second tab, and I paste that and hit Return. Now look at this, how annoying this is. I've given it an exact term and it's returned pretty much everything, which is annoying to me. One way you can thwart your computer is to put quotation marks around the term and then hit Return, and there's the one. The HHAN genome, we want to get rid of that. There it is. I click the Remove icon after highlighting the row. Remove it. "Are you sure you want to remove this one genome from the group?" I do indeed. Then it'll pop up a message and say, it's gone, you did it. To verify, you click the X next to the keywords, and now we're down to fewer. So right away we can see that I've gotten rid of the two genomes that were problematic. Another thing you may want to do is get rid of some of those genomes that were identical, especially when you're looking at a really big tree. It's one thing if it's your own private genome that's identical and you want to include that in the tree. You don't want to get rid of that one. But in this public one, we had a lot of genomes there. If we have some extra baggage that we can get rid of, let's do it. So let's go back here and scroll up a bit. Let's look at the identical genomes. It says that this 1009856.3 is included here and here and here. There are three genomes that are identical to it. So let's see who those guys are. Control-C, I copy it. Control-F, I paste it. I want to see who those are. Those are Buchnera, so I have to go back to my Buchnera genomes. I'm going to go back to this other tab, go into my workspaces, and click on genome groups. I want to see my most recently created ones. Let's see, Buchnera. Here it is.
So I click on that and I want to go to the genome group view. Once again this is showing all the metadata and stuff that's interesting for this entire group, which contains 64 genomes. So let me go into the Genomes tab, and we already know how to do this. We cut and paste the number there and hit Return. This time it gives me an exact match, and I'm going to get rid of that. So I hit Remove. It says, "Are you sure you want to remove this?" Yes, I am sure. It's gone. To verify, I click this. There are the other guys. So I go back to the first tab again. It's this guy. Here is another identical one. Once again, in the next tab, I paste that value into the keyword search. It tells me who that is, and I'm removing it from that group. Remember last time how I saved a new group? This time I've been adjusting the current group. That's why I like to save it and rename it, so I don't change my original one. But I was so excited about creating the perfect group that I forgot about that. But in for a penny, in for a pound, let's get rid of even more. So this guy, seeing as we're getting rid of them, we'll just go full force. Hit Return. This guy is also gone. Although I guess it's sexist, or is it sexist? Bacteria don't have any sex. But to associate a genome I want to get rid of with being male is probably not the best thing to do. I'll try not to even be anthropomorphic about my genomes. Then the last one, I'm assuming it's Buchnera because those were the most. I could search for it in the tree, but let's go to this genome group view again, seeing as we already have it open, and let's put it in. Did I get it all? Yes. This genome, okay, click on that and then remove it. So now I should be able to launch a tree job with these new groups, and it should look a lot better. And as a matter of fact, I have done that. So I'm going to show you what happened when I removed all of these; I'll just step you through it like we're doing it again.
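The cleanup in this walkthrough boils down to making a copy of a group and then removing the flagged genome IDs from the copy. Here is a minimal sketch of that logic in plain Python. It is not the PATRIC API, and the group contents and every ID except 1009856.3 (the one read out from the report) are made-up placeholders.

```python
def adjusted_group(original, flagged):
    """Return a new group without the flagged genome IDs.

    The original list is left untouched, mirroring the advice to
    clean up a renamed copy rather than the group you care about.
    """
    flagged = set(flagged)
    return [gid for gid in original if gid not in flagged]


# Hypothetical group contents; only 1009856.3 comes from the report above.
buchnera = ["1009856.3", "0000001.1", "0000002.1", "0000003.1"]
# Hypothetical list of IDs the report flagged as identical to another genome.
flagged_ids = ["1009856.3", "0000002.1"]

buchnera_adjusted = adjusted_group(buchnera, flagged_ids)
print(buchnera_adjusted)  # the cleaned copy
print(len(buchnera))      # the original still holds all four entries
```

Working on a copy means a mistaken removal costs nothing: the original group is still there to start over from.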
Tree alignment, and then this time, here's my new Buchnera, add that. Here's my adjusted Riesia, add that. There's our old friend, Blochmannia, add that, and Wigglesworthia. Who could forget Wigglesworthia? Not only the insect physiologist, but also the endosymbiont of the tsetse fly. Let's add that. Remember, we had my private genome, which was a Wigglesworthia because I am a fan. The lock sign indicates a private genome, and add that. Then we just have to find the folder, which was endosymbiont trees. For the name, I'll call it adjusted endosymbiont tree, and then I could launch it. But I'm going to show you one of these that I've already launched; I just wanted to step you through it so you could see how much that improved. The first one, I'd done it with 10 and I couldn't get 10, I could only get five. I'm going to click on Jobs and find the tree that I submitted, and here's the adjusted endosymbiont tree. I highlight the row, I view it, and once again we open up my beloved report HTML, one of my favorite documents in PATRIC, maybe even in the whole world. Highlight that, click on the View icon. So here's the tree. Look at all those values of 100, and look here, single genes requested. Remember last time we requested 10 and we only got five. Here we've gotten 52, so that's increased it even more. But you know what? I'm going to show you how to get to 100. In the next webinar I'm going to show you how to use the protein family sorter and start using the duplications and deletions within the gene families, and this is just going to rock your phylogenetic tree and your phylogenetic world. Join me for that one, it's going to be a lot of fun. Thanks.
Assignment five is going to be a little bit tough for you, but I know you can do it. It'll take a lot of time for some of these jobs to complete. But I want you to become more aware of the time issue, and it'll also help me so that you're not reporting jobs as not working.
You'll know when you've passed a lot of the system. The first thing I want you to do is go into that tree that included all the groups, including Mesorhizobium. If you go into the report for that tree and scroll down to the bottom, there were some suggestions that you could improve the number of genes asked for if you removed one of those genomes. I want you to remove a problem genome: either create a new group, or remove it from the group that you have. That's your first assignment. The second is I want you to submit 32 tree jobs, and I want you to include the practice genome from that comprehensive genome analysis contig assignment, that bacteria contig test. Now this time use the new Mesorhizobium group that doesn't contain the problem genome, the one that you got rid of. Part of this, in these 32 jobs, is we're going to adjust the deletions and duplications. The goal of this assignment is to get as close as possible to the 500 genes that we're going to ask for. Here is what I'm asking you to do. This is that test bacteria, the Brucella representative genomes, the Brucella frog genomes, the Brucellaceae representative genomes, the Bartonella reference and representative genomes, and the Mesorhizobium group where you removed that difficult genome that was causing us problems. First, submit a tree with 100 genes, with deletions set to 0 and duplications set to 0, and enter the job name. You will go all the way through this for each of these. Then I want you to bump it up to 500. This is deletions and duplications 0, 0. But then you can go up to 10, so each one going up to 10. Then with the duplications set to 0, each one going up to 10, and then setting them equal, and we will see where we get from there. I created a spreadsheet that you can use to fill everything in. You go into the Workspaces tab and you click on "Public Workspaces", and then you scroll down till you see the MOOC PATRIC Course and click on that globe.
Then when you see the Codon Tree icon, click on that, and here it is, the MOOC Codon tree assignment. I outlined in it what I wanted for each of those. Your job right now is to name the thing and submit the jobs. One thing that you can do to make your life a whole lot easier when you're submitting a bunch of trees: after you submit the job, the Submit button will go back to being white. You can change the name of the job and then go in and change the duplications and deletions. You don't have to load all the groups every time. That way you can submit a number of jobs really rapidly. I know you can do it, but it's not going to be easy. Especially the waiting, because some of these jobs are going to take more than a day, more than two days to run, and maybe some of them will fail, and we'll see how it goes. This is the most important assignment so far. Good luck.
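One way to keep track of the 32 submissions is to list the settings up front and tick them off as you go. The sketch below is one reading of the spoken instructions, not the official grid (the course spreadsheet is the authority): one job at 100 genes with 0, 0; then at 500 genes, one job with 0, 0, ten with duplications stepping from 1 to 10 while deletions stay at 0, ten with deletions stepping from 1 to 10 while duplications stay at 0, and ten with the two set equal. That reading does add up to 32. The job-name pattern is a made-up convention for illustration.

```python
def tree_job_grid(max_setting=10):
    """List (genes, deletions, duplications) for each tree job.

    This grid is an assumed interpretation of the assignment.
    """
    # Baseline jobs: 100 genes and 500 genes, both with 0 deletions/duplications.
    jobs = [(100, 0, 0), (500, 0, 0)]
    # Deletions fixed at 0, duplications stepping from 1 to max_setting.
    jobs += [(500, 0, dup) for dup in range(1, max_setting + 1)]
    # Duplications fixed at 0, deletions stepping from 1 to max_setting.
    jobs += [(500, dele, 0) for dele in range(1, max_setting + 1)]
    # Deletions and duplications set equal, from 1 to max_setting.
    jobs += [(500, n, n) for n in range(1, max_setting + 1)]
    return jobs


jobs = tree_job_grid()
for number, (genes, deletions, duplications) in enumerate(jobs, start=1):
    # Hypothetical naming scheme so each job is easy to find on the jobs page.
    print(f"job{number:02d}_genes{genes}_del{deletions}_dup{duplications}")
```

Putting the settings in the job name makes it easier to match each finished tree back to its row in the spreadsheet.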