You've annotated your genome in PATRIC, or you have some sequences or FASTA files. What if you're not sure of your organism's taxonomy? Well, we have a tool in PATRIC that can help you with that. You can go into "Services" and click here on "Similar Genome Finder." This is the landing page for that service. Similar Genome Finder uses Mash/MinHash. Mash reduces large sequences and sequence datasets to small representative sketches from which global mutation distances can be rapidly estimated. It's based on this paper that describes it. If you have any questions about it, you should look here and read up on it. But we've found it's a really good way to very quickly find close relatives to your data in PATRIC. In this first instructional video, we're going to talk about searching by the genome name or genome ID. We have this filter here that you can click on to filter by your private data or other types of data. If you clicked here, it would show you the most recent data. That little lock here indicates that it's a private genome, and this is some of my private data in PATRIC. But also, if you know the name, you could start typing it. This is some data that was sent to me by a collaborator, and there were some indications to me that this was a mixed sample, that it was contaminated. I assembled it and annotated it anyway, and I have it here, but I want to see. They thought it was from this particular genus, but I'm not sure. So how do I go about doing this search? Right now, I could hit the "Search" button, but I want to talk a bit about the parameters first. You can choose to have anywhere from 1 to 500 hits returned. You could filter on the p-value. You could filter on the distance, and you could choose to search only within the reference and representative genomes or within all the public genomes. Now, the reference and representative genomes are something that's designated by NCBI. 
We use them because they're generally higher-quality genomes, but they don't span the breadth of all bacterial taxonomy. So that's why I'm going to set it for all public genomes. I'm looking to see about this Elizabethkingia genome. I hit "Search." Now, generally, in PATRIC, for most of the services that you've seen if you've been using these tutorials, you monitor it on this jobs monitor. But this one runs on the fly, so you have to wait here for the data to return. Look at this. My collaborator told me it was this genus, but when I'm looking here, it looks like it's all Listeria in all these results. Before we discuss that disturbing finding further, let's talk about what's shown here. This is the top hit, and it gives me some information about this genome, like the status, how many contigs, where it was isolated from, the year it was collected, and when it became complete. This is the Mash distance, the p-value, and the number of k-mers out of 1000 that hit it. But you notice that it's all Listeria; we don't see Elizabethkingia. So I'm going to go back, edit, resubmit, and extend this to 500, and see if I can start seeing it then. Because my first indication was, this isn't Elizabethkingia, it must be Listeria. But you can easily refine your search just by clicking this button. Once again, Listeria, Listeria. Here we have one. Here's an Elizabethkingia. We see that 399 k-mers out of 1000 match it. Here are some more of those. So truly this was a mixed sample, and I was able to determine that by using this tool. Another way I could have determined it was by using the Taxonomic Classification service. But that's a subject for a different video. In our next one on Similar Genome Finder, we'll talk about uploading FASTA and FASTQ files so you can get an estimate. You don't have to go all the way to annotation to figure out what something is. Thanks. Bye. All right. Think of all you've done in PATRIC. 
You've assembled, you've annotated, you've run the Comprehensive Genome Analysis Service, and some of you have probably run trees too. Now, when we did assembly and annotation, and even the Comprehensive Genome Analysis Service, I purposely didn't tell you what those genomes were. The Comprehensive Genome Analysis Service runs a little tree for you, so it'll get you at least the genus for that. But we have the other service in PATRIC, Similar Genome Finder, that tells you the closest genome to the one you just annotated. So we're going to do that right now with this assignment. Find the box that says "Search by Genome Name or Genome ID." Remember that spreadsheet I made you fill out with those 27 different jobs for the Comprehensive Genome Analysis Service? Remember how you hated doing that? I'm sure you cursed me. You just found it a total drag. But if you did that right, this is going to make your life a whole lot easier. In that box, put in the genome ID from the job where you used Unicycler with zero Racon and zero Pilon iterations, and then launch it. What's the closest genome to that annotated genome of yours? Note the distance, the p-value, and the k-mer count. You could even open a new tab to do this and have the Similar Genome Finder job in there. Repeat it, but this time use the one where you used Canu as the assembly strategy and also had zero Racon and Pilon iterations. Do the same thing. What's the closest genome? Note the distance, p-value, and k-mer count. First of all, did they both hit the same genome? Second, which assembly strategy resulted in higher hits? Oh, I know you're going to love this and be amazed by it. That is another thing to think about when you're choosing an assembly strategy. Good luck. I'll see you at the next assignment.
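
If you want to see how the distance and the k-mer count you're writing down relate to each other, here is a minimal sketch in Python. It is not PATRIC's own code. It assumes the distance formula from the Mash paper, a sketch size of 1000 (which matches the "out of 1000" you see in the results), and a k-mer length of 21, which is Mash's default and only an assumption about how PATRIC runs it. The function name `mash_distance` is made up for this illustration.

```python
import math


def mash_distance(shared_hashes: int, sketch_size: int = 1000, k: int = 21) -> float:
    """Estimate a Mash distance from the number of shared sketch hashes.

    Assumptions (not confirmed by the video): sketch_size=1000 and k=21,
    which are Mash's defaults.
    """
    if not 0 <= shared_hashes <= sketch_size:
        raise ValueError("shared_hashes must be between 0 and sketch_size")
    if shared_hashes == 0:
        # With no shared hashes the logarithm is undefined; Mash reports 1.
        return 1.0
    # Jaccard index estimated from the fraction of shared hashes.
    jaccard = shared_hashes / sketch_size
    # Mash distance: D = -(1/k) * ln(2j / (1 + j)).
    distance = -math.log(2 * jaccard / (1 + jaccard)) / k
    return max(0.0, distance)


if __name__ == "__main__":
    # Compare hits the way the assignment asks: more shared k-mers, smaller distance.
    for shared in (399, 800, 1000):
        print(shared, round(mash_distance(shared), 4))
```

Under those assumptions, the 399 out of 1000 from the Elizabethkingia hit works out to a distance of about 0.027. When you compare your Unicycler and Canu results, the hit with more shared k-mers will always show the smaller distance, so the two columns tell the same story.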