[MUSIC] Hello, my name is Alejandro Rodriguez, and, as you will remember, I'm one of the professors who has been teaching this MOOC. I'm going to give you this block about extraction of information from text, the NLP part. In this first video, I would like to introduce you to the NLP pipeline. NLP, natural language processing, is a process that allows us to extract information from text, and we're going to focus it on the extraction of information from medical text. I will show you all the steps that you can see in the slide. I will not go into much detail about what each of the phases means. I will just describe them very briefly, because this MOOC is not very focused on the technical part. The idea is to give you an overview of the main elements that we use in this context, the extraction of information using NLP.
So the pipeline that you see here is more or less the most basic pipeline, or, let me say it better, the most useful pipeline that we can use in NLP, and it comprises several steps. We're going to use, as an example, this paragraph that we have: man, 42 years, NAMC, patient without a previous loss of consciousness. Observation shows deficit in the left side of his body. No fever, fibrinolysis discarded. Hospitalized for a complete study. So this is the text to which we are going to apply the different parts.
The first one is the sentence detector. Okay, the sentence detector just divides or splits the paragraph, or a piece of text, into sentences. In this case, we have divided the full paragraph into these sentences. As the divider, the sentence detector normally uses the period or, for example, a newline character as the main element. So we have divided the paragraph into several sentences, and we can see all of them over there.
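The sentence detector described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the two dividers mentioned in the lesson (a period followed by whitespace, or a newline); the function name `split_sentences` and the sample text are made up for this sketch, and the text only approximates the paragraph on the slide.

```python
import re

def split_sentences(text):
    """Split text after a period followed by whitespace, or at a newline."""
    # The lookbehind keeps the period attached to the sentence it closes.
    parts = re.split(r"(?<=\.)\s+|\n+", text)
    # Drop empty pieces and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]

# Approximation of the example paragraph (an assumption, not the slide text).
paragraph = (
    "Observation shows deficit in the left side of his body. "
    "No fever, fibrinolysis discarded.\n"
    "Hospitalized for a complete study."
)

for sentence in split_sentences(paragraph):
    print(sentence)
```

A rule this simple would also split after abbreviations such as "Dr.", which is one reason practical sentence detectors tend to rely on trained models or longer rule lists.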
The second one is the tokenizer. The tokenizer, for each given sentence, tries to divide it into tokens. The tokens are the meaningful, representative pieces of a sentence, typically words and symbols. As you can see here, we have Man, and we have 42: because the two digits are together and not separate, it's considered as one token. Then, again, the dot, the punctuation sign that ends the sentence, is a token. NAMC is considered as a single word, so again it's a new token. So you see that words and symbols are the main kinds of token.
The third part is the part of speech. In the part of speech step, the idea is that each of the tokens is tagged with its part of speech. That means that we assign to each token the function that it has in the sentence. For example, in the sentence we can see that Man is a noun, or that discarded is a verb. So this is the function that each of the tokens has in the text.
The next part is the chunker. After the part of speech has been assigned, the tokens are tagged as part of a phrase, which can be a noun phrase, for example "patient" or "the left side of his body", or a prepositional phrase, for example with "from", "in" and so on. So the chunker allows us to build this structure and basically create a larger structure of the sentence that we are analyzing, or in this case of the different sentences.
Almost finally, we have the parsing, where the idea is to create a structure of the sentence in which we can identify the subject, the direct and indirect objects, the verb and so on. We can see this information in a structure that is close to [COUGH] being some kind of tree or similar, and there we have all this information.
And then finally, we have the named entity recognition. The named entity recognition basically tries to identify the concepts that are relevant.
So in this case, we can say that, for example, NAMC is detected as an organization. This is a method that can fail depending on the models that we are using; in this case it is not an organization, it's an acronym, okay? But basically the idea is to identify the type of entity that each of the different tokens, or different parts of the sentence, represents. So that's the final part of the NLP pipeline.
There are several other NLP tasks or procedures that can be applied. You can see a list over here. Depending on the context or the domain, we are going to use all of these tasks or just a part of them. You can see the full list, and you can find more information on the Internet about the different tasks. So again, you, sorry, [LAUGH] you have here the references in the materials. And this is everything for this lesson, so thank you very much for your attention. [MUSIC]
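As a supplement to the tokenizer and named entity recognition steps described in this lesson, here is a rough Python sketch. It is not the tooling used in the course: the tokenizer is a simple regular expression, and the entity step is a plain lookup table rather than a trained model. The names `tokenize`, `tag_entities` and `LEXICON`, as well as the entity labels, are hypothetical and chosen only for illustration.

```python
import re

def tokenize(sentence):
    """Split a sentence into word/number tokens and single symbol tokens."""
    # \w+ keeps runs of letters or digits together (so "42" is one token);
    # [^\w\s] makes each punctuation sign its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical lookup table: the entries and labels are assumptions for this sketch.
LEXICON = {
    "fever": "SYMPTOM",
    "fibrinolysis": "PROCEDURE",
    "namc": "ACRONYM",
}

def tag_entities(tokens, lexicon=LEXICON):
    """Return (token, label) pairs for tokens found in the lookup table."""
    return [(tok, lexicon[tok.lower()]) for tok in tokens if tok.lower() in lexicon]

print(tokenize("Man, 42 years."))
print(tag_entities(tokenize("No fever, fibrinolysis discarded.")))
print(tag_entities(tokenize("NAMC.")))
```

A lookup table only recognizes what has been listed in advance. A trained model can generalize to unseen text, but, as noted in the lesson, it can also mislabel a token such as NAMC.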