In this lecture I would like to talk to you about crowdsourcing in bioinformatics and systems biology. As you saw throughout the course, the problems that we are dealing with in systems biology and systems pharmacology are complex. Through websites such as Coursera, we have the opportunity to have many people working together on the same project. So the first thing I would like us to do is watch a video created by Vincent Liota for NOVA that may motivate you to think about the ability of a crowd to solve complex problems. The video is called The Wisdom of the Crowd, and I hope you enjoy it. >> Sir Francis Galton was a nobleman and scorned the common masses. He thought that votes of governance should be left to higher classes. He'd prove, with all the data from a contest inescapable, of guessing even simple things, the commoners were incapable. >> Ladies and gentlemen, step right up! >> What kind of contest might it be? >> Guess the ox's weight and see! Guess the weight correctly and win a prize. >> It's 100! >> Said the little one. >> That's much too light. At least a ton. >> An eager crowd queued up to play; 800 made a guess that day. >> So he had 800 data points. >> And now, the ox's weight is exactly [CROSSTALK] 1,198 pounds. >> [SOUND] >> There are no winners. >> Sir Francis knew the rabble would never guess the weight. How might they judge important things if left to meet that fate? [MUSIC] With mathematics he would show how far they went astray. But in the end his theory was in total disarray. Because a curve of all the guesses... >> Oh, that curve? It's a cumulative distribution function of the normal distribution. Sorry, that's what it's called. >> [LAUGH] >> Because graphing all the guesses and determining their mean... >> I think he was talking about the median. >> And determining their median, he showed that if the crowd were one, its estimate is keen. >> He showed that if the crowd were one, its estimate is keen. >> Keenius.
>> That's because while no individual got the actual weight, the average of all the guesses is exactly right. >> The average will generally be better than a randomly selected individual guess. >> The average of the masses assures us of success. >> I think he was talking about the median. >> And the larger the number of guesses we toss in... >> The more likely we are to get the right answer about the oxen. >> His premature prognostication, they cannot help but scoff. >> Gelman should've gathered more data before he went shooting his mouth off. >> Sir Francis' hypothesis was rocked by ignoramuses. He lost the proof he had avowed. He found the wisdom of the crowds. >> If you have a group of people and they each have tiny bits of information, then you can learn a lot if you can just gather all of these bits together. >> It's just like Wikipedia. >> Well, this isn't exactly like Wikipedia, [LAUGH] it's a little bit different. >> It could maybe be like... [SOUND] And you don't even need to be an expert, but if you know something then you're able to contribute, and that entry is able to be more informed. >> Another sample of this fair? >> Who Wants to Be a Millionaire? >> Yeah, the audience lifeline. >> If a person feels like they can't answer the question by themselves, they ask the audience. The audience is right over 90% of the time. >> There you go. >> How about that, Gelman? >> The wrong Gelman. >> Sorry. >> One by one we're not too smart, but every guess, it plays its part, and when you add them up you'll find the wisdom of the crowds. [NOISE] So what you saw in the video is that when you put the minds of many people together, you can potentially achieve more than one person can achieve alone. The rest of this lecture discusses material published in a review article by Benjamin Good and Andrew Su. This is a really great review that covers many of the types of crowdsourcing projects in the fields of systems biology and bioinformatics.
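The video's claim that the crowd's estimate beats a typical individual guess can be checked with a short simulation. This is a minimal sketch, not Galton's data or analysis: it assumes 800 independent, unbiased guesses with Gaussian noise of 100 pounds around the true weight of 1,198 pounds, and the noise model and the names `simulate_guesses` and `crowd_report` are hypothetical choices made for illustration. For the mean, the outcome is guaranteed: by the triangle inequality, the error of the average guess can never exceed the average error of the individual guesses. For the median, it is what one expects when errors are independent and not systematically biased, which is the assumption built into the simulation.

```python
import random
import statistics

TRUE_WEIGHT = 1198  # pounds, the ox's actual weight in the video


def simulate_guesses(n, true_value, spread, seed=0):
    """Draw n guesses with independent, unbiased Gaussian errors (an assumed noise model)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, spread) for _ in range(n)]


def crowd_report(guesses, true_value):
    """Compare the error of the crowd's mean and median with a typical individual's error."""
    mean_guess = statistics.mean(guesses)
    median_guess = statistics.median(guesses)
    # Expected error of a randomly selected individual guess
    individual_error = statistics.mean(abs(g - true_value) for g in guesses)
    return {
        "mean_error": abs(mean_guess - true_value),
        "median_error": abs(median_guess - true_value),
        "individual_error": individual_error,
    }


if __name__ == "__main__":
    guesses = simulate_guesses(800, TRUE_WEIGHT, spread=100)
    for name, value in crowd_report(guesses, TRUE_WEIGHT).items():
        print(f"{name}: {value:.1f} lb")
```

If all the guesses shared a common bias (for example, everyone underestimating a heavy animal), averaging would not remove it; the wisdom of the crowd depends on individual errors that partly cancel each other out.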
So first, what is crowdsourcing? The term crowdsourcing was coined in an article in Wired magazine in 2006, and its definition states that crowdsourcing is the act of taking a traditional job performed by an employee and outsourcing it to an undefined, generally large group of people in the form of an open call. The review article in Bioinformatics by Good and Su divides crowdsourcing projects into two types: microtasks and megatasks. Microtasks are projects that require little expertise to participate in. You are presented with a relatively easy task, for example, the task we saw in the video of guessing the weight of the ox. Combining the input of many people results in a great final product that would be very difficult to achieve using, for example, complicated computer programs. The other type, megatasks, are typically very hard problems that can potentially be solved by individuals. Megatasks are typically set up as challenges or competitions, and only the top few solvers provide a solution to the problem. Now we are going to go over several examples of microtasks and megatasks that are out there. But before we go into some examples, let's think about what could be motivating people to participate in a crowdsourcing project. First, people like to volunteer and be part of something bigger and greater. Crowdsourcing projects are also sometimes delivered as games, and people play games mostly for fun. There are also now microtask markets on the web. One famous example is Amazon Mechanical Turk, which we will discuss a little later in the next few slides. Sometimes you are participating in microtasks without even knowing it. For example, when you search Google, you're helping Google improve their search engine, because they process your search terms to improve their algorithms. The last motivation is education.
By performing these microtasks or megatasks, you are actually learning, and especially now with MOOCs, an instructor can give the students projects that they can all work on together. Let's look at some examples of microtasks. The first example is called Cell Slider. This microtask asks you to label the kinds and quantities of different cells in biopsies from cancer patients. Let's go to the website and look at this example. Typically, when you start one of these microtasks, there is a training period that teaches you how to perform it. In this training period, they show us which cells are the cancer cells, which cells are the core tissue, and which cells are regular white blood cells. This gives you the ability to make a selection, so you can go on and try this site on your own. Here are some examples of games that you can play to help solve bigger tasks. Phylo is a game that can be used to help with sequence alignment. MOLT and MalariaSpot are games that can be used to annotate images from patients who have malaria. I mentioned Amazon Mechanical Turk before. On Amazon Mechanical Turk, the motivation to complete microtasks is a very small cash reward, a few cents per task. There are some examples from the biomedical sciences where large sets of microtasks were posted on Amazon Mechanical Turk; one of them is processing text from the biomedical literature. On the Amazon Mechanical Turk website, you can either try to make money or post a project and have many people contribute to completing it. If you click on Find HITs Now, you can see an example. One, for instance, is Quality Estimation from Arabic to English translation: the reward for this microtask is 10 cents, and the time allotted to complete it is 60 minutes. If I click on it, you can see some more details. You have to be signed in to participate.
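Projects such as Cell Slider, or HITs on Amazon Mechanical Turk, usually show the same item to several people and then combine their answers. The sketch below shows one simple way to do that, majority voting, together with the reward arithmetic for a posted project. It is an illustration under stated assumptions, not the aggregation method used by Cell Slider or Mechanical Turk: the label names, the 50% agreement threshold, the function names, and the choice to leave ties unresolved for expert review are all hypothetical, and the cost estimate counts worker rewards only, not any platform fees.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the winning label, or None if there is no clear majority."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()  # [(label, count), ...] sorted by count
    top_label, top_count = ranked[0]
    # A tie for first place is left unresolved (e.g. sent for expert review)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    # The winner's share of votes must exceed the agreement threshold
    if top_count / len(votes) <= min_agreement:
        return None
    return top_label


def aggregate(task_votes: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Apply majority voting to every item (e.g. every biopsy image)."""
    return {item_id: majority_label(votes) for item_id, votes in task_votes.items()}


def reward_cost_cents(n_items: int, workers_per_item: int, reward_cents: int) -> int:
    """Total worker rewards in cents; platform fees are NOT included."""
    return n_items * workers_per_item * reward_cents


if __name__ == "__main__":
    # Hypothetical labels for three images, each seen by several volunteers
    votes = {
        "image_1": ["cancer", "cancer", "normal"],
        "image_2": ["immune", "normal"],
        "image_3": ["normal", "normal", "normal", "cancer"],
    }
    print(aggregate(votes))  # {'image_1': 'cancer', 'image_2': None, 'image_3': 'normal'}
    print(reward_cost_cents(1000, 5, 10))  # 50000
```

For example, 1,000 images, each labeled by 5 workers at the 10-cent reward mentioned for the translation task, would cost 50,000 cents ($500) in rewards. Plain majority voting treats every contributor equally; weighting contributors by their accuracy on items with known answers is a common refinement.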
On this site there are 185,000 HITs available; a HIT is a Human Intelligence Task, which is typically a single crowdsourcing task. Another example is a crowdsourcing task embedded in an app or web application. One of the famous ones is reCAPTCHA, a free anti-bot service that you can add to your website to make sure that robots do not sign in to it. At the same time, when users log in to your website, reCAPTCHA uses their entries to help in the process of digitizing books. The last type of microtask is educational crowdsourcing, and this is our case here in this course: you are going to try some of these crowdsourcing tasks. These are not graded assignments; we just want to experiment with the idea that together we can achieve more than what we can achieve individually. One thing that we are going to try in this course is to extract interactions from the literature to build networks: literature-based protein-protein interaction networks and kinase-substrate interaction networks. A very similar crowdsourcing project is sbv IMPROVER, and this is a network verification challenge. It asks you to read papers and suggest or approve interactions in cell-signaling and metabolic pathways. If you are interested, I encourage you to visit that website. Now let's switch our attention to megatasks. One of the famous megatasks in the field of computational biology is a game called Foldit, which involves a series of training steps. Foldit is about learning how to fold a protein from its sequence. Protein folding is one of the most famous computational problems in bioinformatics, and it is very difficult because the search space of all possible folds of a protein is enormous. Human intuition can really help to solve this problem.
However, only the top players can really achieve a quality prediction that is comparable to computational predictions, or even beat state-of-the-art computational prediction methods. A similar problem is understanding how mRNA folds. Once the mRNA is transcribed from the DNA, it assumes a fold that protects it from degradation. It forms hairpins that make it more stable, and this way it can begin the trip from its transcription site, out of the nucleus, to reach the ribosomes. Identifying the fold of these mRNAs is a difficult task, and a really nice game called EteRNA enables people to try to fold mRNAs in many ways, so that they can also search the space of mRNA folds. Let's look at that game. This is the EteRNA website, and you can begin by pressing the Play Now button. First, you go through a tutorial that tells you how to play. Now I have to change all of the nucleotides to G, and that earned me some points, so I can move on to the next puzzle. You can explore this yourself. Many megatasks are set up as competitions. Typically, the incentives are either prize money or publication of the winning algorithms in scientific journals, and the idea is to recruit many talented people to find a solution to a task. In some cases, the algorithms and solutions that people come up with can even be combined into a method that beats any individual entry. In the next lecture, I'm going to present the three microtasks and the one megatask that we designed for this course, which all of us can participate in. [MUSIC]