This document summarizes a presentation about using the Taverna workflow system and myExperiment repository for collaborative bioinformatics research. It discusses how Taverna allows researchers to combine multiple computational methods and online data sources into reproducible workflows. The presenter describes their own experiences with early "spaghetti code" approaches to bioinformatics and how e-Science tools now enable more insightful experiments through collaboration and sharing of workflows.
1. Feasting on Brains with Taverna Tutorial and demonstration by Marco Roos acknowledging Carole Goble, Dave de Roure, Alan Williams, Jiten Bhagat, Katy Wolstencroft, Martijn Schuemie, Edgar Meij, Sophia Katrenko, Willem van Hage, Scott Marshall, Pieter Adriaans, NBIC, OMII-UK, the myGrid team
10. A typical biologist… A needy biologist: tiny brain; lots of data to deal with; lots of methods and algorithms to try and combine; no computational superpowers; lots of knowledge to deal with
11. Start at the beginning I have a computational question…
12. ‘Old school’ Bioinformatics A typical bioinformatician
13. ‘Old school’ Bioinformatics A biologist behind a computer who (just) learned Perl
14.
/*
 * determines ridges in htm expression table
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h" /* not shown; assumed to define TRUE/FALSE and declare the functions below (selecthtm now takes PGresult **) */

int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* build the query in querystring; the chromosome name is a quoted string literal */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    /* hand the result back through the pointer so the caller can use it */
    *htmtable = PQexec(conn, querystring);
    return validquery(*htmtable, querystring);
}

int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
/* post: */
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert it before comparing; read the current row, not row 0 */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
    {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;          /* holds database connection */
    char querystring[256]; /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(EXIT_FAILURE);
    }
    else printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    if (validquery(result, querystring))
    {
        printresults(result);
    }
    else
    {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

int printresults(PGresult *tuples)
{
    int i;

    /* print the first column of at most the first 10 rows */
    for (i = 0; i < PQntuples(tuples) && i < 10; i++)
    {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK)
    {
        printf("Query %s failed.\n", querystring);
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
48. Many? Some statistics: >4000 Taverna users; >3000 Web Services in Taverna; >400 workflows on myExperiment; >1000 registered myExperiment users in one year