Using Flow-based programming to write tools and workflows for Scientific Computing in Go
1. Using Flow-based programming
... to write Tools and Workflows
for Scientific Computing
Go Stockholm Conference Oct 6, 2018
Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter)
Ex - Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io
Savantic AB savantic.se | RIL Partner AB rilpartner.com
2. About the speaker
● Name: Samuel Lampa
● PhD in Pharm. Bioinformatics from UU / pharmb.io (since 1 week)
● Researched: Flow-based programming-based workflow tools to build
predictive models for drug discovery
● Previously: HPC sysadmin & developer,Web developer,etc,
M.Sc. in molecular biotechnology engineering
● Next week: R&D Engineer at Savantic AB (savanticab.com)
● (Also:AfricArxiv (africarxiv.org) and RIL Partner AB (rilpartner.com))
3. Read more about my research
bit.ly/samlthesis →
(bionics.it/posts/phdthesis)
10. ● Black box, asynchronously running processes
● Data exchange across predefined connections
between named ports (with bounded buffers) by
message passing only
● Connections specified separately from processes
● Processes can be reconnected endlessly to form
different applications without having being changed
internally
FBP in brief
14. The Central Dogma of Biology …
… from DNA to RNA to Proteins
DNA
mRNA
Protein
Image credits: Nicolle Rager, National Science Foundation. License: Public domain
Amino acids
Ribosome
RNA
polymerase
Cell nucleus
Cell
15. “FBP is a particular form of dataflow
programming based on bounded buffers,
information packets with defined lifetimes,
named ports, and separate definition
of connections”
FBP vs Dataflow
16. ● Change of connection wiring
without rewriting components
● Inherently concurrent - suited
for the multi-core CPU world
● Testing, monitoring and logging very
easy: Just plug in a mock-, logging-
or debugging component.
● Etc etc ...
Benefits abound
20. Generator functions
Adapted from Rob Pike’s slides: talks.golang.org/2012/concurrency.slide#25
func main() {
c := generateInts(10) // Call function to get a channel
for v := range c { // … and loop over it
fmt.Println(v)
}
}
func generateInts(max int) <-chan int { // Return a channel of ints
c := make(chan int)
go func() { // Init go-routine inside function
defer close(c)
for i := 0; i <= max; i++ {
c <- i
}
}()
return c // Return the channel
}
21. Chaining generator functions 1/2
func reverse(cin chan string) chan string {
cout := make(chan string)
go func() {
defer close(cout)
for s := range cin { // Loop over in-chan
cout <- reverse(s) // Send on out-chan
}
}()
return cout
}
22. Chaining generator functions 2/2
// Chain the generator functions
dna := generateDNA() // Generator func of strings
rev := reverse(dna)
compl := complement(rev)
// Drive the chain by reading from last channel
for dnaString := range compl {
fmt.Println(dnaString)
}
23. Chaining generator functions 2/2
// Chain the generator functions
dna := generateDNA() // Generator func of strings
rev := reverse(dna)
compl := complement(rev)
// Drive the chain by reading from last channel
for dnaString := range compl {
fmt.Println(dnaString)
}
24. Problems with the generator approach
● Inputs not named in connection code (no keyword arguments)
● Multiple return values depend on positional arguments:
leftPart, rightPart := splitInHalves(chanOfStrings)
25. Could we emulate named ports?
type P struct {
in chan string // Channels as struct fields, to act as “named ports”
out chan string
}
func NewP() *P { // Initialize a new component
return &P{
in: make(chan string, 16),
out: make(chan string, 16),
}
}
func (p *P) Run() {
defer close(p.out)
for s := range p.in { // Refer to struct fields when reading ...
p.out <- s // ... and writing
}
}
26. Could we emulate named ports?
func main() {
p1 := NewP()
p2 := NewP()
p2.in = p1.out // Connect dependencies here, by assigning to same chan
go p1.Run()
go p2.Run()
go func() { // Feed the input of the network
defer close(p1.in)
for i := 0; i <= 10; i++ {
p1.in <- "Hej"
}
}()
for s := range p2.out { // Drive the chain from the main go-routine
fmt.Println(s)
}
}
27. Add almost no additional code, and get:
flowbase.org
28. Real-world use of FlowBase
● RDF (Semantic) MediaWiki XML→
● Import via MediaWiki XML import
● Code: github.com/rdfio/rdf2smw
● Paper: bit.ly/rdfiopub
30. Taking it further: Port structs
ttlFileRead.OutTriple().To(aggregator.In())
aggregator.Out().To(indexCreator.In())
indexCreator.Out().To(indexToAggr.In())
indexCreator.Out().To(triplesToWikiConverter.InIndex())
indexToAggr.Out().To(triplesToWikiConverter.InAggregate())
triplesToWikiConverter.OutPage().To(xmlCreator.InWikiPage())
xmlCreator.OutTemplates().To(templateWriter.In())
xmlCreator.OutProperties().To(propertyWriter.In())
xmlCreator.OutPages().To(pageWriter.In())
(So far only used in SciPipe, not yet FlowBase)
32. SciPipe
● Workflow
● Keeps track of dependency graph
● Process
● Added to workflows
● Long-running
● Typically one per operation
● Task
● Spawned by processes
● Executes just one shell command or custom Go function
● Typically one task spawned per operation on a set of input files
● Information Packet (IP)
● Most common data type passed between processes
Workflow
Process
File IP
Task
Task
Task
33. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
34. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
35. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
36. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
37. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
38. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
39. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs & inputs
(dependencies / data flow)
● Run the workflow
40. “Hello World” in SciPipe
package main
import (
// Import the SciPipe package, aliased to 'sp'
sp "github.com/scipipe/scipipe"
)
func main() {
// Init workflow with a name, and max concurrent tasks
wf := sp.NewWorkflow("hello_world", 4)
// Initialize processes and set output file paths
hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
hello.SetOut("out", "hello.txt")
world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
world.SetOut("out", "{i:in|%.txt}_world.txt")
// Connect network
world.In("in").From(hello.Out("out"))
// Run workflow
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
41. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
42. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
43. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
44. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
45. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
46. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
47. Writing SciPipe workflows
package main
import (
"github.com/scipipe/scipipe"
)
const dna = "AAAGCCCGTGGGGGACCTGTTC"
func main() {
wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)
makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
makeDNA.SetOut("dna", "dna.txt")
complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")
reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")
complmt.In("in").From(makeDNA.Out("dna"))
reverse.In("in").From(complmt.Out("compl"))
wf.Run()
}
● Import SciPipe
● Set up any default variables
or data, handle flags etc
● Initiate workflow
● Create processes
● Define outputs and paths
● Connect outputs to inputs
(dependencies / data flow)
● Run the workflow
51. Turn Audit log into TeX/PDF report
TeX template by Jonathan Alvarsson @jonalv
52. ● Intuitive behaviour: Like conveyor belts & stations in a factory.
● Flexible: Combine command-line programs with Go components
● Custom file naming: Easy to manually browse output files
● Portable: Distribute as Go code or as compiled executable files
● Easy to debug: Use any Go debugging tools or even just println()
● Powerful audit logging: Stream outputs via UNIX FIFO files
● Efficient & Parallel: Fast code + Efficient use of multi-core CPU
Benefits of SciPipe - Thanks to Go + FBP
54. Thank you for your time!
Using Flow-based programming
... to write Tools and Workflows for Scientific Computing
Talk at Go Stockholm Conference Oct 6, 2018
Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter)
Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io