
Scripting Embulk Plugins


Scripting Embulk plugins makes plugin development drastically easier. You can develop, test, and productionize data integrations using any scripting language. It is the most suitable way to integrate data with SaaS using vendor-provided SDKs.
https://techplay.jp/event/781988

Published in: Software


  1. Scripting Embulk Plugins. Sadayuki Furuhashi. Embulk & Digdag Online Meetup 2020
  2. Sadayuki Furuhashi. A founder of Treasure Data, Inc., located in Silicon Valley. An open-source hacker. GitHub: @frsyuki. OSS I designed: (shown as logos on the slide)
  3. Data integration is a key for data-driven business. [Diagram: AI/ML, Analytics, Databases]
  4. Data is moving to the cloud & SaaS
  5. SaaS data integration is becoming more common
  6. Who develops new SaaS integrations? [Diagram: Java developers, Low code, Scripting with SDKs, Scripting, Embulk plugin API, SaaS users] Dev vs. user gap!
  7. Who develops new SaaS integrations? [Diagram: Java developers, Low code, Scripting with SDKs, Scripting, Embulk plugin API, Embulk scripting, SaaS users] Dev = user
  8. Scripting on the powerful framework. [Diagram: Your script, SDK / library, Embulk scripting plugin, Embulk core framework, Embulk plugins] ✓ High performance ✓ Choices of output plugins
  9. How it works?
     1. Run a script
     2. Load rows
     3. Write rows as a CSV file
     4. Read the CSV file
     [Diagram: Your script, SDK / library, named pipe, Embulk scripting plugin]
     A named pipe is like a file but not a file.
     • It doesn't consume disk space.
     • It doesn't cause disk I/O (= fast).
     • It transfers data as your script writes rows (= fast).
  10. How it works?
     1. Run a script
     2. Load rows
     3. Write rows as a CSV file
     4. Read the CSV file
     5. Pass rows to an output plugin
     [Diagram: Your script, SDK / library, named pipe, Embulk scripting plugin, output plugin]
  11. How to use embulk-input-script
     1. Install
        $ embulk gem install embulk-input-script
     2. Create a config
        in:
          type: script
          run: ruby your_script.rb  #-- any executable
        out:
          type: …
     3. Run
        $ embulk run config.yaml
  12. How to develop a script: your script runs 3 times
     $ script.rb setup <config.yaml> <setup.yaml>
        First, write a setup file. It should include column names, column types, and parallelism.
     $ script.rb run <setup.yaml> <output.csv> <N>
        Second, load rows and write them to a CSV file. If the setup file says parallelism is greater than 1, this runs multiple times with N = 0, 1, 2, 3, …
     $ script.rb finish <config.yaml> <setup.yaml>
        Finally, do cleanup if necessary.

     require "csv"

     if ARGV[0] == "setup"
       File.write(ARGV[2], "…")
     elsif ARGV[0] == "run"
       CSV.open(ARGV[2], "w") do |file|
         file << row
         …
       end
     elsif ARGV[0] == "finish"
       puts "Done!"
     end
  13. Examples
     • Importing server status from DataDog
       https://github.com/embulk/embulk-input-script/tree/master/examples/datadog_hosts
     • Importing AWS EC2 server list
       https://github.com/embulk/embulk-input-script/tree/master/examples/aws_ec2_instances
  14. Wanted
     • Output support
       embulk-output-script is not available.
     • Converter from a script to an Embulk plugin gem
       When you create a script, you want to release it so that other people can reuse it.
       To do that, we need a tool that packages the script with embulk-input-script as a gem.
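
The named-pipe flow from the "How it works?" slides can be tried in isolation. The following is a minimal sketch, not the plugin's actual implementation: the helper name `stream_rows_through_fifo` and the sample rows are made up for illustration, and a Ruby thread stands in for the separate script process that the plugin would run. It assumes a Unix-like system where `File.mkfifo` is available.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper (not part of embulk-input-script): pushes rows through
# a named pipe as CSV and returns what the reading side received.
def stream_rows_through_fifo(rows)
  Dir.mktmpdir do |dir|
    fifo_path = File.join(dir, "rows.csv")
    File.mkfifo(fifo_path) # looks like a file, but holds no data on disk

    # Writer side: stands in for "your script" writing rows as CSV.
    # Opening the pipe blocks until a reader opens it as well.
    writer = Thread.new do
      CSV.open(fifo_path, "w") do |csv|
        rows.each { |row| csv << row }
      end
    end

    # Reader side: stands in for the plugin reading the CSV "file".
    # Reading ends when the writer closes its end of the pipe.
    received = CSV.read(fifo_path)
    writer.join
    received
  end
end

received = stream_rows_through_fifo([[1, "alice"], [2, "bob"]])
p received
```

Because both sides only see a path, the script can treat the pipe like an ordinary output file, while the reader gets rows as soon as they are written. Note that values come back as strings, since CSV carries no type information; the column types declared in the setup file are what would restore them.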

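
The three-phase script from the "How to develop a script" slide could be filled in roughly as follows. This is a sketch under stated assumptions: the slide does not show the setup file's schema, so the key names `columns` and `tasks`, the type names, and the sample rows are placeholders rather than the plugin's documented format (check the embulk-input-script README for the real one). The argument order follows the slide's `setup <config.yaml> <setup.yaml>`, `run <setup.yaml> <output.csv> <N>`, and `finish <config.yaml> <setup.yaml>` usage.

```ruby
require "csv"
require "yaml"

# ASSUMPTION: placeholder setup contents. The real key names expected by
# embulk-input-script are not shown in the slides.
SETUP = {
  "columns" => [
    { "name" => "id", "type" => "long" },
    { "name" => "name", "type" => "string" },
  ],
  "tasks" => 2, # parallelism: "run" is invoked once per task index N
}.freeze

# Stand-in for rows that would normally be fetched through a vendor SDK.
ALL_ROWS = [[1, "alice"], [2, "bob"], [3, "carol"], [4, "dave"]].freeze

def main(argv)
  case argv[0]
  when "setup"
    # script.rb setup <config.yaml> <setup.yaml>
    File.write(argv[2], YAML.dump(SETUP))
  when "run"
    # script.rb run <setup.yaml> <output.csv> <N>
    setup = YAML.load_file(argv[1])
    task_index = Integer(argv[3])
    CSV.open(argv[2], "w") do |csv|
      # Each task writes only its own share of the rows.
      ALL_ROWS.each_with_index do |row, i|
        csv << row if i % setup["tasks"] == task_index
      end
    end
  when "finish"
    # script.rb finish <config.yaml> <setup.yaml>
    puts "Done!"
  else
    raise ArgumentError, "expected setup, run, or finish as the first argument"
  end
end

main(ARGV) if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
```

Splitting rows by `i % tasks` is only one way to divide work between parallel runs; with a real SDK, each task index would more likely map to a page range, a time window, or a resource ID.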