5. Solution
A reliable framework with parallel execution, data validation, error recovery,
automatic configuration guessing, resumable runs, and an extensive set of plugins
github.com/embulk/embulk
8. Getting Started
1. Embulk requires Java
2. Download embulk:
http://dl.embulk.org/embulk-latest.jar
3. Make it executable and confirm that it runs:
$ embulk --version
4. Run an example:
$ embulk example
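Steps 2 and 3 above can be sketched as a short shell session. This is only an illustration: the install directory `~/.embulk/bin` and the variable names are assumptions, not something the slides specify, and it relies on the downloaded jar being directly executable, as step 3 implies.

```shell
# Assumed install location; any directory on your PATH would work.
EMBULK_HOME="$HOME/.embulk"
EMBULK_BIN="$EMBULK_HOME/bin/embulk"
mkdir -p "$EMBULK_HOME/bin"

# Step 2: download the jar (-L follows redirects, -f fails on HTTP errors).
# Step 3: mark it executable, but only if the download succeeded.
if curl -fsSL -o "$EMBULK_BIN" "http://dl.embulk.org/embulk-latest.jar"; then
  chmod +x "$EMBULK_BIN"
else
  echo "download failed; check network access" >&2
fi

# Make the embulk command visible in the current shell session.
export PATH="$EMBULK_HOME/bin:$PATH"
```

With Java installed and the download complete, `embulk --version` should then resolve to the file placed in the assumed directory.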
9. Installing Embulk Plugin
$ embulk gem install embulk-input-mysql
$ embulk gem install embulk-output-postgresql
List of plugins:
https://embulk.org/plugins
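Once both plugins are installed, a configuration can refer to them by type. The sketch below writes a hypothetical `mysql_to_pg.yml`; the host, user, password, database, and table values are placeholders, and the option names follow the plugins' READMEs as best recalled, so check them against the installed versions.

```shell
# Write an illustrative config that copies one MySQL table to PostgreSQL.
# All connection values are placeholders, not real credentials.
cat > mysql_to_pg.yml <<'EOF'
in:
  type: mysql            # provided by embulk-input-mysql
  host: localhost
  user: myuser
  password: mypassword
  database: my_database
  table: my_table
out:
  type: postgresql       # provided by embulk-output-postgresql
  host: localhost
  user: pg
  password: ""
  database: my_database
  table: my_table
  mode: insert
EOF
```

Such a file would be executed with `embulk run mysql_to_pg.yml`.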
11. Embulk Configuration File
(YAML)
in: Input plugin options.
◦ parser: If the input is file-based, parser plugin parses a file format (built-in csv,
json, etc).
◦ decoder: If the input is file-based, decoder plugin decodes compression or
encryption (built-in gzip, bzip2, zip, tar.gz, etc).
out: Output plugin options.
◦ formatter: If the output is file-based, a formatter plugin writes records in a
file format (such as the built-in csv or json)
◦ encoder: If the output is file-based, an encoder plugin applies compression or
encryption (such as the built-in gzip or bzip2)
filters: Filter plugin options (optional).
exec: Executor plugin options. An executor plugin controls parallel processing
(such as the built-in thread executor or the Hadoop MapReduce executor)
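To make the four sections concrete, here is a minimal sketch of a complete file-to-file configuration. The paths, column list, and thread count are assumptions chosen for illustration. Note that in the YAML itself, decoders and encoders are written as lists under the plural keys `decoders` and `encoders`, as far as the built-in file plugins are concerned.

```shell
# Write an illustrative config.yml covering in, filters, out, and exec.
# Paths and column names are assumed sample values.
cat > config.yml <<'EOF'
in:
  type: file
  path_prefix: ./try1/csv/sample_
  decoders:
  - {type: gzip}                 # decompress input files
  parser:
    type: csv                    # parse decompressed bytes as CSV
    charset: UTF-8
    delimiter: ','
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
filters:
- type: rename                   # optional: rename a column in flight
  columns:
    comment: note
out:
  type: file
  path_prefix: ./out/result_
  file_ext: csv.gz
  formatter:
    type: csv                    # serialize records as CSV
  encoders:
  - {type: gzip}                 # compress output files
exec:
  max_threads: 4                 # cap on parallel worker threads
EOF
```

The nesting mirrors the list above: parser and decoder settings live inside `in`, formatter and encoder settings inside `out`, while `filters` and `exec` sit at the top level.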
12. Using Guess Command
The guess command infers parser and decoder options from the input data
$ embulk guess seed.yml -o config.yml
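A typical guess workflow might look like the following sketch. The seed file names only the input location and the output type; the `path_prefix` value is an assumed sample path. The `embulk` calls are guarded so that the snippet does nothing harmful on a machine where Embulk is not installed.

```shell
# Minimal seed: only where to read from and where to write to.
# The path_prefix below is an assumed location for sample CSV files.
cat > seed.yml <<'EOF'
in:
  type: file
  path_prefix: ./try1/csv/sample_
out:
  type: stdout
EOF

# Run the workflow only when embulk is actually installed.
if command -v embulk >/dev/null 2>&1; then
  embulk guess seed.yml -o config.yml   # fill in parser and decoder options
  embulk preview config.yml             # dry run: show parsed sample rows
  embulk run config.yml                 # execute the bulk load
fi
```

Reviewing the generated `config.yml` before running is worthwhile, since guessed column types and timestamp formats can need manual adjustment.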