This document summarizes a presentation on server-side data integration with SnapLogic. It discusses the problems with client-side mashups and how SnapLogic addresses them with server-side mashups built from reusable resources and pipelines. It gives examples of defining resources over various data sources, linking resources in pipelines, and using SnapScript to define resources and pipelines in code.
1. Data Integration with Server Side Mashups
Juergen Brendel
Principal Software Engineer
OSDC 2007, Brisbane
2. Agenda
• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic
3. The SnapLogic project
• Founded 2005, data integration background
• Vision:
– Reusable data integration resources
– REST
– Web-based GUI
– Programmatic interface
– Open Source
• Python... Why not?
• www.snaplogic.com
4. What's a mashup?
• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
– Multiple sources
– Multiple dimensions
• Typically on the client side
– Browser
– Ajax
5. Self-made mashups
• Hand coded
• Mashup editors
– GUI mashup-logic editor
– Wiki-style
– Hosted
6. Benefits for the enterprise?
• Enable knowledge workers!!!
• Situational applications!
• Avoid the IT bottleneck!!
Yeah, right...
7. Problems with client-side mashups
• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance
8. Solution: Server-side mashups
• Flexible access
• Security
• Performance
9. SnapLogic data integration philosophy
• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community
10. Example: Resources
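[Diagram: a client exchanges an HTTP request and response with the SnapLogic Server at HTTP://server1.example.com/customer_list. Inside the server, a component paired with a resource definition serves that URL. Data sources and formats shown: databases, files, applications, Atom / RSS, JSON.]
Resource definition:
• Resource name
• HTTP://server1.example.com/customer_list
• SQL query or filename
• Credentials
• Parameters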
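To make the ingredients of a resource definition concrete, here is a minimal sketch in plain Python. It is not SnapLogic's definition format, which the slide does not show. The class `ResourceDefinition`, its field names, the example SQL and credentials, and the idea of passing parameters as a URL query string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class ResourceDefinition:
    """Hypothetical container for the items listed on the slide."""
    name: str
    uri: str
    source: str                       # SQL query or filename
    credentials: dict = field(default_factory=dict)
    parameters: tuple = ()            # names a caller must supply

    def request_url(self, **values):
        """Build the URL a client would request.

        Passing parameters in the query string is an assumption.
        """
        missing = [p for p in self.parameters if p not in values]
        if missing:
            raise ValueError("missing parameters: " + ", ".join(missing))
        query = urlencode({p: values[p] for p in self.parameters})
        return self.uri + ("?" + query if query else "")


# Illustrative values only; none of them come from the presentation.
customer_list = ResourceDefinition(
    name="customer_list",
    uri="http://server1.example.com/customer_list",
    source="SELECT * FROM customers WHERE region = %(region)s",
    credentials={"user": "reader"},
    parameters=("region",),
)
```

Calling `customer_list.request_url(region="QLD")` yields the resource URL with `?region=QLD` appended, while omitting `region` raises `ValueError`. The point of the sketch is only that name, location, source, credentials and parameters live together in one reusable definition.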
11. Example: Pipelines
[Diagram: a client exchanges an HTTP request and response with the SnapLogic Server at HTTP://server1.example.com/processed_customer_list. Inside the server, three components, each paired with its own resource definition, are chained into a pipeline: Read, Geocode, Sort. Data sources and formats shown: databases, files, applications, Atom / RSS, JSON.]
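The Read, Geocode, Sort chain can be sketched with ordinary Python generators. This is an illustration of the pipeline idea only, not SnapLogic code: the function names, the record layout, and the `FAKE_GEOCODES` table (approximate coordinates standing in for a real geocoding service) are assumptions.

```python
from operator import itemgetter

# Hypothetical lookup table standing in for a real geocoding service.
FAKE_GEOCODES = {
    "Brisbane": (-27.47, 153.03),
    "Sydney": (-33.87, 151.21),
}


def read(rows):
    """Source stage: emit one record (a dict) per input row."""
    for name, city in rows:
        yield {"name": name, "city": city}


def geocode(records):
    """Transform stage: attach coordinates to each record."""
    for record in records:
        lat, lon = FAKE_GEOCODES.get(record["city"], (None, None))
        yield {**record, "lat": lat, "lon": lon}


def sort_by(records, field):
    """Sorting needs every record, so this stage buffers its input."""
    return sorted(records, key=itemgetter(field))


def processed_customer_list(rows):
    """Chain the three stages, as the pipeline in the diagram does."""
    return sort_by(geocode(read(rows)), "name")
```

A client never sees the individual stages; it only asks for the result of `processed_customer_list`, which is the same position the pipeline URL occupies in the diagram.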
12. A simple pipeline: Filtering leads
13. Linking fields in a pipeline
14. Reusing a pipeline as a resource
15. Reusing a pipeline as a resource
16. Reusing a pipeline as a resource
17. Adding new components
• For access logic
• For data transformations
• Independent of data format
• Currently written in Python
18. A simple processing component
class IncreaseSalary(DataComponent):

    def init(self):
        '''Called when the component is started.'''
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        '''Called for every record.'''
        record.fields['salary'] *= (1 + self.increase/100)
        self.writeRecord(record)
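To try the component outside a SnapLogic server, one option is a tiny test harness. The `Record` and `DataComponent` classes below are hypothetical stand-ins written for this sketch, not SnapLogic's real classes; they implement only the attributes the slide's code touches (`moduleProperties`, `init`, `writeRecord`, `record.fields`). The `run` helper is likewise invented here.

```python
class Record:
    """Stand-in record: just a dict of field values."""
    def __init__(self, **fields):
        self.fields = dict(fields)


class DataComponent:
    """Stand-in base class that collects written records in a list."""
    def __init__(self, **module_properties):
        self.moduleProperties = module_properties
        self.output = []
        self.init()

    def init(self):
        """Hook for subclasses; called once at start-up."""

    def writeRecord(self, record):
        self.output.append(record)


class IncreaseSalary(DataComponent):
    """The component from the slide, unchanged in behaviour."""
    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


def run(component, records):
    """Feed every record through the component and return its output."""
    for record in records:
        component.processRecord(record)
    return component.output
```

For example, `run(IncreaseSalary(percent_increase='10'), [Record(salary=50000.0)])` returns one record whose salary is raised by ten percent. The property arrives as a string here because the slide's code converts it with `float`.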
19. An Apache log file reader
import apachelog

class LogReader(DataComponent):

    def startReading(self):
        '''Called when the component has no input stream.'''
        logfile = open(self._filename, 'rbU')
        format = self.moduleProperties['log_format']

        if format == 'COMMON':
            p = apachelog.parser(apachelog.formats['common'])
        elif ...

        # Read all lines in the logfile
        for line in logfile:
            out_rec = Record(self.getSingleOutputView())
            raw_rec = p.parse(line)
            out_rec.fields['remote_host'] = raw_rec['%h']
            out_rec.fields['client_id'] = raw_rec['%l']
            out_rec.fields['user'] = raw_rec['%u']
            out_rec.fields['server_status'] = int(raw_rec['%>s'])
            out_rec.fields['bytes'] = int(raw_rec['%b'])
            ...

            self.writeRecord(out_rec)
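The field extraction can be tried without the `apachelog` module or a SnapLogic server. The sketch below swaps in a regular expression from the standard library for the Common Log Format; the function name `parse_common_log_line` and the returned dict layout are assumptions that mirror the field names used on the slide. It also handles one case the slide's code would trip over: Apache writes `-` for `%b` when no body bytes were sent, and `int('-')` raises.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
COMMON_LOG = re.compile(
    r'^(?P<remote_host>\S+) (?P<client_id>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<server_status>\d{3}) (?P<bytes>\S+)$'
)


def parse_common_log_line(line):
    """Return the fields of one Common Log Format line as a dict."""
    match = COMMON_LOG.match(line.strip())
    if match is None:
        raise ValueError("not a Common Log Format line: %r" % line)
    fields = match.groupdict()
    fields['server_status'] = int(fields['server_status'])
    # Apache writes '-' instead of 0 when no body bytes were sent.
    fields['bytes'] = 0 if fields['bytes'] == '-' else int(fields['bytes'])
    return fields


SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
```

Parsing `SAMPLE` gives `remote_host`, `client_id`, `user`, `server_status` and `bytes` under the same names the component writes to its output record.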
20. Programmatic access
• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come
21. Creating a resource
# Create a new resource
staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
staff_res_def.props.description = 'Read from the employee file'
staff_res_def.props.title = 'Staff'
staff_res_def.props.delimiter = '$?{DELIMITER}'
staff_res_def.props.filename = '$?{INPUTFILE}'
staff_res_def.props.parameters = (
    ('INPUTFILE', Param.Required, ''),
    ('DELIMITER', Param.Optional, ',')
)

# Define the output view of the resource
staff_res_def.props.outputview.output1 = (
    ('Last_Name', 'string', 'Employee last name'),
    ('First_Name', 'string', 'Employee first name'),
    ('Salary', 'number', 'Annual income')
)
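The `$?{DELIMITER}` and `$?{INPUTFILE}` placeholders suggest that property values are filled in from the declared parameters when the resource is used. The slides do not say how SnapLogic does this, so the following is only a sketch of one plausible reading: required parameters must be supplied, optional ones fall back to their default. The names `resolve`, `REQUIRED`, `OPTIONAL` and `declared` are invented for the example.

```python
import re

# Stand-ins for Param.Required / Param.Optional (assumed semantics).
REQUIRED, OPTIONAL = "required", "optional"

PLACEHOLDER = re.compile(r"\$\?\{(\w+)\}")

# Same declarations as on the slide: (name, kind, default).
declared = (
    ('INPUTFILE', REQUIRED, ''),
    ('DELIMITER', OPTIONAL, ','),
)


def resolve(template, declared, supplied):
    """Replace each $?{NAME} in template with a supplied or default value."""
    values = {}
    for name, kind, default in declared:
        if kind == REQUIRED and name not in supplied:
            raise ValueError("missing required parameter: " + name)
        values[name] = supplied.get(name, default)
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

With only `INPUTFILE` supplied, `resolve('$?{DELIMITER}', declared, {'INPUTFILE': 'staff.csv'})` falls back to the comma default, which is the behaviour the `Param.Optional` entry on the slide appears to describe.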
22. Creating a pipeline
# Create a new pipeline
p = Pipeline()
p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
p.props.title = 'Employee_Salary_Increase'

# Select the resources in the pipeline
# (increase_salary_res_def is not shown on the slides)
p.resources.Staff = staff_res_def.instance()
p.resources.PayRaise = increase_salary_res_def.instance()

# Link the resources in the pipeline
link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary')
)
# Use the resource name chosen above ('PayRaise') as the link target
p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
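The `link` tuple pairs field names, apparently mapping each field of the upstream output view to a field of the downstream input view. Assuming that reading (output field first, input field second), the effect on a single record can be sketched in plain Python. The helper `apply_link` and the dict-based record are illustrative, not part of SnapScript.

```python
link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary'),
)


def apply_link(record, link):
    """Copy each linked output field to its input-side name."""
    return {dst: record[src] for src, dst in link}


staff_record = {'Last_Name': 'Smith', 'First_Name': 'Ann', 'Salary': 50000}
linked = apply_link(staff_record, link)
```

After the call, `linked` carries the same values under `last`, `first` and `salary`. The downstream resource can keep its own field names and be reused behind a differently named source.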
23. Pipeline parameters
# Define the user-visible parameters of the pipeline
p.props.parameters = (
    ('INCREASE', Param.Required, ''),
)

# Map values to the parameters of the pipeline's resources
p.props.parammap = (
    (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
)

# Confirm correctness and publish as a new resource
p.check()
p.saveToServer(connection)
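Each `parammap` entry appears to route a value to one parameter of one resource: a `Param.Parameter` entry forwards a pipeline argument, while a `Param.Constant` entry supplies a fixed value. The sketch below models that reading in plain Python. The function `route_parameters` and the `PARAMETER` / `CONSTANT` stand-ins are assumptions, not SnapScript API.

```python
# Stand-ins for Param.Parameter / Param.Constant (assumed semantics).
PARAMETER, CONSTANT = "parameter", "constant"

# Same entries as on the slide: (kind, value or pipeline parameter,
# target resource, target parameter).
parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)


def route_parameters(parammap, pipeline_args):
    """Return {resource: {parameter: value}} for one pipeline call."""
    routed = {}
    for kind, value, resource, target in parammap:
        if kind == PARAMETER:
            # Look up the caller-supplied value of the pipeline parameter.
            value = pipeline_args[value]
        routed.setdefault(resource, {})[target] = value
    return routed
```

Calling `route_parameters(parammap, {'INCREASE': '5'})` hands `PERC_INCREASE` to `PayRaise` and the fixed file location to `Staff`. This is how the pipeline can expose only `INCREASE` to its callers while hiding the input file.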
24. The end
Any questions?
jbrendel@snaplogic.org