This presentation presents the 10 New Requirements for Modern Data Integration, based on an article by SnapLogic VP of Engineering Viakom Krishna. You can read the full article from Database Trends and Applications at http://www.dbta.com/Editorial/Trends-and-Applications/10-New-Requirements-for-Modern-Data-Integration-109146.aspx.
You can also read our blog coverage at www.snaplogic.com/blog/10-requirements.
12. 10 New Requirements for Modern Data Integration
1. Application integration is done primarily through REST and SOAP services
2. Large-volume data integration is available to a Hadoop-based data lake or
to cloud-based data warehouses
3. Integration has to support the continuum of data velocities starting from
batch all the way to continuous streams
4. Integration is event-based rather than clock-driven
5. Integration is primarily document-centric
6. Integration is hybrid and spans cloud-cloud and cloud-ground scenarios
7. Integration itself has to be accessible through SOAP/REST APIs
8. Integration is all about connectivity, connectivity, connectivity
9. Integration has to be elastic
10. Integration has to be delivered as a service
13. Anything
apps | data | APIs | things
SnapLogic: Unified Platform for Data and Application Integration
Anytime
batch | streaming | real-time
Anywhere
on prem | cloud | hybrid
14. Integrate at the speed of
modern business
+1 888-494-1570
sales@snaplogic.com
@SnapLogic
www.snaplogic.com
Editor's notes
Software applications are increasingly delivered as cloud-based services that expose SOAP/REST APIs for data and metadata management based on business services and business objects. Unlike the previous generation of on-premises applications, today’s SaaS applications do not allow direct access to the database behind their services. As a result, these applications lack the relational client server interface leveraged by the previous generation of application integration tooling.
To be effective, modern data integration platforms must provide easy and robust ways to consume REST and SOAP. They need to provide an easy way to abstract the complexities of these APIs into business actions and objects to enable an application administrator to integrate these services with the rest of the enterprise.
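As a minimal illustration of abstracting a REST API into a business action, the sketch below wraps a hypothetical SaaS endpoint behind a "create customer" function; the base URL, path, and field names are invented for the example and do not belong to any real vendor's API:

```python
import json

# Hypothetical SaaS REST endpoint (illustrative only).
BASE_URL = "https://api.example-saas.com/v2"

def create_customer_request(name, email):
    """Build the HTTP request (method, URL, JSON body) for the
    business-level action 'create customer', hiding the raw API shape."""
    payload = {"object": "customer", "name": name, "email": email}
    return ("POST", f"{BASE_URL}/customers", json.dumps(payload))

method, url, body = create_customer_request("Acme Corp", "ops@acme.example")
```

An application administrator works with the business action ("create customer"); the integration layer owns the endpoint and payload details.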
Increasingly, enterprise IT organizations (and lines of business/departments) are moving away from bespoke data warehouses to data lakes that are repositories of all data based on a Hadoop cluster. MapReduce and, more recently, Spark are used as the compute frameworks for data transformation of large amounts of data in this environment. Cloud data warehouse technologies such as Amazon Redshift, Microsoft Azure SQL Data Warehouse and Snowflake Computing are providing low-cost and low-administration alternatives to expensive specialized data warehouse appliances. Data integration tooling has to have a native understanding of newer storage and compute frameworks based on large-scale distributed frameworks such as HDFS and Spark. This is difficult for client server-based tooling that has relied on row sets as the primary commodity to be efficiently managed.
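The compute model these frameworks share can be sketched in plain Python as a map phase followed by a reduce phase; this toy aggregation runs locally over a list, standing in for what Spark or MapReduce would distribute across a cluster:

```python
from collections import defaultdict

# Toy records; in a data lake these would be billions of rows in HDFS.
records = [
    {"region": "EMEA", "amount": 120},
    {"region": "AMER", "amount": 300},
    {"region": "EMEA", "amount": 80},
]

def map_phase(recs):
    # Emit (key, value) pairs, as a Spark/MapReduce mapper would.
    for r in recs:
        yield (r["region"], r["amount"])

def reduce_phase(pairs):
    # Aggregate values by key, as the reducer would per partition.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals = reduce_phase(map_phase(records))
# totals == {"EMEA": 200, "AMER": 300}
```

Tooling that natively speaks this record-stream model fits the data lake; tooling built around client/server row sets must translate at every boundary.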
A change in data velocity or data size should not require switching engines, as it did with the previous generation of tooling. Last-generation data integration engines were optimized either for batch processing of large-volume data or for low-latency handling of small messages, but not both.
Modern integration platforms should be able to provide the necessary velocity regardless of size of data. This means that the engine has to be able to stream large data such as sensor data from the Internet of Things just as easily as it can consume and deliver responses to discrete business events such as the addition of a new product or a new customer.
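One way to picture a single engine covering the velocity continuum is a pipeline defined once and fed either a finite batch or an unbounded stream; this Python generator sketch (an analogy, not any particular product's engine) shows the same transform consuming both:

```python
import itertools

def transform(stream):
    """One pipeline definition that works on batches or endless streams."""
    for event in stream:
        yield {"value": event["value"] * 2}

# Batch: a finite list of events.
batch_out = list(transform([{"value": 1}, {"value": 2}]))

def sensor_readings():
    # An (in principle unbounded) stream, e.g. IoT sensor data.
    n = 0
    while True:
        yield {"value": n}
        n += 1

# Streaming: the same pipeline consumes the unbounded generator lazily.
stream_out = list(itertools.islice(transform(sensor_readings()), 3))
```

Because the transform never materializes its input, neither the size of the batch nor the endlessness of the stream changes the pipeline definition.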
Responding to a business event as it happens is now expected: for example, increasing the stock inventory on an item based on sentiment expressed in social media, or opening a support case automatically when a failure is detected on a device. In either case, polling for these conditions after the fact means a frustrated or lost customer and an inefficient process in today's real-time enterprise.
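The event-driven pattern can be sketched as a tiny dispatcher in which handlers fire the moment an event arrives (say, via a webhook) rather than on a polling clock; the event names and fields here are hypothetical:

```python
# Registry mapping event types to handler functions.
handlers = {}

def on(event_type):
    """Decorator registering a handler for one event type."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

@on("device.failure")
def open_support_case(event):
    # React immediately instead of discovering the failure by polling.
    return f"case opened for device {event['device_id']}"

def dispatch(event):
    return handlers[event["type"]](event)

result = dispatch({"type": "device.failure", "device_id": "sensor-42"})
```

The clock-driven alternative would scan for failed devices on a schedule, discovering each failure minutes or hours late.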
This is a corollary to the fact that integration is based on SOAP/REST APIs that send and receive hierarchical documents rather than row sets or compressed message payloads of the previous generation client server-based technologies.
Transforming hierarchical documents into row sets or into compressed payloads at the edges to make the internal engines run efficiently is the biggest impediment to streamlined repurposing of the previous generation of data integration tooling.
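To make the row-set impedance mismatch concrete, here is a hedged sketch (with an invented order document) of the kind of flattening a client/server engine must perform at its edges; note that the nesting that a document-centric engine would preserve is lost:

```python
# A hierarchical document, as a REST API would send or receive it.
order = {
    "order_id": 1001,
    "customer": {"name": "Acme", "tier": "gold"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B2", "qty": 1},
    ],
}

def to_rows(doc):
    """Denormalize one nested document into flat rows, one per line item,
    duplicating parent fields into every row."""
    return [
        {
            "order_id": doc["order_id"],
            "customer": doc["customer"]["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in doc["items"]
    ]

rows = to_rows(order)
```

A document-centric engine processes `order` as-is; a row-set engine pays this transformation (and its inverse) at every boundary.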
We are in a transitional period. While the newer software purchases are almost exclusively cloud-based, there is still a lot of investment in legacy on-premises enterprise applications that will take time to migrate. Some applications may never migrate to the cloud.
In today’s hybrid, multi-cloud environment, modern data integration technology has to be able to handle both on-premises and cloud-based applications with the same efficiency and ease.
Integration has to interoperate with other services in the enterprise such as monitoring, provisioning, and security. For example, enterprises might want to monitor the success or failure of integration flows through their own monitoring tools, and they might want to add new users automatically as they get added to the enterprise integration group. And most enterprises require single sign-on with their identity provider.
Just like real estate is all about location, location, location, integration is all about connectivity, connectivity, connectivity.
By definition, integration is about connecting disparate systems each with its own API set, and an integration toolset needs an effective framework to adapt these APIs to efficiently process the data. In addition, a large set of pre-built connectors speeds up the implementation and increases agility in responding to new integration scenarios.
Integration demands of a modern real-time enterprise can vary widely from one day to the next based on the business events that are taking place. One day could see hundreds of integrations triggered by a scenario that a data scientist is exploring, and the next day could be back to the normal load of a few integrations. Reserving capacity to handle the worst case computation/storage needs is costly, and not having sufficient capacity when necessary is even more so. This means that the integration framework has to be able to scale up and scale down resources on demand.
In a world that is increasingly cloud-based and data-driven, data access and integration technology has to be delivered as a service that's accessible to anyone who needs it, rather than to the few practitioners who toil away in the back room. The service has to be always on and web-scale to handle the elastic integration demands of the modern enterprise. On-premises data integration technology, with its long release cycles, complex and costly upgrades, and general administration burden, cannot deliver the agility and speed the modern enterprise needs. A new class of users—sometimes referred to as "citizen integrators"—has made self-service essential, and only a SaaS-based approach with simplified design, management, and monitoring interfaces can meet the broad spectrum of users and requirements.
These new requirements have given rise to a new category of integration called integration platform as a service (iPaaS), which should be built from the ground up to address the new and legacy enterprise application and data integration needs.
Leading enterprises choose SnapLogic because we help them connect data and applications faster.
We connect anything: sources including applications, APIs, things, or data
We connect anytime: in batches, streaming, or in real time
And we connect anywhere: on premises, in the cloud, or a combination of both