Take a look at how NordstromRack.com | HauteLook and Nasdaq OMX are using Amazon Redshift for data warehousing and business intelligence workloads, one year after they moved to Amazon Redshift. We will cover why HauteLook chose Redshift, how they built the architecture, what data is being stored and accessed, and, overall, how that data is powering the HauteLook business. We will also discuss how Nasdaq migrated from an on-premises data warehouse to Amazon Redshift, and how they have taken advantage of Redshift's security features, such as hardware security modules (HSM), encryption, and audit logging.
(BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014
1. November 12, 2014 | Las Vegas, NV
BDT206: See How Amazon Redshift is Powering Business Intelligence in the Enterprise
Rahul Pathak, Amazon Redshift
Jason Timmes, Nasdaq
Kevin Diamond, HauteLook
3. [Diagram: AWS services grouped as Collect, Store, and Analyze: AWS Direct Connect, Amazon Kinesis, Amazon S3, Amazon DynamoDB, Amazon Glacier, Amazon Redshift, Amazon Elastic MapReduce, Amazon EC2, AWS Data Pipeline]
8. [Diagram: HauteLook data flow; labels: Data Source ET, Direct Connect, Client Forwarder, State Management Loader, Amazon Redshift Sandbox, S3]
11.
•Leading index provider with 41,000+ indexes across asset classes and geographies
•Over 10,000 corporate clients in 60 countries
•Our technology powers over 70 marketplaces, regulators, CSDs, and clearinghouses in over 50 countries
•100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
•26 markets, 3 clearinghouses, 5 central securities depositories
•Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value
13. Our warehouse can be used to analyze market share, client activity, surveillance, power our billing, and more…
18. •Pay close attention to the manifest "mandatory" flag!
–Manifests written by Amazon Redshift UNLOAD always leave it false
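COPY treats a manifest entry without "mandatory": true as optional, so a file listed in the manifest but missing from S3 is skipped instead of failing the load. A minimal Python sketch of one guard, assuming the manifest is rewritten before it is handed to COPY; the function name enforce_mandatory and the example URLs are hypothetical, not taken from Nasdaq's pipeline.

```python
import json


def enforce_mandatory(manifest_json: str) -> str:
    """Return a copy of a Redshift COPY manifest with every entry marked mandatory.

    COPY's default for a missing "mandatory" key is false, which lets a listed
    but absent S3 object be skipped silently instead of failing the load.
    """
    manifest = json.loads(manifest_json)
    for entry in manifest.get("entries", []):
        entry["mandatory"] = True  # fail the COPY if this file is missing
    return json.dumps(manifest)


# Example: a manifest as UNLOAD might write it, with no "mandatory" keys.
unload_manifest = json.dumps({"entries": [
    {"url": "s3://example-bucket/unload/0000_part_00"},
    {"url": "s3://example-bucket/unload/0001_part_00", "mandatory": False},
]})
print(enforce_mandatory(unload_manifest))
```

Rewriting the manifest rather than trusting the producer keeps the guarantee in one place, right before the load.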
19. •TableIngestStatus
–We originally put this table in Amazon Redshift itself
–It turns out Amazon Redshift is not efficient on very small data sets
–This significantly impacted performance and increased concurrency contention
•Solution: Moved TableIngestStatus to a separate transactional RDBMS (MySQL)
–We were already using a MySQL instance to persist workflow states
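The slides do not give TableIngestStatus's columns. The sketch below assumes a hypothetical schema (table name, load date, status, timestamp) and uses Python's built-in sqlite3 as a stand-in for the MySQL instance so that it runs anywhere; in MySQL the upsert would be written as INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: the slides name the table but not its columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS table_ingest_status (
    table_name  TEXT NOT NULL,
    load_date   TEXT NOT NULL,
    status      TEXT NOT NULL CHECK (status IN ('PENDING', 'LOADED', 'FAILED')),
    updated_at  TEXT NOT NULL,
    PRIMARY KEY (table_name, load_date)
)
"""


def mark_status(conn, table_name, load_date, status):
    """Insert or update one small status row in its own transaction.

    Small, frequent writes like this are what the slides found inefficient in Redshift.
    """
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO table_ingest_status (table_name, load_date, status, updated_at) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(table_name, load_date) DO UPDATE SET "
            "status = excluded.status, updated_at = excluded.updated_at",
            (table_name, load_date, status, now),
        )


def get_status(conn, table_name, load_date):
    """Return the current status string, or None if the load was never recorded."""
    row = conn.execute(
        "SELECT status FROM table_ingest_status WHERE table_name = ? AND load_date = ?",
        (table_name, load_date),
    ).fetchone()
    return row[0] if row else None
```

Usage: open a connection (sqlite3.connect(":memory:") for a quick try), execute SCHEMA once, then call mark_status(conn, "trades", "2014-11-11", "PENDING") before a load and "LOADED" after it.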
20. •Direct Connect (private lines)
•VPC
•Encryption in flight (HTTPS/SSL/TLS on API, JDBC)
–Parameter Group: require_ssl = true
–Use Amazon Redshift cluster SSL certificate to verify cluster identity
•Encryption at rest
–AES-256 encrypt files prior to loading to S3 (not using S3 SSE)
–Amazon Redshift encryption
•Specified at cluster creation, applies to backups/snapshots too
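A hedged Python sketch of the client side of these settings. The first function builds a libpq-style connection string (as accepted by PostgreSQL drivers such as psycopg2) whose sslmode=verify-full refuses plaintext and checks the server certificate against a local copy of the Redshift CA certificate; Nasdaq used JDBC, so this is an analogue rather than their configuration. The second builds a COPY for client-side-encrypted files using the ENCRYPTED and MASTER_SYMMETRIC_KEY options; the slides do not say Nasdaq used this clause, IAM_ROLE postdates this 2014 talk, and all host names, paths, and ARNs are placeholders.

```python
def redshift_dsn(host, dbname, user, root_cert_path, port=5439):
    """libpq-style DSN that requires TLS and verifies the cluster's identity."""
    params = {
        "host": host,
        "port": str(port),          # Redshift's default port
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",   # encrypt, check the CA chain and the host name
        "sslrootcert": root_cert_path,
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def encrypted_copy_sql(table, manifest_url, iam_role_arn):
    """COPY statement for client-side-encrypted S3 files listed in a manifest.

    The root key is a placeholder; in practice it is injected at run time,
    never stored in code. Inputs are assumed trusted (no SQL escaping here).
    """
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "MASTER_SYMMETRIC_KEY '<base64-root-key>' "
        "ENCRYPTED MANIFEST"
    )
```

verify-full also requires the connection host name to match the certificate; verify-ca checks only the chain, which may be needed when connecting by private IP.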
21. •Amazon Redshift will store the cluster key in a single customer-premises HSM (or CloudHSM)
–SafeNet Luna SA HSM, firmware version should match CloudHSM
–Requires certificate exchange between cluster and HSM
–Requires the cluster to have an Elastic IP (EIP)
•On our side, required static 1-to-1 NAT of HSM private IP
•VPC Security Groups still apply; can still isolate cluster from others
–Encrypted database key decrypted in HSM, passed over encrypted channel to cluster on startup, stored in memory to decrypt data encryption (block) keys
–If running an HSM HA group, must synchronize keys after creation
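The HSM setup above maps onto three Redshift API calls. A hedged Python sketch that only assembles the keyword arguments for the boto3 Redshift client methods create_hsm_client_certificate, create_hsm_configuration, and create_cluster; nothing is sent to AWS. The identifier naming scheme, node type, and node count are hypothetical placeholders, and HSM-backed encryption availability may have changed since 2014.

```python
def hsm_cluster_requests(cluster_id, hsm_ip, hsm_partition, hsm_partition_password,
                         hsm_server_cert_pem, elastic_ip, parameter_group,
                         security_group_ids, master_user, master_password):
    """Return (cert_kwargs, hsm_config_kwargs, cluster_kwargs) in API call order."""
    # Hypothetical naming convention for the two HSM-related identifiers.
    cert_id = f"{cluster_id}-hsm-client-cert"
    config_id = f"{cluster_id}-hsm-config"

    cert_kwargs = {"HsmClientCertificateIdentifier": cert_id}

    hsm_config_kwargs = {
        "HsmConfigurationIdentifier": config_id,
        "Description": f"HSM holding the cluster key for {cluster_id}",
        "HsmIpAddress": hsm_ip,  # address the cluster can reach, e.g. the 1-to-1 NAT address
        "HsmPartitionName": hsm_partition,
        "HsmPartitionPassword": hsm_partition_password,
        "HsmServerPublicCertificate": hsm_server_cert_pem,
    }

    cluster_kwargs = {
        "ClusterIdentifier": cluster_id,
        "NodeType": "dc2.large",          # placeholder node type
        "ClusterType": "multi-node",
        "NumberOfNodes": 2,               # placeholder size
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "ClusterParameterGroupName": parameter_group,  # e.g. one with require_ssl = true
        "VpcSecurityGroupIds": list(security_group_ids),  # still isolate the cluster
        "Encrypted": True,                # must be chosen at creation time
        "HsmClientCertificateIdentifier": cert_id,
        "HsmConfigurationIdentifier": config_id,
        "ElasticIp": elastic_ip,
        "PubliclyAccessible": True,       # an EIP requires a publicly accessible cluster
    }
    return cert_kwargs, hsm_config_kwargs, cluster_kwargs


# Hypothetical usage (needs boto3 and AWS credentials; not executed here):
#   client = boto3.client("redshift")
#   cert_kwargs, hsm_kwargs, cluster_kwargs = hsm_cluster_requests(...)
#   resp = client.create_hsm_client_certificate(**cert_kwargs)
#   # Register resp["HsmClientCertificate"]["HsmClientCertificatePublicKey"] on the HSM.
#   client.create_hsm_configuration(**hsm_kwargs)
#   client.create_cluster(**cluster_kwargs)
```

Generating all three requests from one function keeps the certificate and configuration identifiers consistent between the calls that create them and the cluster that references them.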
22. •HSM integration was critical to Nasdaq adoption
•Monitor cluster access, react to any unauthorized connections
–STL_CONNECTION_LOG
•Query system table on a timed basis, alert to any unexpected access
–CloudTrail and Amazon Redshift connection & user logs, forwarded to Splunk
•CloudTrail captures all API calls, not activity inside Amazon Redshift
–STL_DDLTEXT
•Audits all schema changes in the cluster
•In response to an alert, Amazon Redshift/HSM connectivity is severed, and the cluster is immediately shut down
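A minimal Python sketch of the timed STL_CONNECTION_LOG check, assuming an allowlist of expected users and source hosts. The SQL trims the fixed-width CHAR columns; the event labels ("authenticated", "authentication failure") are as we understand them and should be checked against your cluster's values. The allowlist approach, the ConnectionEvent type, and the function names are hypothetical, not Nasdaq's implementation.

```python
from dataclasses import dataclass
from datetime import datetime

# Run on a timer with the last poll time as the parameter (DB-API "%s" style).
CONNECTION_AUDIT_SQL = """
SELECT recordtime, TRIM(event) AS event, TRIM(username) AS username,
       TRIM(remotehost) AS remotehost
FROM stl_connection_log
WHERE recordtime > %s
ORDER BY recordtime
"""


@dataclass(frozen=True)
class ConnectionEvent:
    recordtime: datetime
    event: str
    username: str
    remotehost: str


def unexpected_connections(events, allowed_users, allowed_hosts):
    """Return events that should raise an alert.

    Failed authentications are always flagged; other events are flagged when
    the user or the source host is not on the allowlist.
    """
    flagged = []
    for e in events:
        if e.event == "authentication failure":
            flagged.append(e)
        elif e.username not in allowed_users or e.remotehost not in allowed_hosts:
            flagged.append(e)
    return flagged
```

Internal service connections also appear in this log, so a real allowlist would need to include them to avoid false alarms.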
23. •With validation, data integrity, and security requirements met, the challenge remains to optimize ingest
•Why?
–Concurrency is a huge performance factor; can’t afford to be loading yesterday’s data when clients are running queries
27. [Architecture diagram; labels: On premises; RMS Input Sources (multiple systems); Data Ingest Process; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC); S3: Redshift Load files/Manifests, Redshift Snapshots/Backups; Amazon SNS: Data Loaded Topic; Redshift Database Cluster; HSM Key Appliance Cluster; MySQL]
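The diagram shows the components but not the order of operations. As a hypothetical sketch only, one ingest pass could be ordered as follows, with each S3, Redshift, MySQL, and SNS interaction injected as a plain function so the control flow can be tested without AWS; the step order, the status values, and all names are assumptions.

```python
from typing import Callable, List


def run_ingest(table: str, load_date: str, files: List[str],
               upload: Callable[[str], str],
               copy_from_manifest: Callable[[str, List[str]], None],
               set_status: Callable[[str, str, str], None],
               publish: Callable[[str], None]) -> None:
    """One pass of a hypothetical ingest step loosely following the diagram's labels."""
    set_status(table, load_date, "PENDING")      # status kept in MySQL, not Redshift
    try:
        urls = [upload(f) for f in files]        # load files go to S3
        copy_from_manifest(table, urls)          # write manifest, then COPY into Redshift
    except Exception:
        set_status(table, load_date, "FAILED")
        raise                                    # no "data loaded" message on failure
    set_status(table, load_date, "LOADED")
    publish(f"{table}/{load_date} loaded")       # e.g. the SNS "Data Loaded" topic
```

Publishing only after the status row says LOADED means downstream consumers never hear about a partial load.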
35. Kevin Diamond, NordstromRack.com | HauteLook