SecurityScorecard is a global leader in cybersecurity ratings and the only service with over 12 million companies continuously rated. ScyllaDB is now an integral part of our data processing. We needed a database offering low query latency, real-time data ingestion, fault tolerance, and high scalability.
In this presentation, we will share how ScyllaDB is powering our platform and why it is a great fit. We will highlight our business and technical use cases, and the challenges we faced before migrating to ScyllaDB. Next, we will describe how we migrated three data sources and decoupled the frontend and backend services by introducing a middle layer for improved scalability and maintainability. Finally, we will conclude by sharing some of our learnings, performance benchmarks, and future plans.
2. Nguyen Cao
■ Staff Software Engineer at SecurityScorecard
■ Key member of data migration to ScyllaDB project
■ 8 years of experience building large-scale distributed systems
■ MSc in Computing Science, specialized in Big Data
5. SecurityScorecard Mission
To make the world a safer place by transforming the way organizations understand, mitigate, and communicate cybersecurity to their Boards, employees, and vendors.
6. SecurityScorecard Security Rating
Security Rating is an objective, data-driven, and quantifiable measure of an organization’s overall cybersecurity and cyber risk exposure. Ratings grade vendors and organizations on a scale of A through F.
SecurityScorecard provides quality insights, giving you the confidence to make fast and informed decisions about cybersecurity investments.
Companies with an F rating are 7.7x more likely to suffer a data breach than those with an A rating.
Entities with a Better Security Rating are More Resilient
SecurityScorecard Provides:
■ Continuous Visibility into Statewide Risk
■ Greater Visibility into Cyber Investments
■ Decreased Risk of Breaches Hurting the State and Taxpayers
7. SecurityScorecard Data Pipeline
■ Signal Collection: IPv4 scan, malware sinkholes, DNS data, external data feeds
■ Attribution Engine: RIR, DNS, SSL data; domain discovery; subdomains; IP-domain pairing
■ Cyber Analytics: investigate emerging threats; CVEs; machine learning
■ Scoring Engine: digital footprint; size normalization; factor scores; total score
Key figures:
■ Global network of sensors deployed across 50 countries to spot zero-day threats
■ 4.1B IP addresses scanned every week
■ 100B+ vulnerabilities published weekly at trust.securityscorecard.com
■ 12M+ organizations continually scored every day
Risk Factors: Application Security, Hacker Chatter, Cubic Score, Social Engineering, Patching Cadence, DNS Health, Network Security, Endpoint Security, IP Reputation, Information Leak
Each detected security issue is assigned to a factor, and the score is calculated from those issues using severity-based weights, the factor's update cadence, and an age-out window.
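To make the weighting and age-out idea concrete, here is a minimal sketch of a factor-based score. Everything in it is hypothetical: the severity weights, the per-factor age-out windows, the "100 minus penalties" formula, the unweighted mean, and the A-F cut-offs are illustrative assumptions, not SecurityScorecard's actual scoring model. Update cadence is not modeled beyond the age-out check.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical severity weights (points deducted per active issue); NOT the real values.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 3.0, "high": 8.0}

# Hypothetical per-factor age-out windows: issues not seen within this many days stop counting.
AGE_OUT_DAYS = {"network_security": 30, "dns_health": 60}


@dataclass
class Issue:
    factor: str      # must be a key of AGE_OUT_DAYS in this sketch
    severity: str    # must be a key of SEVERITY_WEIGHTS
    last_seen: date


def factor_scores(issues: list, today: date) -> dict:
    """Score each factor as 100 minus the weighted sum of its still-active issues."""
    penalties = {factor: 0.0 for factor in AGE_OUT_DAYS}
    for issue in issues:
        window = timedelta(days=AGE_OUT_DAYS[issue.factor])
        if today - issue.last_seen <= window:  # aged-out issues are skipped
            penalties[issue.factor] += SEVERITY_WEIGHTS[issue.severity]
    return {factor: max(0.0, 100.0 - p) for factor, p in penalties.items()}


def total_score(scores: dict) -> float:
    """Unweighted mean of factor scores (a simplifying assumption)."""
    return sum(scores.values()) / len(scores)


def letter_grade(score: float) -> str:
    """Map a 0-100 score onto the A-F scale using hypothetical cut-offs."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"
```

With issues last seen on 2023-01-20 (high, network), 2022-11-01 (medium, network), and 2023-01-01 (low, DNS), evaluated on 2023-01-31, the medium issue has aged out of its 30-day window, so the factor scores are 92 and 99 and the total is 95.5 (grade A).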
15. Scoring Architecture Current
OVERVIEW
[Architecture diagram]
Components: ssc-platform-api, ssc-svc-measurements, ssc-scoring-api, Scoring Workflow, Presto Cluster (AWS EMR), S3
Interfaces: REST API, CQL query (ScyllaDB), SQL query (Presto)
Data shown: 12M scorecards; 4B measurement stats for domains/IPs; historical measurement details for 2 weeks; all historical measurement details
16. Scoring Architecture Current
LOW LATENCY
CREATE TABLE scorecard_detail (
    uuid_company_id_key UUID,
    total_score DOUBLE,
    breach_impact DOUBLE,
    …,
    effective_date DATE,
    PRIMARY KEY ((uuid_company_id_key), effective_date)
) WITH default_time_to_live = 32400000;  -- 32,400,000 s = 375 days
■ Schemas are designed based on access patterns
■ Highly parallel processing tasks
SELECT *
FROM scorecard_detail
WHERE uuid_company_id_key IN (...)
  AND effective_date >= … AND effective_date <= …;
■ Read throughput is stable even under a high write workload
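Why does this schema fit the query? With PRIMARY KEY ((uuid_company_id_key), effective_date), each company is its own partition and rows inside it are kept sorted by the clustering column effective_date, so a date-range filter reads one contiguous slice per partition named in the IN list. The following is a minimal in-memory model of that layout, written only to illustrate the idea; the class and method names are hypothetical, and only total_score is carried per row.

```python
import bisect
from collections import defaultdict
from datetime import date


class ScorecardDetailModel:
    """In-memory model of scorecard_detail: one partition per company key,
    rows sorted by the clustering column effective_date (ascending)."""

    def __init__(self):
        # company key -> sorted list of (effective_date, total_score)
        self._partitions = defaultdict(list)

    def upsert(self, company_id: str, effective_date: date, total_score: float) -> None:
        rows = self._partitions[company_id]
        keys = [d for d, _ in rows]
        i = bisect.bisect_left(keys, effective_date)
        if i < len(rows) and rows[i][0] == effective_date:
            rows[i] = (effective_date, total_score)  # same primary key: overwrite
        else:
            rows.insert(i, (effective_date, total_score))  # keep clustering order

    def select(self, company_ids: list, start: date, end: date) -> list:
        """Mimic WHERE key IN (...) AND effective_date >= start AND effective_date <= end."""
        result = []
        for cid in company_ids:
            rows = self._partitions.get(cid, [])  # .get avoids creating empty partitions
            keys = [d for d, _ in rows]
            lo = bisect.bisect_left(keys, start)   # first row with date >= start
            hi = bisect.bisect_right(keys, end)    # one past last row with date <= end
            result.extend((cid, d, s) for d, s in rows[lo:hi])
        return result
```

Writing a row with an existing (company, date) key replaces it, mirroring CQL's upsert semantics, and a range read never scans other companies' data.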
18. Scoring Architecture Current
DATA ACCESS ABSTRACTION
Services: ssc-svc-users, ssc-svc-reports, …
■ Access data in ScyllaDB for low-latency, high-volume requests
■ Redirect all historical or high-latency requests, such as reporting, to Presto (over S3)
■ REST interface access for all FE services
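The routing rule above could be expressed roughly as follows. This is a hypothetical sketch, not the actual ssc-scoring-api logic: the request kinds, the DataRequest type, and the 14-day "hot window" (loosely based on the 2 weeks of historical measurement details shown in the overview) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Backend(Enum):
    SCYLLADB = "scylladb"  # low-latency, high-volume reads via CQL
    PRESTO = "presto"      # historical or latency-tolerant reads via SQL over S3


@dataclass
class DataRequest:
    kind: str   # hypothetical request type, e.g. "dashboard" or "report"
    start: date
    end: date


# Hypothetical request kinds that tolerate high latency.
LATENCY_TOLERANT_KINDS = {"report", "custom_analysis"}


def choose_backend(req: DataRequest, today: date, hot_window_days: int = 14) -> Backend:
    """Send interactive requests whose whole range is recent to ScyllaDB; everything else to Presto."""
    if req.kind in LATENCY_TOLERANT_KINDS:
        return Backend.PRESTO
    if req.start < today - timedelta(days=hot_window_days):
        return Backend.PRESTO  # reaches into history beyond the hot window
    return Backend.SCYLLADB
```

Keeping this decision inside one REST layer is what lets frontend services stay unaware of which store answers a request.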
19. Results
Migration to ScyllaDB brought us many benefits from different perspectives:
■ 90% latency reduction for most service endpoints
■ 80% fewer production incidents related to Presto/Aurora performance
■ $1M in infrastructure cost savings per year
■ 30% faster data pipeline processing
■ Much better customer experience
20. Lessons
■ Route infrequent, complex, and latency-tolerant data access to OLAP engines such as Presto or Athena (generating reports, custom analysis, etc.)
■ Build a scalable, highly parallel aggregation component to overcome current CQL limitations (e.g., performing JOINs in memory, handling SELECT ... IN queries)
■ Design ScyllaDB schemas based on data access patterns to address latency issues
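A parallel aggregation component of this kind might look like the sketch below: a large IN list is split into chunks queried concurrently, and results are combined with an in-memory hash join because CQL has no JOIN. The fetch function is injected and hypothetical (in practice it would wrap a driver call), as are the chunk size, worker count, and row shape; a production version would more likely rely on the driver's own asynchronous execution.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def parallel_select_in(fetch_chunk, keys: list, chunk_size: int = 50, max_workers: int = 8) -> list:
    """Run fetch_chunk(list_of_keys) -> list of row dicts for each chunk concurrently.
    pool.map preserves chunk order, so output rows follow the order of `keys`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_chunk, list(chunked(keys, chunk_size)))
        return [row for rows in results for row in rows]


def hash_join(left: list, right: list, key: str) -> list:
    """In-memory inner hash join of two lists of row dicts on `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**match, **row})  # left-side values win on column clashes
    return joined
```

Chunking bounds the size of each individual query, so a single oversized IN list does not land on one coordinator, while the thread pool keeps the overall request latency close to that of the slowest chunk.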
21. Thank You
Stay in Touch
Nguyen Cao
ncao@securityscorecard.io
@ducnguyen_cao
https://github.com/nguyencaoduc
https://www.linkedin.com/in/nguyenduccao/