Big data at CallFire
1. Big Data at CallFire
Vijesh Mehta (Co-Founder and CTO)
2. Agenda
• A little about CallFire
• CallFire’s technical challenges
• How CallFire deals with data
• Summary
3. Some background about myself
• I am one of the founders of CallFire.
– Started in 2005 in a small apartment
– Now 28 people
– Bootstrapped and profitable
• I’ve been writing software primarily in the
Java space for 12 years. CallFire is all
Java.
– We use: Wicket, Guice, Hibernate, MySQL,
Cassandra, ActiveMQ, Xen, Puppet
4. About CallFire
• We are a cloud telephony provider.
– Outbound Phone calls
– Phone Numbers
– SMS through long and short codes
– IVR – Interactive Voice Response
– Power Dialing
• CallFire’s call volume can get large very quickly.
– Hurricane Sandy: 1.9 million emergency calls
• 4 engineers and 1 system admin manage
operations and new features.
• We just hired 7 more engineers this year, and we are still hiring!
5. Technical Challenges by Numbers
• 1.4 billion calls and texts
– Growing exponentially
• Over 50,000 accounts
• Over 6 million campaigns
• 80 million sound files
• 14 TB in storage (NFS)
• MySQL: over 10,000 qps at peak
Big data isn’t always a big-company problem!
6. Growing faster each day
[Chart: Campaigns over Time; y-axis from 0 to 7,000,000]
7. The first challenge
• Problem: We outgrew our datacenter. New
systems need access to central storage.
Replication across a 1 Gb/s interconnect.
• Needed solution:
– Must work across datacenters
– Must scale as demand increases
– Must be fault tolerant
– Must deal with over 80 million sound files
– The cheaper the better
8. Solutions Considered (2010)
| | NFS | Gluster | HDFS | Cassandra |
|---|---|---|---|---|
| Fault tolerant | Yes, if configured | Yes | Yes | Yes |
| Datacenter replication | Maybe. Rsync isn’t fun with lots of files. | Not at the time | Yes | Yes |
| Easy to add storage | No | Not at the time | Yes | Yes |
| No single point of failure | No | Yes | Not exactly, NameNode. | Yes |
| Data always accessible | No, hard to sort easily through file system | No, same as file systems. | Yes | Yes |
| Notes | Not working for us. Too much management and downtime. | Looks good, tried it for a while. Easy at first because it was a file system. | Didn’t like the name node issue. May have been a good way to go. | Everything we need, quick to learn. We went all in! |

* Only LAN solutions considered. Calls had too much latency in the cloud, or even across datacenters.
9. Cassandra
• Storage isn’t the best use of Cassandra.
• Do not exceed 50% of drive space.
– Compaction needs the space. Hard lesson learned.
• Fault Tolerance: Replication factor of 3.
• Result:
– 1 TB of data = 6 TB of storage needed! (3 replicas, each on drives kept at most 50% full)
– CallFire has a 74 TB Cassandra cluster
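The 6x figure can be reproduced with a small calculation, assuming it is simply the replication factor divided by the maximum fill fraction of the drives. The class and method names below are illustrative and do not come from CallFire's code.

```java
// Sketch only: estimates raw disk needed for a given amount of data,
// assuming raw = data * replicationFactor / maxDiskUtilization.
// CassandraCapacity and rawStorageTb are hypothetical names.
final class CassandraCapacity {

    /**
     * @param dataTb             logical (unreplicated) data size in TB
     * @param replicationFactor  number of copies the cluster keeps
     * @param maxDiskUtilization fraction of each drive allowed to fill, in (0, 1]
     * @return raw storage in TB needed to hold the data under those limits
     */
    static double rawStorageTb(double dataTb, int replicationFactor, double maxDiskUtilization) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be >= 1");
        }
        if (maxDiskUtilization <= 0.0 || maxDiskUtilization > 1.0) {
            throw new IllegalArgumentException("maxDiskUtilization must be in (0, 1]");
        }
        return dataTb * replicationFactor / maxDiskUtilization;
    }

    public static void main(String[] args) {
        // 1 TB of data, replication factor 3, drives kept at most 50% full.
        System.out.println(rawStorageTb(1.0, 3, 0.5)); // prints 6.0
    }
}
```

Lowering the fill limit or raising the replication factor increases the raw storage in direct proportion, which is why the headroom needed for compaction is as expensive as an extra set of replicas.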
10. Extending the scope
• We like SQL and Hibernate.
– Pros: Easy, Flexible, Ad-Hoc Queries, Locks
– Cons: Scaling
• Solution: Sharding with Cassandra for universal data
[Diagram: Shard 1, Shard 2, Shard 3, and the Cassandra cluster]
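One way to picture this split is a small router that sends account-specific data to a SQL shard and universal data to Cassandra. The sketch below is an assumption-heavy illustration: the slides do not say how accounts map to shards, so the modulo rule, the class name ShardRouter, and the "shard-N" and "cassandra" labels are all hypothetical.

```java
// Sketch only: decides where a piece of data lives in a sharded setup.
// The modulo mapping and all names here are assumptions for illustration.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        this.shardCount = shardCount;
    }

    /** Returns the 1-based SQL shard number for an account (assumed modulo mapping). */
    int shardFor(long accountId) {
        // floorMod keeps the result non-negative even for negative ids.
        return (int) Math.floorMod(accountId, (long) shardCount) + 1;
    }

    /** Universal data (for example authentication) goes to Cassandra; the rest goes to a shard. */
    String locate(long accountId, boolean universal) {
        return universal ? "cassandra" : "shard-" + shardFor(accountId);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(3);
        System.out.println(router.locate(7L, false)); // prints shard-2
        System.out.println(router.locate(7L, true));  // prints cassandra
    }
}
```

Keeping universal data out of the shards means a login lookup never has to know which shard an account lives on, while per-account data keeps the SQL features the slides list as pros.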
11. Sharding + Big Data
• Cassandra makes sharding easier
– Easy to store universal data. (Authentication)
– Performs very well
• Tungsten Replicator (Big Data with SQL)
– Sharding makes joins impossible, so fan your
data into central places.
– NoSQL can’t handle ad-hoc queries. No
worries, you can still have SQL.
12. Big Data Summary
• Not just for big companies: data grows rapidly in
today’s environment.
– Nice article about Obama’s Data Crunchers:
– http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/
• NoSQL systems have easier scaling and fault
tolerance mechanisms.
– Not uncommon to see small teams with 10-20 node
clusters.
• SQL is still a big part of the equation. (Tungsten)
– Fan in information across partitions
– Replicate across datacenters
– Keep your ad-hoc dreams alive!