PK Chunking
Divide and conquer massive objects in Salesforce
Daniel Peter
Lead Applications Engineer, Kenandy
@danieljpeter
Bay Area Salesforce Developer User Group
Takeaways: How to avoid these errors
Query not “selective” enough:
•Non-selective query against large object type (more than 100000 rows).
Query takes too long:
•No response from the server
•Time limit exceeded
•Your request exceeded the time limit for processing
Too much data returned in query:
•Too many query rows: 50001
•Remoting response size exceeded maximum of 15 MB.
GET THE DATA
Sounds great. How?
Not so fast…
…first we need some prerequisite knowledge!
•Database Indexes
•Salesforce Ids
Database indexes (prereq)
“Allow us to quickly locate rows without
having to scan every row in the
database” (paraphrased from Wikipedia)
Salesforce Ids (prereq)
•Composite key containing multiple pieces of data.
•Uses base 62 numbering instead of the more common base 10.
•It is the primary key, so it is the fastest way to find a database row.
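To make the composite key concrete, here is a minimal JavaScript sketch that splits an Id into its parts. The layout it assumes (3-character key prefix, 2-character pod identifier, 1 reserved character, 9-character base 62 record number) matches the digits the later slides chop up, but `decomposeId` is a hypothetical helper, not a Salesforce API.

```javascript
// Hypothetical helper: split a Salesforce Id into the pieces the slides refer to.
// Assumed layout of the first 15 characters:
//   [0-2] key prefix (object type), [3-4] pod identifier,
//   [5] reserved character, [6-14] 9-digit base 62 record number.
// 18-character Ids carry a 3-character case-safety suffix, which is ignored.
function decomposeId(id) {
  if (typeof id !== "string" || (id.length !== 15 && id.length !== 18)) {
    throw new Error("Expected a 15- or 18-character Salesforce Id");
  }
  return {
    keyPrefix: id.slice(0, 3),
    podId: id.slice(3, 5),
    reserved: id.slice(5, 6),
    recordNumber: id.slice(6, 15),
  };
}
```

For example, `decomposeId("a00J000000BWNYk")` isolates the 9-digit record number `0000BWNYk`, which is the part Base62PK later does arithmetic on.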
Salesforce Ids (prereq)
Base 10 vs Base 62: distinct values for a given number of digits
Digits  Base 10                      Base 62
1       10                           62
2       100                          3,844
3       1,000                        238,328
4       10,000                       14,776,336 (~14.8 million)
5       100,000                      916,132,832 (~916 million)
6       1,000,000 (1 million)        56,800,235,584 (~56.8 billion)
7       10,000,000 (10 million)      3,521,614,606,208 (~3.5 trillion)
8       100,000,000 (100 million)    218,340,105,584,896 (~218 trillion)
9       1,000,000,000 (1 billion)    13,537,086,546,263,552 (~13.5 quadrillion)
Base 10 digits: 0123456789
Base 62 digits: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
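The base 62 arithmetic behind the table can be sketched in JavaScript as below. The digit order (0-9, A-Z, a-z) is the one shown on the slide; the function names are hypothetical. BigInt is used because 62^9 - 1 is larger than Number.MAX_SAFE_INTEGER, so ordinary JavaScript numbers would silently round 9-digit base 62 values.

```javascript
// Base 62 digit alphabet as shown on the slide: 0-9, then A-Z, then a-z.
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Decode a base 62 string into a BigInt (exact even past 2^53).
function base62ToBigInt(str) {
  let value = 0n;
  for (const ch of str) {
    const digit = BASE62.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base 62 digit: ${ch}`);
    value = value * 62n + BigInt(digit);
  }
  return value;
}

// Encode a non-negative BigInt as base 62, left-padded with "0" to `width` digits.
function bigIntToBase62(value, width = 0) {
  if (value < 0n) throw new Error("Negative values are not supported");
  let out = "";
  do {
    out = BASE62[Number(value % 62n)] + out; // least significant digit first
    value /= 62n;                             // BigInt division truncates
  } while (value > 0n);
  return out.padStart(width, "0");
}
```

The largest 9-digit value, `base62ToBigInt("zzzzzzzzz")`, is exactly 62^9 - 1, one less than the last row of the table.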
Salesforce Ids (prereq)
MO’ NUMBERS
Base 62
Prerequisites complete!
How does PK Chunking work?
Analogy: fetching people in a city.
Fetching people in a city: problems
Non-selective
Request:
“get me all the people who are female”
Response:
“yer trippin’!”
Fetching people in a city: problems
Timeout
Request:
“find me a 7 foot tall person in a pink tuxedo in Beijing”
Response:
(after searching all day) “I can’t find any! I give up!”
Fetching people in a city: problems
Too many people found
Request:
“find me all the men in San Francisco with beards”
Response:
(after searching for 10 mins) “The bus is full!”
PK Chunking addresses all those problems
Divide and conquer!
Parallelism!
Fetching people in a city: solutions
Non-selective
Request:
“get me all the people who are female, in your small search area”
Response:
“¡Con mucho gusto!”
Fetching people in a city: solutions
Timeout
Request:
“find me a 7 foot tall person in a pink tuxedo in Beijing, in your small
search area”
Response (SP = search party):
SP1: “Didn’t find any, sorry!”
SP2: “Didn’t find any, sorry!”
SP3: “Found one!”
SP4: “Didn’t find any, sorry!”
Fetching people in a city: solutions
Too many people found
Request:
“find me all the men in San Francisco with beards, in your small search
area”
Response:
SP1: 30 people in our bus
SP2: Didn’t find any
SP3: 50 people in our bus
Technical details
2 different implementations
QLPK
Query Locator PK Chunking
Base62PK
Base62 PK Chunking
QLPK
Salesforce SOAP or REST API – the AJAX Toolkit works great.
Create and leverage a server-side cursor, similar to an Apex query
locator (Batch Apex).
Analogy: Print me a phone book of everyone in the city so I can flip
through it.
QLPK – AJAX Toolkit Request
QLPK – AJAX Toolkit Response
Chunk the database, in a size of your choice, by offsetting the queryLocator:
01gJ000000KnRpDIAV-50000
01gJ000000KnRpDIAV-100000
…
01gJ000000KnRpDIAV-39950000
01gJ000000KnRpDIAV-40000000
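A minimal sketch of building that locator list, assuming (as the example suggests) that a locator is the cursor Id followed by `-` and a row offset. `buildQueryLocators` is a hypothetical name; each resulting string would then be handed to a queryMore call, which is not shown here.

```javascript
// Hypothetical helper: reproduce the offset locators listed above
// (chunkSize, 2 x chunkSize, ... up to totalRecords) for one server-side cursor.
function buildQueryLocators(cursorId, totalRecords, chunkSize) {
  const locators = [];
  for (let offset = chunkSize; offset <= totalRecords; offset += chunkSize) {
    locators.push(`${cursorId}-${offset}`);
  }
  return locators;
}
```

With the slide's numbers, `buildQueryLocators("01gJ000000KnRpDIAV", 40000000, 50000)` yields 800 strings, from `…-50000` to `…-40000000`.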
QLPK – The Chunks
800 chunks × 50,000 records = 40,000,000 total records
Analogy: we have exact addresses for clusters of 50k
people to give to 800 different search parties.
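Once the boundary Ids are known, turning them into inclusive {first, last} ranges is simple. The sketch below is an in-memory simulation that assumes the full sorted Id list is available; the query locator approach exists precisely so you do not have to pull every Id into memory. `chunkSortedIds` is a hypothetical name.

```javascript
// In-memory simulation: group Ids that are already sorted ascending into
// inclusive, non-overlapping {first, last} ranges of at most chunkSize rows.
function chunkSortedIds(sortedIds, chunkSize) {
  const chunks = [];
  for (let start = 0; start < sortedIds.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, sortedIds.length) - 1; // last chunk may be short
    chunks.push({ first: sortedIds[start], last: sortedIds[end] });
  }
  return chunks;
}
```

Each resulting pair plugs straight into the `Id >= … AND Id <= …` filter of a chunk query.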
QLPK – How to use in a query?
Perform 800 queries with the Id ranges in the WHERE clause:
SELECT Id, Autonumber__c, Some_Number__c
FROM Large_Object__c
WHERE Some_Number__c > 10 AND Some_Number__c < 20
AND Id >= 'a00J000000BWNYk'
AND Id <= 'a00J000000BWO4z'
THAT SPLIT CRAY
database so hard, take 800 queries to find me
QLPK – Parallelism
Yeah it’s 800 queries, but…
They all went out at once, and they might all come
back at once.
Analogy: We hired 800 search parties and unleashed
them on the city at the same time.
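In JavaScript terms, "they all went out at once" looks roughly like this sketch. `queryChunk` is a hypothetical Promise-returning function (for example, a wrapper around a Visualforce remoting call) passed in as a parameter so the sketch stays self-contained.

```javascript
// Send every chunk query immediately, then wait for all of them.
// map() starts each request before any of them is awaited, so they run in parallel.
async function queryAllChunks(chunks, queryChunk) {
  const perChunkRows = await Promise.all(chunks.map((chunk) => queryChunk(chunk)));
  return perChunkRows.flat(); // merge row arrays, preserving chunk order
}
```

Promise.all rejects as soon as any chunk fails, so a real page would pair this with per-chunk retries.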
QLPK Base62PK
Shift Gears
Base62PK
Get the first and last Id of the database and
extrapolate the ranges in between.
Analogy: Give me the highest and lowest address of
everyone in the city and I will make a phone book with
every possible address in it. Then we will break that
into chunks.
Base62PK – first and last Id
Get the first Id
SELECT Id FROM Large_Object__c ORDER BY Id ASC LIMIT 1
Get the last Id
SELECT Id FROM Large_Object__c ORDER BY Id DESC LIMIT 1
Even on H-U-G-E databases these return F-A-S-T, since each one only reads one end of the Id index. No problem.
Base62PK – extrapolate
1. Decompose the 15-digit first/last Ids: chop off the last 9 digits.
2. Convert those 9-digit base 62 numbers into Long integers.
3. Add the chunk size to the first number until you hit or
exceed the last number.
4. The last chunk may be smaller.
5. Convert those Long integers back to base 62 and recompose
the 15-digit Ids.
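The five steps can be sketched end to end in JavaScript. This assumes the first and last Ids share their first 6 characters (key prefix, pod identifier, reserved character), uses BigInt because 9-digit base 62 values can exceed Number.MAX_SAFE_INTEGER, and produces inclusive, non-overlapping ranges. `base62Chunks` and its helpers are hypothetical names, not the original project's code.

```javascript
// Base 62 digit order: 0-9, A-Z, a-z.
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function fromBase62(str) {
  let n = 0n;
  for (const ch of str) {
    const digit = BASE62.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base 62 digit: ${ch}`);
    n = n * 62n + BigInt(digit);
  }
  return n;
}

function toBase62(n, width) {
  let out = "";
  do {
    out = BASE62[Number(n % 62n)] + out;
    n /= 62n;
  } while (n > 0n);
  return out.padStart(width, "0");
}

// Hypothetical Base62PK sketch: inclusive {first, last} Id ranges covering
// chunkSize possible Ids each, between firstId and lastId (15- or 18-char Ids).
function base62Chunks(firstId, lastId, chunkSize) {
  // Step 1: split off the fixed 6-char prefix from the 9-digit record number.
  const prefix = firstId.slice(0, 6);
  if (lastId.slice(0, 6) !== prefix) {
    throw new Error("First and last Ids have different prefixes (e.g. different pods)");
  }
  // Step 2: convert the 9-digit base 62 numbers to integers.
  const first = fromBase62(firstId.slice(6, 15));
  const last = fromBase62(lastId.slice(6, 15));
  const size = BigInt(chunkSize);
  const chunks = [];
  // Step 3: step from the first number by chunkSize until we pass the last one.
  for (let start = first; start <= last; start += size) {
    // Step 4: clamp the final chunk to the last Id, so it may be smaller.
    const end = start + size - 1n < last ? start + size - 1n : last;
    // Step 5: convert back to base 62 and recompose full 15-char Ids.
    chunks.push({ first: prefix + toBase62(start, 9), last: prefix + toBase62(end, 9) });
  }
  return chunks;
}
```

Feeding it the results of the two ORDER BY Id queries, e.g. `base62Chunks(firstId, lastId, 50000)`, gives the Id pairs for the chunk queries without any further round trips.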
Base62PK – benefits
•High performance! Calculates the Ids instead of
querying for them.
Base62PK – issues
•Digits 4 and 5 of the Salesforce Id are the pod
identifier. If the Ids in your org have different
pod Ids, this technique will break unless it is
enhanced.
•Fragmented Ids lead to sparsely populated
ranges: you will search entire ranges of Ids
that contain no records.
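Both issues can be checked up front. The sketch below compares the pod digits of the first and last Id and estimates fragmentation as the number of possible Ids in the range divided by the actual record count (for example, from a `SELECT COUNT()` query). That definition of fragmentation is an assumption made for illustration, and the helper names are hypothetical.

```javascript
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function fromBase62(str) {
  let n = 0n;
  for (const ch of str) {
    const digit = BASE62.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base 62 digit: ${ch}`);
    n = n * 62n + BigInt(digit);
  }
  return n;
}

// Digits 4 and 5 (string indices 3-4) hold the pod identifier.
function samePod(firstId, lastId) {
  return firstId.slice(3, 5) === lastId.slice(3, 5);
}

// Assumed measure: possible Ids in [firstId, lastId] per actual record.
// 1.0 means every possible Id is in use; 3.0 means about two in three are empty.
function fragmentation(firstId, lastId, recordCount) {
  const span = fromBase62(lastId.slice(6, 15)) - fromBase62(firstId.slice(6, 15)) + 1n;
  return Number(span) / recordCount;
}
```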
So which do I pick?
QLPK
or
Base62PK
So which do I pick?
            Heterogeneous   Homogeneous pod Ids, by Id fragmentation
            pod Ids         Low (<1.5x)   Medium (1.5x - 3x)   High (>3x)
QLPK        X                             X                    X
Base62PK                    X             X
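Read as code, the decision table might look like the sketch below. It assumes the X marks fall as follows: QLPK for heterogeneous pods and for medium or high fragmentation; Base62PK for low or medium fragmentation with homogeneous pods. `suitableStrategies` is a hypothetical helper.

```javascript
// Hypothetical encoding of the "So which do I pick?" table.
// homogeneousPods: do all Ids share the same pod digits?
// fragmentation: possible Ids in the range per actual record (1.5x and 3x thresholds).
function suitableStrategies(homogeneousPods, fragmentation) {
  if (!homogeneousPods) return ["QLPK"];               // Base62PK breaks across pods
  if (fragmentation < 1.5) return ["Base62PK"];        // dense Ids: calculate, don't query
  if (fragmentation <= 3) return ["QLPK", "Base62PK"]; // middle ground: either works
  return ["QLPK"];                                     // sparse Ids: too many empty ranges
}
```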
How do I implement?
•Needs to be orchestrated from a language like JavaScript in
your page, or from another platform (e.g. Heroku).
•Doesn’t work in the Lightning Component
Framework (yet): there is no support for truly parallel
controller actions (they get boxcarred).
•Has to be Visualforce or a Lightning / Visualforce
hybrid.
How do I implement?
•Use RemoteActions to get the chunk queries
back into your page.
•Can be granular or aggregate queries!
•Process each chunk query’s results as they
come back, e.g. update totals on a master
object or push rows into a master array.
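// State shared across chunk callbacks (chunkList holds {first, last} Id pairs).
var chunkList = [];
var objectAnums = [];
var queryChunkCount = 0;

function queryChunks() {
    // Fire every chunk query immediately; with buffer: false, Visualforce
    // remoting sends each one as its own request instead of batching them.
    for (var i = 0; i < chunkList.length; i++) {
        queryChunk(i);
    }
}

function queryChunk(chunkIndex) {
    var chunk = chunkList[chunkIndex];
    Visualforce.remoting.Manager.invokeAction(
        '{!$RemoteAction.Base62PKext.queryChunk}',
        chunk.first, chunk.last,
        function (result, event) {
            if (event.status) {
                // Merge this chunk's rows into the master array.
                for (var i = 0; i < result.length; i++) {
                    objectAnums.push(result[i].Autonumber__c);
                }
            } else {
                // On failure there are no rows to read; log it (a retry could go here).
                console.error('Chunk ' + chunkIndex + ' failed: ' + event.message);
            }
            queryChunkCount++;
            // allQueryChunksComplete() is defined elsewhere in the page.
            if (queryChunkCount === chunkList.length) {
                allQueryChunksComplete();
            }
        },
        {escape: false, buffer: false}
    );
}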
// Returns the rows for one chunk. The bind variables :firstId and :lastId are
// resolved from the method parameters, avoiding quoting problems and SOQL injection.
@RemoteAction
public static List<Large_Object__c> queryChunk(String firstId, String lastId) {
    String soql = 'SELECT Id, Autonumber__c, Some_Number__c ' +
                  'FROM Large_Object__c ' +
                  'WHERE Some_Number__c > 10 AND Some_Number__c < 20 ' +
                  'AND Id >= :firstId ' +
                  'AND Id <= :lastId';
    return Database.query(soql);
}
Landmines
Timeouts – retries
•Cache warming means a chunk query that fails the first time may succeed on a retry, so try and try again!
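A minimal retry wrapper for a single chunk query, assuming the query is exposed as a Promise-returning function. The attempt count and the linearly growing delay are illustrative choices, not values from the talk; `withRetries` is a hypothetical name.

```javascript
// Re-run a failing async task up to maxAttempts times, waiting a little longer
// before each retry; rethrows the last error if every attempt fails.
async function withRetries(task, maxAttempts = 3, delayMs = 1000) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

A chunk call could then be wrapped as `withRetries(() => queryChunk(chunk))`.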
Concurrency
•Beware: ConcurrentPerOrgApexLimit exceeded
•Keep your individual chunk queries lean (< 5 secs): requests that run longer count against the org-wide limit on concurrent long-running requests.
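Firing every chunk at once is what makes PK chunking fast, but with hundreds of chunks you may want a cap. This sketch keeps at most `limit` chunk queries in flight; each task is assumed to be a Promise-returning function, the limit is yours to tune, and `runWithConcurrency` is a hypothetical name.

```javascript
// Run async tasks with at most `limit` in flight; results keep task order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```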
Demos
Backup video:
https://www.youtube.com/watch?v=KqHOStka0eg
How did you figure this out?
Had to meet requirements for Kenandy’s largest customer, a $2.5B / yr
manufacturer.
High visibility project.
Necessity is the mother of invention!
How did you figure this out?
Query Plan Tool
How did you figure this out?
Debug logs from real execution
Why doesn’t Salesforce do this?
They do!
(kinda)
The Bulk API uses a similar technique, but it is more
asynchronous and wrapped in a message container to
track progress.
More Info
Article on the Salesforce Developers Blog
https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html
GitHub repo
https://github.com/danieljpeter/pkChunking
Bulk API documentation
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm
Q&A
Thank you!
Forcelandia 2016 PK Chunking

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Forcelandia 2016 PK Chunking

  • 1. PK Chunking Divide and conquer massive objects in Salesforce Daniel Peter Lead Applications Engineer, Kenandy @danieljpeter Bay Area Salesforce Developer User Group
  • 2. Takeaways: How to avoid these errors Query not “selective” enough: •Non-selective query against large object type (more than 100000 rows). Query takes too long: •No response from the server •Time limit exceeded •Your request exceeded the time limit for processing Too much data returned in query: •Too many query rows: 50001 •Remoting response size exceeded maximum of 15 MB.
  • 4. Sounds great. How? Not so fast… …first we need some prerequisite knowledge! •Database Indexes •Salesforce Ids
  • 5. Database indexes (prereq) “Allow us to quickly locate rows without having to scan every row in the database” (paraphrased from Wikipedia)
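To make the quote concrete, here is a toy JavaScript sketch. It is not how Salesforce builds its indexes: a sorted list of keys searched by repeated halving stands in for a real index, and a loop over every row stands in for a full table scan. The names `fullScan`, `buildIndex` and `indexLookup` are illustrative only.

```javascript
// Toy illustration only: a sorted key list plus binary search stands in for a
// real database index; Salesforce's internal index structures are not shown.

// Full scan: look at every row until the key matches.
function fullScan(table, id) {
  let examined = 0;
  for (const row of table) {
    examined++;
    if (row.id === id) return { row, examined };
  }
  return { row: null, examined };
}

// Build the "index" once: (key, row position) pairs sorted by key.
function buildIndex(table) {
  return table
    .map((row, pos) => ({ key: row.id, pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search the index: roughly log2(n) comparisons instead of up to n.
function indexLookup(table, index, id) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    const key = index[mid].key;
    if (key === id) return { row: table[index[mid].pos], examined };
    if (key < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return { row: null, examined };
}

// Usage: the same lookup, with and without the index.
const rows = [
  { id: 'r3', city: 'Oakland' },
  { id: 'r1', city: 'Berkeley' },
  { id: 'r2', city: 'San Francisco' },
];
const idx = buildIndex(rows);
console.log(fullScan(rows, 'r2').row.city, indexLookup(rows, idx, 'r2').row.city);
```

On 100,000 rows the scan may touch every row, while the indexed lookup needs at most about 17 comparisons.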
  • 8. Salesforce Ids (prereq) •Composite key containing multiple pieces of data. •Uses base 62 numbering instead of the more common base 10. •Fastest way to find a database row.
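A sketch of what "composite key" means in practice, assuming the commonly described layout of a 15-character Id: a 3-character key prefix for the object type, 2 pod characters (characters 4 and 5, as a later slide notes), 1 reserved character, and a 9-character base 62 record number. That layout is an assumption here rather than an official specification, and `decomposeSalesforceId` is a hypothetical helper.

```javascript
// Hypothetical helper: split a 15-character Salesforce Id into the parts
// commonly described for the format (assumed layout, not an official spec).
function decomposeSalesforceId(id) {
  if (typeof id !== 'string' || !/^[0-9A-Za-z]{15}$/.test(id)) {
    throw new Error('Expected a 15-character case-sensitive Salesforce Id');
  }
  return {
    keyPrefix: id.slice(0, 3),  // object type, e.g. "001" for Account
    podId: id.slice(3, 5),      // characters 4-5: pod identifier
    reserved: id.slice(5, 6),   // character 6: reserved
    recordNumber: id.slice(6),  // characters 7-15: base 62 sequence number
  };
}

// Usage with an Id taken from the example query later in the deck.
console.log(decomposeSalesforceId('a00J000000BWNYk'));
```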
  • 10. Base 10 vs Base 62: how many distinct values fit in a given number of digits
        Digits | Base 10                     | Base 62
        1      | 10                          | 62
        2      | 100                         | 3,844
        3      | 1,000                       | 238,328
        4      | 10,000                      | 14,776,336 (~14.8 million)
        5      | 100,000                     | 916,132,832 (~916 million)
        6      | 1,000,000 (million)         | 56,800,235,584 (~56.8 billion)
        7      | 10,000,000 (10 million)     | 3,521,614,606,208 (~3.5 trillion)
        8      | 100,000,000 (100 million)   | 218,340,105,584,896 (~218 trillion)
        9      | 1,000,000,000 (billion)     | 13,537,086,546,263,552 (~13.5 quadrillion)
        Base 10 digits: 0123456789
        Base 62 digits: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
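The table values are simply powers of the base, 10^n and 62^n. This short sketch reproduces them with BigInt, which stays exact past `Number.MAX_SAFE_INTEGER` (2^53 - 1); 62^9 is already above that limit. `distinctValues` is an illustrative name.

```javascript
// Number of distinct values an n-digit string can hold in a given base.
// BigInt keeps results exact even beyond Number.MAX_SAFE_INTEGER.
function distinctValues(base, digits) {
  return BigInt(base) ** BigInt(digits);
}

// Print the table rows: digits, base 10 count, base 62 count.
for (let d = 1; d <= 9; d++) {
  console.log(d, distinctValues(10, d).toString(), distinctValues(62, d).toString());
}
```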
  • 11. Salesforce Ids (prereq) MO’ NUMBERS Base 62
  • 13. How does PK Chunking work? Analogy: fetching people in a city.
  • 14. Fetching people in a city: problems Non-selective Request: “get me all the people who are female” Response: “yer trippin’!”
  • 15. Fetching people in a city: problems Timeout Request: “find me a 7 foot tall person in a pink tuxedo in Beijing” Response: (after searching all day) “I can’t find any! I give up!”
  • 16. Finding people in a city: problems Too many people found Request: “find me all the men in San Francisco with beards” Response: (after searching for 10 mins) “The bus is full!”
  • 17. PK Chunking addresses all those problems Divide and conquer! Parallelism!
  • 18. Fetching people in a city: solutions Non-selective Request: “get me all the people who are female, in your small search area” Response: “¡Con mucho gusto!”
  • 19. Fetching people in a city: solutions Timeout Request: “find me a 7 foot tall person in a pink tuxedo in Beijing, in your small search area” Response: SP1: “Didn’t find any, sorry!” SP2: “Didn’t find any, sorry!” SP3: “Found one!” SP4: “Didn’t find any, sorry!”
  • 20. Finding people in a city: solutions Too many people found Request: “find me all the men in San Francisco with beards, in your small search area” Response: SP1: 30 people in our bus SP2: Didn’t find any SP3: 50 people in our bus
  • 22. 2 different implementations: QLPK (Query Locator PK Chunking) and Base62PK (Base62 PK Chunking)
  • 23. QLPK Salesforce SOAP or REST API – AJAX toolkit works great. Create and leverage a server-side cursor. Similar to an Apex query locator (Batch Apex). Analogy: Print me a phone book of everyone in the city so I can flip through it.
  • 24. QLPK – AJAX Toolkit Request
  • 25. QLPK – AJAX Toolkit Response Chunk the database, in chunks of whatever size you choose, by offsetting the queryLocator: 01gJ000000KnRpDIAV-50000 01gJ000000KnRpDIAV-100000 … 01gJ000000KnRpDIAV-39950000 01gJ000000KnRpDIAV-40000000
  • 26. QLPK – The Chunks 800 chunks × 50,000 records = 40,000,000 total records. Analogy: we have exact addresses for clusters of 50k people to give to 800 different search parties.
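A sketch of generating the offset locators from the previous slide, assuming the `<locatorId>-<offset>` format shown there. Each locator would then be handed to the AJAX Toolkit's `sforce.connection.queryMore()` to read the records starting at that position, and the boundary Ids found that way become the chunk ranges; that follow-up step is not shown. `buildLocatorOffsets` is a hypothetical helper.

```javascript
// Hypothetical helper: build "<locatorId>-<offset>" strings, one per chunk
// boundary, stepping by chunkSize up to and including totalRecords
// (matching the list on the slide).
function buildLocatorOffsets(locatorId, chunkSize, totalRecords) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  const locators = [];
  for (let offset = chunkSize; offset <= totalRecords; offset += chunkSize) {
    locators.push(locatorId + '-' + offset);
  }
  return locators;
}

// Usage: 40,000,000 records in chunks of 50,000.
const locators = buildLocatorOffsets('01gJ000000KnRpDIAV', 50000, 40000000);
console.log(locators.length, locators[0], locators[locators.length - 1]);
```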
  • 27. QLPK – How to use in a query? Perform 800 queries, each with one Id range in the WHERE clause: SELECT Id, Autonumber__c, Some_Number__c FROM Large_Object__c WHERE Some_Number__c > 10 AND Some_Number__c < 20 AND Id >= 'a00J000000BWNYk' AND Id <= 'a00J000000BWO4z'
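Each of the 800 queries differs only in its Id bounds. Here is a minimal sketch that builds the slide's query for one `{first, last}` chunk. The chunk shape matches the `chunkList` entries used in the Visualforce code later in the deck. The Id pattern check is an added safeguard and an assumption, not part of the original.

```javascript
// Accept only 15- or 18-character alphanumeric Ids as chunk bounds, so no
// other text can be spliced into the SOQL string.
const ID_PATTERN = /^[0-9A-Za-z]{15}(?:[0-9A-Za-z]{3})?$/;

// Hypothetical helper: the slide's query, restricted to one Id range.
function buildChunkQuery(chunk) {
  if (!ID_PATTERN.test(chunk.first) || !ID_PATTERN.test(chunk.last)) {
    throw new Error('Chunk bounds must be Salesforce Ids');
  }
  return 'SELECT Id, Autonumber__c, Some_Number__c ' +
    'FROM Large_Object__c ' +
    'WHERE Some_Number__c > 10 AND Some_Number__c < 20 ' +
    "AND Id >= '" + chunk.first + "' " +
    "AND Id <= '" + chunk.last + "'";
}

console.log(buildChunkQuery({ first: 'a00J000000BWNYk', last: 'a00J000000BWO4z' }));
```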
  • 28. THAT SPLIT CRAY database so hard, take 800 queries to find me
  • 29. QLPK – Parallelism Yeah it’s 800 queries, but… They all went out at once, and they might all come back at once. Analogy: We hired 800 search parties and unleashed them on the city at the same time.
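The fan-out can be sketched with Promises: every request starts before any of them is awaited. `runChunk` is a hypothetical function that returns a Promise of one chunk's rows, for example a wrapper around a Visualforce remoting call. In practice the browser's per-host connection limit and Salesforce's concurrency limits still cap how many requests run at the same moment.

```javascript
// Sketch: start every chunk request at once, then wait for all of them.
// runChunk is a hypothetical Promise-returning function for one chunk.
async function queryAllChunks(chunks, runChunk) {
  // map() calls runChunk for every chunk before anything is awaited,
  // so all requests are in flight together.
  const results = await Promise.all(chunks.map((chunk) => runChunk(chunk)));
  // Promise.all keeps input order, so results[i] belongs to chunks[i].
  return results.flat();
}
```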
  • 31. Base62PK Get the first and last Id of the database and extrapolate the ranges in between. Analogy: Give me the highest and lowest address of everyone in the city and I will make a phonebook with every possible address in it. Then we will break that into chunks.
  • 32. Base62PK – first and last Id Get the first Id SELECT Id FROM Large_Object__c ORDER BY Id ASC LIMIT 1 Get the last Id SELECT Id FROM Large_Object__c ORDER BY Id DESC LIMIT 1 Even on H-U-G-E databases these return F-A-S-T. No problem.
  • 33. Base62PK – extrapolate
      1. Decompose the 15-character first and last Ids: set aside the first 6 characters and take the last 9.
      2. Convert the 9-digit base 62 numbers into Long Integers.
      3. Add the chunk size to the first number until you hit or exceed the last number.
      4. The last chunk may be smaller.
      5. Convert those Long Integers back to base 62 and recompose the 15-character Ids.
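The five steps above can be sketched in JavaScript, with BigInt standing in for Apex Long. The digit order `0-9`, `A-Z`, `a-z` matches the base 62 digits listed earlier. This sketch makes two assumptions: the first 6 characters are treated as a fixed prefix, and Id pairs whose prefixes differ (for example, different pod Ids) are rejected. Chunk bounds are inclusive so they plug straight into `Id >= first AND Id <= last`.

```javascript
const BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Base 62 string -> BigInt.
function base62ToBigInt(s) {
  let n = 0n;
  for (const ch of s) {
    const digit = BASE62.indexOf(ch);
    if (digit < 0) throw new Error('Invalid base 62 digit: ' + ch);
    n = n * 62n + BigInt(digit);
  }
  return n;
}

// BigInt -> base 62 string, left-padded with "0" to a fixed width.
function bigIntToBase62(n, width) {
  let s = '';
  do {
    s = BASE62[Number(n % 62n)] + s;
    n /= 62n; // BigInt division truncates
  } while (n > 0n);
  return s.padStart(width, '0');
}

function base62Chunks(firstId, lastId, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  const first = firstId.slice(0, 15); // drop any 18-character suffix
  const last = lastId.slice(0, 15);
  // Step 1: decompose into a 6-character prefix and a 9-character number.
  const prefix = first.slice(0, 6);
  if (last.slice(0, 6) !== prefix) {
    throw new Error('First and last Ids have different prefixes (e.g. pod Ids)');
  }
  // Step 2: base 62 -> integers.
  const start = base62ToBigInt(first.slice(6));
  const end = base62ToBigInt(last.slice(6));
  const step = BigInt(chunkSize);
  const chunks = [];
  // Step 3: walk from the first number to the last in chunkSize steps.
  for (let lo = start; lo <= end; lo += step) {
    // Step 4: clamp the final chunk so it ends at the last Id.
    const hi = lo + step - 1n < end ? lo + step - 1n : end;
    // Step 5: back to base 62, recomposed with the prefix.
    chunks.push({ first: prefix + bigIntToBase62(lo, 9), last: prefix + bigIntToBase62(hi, 9) });
  }
  return chunks;
}
```

Because the chunks are computed rather than queried, only the two first/last Id queries from the previous slide hit the database before the chunk queries go out.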
  • 34. Base62PK – benefits •High performance! Calculates the Ids instead of querying for them.
  • 35. Base62PK – issues •Characters 4 and 5 of the Salesforce Id are the pod identifier. If the Ids in your org have different pod Ids, this technique will break unless enhanced. •Fragmented Ids lead to sparsely populated ranges: you will search entire ranges of Ids that contain no records.
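Both issues can be checked before choosing Base62PK. In this sketch, `samePod` compares characters 4 and 5 of the first and last Id. `idFragmentation` divides the number of possible Ids between them by the actual record count, which you could get from `SELECT COUNT() FROM Large_Object__c`. Reading the deck's fragmentation figures (such as 1.5x) as this ratio is an assumption, and both function names are hypothetical.

```javascript
const DIGITS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Base 62 string -> BigInt.
function decode62(s) {
  let n = 0n;
  for (const ch of s) {
    const d = DIGITS.indexOf(ch);
    if (d < 0) throw new Error('Invalid base 62 digit: ' + ch);
    n = n * 62n + BigInt(d);
  }
  return n;
}

// True when characters 4-5 (the pod identifier) match.
function samePod(firstId, lastId) {
  return firstId.slice(3, 5) === lastId.slice(3, 5);
}

// Assumed metric: possible Ids in the range per actual record.
// 1.0 means every Id between first and last is in use.
function idFragmentation(firstId, lastId, recordCount) {
  const span = decode62(lastId.slice(6, 15)) - decode62(firstId.slice(6, 15)) + 1n;
  return Number(span) / recordCount;
}
```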
  • 36. So which do I pick? QLPK or Base62PK
  • 37. So which do I pick? Grid columns: Heterogeneous Pod Ids | Homogeneous Pod Ids | Low Id Fragmentation (<1.5x) | Medium Id Fragmentation (1.5x-3x) | High Id Fragmentation (>3x). Rows: QLPK X X X; Base62PK X X
  • 38. How do I implement? •Needs to be orchestrated via a language like JavaScript in your page, or from another platform (Heroku). •Doesn’t work on the Lightning Component Framework (yet): there is no support for truly parallel controller actions (they are boxcarred). •Has to be Visualforce or a Lightning / Visualforce hybrid.
  • 39. How do I implement? •Use RemoteActions to get the chunk queries back into your page. •Can be granular or aggregate queries! •Process each chunk query's results appropriately when they come back, e.g. update totals on a master object or push into a master array.
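Because chunks come back in any order, each callback has to merge its piece into shared state and detect when the last one has landed. Here is a minimal sketch covering both cases named above. `createChunkCollector`, `addRows` and `addTotal` are hypothetical names.

```javascript
// Hypothetical accumulator for chunk results that arrive in any order.
// Granular chunks append rows to one master array; aggregate chunks
// (e.g. a COUNT() or SUM() per chunk) are folded into a running total.
function createChunkCollector(expectedChunks, onComplete) {
  const state = { rows: [], total: 0, received: 0 };

  function chunkArrived() {
    state.received++;
    // Fire exactly once, after the last outstanding chunk is processed.
    if (state.received === expectedChunks) onComplete(state);
  }

  return {
    // Granular query: push this chunk's records into the master array.
    addRows(rows) {
      for (const row of rows) state.rows.push(row); // loop avoids spread-argument limits on big chunks
      chunkArrived();
    },
    // Aggregate query: add this chunk's partial total to the grand total.
    addTotal(partial) {
      state.total += partial;
      chunkArrived();
    },
  };
}
```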
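  • 40. Page JavaScript that sends every chunk query at once:
        // chunkList, objectAnums, queryChunkCount and allQueryChunksComplete
        // are defined elsewhere in the page.
        function queryChunks() {
            for (var i = 0; i < chunkList.length; i++) {
                queryChunk(i);
            }
        }

        function queryChunk(chunkIndex) {
            var chunk = chunkList[chunkIndex];
            Visualforce.remoting.Manager.invokeAction(
                '{!$RemoteAction.Base62PKext.queryChunk}',
                chunk.first,
                chunk.last,
                function (result, event) {
                    if (event.status) {
                        // Collect this chunk's records into the master array.
                        for (var i = 0; i < result.length; i++) {
                            objectAnums.push(result[i].Autonumber__c);
                        }
                    } else {
                        // A failed chunk (e.g. a timeout) has no result to read.
                        console.error('Chunk ' + chunkIndex + ' failed: ' + event.message);
                    }
                    // Count finished chunks, successful or not.
                    queryChunkCount++;
                    if (queryChunkCount == chunkList.length) {
                        allQueryChunksComplete();
                    }
                },
                // buffer: false keeps remoting from batching (boxcarring) the calls.
                {escape: false, buffer: false}
            );
        }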
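  • 41. Apex RemoteAction that runs one chunk query:
        @RemoteAction
        public static List<Large_Object__c> queryChunk(String firstId, String lastId) {
            // Restrict the query to one Id range; escape the bounds before
            // splicing them into dynamic SOQL.
            String soql = 'SELECT Id, Autonumber__c, Some_Number__c ' +
                'FROM Large_Object__c ' +
                'WHERE Some_Number__c > 10 AND Some_Number__c < 20 ' +
                'AND Id >= \'' + String.escapeSingleQuotes(firstId) + '\' ' +
                'AND Id <= \'' + String.escapeSingleQuotes(lastId) + '\'';
            return Database.query(soql);
        }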
  • 42. Landmines Timeouts – retries •Cache warming means that if a chunk query fails at first, retrying it will often succeed: try, try again! Concurrency •Beware: ConcurrentPerOrgApexLimit exceeded •Keep your individual chunk queries lean: under 5 seconds each.
  • 44. How did you figure this out? Had to meet requirements for Kenandy’s largest customer, a $2.5B / yr manufacturer. High-visibility project. Necessity is the mother of invention!
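Both landmines can be handled on the client: retry a failed chunk a few times, and cap how many chunk queries are in flight at once. This sketch rests on a few assumptions. `runChunk` is a hypothetical Promise-returning chunk call, and the defaults (5 in flight, 3 attempts) are placeholders to tune, not Salesforce-documented values.

```javascript
// Retry one task up to maxAttempts times; rethrow the last error if all fail.
async function withRetry(task, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // e.g. a timeout on a cold cache; try again
    }
  }
  throw lastError;
}

// Run all chunks with at most maxConcurrent requests in flight.
async function runChunksLimited(chunks, runChunk, { maxConcurrent = 5, maxAttempts = 3 } = {}) {
  const results = new Array(chunks.length);
  let next = 0;
  // Each worker claims the next unclaimed chunk index until none are left.
  // JavaScript is single-threaded, so next++ cannot race between workers.
  async function worker() {
    while (next < chunks.length) {
      const i = next++;
      results[i] = await withRetry(() => runChunk(chunks[i]), maxAttempts);
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(maxConcurrent, chunks.length); w++) workers.push(worker());
  await Promise.all(workers);
  return results; // results[i] belongs to chunks[i]
}
```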
  • 45. How did you figure this out? Query Plan Tool
  • 46. How did you figure this out? Debug logs from real execution
  • 47. Why doesn’t Salesforce do this? They do! (kinda) The Bulk API uses a similar technique, but it is more asynchronous and wrapped in a message container to track progress.
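For comparison, here is a sketch of the Bulk API request that turns on its built-in PK chunking. It only builds the request object. The endpoint path, `jobInfo` XML and `X-SFDC-Session` header follow Bulk API 1.0 as we understand it. The `Sforce-Enable-PKChunking` header is the one described in the Bulk API documentation linked below. Verify these details against that documentation. `buildPkChunkingJobRequest` is a hypothetical helper.

```javascript
// Hypothetical helper: build (not send) a Bulk API 1.0 "create job" request
// for a query job with PK chunking enabled. Verify details against the docs.
function buildPkChunkingJobRequest({ instanceUrl, apiVersion, sessionId, objectName, chunkSize }) {
  const body =
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">' +
    '<operation>query</operation>' +
    '<object>' + objectName + '</object>' +
    '<concurrencyMode>Parallel</concurrencyMode>' +
    '<contentType>CSV</contentType>' +
    '</jobInfo>';
  return {
    url: instanceUrl + '/services/async/' + apiVersion + '/job',
    method: 'POST',
    headers: {
      'X-SFDC-Session': sessionId,
      'Content-Type': 'application/xml; charset=UTF-8',
      // Ask Salesforce to split the query into Id-range batches server side.
      'Sforce-Enable-PKChunking': 'chunkSize=' + chunkSize,
    },
    body,
  };
}
```

The result could be passed to `fetch(req.url, req)`. After creating the job, you would still add a batch containing the SOQL query and then track the chunk batches Salesforce creates from it.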
  • 48. More Info
      Article on Salesforce Developers Blog: https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html
      GitHub repo: https://github.com/danieljpeter/pkChunking
      Bulk API documentation: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm
  • 49. Q&A