A data model is the architectural structured way of storing data in an application. The Salesforce platform provides a standard model for you to customize data models for custom functionality. We’ve had some considerable experience when it comes to creating Salesforce data models, especially when it comes to creating scalable models.
Here are our tips and tricks of the trade for working with scalable Salesforce data modeling and how to avoid issues.
Account Data Skew
- Do not put 10,000 or more objects to an Account.
- The most common issue that occurs is with unassigned contacts being assigned to a default account.
Record Locking: When updating large number of contacts under the same account (using multiple threads ONLY) the system locks the child and parent record being updated (in order to preserve DB integrity). Since all updates are doing this, there is high risk the update will fail since the previous operation could still have the account locked up.
Sharing Issues: Depends on sharing configuration of parent/child relationship, but say we change the owner of an account. This will affect child records and their sharing. Two major issues that can stem from this are recalculating role hierarchy and sharing rules.
Ownership Data Skew
This occurs when a large number of records exist under the same owner or default profile and that owner exists in the hierarchy expect issues. If the skewed owner exists and is updated or deleted, then a trickle down effect occurs, thus removing sharing from the old owner/parent users in the hierarchy and from all users who have access through sharing rules.
Lookup Data Skew
This skew happens when a very large number of records are related to a single record through a lookup relationship. The problem here is that it is very complex and has no tools for detection. This situation truly is the proverbial “needle in a haystack.” Best practice is to avoid massive lookup relationships that include custom code along with Large Data Volume (LDV) that executes simultaneously and automatically.
External Objects and Relationships for LDV
Since external objects aren’t stored on the org the performance issues associated with big data in SFDC can be avoided. External objects is best when you have a ton of records but only need access to small subsets at a time. You can also have external or indirect lookup relationships on or with external objects.
Query Big Data
- SOQL, the database query language similar to SQL and SOSL, is Salesforce’s text search language, which can return 50,000 and 2,000 records respectively.
- Query Optimizer Tool offered by Salesforce.com will provide column based statistics as well as performing “pre-queries” to determine their speed.
Batch Apex: This is the best way to query big data in Salesforce. It works by breaking down the data and querying it asynchronously in batches up to 50 million records at a time. Won’t work in all cases, such as if you have a synchronous visualforce page that needs more than 50,000 records.
Bulk Queries: Query in bulk to manage big data that can handle up to 15GB of data in separate 1GB files. SFDC will execute a bulk query for two minutes before throwing an error. If results exceed 1GB per file or take longer than 10 minutes to retrieve, it will make another attempt. Up to 15 attempts will be done before an error is thrown and the query fails.
Skinny Table: A skinny table consists of custom tables in Apex that hold a subset of fields from standard or custom objects. Basically, it is a way to cut down on fields so you can return more rows each time you query the database. “Soft” or hidden fields such as “isDeleted” do not exist in skinny tables. Force.com also automatically syncs the base object with the skinny table data. To add a field to a skinny table, sync skinny tables with a sandbox, or re-validate a skinny table after changing the base object you will have to contact Salesforce Customer Support.
Upload Big Data
Load Lean: Make sure to load in only required and business critical fields. Also this means uploading data BEFORE you set organization-wide sharing defaults, complex object relationships, sharing rules, workflow rules, validation rules, and/or triggers. Having any of those processes active when uploading big data will cause a massive slow down, but do not forget to load in required fields for any of those processes since you will need them after the upload.
SOAP API: Optimized for real-time client apps that are only meant to update a few records at a time. It requires complex processes to upload small pieces of data, monitor results, and retry failed attempts. SOAP APIs are not good for big data.
Bulk API: Processes data from thousands to millions of records based on REST principles and developed to simplify data loads. Submits records to Salesforce, thus creating a new job, and then batches them into 10,000 records/batch. As your data is flowing into the server, the batches will already be executing (in parallel or linear based on needs). The API will try to reprocess failed records automatically, and jobs that timeout are placed into a queue for another attempt.
Suspend Events: Same as the last part of load lean except in regards to existing data. You can do this without causing inaccurate data via three phases: analyzing and preparing data, disabling events for loading, and post-processing.
- This could mean doing validation rules, workflow, and assignment rules by hand with queries on your existing data set. Also, just be sure to include fields for lookup relationships, roll-up summary fields to parent records, and other data relationship fields.
- Edit your planned actions and set them to inactive. Disabling triggers can be a bit more complex, but if you just add a checkbox to the records then any trigger can be disabled by something like: ‘if(checkbox is checked) return;’. Then just uncheck it to re-enable the trigger.
- Turn back on all your planned actions, define your relationships, and reset fields you used to disable a trigger.
Delete and Extract Big Data
Soft vs Hard Deletion: The way SFDC deletes data can impact performance severely. By default, when data is removed, it’s flagged for deletion and sent to the recycling bin for 15 days, which means it still exists in the org. To delete Bulk records, you could use the Bulk API with the “hard delete” option meaning the data will be immediately deleted, skipping the recycling bin.
Chunking Data: When extracting data with Bulk API, 100,000 records will chunk by default. You can change the chunk size header to smaller chunks or larger up to 250,000 – experiment to determine optimal chunk size. This could fail when working with hundreds of millions of records.
PK Chunking: Primary Key Chunking is used to handle extra large data set extracts and is supported by the Bulk API. Should be used when querying a table with more than 10 million records or when a normal Bulk API call fails regularly. Each chunk also counts toward the daily batch limit.
Truncation: This will permanently remove all records from a custom object while keeping the object and its metadata intact. Truncation is hard deletion, meaning the records will not go to the recycle bin first. This is a very fast way to remove records from a custom object.
Have any questions about Salesforce data modeling? Please reach out to one of our experts at Sikich at any time.