Have you ever been stuck in a growing Salesforce org where everything worked perfectly in the beginning (fast queries, smooth automations, zero concerns about hitting limits), only to realise that as your company scales, the data volume quietly turns into a performance bottleneck? Handling large data volumes in Salesforce rarely feels like a big problem at first. Operations run smoothly early on, but as the business expands, the data grows. That's when issues begin to surface.

You’ll often find that what started as a few hundred thousand records quickly scales into millions across objects like Account, Contact, Opportunity, or other custom objects. That’s when things start to shift. Queries slow down, batch jobs take longer than expected, and sometimes jobs fail without any obvious reason.
Honestly, this is usually the point where Batch Apex becomes essential.
Batch Apex is designed to process large datasets asynchronously, in smaller chunks. Based on what we’ve seen, it’s one of the most reliable tools that Salesforce provides for working at scale. But just using Batch Apex isn’t enough. If the design isn’t right, it can still run into limits, perform poorly, or create unnecessary load on the system.
Let’s walk through what tends to work well when dealing with LDV—based on how these issues usually show up in real projects.
What is Batch Apex and Why It Matters for LDV
At a basic level, Batch Apex breaks large jobs into smaller transactions. Each batch runs independently and gets its own set of governor limits.
The structure is straightforward:
• start() – defines the dataset
• execute() – processes records in chunks
• finish() – runs any follow-up logic
In practice, you’ll often find the biggest benefit is isolation. If one batch fails—maybe due to bad data or an unexpected edge case—the rest can continue. In many real-world scenarios, that’s what keeps long-running jobs from failing.
At the end of the day, without Batch Apex, trying to process large volumes in a single transaction usually doesn’t hold up. You’ll hit CPU limits, heap limits, or DML limits fairly quickly.
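Put together, the three methods above form a class like the following minimal sketch (the class name and query are illustrative, not a prescribed pattern):

```apex
// Minimal Batch Apex skeleton; class name and query are illustrative.
public class AccountCleanupBatch implements Database.Batchable<SObject> {

    // start(): defines the dataset via a QueryLocator
    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id, Name FROM Account');
    }

    // execute(): runs once per chunk, each time with fresh governor limits
    public void execute(Database.BatchableContext bc, List<Account> scope) {
        // process the chunk, then issue one bulk DML call
        update scope;
    }

    // finish(): post-processing, notifications, or chaining
    public void finish(Database.BatchableContext bc) {
        System.debug('Batch complete');
    }
}
```

Because each execute() call is its own transaction, a failure in one chunk rolls back only that chunk, which is the isolation described above.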
Understanding Large Data Volumes (LDV) in Salesforce
No fixed number defines LDV, but in most environments, you’ll start seeing issues once objects cross 100k+ records or queries return large result sets.
One thing teams often notice is that problems don’t show up immediately. Something that works perfectly in a sandbox with limited data can behave very differently in production.
Full table scans are a common issue here. In many cases, they don’t just slow down a single query—they can impact other processes running at the same time.
Based on what we’ve seen, LDV issues are rarely caused by one big mistake. It’s usually a combination of smaller errors—non-selective queries, inefficient loops, large transactions—that build up over time.
Use Database.QueryLocator for Massive Datasets
If you’re working with large datasets, Database.QueryLocator is generally the safest choice in Batch Apex.
It can handle up to 50 million records and doesn’t try to load everything into memory at once. Instead, Salesforce streams the data in batches behind the scenes.
Example:
public Database.QueryLocator start(Database.BatchableContext BC) {
    return Database.getQueryLocator(
        'SELECT Id, Name FROM Account WHERE CreatedDate = LAST_N_DAYS:30'
    );
}
In most implementations, this approach tends to be more stable than managing large collections manually.
That being said, you’ll still need to pay attention to the query itself. If the query is slow or not selective enough, the batch job will reflect that.
Write Highly Selective SOQL Queries
A lot of Batch Apex performance issues come back to the query.
In many teams, queries are written early on without much concern for scaling. They work well initially, but as data grows, they start causing problems.
Selective queries help Salesforce use indexes instead of scanning entire tables, which makes a noticeable difference.
Some practical habits that usually help:
• Use indexed fields like Id, CreatedDate, or external IDs
• Add filters that significantly reduce the dataset
• Avoid selecting fields you don’t actually need
• Test queries with realistic data volumes whenever possible
For example:
SELECT Id FROM Account
This might work early on, but in larger orgs, it can become inefficient.
Adding even a simple filter:
SELECT Id FROM Account WHERE CreatedDate = LAST_N_DAYS:30
often improves performance more than expected.
Choose the Right Batch Size for Performance
The default batch size is 200, and in most cases, that works well.
But as logic becomes more complex—multiple queries, loops, integrations—you’ll often find that smaller batch sizes are more reliable.
Batch sizes like 50 or 100 tend to reduce the chances of hitting CPU or heap limits, especially when dealing with inconsistent data.
The trade-off is that smaller batches increase the total execution time.
Most teams end up adjusting the batch size after observing a few runs in production. It’s less about picking a perfect number upfront and more about tuning it over time.
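The batch size is set as the optional second argument to Database.executeBatch (the class name here is illustrative):

```apex
// Default scope size is 200; pass a smaller value when execute() is heavy.
Id jobId = Database.executeBatch(new AccountCleanupBatch(), 100);
```

Keeping this value in a custom setting or custom metadata type, rather than hard-coding it, makes the later tuning described above possible without a deployment.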
Avoid SOQL and DML Inside Loops
This is a well-known guideline, but it still shows up in many codebases.
With small datasets, it might not cause immediate issues. But with larger volumes, it tends to break quickly.
For example:
for (Account acc : scope) {
    update acc;
}
This approach doesn’t scale.
Instead:
update scope;
Processing records in bulk is essential. It's the difference between something that works during testing and something that holds up in production.
Use Database.Stateful Only When Necessary
Database.Stateful can be useful, but you’ll often find it’s not needed as frequently as it’s used.
It allows you to maintain data across batch executions, which helps with counters or aggregations.
At the same time, it adds overhead—more memory usage and slightly slower execution.
In many scenarios, it’s better to keep things stateless unless there’s a clear requirement. Simpler designs tend to perform better at scale.
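When state genuinely is required, the pattern looks like this sketch (names are illustrative): only the Database.Stateful marker interface makes the instance variable survive across execute() calls.

```apex
// Sketch of a stateful batch that counts processed records; names are illustrative.
public class RecordCounterBatch implements Database.Batchable<SObject>, Database.Stateful {

    // Retained between execute() calls only because of Database.Stateful
    private Integer processedCount = 0;

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        processedCount += scope.size();
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('Total records processed: ' + processedCount);
    }
}
```

Note that everything held in stateful instance variables is serialized between batches, which is the overhead mentioned above; keep that state small.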
Handle Errors Without Failing the Entire Batch
With large datasets, errors are expected. What matters is how you handle them.
Using partial DML is a practical approach:
Database.update(records, false);
This allows valid records to be processed even if some fail.
In many teams, this approach significantly reduces rework. Instead of rerunning entire jobs, you can focus only on failed records.
It’s also worth setting up proper error logging. Having clear visibility into failures saves a lot of debugging time later.
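Expanding on the partial-DML call, a sketch of inspecting the SaveResult list looks like this (in a real project you would write to a custom error-log object instead of System.debug):

```apex
// Partial-success DML: valid records save, failures are captured per record.
Database.SaveResult[] results = Database.update(records, false);

for (Integer i = 0; i < results.size(); i++) {
    if (!results[i].isSuccess()) {
        for (Database.Error err : results[i].getErrors()) {
            // Replace with an insert into your own error-log object in practice
            System.debug('Record ' + records[i].Id + ' failed: ' + err.getMessage());
        }
    }
}
```

The SaveResult list is positionally aligned with the input list, which is why the index-based loop works for matching failures back to records.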
Leverage Indexing and Data Architecture
At a certain scale, performance improvements don’t just come from code—they come from data design.
Custom indexes can significantly improve query performance, especially for frequently used filters.
Skinny tables can help in specific scenarios, though they’re usually used in more advanced implementations.
Data archiving is another area where you’ll often see impact. Keeping older, unused records in active objects tends to slow things down over time.
At the end of the day, these optimizations become necessary as the data continues to grow.
Schedule Batch Jobs Strategically
Timing is something teams sometimes overlook.
Running batch jobs during peak hours can impact user experience, especially in orgs with high activity.
In most cases, scheduling jobs during off-peak hours leads to more stable performance.
It’s also a good idea to avoid running multiple resource-heavy jobs simultaneously. Even well-optimized jobs can compete for resources.
Based on what we’ve seen, a bit of planning here can prevent a lot of avoidable issues.
Chain Batch Jobs Carefully
Batch chaining lets you trigger another batch job after the current one finishes. In many teams, this comes up when processes need to run in steps—one job prepares the data, another picks it up from there.
You can do this in the finish() method:
public void finish(Database.BatchableContext BC) {
    Database.executeBatch(new NextBatchClass());
}
Pretty straightforward. And honestly, it works fine—until the chain starts growing.
What usually happens is that one extra batch gets added, then another, and before long, you’ve got multiple jobs calling each other. It doesn’t always look complex at first, but over time, it becomes harder to follow.
You’ll often see things like:
• Jobs looping back unintentionally
• Higher-than-expected resource usage
• Debugging taking longer than usual
Not every chain turns into a problem, but when it does, it’s usually because it wasn’t planned as a flow—it just grew over time.
In practice, keeping chaining minimal helps. Also, documenting the flow sounds basic, but it saves time later.
Monitor Batch Apex Performance Continuously
Even a well-written batch job won’t behave the same forever. Data grows and usage patterns change.
Salesforce gives you a few tools:
• Apex Jobs page
• Debug logs
• Event Monitoring
What people usually end up checking:
• Execution time
• Number of records processed
• Error rates
You’ll often notice performance changes slowly. A job that used to run in, say, 5 minutes might start taking 8, then 12. No code change—just more data.
It’s easy to ignore in the beginning. But over time, those small increases add up.
Honestly, this is where regular monitoring helps. Not constant, but just enough to notice patterns. Otherwise, you only look at it when something breaks—and by then it’s already a bigger issue.
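Beyond the Apex Jobs page, progress can also be checked programmatically by querying the AsyncApexJob object (here, jobId is assumed to be the Id returned by Database.executeBatch):

```apex
// Check a job's progress; jobId comes from an earlier Database.executeBatch call.
AsyncApexJob job = [
    SELECT Status, NumberOfErrors, JobItemsProcessed, TotalJobItems, ExtendedStatus
    FROM AsyncApexJob
    WHERE Id = :jobId
];
System.debug(job.Status + ': ' + job.JobItemsProcessed + '/' + job.TotalJobItems
    + ' batches, ' + job.NumberOfErrors + ' errors');
```

Capturing these numbers after each run, even in a simple log object, is what makes the slow drift from 5 minutes to 12 visible before it becomes a failure.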
Real-World Use Cases of Batch Apex for LDV
Once data grows, Batch Apex tends to show up everywhere. Not always by design—it just becomes necessary.
Common use cases you’ll see:
• Data cleanup or migration
• Recalculating fields across large datasets
• Updating older data structures
• Syncing with external systems
• Archiving old records
A typical example: adding a new field and needing to populate it using old data.
Sounds simple. But with millions of records, it’s not.
Trying to run that in one go usually fails. Even if it works in testing, production behaves differently. That’s something teams run into quite often.
So, Batch Apex becomes the safer option to resort to. Not the fastest always, but predictable—and predictability matters a lot in the long run.
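A backfill like the one described might look like the following sketch. The field Revenue_Band__c and the banding logic are hypothetical, purely for illustration:

```apex
// Hypothetical backfill: populate a new custom field from existing data.
public class BackfillBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Only select rows that still need the value; keeps the job restartable
        return Database.getQueryLocator(
            'SELECT Id, AnnualRevenue FROM Account WHERE Revenue_Band__c = null'
        );
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account acc : scope) {
            acc.Revenue_Band__c =
                (acc.AnnualRevenue != null && acc.AnnualRevenue > 1000000)
                ? 'Enterprise' : 'SMB';
        }
        Database.update(scope, false); // partial success, don't fail the whole chunk
    }

    public void finish(Database.BatchableContext bc) {}
}
```

Filtering on the null field in start() is what makes the job safely re-runnable: records already processed are skipped on the next run.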
Common Mistakes to Avoid in LDV Handling
Most LDV problems don’t come from complex mistakes. It’s usually the smaller things that don’t scale well.
You’ll often see:
• Non-selective queries
• Large batch sizes with complex logic
• Overusing Database.Stateful
• Weak error handling (or none at all)
• Running jobs during peak hours
Individually, these don’t always break things. But together, over time, they start affecting performance. And then debugging becomes harder because it’s not just one issue—it’s a mix.
In quite a few projects, the code itself wasn’t “wrong.” It just wasn’t built for the data it eventually had to handle.
Advanced Optimization Techniques
At some point, following basic best practices isn't enough, usually once the data grows past a certain size.
That’s where things like partitioning come in—splitting data by region, date, or some logical grouping. Instead of processing everything together, you break it down.
It sounds obvious, but it helps more than expected.
You might also combine Batch Apex with Queueable Apex. Not always, but in cases where the flow isn’t strictly linear, it gives more control.
Caching is another option. If the same data is being queried again and again, storing it temporarily reduces the load.
These aren’t things every project needs. But in larger orgs, they tend to come up. Maybe not immediately, but eventually.
Interview Questions and Answers based on Batch Apex & LDV
1. What is Batch Apex, and when would you use it?
Batch Apex is used to process large datasets in smaller chunks, asynchronously.
You should use it when:
• You’re working with more than 50,000 records
• You need background processing
• You’re doing data cleanup or migration
• Synchronous logic isn’t enough
In most real setups, once the data grows, this becomes the default approach.
2. What are the three methods in a Batch Apex class?
• start() → defines the dataset
• execute() → processes records
• finish() → post-processing
Simple structure. But how you use them matters quite a bit.
3. What is the difference between QueryLocator and Iterable?
QueryLocator:
• Supports up to 50 million records
• Handles batching automatically
• Preferred for LDV
Iterable:
• Used for custom data
• Limited by heap size
• Not ideal for large datasets
In interviews, you're almost always expected to mention QueryLocator for LDV.
4. What is the default batch size, and how do you decide its optimal size?
The default batch size is 200.
In practice:
• Simple logic → 200 works
• Complex logic → 50–100 is safer
Most teams don't get this right on day one. It's adjusted after observing behavior.
5. What are governor limits, and how does Batch Apex help?
Governor limits cap the resources a single transaction can use, such as CPU time, heap size, SOQL queries, and DML statements.
Batch Apex helps because:
• Each batch runs separately
• Limits reset every time
• Large datasets become manageable
That separation is what makes it scalable.
6. What is Database.Stateful, and when should you use it?
It lets you retain data across batches.
Useful for totals or aggregation.
But it's often overused. If you don't need it, leave it out.
7. How do you handle errors in Batch Apex?
Use partial DML:
Database.update(records, false);
This allows valid records to go through.
Also:
• Use try-catch
• Log errors
Logging helps more than people think.
8. Why is query selectivity important in LDV?
Selective queries use indexes.
Non-selective ones:
• Slow things down
• Can fail in start()
This is one of the most common issues. It shows up very often.
9. Can we call Batch Apex from another Batch Apex?
Yes, using chaining.
But again, keep it controlled. Chains tend to grow if not checked regularly.
10. How do you schedule a Batch Apex job?
Using the Schedulable interface or the Salesforce UI.
System.schedule('My Batch Job', '0 0 2 * * ?', new MyBatchClass());
Usually done during off-peak hours.
11. What happens if a batch fails?
• Only that batch rolls back
• Others continue
• Failed records can be retried
This isolation is very useful. And practical.
12. What is heap size, and how does it affect Batch Apex?
Heap size is the memory available.
Each batch has its own limit.
If you exceed it, the batch fails, usually because too much data is stored unnecessarily.
13. What are common mistakes in LDV handling?
• Non-selective queries
• DML inside loops
• Large batch sizes
• Overusing Stateful
• Poor error handling
You’ll see these often. Sometimes all of them together.
14. How do you optimize Batch Apex performance?
• Use QueryLocator
• Write better queries
• Adjust batch size
• Avoid loops with DML
• Use indexing
15. What is the role of indexes in Salesforce?
Indexes reduce the number of scanned records.
This includes:
• Standard indexes
• Custom indexes
They matter more as data grows.
16. What is LDV in Salesforce?
LDV refers to handling very large datasets, typically hundreds of thousands to millions of records, where data volume starts affecting performance.
17. How does Batch Apex ensure scalability?
• Processes in chunks
• Resets limits
• Handles large data
18. Can Batch Apex call external APIs?
Yes, using Database.AllowsCallouts.
Just handle failures properly. That part is often missed.
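The interface is declared alongside Database.Batchable; a sketch (the endpoint is hypothetical) showing the failure handling that's often missed:

```apex
// Database.AllowsCallouts permits HTTP callouts from execute().
public class SyncBatch implements Database.Batchable<SObject>, Database.AllowsCallouts {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('https://example.com/api/sync'); // hypothetical endpoint
        req.setMethod('POST');
        try {
            HttpResponse res = new Http().send(req);
        } catch (CalloutException e) {
            // Handle and log the failure instead of letting the batch die silently
            System.debug('Callout failed: ' + e.getMessage());
        }
    }

    public void finish(Database.BatchableContext bc) {}
}
```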
19. What is a real-world LDV scenario?
Updating millions of records based on existing data.
Approach:
• Use Batch Apex
• Filter properly
• Process in chunks
• Log failures
Pro Interview Strategy
In interviews, simple answers work better.
Focus on:
• Governor limits
• Scalability
• QueryLocator
• Error handling
• Real examples

There’s no need for overly complex answers. Handling large data volumes in Salesforce is something most people run into sooner or later.
If Batch Apex or governor limits have felt confusing, that’s actually quite normal. A lot of developers go through that phase.
The challenge usually isn’t effort—it’s not having a structured way to learn.
That’s where AlmaMate helps.
Instead of scattered resources, you get a clear path focused on real use cases. You don’t just learn concepts—you learn how they work in actual systems.
With hands-on practice and real scenarios, things start making more sense gradually.
If you’re serious about growing in Salesforce, having that kind of structure helps. More than it seems at first.
Conclusion: Mastering Batch Apex for LDV Success
Handling large data volumes in Salesforce isn’t just about getting the job to run—it’s about making sure it keeps running as data grows.
Batch Apex gives you the structure. But how well it works depends on the details.
Using QueryLocator, writing selective queries, choosing a reasonable batch size—these are basic things. Nothing fancy. But they matter more than people expect.
In real projects, consistency matters more than clever solutions.
You’ll often find that stable systems are built on simple practices done properly, not complex optimizations.
If you’re working on production systems or aiming for more responsibility, this is one area that comes up again and again.
Being able to handle plenty of data reliably is a practical skill. And teams notice it.
If you’re looking to master Batch Apex, LDV handling, and real-world Salesforce development with clarity—not confusion—AlmaMate is the place to start.
Most learners struggle not because they lack potential, but because they lack structure. AlmaMate solves that with guided learning, hands-on assignments, interview-focused preparation, and mentor-driven support that actually makes complex topics feel manageable.
Whether you’re preparing for roles like Salesforce Developer, Agentforce Specialist, or aiming for high-impact enterprise projects, AlmaMate helps you build the skills and confidence to perform at a professional level.
Join AlmaMate's Salesforce training programs today and start building the expertise that top companies look for. Your journey towards your next career milestone starts with the right guidance, and that's exactly what AlmaMate delivers.