Spring Batch Intro
Some work cannot be done inside a single web request: importing a million-row CSV, reconciling yesterday’s transactions, migrating a legacy database, or generating nightly statements. Spring Batch is a framework for these long-running, high-volume jobs. It gives you restartability, transaction boundaries, chunked reads and writes, skip/retry policies, and a metadata store that records exactly what ran and what happened — so a job that dies at row 800,000 can resume instead of starting over.
What Spring Batch is for
Spring Batch targets bulk, offline processing rather than request/response work:
- ETL — extract from files or databases, transform/validate, load into a target store.
- Scheduled jobs — nightly reports, end-of-day reconciliation, billing runs.
- Data migrations — move and reshape data between systems with full auditing.
- Cleanup and aggregation — purge stale records, roll up metrics, archive history.
The framework handles the plumbing every batch job re-invents: reading in pages, committing in chunks, tracking progress, retrying transient failures, and skipping bad records — all while keeping a durable record of each run.
Getting started
Add the starter. On Spring Boot 3.x this pulls in Spring Batch 5, which requires Java 17+ and uses the jakarta.* namespace.
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
Spring Batch needs a DataSource for its metadata tables, so add a database driver too (H2 for demos, MySQL/PostgreSQL in production).
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
Note: On Spring Boot 3 you do not need @EnableBatchProcessing. Boot’s auto-configuration provides a JobRepository, a JobLauncher, and a PlatformTransactionManager out of the box. Adding @EnableBatchProcessing actually turns that auto-configuration off, so leave it out unless you have a reason to customize the infrastructure manually.
Core concepts
A batch application is built from a small set of collaborating pieces.
| Concept | Role |
|---|---|
| Job | The whole batch process — an ordered flow of one or more Steps. |
| Step | A single phase of a Job. Either chunk-oriented (read/process/write) or a Tasklet. |
| JobRepository | Persists metadata: which job instances and executions ran, their status, and progress. |
| JobLauncher | Starts a Job with a set of JobParameters. |
| ItemReader / ItemProcessor / ItemWriter | The read → transform → write trio inside a chunk step. |
| ExecutionContext | A key/value scratchpad scoped to a step or job execution, persisted so state survives restarts. |
A JobInstance is a logical run identified by the job name plus its identifying parameters; each attempt to run it produces a JobExecution. Steps work the same way: each attempt to run a step produces a StepExecution. This is what makes restart possible: the repository knows, for example, that a given instance already completed three of five steps.
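The instance/execution distinction can be illustrated with a toy model in plain Java. This is only a sketch under simplifying assumptions: JobIdentitySketch and its methods are hypothetical names, not Spring Batch classes, and the real JobRepository tracks far more (step executions, timestamps, exit codes). It shows two ideas from the paragraph above: the same job name and parameters always map to the same instance, and each attempt adds one execution to that instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; not part of Spring Batch.
final class JobIdentitySketch {
    enum Status { COMPLETED, FAILED }

    // One entry per job instance; the list holds one status per execution attempt.
    private final Map<String, List<Status>> executionsByInstance = new HashMap<>();

    // Instance identity = job name + identifying parameters (sorted so key order does not matter).
    static String instanceKey(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Records one execution attempt. An instance that already completed is refused,
    // mirroring Spring Batch's default behaviour for completed instances.
    Status run(String jobName, Map<String, String> params, boolean succeeds) {
        List<Status> history = executionsByInstance.computeIfAbsent(
                instanceKey(jobName, params), k -> new ArrayList<>());
        if (history.contains(Status.COMPLETED)) {
            throw new IllegalStateException("Job instance already completed");
        }
        Status status = succeeds ? Status.COMPLETED : Status.FAILED;
        history.add(status);
        return status;
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    int executionCount(String jobName, Map<String, String> params) {
        return executionsByInstance
                .getOrDefault(instanceKey(jobName, params), List.of())
                .size();
    }
}
```

In this model, a failed run followed by a successful run with the same parameters yields one instance with two executions; changing a parameter value (for example, a different date) creates a second instance.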
Chunk vs tasklet processing
A Step comes in two flavours, and choosing correctly is the most important early decision.
Chunk-oriented steps process items in configurable groups. The reader supplies items one at a time, the processor transforms each, and once chunk-size items accumulate they are handed to the writer and committed in a single transaction. This is the model for ETL: efficient, transactional, and restartable at chunk boundaries.
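The read/process/write cycle described above can be sketched in plain Java. This is a simplified model, not Spring Batch's implementation: ChunkLoopSketch and runChunks are hypothetical names, each writer call stands in for one transaction commit, and there is no skip, retry, or rollback handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical toy model of a chunk-oriented step; not part of Spring Batch.
final class ChunkLoopSketch {

    /**
     * Reads items one at a time, processes each, and hands them to the writer
     * in groups of at most chunkSize. Returns the number of writer calls,
     * which stands in for the number of commits.
     */
    static <I, O> int runChunks(Iterator<I> reader,
                                Function<I, O> processor,
                                Consumer<List<O>> writer,
                                int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int commits = 0;
        while (reader.hasNext()) {
            // Accumulate up to chunkSize processed items.
            List<O> chunk = new ArrayList<>(chunkSize);
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(processor.apply(reader.next()));
            }
            // One writer call per chunk; a real step would commit here.
            writer.accept(chunk);
            commits++;
        }
        return commits;
    }
}
```

With seven input items and a chunk size of 3, the writer is called three times, receiving chunks of 3, 3, and 1 items. The final partial chunk is still written, which is why the loop checks the reader rather than a fixed count.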
Tasklet steps run a single block of logic to completion — drop a staging table, call a stored procedure, move a file, send a notification. Use a Tasklet when the work isn’t naturally a stream of items.
| | Chunk step | Tasklet step |
|---|---|---|
| Shape of work | stream of items | one unit of work |
| Components | reader + processor + writer | a single Tasklet |
| Commits | one per chunk | one per tasklet |
| Best for | ETL, migrations, exports | setup/cleanup, single commands |
Tip: Default to chunk steps for anything that reads many records. Reserve Tasklets for the setup and teardown around them — a typical job is tasklet (prepare) → chunk (process) → tasklet (cleanup).
The metadata tables
The JobRepository writes to a fixed set of tables (the BATCH_* schema): BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS, BATCH_STEP_EXECUTION, and the *_CONTEXT tables that store serialized ExecutionContext data. These tables power restartability, prevent duplicate runs, and let you query the history of every job.
In development, let Boot create them automatically:
spring:
  batch:
    jdbc:
      initialize-schema: always   # always | embedded | never
    job:
      enabled: false              # don't auto-run jobs on startup (see Running Jobs)
Warning: Use initialize-schema: always only in dev. In production, set it to never and create the BATCH_* tables with a versioned migration (Flyway or Liquibase); the DDL ships inside the spring-batch-core jar under org/springframework/batch/core/schema-*.sql.
A minimal job
Here is the smallest complete job — a single Tasklet step — to show how the pieces connect. Reader/processor/writer chunk jobs come in the later pages.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class HelloBatchConfig {

    // A Tasklet step: one block of logic, run inside a transaction.
    @Bean
    Step helloStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
        return new StepBuilder("helloStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Hello from Spring Batch");
                    return RepeatStatus.FINISHED; // tell the step not to call the tasklet again
                }, txManager)
                .build();
    }

    // A Job made of that single step.
    @Bean
    Job helloJob(JobRepository jobRepository, Step helloStep) {
        return new JobBuilder("helloJob", jobRepository)
                .start(helloStep)
                .build();
    }
}
Note: In Spring Batch 5 the builders take the JobRepository directly in their constructors, and the PlatformTransactionManager is passed to the step's tasklet(...) or chunk(...) call, as in helloStep above. The old JobBuilderFactory/StepBuilderFactory helpers are deprecated and no longer exposed as beans.
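The tasklet in helloStep returns RepeatStatus.FINISHED because a tasklet step keeps invoking its tasklet until it reports that it is done. The loop below is a plain-Java sketch of that contract, under stated assumptions: TaskletLoopSketch and runToCompletion are hypothetical names, the nested RepeatStatus enum is a stand-in for Spring Batch's own type, and transactions are left out entirely.

```java
import java.util.function.Supplier;

// Hypothetical toy model of the tasklet repeat contract; not part of Spring Batch.
final class TaskletLoopSketch {

    // Stand-in for org.springframework.batch.repeat.RepeatStatus.
    enum RepeatStatus { CONTINUABLE, FINISHED }

    /** Calls the tasklet until it reports FINISHED; returns the number of calls made. */
    static int runToCompletion(Supplier<RepeatStatus> tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            status = tasklet.get();   // a real step would wrap each call in a transaction
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }
}
```

A tasklet that returns FINISHED immediately, like the one in helloStep, is called exactly once. Returning CONTINUABLE instead suits work done in slices, such as deleting stale rows a batch at a time until none remain.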
In This Section
- Jobs & Steps — define Jobs and Steps with the builders, chunk vs tasklet, and multi-step conditional flows.
- Reader, Processor, Writer — the chunk trio with a full CSV→DB example.
- Running & Scheduling Jobs — launch on startup, on demand, or on a schedule, with JobParameters, restart, skip, and retry.