Spring Batch Intro

Some work cannot be done inside a single web request: importing a million-row CSV, reconciling yesterday’s transactions, migrating a legacy database, or generating nightly statements. Spring Batch is a framework for these long-running, high-volume jobs. It gives you restartability, transaction boundaries, chunked reads and writes, skip/retry policies, and a metadata store that records exactly what ran and what happened — so a job that dies at row 800,000 can resume instead of starting over.

What Spring Batch is for

Spring Batch targets bulk, offline processing rather than request/response work:

ETL — extract from files or databases, transform/validate, load into a target store.
Scheduled jobs — nightly reports, end-of-day reconciliation, billing runs.
Data migrations — move and reshape data between systems with full auditing.
Cleanup and aggregation — purge stale records, roll up metrics, archive history.

The framework handles the plumbing every batch job re-invents: reading in pages, committing in chunks, tracking progress, retrying transient failures, and skipping bad records — all while keeping a durable record of each run.

Getting started

Add the starter. On Spring Boot 3.x this pulls in Spring Batch 5, which requires Java 17+ and uses the jakarta.* namespace.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

Spring Batch needs a DataSource for its metadata tables, so add a database driver too (H2 for demos, MySQL/PostgreSQL in production).

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>

Note: On Spring Boot 3 you do not need @EnableBatchProcessing. Boot’s auto-configuration provides a JobRepository, a JobLauncher, and a PlatformTransactionManager out of the box. Adding @EnableBatchProcessing (or defining a bean that extends DefaultBatchConfiguration) makes the Batch auto-configuration back off, so leave it out unless you have a reason to configure the infrastructure manually.

Core concepts

A batch application is built from a small set of collaborating pieces.

Concept	Role
Job	The whole batch process — an ordered flow of one or more Steps.
Step	A single phase of a Job. Either chunk-oriented (read/process/write) or a Tasklet.
JobRepository	Persists metadata: which job instances and executions ran, their status, and progress.
JobLauncher	Starts a Job with a set of JobParameters.
ItemReader / ItemProcessor / ItemWriter	The read → transform → write trio inside a chunk step.
ExecutionContext	A key/value scratchpad scoped to a step or job execution, persisted so state survives restarts.

A JobInstance is a logical run identified by the job name plus its identifying parameters; each attempt to run it produces a JobExecution. Likewise, each attempt to run a step produces a StepExecution. This is what makes restart possible: the repository can tell, for example, that a given instance already completed three of its five steps, so a restart resumes at the fourth.
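To make the instance/execution distinction concrete, here is a minimal plain-Java sketch. It does not use Spring Batch: JobIdentitySketch, InstanceKey, the job name statementJob, and the runDate parameter are hypothetical names invented for illustration. It models only the rule that the job name plus identifying parameters selects the instance, while each attempt adds an execution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these types are Spring Batch classes.
final class JobIdentitySketch {

    // Hypothetical stand-in for a JobInstance key: job name + identifying parameters.
    record InstanceKey(String jobName, Map<String, String> identifyingParams) {
        InstanceKey {
            identifyingParams = Map.copyOf(identifyingParams); // defensive, immutable copy
        }
    }

    // Hypothetical in-memory "repository": each instance maps to the statuses of its attempts.
    private final Map<InstanceKey, List<String>> executions = new HashMap<>();

    // Records one attempt and returns how many executions that instance now has.
    int recordExecution(String jobName, Map<String, String> params, String status) {
        InstanceKey key = new InstanceKey(jobName, params);
        List<String> attempts = executions.computeIfAbsent(key, k -> new ArrayList<>());
        attempts.add(status);
        return attempts.size();
    }

    // Number of distinct instances seen so far.
    int instanceCount() {
        return executions.size();
    }

    public static void main(String[] args) {
        JobIdentitySketch repo = new JobIdentitySketch();
        Map<String, String> firstDay = Map.of("runDate", "2024-06-01");
        repo.recordExecution("statementJob", firstDay, "FAILED");     // first attempt
        repo.recordExecution("statementJob", firstDay, "COMPLETED");  // restart, same instance
        repo.recordExecution("statementJob", Map.of("runDate", "2024-06-02"), "COMPLETED");
        System.out.println("instances = " + repo.instanceCount());
    }
}
```

Running the same job name with the same runDate twice adds a second execution to one instance; changing runDate creates a new instance.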

Chunk vs tasklet processing

A Step comes in two flavours, and choosing correctly is the most important early decision.

Chunk-oriented steps process items in configurable groups. The reader supplies items one at a time, the processor transforms each, and once chunk-size items accumulate they are handed to the writer and committed in a single transaction. This is the model for ETL: efficient, transactional, and restartable at chunk boundaries.
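The chunk loop described above can be sketched in plain Java. This is a simplified illustration of the read, process, write-per-chunk rhythm, not Spring Batch's actual implementation; ChunkLoopSketch and its run method are hypothetical names, and the "commit" is represented only by a counter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration of chunk semantics; not Spring Batch code.
final class ChunkLoopSketch {

    // Reads items one at a time, transforms each, and hands each full (or final partial)
    // chunk to the writer. Returns the number of chunks written, i.e. simulated commits.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int commits = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize);
            // Accumulate up to chunkSize processed items.
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(processor.apply(reader.next()));
            }
            writer.accept(chunk); // the writer receives the whole chunk at once
            commits++;            // a real chunk step would commit its transaction here
        }
        return commits;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> toRow = n -> "row-" + n;
        Consumer<List<String>> writer = written::add;
        int commits = run(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), toRow, writer, 3);
        System.out.println(commits + " commits: " + written);
    }
}
```

With seven items and a chunk size of three, the writer is called three times: two full chunks and one final partial chunk.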

Tasklet steps run a single block of logic to completion — drop a staging table, call a stored procedure, move a file, send a notification. Use a Tasklet when the work isn’t naturally a stream of items.

	Chunk step	Tasklet step
Shape of work	stream of items	one unit of work
Components	reader + processor + writer	a single Tasklet
Commits	one per chunk	one per tasklet invocation (the Tasklet is called again until it returns RepeatStatus.FINISHED)
Best for	ETL, migrations, exports	setup/cleanup, single commands

Tip: Default to chunk steps for anything that reads many records. Reserve Tasklets for the setup and teardown around them — a typical job is tasklet (prepare) → chunk (process) → tasklet (cleanup).

The metadata tables

The JobRepository writes to a fixed set of tables (the BATCH_* schema): BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS, BATCH_STEP_EXECUTION, and the *_CONTEXT tables that store serialized ExecutionContext data. These tables power restartability, prevent duplicate runs, and let you query the history of every job.
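How persisted context enables a restart can be illustrated with a small plain-Java simulation. It does not touch the BATCH_* tables or any Spring Batch API: RestartSketch, the "position" key, and the failAt parameter are hypothetical, and a HashMap stands in for the persisted ExecutionContext. The idea is only that a checkpoint saved with each committed chunk lets a second attempt skip work that already succeeded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of checkpoint-and-resume; not Spring Batch code.
final class RestartSketch {

    // Writes rows to sink in chunks, saving the next row index in context after each chunk.
    // Throws when it reaches row index failAt (pass -1 for no failure) to simulate a crash.
    static void run(List<String> rows, int chunkSize, Map<String, Object> context,
                    List<String> sink, int failAt) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int position = (Integer) context.getOrDefault("position", 0); // resume point
        while (position < rows.size()) {
            int end = Math.min(position + chunkSize, rows.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    // The partly built chunk is discarded, like a rolled-back transaction.
                    throw new IllegalStateException("simulated failure at row " + i);
                }
                chunk.add(rows.get(i));
            }
            sink.addAll(chunk);                // "write" the chunk
            position = end;
            context.put("position", position); // checkpoint saved together with the write
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e", "f", "g");
        Map<String, Object> context = new HashMap<>();
        List<String> sink = new ArrayList<>();
        try {
            run(rows, 3, context, sink, 4);    // first attempt fails inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", checkpoint = " + context.get("position"));
        }
        run(rows, 3, context, sink, -1);       // restart resumes from the checkpoint
        System.out.println("written = " + sink);
    }
}
```

The first attempt writes only the first chunk before failing; the second attempt starts at the saved position, so no row is written twice.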

In development, let Boot create them automatically:

spring:
  batch:
    jdbc:
      initialize-schema: always   # always | embedded | never
    job:
      enabled: false              # don't auto-run jobs on startup (see Running Jobs)

Warning: Use initialize-schema: always only in dev. In production, set it to never and create the BATCH_* tables with a versioned migration (Flyway or Liquibase) — the DDL ships inside the spring-batch-core jar under org/springframework/batch/core/schema-*.sql.

A minimal job

Here is the smallest complete job, a single Tasklet step, to show how the pieces connect. Chunk jobs built from a reader, processor, and writer are covered in the later pages.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class HelloBatchConfig {

    @Bean
    Step helloStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
        return new StepBuilder("helloStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Hello from Spring Batch");
                    return RepeatStatus.FINISHED;
                }, txManager)
                .build();
    }

    @Bean
    Job helloJob(JobRepository jobRepository, Step helloStep) {
        return new JobBuilder("helloJob", jobRepository)
                .start(helloStep)
                .build();
    }
}

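Note: In Spring Batch 5 the JobBuilder and StepBuilder constructors take the JobRepository directly, and the PlatformTransactionManager is passed to the step's tasklet(...) or chunk(...) method, as in the code above. The old JobBuilderFactory / StepBuilderFactory helpers were deprecated in Spring Batch 5.0 and scheduled for removal, so new code should not use them.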

In This Section

Jobs & Steps — define Jobs and Steps with the builders, chunk vs tasklet, and multi-step conditional flows.
Reader, Processor, Writer — the chunk trio with a full CSV→DB example.
Running & Scheduling Jobs — launch on startup, on demand, or on a schedule, with JobParameters, restart, skip, and retry.