
Amazon Macie

Amazon Macie is a fully managed data security service that uses machine learning (ML, computers learning patterns from examples) and pattern matching to find and protect sensitive data stored in Amazon S3 (Simple Storage Service, AWS’s object storage). You point Macie at your S3 buckets, and it tells you two important things: which buckets are exposed or unencrypted, and which objects actually contain sensitive information like personally identifiable information (PII, data that can identify a person such as names, emails, or Social Security numbers), credentials, or financial records. This matters because a common cause of S3 data exposure is sensitive data sitting unnoticed in a bucket nobody knew was public.

What Macie does

Macie has two distinct jobs, and understanding the split is the key to using it without overspending.

  1. Bucket-level inventory (automated, cheap). In each Region where it is enabled, Macie continuously evaluates every S3 bucket in your account and reports its security posture: is it public, is it encrypted, is it shared with other AWS accounts, does it allow unencrypted uploads? This runs in the background and costs little compared with content scanning.
  2. Sensitive-data discovery jobs (on-demand, priced per GB). You explicitly run jobs that download and inspect object contents to classify sensitive data. This is where the ML happens, and this is where the bill comes from.

Warning: Sensitive-data discovery is billed per gigabyte (GB) analyzed. At roughly $1 per GB, scanning a multi-petabyte data lake “just to be safe” can run to hundreds of thousands or even millions of dollars in a single job. Always scope jobs to high-risk buckets and use sampling first.

When to use Macie (and when not to)

Use Macie when you store customer data, logs, exports, or backups in S3 and need to prove (for audits like PCI DSS, HIPAA, or GDPR) that you know where sensitive data lives and that it is protected. The bucket-level inventory is worth enabling broadly, for every account, because it is cheap and catches the most common mistakes (a public bucket, a disabled encryption setting).

Do not blindly run full-content discovery on every bucket. For huge data lakes, scope discovery to buckets you suspect hold PII, and use sampling so Macie analyzes a representative fraction rather than every byte.

Sensitive-data types Macie detects

Macie ships with built-in managed data identifiers and lets you add custom data identifiers (regex patterns you define).

Category | Examples Macie detects
Personal (PII) | Names, addresses, email, phone, passport, driver’s license
Financial | Credit card numbers, bank account numbers
Credentials | AWS secret keys, private keys, OAuth tokens
Health | Health insurance / medical beneficiary numbers
Custom | Anything you describe with a regular expression
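
Before registering a custom data identifier, it can help to try the pattern against sample text. The sketch below is illustrative only: the EMP-123456 employee ID format, the identifier name employee-id, and the helper names count_matches and register_identifier are assumptions made for this example. grep -E and Macie do not accept exactly the same regex syntax, so treat a local match as a first check rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical pattern: employee IDs such as EMP-004217 (assumed format).
EMPLOYEE_ID_REGEX='EMP-[0-9]{6}'

# Count how many times the pattern occurs in a piece of text (no AWS call).
count_matches() {
  printf '%s\n' "$1" | { grep -Eo "$EMPLOYEE_ID_REGEX" || true; } | wc -l | tr -d ' '
}

count_matches 'Ticket opened by EMP-004217 and escalated to EMP-118830.'   # prints 2

# Register the pattern with Macie. Wrapped in a function because it needs
# AWS credentials and Macie enabled in the target Region.
register_identifier() {
  aws macie2 create-custom-data-identifier \
    --name "employee-id" \
    --description "Hypothetical employee ID format" \
    --regex "$EMPLOYEE_ID_REGEX" \
    --region us-east-1
}
```

Macie also provides aws macie2 test-custom-data-identifier, which takes --regex and --sample-text and reports a match count using Macie's own regex engine.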

Enabling Macie

Macie must be enabled per AWS Region before it can inventory buckets or run jobs.

Console steps:

  1. Open the Amazon Macie console.
  2. Confirm the Region (top-right) is the one holding your buckets.
  3. Choose Get started.
  4. Review the service-linked role (Macie creates an IAM role to read your buckets), then choose Enable Macie.

AWS CLI (v2):

aws macie2 enable-macie --region us-east-1

Output:

(no output on success; HTTP 200 returned)
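
Because Macie is enabled per Region, covering several Regions means repeating the command. A minimal sketch, assuming a hand-picked list of Regions; the function names and the DRY_RUN switch are hypothetical conveniences, not part of the AWS CLI:

```shell
#!/usr/bin/env bash
# Enable Macie in each Region passed as an argument.
# With DRY_RUN=1 the commands are printed instead of executed.
enable_macie_in_regions() {
  local region
  for region in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "aws macie2 enable-macie --region $region"
    else
      # May return an error if Macie is already enabled in that Region.
      aws macie2 enable-macie --region "$region"
    fi
  done
}

# Report the Macie status for one Region (needs credentials).
macie_status() {
  aws macie2 get-macie-session --query 'status' --output text --region "$1"
}

# Preview what would run for two example Regions.
DRY_RUN=1 enable_macie_in_regions us-east-1 eu-west-1
```

Running the preview first makes it easy to review the Region list before any account setting changes.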

To confirm it is on and view the cheap bucket inventory:

aws macie2 get-bucket-statistics --region us-east-1

Output:

{
    "bucketCount": 42,
    "bucketCountByEffectivePermission": {
        "publiclyAccessible": 1,
        "publiclyReadable": 1,
        "publiclyWritable": 0
    },
    "bucketCountByEncryptionType": {
        "unencrypted": 3,
        "kmsManaged": 30,
        "s3Managed": 9
    },
    "classifiableSizeInBytes": 184329472102
}

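The one publicly accessible bucket and the three unencrypted buckets are exactly what you want to catch before scanning a single GB. The statistics give counts only, so they do not say whether the public bucket is also one of the unencrypted ones.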
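
The statistics are counts, so the next step is usually to find out which bucket is public. A hedged sketch using describe-buckets with a filter on effective permission; the helper names are hypothetical, and the criteria field publicAccess.effectivePermission should be checked against the current Macie API reference before relying on it:

```shell
#!/usr/bin/env bash
# Build the describe-buckets filter for a given effective permission
# (for example PUBLIC). Pure string building, no AWS call.
bucket_permission_criteria() {
  printf '{"publicAccess.effectivePermission":{"eq":["%s"]}}' "$1"
}

# List the names of publicly accessible buckets (needs credentials).
list_public_buckets() {
  aws macie2 describe-buckets \
    --criteria "$(bucket_permission_criteria PUBLIC)" \
    --query 'buckets[].bucketName' \
    --output text \
    --region us-east-1
}

# Pull a single number out of the statistics instead of reading the full JSON.
count_public_buckets() {
  aws macie2 get-bucket-statistics \
    --query 'bucketCountByEffectivePermission.publiclyAccessible' \
    --output text \
    --region us-east-1
}
```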

Running a scoped sensitive-data discovery job

This is the part you must control. Below, we create a one-time job that targets a single high-risk bucket and samples 20 percent of its objects instead of all of them.

Console steps:

  1. In the Macie console, choose Jobs then Create job.
  2. Select the specific bucket(s) to scan (e.g. customer-uploads-prod). Avoid “all buckets.”
  3. Choose One-time job (or scheduled if you need recurring scans).
  4. Under Sampling depth, set a percentage (e.g. 20) so Macie scans a sample, not everything.
  5. Optionally attach managed and custom data identifiers, then Submit.

AWS CLI (v2):

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "scan-customer-uploads-sampled" \
  --sampling-percentage 20 \
  --s3-job-definition '{
    "bucketDefinitions": [
      { "accountId": "111122223333", "buckets": ["customer-uploads-prod"] }
    ]
  }' \
  --region us-east-1

Output:

{
    "jobId": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
    "jobArn": "arn:aws:macie2:us-east-1:111122223333:classification-job/a1b2c3d4e5f60718293a4b5c6d7e8f90"
}
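
The job runs asynchronously, so the returned jobId is what you use to follow its progress. A minimal polling sketch, assuming a 60-second interval is acceptable; is_finished and wait_for_job are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Return success when a job status means the job will not make further progress.
is_finished() {
  case "$1" in
    COMPLETE|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll describe-classification-job until the job finishes (needs credentials).
wait_for_job() {
  local job_id="$1" status
  while :; do
    status=$(aws macie2 describe-classification-job \
      --job-id "$job_id" \
      --query 'jobStatus' \
      --output text \
      --region us-east-1) || return 1
    echo "job $job_id: $status"
    if is_finished "$status"; then
      break
    fi
    sleep 60
  done
}
```

Usage would look like wait_for_job a1b2c3d4e5f60718293a4b5c6d7e8f90, with the jobId taken from the create-classification-job output.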

After the job finishes, findings appear in the console under Findings and can be sent to AWS Security Hub and Amazon EventBridge for automated alerting.
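
For automated alerting, one option is an EventBridge rule that matches Macie finding events. The sketch below only creates the rule; the rule name macie-findings and both helper names are hypothetical, and you would still attach a target (for example an SNS topic) with aws events put-targets.

```shell
#!/usr/bin/env bash
# Event pattern matching findings that Macie publishes to EventBridge.
macie_event_pattern() {
  printf '%s\n' '{"source":["aws.macie"],"detail-type":["Macie Finding"]}'
}

# Create the rule (needs credentials). No target is attached here.
create_macie_rule() {
  aws events put-rule \
    --name "macie-findings" \
    --event-pattern "$(macie_event_pattern)" \
    --region us-east-1
}

macie_event_pattern   # print the pattern for review
```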

Cost: why scoping matters

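Sensitive-data discovery is priced per GB of data inspected (roughly $1.00–$1.25 per GB in most Regions at the entry tier, with the first ~150 GB free in the trial period). The bucket inventory is priced per bucket evaluated, and object monitoring for automated discovery is priced per object monitored; both are small next to content inspection. Rates vary by Region and change over time, so confirm them on the Amazon Macie pricing page before budgeting.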

  • Scanning 100 GB of suspect buckets at 20% sampling inspects ~20 GB and costs roughly $20–$25.
  • Scanning a 2 PB data lake in full would inspect ~2,000,000 GB; at roughly $1 per GB that is on the order of $2 million before any volume discounts. Never do this.

Tip: Run a small sampled job first to confirm a bucket even contains PII. If the sample comes back clean, you have spent a few dollars instead of scanning the whole bucket.
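
The arithmetic behind these estimates is simple enough to script before submitting a job. A rough sketch, assuming a flat per-GB rate and that sampled objects are about average in size (sampling selects a percentage of objects, not of bytes); estimate_macie_cost is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Estimate job cost: total GB x sampling percentage x rate per GB.
# Ignores free allowances and volume tiers, so it is an upper-bound style guess.
estimate_macie_cost() {
  local total_gb="$1" sampling_pct="$2" rate_per_gb="$3"
  awk -v gb="$total_gb" -v pct="$sampling_pct" -v rate="$rate_per_gb" \
    'BEGIN { printf "%.2f\n", gb * pct / 100 * rate }'
}

estimate_macie_cost 100 20 1.00   # prints 20.00
estimate_macie_cost 100 20 1.25   # prints 25.00
```

Plugging in the real bucket size from classifiableSizeInBytes (converted to GB) gives a quick sanity check on the sampling percentage you picked.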

Macie covers data in S3; other AWS security services cover different ground:

Service | What it inspects | Use it for
Amazon Macie | S3 object contents + bucket posture | Finding PII / sensitive data in S3
Amazon GuardDuty | Account, network, and API activity | Detecting threats and malicious behavior
Amazon Inspector | EC2, containers, Lambda software | Finding software vulnerabilities (CVEs)

Best Practices

  • Enable Macie in every Region and account, and rely on the cheap bucket-level inventory broadly.
  • Scope sensitive-data discovery jobs to specific high-risk buckets — never “all buckets” on a large data lake.
  • Use sampling (a low percentage) for first passes to confirm whether PII exists before a full scan.
  • Send Macie findings to AWS Security Hub and EventBridge so a public-bucket or PII finding triggers an automatic alert.
  • Combine Macie with default S3 encryption (via AWS KMS) so newly flagged buckets are remediated, not just detected.
  • Set up scheduled jobs only on buckets where data changes and sensitivity must be tracked over time, to avoid re-scanning static data and re-paying per GB.
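
As a sketch of the last point, a recurring job uses the same create-classification-job command with a SCHEDULED job type and a schedule frequency. The job name, the Monday schedule, the 20 percent sampling, and the helper names are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Build the --schedule-frequency value for a weekly run on the given day.
weekly_schedule() {
  printf '{"weeklySchedule":{"dayOfWeek":"%s"}}' "$1"
}

# Create a weekly sampled job for one bucket (needs credentials).
# Arguments: AWS account ID, bucket name.
create_weekly_job() {
  local account_id="$1" bucket="$2"
  aws macie2 create-classification-job \
    --job-type SCHEDULED \
    --name "weekly-scan-$bucket" \
    --schedule-frequency "$(weekly_schedule MONDAY)" \
    --sampling-percentage 20 \
    --s3-job-definition "{\"bucketDefinitions\":[{\"accountId\":\"$account_id\",\"buckets\":[\"$bucket\"]}]}" \
    --region us-east-1
}
```

An example call would be create_weekly_job 111122223333 customer-uploads-prod, reusing the account ID and bucket from the one-time job earlier.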
Last updated June 15, 2026