When working with object storage at scale, understanding your throughput characteristics is critical. Whether you’re using AWS S3, MinIO, Ceph, or any S3-compatible storage, multipart operations are the key to achieving high performance. In this post, I’ll introduce s3bench, a command-line tool I’ve developed for benchmarking download throughput with configurable concurrency and chunk sizes.
Why Multipart Operations Matter
S3 multipart downloads (and uploads) allow you to parallelize data transfer across multiple connections. Instead of downloading a 10GB file as a single stream, you can:
- Split the object into chunks using HTTP Range headers
- Download multiple chunks concurrently
- Reassemble the complete file
This approach can dramatically improve throughput, especially for large objects and high-latency connections.
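To make the mechanics concrete, here is a minimal sketch of a single ranged request using Go's standard library. The URL and byte range are placeholders; a server that honours ranged reads answers with 206 Partial Content and returns only the requested bytes:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Placeholder object URL (e.g. a pre-signed or public S3 URL).
	url := "https://example-bucket.s3.amazonaws.com/path/to/large-file.bin"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Request only the first 64 MB chunk: bytes 0 through 67108863.
	req.Header.Set("Range", "bytes=0-67108863")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Drain the body without storing it; a ranged-read-capable server
	// responds with "206 Partial Content".
	n, _ := io.Copy(io.Discard, resp.Body)
	fmt.Printf("status=%s bytes=%d\n", resp.Status, n)
}
```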
How s3bench Works
The tool follows a straightforward but effective approach:
- HeadObject - Determines the object size
- Chunk Division - Splits the object into configurable byte-range chunks
- Concurrent Download - Dispatches goroutines to fetch chunks in parallel using Range headers
- Live Progress - Shows real-time transfer rates (updated every 200ms)
- Detailed Reporting - Reports throughput, time-to-first-byte, and latency percentiles
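For illustration, the HeadObject-then-ranged-GetObject flow above can be sketched with the AWS SDK for Go v2. This is not s3bench's actual source, just a simplified outline; the bucket, key, chunk size, and worker count are placeholders, and it assumes a recent SDK release where ContentLength is a pointer:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	bucket, key := "my-bucket", "path/to/large-file.bin" // placeholders
	chunkSize := int64(64 << 20)                         // 64 MB
	workers := 16

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// 1. HeadObject: learn the total size so the byte ranges can be planned.
	head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket), Key: aws.String(key),
	})
	if err != nil {
		log.Fatal(err)
	}
	size := aws.ToInt64(head.ContentLength)

	// 2. Chunk division: enqueue one byte range per chunk.
	type chunk struct{ start, end int64 }
	jobs := make(chan chunk)
	go func() {
		for off := int64(0); off < size; off += chunkSize {
			end := off + chunkSize - 1
			if end >= size {
				end = size - 1
			}
			jobs <- chunk{off, end}
		}
		close(jobs)
	}()

	// 3. Concurrent download: each worker issues ranged GetObject calls.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				out, err := client.GetObject(ctx, &s3.GetObjectInput{
					Bucket: aws.String(bucket),
					Key:    aws.String(key),
					Range:  aws.String(fmt.Sprintf("bytes=%d-%d", c.start, c.end)),
				})
				if err != nil {
					log.Fatal(err)
				}
				io.Copy(io.Discard, out.Body) // discard, as in --discard mode
				out.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("downloaded %d bytes in %d-byte chunks\n", size, chunkSize)
}
```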
Get s3bench
GitHub Repository: https://github.com/soothill/s3bench
You can:
- Clone the repository: git clone https://github.com/soothill/s3bench.git
- Download releases: Visit https://github.com/soothill/s3bench/releases for pre-built binaries
- Browse the source code: https://github.com/soothill/s3bench
Installing Go
s3bench requires Go 1.22 or later. Here’s how to install Go on different platforms:
🐧 Linux Installation
Ubuntu/Debian
# Download and install Go 1.22
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
# Add to PATH
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc
# Verify installation
go version
Fedora/RHEL/CentOS
# Using dnf
sudo dnf install golang
# Or manually:
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc
Arch Linux
sudo pacman -S go
Building from Source
Once Go is installed:
cd s3bench
go mod tidy
go build -o s3bench .
Basic Usage
./s3bench \
--bucket my-bucket \
--key path/to/large-file.bin \
--chunk-size 64MB \
--concurrency 16 \
--discard
The --discard flag is crucial for pure throughput benchmarking—it prevents local disk I/O from becoming the bottleneck.
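Under the hood, discard mode likely boils down to copying each chunk body into io.Discard instead of onto disk. Here is a minimal sketch of that difference with a hypothetical helper (not s3bench's actual code):

```go
package bench

import (
	"io"
	"os"
)

// discardOrWrite copies a chunk body either into io.Discard (pure network
// measurement, as with --discard) or into a local file, which puts disk I/O
// on the critical path. r stands in for a chunk's HTTP response body.
func discardOrWrite(r io.Reader, discard bool, path string) (int64, error) {
	if discard {
		return io.Copy(io.Discard, r)
	}
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, r)
}
```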
Key Features
Chunk Size Presets
Instead of remembering byte values, use intuitive presets:
| Preset | Size |
|---|---|
| XS | 1 MB |
| S | 4 MB |
| M | 8 MB |
| L | 64 MB |
| XL | 256 MB |
| XXL | 1 GB |
./s3bench --chunk-size L --bucket mybucket --key bigfile.bin --discard
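A hypothetical sketch of how such presets could map to byte counts; s3bench's real parser may differ:

```go
package bench

import (
	"fmt"
	"strings"
)

// chunkPresets is a hypothetical mapping of the preset names in the table
// above to byte counts.
var chunkPresets = map[string]int64{
	"XS":  1 << 20,   // 1 MB
	"S":   4 << 20,   // 4 MB
	"M":   8 << 20,   // 8 MB
	"L":   64 << 20,  // 64 MB
	"XL":  256 << 20, // 256 MB
	"XXL": 1 << 30,   // 1 GB
}

// parseChunkSize resolves a preset name like "L" to its size in bytes.
func parseChunkSize(s string) (int64, error) {
	if v, ok := chunkPresets[strings.ToUpper(s)]; ok {
		return v, nil
	}
	return 0, fmt.Errorf("unknown chunk-size preset %q", s)
}
```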
Concurrency Sweep
The most powerful feature for finding optimal settings. Pass multiple concurrency values to automatically benchmark each:
./s3bench \
--bucket my-bucket \
--key path/to/large-file.bin \
--chunk-size 64MB \
--concurrency 4,8,16,32,64 \
--runs 3 \
--discard
This produces a comparison report:
╔══════════════════════════════════════════════════════════╗
║ Concurrency Sweep Comparison ║
╚══════════════════════════════════════════════════════════╝
Workers Runs Min MB/s Mean MB/s Max MB/s
------- ---- -------- --------- --------
4 3 412.1 438.7 461.3
8 3 781.4 823.9 856.2
16 3 1102.5 1163.8 1201.4
32 3 1367.9 1401.5 1423.0 <-- best
64 3 1389.2 1398.1 1412.7
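The per-level rows in this report are essentially a min/mean/max reduction over the runs at each concurrency. A minimal sketch of that aggregation (not necessarily how s3bench computes it):

```go
package bench

// levelSummary holds the aggregate shown for one concurrency level in the
// sweep comparison: number of runs plus min/mean/max throughput in MB/s.
type levelSummary struct {
	Workers int
	Runs    int
	Min     float64
	Mean    float64
	Max     float64
}

// aggregate reduces the per-run throughput samples (MB/s) for one level.
func aggregate(workers int, samples []float64) levelSummary {
	s := levelSummary{Workers: workers, Runs: len(samples)}
	if len(samples) == 0 {
		return s
	}
	s.Min, s.Max = samples[0], samples[0]
	var sum float64
	for _, v := range samples {
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
		sum += v
	}
	s.Mean = sum / float64(len(samples))
	return s
}
```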
S3-Compatible Storage
Works with MinIO, Ceph, and any S3-compatible endpoint:
./s3bench \
--endpoint http://minio.local:9000 \
--bucket testbucket \
--key bigfile.bin \
--access-key-id minioadmin \
--secret-access-key minioadmin \
--region us-east-1 \
--chunk-size 64MB \
--concurrency 8,16,32 \
--runs 3 \
--discard
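For reference, pointing an AWS SDK for Go v2 client at a custom endpoint looks roughly like this. It mirrors the CLI flags above but is only an illustration, not s3bench's internals; it assumes a recent SDK release that exposes BaseEndpoint on the client options, and MinIO typically needs path-style addressing:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Static credentials and region matching the CLI example above.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider("minioadmin", "minioadmin", "")),
	)
	if err != nil {
		log.Fatal(err)
	}

	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://minio.local:9000")
		o.UsePathStyle = true // MinIO and many Ceph RGW setups expect path-style URLs
	})

	// Quick sanity check: list a few keys from the test bucket.
	out, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
		Bucket:  aws.String("testbucket"),
		MaxKeys: aws.Int32(5),
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, obj := range out.Contents {
		fmt.Println(aws.ToString(obj.Key))
	}
}
```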
Detailed Output
Per-Run Summary
=== Run 1 ===
Object: s3://my-bucket/path/to/large-file.bin
Object size: 10.00 GB
Chunk size: 64.00 MB (160 chunks)
Concurrency: 16 workers
Results:
Total time: 8.432 s
Total bytes: 10.00 GB
Throughput: 1184.3 MB/s (1.157 GB/s)
Time to 1st byte: 42.3 ms
Chunk latency (per-chunk download time):
Min: 341.2 ms
Max: 892.7 ms
Mean: 526.4 ms
P50: 512.1 ms
P95: 781.3 ms
P99: 856.4 ms
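Percentiles like these are easy to reproduce from per-chunk timings. A minimal sketch using the nearest-rank method (not necessarily the exact method s3bench uses):

```go
package bench

import (
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0-100) of the chunk latencies
// using the nearest-rank method on a sorted copy of the samples.
func percentile(latencies []time.Duration, p float64) time.Duration {
	if len(latencies) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}
```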
JSON Output
For scripting and automation:
./s3bench --bucket b --key k --concurrency 8,16,32 --discard --json \
| jq '.[] | {workers: .Concurrency, mean_mb_s: .Aggregate.mean_throughput_mb_s}'
Tuning Recommendations
Based on extensive testing, here are my recommendations:
1. Use Concurrency Sweep
Run --concurrency 4,8,16,32,64 to find your saturation point. Throughput will plateau when you’ve hit your network or storage ceiling.
2. Balance Chunk Size and Concurrency
- Smaller chunks + more workers = more connection overhead
- Larger chunks + fewer workers = potential worker idle time
- Sweet spot: Start with --chunk-size L --concurrency 16 and sweep from there. For example, a 10 GB object at 64 MB chunks gives 160 chunks, so 16 workers each handle about 10 chunks; at 64 workers each handles only 2-3, and a few slow chunks can leave most workers idle.
3. Multiple Runs for Accuracy
Use --runs 3 or more. The first run often shows slower results due to cold caches on the storage side.
4. Always Use Discard Mode for Throughput Testing
When measuring network/storage throughput, use --discard to eliminate local disk I/O as a variable.
5. Test from the Right Location
Results are only meaningful when measured from where your workload actually runs. For example:
- EC2 instance in the same region as the bucket (for AWS)
- Same network segment as your MinIO deployment
- Not from your laptop over VPN!
All Command-Line Options
| Flag | Default | Description |
|---|---|---|
| --bucket | (required) | S3 bucket name |
| --key | (required) | S3 object key to download |
| --chunk-size | 64MB | Size of each byte-range read |
| --concurrency | 8 | Parallel workers (single value or comma-separated list) |
| --runs | 1 | Repetitions per concurrency level |
| --profile | "" | AWS named profile |
| --access-key-id | "" | Override access key |
| --secret-access-key | "" | Override secret key |
| --region | us-east-1 | AWS region |
| --endpoint | "" | Custom S3-compatible endpoint |
| --discard | false | Discard downloaded bytes |
| --output | "" | Write downloaded data to a file |
| --json | false | Emit results as JSON |
Conclusion
Multipart operations are essential for high-performance object storage access. With s3bench, you can:
- Find optimal concurrency settings
- Compare different storage backends
- Validate network infrastructure
- Benchmark before and after optimizations
The tool is open source under the MIT license. Check out the GitHub repository for the full source code and documentation.
Happy benchmarking!