When working with object storage at scale, understanding your throughput characteristics is critical. Whether you’re using AWS S3, MinIO, Ceph, or any S3-compatible storage, multipart operations are the key to achieving high performance. In this post, I’ll introduce s3bench, a command-line tool I’ve developed for benchmarking download throughput with configurable concurrency and chunk sizes.
Why Multipart Operations Matter
S3 multipart downloads (and uploads) allow you to parallelize data transfer across multiple connections. Instead of downloading a 10GB file as a single stream, you can:
- Split the object into chunks using HTTP Range headers
- Download multiple chunks concurrently
- Reassemble the complete file
This approach can dramatically improve throughput, especially for large objects and high-latency connections.
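To make the mechanics concrete, here is a minimal sketch of a single ranged request using Go's standard library. The URL and byte range are placeholders; a server that honours ranged reads answers with 206 Partial Content and returns only the requested bytes:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Placeholder object URL (e.g. a pre-signed or public S3 URL).
	url := "https://example-bucket.s3.amazonaws.com/path/to/large-file.bin"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Request only the first 64 MB chunk: bytes 0 through 67108863.
	req.Header.Set("Range", "bytes=0-67108863")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Drain the body without storing it; a ranged-read-capable server
	// responds with "206 Partial Content".
	n, _ := io.Copy(io.Discard, resp.Body)
	fmt.Printf("status=%s bytes=%d\n", resp.Status, n)
}
```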
How s3bench Works
The tool follows a straightforward but effective approach:
- HeadObject - Determines the object size
- Chunk Division - Splits the object into configurable byte-range chunks
- Concurrent Download - Dispatches goroutines to fetch chunks in parallel using Range headers
- Live Progress - Shows real-time transfer rates (updated every 200ms)
- Detailed Reporting - Reports throughput, time-to-first-byte, and latency percentiles
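For illustration, the HeadObject-then-ranged-GetObject flow above can be sketched with the AWS SDK for Go v2. This is not s3bench's actual source, just a simplified outline; the bucket, key, chunk size, and worker count are placeholders, and it assumes a recent SDK release where ContentLength is a pointer:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	bucket, key := "my-bucket", "path/to/large-file.bin" // placeholders
	chunkSize := int64(64 << 20)                         // 64 MB
	workers := 16

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// 1. HeadObject: learn the total size so the byte ranges can be planned.
	head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket), Key: aws.String(key),
	})
	if err != nil {
		log.Fatal(err)
	}
	size := aws.ToInt64(head.ContentLength)

	// 2. Chunk division: enqueue one byte range per chunk.
	type chunk struct{ start, end int64 }
	jobs := make(chan chunk)
	go func() {
		for off := int64(0); off < size; off += chunkSize {
			end := off + chunkSize - 1
			if end >= size {
				end = size - 1
			}
			jobs <- chunk{off, end}
		}
		close(jobs)
	}()

	// 3. Concurrent download: each worker issues ranged GetObject calls.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				out, err := client.GetObject(ctx, &s3.GetObjectInput{
					Bucket: aws.String(bucket),
					Key:    aws.String(key),
					Range:  aws.String(fmt.Sprintf("bytes=%d-%d", c.start, c.end)),
				})
				if err != nil {
					log.Fatal(err)
				}
				io.Copy(io.Discard, out.Body) // discard, as in --discard mode
				out.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("downloaded %d bytes in %d-byte chunks\n", size, chunkSize)
}
```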
Get s3bench
GitHub Repository: https://github.com/soothill/s3bench
You can:
- Clone the repository: git clone https://github.com/soothill/s3bench.git
- Download releases: Visit https://github.com/soothill/s3bench/releases for pre-built binaries
- Browse the source code: https://github.com/soothill/s3bench
Installing Go
s3bench requires Go 1.22 or later. Here’s how to install Go on different platforms:
🐧 Linux Installation
Ubuntu/Debian
# Download and install Go 1.22
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
# Add to PATH
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc
# Verify installation
go version
Fedora/RHEL/CentOS
# Using dnf
sudo dnf install golang
# Or manually:
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc
Arch Linux
sudo pacman -S go
Building from Source
Once Go is installed:
cd s3bench
go mod tidy
go build -o s3bench .
Basic Usage
./s3bench \
--bucket my-bucket \
--key path/to/large-file.bin \
--chunk-size 64MB \
--concurrency 16 \
--discard
The --discard flag is crucial for pure throughput benchmarking—it prevents local disk I/O from becoming the bottleneck.
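Under the hood, discard mode likely boils down to copying each chunk body into io.Discard instead of onto disk. Here is a minimal sketch of that difference with a hypothetical helper (not s3bench's actual code):

```go
package bench

import (
	"io"
	"os"
)

// discardOrWrite copies a chunk body either into io.Discard (pure network
// measurement, as with --discard) or into a local file, which puts disk I/O
// on the critical path. r stands in for a chunk's HTTP response body.
func discardOrWrite(r io.Reader, discard bool, path string) (int64, error) {
	if discard {
		return io.Copy(io.Discard, r)
	}
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, r)
}
```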
Key Features
Chunk Size Presets
Instead of remembering byte values, use intuitive presets:
| Preset | Size |
|---|---|
| XS | 1 MB |
| S | 4 MB |
| M | 8 MB |
| L | 64 MB |
| XL | 256 MB |
| XXL | 1 GB |
./s3bench --chunk-size L --bucket mybucket --key bigfile.bin --discard
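A hypothetical sketch of how such presets could map to byte counts; s3bench's real parser may differ:

```go
package bench

import (
	"fmt"
	"strings"
)

// chunkPresets is a hypothetical mapping of the preset names in the table
// above to byte counts.
var chunkPresets = map[string]int64{
	"XS":  1 << 20,   // 1 MB
	"S":   4 << 20,   // 4 MB
	"M":   8 << 20,   // 8 MB
	"L":   64 << 20,  // 64 MB
	"XL":  256 << 20, // 256 MB
	"XXL": 1 << 30,   // 1 GB
}

// parseChunkSize resolves a preset name like "L" to its size in bytes.
func parseChunkSize(s string) (int64, error) {
	if v, ok := chunkPresets[strings.ToUpper(s)]; ok {
		return v, nil
	}
	return 0, fmt.Errorf("unknown chunk-size preset %q", s)
}
```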
Concurrency Sweep
The most powerful feature for finding optimal settings. Pass multiple concurrency values to automatically benchmark each:
./s3bench \
--bucket my-bucket \
--key path/to/large-file.bin \
--chunk-size 64MB \
--concurrency 4,8,16,32,64 \
--runs 3 \
--discard
This produces a comparison report:
╔══════════════════════════════════════════════════════════╗
║ Concurrency Sweep Comparison ║
╚══════════════════════════════════════════════════════════╝
Workers Runs Min MB/s Mean MB/s Max MB/s
------- ---- -------- --------- --------
4 3 412.1 438.7 461.3
8 3 781.4 823.9 856.2
16 3 1102.5 1163.8 1201.4
32 3 1367.9 1401.5 1423.0 <-- best
64 3 1389.2 1398.1 1412.7
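The per-level rows in this report are essentially a min/mean/max reduction over the runs at each concurrency. A minimal sketch of that aggregation (not necessarily how s3bench computes it):

```go
package bench

// levelSummary holds the aggregate shown for one concurrency level in the
// sweep comparison: number of runs plus min/mean/max throughput in MB/s.
type levelSummary struct {
	Workers int
	Runs    int
	Min     float64
	Mean    float64
	Max     float64
}

// aggregate reduces the per-run throughput samples (MB/s) for one level.
func aggregate(workers int, samples []float64) levelSummary {
	s := levelSummary{Workers: workers, Runs: len(samples)}
	if len(samples) == 0 {
		return s
	}
	s.Min, s.Max = samples[0], samples[0]
	var sum float64
	for _, v := range samples {
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
		sum += v
	}
	s.Mean = sum / float64(len(samples))
	return s
}
```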
S3-Compatible Storage
Works with MinIO, Ceph, and any S3-compatible endpoint:
./s3bench \
--endpoint http://minio.local:9000 \
--bucket testbucket \
--key bigfile.bin \
--access-key-id minioadmin \
--secret-access-key minioadmin \
--region us-east-1 \
--chunk-size 64MB \
--concurrency 8,16,32 \
--runs 3 \
--discard
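For reference, pointing an AWS SDK for Go v2 client at a custom endpoint looks roughly like this. It mirrors the CLI flags above but is only an illustration, not s3bench's internals; it assumes a recent SDK release that exposes BaseEndpoint on the client options, and MinIO typically needs path-style addressing:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Static credentials and region matching the CLI example above.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider("minioadmin", "minioadmin", "")),
	)
	if err != nil {
		log.Fatal(err)
	}

	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://minio.local:9000")
		o.UsePathStyle = true // MinIO and many Ceph RGW setups expect path-style URLs
	})

	// Quick sanity check: list a few keys from the test bucket.
	out, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
		Bucket:  aws.String("testbucket"),
		MaxKeys: aws.Int32(5),
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, obj := range out.Contents {
		fmt.Println(aws.ToString(obj.Key))
	}
}
```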
Detailed Output
Per-Run Summary
=== Run 1 ===
Object: s3://my-bucket/path/to/large-file.bin
Object size: 10.00 GB
Chunk size: 64.00 MB (160 chunks)
Concurrency: 16 workers
Results:
Total time: 8.432 s
Total bytes: 10.00 GB
Throughput: 1184.3 MB/s (1.157 GB/s)
Time to 1st byte: 42.3 ms
Chunk latency (per-chunk download time):
Min: 341.2 ms
Max: 892.7 ms
Mean: 526.4 ms
P50: 512.1 ms
P95: 781.3 ms
P99: 856.4 ms
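Percentiles like these are easy to reproduce from per-chunk timings. A minimal sketch using the nearest-rank method (not necessarily the exact method s3bench uses):

```go
package bench

import (
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0-100) of the chunk latencies
// using the nearest-rank method on a sorted copy of the samples.
func percentile(latencies []time.Duration, p float64) time.Duration {
	if len(latencies) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}
```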
JSON Output
For scripting and automation:
./s3bench --bucket b --key k --concurrency 8,16,32 --discard --json \
| jq '.[] | {workers: .Concurrency, mean_mb_s: .Aggregate.mean_throughput_mb_s}'
Tuning Recommendations
Based on extensive testing, here are my recommendations:
1. Use Concurrency Sweep
Run --concurrency 4,8,16,32,64 to find your saturation point. Throughput will plateau when you’ve hit your network or storage ceiling.
2. Balance Chunk Size and Concurrency
- Smaller chunks + more workers = more connection overhead
- Larger chunks + fewer workers = potential worker idle time
- Sweet spot: Start with --chunk-size L --concurrency 16 and sweep from there. For example, a 10 GB object at 64 MB chunks gives 160 chunks, so 16 workers each handle about 10 chunks; at 64 workers each handles only 2-3, and a few slow chunks can leave most workers idle.
3. Multiple Runs for Accuracy
Use --runs 3 or more. The first run often shows slower results due to cold caches on the storage side.
4. Always Use Discard Mode for Throughput Testing
When measuring network/storage throughput, use --discard to eliminate local disk I/O as a variable.
5. Test from the Right Location
Results are only meaningful when measured from where your workload actually runs. For example:
- EC2 instance in the same region as the bucket (for AWS)
- Same network segment as your MinIO deployment
- Not from your laptop over VPN!
All Command-Line Options
| Flag | Default | Description |
|---|---|---|
| --bucket | (required) | S3 bucket name |
| --key | (required) | S3 object key to download |
| --chunk-size | 64MB | Size of each byte-range read |
| --concurrency | 8 | Parallel workers (single value or comma-separated list) |
| --runs | 1 | Repetitions per concurrency level |
| --profile | "" | AWS named profile |
| --access-key-id | "" | Override access key |
| --secret-access-key | "" | Override secret key |
| --region | us-east-1 | AWS region |
| --endpoint | "" | Custom S3-compatible endpoint |
| --discard | false | Discard downloaded bytes |
| --output | "" | Write downloaded data to a file |
| --json | false | Emit results as JSON |
Conclusion
Multipart operations are essential for high-performance object storage access. With s3bench, you can:
- Find optimal concurrency settings
- Compare different storage backends
- Validate network infrastructure
- Benchmark before and after optimizations
The tool is open source under the MIT license. Check out the GitHub repository for the full source code and documentation.
Happy benchmarking!