Used buffer instead of stream for S3 uploads #25559
base: main
Conversation
- GCS S3-compatible API fails on streamed uploads due to signature mismatch.
Walkthrough
The S3Storage adapter's save() method was modified to change how file data is uploaded to S3. Previously, it used fs.createReadStream(file.path) to provide the Body parameter for PutObjectCommand. It now uses fs.promises.readFile(file.path) to read the entire file into a Buffer before uploading. All other PutObjectCommand parameters (Bucket, Key, ContentType) remain unchanged. This is an internal implementation change with no public API signature modifications.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ghost/core/core/server/adapters/storage/S3Storage.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Acceptance tests (Node 22.18.0, sqlite3)
- GitHub Check: Acceptance tests (Node 22.18.0, mysql8)
- GitHub Check: Legacy tests (Node 22.18.0, mysql8)
- GitHub Check: Legacy tests (Node 22.18.0, sqlite3)
- GitHub Check: Ghost-CLI tests
- GitHub Check: Unit tests (Node 22.18.0)
- GitHub Check: Lint
- GitHub Check: Build & Push
```diff
  const key = this.buildKey(relativePath);
- const body = fs.createReadStream(file.path);
+ const body = await fs.promises.readFile(file.path);
```
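For context, this is roughly what the buffer-based upload amounts to. The sketch below is illustrative only; the function name, parameters, and surrounding wiring are assumptions, not the adapter's actual code.

```typescript
import fs from 'fs';
import {S3Client, PutObjectCommand} from '@aws-sdk/client-s3';

// Illustrative stand-in for the adapter's save() path; bucket, key and
// contentType would come from the adapter's own config and buildKey().
async function saveAsBuffer(client: S3Client, bucket: string, key: string,
    filePath: string, contentType: string): Promise<void> {
    // Read the whole file into memory rather than streaming it, so the
    // request carries the complete payload (and its length) up front.
    const body = await fs.promises.readFile(filePath);
    await client.send(new PutObjectCommand({
        Bucket: bucket,
        Key: key,
        Body: body,
        ContentType: contentType
    }));
}
```

The trade-off, raised in the review below, is that the entire file is held in memory for the duration of the upload.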
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Search for file size validations in the codebase
echo "=== Searching for file size validations ==="
rg -n 'filesize|file\.size|maxFileSize|MAX_FILE_SIZE|fileSizeLimit|max.*size' --type=ts --type=js | head -30

# Search for GCS/Google Cloud Storage files
echo -e "\n=== Searching for GCS/Google Cloud Storage files ==="
find . -type f \( -name "*.ts" -o -name "*.js" \) | xargs grep -l -i 'gcs\|google.*cloud\|google.*storage' 2>/dev/null

# Check storage adapters directory structure
echo -e "\n=== Storage adapters directory ==="
find . -path "*/storage/*" -type f \( -name "*.ts" -o -name "*.js" \)

# Check PutObjectCommand usage in S3Storage.ts
echo -e "\n=== PutObjectCommand usage in S3Storage.ts ==="
rg -n -B5 -A10 'PutObjectCommand' ghost/core/core/server/adapters/storage/S3Storage.ts

# Get context around line 118 in S3Storage.ts
echo -e "\n=== Context around line 118 in S3Storage.ts ==="
wc -l ghost/core/core/server/adapters/storage/S3Storage.ts
sed -n '100,140p' ghost/core/core/server/adapters/storage/S3Storage.ts | cat -n

# Search for imports in S3Storage.ts
echo -e "\n=== Imports in S3Storage.ts ==="
head -30 ghost/core/core/server/adapters/storage/S3Storage.ts | cat -n

# Look for stream usage in storage adapters
echo -e "\n=== Stream usage in storage adapters ==="
find . -path "*/adapters/storage/*" -type f \( -name "*.ts" -o -name "*.js" \) | xargs grep -n 'createReadStream\|Stream' 2>/dev/null
```

Repository: TryGhost/Ghost
Length of output: 50559
🌐 Web query:
AWS SDK v3 PutObjectCommand Body parameter best practices large files streaming vs buffer memory
💡 Result:
Short guidance — do not load very large files fully into memory; stream or use multipart uploads.
Key best practices
- Prefer streaming (Node.js Readable, fs.createReadStream, HTTP request streams) or use @aws-sdk/lib-storage’s Upload helper instead of building a single huge Buffer for PutObjectCommand. [2][4]
- For large objects (AWS recommends multipart for ~100 MB+; API requires parts ≥5 MB), use multipart upload to get parallelism, resumability, and lower retry cost. [1]
- Use @aws-sdk/lib-storage Upload when you need multipart for buffers, blobs, or streams — it handles splitting, concurrency, retries, and unknown lengths. Configure partSize and queueSize. Default min part = 5 MB. [2]
- Control memory: total in-memory buffer ≈ queueSize * partSize (ManagedUpload/Upload buffer behavior), so lower queueSize or partSize to reduce memory footprint. For single-threaded low-memory, set queueSize = 1. [3][2]
- Provide ContentLength when known (helps some behaviors) and avoid Content-MD5 when Upload will split parts (integrity behavior may differ). If you must preserve exact checksums, adjust partSize to avoid splitting. [4][2]
- If using PutObjectCommand directly: only do so for small files (fits comfortably in memory). For large files prefer multipart Upload or the SDK’s helper. [1][2]
Minimal config example (conceptual), using Upload:
- partSize: >= 5 * 1024 * 1024
- queueSize: tune to memory and network (e.g., 1–4)
- Body: stream or buffer
- await upload.done(). [2]
Sources
- S3 multipart upload guidance — AWS docs. [1]
- @aws-sdk/lib-storage (Upload helper) README / npm. [2]
- ManagedUpload docs (buffering behavior / queueSize note). [3]
- Practical note on PutObject stream regression and recommendation to use Upload. [4]
Verify why this changed from streaming to buffer and assess memory impact for large files.
AWS SDK v3 documentation recommends against using buffers for large file uploads. For media/file uploads, streaming or the @aws-sdk/lib-storage Upload helper is preferred to avoid out-of-memory errors and performance degradation.
While file size validation exists upstream (via limitService in media.js), confirm:
- What specific issue motivated switching from streaming to buffer?
- Does the limit prevent media files large enough to cause memory problems (e.g., videos)?
- If large files are possible, consider using streaming or the Upload helper with multipart support instead.
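As a hedged illustration of that alternative, here is a minimal sketch of a multipart upload with @aws-sdk/lib-storage's Upload helper, assuming an S3Client configured elsewhere; the function name and parameters are illustrative. Whether this works against GCS's S3-compatible endpoint is exactly the open question in this PR, so treat it as a starting point rather than a drop-in fix.

```typescript
import fs from 'fs';
import {S3Client} from '@aws-sdk/client-s3';
import {Upload} from '@aws-sdk/lib-storage';

// Sketch only: uploads a file as a multipart upload, streaming it in parts
// instead of buffering the whole file in memory.
async function uploadMultipart(client: S3Client, bucket: string, key: string,
    filePath: string, contentType: string): Promise<void> {
    const upload = new Upload({
        client,
        params: {
            Bucket: bucket,
            Key: key,
            Body: fs.createReadStream(filePath),
            ContentType: contentType
        },
        partSize: 5 * 1024 * 1024, // minimum part size allowed by S3
        queueSize: 1               // keeps in-memory buffering to roughly one part
    });
    await upload.done();
}
```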
🤖 Prompt for AI Agents
In ghost/core/core/server/adapters/storage/S3Storage.ts around line 118, the
code was changed to read the entire file into a Buffer (fs.promises.readFile)
which can cause OOM for large media; investigate why streaming was replaced with
buffering, confirm the effective maximum file size enforced by limitService
(ensure it rules out large videos), and if large files are allowed revert to a
streaming approach or use @aws-sdk/lib-storage's Upload helper with multipart
support for uploads; implement streaming by passing a file stream
(fs.createReadStream(file.path)) or switch to Upload to handle multipart and
retries, and add a comment referencing the limitService check and a
unit/integration test to ensure large-file upload behavior is safe.
The issue is that the GCS S3-compatible API doesn’t fully support streaming uploads the way real S3 does. With a stream, the signature keeps failing because GCS expects the full payload upfront.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
This is not my solution for production but just a patch fix to unblock testing. For production we'll shift to something like a multipart upload or streams (if that could work for GCS).