Conversation

@vershwal vershwal (Member) commented Dec 1, 2025

No description provided.

- GCS S3-compatible API fails on streamed uploads due to signature mismatch.
@coderabbitai coderabbitai bot (Contributor) commented Dec 1, 2025

Walkthrough

The S3Storage adapter's save() method was modified to change how file data is uploaded to S3. Previously, it used fs.createReadStream(file.path) to provide the Body parameter for PutObjectCommand. It now uses fs.promises.readFile(file.path) to read the entire file into a Buffer before uploading. All other PutObjectCommand parameters (Bucket, Key, ContentType) remain unchanged. This is an internal implementation change with no public API signature modifications.
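
In adapter terms, the change amounts to swapping the source of Body. A minimal sketch of the two variants (the function shape and parameter names here are illustrative, not the actual S3Storage code):

import fs from 'node:fs';
import {S3Client, PutObjectCommand} from '@aws-sdk/client-s3';

// Sketch only: Ghost's real adapter builds the key and content type itself.
async function save(client: S3Client, bucket: string, key: string, filePath: string, contentType: string): Promise<void> {
    // Before: hand the SDK a lazily-read stream.
    // const body = fs.createReadStream(filePath);

    // After: read the whole file into a Buffer before uploading.
    const body = await fs.promises.readFile(filePath);

    await client.send(new PutObjectCommand({
        Bucket: bucket,
        Key: key,
        Body: body,
        ContentType: contentType
    }));
}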

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Memory and performance implications: The shift from streaming to buffering means the entire file is now loaded into memory. Verify this is appropriate for the expected file sizes and doesn't introduce performance regressions or memory pressure issues.
  • S3 SDK compatibility: Confirm that the AWS SDK's PutObjectCommand properly accepts a Buffer for the Body parameter and maintains the same behavior as the stream-based approach.
  • Error handling: Ensure any errors from fs.promises.readFile() are properly handled or propagated by existing error handling logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Description check: ❓ Inconclusive. No pull request description was provided by the author, making it impossible to assess whether a description relates to the changeset. Resolution: add a pull request description explaining the rationale for switching from streams to buffers for S3 uploads and any performance or compatibility implications.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The pull request title accurately and specifically describes the main change: switching S3 upload implementation from streams to buffers.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch GCSUploadFix


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc7b6c7 and 46dac4f.

📒 Files selected for processing (1)
  • ghost/core/core/server/adapters/storage/S3Storage.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Acceptance tests (Node 22.18.0, sqlite3)
  • GitHub Check: Acceptance tests (Node 22.18.0, mysql8)
  • GitHub Check: Legacy tests (Node 22.18.0, mysql8)
  • GitHub Check: Legacy tests (Node 22.18.0, sqlite3)
  • GitHub Check: Ghost-CLI tests
  • GitHub Check: Unit tests (Node 22.18.0)
  • GitHub Check: Lint
  • GitHub Check: Build & Push


  const key = this.buildKey(relativePath);
- const body = fs.createReadStream(file.path);
+ const body = await fs.promises.readFile(file.path);
@coderabbitai coderabbitai bot (Contributor) commented Dec 1, 2025

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Search for file size validations in the codebase
echo "=== Searching for file size validations ==="
rg -n 'filesize|file\.size|maxFileSize|MAX_FILE_SIZE|fileSizeLimit|max.*size' --type=ts --type=js | head -30

# Search for GCS/Google Cloud Storage files
echo -e "\n=== Searching for GCS/Google Cloud Storage files ==="
find . -type f \( -name "*.ts" -o -name "*.js" \) | xargs grep -l -i 'gcs\|google.*cloud\|google.*storage' 2>/dev/null

# Check storage adapters directory structure
echo -e "\n=== Storage adapters directory ==="
find . -path "*/storage/*" -type f \( -name "*.ts" -o -name "*.js" \)

# Check PutObjectCommand usage in S3Storage.ts
echo -e "\n=== PutObjectCommand usage in S3Storage.ts ==="
rg -n -B5 -A10 'PutObjectCommand' ghost/core/core/server/adapters/storage/S3Storage.ts

# Get context around line 118 in S3Storage.ts
echo -e "\n=== Context around line 118 in S3Storage.ts ==="
wc -l ghost/core/core/server/adapters/storage/S3Storage.ts
sed -n '100,140p' ghost/core/core/server/adapters/storage/S3Storage.ts | cat -n

# Search for imports in S3Storage.ts
echo -e "\n=== Imports in S3Storage.ts ==="
head -30 ghost/core/core/server/adapters/storage/S3Storage.ts | cat -n

# Look for stream usage in storage adapters
echo -e "\n=== Stream usage in storage adapters ==="
find . -path "*/adapters/storage/*" -type f \( -name "*.ts" -o -name "*.js" \) | xargs grep -n 'createReadStream\|Stream' 2>/dev/null

Repository: TryGhost/Ghost

Length of output: 50559


🌐 Web query:

AWS SDK v3 PutObjectCommand Body parameter best practices large files streaming vs buffer memory

💡 Result:

Short guidance — do not load very large files fully into memory; stream or use multipart uploads.

Key best practices

  • Prefer streaming (Node.js Readable, fs.createReadStream, HTTP request streams) or use @aws-sdk/lib-storage’s Upload helper instead of building a single huge Buffer for PutObjectCommand. [2][4]
  • For large objects (AWS recommends multipart for ~100 MB+; API requires parts ≥5 MB), use multipart upload to get parallelism, resumability, and lower retry cost. [1]
  • Use @aws-sdk/lib-storage Upload when you need multipart for buffers, blobs, or streams — it handles splitting, concurrency, retries, and unknown lengths. Configure partSize and queueSize. Default min part = 5 MB. [2]
  • Control memory: total in-memory buffer ≈ queueSize * partSize (ManagedUpload/Upload buffer behavior), so lower queueSize or partSize to reduce memory footprint. For single-threaded low-memory, set queueSize = 1. [3][2]
  • Provide ContentLength when known (helps some behaviors) and avoid Content-MD5 when Upload will split parts (integrity behavior may differ). If you must preserve exact checksums, adjust partSize to avoid splitting. [4][2]
  • If using PutObjectCommand directly: only do so for small files (fits comfortably in memory). For large files prefer multipart Upload or the SDK’s helper. [1][2]

Minimal config example (conceptual; a TypeScript sketch follows this list)

  • Use Upload:
    • partSize: >= 5 * 1024 * 1024
    • queueSize: tune to memory and network (e.g., 1–4)
    • Body: stream or buffer
    • await upload.done(). [2]
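
A minimal TypeScript sketch of that configuration, assuming @aws-sdk/lib-storage (the function and variable names are illustrative):

import fs from 'node:fs';
import {S3Client} from '@aws-sdk/client-s3';
import {Upload} from '@aws-sdk/lib-storage';

// Multipart-capable upload: lib-storage splits the body into parts and retries them.
async function uploadLargeFile(client: S3Client, bucket: string, key: string, filePath: string): Promise<void> {
    const upload = new Upload({
        client,
        params: {
            Bucket: bucket,
            Key: key,
            Body: fs.createReadStream(filePath) // a Buffer or Blob also works
        },
        partSize: 5 * 1024 * 1024, // minimum part size allowed by S3
        queueSize: 2 // concurrent parts; in-memory footprint is roughly queueSize * partSize
    });
    await upload.done();
}

Whether GCS's S3-compatible endpoint accepts multipart uploads initiated this way would still need to be verified for this use case.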

Sources

  • S3 multipart upload guidance — AWS docs. [1]
  • @aws-sdk/lib-storage (Upload helper) README / npm. [2]
  • ManagedUpload docs (buffering behavior / queueSize note). [3]
  • Practical note on PutObject stream regression and recommendation to use Upload. [4]

Verify why this changed from streaming to buffering and assess the memory impact for large files.

AWS SDK v3 documentation recommends against using buffers for large file uploads. For media/file uploads, streaming or the @aws-sdk/lib-storage Upload helper is preferred to avoid out-of-memory errors and performance degradation.

While file size validation exists upstream (via limitService in media.js), confirm:

  • What specific issue motivated switching from streaming to buffer?
  • Does the limit prevent media files large enough to cause memory problems (e.g., videos)?
  • If large files are possible, consider using streaming or the Upload helper with multipart support instead.
🤖 Prompt for AI Agents
In ghost/core/core/server/adapters/storage/S3Storage.ts around line 118, the
code was changed to read the entire file into a Buffer (fs.promises.readFile)
which can cause OOM for large media; investigate why streaming was replaced with
buffering, confirm the effective maximum file size enforced by limitService
(ensure it rules out large videos), and if large files are allowed revert to a
streaming approach or use @aws-sdk/lib-storage's Upload helper with multipart
support for uploads; implement streaming by passing a file stream
(fs.createReadStream(file.path)) or switch to Upload to handle multipart and
retries, and add a comment referencing the limitService check and a
unit/integration test to ensure large-file upload behavior is safe.

@vershwal vershwal (Member, Author) commented:

The issue is that the GCS S3-compatible API doesn’t fully support streaming uploads the way real S3 does. With a stream, the signature keeps failing because GCS expects the full payload upfront.
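
For illustration (not part of this patch): SigV4 puts a hex SHA-256 of the request payload in the x-amz-content-sha256 header. With a Buffer the SDK can compute that hash over the full body before sending; with a stream it typically falls back to chunked or unsigned-payload signing modes, which seems to be where the GCS interop breaks. A hypothetical helper showing what "hashing the full payload upfront" means:

import {createHash} from 'node:crypto';
import fs from 'node:fs';

// Hypothetical, for illustration only: hash the whole file the way SigV4
// hashes a buffered payload. A stream would have to be fully consumed first.
async function payloadSha256(filePath: string): Promise<string> {
    const body = await fs.promises.readFile(filePath);
    return createHash('sha256').update(body).digest('hex');
}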

@coderabbitai coderabbitai bot (Contributor) commented:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@vershwal vershwal (Member, Author) commented:

This is not my solution for production, just a patch to unblock testing. For production we'll shift to something like a multipart upload or streams (if that could work for GCS).
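
If we do go back to streams, one variant worth testing (purely a sketch, not verified against GCS) is passing an explicit ContentLength alongside the stream so the SDK knows the payload size upfront:

import fs from 'node:fs';
import {S3Client, PutObjectCommand} from '@aws-sdk/client-s3';

// Sketch of a possible production path; whether GCS's S3-compatible API
// accepts this signing path would still need to be tested.
async function saveStreamed(client: S3Client, bucket: string, key: string, filePath: string): Promise<void> {
    const {size} = await fs.promises.stat(filePath);
    await client.send(new PutObjectCommand({
        Bucket: bucket,
        Key: key,
        Body: fs.createReadStream(filePath),
        ContentLength: size // known length avoids some streaming-body special cases
    }));
}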
