
Conversation

@steve-chavez (Member) commented Nov 14, 2025

Temporarily revert commit 390ba19 to help with debugging #4462

Edit: it was confirmed on #4462 that this revert solves the issue.

@steve-chavez force-pushed the temp-revert-3869 branch 2 times, most recently from 7c62701 to 693c1aa on November 14, 2025 at 22:35
@mkleczek (Contributor)

See #4472

@steve-chavez (Member, Author)

I wonder why #3869 would cause #4462 (comment)? @taimoorzaeem Any ideas?

It would make sense for the schema cache to consume considerable memory for 125K tables at creation time, but why would memory keep increasing as findTable is called on each request?

findTable :: QualifiedIdentifier -> TablesMap -> Either Error QualifiedIdentifier
findTable qi@QualifiedIdentifier{..} tableMap =
  case HM.lookup qi tableMap of
    Nothing -> Left $ SchemaCacheErr $ TableNotFound qiSchema qiName (HM.elems tableMap)
    Just _  -> Right qi
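My rough guess, as a sketch only (the helper name below is made up and the actual wiring in the code may differ): the TableNotFound error carries HM.elems tableMap, so if the suggestion machinery rebuilds a fuzzy set from that full table list every time the error is rendered, each failing request would pay the construction cost again instead of paying it once at schema cache creation:

import qualified Data.FuzzySet as Fuzzy
import           Data.Text (Text)

-- Hypothetical helper, not the actual PostgREST code: rebuilding the fuzzy
-- set from all table names on every failing lookup allocates the whole
-- structure per request.
suggestTable :: [Text] -> Text -> Maybe Text
suggestTable allTableNames missingName =
  Fuzzy.getOne (Fuzzy.fromList allTableNames) missingName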

@taimoorzaeem (Collaborator) commented Dec 3, 2025

I wonder why #3869 would cause #4462 (comment)? @taimoorzaeem Any ideas?

It would make sense for the schema cache to consume considerable memory for 125K tables at creation time, but why would memory keep increasing as findTable is called on each request?

It's quite strange. I tried tracing it, and in the error cases control flow ends up at the evaluation of Fuzzy.getOne here:

perhapsTable = Fuzzy.getOne fuzzyTableSet tblName

I am debugging further to get to the root cause.
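One way to pin the evaluation point down (just a sketch, not what the server code does; the helper name is made up) would be to force the lookup result at a known place and watch the allocation happen there:

import           Control.DeepSeq (force)
import           Control.Exception (evaluate)
import qualified Data.FuzzySet as Fuzzy
import           Data.Text (Text)

-- Forces the suggestion to normal form so any memory blow-up happens right
-- here instead of wherever the thunk is demanded later.
forceSuggestion :: Fuzzy.FuzzySet -> Text -> IO (Maybe Text)
forceSuggestion fuzzySet name = evaluate (force (Fuzzy.getOne fuzzySet name))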

@taimoorzaeem (Collaborator)

I tried to run the fuzzy search operation on a large set. I got the source tarball of fuzzyset-0.2.4 and built it with ghc-9.4.8. Here is what happened:

-- [~/github/fuzzyset]$ cabal v2-repl
ghci> import Data.FuzzySet
ghci> import Data.Text
ghci> fuzzyset = fromList ([ pack ("unknown_table_" <> show (i :: Int)) | i <- [1..100000]])
ghci> getOne fuzzyset (pack "table-with-a-weird-name")
Error: [Cabal-7125]
repl failed for fuzzyset-0.2.4. The build process was killed (i.e. SIGKILL). The typical
reason for this is that there is not enough memory available (e.g. the OS killed a
process using lots of memory).

Memory and CPU consumption spiked to 100%, and then either the OS killed the process as above, or the process nearly killed my OS (a few times it caused my Ubuntu system to kill all processes and log me out 😿). This does not seem safe.

The top command output is something like:

PID   USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
61689 taimooz   20   0 1024.5g   9.8g 128160 S 100.3  85.8   0:25.65 ghc-9.4+
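To get actual allocation figures instead of eyeballing top, the same expression could be put in a standalone program and run with RTS statistics. A sketch (Repro.hs is just a throwaway file name):

-- Build with `ghc -O2 -rtsopts Repro.hs` and run with `./Repro +RTS -s`
-- to get total allocation and maximum residency from the RTS.
module Main where

import qualified Data.FuzzySet as Fuzzy
import           Data.Text (pack)

main :: IO ()
main = do
  let names    = [ pack ("unknown_table_" <> show (i :: Int)) | i <- [1 .. 100000] ]
      fuzzySet = Fuzzy.fromList names
  print (Fuzzy.getOne fuzzySet (pack "table-with-a-weird-name"))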

In Nix Environment

However, if I try the same thing in the postgrest nix-shell environment, the CPU spikes to 100% but the MEM usage remains low, the OS does not kill the process, and the expression eventually (after about 1-2 minutes) evaluates to Nothing.

top output looks something like:

PID   USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
62098 taimooz   20   0 1024.8g 987124 174764 S 100.3   6.3   0:53.05 ghc    

Now, this is my analysis so far, based only on testing in GHCi. I am not sure how our Warp HTTP server handles this kind of exception.
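For the Warp side, a minimal sketch of where such an exception would surface (this only shows Warp's setOnException hook, not how PostgREST is actually wired):

{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Network.HTTP.Types (status200)
import           Network.Wai (responseLBS)
import           Network.Wai.Handler.Warp (defaultSettings, runSettings, setOnException, setPort)

-- Warp catches exceptions escaping the application and hands them to the
-- onException handler; the default handler logs them and answers with a 500.
main :: IO ()
main = runSettings settings app
  where
    settings = setPort 3000
             $ setOnException (\_req e -> putStrLn ("app exception: " <> show e))
               defaultSettings
    app _req respond = respond (responseLBS status200 [] "ok")

An OS-level OOM kill of the whole process would of course never reach this handler; only exceptions raised inside the Haskell runtime do.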
