@michel2323
Member

There was a test failure on a Max 1100 GPU in

```julia
execute!(queue) do list
    pattern = [42]
    append_fill!(list, pointer(dst), pointer(pattern), sizeof(pattern), sizeof(src))
    append_barrier!(list)
    append_copy!(list, pointer(chk), pointer(dst), sizeof(src))
end
synchronize(queue)
@test all(isequal(42), chk)
```

caused by passing a pointer to a standard Julia array (`pattern = [42]`) to `zeCommandListAppendMemoryFill`. AFAIK, on discrete Intel GPUs (unlike integrated ones), standard host memory is often not directly accessible by the device command processor. I also fixed `fill!` to address the same issue.

In that vein, I will also add a GitHub Actions runner for that GPU.
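
The lifetime rule behind the `fill!` change (the pattern buffer must stay allocated until the asynchronous fill has completed, and only then be freed) can be sketched independently of the GPU. This is a minimal illustration, not code from the PR: `with_pattern_buffer` and `cpu_alloc` are hypothetical names, and `Libc.malloc`/`Libc.free` stand in for `oneL0.host_alloc`/`oneL0.free` so that the snippet runs on a machine without an Intel GPU.

```julia
# Hypothetical helper (not part of oneAPI.jl): run `f` with a temporary buffer
# holding `val`, and free the buffer only after `f` has returned, even if `f` throws.
function with_pattern_buffer(f, alloc, free, val::T) where {T}
    buf = alloc(sizeof(T), Base.datatype_alignment(T))
    try
        unsafe_store!(convert(Ptr{T}, buf), val)
        return f(buf)
    finally
        free(buf)
    end
end

# CPU-only stand-in allocator (assumption for this sketch). Libc.malloc takes no
# alignment argument, so it is ignored here; that is fine for a single Int.
cpu_alloc(bytes, alignment) = Libc.malloc(bytes)

dst = zeros(Int, 8)
with_pattern_buffer(cpu_alloc, Libc.free, 42) do buf
    # On a GPU this is where append_fill! would be enqueued and the queue
    # synchronized; here the pattern is simply read back and applied on the host.
    pattern = unsafe_load(convert(Ptr{Int}, buf))
    fill!(dst, pattern)
end
@assert all(isequal(42), dst)
```

With oneAPI.jl, the calls shown in this PR suggest passing `(bytes, alignment) -> oneL0.host_alloc(ctx, bytes, alignment)` and `oneL0.free` instead, and calling `synchronize(queue)` inside the callback before it returns, so the buffer is never freed while the fill is still in flight.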

@michel2323 michel2323 requested a review from maleadt December 1, 2025 16:46
@github-actions
Contributor

github-actions bot commented Dec 1, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Click here to view the suggested changes.
```diff
diff --git a/src/array.jl b/src/array.jl
index 4d621cf..8ab4ba8 100644
--- a/src/array.jl
+++ b/src/array.jl
@@ -505,17 +505,17 @@ fill(v, dims...) = fill!(oneArray{typeof(v)}(undef, dims...), v)
 fill(v, dims::Dims) = fill!(oneArray{typeof(v)}(undef, dims...), v)
 
 function Base.fill!(A::oneDenseArray{T}, val) where T
-  length(A) == 0 && return A
-  val = convert(T, val)
-  sizeof(T) == 0 && return A
-
-  # execute! is async, so we need to allocate the pattern in USM memory
-  # and keep it alive until the operation completes.
-  buf = oneL0.host_alloc(context(A), sizeof(T), Base.datatype_alignment(T))
-  unsafe_store!(convert(Ptr{T}, buf), val)
-  unsafe_fill!(context(A), device(), pointer(A), convert(ZePtr{T}, buf), length(A))
-  synchronize(global_queue(context(A), device()))
-  oneL0.free(buf)
+    length(A) == 0 && return A
+    val = convert(T, val)
+    sizeof(T) == 0 && return A
+
+    # execute! is async, so we need to allocate the pattern in USM memory
+    # and keep it alive until the operation completes.
+    buf = oneL0.host_alloc(context(A), sizeof(T), Base.datatype_alignment(T))
+    unsafe_store!(convert(Ptr{T}, buf), val)
+    unsafe_fill!(context(A), device(), pointer(A), convert(ZePtr{T}, buf), length(A))
+    synchronize(global_queue(context(A), device()))
+    oneL0.free(buf)
   A
 end
 
diff --git a/test/level-zero.jl b/test/level-zero.jl
index ed7b283..3b13f34 100644
--- a/test/level-zero.jl
+++ b/test/level-zero.jl
@@ -271,22 +271,22 @@ let src = rand(Int, 1024)
     synchronize(queue)
     @test chk == src
 
-    # FIX: Allocate pattern in USM Host Memory
-    # Standard Host memory (stack/heap) is not accessible by discrete GPUs for fill patterns.
-    # We must use USM Host Memory.
-    pattern_val = 42
-    pattern_buf = oneL0.host_alloc(ctx, sizeof(Int), Base.datatype_alignment(Int))
-    unsafe_store!(convert(Ptr{Int}, pattern_buf), pattern_val)
+        # FIX: Allocate pattern in USM Host Memory
+        # Standard Host memory (stack/heap) is not accessible by discrete GPUs for fill patterns.
+        # We must use USM Host Memory.
+        pattern_val = 42
+        pattern_buf = oneL0.host_alloc(ctx, sizeof(Int), Base.datatype_alignment(Int))
+        unsafe_store!(convert(Ptr{Int}, pattern_buf), pattern_val)
 
     execute!(queue) do list
-        # Use the USM pointer (converted to ZePtr)
-        append_fill!(list, pointer(dst), convert(ZePtr{Int}, pattern_buf), sizeof(Int), sizeof(src))
+            # Use the USM pointer (converted to ZePtr)
+            append_fill!(list, pointer(dst), convert(ZePtr{Int}, pattern_buf), sizeof(Int), sizeof(src))
         append_barrier!(list)
         append_copy!(list, pointer(chk), pointer(dst), sizeof(src))
     end
     synchronize(queue)
 
-    oneL0.free(pattern_buf)
+        oneL0.free(pattern_buf)
 
     @test all(isequal(42), chk)
 
```

@codecov

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.28%. Comparing base (a00fad6) to head (a919ae5).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #555      +/-   ##
==========================================
+ Coverage   79.24%   79.28%   +0.04%
==========================================
  Files          46       46
  Lines        3064     3070       +6
==========================================
+ Hits         2428     2434       +6
  Misses        636      636
```

Member

@maleadt maleadt left a comment

That's curious; I developed this on an A770 discrete GPU where it worked fine.
