Draft
Conversation
…eExecutor to accept a Seq of Buffer for input and expect a Buffer for output. Also removed MapExecutor
… buffers. Update GMem and RamGMem traits to include Vulkan buffer handling and cleanup methods. Introduce mapping and unmapping functionality in Buffer class for direct memory access. Now GContext needs to be updated to align with the changes
…ecute methods in GContext and GFunction to support uniform structures. Enhance SequenceExecutorTest to utilize new Buffer class for input and output operations.
MarconZet
reviewed
Jun 5, 2025
Comment on lines
44
to
59
| def map(): ByteBuffer = { | ||
| pushStack { stack => | ||
| val pData = stack.callocPointer(1) | ||
| check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation") | ||
| val dataPtr = pData.get(0) | ||
| if (dataPtr == NULL) { | ||
| throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1) | ||
| } | ||
| memByteBuffer(dataPtr, this.size) | ||
| } | ||
| } | ||
|
|
||
| def unmap(): Unit = { | ||
| vmaUnmapMemory(allocator.get, allocation) | ||
| } | ||
|
|
Collaborator
There was a problem hiding this comment.
I think maping should use more functional approach. Unmapping should not need to be called explicitly.
Collaborator
Author
There was a problem hiding this comment.
this?
Buffer.scala
def map[R](f: ByteBuffer => R): R = {
var dataPtr: Long = NULL
try {
dataPtr = pushStack { stack =>
val pData = stack.callocPointer(1)
check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation")
val ptr = pData.get(0)
if (ptr == NULL) {
throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1)
}
ptr
}
val byteBuffer = memByteBuffer(dataPtr, this.size)
f(byteBuffer)
} finally {
if (dataPtr != NULL) {
vmaUnmapMemory(allocator.get, allocation)
}
}
}
def get(dst: Array[Byte]): Unit = {
val len = Math.min(dst.length, size)
this.map { mappedBuffer =>
val bufferSlice = mappedBuffer.slice()
bufferSlice.limit(len)
bufferSlice.get(dst, 0, len)
}
}
protected def close(): Unit =
vmaDestroyBuffer(allocator.get, handle, allocation)
}
object Buffer {
def copyBuffer(src: ByteBuffer, dst: Buffer, bytes: Long): Unit = {
dst.map { dstMappedBuffer =>
val srcSlice = src.slice()
srcSlice.limit(bytes.toInt)
dstMappedBuffer.put(srcSlice)
vmaFlushAllocation(dst.allocator.get, dst.allocation, 0, bytes)
}
}
def copyBuffer(src: Buffer, dst: ByteBuffer, bytes: Long): Unit =
src.map { srcMappedBuffer =>
val srcSlice = srcMappedBuffer.slice()
srcSlice.limit(bytes.toInt)
dst.put(srcSlice)
}
Collaborator
Author
There was a problem hiding this comment.
959e7a6 - Let me know if that works
I'm figuring out the type mismatch error in Raytracing.scala
|
|
||
| import java.nio.ByteBuffer | ||
|
|
||
| private[cyfra] abstract class AbstractExecutor(dataLength: Int, val bufferActions: Seq[BufferAction], context: VulkanContext) { |
Collaborator
There was a problem hiding this comment.
Not much point in keeping AbstractExecutor now. All other executors were superseded by SequenceExecutor.
| buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (buffer, i) => | ||
| Buffer.copyBuffer(inputs(i), stagingBuffer, buffer.size) | ||
| Buffer.copyBuffer(stagingBuffer, buffer, buffer.size, commandPool).block().destroy() | ||
| buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (gpuDeviceBuffer, i) => |
Collaborator
There was a problem hiding this comment.
Wasn't the idea to move all the data transfers out of Executor? Now the executor still moves the data RAM <-> GRAM.
ef11eaa to
b605d8a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Solution
Executors
Implementation
The
executemethod inAbstractExecutor.scalaandSequenceExecutor.scalaare modified to accept and returnSeq[io.computenode.cyfra.vulkan.memory.Buffer].This involved changing the method signatures
and updating the internal logic to use
Bufferobjects for data transfer, typically involving host-visible buffers for input/output and device-local buffers for computation.AbstractExecutorSequenceExecutorThe main idea is:
Seq[Buffer]are assumed to be host-visible buffers. Their contents will be copied to the internal device-local compute buffers usingvkCmdCopyBuffer(via theBuffer.copyBufferhelper that utilizes a command pool).Bufferobjects will be created. The results from the internal device-local compute buffers will be copied to these new output buffers, again usingvkCmdCopyBuffer.The general-purpose
stagingBufferused forByteBuffertransfers can be removed, as directBuffertoBuffercopies (staged internally by theBuffer.copyBuffer) will be used.Input Buffer Copying (
BufferAction.LoadTo):* The
inputs: Seq[Buffer]are assumed to be already createdBufferobjects, typically withVK_MEMORY_PROPERTY_HOST_VISIBLE_BITandVK_BUFFER_USAGE_TRANSFER_SRC_BIT.*
computeBufferrefers to the internal buffers created bycreateBuffers, which are device-local (VMA_MEMORY_USAGE_GPU_ONLY) and haveVK_BUFFER_USAGE_STORAGE_BUFFER_BITandVK_BUFFER_USAGE_TRANSFER_DST_BIT(fromaction.action).*
Buffer.copyBuffer(inputHostBuffer, computeBuffer, inputHostBuffer.size, commandPool):* This helper method internally creates a temporary command buffer from the
commandPool.* It records a
vkCmdCopyBuffercommand to copy data frominputHostBuffertocomputeBuffer.* It submits this command buffer to a queue and returns a
Fencewhich is then waited upon (block()) to ensure the copy completes. The fence is then destroyed.* This is suitable for copying between host-visible and device-local memory.
Output Buffer Copying (
BufferAction.LoadFrom):*
computeBufferis an internal device-local buffer holding computation results. It needsVK_BUFFER_USAGE_TRANSFER_SRC_BIT(fromaction.action).*
val outputHostBuffer = new Buffer(...): A newBufferis created for each piece of output data.*
VK_BUFFER_USAGE_TRANSFER_DST_BIT: So it can be the target of a copy.*
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT: Makes the buffer memory mappable and visible to the host (CPU).HOST_COHERENTensures writes by the GPU are visible to the CPU without explicit flushing.*
VMA_MEMORY_USAGE_GPU_TO_CPU: VMA hint for memory that is written by GPU and read by CPU.*
Buffer.copyBuffer(computeBuffer, outputHostBuffer, computeBuffer.size, commandPool):* Similar to the input copy, this uses
vkCmdCopyBufferwithin a temporary command buffer to transfer data from the device-localcomputeBufferto the newly created host-visibleoutputHostBuffer.* The operation is synchronized using a fence.
* The
Seqof theseoutputHostBufferobjects is then returned.Solution
GMem
Implementation
toReadOnlyBuffermethod will be removed fromGMemandRamGMem.vulkanBufferof typeio.computenode.cyfra.vulkan.memory.Bufferwill be added to theGMemtrait.FloatMemandVec4FloatMemwill be updated to store and expose this VulkanBuffer. Their constructors will need to accept anio.computenode.cyfra.vulkan.memory.Buffer.toArray(using GContext)methods inFloatMemandVec4FloatMemto read data from their VulkanBufferback into anArray.FloatMemVec4FloatMemapplymethods that take asizeandGContextto create an uninitializedGMembacked by a VulkanBuffer.FloatMemVec4FloatMemThese changes should provide the necessary functionality for creating
GMeminstances from arrays, retrieving data back to arrays, and cleaning up the underlying Vulkan buffers(def cleanup(): Unit = vulkanBuffer.destroy()).GFunction
The primary role of
GFunctionhas been simplified:No Longer Holds a
ComputePipeline:GFunctionhad aval pipeline: ComputePipeline = context.compile(this)field. This meant that upon creation, aGFunctioninstance would immediately try to compile itself using the providedGContext.pipelinefield has been removed.GFunctionis now purely a container for the function logic (fn) and its associated type information (viaTags andGStructSchema).GContext.GFunctionfrom the compilation process, making it a simpler data structure.GFunctionneeded aGContextto compile, andGContextneededGFunctiondetails to compile.No Implicit
GContextin Primary Constructor:case class GFunction[...](fn: ...)(implicit context: GContext)took an implicitGContext.GContexthas been removed from the primary constructor.GFunctionno longer compiles itself upon instantiation, it doesn't need direct access toGContextat that point. The factory methods in theGFunctioncompanion object (applyandfrom2D) still take(using context: GContext)because the context might be needed by the caller or for future extensions, but theGFunctioninstance itself doesn't store it.GFunctionis now a more passive data holder representing the intent of a GPU computation, rather than an active object that manages its own compiled state.GContext
GContextCompilation and Execution Logic:* Refined
createPipelineinGContextto correctly derive the expression tree fromGFunction.fn.execute:sourceBuffersForExecutorcollects buffers likemem.vulkanBufferand theuniformStagingVkBufferthat theSequenceExecutorwill read from.bufferActionsmap is configured:LayoutLocation(0, 0)(main input) usesBufferAction.LoadTo.LayoutLocation(0, 1)(main output) usesBufferAction.LoadFrom, indicating it should be returned by the executor.LayoutLocation(0, 2)(uniforms) usesBufferAction.LoadToif uniforms exist.uniformStagingVkBufferis created for uniform data, populated, and added tosourceBuffersForExecutor. This buffer is destroyed after the execution.SequenceExecutoris instantiated and executed.outputVulkanBuffersis taken as the result.GMemis created by checking the type tag ofRand instantiating the corresponding concreteGMemtype (FloatMem,Vec4FloatMem). AnUnsupportedOperationExceptionis thrown for unhandled types, and the orphaned buffer is destroyed.dumpSpvToFilemethod now includes a try-catch forIOExceptionand ensurescode.rewind()is in afinallyblock, thoughrewindafterwritemight not be strictly necessary if the buffer isn't immediately reused for reading by the caller ofdumpSpvToFile. The primary use ofrewindhere is becauseshaderCodeis passed tonew Shaderafterwards.GMem
GMem.mapmethod takes afnof typeGFunction[G, H, R], whereGis a generic type parameter. IfGis notGStruct.Empty, thenGFunction[G, H, R]does not matchGFunction[GStruct.Empty, H, R], leading to the error.To fix this,
GMemshould provide twomapoverloads:G. Thismapmethod will also take theuniformStructinstance and pass it to the correspondingcontext.executeoverload.GStruct.Emptyas their uniform type. Thismapmethod will call thecontext.executeoverload that expects aGFunction[GStruct.Empty, H, R].Renderer and Animated Files (e.g., AnimatedFunctionRenderer, ImageRtRenderer, AnimationRtRenderer, Raytracing.scala)
Before:
Called fmem.map(fn) or similar, which only works if fn is a GFunction[GStruct.Empty, ...].
When fn used a custom uniform struct, this led to type errors.
After:
Now correctly call fmem.map(uniformStruct, fn) when fn uses a custom uniform struct.
The uniform struct (e.g., RaytracingIteration(i)) is explicitly created and passed to map.
This matches the correct overload and resolves type mismatches.
Example:
SequenceExecutorTest:
The test now creates a Vulkan Buffer, copies the input data into it, and passes this Buffer to sequenceExecutor.execute.
The output is read by mapping the result Vulkan Buffer and extracting the data as an array.
All Vulkan resources (Buffer, pipelines, executor, shader) are explicitly destroyed at the end of the test.