
Conversation

@TE-7000026184

No description provided.

@TE-7000026184 TE-7000026184 changed the title feat: support graph encoding backend auto-detection in wasi-nn [WIP] feat: support graph encoding backend auto-detection in wasi-nn Nov 14, 2025
@TE-7000026184
Author

TE-7000026184 commented Nov 28, 2025

In the current implementation of this PR, we assume that loading will fail if the format does not match.
However, after checking some code, it seems the result of the "load" function cannot indicate whether the file is in the correct format; for some back-ends, "load" only does some memory copying.

I am wondering whether there is an additional API for each back-end to check whether the model file is encoded in its format.
Plan B is to look into the content of the model file to be loaded. For the "tflite" and "h5" formats, the file header contains a magic number that can be used to verify the format, but "onnx" and some other formats do not have such a marker.
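
For reference, a minimal sketch of Plan B, operating on an in-memory copy of the file header. The function name guess_model_format and the returned strings are made up for illustration; only the magic values themselves are fixed by the formats (GGUF starts with "GGUF", a TFLite flatbuffer carries the "TFL3" identifier at offset 4, and h5/HDF5 starts with the 8-byte HDF5 signature):

#include <stddef.h>
#include <string.h>

/* Guess the model format from magic bytes in the header. Formats without
 * a magic number (e.g. ONNX protobuf) cannot be identified this way and
 * fall through to "unknown". */
static const char *
guess_model_format(const unsigned char *buf, size_t size)
{
    static const unsigned char hdf5_magic[8] = { 0x89, 'H', 'D', 'F',
                                                 '\r', '\n', 0x1a, '\n' };

    if (size >= 4 && memcmp(buf, "GGUF", 4) == 0)
        return "ggml";
    if (size >= 8 && memcmp(buf + 4, "TFL3", 4) == 0)
        return "tensorflowlite";
    if (size >= 8 && memcmp(buf, hdf5_magic, 8) == 0)
        return "keras-h5";
    return "unknown";
}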

@TE-7000026184
Author

How to check the model file format:

openvino:
The "load" function checks the "builder" count, which should be two: one xml file and one binary weight file. Partially reliable.

onnx:
The "load" function checks the status of ctx->ort_api->CreateSessionFromArray. Reliable.

tensorflow:
Is it really supported? It should be a directory containing a pb or pbtxt file.

pytorch:
Is it really supported? It should be a package.

tensorflowlite:
FlatBuffer identifier "TFL3" (0x54 0x46 0x4C 0x33) at bytes 5-8, i.e. file offset 4. Need to add a check.

ggml:
Magic "GGUF" (0x47 0x47 0x55 0x46) at bytes 1-4, i.e. file offset 0. Need to add a check. But llama does not support the "load" function?

__attribute__((visibility("default"))) wasi_nn_error
load(void *ctx, graph_builder_array *builder, graph_encoding encoding,
     execution_target target, graph *g)
{
    return unsupported_operation;
}
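
To illustrate the assumption from the first comment (auto-detection simply tries each back-end and relies on a mismatched "load" failing), a rough sketch of such a fallback loop follows. The names backend_entry, g_backends, and load_autodetect are hypothetical; the wasi-nn types and error codes are assumed to come from wasi_nn_types.h. As noted above, this only works if every back-end's "load" actually validates the format instead of just copying memory.

#include <stddef.h>
#include "wasi_nn_types.h" /* wasi_nn_error, graph_builder_array, ... */

/* Hypothetical per-back-end dispatch entry, filled in at registration. */
typedef struct {
    graph_encoding encoding;
    wasi_nn_error (*load)(void *ctx, graph_builder_array *builder,
                          graph_encoding encoding, execution_target target,
                          graph *g);
    void *ctx;
} backend_entry;

static backend_entry g_backends[8];

static wasi_nn_error
load_autodetect(graph_builder_array *builder, execution_target target,
                graph *g)
{
    for (size_t i = 0; i < sizeof(g_backends) / sizeof(g_backends[0]); i++) {
        backend_entry *b = &g_backends[i];
        if (!b->load)
            continue;
        wasi_nn_error err = b->load(b->ctx, builder, b->encoding, target, g);
        if (err == success)
            return success;
        /* Keep trying only when the back-end merely rejected the format. */
        if (err != invalid_encoding && err != unsupported_operation)
            return err;
    }
    return invalid_encoding;
}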

@TE-7000026184 TE-7000026184 force-pushed the feature/backend-auto-detection branch from 9e94c9d to 44d2094 on December 2, 2025 08:31
@TE-7000026184 TE-7000026184 force-pushed the feature/backend-auto-detection branch from 4e8212d to 08a1d08 on December 2, 2025 08:35
@yamt
Collaborator

yamt commented Dec 2, 2025

i guess you should explain the motivation of auto-detection.

@TE-7000026184
Author

i guess you should explain the motivation of auto-detection.

The motivation is to add an auto-detect option for the encoding when loading a model file, for the case where a binary model file without an extension is provided, or where different types of model files are used by the application user.

res = ensure_backend(instance, autodetect, wasi_nn_ctx);
if (res != success)
    goto fail;
if (ends_with(nul_terminated_name, TFLITE_MODEL_FILE_EXT)) {
Collaborator

name here is not necessarily a filename.
see #4331

    NN_ERR_PRINTF("Model too small to be a valid TFLite file.");
    return invalid_argument;
}
if (memcmp(tfl_ctx->models[*g].model_pointer + 4, "TFL3", 4) != 0) {
Collaborator

model_pointer is not even allocated at this point.
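
One way to address this (a sketch only, assuming the graph_builder layout with buf/size fields from wasi_nn_types.h and that the flatbuffer is passed as the first builder entry) is to run the magic check on the caller-provided buffer before model_pointer is allocated:

/* Validate the TFLite flatbuffer identifier on the incoming buffer
 * before allocating model_pointer and copying the data into it. */
graph_builder *gb = &builder->buf[0];
if (gb->size < 8 || memcmp((const char *)gb->buf + 4, "TFL3", 4) != 0) {
    NN_ERR_PRINTF("Buffer does not look like a TFLite flatbuffer.");
    return invalid_encoding;
}
/* ...only now allocate model_pointer and copy gb->buf into it... */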

@yamt
Collaborator

yamt commented Dec 5, 2025

i guess you should explain the motivation of auto-detection.

The motivation is to add an auto-detect option for the encoding when loading a model file, for the case where a binary model file without an extension is provided, or where different types of model files are used by the application user.

i feel it's too much for a wasm runtime to implement file format detection.
although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO.
after all, the user should know the format for sure. what's the point of making a guess?

@ayakoakasaka
Contributor

ayakoakasaka commented Dec 5, 2025

i feel it's too much for a wasm runtime to implement file format detection.
although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO.
after all, the user should know the format for sure. what's the point of making a guess?

I believe this is needed precisely because of wasi-nn's design goal.

“Another design goal is to make the API framework- and model-agnostic; this allows for implementing the API with multiple ML frameworks and model formats. The load method will return an error message when an unsupported model encoding scheme is passed in. This approach is similar to how a browser deals with image or video encoding.”

In other words, wasi-nn is intentionally trying to be model-agnostic, which is why the API does not allow specifying the backend either in load or load_by_name.
Because of this, the current behavior requires us to recompile the runtime depending on the model we want to target.

So I think the intention is not to “guess” the format, but to stay consistent with the model-agnostic design and wasi-nn interface, where the runtime determines whether the provided model is supported or not.
