TensorIR, RelayIR, and IRModule in TVM

Concepts of TensorIR, RelayIR, IRModule, PrimFunc, and TVM Script

  • Two levels represent the entire neural network program: the first level is the computational graph, and the second level is the tensor program. The former is called RelayIR, and the latter is called TensorIR. The textual format used in TVM to represent TensorIR is called TVM Script.
  • The tensor program contains the inputs and outputs, nested loops, and computation statements, making it a more fine-grained representation of the neural network program.
  • The difference between the neural network program and the tensor program is that the latter is framework-independent: it can represent neural network programs from PyTorch, TensorFlow, and other frameworks in a unified format.
  • The above demonstration is from the vertical perspective (see the article "TVM 自底向上(一):基本框架和概念", i.e. "TVM from the Bottom Up (1): Basic Framework and Concepts").
  • From the horizontal perspective, each graph contains multiple computation nodes (such as add, mm, ReLU, etc.), and each node corresponds to a PrimFunc in TensorIR.
  • A part of the graph containing one or more nodes is called a subgraph. Each subgraph corresponds to a combination of several PrimFuncs in TensorIR. Both the subgraph in RelayIR and the combination of PrimFuncs in TensorIR are packaged as an IRModule, which is a container holding several nodes or PrimFuncs (see the TVM Script sketch after this list).
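As a concrete illustration, here is a minimal TVM Script sketch of an IRModule containing a single PrimFunc (the function name add and the shape are arbitrary choices for this example; assumes a recent TVM version):

import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class MyModule:
    @T.prim_func
    def add(A: T.Buffer((8,), "float32"),
            B: T.Buffer((8,), "float32"),
            C: T.Buffer((8,), "float32")):
        T.func_attr({"global_symbol": "add", "tir.noalias": True})
        for i in range(8):
            with T.block("C"):  # a computation statement inside a nested loop
                vi = T.axis.spatial(8, i)
                C[vi] = A[vi] + B[vi]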

Four ways to get TensorIR

  • Write TVM Script
  • TE (Tensor Expression) — see the sketch after this list
  • Transformation
  • Load from PyTorch
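For instance, the TE path first describes the computation declaratively, then lowers it to a PrimFunc. A minimal sketch (the names A, B, C and the shape are arbitrary):

import tvm
from tvm import te

# Describe the computation declaratively with TE.
A = te.placeholder((8,), dtype="float32", name="A")
B = te.placeholder((8,), dtype="float32", name="B")
C = te.compute((8,), lambda i: A[i] + B[i], name="C")

func = te.create_prim_func([A, B, C])  # lower to a tir.PrimFunc
mod = tvm.IRModule({"main": func})     # wrap it in an IRModule
print(mod.script())                    # inspect it as TVM Script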

Schedule Compute Decomposition

  • Transformation: a schedule takes an existing TensorIR program and rewrites its loop structure (split, reorder, parallelize, etc.) without changing the computation it performs; see the sketch below.
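A minimal sketch of such a transformation, reusing the TE-built module from above (the loop factors 2 and 4 are arbitrary):

import tvm
from tvm import te

# Rebuild the module from the TE sketch above.
A = te.placeholder((8,), dtype="float32", name="A")
B = te.placeholder((8,), dtype="float32", name="B")
C = te.compute((8,), lambda i: A[i] + B[i], name="C")
mod = tvm.IRModule({"main": te.create_prim_func([A, B, C])})

sch = tvm.tir.Schedule(mod)
block = sch.get_block("C")              # the computation block named "C"
(i,) = sch.get_loops(block)
i0, i1 = sch.split(i, factors=[2, 4])   # decompose the loop: 8 -> 2 x 4
sch.reorder(i1, i0)                     # exchange the two loops
print(sch.mod.script())                 # the transformed IRModule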

Source Code

Object

  1. All IR data types inherit from a base class called Object, which defines the attributes of the class and a function called VisitAttrs.
  2. In this way, the IR data types share a unified format.
  3. This enables serialization, formatting, reflection, Python binding, and hashing; put simply, a single way to interact with these IRs.
  4. The visitor pattern is a design pattern that separates data from the operations on that data. The benefits come from two aspects: 1. When we add new operations or change existing ones, we do not have to change the data classes. 2. VisitAttrs allows us to process all the fields in a unified way.

class TensorNode : public Object {
public:
  /*! \brief The shape of the tensor */
  Array<Expr> shape;
  /*! \brief data type in the content of the tensor */
  Type dtype;
  /*! \brief the source operation, can be None */
  Operation op;
  /*! \brief the output index from source operation */
  int value_index{0};
  /*! \brief constructor */
  TensorNode() {}

  void VisitAttrs(AttrVisitor* v) final {
    v->Visit("shape", &shape);
    v->Visit("dtype", &dtype);
    v->Visit("op", &op);
    v->Visit("value_index", &value_index);
  }
};

Runtime

  • What is a runtime? After we compile the model code, we still need supporting facilities to run it, such as memory management and error handling. The program that provides this support is called the runtime.
  • The following example: 1. loads the compiled model code; 2. initializes new data; 3. looks up the needed function; 4. runs the function and prints the result.
import tvm
mod = tvm.runtime.load_module("compiled_artifact.so") # tvm.runtime.Module
arr = tvm.nd.array([1, 2, 3], device=tvm.cuda(0)) # tvm.runtime.NDArray
fun = mod["addone"] # tvm.runtime.PackedFunc
fun(arr)
print(arr.numpy())

shared_ptr

  • include/tvm/runtime/object.h defines two types, Object and ObjectRef.
  • An ObjectRef can be seen as a shared_ptr<Object>.
  • A shared_ptr has three features:
    • Automatic memory management: the Object is deleted automatically once it is no longer referenced.
    • One Object can be owned by multiple shared_ptrs.
    • Reference counting: when the reference count drops to 0, the object is deleted.
#include <iostream>
#include <memory>

int main() {
    std::shared_ptr<int> p1 = std::make_shared<int>(10); // Creates a shared_ptr to an int with value 10
    std::shared_ptr<int> p2 = p1; // Now p1 and p2 share ownership of the same int

    std::cout << *p1 << std::endl; // Outputs: 10
    std::cout << "Use count: " << p1.use_count() << std::endl; // Outputs: 2, as p1 and p2 share ownership

    p2.reset(); // p2 releases ownership of the int
    std::cout << "Use count after reset: " << p1.use_count() << std::endl; // Outputs: 1, as only p1 owns the int now
}

PackedFunc

==TODO==

  • TVM has a Foreign Function Interface (FFI) mechanism, which allows a function written in one language to be called from any other supported language.
  • What do we need to do to the function? First of all, we erase its parameter types and return type: a PackedFunc takes a list of type-erased arguments (TVMArgs) and writes a type-erased return value (TVMRetValue).
#include <tvm/runtime/packed_func.h>

using namespace tvm::runtime;

void MyAdd(TVMArgs args, TVMRetValue* rv) {
  int a = args[0];  // Access arguments with type conversion
  int b = args[1];
  *rv = a + b;  // Set the return value
}

void CallPacked() {
  PackedFunc myadd = PackedFunc(MyAdd);  // Wrap MyAdd as a PackedFunc
  int c = myadd(1, 2);  // Call PackedFunc, returns 3
}
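The same mechanism is exposed to Python. A minimal sketch of the Python side (the registered name my_py_add is a hypothetical example):

import tvm

# Register a Python function as a global PackedFunc under a chosen name;
# any other frontend (e.g. C++) can then look it up and call it.
@tvm.register_func("my_py_add")  # "my_py_add" is a hypothetical name
def my_py_add(a, b):
    return a + b

# Retrieve and call it through the same FFI path any caller would use.
f = tvm.get_global_func("my_py_add")
print(f(1, 2))  # prints 3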

Module

  • The result of compilation is a runtime.Module, which can be thought of as a hashmap<function_name, PackedFunc>.
  • The caller looks a function up in this hashmap by name, obtains the PackedFunc, sets up its arguments, and then runs it; a minimal end-to-end sketch follows.
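Putting the pieces together, here is a minimal sketch that compiles a PrimFunc into a runtime.Module and looks the function up by name (assumes an LLVM-enabled TVM build; the name addone and the shape are arbitrary):

import numpy as np
import tvm
from tvm import te

# Build a tiny function into a runtime.Module.
A = te.placeholder((4,), dtype="float32", name="A")
B = te.compute((4,), lambda i: A[i] + 1.0, name="B")
func = te.create_prim_func([A, B]).with_attr("global_symbol", "addone")
rt_mod = tvm.build(tvm.IRModule({"addone": func}), target="llvm")

# Look the PackedFunc up by name, then run it.
a = tvm.nd.array(np.arange(4, dtype="float32"))
b = tvm.nd.empty((4,), dtype="float32")
rt_mod["addone"](a, b)
print(b.numpy())  # [1. 2. 3. 4.]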