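The previous post covered setting up TensorRT on Windows 10. This one walks through converting a very simple super-resolution SRCNN TensorFlow model (`.tf`) into a TensorRT engine file, and then running inference with TensorRT.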

Model format conversion: `.tf -> .onnx`

Install tf2onnx and onnxruntime

pip install onnxruntime
pip install git+https://github.com/onnx/tensorflow-onnx

Conversion command (replace the SavedModel path with your own):

python -m tf2onnx.convert --saved-model ./checkpoints/yolov4.tf --output model.onnx --opset 11 --verbose
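If the conversion needs to be scripted, the same command can be assembled and launched from Python. This is only a minimal sketch: `build_convert_command` is a hypothetical helper, not part of tf2onnx, and it uses only the flags that appear in the command above.

```python
import sys


def build_convert_command(saved_model_dir, output_path, opset=11, verbose=True):
    """Assemble the tf2onnx CLI call as an argument list (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", output_path,
        "--opset", str(opset),
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_convert_command("vitsr_4x.tf", "model.onnx")
# Show the command without the interpreter path
print(" ".join(cmd[1:]))

# To actually run the conversion (requires tf2onnx to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when the model path contains spaces, which is common on Windows.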

The ONNX model was generated successfully:

(base) C:\Users\11197\Desktop\vitsr\models>python -m tf2onnx.convert --saved-model vitsr_4x.tf --output model.onnx --opset 11 --verbose
2022-05-31 11:44:25.907286: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
C:\Users\11197\Miniconda3\lib\runpy.py:127: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
2022-05-31 11:44:27,590 - WARNING - tf2onnx: ***IMPORTANT*** Installed protobuf is not cpp accelerated. Conversion will be extremely slow. See https://github.com/onnx/tensorflow-onnx/issues/1557
2022-05-31 11:44:27.592219: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-05-31 11:44:27.605153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.56GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-31 11:44:27.605279: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2022-05-31 11:44:27.612433: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-05-31 11:44:27.612553: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-05-31 11:44:27.615466: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-05-31 11:44:27.616751: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2022-05-31 11:44:27.619042: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-05-31 11:44:27.621767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-05-31 11:44:27.622415: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-05-31 11:44:27.622605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-05-31 11:44:27.623070: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-31 11:44:27.623904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.56GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-31 11:44:27.624021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-05-31 11:44:27.951984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-31 11:44:27.952142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2022-05-31 11:44:27.952264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2022-05-31 11:44:27.952483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5484 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-05-31 11:44:27,953 - WARNING - tf2onnx.tf_loader: '--tag' not specified for saved_model. Using --tag serve
2022-05-31 11:44:36,348 - INFO - tf2onnx.tf_loader: Signatures found in model: [serving_default].
2022-05-31 11:44:36,348 - WARNING - tf2onnx.tf_loader: '--signature_def' not specified, using first signature: serving_default
2022-05-31 11:44:36,348 - INFO - tf2onnx.tf_loader: Output names: ['output_1']
2022-05-31 11:44:36.633737: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-05-31 11:44:36.633977: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2022-05-31 11:44:36.635395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.56GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-31 11:44:36.635531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-05-31 11:44:36.635679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-31 11:44:36.635805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2022-05-31 11:44:36.635919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2022-05-31 11:44:36.636120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5484 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-05-31 11:44:36.699775: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 702 nodes (567), 1002 edges (867), time = 11.066ms.
  function_optimizer: function_optimizer did nothing. time = 0.291ms.

2022-05-31 11:44:37.342929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.56GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-31 11:44:37.343116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-05-31 11:44:37.343250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-31 11:44:37.343378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2022-05-31 11:44:37.343482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2022-05-31 11:44:37.343648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5484 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
WARNING:tensorflow:From C:\Users\11197\Miniconda3\lib\site-packages\tf2onnx\tf_loader.py:711: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-05-31 11:44:37,499 - WARNING - tensorflow: From C:\Users\11197\Miniconda3\lib\site-packages\tf2onnx\tf_loader.py:711: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-05-31 11:44:37.851693: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-05-31 11:44:37.851885: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2022-05-31 11:44:37.852918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.56GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-31 11:44:37.853025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-05-31 11:44:37.853114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-31 11:44:37.853190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2022-05-31 11:44:37.853276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2022-05-31 11:44:37.853433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5484 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-05-31 11:44:37.999328: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 348 nodes (-354), 538 edges (-464), time = 22.997ms.
  function_optimizer: function_optimizer did nothing. time = 0.407ms.
  constant_folding: Graph size after: 348 nodes (0), 538 edges (0), time = 4.128ms.
  function_optimizer: function_optimizer did nothing. time = 0.246ms.

2022-05-31 11:44:39,646 - INFO - tf2onnx: inputs: ['input_1:0']
2022-05-31 11:44:39,646 - INFO - tf2onnx: outputs: ['Identity:0']
2022-05-31 11:44:39,894 - INFO - tf2onnx.tfonnx: Using tensorflow=2.5.0, onnx=1.11.0, tf2onnx=1.10.0/16eb4b
2022-05-31 11:44:39,895 - INFO - tf2onnx.tfonnx: Using opset <onnx, 11>
2022-05-31 11:44:48,170 - INFO - tf2onnx.tf_utils: Computed 0 values for constant folding
2022-05-31 11:44:54,755 - VERBOSE - tf2onnx.tfonnx: Mapping TF node to ONNX node(s)
2022-05-31 11:44:54,810 - VERBOSE - tf2onnx.tfonnx: Summay Stats:
        tensorflow ops: Counter({'Const': 199, 'Mul': 26, 'AddV2': 25, 'Conv3D': 22, 'BiasAdd': 22, 'Relu': 22, 'ConcatV2': 7, 'Squeeze': 6, 'Identity': 5, 'StridedSlice': 4, 'DepthToSpace': 3, 'Split': 2, 'Softmax': 2, 'Placeholder': 1, 'NoOp': 1, 'ResizeBilinear': 1, 'Pad': 1})
        tensorflow attr: Counter({'dtype': 200, 'value': 199, 'data_format': 47, 'dilations': 22, 'padding': 22, 'strides': 22, 'N': 7, 'Tidx': 7, 'squeeze_dims': 6, 'begin_mask': 4, 'ellipsis_mask': 4, 'end_mask': 4, 'new_axis_mask': 4, 'shrink_axis_mask': 4, 'block_size': 3, 'num_split': 2, 'shape': 1, 'align_corners': 1, 'half_pixel_centers': 1})
        onnx mapped: Counter({'Const': 111, 'Mul': 26, 'AddV2': 25, 'Conv3D': 22, 'BiasAdd': 22, 'Relu': 22, 'ConcatV2': 7, 'Squeeze': 6, 'Identity': 4, 'StridedSlice': 4, 'DepthToSpace': 3, 'Split': 2, 'Softmax': 2, 'Placeholder': 1, 'ResizeBilinear': 1, 'Pad': 1})
        onnx unmapped: Counter()
2022-05-31 11:44:54,811 - INFO - tf2onnx.optimizer: Optimizing ONNX model
2022-05-31 11:44:54,811 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2022-05-31 11:44:54,969 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: Add -22 (47->25), Const +23 (128->151), Identity -3 (5->2), Reshape +45 (0->45), Transpose -44 (52->8)
2022-05-31 11:44:54,970 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2022-05-31 11:44:54,991 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2022-05-31 11:44:54,992 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2022-05-31 11:44:55,022 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: Cast -1 (1->0), Const -43 (151->108), Reshape -43 (45->2), Transpose -1 (8->7)
2022-05-31 11:44:55,023 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2022-05-31 11:44:55,039 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2022-05-31 11:44:55,039 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2022-05-31 11:44:55,056 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2022-05-31 11:44:55,056 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2022-05-31 11:44:55,075 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: Const -3 (108->105)
2022-05-31 11:44:55,076 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2022-05-31 11:44:55,092 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2022-05-31 11:44:55,093 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2022-05-31 11:44:55,109 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2022-05-31 11:44:55,109 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2022-05-31 11:44:55,127 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2022-05-31 11:44:55,127 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2022-05-31 11:44:55,143 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: Identity -2 (2->0)
2022-05-31 11:44:55,143 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2022-05-31 11:44:55,159 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2022-05-31 11:44:55,159 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2022-05-31 11:44:55,176 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2022-05-31 11:44:55,176 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2022-05-31 11:44:55,196 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
2022-05-31 11:44:55,197 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2022-05-31 11:44:55,213 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2022-05-31 11:44:55,214 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2022-05-31 11:44:55,756 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: no change
2022-05-31 11:44:55,757 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2022-05-31 11:44:55,773 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2022-05-31 11:44:55,773 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2022-05-31 11:44:55,789 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2022-05-31 11:44:55,790 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2022-05-31 11:44:55,807 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
2022-05-31 11:44:55,807 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2022-05-31 11:44:55,824 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2022-05-31 11:44:55,824 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2022-05-31 11:44:55,841 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2022-05-31 11:44:55,842 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2022-05-31 11:44:55,858 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2022-05-31 11:44:55,858 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2022-05-31 11:44:55,875 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: no change
2022-05-31 11:44:55,875 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2022-05-31 11:44:55,892 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2022-05-31 11:44:55,892 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2022-05-31 11:44:55,909 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2022-05-31 11:44:55,911 - INFO - tf2onnx.optimizer: After optimization: Add -22 (47->25), Cast -1 (1->0), Const -23 (128->105), Identity -5 (5->0), Reshape +2 (0->2), Transpose -45 (52->7)
2022-05-31 11:44:55,935 - INFO - tf2onnx:
2022-05-31 11:44:55,935 - INFO - tf2onnx: Successfully converted TensorFlow model vitsr_4x.tf to ONNX
2022-05-31 11:44:55,935 - INFO - tf2onnx: Model inputs: ['input_1']
2022-05-31 11:44:55,935 - INFO - tf2onnx: Model outputs: ['output_1']
2022-05-31 11:44:55,935 - INFO - tf2onnx: ONNX model is saved at model.onnx

Generating the engine file

Chapter 4 of the developer guide introduces the Python API and gives some basic usage:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
success = parser.parse_from_file("models/model.onnx")
for idx in range(parser.num_errors):
    print(parser.get_error(idx))
if not success:
    pass # Error handling code here
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20) # 1 MiB
serialized_engine = builder.build_serialized_network(network, config)
with open("sample.engine", "wb") as f:
    f.write(serialized_engine)

Running this program fails with the following errors:

(base) C:\Users\11197\Desktop\vitsr>python quantization.py
[05/31/2022-12:11:18] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/31/2022-12:11:18] [TRT] [E] 4: [network.cpp::nvinfer1::Network::validate::3011] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
[05/31/2022-12:11:18] [TRT] [E] 2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::619] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "C:\Users\11197\Desktop\vitsr\quantization.py", line 21, in <module>
    f.write(serialized_engine)
TypeError: a bytes-like object is required, not 'NoneType'
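The `TypeError` at the end is only a symptom: `build_serialized_network` returned `None` because the build failed, and `f.write(None)` then raised. A small guard makes the failure surface at the right place. The sketch below is one possible way to structure it; `save_engine` is a hypothetical helper, and the byte string in the usage lines merely stands in for a real serialized engine.

```python
import os
import tempfile


def save_engine(serialized_engine, path):
    """Write a serialized engine to disk, failing early if the build returned None."""
    if serialized_engine is None:
        raise RuntimeError("Engine build failed; check the TensorRT log for the cause")
    with open(path, "wb") as f:
        f.write(serialized_engine)


# Usage with stand-in data (a real call would pass the builder's return value).
demo_path = os.path.join(tempfile.mkdtemp(), "sample.engine")
save_engine(b"\x00\x01", demo_path)

# A failed build is reported as a clear error instead of a TypeError.
try:
    save_engine(None, demo_path)
except RuntimeError as err:
    failure_message = str(err)
    print(failure_message)
```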

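Following the error message and section 8.2, Optimization Profiles, of the developer guide, I added an optimization profile. These lines go after `config` is created and before `build_serialized_network` is called: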

profile = builder.create_optimization_profile()
profile.set_shape("input_1", (1, 75, 75, 3), (1, 75, 75, 3), (1, 75, 75, 3))
profile.set_shape("output_1", (1, 300, 300, 3), (1, 300, 300, 3), (1, 300, 300, 3))
config.add_optimization_profile(profile)
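An optimization profile takes three shapes per tensor: minimum, optimum and maximum. Here all three are identical, which pins the engine to one fixed size. When the ranges do differ, every dimension should satisfy min <= opt <= max. The following sketch checks that before the shapes are handed to `set_shape`; `check_profile_shapes` is a hypothetical helper, not a TensorRT function, and the 64 to 128 range is an invented example.

```python
def check_profile_shapes(min_shape, opt_shape, max_shape):
    """Validate a (min, opt, max) shape triple and return it as tuples."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min, opt and max must have the same rank")
    for axis, (lo, opt, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not lo <= opt <= hi:
            raise ValueError(
                "axis %d: expected min <= opt <= max, got %d, %d, %d" % (axis, lo, opt, hi)
            )
    return tuple(min_shape), tuple(opt_shape), tuple(max_shape)


# The fixed-size case used in this post.
fixed = check_profile_shapes((1, 75, 75, 3), (1, 75, 75, 3), (1, 75, 75, 3))

# A hypothetical dynamic range: inputs between 64x64 and 128x128.
ranged = check_profile_shapes((1, 64, 64, 3), (1, 75, 75, 3), (1, 128, 128, 3))
print(ranged)

# With a real TensorRT profile object this would be used as:
# profile.set_shape("input_1", *ranged)
```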

With that change the engine was built successfully:

(base) C:\Users\11197\Desktop\srcnn>python serialize.py
[06/01/2022-11:54:18] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/01/2022-11:54:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.5.1
[06/01/2022-11:54:19] [TRT] [W] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1

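Inference

I am not very experienced with this part, so the code is adapted from a Zhihu article: https://zhuanlan.zhihu.com/p/347172593

Note that the inputs and outputs of the engine file are as follows:

input_1 16875 <class 'numpy.float32'>
output_1 270000 <class 'numpy.float32'>

The fields are name, size and dtype. The size is the flattened element count, so the image has to be flattened before it is copied into the input buffer, and the output buffer has to be reshaped back into an image.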
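The two sizes are simply the number of elements in each tensor. A quick check, assuming the NHWC shapes from the optimization profile earlier, (1, 75, 75, 3) and (1, 300, 300, 3), and 4 bytes per float32 value:

```python
import math

input_shape = (1, 75, 75, 3)      # batch, height, width, channels
output_shape = (1, 300, 300, 3)
FLOAT32_BYTES = 4

# Element counts, matching the sizes printed for the engine bindings
input_size = math.prod(input_shape)
output_size = math.prod(output_shape)
print(input_size, output_size)  # 16875 270000

# Buffer sizes in bytes for float32 data
input_bytes = input_size * FLOAT32_BYTES
output_bytes = output_size * FLOAT32_BYTES
print(input_bytes, output_bytes)  # 67500 1080000

# Upscaling factor of the model along height and width
scale = output_shape[1] // input_shape[1]
print(scale)  # 4
```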

Core code:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

cfx = cuda.Device(0).make_context()
stream = cuda.Stream()
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)

engine_file_path = "sample.engine"
with open(engine_file_path, "rb") as f:
     engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []

for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    print(binding, size, dtype)

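    # Allocate host and device buffers
    host_mem = cuda.pagelocked_empty(size, dtype)    # host (page-locked)
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)       # device
    # Append the device buffer address to the bindings list
    bindings.append(int(cuda_mem))
    # Sort the buffers into inputs and outputs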
    if engine.binding_is_input(binding):
         host_inputs.append(host_mem)           # CPU
         cuda_inputs.append(cuda_mem)           # GPU
    else:
         host_outputs.append(host_mem)
         cuda_outputs.append(cuda_mem)

import time
from PIL import Image

for i in range(701,761):
    image = np.array(Image.open("./data/40_10_test/LR/Frame0%d.png" % i))[np.newaxis,...]

    t1 = time.time()
    # Copy the input image into the host buffer
    np.copyto(host_inputs[0], image.flatten())
    # Transfer the input data to the GPU
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    # Run inference
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    # Transfer the result back to the CPU
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    # Synchronize the stream
    stream.synchronize()
    # Fetch the result (batch_size = 1)
    output = host_outputs[0].reshape(300,300,3)
    t2 = time.time()

    print("Inference time: %.2f ms"%(1000*t2-1000*t1))

cfx.pop()

Command-line output:

(base) C:\Users\11197\Desktop\srcnn>python inference.py
[06/01/2022-11:59:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +395, GPU +0, now: CPU 6761, GPU 1332 (MiB)
[06/01/2022-11:59:25] [TRT] [I] Loaded engine size: 0 MiB
[06/01/2022-11:59:25] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[06/01/2022-11:59:25] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +34, now: CPU 0, GPU 35 (MiB)
input_1 16875 <class 'numpy.float32'>
output_1 270000 <class 'numpy.float32'>
Inference time: 1.00 ms
Inference time: 1.00 ms
Inference time: 1.00 ms
Inference time: 1.00 ms
Inference time: 1.03 ms
...

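Inference with TensorFlow-GPU used to take about 50 ms per frame. With TensorRT it takes about 1 ms, host-device copies included, which is roughly a 50x speedup!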
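The per-frame numbers above come from `time.time()` around a single call, and several of them are exactly 1.00 ms, which suggests the clock resolution may be limiting the measurement. A somewhat more robust approach, sketched here under the assumption that the work to be measured can be wrapped in a no-argument callable, is to warm up first and average over many runs with `time.perf_counter()`. `benchmark` and `fake_inference` are hypothetical names used only for illustration.

```python
import time


def benchmark(fn, warmup=5, iters=50):
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up runs are excluded from the measurement
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iters


def fake_inference():
    # Stand-in workload; a real test would copy, execute and synchronize here
    sum(range(10_000))


mean_ms = benchmark(fake_inference)
print("Mean time per call: %.3f ms" % mean_ms)

# Speedup implied by the figures reported in this post (50 ms vs. 1 ms)
speedup = 50.0 / 1.0
print("Speedup: %.0fx" % speedup)
```

For GPU work the callable should end with a stream synchronization, as the loop in this post does, so that the timer does not stop before the kernels have finished.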

References

Converting a TensorFlow model to ONNX: https://docs.microsoft.com/zh-cn/windows/ai/windows-ml/tutorials/tensorflow-convert-model
https://github.com/NVIDIA/TensorRT/issues/301
https://zhuanlan.zhihu.com/p/347172593
