14. TensorRT Development Tutorial (Chinese Edition): Common Problems in TensorRT

Source: http://www.tudoupe.com  Date: 2022-05-23

The Most Common TensorRT Problems

The following sections answer the most commonly asked questions about typical NVIDIA TensorRT use cases.

14.1. FAQs

This section helps you troubleshoot problems and answers the most frequently asked questions.

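Q: How do I create an engine that is optimized for several different batch sizes?

A: Although TensorRT allows an engine optimized for a given batch size to run at any smaller size, performance at those smaller sizes is not as well optimized. To optimize for several batch sizes, create optimization profiles whose OptProfileSelector::kOPT dimensions are set to those sizes.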
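As a rough illustration of the answer above, the sketch below adds one optimization profile per target batch size through the TensorRT Python API. The helper name `add_batch_profiles`, the input name, and the choice to give each profile a disjoint batch range are assumptions made for this example; `builder` and `config` are expected to be a `tensorrt.Builder` and an `IBuilderConfig`.

```python
def add_batch_profiles(builder, config, input_name, sample_shape, batch_sizes):
    """Add one optimization profile per target batch size.

    `sample_shape` is the per-sample shape, for example (3, 224, 224).
    Returns the (min, opt, max) batch triple used for each profile.
    """
    ranges = []
    lowest = 1
    for target in sorted(set(batch_sizes)):
        profile = builder.create_optimization_profile()
        # Arguments are the min, opt and max shapes; the builder tunes
        # kernels for the opt (kOPT) shape.
        profile.set_shape(
            input_name,
            (lowest, *sample_shape),
            (target, *sample_shape),
            (target, *sample_shape),
        )
        config.add_optimization_profile(profile)
        ranges.append((lowest, target, target))
        lowest = target + 1  # assumption: profiles cover disjoint ranges
    return ranges
```

With batch sizes 1, 8 and 32 this produces three profiles covering batches 1, 2 to 8, and 9 to 32. At run time the application still has to select the profile that matches the batch it is about to execute.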

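Q: Are engines and calibration tables portable across TensorRT versions?

A: No. Internal implementations and formats are continually optimized and can change between versions. For this reason, engines and calibration tables are not guaranteed to be binary compatible with different versions of TensorRT. Applications must build new engines and INT8 calibration tables when using a new version of TensorRT.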

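Q: How do I choose the optimal workspace size?

A: Some TensorRT algorithms require additional workspace on the GPU. IBuilderConfig::setMemoryPoolLimit() controls the maximum amount of workspace that can be allocated and prevents the builder from considering algorithms that need more. At run time, the space is allocated automatically when an IExecutionContext is created, and no more is allocated than is required, even if the amount set in IBuilderConfig::setMemoryPoolLimit() is much higher. Applications should therefore give the TensorRT builder as much workspace as they can afford; at run time, TensorRT allocates no more than this and typically less.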
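A minimal sketch of setting the workspace limit from Python, assuming the Python counterpart of the C++ call, `IBuilderConfig.set_memory_pool_limit` with `MemoryPoolType.WORKSPACE`. The helper name is hypothetical, and the `tensorrt` module is passed in as an argument only so that the snippet can be imported without TensorRT installed.

```python
def set_workspace_limit(trt, config, gibibytes):
    """Cap the builder workspace pool at `gibibytes` GiB.

    `trt` is the imported tensorrt module, `config` an IBuilderConfig.
    Returns the limit in bytes.
    """
    limit_bytes = int(gibibytes * (1 << 30))
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, limit_bytes)
    return limit_bytes
```

A typical call would be `set_workspace_limit(trt, config, 4)` after `import tensorrt as trt`; the 4 GiB figure is only an example and should reflect what the application can spare.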

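Q: How do I use TensorRT on multiple GPUs?

A: Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, call cudaSetDevice() before calling the builder or deserializing the engine. Each IExecutionContext is bound to the same GPU as the engine from which it was created. When calling execute() or enqueue(), make sure the thread is associated with the correct device by calling cudaSetDevice() if necessary.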

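Q: How do I get the TensorRT version from the library file?

A: The symbol table contains a symbol named tensorrt_version_#_#_#_#, which carries the TensorRT version number. On Linux, one way to read this symbol is with the nm command.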
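The lookup can be scripted. The sketch below runs `nm -D` on a shared library and extracts the four numbers from the version symbol; the function names and the sample library path in the note are assumptions for illustration.

```python
import re
import subprocess

# Matches the version symbol, for example tensorrt_version_8_4_1_5.
_VERSION_SYMBOL = re.compile(r"tensorrt_version_(\d+)_(\d+)_(\d+)_(\d+)")


def parse_version(nm_output):
    """Return the version as a tuple of four ints, or None if absent."""
    match = _VERSION_SYMBOL.search(nm_output)
    return tuple(int(part) for part in match.groups()) if match else None


def library_version(library_path):
    """Run `nm -D` on the library and parse the version symbol."""
    result = subprocess.run(
        ["nm", "-D", library_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_version(result.stdout)
```

For example, `library_version("/usr/lib/x86_64-linux-gnu/libnvinfer.so.8")` would return the version of that library; the path is hypothetical and depends on the installation.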

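Q: What can I do if my network produces the wrong answer?

A: There are several reasons why your network may generate incorrect results. The following troubleshooting approaches can help diagnose the problem:

Turn on VERBOSE-level messages from the log stream and check what TensorRT is reporting.
Check that your input preprocessing is generating exactly the input format required by the network.
If you are using reduced precision, run the network in FP32. If it produces the correct result, the lower precision may have insufficient dynamic range for the network.
Try marking intermediate tensors in the network as outputs, and verify that they match what you expect.

Note: Marking tensors as outputs can inhibit optimizations and can therefore change the results.

You can use Polygraphy to assist with debugging and diagnosis.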
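The suggestion to mark intermediate tensors as outputs could look roughly like the sketch below, which assumes a `tensorrt.INetworkDefinition` and selects layers by name. The helper name and the decision to mark only the first output of each layer are assumptions for this example.

```python
def mark_layers_as_outputs(network, layer_names):
    """Mark output 0 of every named layer as a network output.

    Returns the names of the tensors that were marked, in network order.
    """
    wanted = set(layer_names)
    marked = []
    for index in range(network.num_layers):
        layer = network.get_layer(index)
        if layer.name in wanted:
            tensor = layer.get_output(0)
            network.mark_output(tensor)
            marked.append(tensor.name)
    return marked
```

Because marking outputs can change which optimizations are applied, it is usually worth marking only a few layers at a time while narrowing down where the results diverge.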

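Q: How do I implement batch normalization in TensorRT?

A: TensorRT does support batch normalization; it can be implemented with a sequence of IElementWiseLayer operations.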
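To see why a short sequence of element-wise operations is enough, the sketch below folds the standard inference-time batch-normalization formula into one multiply and one add per channel. It is plain Python arithmetic, not TensorRT code, and the function names are assumptions; in a network the two steps would correspond to element-wise product and sum operations against constant tensors.

```python
import math


def fold_batch_norm(gamma, beta, mean, variance, epsilon=1e-5):
    """Fold batch-norm statistics into (scale, shift) such that
    batch_norm(x) == x * scale + shift."""
    scale = gamma / math.sqrt(variance + epsilon)
    shift = beta - mean * scale
    return scale, shift


def batch_norm(x, gamma, beta, mean, variance, epsilon=1e-5):
    """Reference inference-time batch normalization of a single value."""
    return gamma * (x - mean) / math.sqrt(variance + epsilon) + beta
```

Computing `scale` and `shift` once at build time leaves only the multiply and the add to be executed during inference.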

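Q: Why is my network slower when using DLA than without DLA?

A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and those supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Because all DLA engines are independent of the GPU and of each other, you can also use both implementations at the same time to further increase the throughput of your network.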

Q: Does TensorRT support INT4 or INT16 quantization?

A: Neither INT4 nor INT16 quantization is supported by TensorRT at this time.

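Q: When will TensorRT support the layers my network needs in the UFF parser?

A: UFF is deprecated. We recommend that users move their workflows to ONNX. The TensorRT ONNX parser is an open source project.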

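Q: Can I use multiple TensorRT builders to compile for different targets?

A: TensorRT assumes that all resources of the device it is building on are available for optimization purposes. Concurrent use of multiple TensorRT builders (for example, multiple trtexec instances) to compile for different targets (DLA0, DLA1, and GPU) can oversubscribe system resources such as the CPU and GPU, causing undefined behavior (inefficient plans, builder failure, or system instability).

It is recommended to use trtexec with the --saveEngine argument to compile for the different targets (DLA and GPU) separately and save their plan files. Those plan files can then be reloaded (using trtexec with the --loadEngine argument) to submit multiple inference jobs on the respective targets (DLA0, DLA1, GPU). This two-step process alleviates oversubscription of system resources during the build phase while allowing execution of the plan files to proceed without interference from the builder.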
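A sketch of the two-step process, driving trtexec from Python so that builds run strictly one after another. The helper names and file names are assumptions; the flags used (`--onnx`, `--saveEngine`, `--loadEngine`, `--useDLACore`, `--fp16`, `--allowGPUFallback`) are standard trtexec options, and enabling FP16 with GPU fallback for DLA builds is a choice made for this example.

```python
import subprocess


def build_command(onnx_path, plan_path, dla_core=None):
    """trtexec invocation that builds one target and saves its plan file."""
    command = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={plan_path}"]
    if dla_core is not None:
        # DLA runs in reduced precision; GPU fallback handles layers
        # that DLA cannot execute.
        command += [f"--useDLACore={dla_core}", "--fp16", "--allowGPUFallback"]
    return command


def load_command(plan_path, dla_core=None):
    """trtexec invocation that reloads a saved plan file for inference."""
    command = ["trtexec", f"--loadEngine={plan_path}"]
    if dla_core is not None:
        command.append(f"--useDLACore={dla_core}")
    return command


def build_sequentially(onnx_path, targets):
    """Build each (plan_path, dla_core) target in turn, never in parallel."""
    for plan_path, dla_core in targets:
        subprocess.run(build_command(onnx_path, plan_path, dla_core), check=True)
```

Once `build_sequentially("model.onnx", [("gpu.plan", None), ("dla0.plan", 0)])` has finished, the commands returned by `load_command` can be launched concurrently, since loading a plan does not involve the builder.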

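Q: Which layers are accelerated by Tensor Cores?

A: Most math-bound operations are accelerated with Tensor Cores: convolution, deconvolution, fully connected, and matrix multiply. In some cases, particularly with small channel counts or small group sizes, another implementation may be faster and is selected instead of a Tensor Core implementation.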

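14.2. Understanding Error Messages

If an error is encountered during execution, TensorRT displays an error message intended to help debug the problem. The following sections discuss some common error messages that developers can encounter.

UFF Parser Error Messages

The following table captures the common UFF parser error messages.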

Error Message	Description
	This error message can occur due to incorrect input dimensions. In UFF, input dimensions should always be specified with the implicit batch dimension not included in the specification.
	As indicated by the error message, the axis must be a build-time constant in order for UFF to parse the node correctly.

ONNX Parser Error Messages

The following table captures the common ONNX parser error messages. For information on support for specific ONNX nodes, refer to the operators support document.

Error Message	Description
`<X> must be an initializer!`, `!inputs.at(X).is_weights()`	These error messages signify that an ONNX node input tensor is expected to be an initializer in TensorRT. A possible fix is to run constant folding on the model using TensorRT's Polygraphy tool.
	This is an error stating that the ONNX parser does not have an import function defined for a particular operator, and did not find a corresponding plugin in the loaded registry for the operator.
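The constant-folding fix mentioned in the table can be run through Polygraphy's `surgeon sanitize` subcommand. The sketch below wraps that command in Python; the helper names and the file names in the note are assumptions for illustration.

```python
import subprocess


def fold_constants_command(model_path, output_path):
    """Polygraphy invocation that folds constants in an ONNX model."""
    return [
        "polygraphy", "surgeon", "sanitize", model_path,
        "--fold-constants",
        "--output", output_path,
    ]


def fold_constants(model_path, output_path):
    """Run the command and return the path of the folded model."""
    subprocess.run(fold_constants_command(model_path, output_path), check=True)
    return output_path
```

For example, `fold_constants("model.onnx", "model_folded.onnx")` writes a folded copy that can then be passed to the ONNX parser in place of the original model.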

TensorRT Core Library Error Messages

The following table captures the common TensorRT core library error messages.

	Error Message	Description
Installation Errors	`Cuda initialization failure with error <code>. Please check cuda installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html.`	This error message can occur if the CUDA or NVIDIA driver installation is corrupt. Refer to the URL for instructions on installing CUDA and the NVIDIA driver on your operating system.
Builder Errors		This error message occurs because there is no layer implementation for the given node in the network that can operate with the given workspace size. This usually occurs because the workspace size is insufficient but could also indicate a bug. If increasing the workspace size as suggested doesn't help, report a bug (refer to How Do I Report A Bug?).
	`<layer-name>: (kernel|bias) weights has non-zero count but null values`	This error message occurs when there is a mismatch between the values and count fields in a Weights data structure passed to the builder. If the count is `0`, then the values field must contain a null pointer; otherwise, the count must be non-zero, and values must contain a non-null pointer.
	`Builder was created on device different from current device.`	This error message can show up if you (1) created an IBuilder targeting one GPU, then (2) called `cudaSetDevice()` to target a different GPU, then (3) attempted to use the IBuilder to create an engine. Ensure you only use the `IBuilder` when targeting the GPU that was used to create the `IBuilder`.
		You can encounter error messages indicating that the tensor dimensions do not match the semantics of the given layer. Carefully read the documentation in NvInfer.h on the usage of each layer and the expected dimensions of the tensor inputs and outputs to the layer.
INT8 Calibration Errors		This warning occurs and should be treated as an error when the data distribution for a tensor is uniformly zero. In a network, the output tensor distribution can be uniformly zero under the following scenarios: a constant tensor with all zero values (not an error); an activation (ReLU) output with all negative inputs (not an error); a data distribution forced to all zero by a computation error in the previous layer (emit a warning here); the user does not provide any calibration images (emit a warning here).
		This error message indicates that a calibration failure occurred with no scaling factors detected. This could be due to no INT8 calibrator or insufficient custom scales for network layers. For more information, refer to sampleINT8 located in the `opensource/sampleINT8` directory in the GitHub repository to set up calibration correctly.
		This error message can occur if you are running TensorRT using an engine PLAN file that is incompatible with the current version of TensorRT. Ensure you use the same version of TensorRT when generating the engine and running it.
		This error message can occur if you build an engine on a device of a different compute capability than the device that is used to run the engine.
		This warning message can occur if you build an engine on a device that has the same compute capability as, but is not identical to, the device that is used to run the engine. As indicated by the warning, it is highly recommended to use a device of the same model when generating the engine and deploying it to avoid compatibility issues.
		These error messages can occur if there is insufficient GPU memory available to instantiate a given TensorRT engine. Verify that the GPU has sufficient available memory to contain the required layer weights and activation tensors.
		This error message can occur if you attempt to deserialize an engine that uses FP16 arithmetic on a GPU that does not support FP16 arithmetic. You either need to rebuild the engine without FP16 precision inference or upgrade your GPU to a model that supports FP16 precision inference.
		This error message can occur if the `initialize()` method of a given plugin layer returns a non-zero value. Refer to the implementation of that layer to debug this error further. For more information, refer to TensorRT Layers.

14.3. Code Analysis Tools

14.3.1. Compiler Sanitizers

Google sanitizers are a set of code analysis tools.

14.3.1.1. Issues With dlopen And Address Sanitizer

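There is a known, documented issue with sanitizers. When TensorRT is loaded with dlopen under a sanitizer, memory leaks are reported unless one of the following two solutions is adopted:

Do not call dlclose when running under the sanitizers.
Pass the flag RTLD_NODELETE to dlopen when running under the sanitizers.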

14.3.1.2. Issues With dlopen And Thread Sanitizer

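The thread sanitizer can list errors when dlopen is used from multiple threads. To suppress this warning, create a suppression file named tsan.supp that contains a rule matching dlopen.

To use the suppression file when running applications under the thread sanitizer, set the TSAN_OPTIONS environment variable so that its suppressions option points to tsan.supp.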
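One way to script both steps is sketched below: render the suppression file contents and prepare an environment whose `TSAN_OPTIONS` names that file. The `race:dlopen` rule is an assumed pattern in ThreadSanitizer's `race:<function>` suppression syntax, and the helper names are hypothetical.

```python
import os

# Assumption: a data-race suppression that matches frames in dlopen.
DEFAULT_RULES = ("race:dlopen",)


def render_suppressions(rules=DEFAULT_RULES):
    """Return the text of a suppression file, one rule per line."""
    return "".join(rule + "\n" for rule in rules)


def write_suppressions(path="tsan.supp", rules=DEFAULT_RULES):
    """Write the suppression file and return its path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_suppressions(rules))
    return path


def tsan_environment(suppressions_path, base_env=None):
    """Copy the environment and add suppressions=<path> to TSAN_OPTIONS."""
    env = dict(os.environ if base_env is None else base_env)
    option = f"suppressions={suppressions_path}"
    existing = env.get("TSAN_OPTIONS", "")
    env["TSAN_OPTIONS"] = f"{existing}:{option}" if existing else option
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the instrumented application.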

14.3.1.3. Issues With CUDA And Address Sanitizer

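There is a known, documented issue with CUDA applications. To run CUDA libraries such as TensorRT successfully under the address sanitizer, add the option protect_shadow_gap=0 to the ASAN_OPTIONS environment variable.

On CUDA 11.4, a known bug can trigger mismatched allocation and free errors in the address sanitizer. Add alloc_dealloc_mismatch=0 to ASAN_OPTIONS to disable these errors.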
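A small sketch of merging the two options into an existing `ASAN_OPTIONS` value without discarding settings that are already present. The helper name and the constant are assumptions for illustration; the options themselves are the ones named above.

```python
import os

# The two options discussed above for CUDA libraries such as TensorRT.
CUDA_ASAN_OPTIONS = {"protect_shadow_gap": "0", "alloc_dealloc_mismatch": "0"}


def with_asan_options(extra, base_env=None):
    """Copy the environment and merge `extra` into ASAN_OPTIONS."""
    env = dict(os.environ if base_env is None else base_env)
    options = {}
    # Existing options are colon-separated key=value pairs.
    for item in env.get("ASAN_OPTIONS", "").split(":"):
        if "=" in item:
            key, value = item.split("=", 1)
            options[key] = value
    options.update(extra)
    env["ASAN_OPTIONS"] = ":".join(f"{k}={v}" for k, v in options.items())
    return env
```

Passing `env=with_asan_options(CUDA_ASAN_OPTIONS)` to `subprocess.run` launches the instrumented binary with both options set.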

14.3.1.4. Issues With Undefined Behavior Sanitizer

The Undefined Behavior Sanitizer (UBSan) is documented to report false positives when the -fvisibility=hidden option is used. Add the -fno-sanitize=vptr option to avoid these false positives.

14.3.2. Valgrind

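Valgrind is a framework for dynamic analysis tools that can be used to automatically detect memory management and threading bugs in applications.

Some versions of valgrind and glibc are affected by a bug that causes false memory leaks to be reported when dlopen is used. This can generate spurious errors when running a TensorRT application under valgrind's memcheck tool. To work around this, add a suppression for the dlopen leak reports to a valgrind suppressions file.

On CUDA 11.4, a known bug can trigger mismatched allocation and free errors in valgrind. Add the option --show-mismatched-frees=no to the valgrind command line to suppress these errors.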
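The two workarounds can be combined in one launcher, sketched below. The suppression entry is an assumed example written in memcheck's suppression-file format (its name `dlopen_leak` and the `fun:*dlopen*` frame pattern are illustrative), and the helper name is hypothetical.

```python
# Assumed suppression entry: hide definite leaks whose stack passes
# through dlopen. "..." matches any number of frames.
DLOPEN_LEAK_SUPPRESSION = """{
   dlopen_leak
   Memcheck:Leak
   match-leak-kinds: definite
   ...
   fun:*dlopen*
   ...
}
"""


def valgrind_command(program, suppressions=None, hide_mismatched_frees=True):
    """Build a memcheck command line for `program` (a list of arguments)."""
    command = ["valgrind", "--tool=memcheck", "--leak-check=full"]
    if suppressions:
        command.append(f"--suppressions={suppressions}")
    if hide_mismatched_frees:
        command.append("--show-mismatched-frees=no")
    return command + list(program)
```

Writing `DLOPEN_LEAK_SUPPRESSION` to a file such as `dlopen.supp` and passing that path as `suppressions` applies both workarounds in a single run.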

14.3.3. Compute Sanitizer

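When running a TensorRT application under compute-sanitizer, cuGetProcAddress can fail with error code 500 because of missing functions. This error can be ignored or suppressed with the --report-api-errors no option. It arises because CUDA backward compatibility checks whether a function is usable on the current CUDA toolkit/driver combination; the functions in question were introduced in a later version of CUDA and are not available on the current platform.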
