tf.data.experimental.make_csv_dataset parameters explained

Source: http://www.tudoupe.com  Date: 2022-03-17

Official documentation


Reads CSV files into a dataset, where each element of the dataset is a (features, labels) tuple that corresponds to a batch of CSV rows. The features dictionary maps feature column names to Tensors containing the corresponding feature data, and labels is a Tensor containing the batch’s label data.
By default, the first rows of the CSV files are expected to be headers listing the column names. If the first rows are not headers, set header=False and provide the column names with the column_names argument.
By default, the dataset is repeated indefinitely, reshuffling the order each time. This behavior can be modified by setting the num_epochs and shuffle arguments.
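Here is a minimal sketch of the default behavior described above. It assumes TensorFlow 2.x is installed. The file `train.csv` and its columns are made up for illustration. `num_epochs=1` and `shuffle=False` override the defaults so the output is finite and stays in file order.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical training data: two feature columns and a label column.
csv_text = "age,fare,survived\n22,7.25,0\n38,71.28,1\n26,7.92,1\n35,53.10,1\n"
path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(path, "w") as f:
    f.write(csv_text)

dataset = tf.data.experimental.make_csv_dataset(
    path,
    batch_size=2,
    label_name="survived",  # returned separately as the labels tensor
    num_epochs=1,           # one pass instead of repeating forever
    shuffle=False,          # keep file order so the output is predictable
)

# Column names come from the header row; types are inferred from the data.
for features, labels in dataset.take(1):
    for name, values in features.items():
        print(name, values.numpy().tolist())
    print("labels", labels.numpy().tolist())
```

Each element is a `(features, labels)` pair:

- `features` is an ordered dictionary keyed by the header names, minus the label column.
- `labels` holds the batch's `survived` values.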

This matters if your model never seems to get past the first epoch: check whether this endless repetition is the cause.
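The following sketch contrasts the default `num_epochs=None` with `num_epochs=1`. It again assumes TensorFlow 2.x, and `rows.csv` is a made-up file. Shuffling is turned off only to keep the run quick and deterministic. The first count stops only because `take()` caps the number of batches pulled.

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "rows.csv")
with open(path, "w") as f:
    f.write("x,y\n1,0\n2,1\n3,0\n4,1\n")  # 4 data rows

# Default num_epochs=None: the dataset cycles through the file forever.
endless = tf.data.experimental.make_csv_dataset(
    path, batch_size=2, label_name="y", shuffle=False)

# take(10) asks for more batches than one pass over 4 rows could give (2),
# and still gets all 10 because the data keeps repeating.
n_endless = sum(1 for _ in endless.take(10))

# num_epochs=1: a single pass, so iteration ends after 4 / 2 = 2 batches.
one_pass = tf.data.experimental.make_csv_dataset(
    path, batch_size=2, label_name="y", num_epochs=1, shuffle=False)
n_one_pass = sum(1 for _ in one_pass)

print(n_endless, n_one_pass)  # 10 2
```

With the default repeating dataset, Keras needs an explicit `steps_per_epoch` in `Model.fit()` to know where an epoch ends. Alternatively, set `num_epochs` to a finite value.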

Args

`file_pattern`	List of files or patterns of file paths containing CSV records. See `tf.io.gfile.glob` for pattern rules.
`batch_size`	An int representing the number of records to combine in a single batch.
`column_names`	An optional list of strings that corresponds to the CSV columns, in order. One per column of the input record. If this is not provided, infers the column names from the first row of the records. These names will be the keys of the features dict of each dataset element.
`column_defaults`	An optional list of default values for the CSV fields. One item per selected column of the input record. Each item in the list is either a valid CSV dtype (float32, float64, int32, int64, or string), or a `Tensor` with one of the aforementioned types. The tensor can either be a scalar default value (if the column is optional), or an empty tensor (if the column is required). If a dtype is provided instead of a tensor, the column is also treated as required. If this list is not provided, tries to infer types based on reading the first `num_rows_for_inference` rows of files specified, and assumes all columns are optional, defaulting to `0` for numeric values and `""` for string values. If both this and `select_columns` are specified, these must have the same lengths, and `column_defaults` is assumed to be sorted in order of increasing column index.
`label_name`	An optional string corresponding to the label column. If provided, the data for this column is returned as a separate `Tensor` from the features dictionary, so that the dataset complies with the format expected by a `tf.estimator.Estimator.train` or `tf.estimator.Estimator.evaluate` input function.
`select_columns`	An optional list of integer indices or string column names that specifies a subset of columns of CSV data to select. If column names are provided, these must correspond to names provided in `column_names` or inferred from the file header lines. When this argument is specified, only a subset of CSV columns will be parsed and returned, corresponding to the columns specified. Using this results in faster parsing and lower memory usage. If both this and `column_defaults` are specified, these must have the same lengths, and `column_defaults` is assumed to be sorted in order of increasing column index.
`field_delim`	An optional `string`. Defaults to `","`. Char delimiter to separate fields in a record.
`use_quote_delim`	An optional bool. Defaults to `True`. If false, treats double quotation marks as regular characters inside of the string fields.
`na_value`	Additional string to recognize as NA/NaN.
`header`	A bool that indicates whether the first rows of provided CSV files correspond to header lines with column names, and should not be included in the data.
`num_epochs`	An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
`shuffle`	A bool that indicates whether the input should be shuffled.
`shuffle_buffer_size`	Buffer size to use for shuffling. A large buffer size ensures better shuffling, but increases memory usage and startup time.
`shuffle_seed`	Randomization seed to use for shuffling.
`prefetch_buffer_size`	An int specifying the number of feature batches to prefetch for performance improvement. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
`num_parallel_reads`	Number of threads used to read CSV records from files. If >1, the results will be interleaved. Defaults to `1`.
`sloppy`	If `True`, reading performance will be improved at the cost of non-deterministic ordering. If `False`, the order of elements produced is deterministic prior to shuffling (elements are still randomized if `shuffle=True`; note that if the seed is set, the order of elements after shuffling is deterministic). Defaults to `False`.
`num_rows_for_inference`	Number of rows of a file to use for type inference if `column_defaults` is not provided. If None, reads all the rows of all the files. Defaults to 100.
`compression_type`	(Optional.) A `tf.string` scalar evaluating to one of `""` (no compression), `"ZLIB"`, or `"GZIP"`. Defaults to no compression.
`ignore_errors`	(Optional.) If `True`, ignores errors with CSV file parsing, such as malformed data or empty lines, and moves on to the next valid CSV record. Otherwise, the dataset raises an error and stops processing when encountering any invalid records. Defaults to `False`.
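This sketch combines several of the arguments above on a headerless, semicolon-delimited file. The data is hypothetical and TensorFlow 2.x is assumed. It follows the `column_defaults` convention from the table:

- A dtype marks a required column.
- A one-element tensor supplies the default used when a field is empty.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical headerless file; bob's score field is empty.
path = os.path.join(tempfile.mkdtemp(), "no_header.csv")
with open(path, "w") as f:
    f.write("1;alice;3.5;1\n2;bob;;0\n3;carol;4.0;1\n")

dataset = tf.data.experimental.make_csv_dataset(
    path,
    batch_size=3,
    column_names=["id", "name", "score", "label"],  # no header row to read names from
    header=False,
    field_delim=";",
    select_columns=["name", "score", "label"],      # "id" is never parsed
    # One default per selected column, in increasing column-index order:
    # name and label are required (dtype only); score falls back to -1.0 if empty.
    column_defaults=[tf.string, tf.constant([-1.0], dtype=tf.float32), tf.int32],
    label_name="label",
    num_epochs=1,
    shuffle=False,
)

features, labels = next(iter(dataset))
print(features["name"].numpy().tolist())   # [b'alice', b'bob', b'carol']
print(features["score"].numpy().tolist())  # [3.5, -1.0, 4.0]
print(labels.numpy().tolist())             # [1, 0, 1]
```

`select_columns` drops `id`, so `column_defaults` has three entries rather than four.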
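Reading a gzip-compressed file only requires setting `compression_type`. This sketch uses a hypothetical file and assumes TensorFlow 2.x. Column names and defaults are passed explicitly, so no header or type inference is involved.

```python
import gzip
import os
import tempfile

import tensorflow as tf

# Write a small gzip-compressed CSV file (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n")

dataset = tf.data.experimental.make_csv_dataset(
    path,
    batch_size=2,
    column_names=["a", "b"],               # header=True (default) still skips line 1
    column_defaults=[tf.int32, tf.int32],  # both columns required, parsed as int32
    compression_type="GZIP",               # decompress while reading
    num_epochs=1,
    shuffle=False,
)

for batch in dataset:
    # No label_name, so each element is just the features dictionary.
    print(batch["a"].numpy().tolist(), batch["b"].numpy().tolist())  # [1, 3] [2, 4]
```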
