mnist.py: Loader class fails to load the data #52

Open
pengfei123xiao opened this issue Sep 21, 2019 · 1 comment

pengfei123xiao commented Sep 21, 2019

Hello, when I call transpose(get_training_data_set()) in mnist.py, the Loader class raises an error.

     24         Convert an unsigned byte to an integer
     25         '''
---> 26         return struct.unpack('B', byte)[0]
     27 
     28 

TypeError: a bytes-like object is required, not 'int'

My data was downloaded through TensorFlow:

from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets('', one_hot=True)

Any advice would be appreciated. Thanks.

Jiahui-Wu commented Oct 17, 2019

Remove the .gz suffix from the file names (see the data set description at http://yann.lecun.com/exdb/mnist/).

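I rewrote the function:
    def to_int(self, byte):
        '''
        Convert an unsigned byte to an integer
        '''
        # In Python 3, indexing a bytes object already yields an int,
        # so struct.unpack is no longer needed here.
        # return struct.unpack('B', byte)[0]
        # print(type(byte))
        return byte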
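
For anyone hitting the same TypeError, here is a minimal sketch of why the original line fails under Python 3: indexing a bytes object yields an int, while a length-1 slice is still a bytes object that struct.unpack accepts. The helper name to_int_compat is hypothetical and is not part of mnist.py.

```python
import struct


def to_int_compat(byte):
    """Return an unsigned byte as an int (hypothetical helper, not in mnist.py)."""
    if isinstance(byte, int):
        # Python 3: content[i] on a bytes object is already an int in 0..255.
        return byte
    # A length-1 bytes object (for example a slice such as content[i:i + 1]).
    return struct.unpack('B', byte)[0]


content = b'\x00\x7f\xff'
print(to_int_compat(content[2]))    # element access -> 255
print(to_int_compat(content[2:3]))  # length-1 slice -> 255
```

Returning the value unchanged, as in the rewrite above, is enough when the code only ever runs on Python 3.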

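But then the program raised an error elsewhere (answering my own question: the fix was to decompress the file directly and correct its name, where one of the "-" characters becomes "."):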
self.to_int(content[start + i * 28 + j]))
IndexError: index out of range

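I debugged it, and it looks like f.read() did not return the complete content:
    def get_file_content(self):
        '''
        Read the file content
        '''
        with open(self.path, 'rb') as f:
            content = f.read()
        # Printed 9912422; the decompressed file should be
        # 60000 * 28 * 28 + 16 = 47040016 bytes
        print(len(content))
        return content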
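
The length printed above, 9912422, is the size of the compressed train-images-idx3-ubyte.gz download, so the file being read was most likely still gzip-compressed. Under that assumption, here is a sketch of a reader that handles both compressed and decompressed files by checking for the gzip magic bytes. The name read_idx_bytes is hypothetical, and the demo uses a tiny stand-in payload instead of the real data set.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def read_idx_bytes(path):
    """Return the decompressed bytes of a file that may or may not be gzipped."""
    with open(path, 'rb') as f:
        head = f.read(2)
    opener = gzip.open if head == GZIP_MAGIC else open
    with opener(path, 'rb') as f:
        return f.read()


# Demo: write the same payload once as plain bytes and once gzipped.
payload = bytes(range(16))
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, 'sample-idx3-ubyte')
    gz_path = plain_path + '.gz'
    with open(plain_path, 'wb') as f:
        f.write(payload)
    with gzip.open(gz_path, 'wb') as f:
        f.write(payload)
    plain = read_idx_bytes(plain_path)
    unzipped = read_idx_bytes(gz_path)

print(plain == payload, unzipped == payload)
```

For the real training image file, the returned length should be 60000 * 28 * 28 + 16 = 47040016 bytes.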

Also, why does start = index * 28 * 28 + 16 add 16? Is it related to the offset? From the data set page:
THE IDX FILE FORMAT
The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is

magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)

The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....

The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
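
Applying this layout to the question about the + 16: an image file has 3 dimensions (image count, rows, columns), so its header is 4 bytes of magic number plus 3 x 4 bytes of sizes, which is 16 bytes before the pixel data starts. The sketch below parses such a header. The function name parse_idx_header is hypothetical, and the input is a synthetic two-image buffer rather than the real file.

```python
import struct


def parse_idx_header(content):
    """Return (type_code, dims, data_offset) for IDX-formatted bytes."""
    # Magic number: two zero bytes, the data type code, the dimension count.
    zero1, zero2, type_code, ndim = struct.unpack('>BBBB', content[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError('not an IDX file (is it still gzip-compressed?)')
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack('>' + 'I' * ndim, content[4:4 + 4 * ndim])
    return type_code, dims, 4 + 4 * ndim


# Synthetic stand-in: unsigned-byte data, 3 dimensions, 2 images of 28 x 28.
header = struct.pack('>BBBBIII', 0, 0, 0x08, 3, 2, 28, 28)
content = header + bytes(2 * 28 * 28)

type_code, dims, offset = parse_idx_header(content)
print(type_code, dims, offset)  # 8 (2, 28, 28) 16
```

By the same arithmetic, a label file has a single dimension, so its data starts at offset 4 + 4 = 8.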
