Except for the CAD models, which are in STL format, all files in the dataset are stored in binary formats. Details are as follows.
Each point cloud is stored in a binary file with a .points extension. The file content is an array of 8+3*N little-endian single-precision real numbers. The first 8 numbers specify the pose of the object (i.e. a combination of scale, translation, and rotation), in the following format:
where s is the scale, t = (x,y,z) is the translation, and q = (qw,qx,qy,qz) is a unit quaternion that represents a 3D rotation. This pose uniquely specifies a similarity transformation matrix M = [s R(q), t; 0, 1], where R(q) is the 3x3 rotation matrix of q. M takes as input the coordinates of a point in the global 3D space and returns as output its coordinates in a local coordinate system associated with the object. The local coordinate system originates at the center of the object, one unit length in the local system is equivalent to the object's scale, and the orientation of the local system corresponds to the orientation of the object.
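As an illustration, the matrix M = [s R(q), t; 0, 1] can be assembled from a pose as in the sketch below. It assumes the usual Hamilton convention for converting a unit quaternion (qw,qx,qy,qz) into a rotation matrix, which this document does not state explicitly. The function names are hypothetical.

```python
import math

def quat_to_rotation(qw, qx, qy, qz):
    """Return the 3x3 rotation matrix R(q) of a quaternion (Hamilton convention, assumed)."""
    # Normalize defensively in case the stored quaternion is only approximately unit length.
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]

def similarity_matrix(s, t, q):
    """Build the 4x4 matrix M = [s R(q), t; 0, 1] from scale s, translation t, quaternion q."""
    R = quat_to_rotation(*q)
    M = [[s * R[i][j] for j in range(3)] + [t[i]] for i in range(3)]
    M.append([0.0, 0.0, 0.0, 1.0])
    return M

def transform_point(M, p):
    """Apply M to a 3D point p given as (x, y, z)."""
    x, y, z = p
    return tuple(M[i][0] * x + M[i][1] * y + M[i][2] * z + M[i][3] for i in range(3))
```

For example, a pose with s = 2, t = (1,0,0) and a quarter turn about the z axis sends the point (1,0,0) to approximately (1,2,0).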
The next 3*N numbers specify the N points of the point cloud in 3D space, in the following order:
Note that you can derive N as N = (file_size - 32)/12.
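A minimal sketch of reading a .points file with the Python standard library follows. It assumes that the three coordinates of each point are stored consecutively (x, y, z of the first point, then of the second, and so on). The 8 pose numbers are returned in file order without interpretation. The names parse_points and read_points are hypothetical.

```python
import struct

POSE_FLOATS = 8      # number of pose values at the start of the file
BYTES_PER_FLOAT = 4  # single precision

def parse_points(data):
    """Split the raw bytes of a .points file into (pose, points).

    pose is the tuple of the first 8 floats, in file order.
    points is a list of N (x, y, z) tuples (interleaved layout assumed).
    """
    header_bytes = POSE_FLOATS * BYTES_PER_FLOAT          # 32
    point_bytes = 3 * BYTES_PER_FLOAT                     # 12
    if len(data) < header_bytes or (len(data) - header_bytes) % point_bytes != 0:
        raise ValueError("unexpected .points size: %d bytes" % len(data))
    n = (len(data) - header_bytes) // point_bytes         # N = (file_size - 32) / 12
    values = struct.unpack("<%df" % (POSE_FLOATS + 3 * n), data)  # "<" = little-endian
    pose = values[:POSE_FLOATS]
    points = [values[POSE_FLOATS + 3 * i: POSE_FLOATS + 3 * i + 3] for i in range(n)]
    return pose, points

def read_points(path):
    """Read and parse a .points file from disk."""
    with open(path, "rb") as f:
        return parse_points(f.read())
```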
All features extracted from a point cloud are stored in a single binary file with a .feats extension. The file content is an array of N features, each of which is 156 bytes in size. As before, you can find N by dividing the file size by 156.
Each feature is represented by an array of 39 single-precision real numbers of the following format:
where (s,t=(x,y,z),q=(qw,qx,qy,qz)) specifies a similarity transformation matrix M = [s R(q), t; 0, 1] that takes as input a point in a local coordinate system associated with the feature and returns as output the coordinates of the point in the local coordinate system associated with the object. Here, the local coordinate system associated with the feature originates at the feature's center location, one unit length in the system is equivalent to the scale of the feature, and the system's orientation corresponds to the orientation of the feature. The vector (d1,d2,...,d31) is the feature's descriptor.
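The sketch below splits a .feats file into its N records of 39 numbers each. It assumes the same little-endian byte order as .points files, and that the 8 pose numbers come before the 31 descriptor values within each record. Neither assumption is confirmed by the text above, and parse_feats is a hypothetical name.

```python
import struct

FLOATS_PER_FEATURE = 39
BYTES_PER_FEATURE = 4 * FLOATS_PER_FEATURE  # 156 bytes
POSE_FLOATS = 8

def parse_feats(data):
    """Return a list of (pose, descriptor) pairs from the raw bytes of a .feats file.

    pose holds the first 8 floats of a record and descriptor the remaining 31
    (pose-first layout and little-endian byte order are assumptions).
    """
    if len(data) % BYTES_PER_FEATURE != 0:
        raise ValueError("file size is not a multiple of 156 bytes")
    features = []
    for record in struct.iter_unpack("<%df" % FLOATS_PER_FEATURE, data):
        features.append((record[:POSE_FLOATS], record[POSE_FLOATS:]))
    return features
```

The number of features is then simply the length of the returned list, which equals the file size divided by 156.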
All votes obtained by matching features extracted from a test point cloud with features in the dataset are stored in a single binary file with a .votes extension. This time, the file content is an array of N+1 votes, where the first vote contains the ground truth information of the object, and the remaining N votes are the votes generated by matching the features.
Each vote is represented by an array of 12 single-precision real numbers of the following format:
Here, weight is the weight of the vote, classId is the index of the predicted object class (more details below), and (s,t=(x,y,z),q=(qw,qx,qy,qz)) specifies a similarity transformation matrix M = [s R(q), t; 0, 1] that takes as input a point in the local coordinate system associated with the object and returns as output the coordinates of the point in the global 3D space. The pair (testFeatureId,trainingFeatureId) holds the indices of the test feature and the training feature, respectively, that were matched and from which the vote was cast.
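Since each vote occupies 12 single-precision numbers (48 bytes), a .votes file holding N+1 votes has N = file_size/48 - 1. The sketch below separates the ground-truth record from the N matched votes. It assumes little-endian byte order as in .points files and returns each record as a raw tuple of 12 floats in file order, without assigning field names. The name parse_votes is hypothetical.

```python
import struct

FLOATS_PER_VOTE = 12
BYTES_PER_VOTE = 4 * FLOATS_PER_VOTE  # 48 bytes

def parse_votes(data):
    """Return (ground_truth, votes) from the raw bytes of a .votes file.

    ground_truth is the first 12-float record; votes is the list of the
    remaining N records. Little-endian byte order is an assumption.
    """
    if len(data) < BYTES_PER_VOTE or len(data) % BYTES_PER_VOTE != 0:
        raise ValueError("file must hold at least one 48-byte vote record")
    records = list(struct.iter_unpack("<%df" % FLOATS_PER_VOTE, data))
    return records[0], records[1:]
```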
Both the features in the training dataset and the features extracted from a test scene are indexed from zero. The details of indexing the training features are as follows. Object classes are indexed according to the following table:
There are 20 instances per object class, namely "001", "002", ..., "020". These instances are indexed from zero in ascending order of their names. Hence, a training feature is uniquely specified by the tuple (objectId, instanceId, localFeatureId), where objectId is the index of the training object class, instanceId is the instance index within the class, and localFeatureId is the zero-based index of the training feature within the set of all training features extracted from the instance. The set of all tuples corresponding to all the features in the training dataset is then sorted in ascending order of objectId, then instanceId, then localFeatureId. The value of trainingFeatureId of each vote is the zero-based index of the tuple corresponding to the matching training feature in this sorted list.
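The mapping from trainingFeatureId back to (objectId, instanceId, localFeatureId) can be sketched with cumulative feature counts, as below. The input feature_counts is a hypothetical nested list in which feature_counts[objectId][instanceId] is the number of features extracted from that instance (for example, the size of its .feats file divided by 156). It is not something the dataset provides directly.

```python
from bisect import bisect_right
from itertools import accumulate

INSTANCES_PER_CLASS = 20

def build_offsets(feature_counts):
    """Return cumulative start indices for every instance, in (objectId, instanceId) order.

    offsets[k] is the trainingFeatureId of the first feature of the k-th instance;
    offsets[-1] is the total number of training features.
    """
    flat = [count for per_class in feature_counts for count in per_class]
    return [0] + list(accumulate(flat))

def decode_training_feature_id(offsets, training_feature_id,
                               instances_per_class=INSTANCES_PER_CLASS):
    """Map a global trainingFeatureId to (objectId, instanceId, localFeatureId)."""
    if not 0 <= training_feature_id < offsets[-1]:
        raise IndexError("trainingFeatureId out of range")
    # Find the last instance whose start offset is <= the id; instances with
    # zero features are skipped automatically by bisect_right.
    k = bisect_right(offsets, training_feature_id) - 1
    object_id, instance_id = divmod(k, instances_per_class)
    return object_id, instance_id, training_feature_id - offsets[k]
```

Building the offsets once and using a binary search keeps each lookup logarithmic in the number of instances, which is convenient when decoding many votes.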
Some objects in our dataset are rotationally symmetric: 7 out of 11, namely piston2, flange1, knob1, bracket1, block1, cog1 and cog2. Thus, when comparing two poses, one has to take into account the rotational symmetry of the object to obtain a fair comparison. We have precomputed all possible symmetric rotations for each object (i.e. rotations by which the object appears unchanged) and stored them as similarity transformations in binary files with a .rotsym extension. Since a similarity transformation is uniquely identified by a pose, we use the same format as for object poses and feature poses. In other words, each .rotsym file contains a few similarity transformations, each of which is specified by 8 single-precision real numbers as follows:
where s is the scale, (x,y,z) is the translation, and (qw,qx,qy,qz) is a unit quaternion that represents the 3D rotation. Currently, all these transformations have unit scale and zero translation. However, this may change in the future when we add more complicated objects. Note that the identity similarity transformation (x,y,z,s,qx,qy,qz,qw)=(0,0,0,1,0,0,0,1) is always included in each file.
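One way to account for the symmetry when comparing the orientations of two poses is to take the smallest rotation angle over all symmetric rotations, as in the sketch below. It is an illustration rather than the evaluation procedure used with this dataset. It assumes quaternions are given as (qw,qx,qy,qz) tuples, that each symmetric rotation acts in the object's local frame (so it is multiplied on the right of the reference rotation), and that scale and translation are compared separately. All function names are hypothetical.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (qw, qx, qy, qz)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotation_angle(a, b):
    """Angle in radians of the rotation taking unit quaternion a to unit quaternion b."""
    # abs() treats q and -q as the same rotation; min() guards against rounding above 1.
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return 2.0 * math.acos(min(1.0, dot))

def symmetric_rotation_error(q_est, q_ref, symmetries):
    """Smallest angle between q_est and any symmetric equivalent of q_ref.

    symmetries is the list of quaternions read from a .rotsym file and is
    expected to include the identity (right-multiplication is an assumption).
    """
    return min(rotation_angle(q_est, quat_multiply(q_ref, s)) for s in symmetries)
```

For an object with a two-fold symmetry about its z axis, an estimate that differs from the reference by a half turn about z then yields an error of zero instead of pi.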