< Tensorflow > Data flow in Tensorflow

Using a data queue to read data (deprecated; no longer supported in newer TensorFlow versions)

If you store your data on a traditional hard disk, I suggest reading it through the TFRecord format. However, if your data is stored on a solid-state drive, I suggest using a FIFOQueue to read images directly by their file paths.
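Either way, you first need the image paths and labels in memory. A minimal pure-Python sketch, assuming a hypothetical list file in which each line is `<image_path> <label>`:

```python
def load_list_file(list_path):
    """Parse a list file where each line is '<image_path> <label>'."""
    image_paths, labels = [], []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # split on the last space so paths containing spaces still work
            path, label = line.rsplit(' ', 1)
            image_paths.append(path)
            labels.append(int(label))
    return image_paths, labels
```

The resulting arrays are what get fed into the queue (or written into the list file consumed by tf.data below).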
First, set up a FIFOQueue:

from tensorflow.python.ops import data_flow_ops

input_queue = data_flow_ops.FIFOQueue(capacity=3000000,
                                      dtypes=[tf.string, tf.int64],
                                      shapes=[(1,), (1,)],
                                      shared_name=None, name=None)

Each element of the queue above holds two values: the first is the image path, and the second is the image label.
To load image paths and labels into the FIFOQueue, use an enqueue op (with image_paths_placeholder and labels_placeholder defined as tf.placeholder tensors matching the queue's dtypes and shapes):

enqueue_op = input_queue.enqueue_many([image_paths_placeholder, labels_placeholder], name='enqueue_op')

Next, open a session and feed your image paths and labels into the respective placeholders:

sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array})
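Note that enqueue_many splits the fed arrays along the first dimension and pushes one (path, label) pair per row. In plain Python terms (a conceptual sketch, not TF code):

```python
def enqueue_many(q, image_paths, labels):
    """Append one (path, label) pair per row, mirroring FIFOQueue.enqueue_many."""
    assert len(image_paths) == len(labels)
    for path, label in zip(image_paths, labels):
        q.append((path, label))
    return q

q = enqueue_many([], ['a.bmp', 'b.bmp'], [0, 1])
# q now holds one tuple per input row
```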

During training, dequeue from the FIFOQueue and use multiple threads to accelerate preprocessing.

images_and_labels_list = []
for _ in range(preprocess_threads):
    filenames, label = input_queue.dequeue()
    images = []
    for filename, single_label in zip(tf.unstack(filenames), tf.unstack(label)):
        file_contents = tf.read_file(filename)
        image = tf.image.decode_bmp(file_contents, channels=3)
        image = tf.reshape(image, original_img_shape)
        images.append(image)
    images_and_labels_list.append([images, label])

Finally, to create batches of examples, use the tf.train.batch_join API:

img_batch, label_batch = tf.train.batch_join(
    images_and_labels_list, batch_size=batch_size,
    shapes=[[to_height, to_width, channels], ()], enqueue_many=True,
    capacity=4 * preprocess_threads * 100,
    allow_smaller_final_batch=True)
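The whole queue pipeline above is essentially a multi-threaded producer/consumer pattern. A TF-free sketch of the same idea using Python's own queue module (names and the fake "decode" step are purely illustrative, not TF APIs):

```python
import queue
import threading

def run_pipeline(samples, num_threads=4, batch_size=2):
    """Mimic FIFOQueue -> multi-threaded dequeue/preprocess -> batch_join."""
    input_q = queue.Queue()    # plays the role of the FIFOQueue
    output_q = queue.Queue()   # holds preprocessed examples

    for s in samples:          # the enqueue_many step
        input_q.put(s)

    def worker():
        while True:
            try:
                path, label = input_q.get_nowait()   # dequeue
            except queue.Empty:
                return
            image = 'decoded(%s)' % path             # stand-in for read_file/decode_bmp
            output_q.put((image, label))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # batch_join: group preprocessed examples into fixed-size batches,
    # allowing a smaller final batch
    examples = []
    while not output_q.empty():
        examples.append(output_q.get())
    return [examples[i:i + batch_size]
            for i in range(0, len(examples), batch_size)]
```

This is only a model of the data flow; in TF1 the actual threads are managed by tf.train.start_queue_runners and a tf.train.Coordinator.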

Using tf.data API

TensorFlow officially recommends the tf.data API for data processing, and it is relatively easy to code with.

def create_dataset(file_root, batch_size, num_epochs):
    def _parse_data(line):
        # assume each line in the list file is "<image_path> <label>"
        parts = tf.string_split([line], ' ').values
        file_contents = tf.read_file(parts[0])
        single_label = tf.string_to_number(parts[1], out_type=tf.int64)
        image = tf.image.decode_bmp(file_contents, channels=3)
        image = tf.reshape(image, original_img_shape)
        return image, single_label

    dataset = tf.data.TextLineDataset([file_root])

    dataset = dataset.map(map_func=_parse_data, num_parallel_calls=4)
    dataset = dataset.shuffle(buffer_size=batch_size * 3)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.prefetch(2000)
    data_iterator = dataset.make_one_shot_iterator()

    return data_iterator
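The chained map / shuffle / batch / repeat calls can be mimicked with plain Python generators. This is only a conceptual sketch of what the pipeline does, not how tf.data is implemented:

```python
import itertools
import random

def pipeline(lines, parse_fn, batch_size, num_epochs, shuffle_buffer, seed=0):
    """Rough Python analogue of map -> shuffle -> batch -> repeat."""
    rng = random.Random(seed)
    for _ in range(num_epochs):                      # dataset.repeat(num_epochs)
        parsed = (parse_fn(l) for l in lines)        # dataset.map(parse_fn)

        buf = []
        def shuffled(it):
            # dataset.shuffle(shuffle_buffer): keep a bounded buffer and
            # emit random picks from it
            for x in it:
                buf.append(x)
                if len(buf) >= shuffle_buffer:
                    yield buf.pop(rng.randrange(len(buf)))
            while buf:
                yield buf.pop(rng.randrange(len(buf)))

        it = shuffled(parsed)
        while True:                                  # dataset.batch(batch_size)
            batch = list(itertools.islice(it, batch_size))
            if not batch:
                break
            yield batch
```

Because map is applied lazily per element, tf.data can parallelize it (num_parallel_calls) and overlap it with training via prefetch.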

To accelerate the reading process, do not forget to use dataset.prefetch and the num_parallel_calls argument of dataset.map.


https://zhengtq.github.io/2019/01/09/tf-read-data/

Author: Billy
Posted on 2019-01-09, updated on 2021-03-13
