Cannot load int variable from previous session in tensorflow 1.1
I have read many similar questions and just cannot get this to work properly.
My model trains well and checkpoint files are written every epoch. I want the program to be able to continue from epoch x once reloaded, and to print which epoch it is on with every iteration. I could simply save the data outside of the checkpoint file, but I also wanted to do this to give me confidence that everything else is being stored properly.
Unfortunately, the value in the epoch/global_step variable is always still 0 when I restart.
```python
import tensorflow as tf
import numpy as np
import re
import os
import sys
from glob import glob
# more imports

def extract_number(f):  # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+).ckpt", f)
    return (int(s[0]) if s else -1, f)

def restore(init_op, sess, saver):  # called to restore or just initialise the model
    list = glob(os.path.join("./params/e*"))
    if list:
        file = max(list, key=extract_number)
        saver.restore(sess, file[:-5])
    sess.run(init_op)
    return

with tf.Graph().as_default() as g:
    # build models

    total_batch = data.train.num_examples / batch_size
    epochLimit = 51

    saver = tf.train.Saver()
    init_op = tf.global_variables_initializer()

    with tf.Session() as sess:
        saver = tf.train.Saver()
        init_op = tf.global_variables_initializer()

        restore(init_op, sess, saver)

        epoch = global_step.eval()
        while epoch < epochLimit:
            total_batch = data.train.num_examples / batch_size
            for i in range(int(total_batch)):
                sys.stdout.flush()
                voxels = newData.eval()
                batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)

                sess.run(opt_G, feed_dict={z: batch_z, train: True})
                sess.run(opt_D, feed_dict={input: voxels, z: batch_z, train: True})

                with open("out/loss.csv", 'a') as f:
                    batch_loss_G = sess.run(loss_G, feed_dict={z: batch_z, train: False})
                    batch_loss_D = sess.run(loss_D, feed_dict={input: voxels, z: batch_z, train: False})
                    msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
                    print(msgOut)

            epoch = epoch + 1
            sess.run(global_step.assign(epoch))
            saver.save(sess, "params/epoch{0}.ckpt".format(epoch))

            batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
            voxels = sess.run(x_, feed_dict={z: batch_z})
            v = voxels[0].reshape([32, 32, 32]) > 0
            util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
```

I also update the global step variable using assign at the bottom. Any ideas? Any help would be greatly appreciated.
Best answer
Calling sess.run(init_op) after restoring resets all of your variables back to their initial values. Comment out that line and things should work.
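For reference, here is a minimal sketch of what that fix looks like (reusing the question's extract_number helper and params/ layout; restore_or_init is a hypothetical name): init_op is only run when no checkpoint is found, so restored values are never overwritten.

```python
def restore_or_init(sess, saver, init_op):
    # Sketch of the accepted answer's fix: initialise only on a fresh start.
    ckpts = glob(os.path.join("./params/e*"))
    if ckpts:
        latest = max(ckpts, key=extract_number)
        saver.restore(sess, latest[:-5])  # strip the extension to get the checkpoint prefix, as in the question
    else:
        sess.run(init_op)  # no checkpoint found: initialise from scratch
```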
My original code was wrong for several reasons, because I was trying so many things. The first responder, Alexandre Passos, makes a valid point, but I believe what changed the game was also the use of variable scopes (maybe?).
Below is the working updated code if it helps anyone:
```python
import tensorflow as tf
import numpy as np
import re
import os
import sys
from glob import glob
# more imports

def extract_number(f):  # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+).ckpt", f)
    return (int(s[0]) if s else -1, f)

def restore(sess, saver):  # called to restore or just initialise the model
    list = glob(os.path.join("./params/e*"))
    if list:
        file = max(list, key=extract_number)
        saver.restore(sess, file[:-5])
        return saver, True, sess
    saver = tf.train.Saver()
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    return saver, False, sess

batch_size = 100
learning_rate = 0.0001
beta1 = 0.5
z_size = 100
save_interval = 1

data = dataset.read()
total_batch = data.train.num_examples / batch_size

def fill_queue():  # running in a separate thread to feed a FIFOQueue
    for i in range(int(total_batch * epochLimit)):
        sess.run(enqueue_op, feed_dict={batch: data.train.next_batch(batch_size)})

with tf.variable_scope("glob"):
    global_step = tf.get_variable(name='global_step', initializer=0, trainable=False)

# build models

epochLimit = 51

saver = tf.train.Saver()

with tf.Session() as sess:
    saver, rstr, sess = restore(sess, saver)

    with tf.variable_scope("glob", reuse=True):
        epocht = tf.get_variable(name='global_step', trainable=False, dtype=tf.int32)
        epoch = epocht.eval()

    while epoch < epochLimit:
        total_batch = data.train.num_examples / batch_size
        for i in range(int(total_batch)):
            sys.stdout.flush()
            voxels = newData.eval()
            batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)

            sess.run(opt_G, feed_dict={z: batch_z, train: True})
            sess.run(opt_D, feed_dict={input: voxels, z: batch_z, train: True})

            with open("out/loss.csv", 'a') as f:
                batch_loss_G = sess.run(loss_G, feed_dict={z: batch_z, train: False})
                batch_loss_D = sess.run(loss_D, feed_dict={input: voxels, z: batch_z, train: False})
                msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
                print(msgOut)

        epoch = epoch + 1
        sess.run(global_step.assign(epoch))
        saver.save(sess, "params/epoch{0}.ckpt".format(epoch))

        batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
        voxels = sess.run(x_, feed_dict={z: batch_z})
        v = voxels[0].reshape([32, 32, 32]) > 0
        util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
```
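As a sanity check, the save/restore round trip for the counter can be demonstrated in isolation. Below is a minimal, self-contained TF 1.x sketch (the /tmp/step_demo checkpoint path is just an example) showing that a scoped global_step keeps its value across sessions, as long as no initializer is run after the restore:

```python
import tensorflow as tf

with tf.Graph().as_default():
    with tf.variable_scope("glob"):
        global_step = tf.get_variable("global_step", initializer=0, trainable=False)
    saver = tf.train.Saver()

    # First session: initialise, advance the counter, save.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(global_step.assign(7))
        saver.save(sess, "/tmp/step_demo.ckpt")

    # Second session: restore only, with no init_op afterwards.
    with tf.Session() as sess:
        saver.restore(sess, "/tmp/step_demo.ckpt")
        print(global_step.eval())  # prints 7, not 0
```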