I'm using t40.yml from the branch on GitHub with network 40800 (I tried others, same result). There are no games in the input_test / input_train directories, as I'm just trying to generate the model at this point. If I pass in a T30 network with the t40.yml, I get much further (scroll down).
Output of the T40 attempt:
dataset:
  input_test: D:\\Chess\\Training\\lczero-training\\games\test\b001
  input_train: D:\\Chess\\Training\\lczero-training\\games\\train\b001
  num_chunks: 500000
  train_ratio: 0.9
gpu: 0
model:
  filters: 256
  policy_channels: 80
  residual_blocks: 20
  se_ratio: 8
name: 256x20-t40
training:
  batch_size: 4096
  checkpoint_steps: 10000
  lr_boundaries:
  - 100
  lr_values:
  - 0.02
  - 0.02
  max_grad_norm: 2
  num_batch_splits: 8
  path: D:\\Chess\\Training\\lczero-training\\networks
  policy_loss_weight: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_steps: 25
  test_steps: 125
  total_steps: 250
  train_avg_report_steps: 25
  value_loss_weight: 1.0
  warmup_steps: 125
Traceback (most recent call last):
  File "net_to_model.py", line 25, in <module>
    raise ValueError("Number of filters in YAML doesn't match the network")
ValueError: Number of filters in YAML doesn't match the network
YAML for reference
%YAML 1.2
---
name: '256x20-t40' # ideally no spaces
gpu: 0 # gpu id to process on
dataset:
  num_chunks: 500000 # newest nof chunks to parse
  train_ratio: 0.90 # trainingset ratio
  # For separated test and train data.
  input_train: 'D:\\Chess\\Training\\lczero-training\\games\\train\b001' # supports glob
  input_test: 'D:\\Chess\\Training\\lczero-training\\games\test\b001' # supports glob
  # For a one-shot run with all data in one directory.
  #input: '/work/lc0/data/'
training:
  swa: true
  swa_steps: 25
  swa_max_n: 10
  max_grad_norm: 2
  batch_size: 4096 # training batch
  num_batch_splits: 8
  test_steps: 125 # eval test set values after this many steps
  train_avg_report_steps: 25 # training reports its average values after this many steps.
  total_steps: 250 # terminate after these steps
  warmup_steps: 125
  checkpoint_steps: 10000 # optional frequency for checkpointing before finish
  shuffle_size: 500000 # size of the shuffle buffer
  lr_values: # list of learning rates
    - 0.02
    - 0.02
  lr_boundaries: # list of boundaries
    - 100
  policy_loss_weight: 1.0 # weight of policy loss
  value_loss_weight: 1.0 # weight of value loss
  path: 'D:\\Chess\\Training\\lczero-training\\networks' # network storage dir
model:
  filters: 256
  residual_blocks: 20
  se_ratio: 8
  policy_channels: 80
...
T30 attempt output:
dataset:
  input_test: D:\\Chess\\Training\\lczero-training\\games\test\\b001
  input_train: D:\\Chess\\Training\\lczero-training\\games\\train\\b001
  num_chunks: 500000
  train_ratio: 0.9
gpu: 0
model:
  filters: 256
  policy_channels: 80
  residual_blocks: 20
  se_ratio: 8
name: 256x20-t40
training:
  batch_size: 4096
  checkpoint_steps: 10000
  lr_boundaries:
  - 100
  lr_values:
  - 0.02
  - 0.02
  max_grad_norm: 2
  num_batch_splits: 8
  path: D:\\Chess\\Training\\lczero-training\\networks
  policy_loss_weight: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_steps: 25
  test_steps: 125
  total_steps: 250
  train_avg_report_steps: 25
  value_loss_weight: 1.0
  warmup_steps: 125
2019-02-16 13:31:07.643530: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2019-02-16 13:31:07.918932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2019-02-16 13:31:07.927027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-16 13:31:08.403139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-16 13:31:08.407630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-16 13:31:08.410846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-16 13:31:08.414238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From D:\Chess\Training\lczero-training\tf\tfprocess.py:144: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
Traceback (most recent call last):
  File "net_to_model.py", line 39, in <module>
    tfp.replace_weights(weights)
  File "D:\Chess\Training\lczero-training\tf\tfprocess.py", line 302, in replace_weights
    new_weight = tf.constant(new_weights[e], shape=weights.shape)
  File "D:\Users\brandon\Miniconda3\lib\site-packages\tensorflow\python\framework\constant_op.py", line 208, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "D:\Users\brandon\Miniconda3\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 497, in make_tensor_proto
    (shape_size, nparray.size))
ValueError: Too many elements provided. Needed at most 256, but received 589824
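One way to read the two numbers in this error, offered as an interpretation rather than a confirmed diagnosis: 589824 is exactly the size of a 3x3 convolution kernel with 256 input and 256 output filters, while 256 is the size of a single per-filter vector. The sketch below only checks that arithmetic; the helper name `conv_weight_count` is hypothetical and not part of lczero-training.

```python
def conv_weight_count(in_filters, out_filters, kernel=3):
    """Number of scalars in a kernel x kernel convolution weight tensor."""
    return kernel * kernel * in_filters * out_filters


filters = 256  # `filters` value from the YAML in this report

received = conv_weight_count(filters, filters)  # size of a 256 -> 256 3x3 conv
needed = filters  # one value per filter, i.e. a 256-element vector

print(received)  # 589824
print(needed)    # 256
```

If that reading is right, a full convolution from the network file is being assigned to a variable the graph sized as a 256-element vector, which would suggest the file's weight list and the graph's variable list are out of step.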
The newest code has some backwards-incompatible changes, and T30 nets can't be trained with the current code. You need to use code from before the SE commit.
To restore a T40 net, you need to specify policy: classical and value: classical in the YAML, since the defaults have changed to a convolutional policy head and a WDL value head.
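The config change described above can be sketched on an already-parsed config. This is a minimal illustration, not the project's code: the function name `with_classical_heads` is hypothetical, a plain dict stands in for the loaded YAML, and placing the two keys under the `model` section is an assumption that should be checked against the training code you are running.

```python
import copy


def with_classical_heads(cfg):
    """Return a copy of a parsed training config that requests classical heads.

    Assumption: the `policy` and `value` keys belong under `model`.
    """
    patched = copy.deepcopy(cfg)
    model = patched.setdefault("model", {})
    model["policy"] = "classical"
    model["value"] = "classical"
    return patched


# Stand-in for the `model` section of the t40.yml shown in this issue.
cfg = {
    "model": {
        "filters": 256,
        "residual_blocks": 20,
        "se_ratio": 8,
        "policy_channels": 80,
    }
}

patched = with_classical_heads(cfg)
print(patched["model"]["policy"], patched["model"]["value"])
```

Under the same assumption, the equivalent manual edit is two extra lines, `policy: classical` and `value: classical`, in the `model:` block of the YAML file.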
Thank you, super helpful! Is there a specific branch for pre-SE, or any idea when that commit was? I was poking around trying to find the code from before and after se_ratio: showed up, but I was still seeing some errors when I thought I had found the right code version.
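One generic way to look for the point where `se_ratio` entered the code is git's pickaxe search (`git log -S`), which lists commits that change the number of occurrences of a string. The sketch below wraps it in Python; the function names are hypothetical, and the search only narrows down candidates. It does not confirm which commit is meant by "the SE commit".

```python
import subprocess


def pickaxe_command(needle):
    """Build a `git log` command listing commits that add or remove `needle`."""
    return [
        "git", "log",
        f"-S{needle}",          # pickaxe: commits changing the count of `needle`
        "--reverse",            # oldest match first
        "--date=short",
        "--format=%h %ad %s",   # short hash, date, subject
    ]


def commits_touching(needle, repo_dir="."):
    """Run the search inside `repo_dir` and return one line per matching commit."""
    result = subprocess.run(
        pickaxe_command(needle),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Example (run from a clone of the training repository):
#   for line in commits_touching("se_ratio"):
#       print(line)
```

With `--reverse`, the first line printed is the oldest commit that touched the string, so checking out its parent is a reasonable first candidate for pre-SE code.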
Command used for the T40 attempt:
(base) D:\Chess\Training\lczero-training\tf>python net_to_model.py --cfg=../configs/t40.yml 40800