One error #12
I found your response to be quite valuable. Thank you very much!
------ Original message ------
I ran into the same problem when using a different LLM. The problem you are seeing is related to equation (14) of the MEMIT paper. In my case, the "aggregate statistic" $C_0$ had rows and columns of zeros, and even after adding $K_1 K_1^T$ those rows were still zero. Any matrix with a row of zeros is singular, which means its inverse cannot be computed. If you look at how these matrices are constructed, the zeros mean that some coordinates of the hidden states are unused. So how can you work around it?
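A tiny, purely illustrative PyTorch example reproduces the failure: a covariance-like matrix with an all-zero row/column is singular, and `torch.linalg.solve` refuses it with exactly this kind of error.

```python
import torch

# A matrix with an all-zero row/column (an unused hidden-state
# coordinate) is singular, so torch.linalg.solve cannot factor it.
C = torch.tensor([[1.0, 0.0],
                  [0.0, 0.0]])
b = torch.ones(2, 1)
try:
    torch.linalg.solve(C, b)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print("singular:", e)
```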
I found two real solutions:
1) Easy solution: do not train the layers that are showing these problems. If you go to "hparams/MEMIT/EleutherAI_gpt-j-6B.json" you'll see that the layers being trained are:
"layers": [ 3, 4, 5, 6, 7, 8 ],
Change to layers whose matrices do not cause these problems; if you look at the causal trace, you will see that you have some freedom to choose among them.
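For example, shifting the edit window by a layer would look like this in the hparams file (the specific values below are only illustrative; pick layers whose statistics are nonsingular for your model):

```json
"layers": [ 4, 5, 6, 7, 8, 9 ],
```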
2) Hard solution: remove the rows/columns that are full of zeros, compute the inverse of the reduced matrix, and then add the zero rows/columns back. Note that this is not a true inverse, since some columns are zero, but it is an approximation that does not add noise. The problem I ran into with this solution is that even after removing the zero rows/columns, some "unimportant" coordinates still inflated the norm of my delta matrix, which makes me cautious.
To implement this, go to memit_main.py and add these functions near the beginning:
```python
import torch


def make_null_i(matrix, i):
    # Zero out row i and column i.
    new_matrix = matrix.clone()
    new_matrix[:, i] = 0
    new_matrix[i, :] = 0
    return new_matrix


def identify_null_cols(matrix):
    # Find rows whose elements are all zero (abs() guards against
    # positive and negative entries cancelling in the sum).
    row_sums = matrix.abs().sum(dim=1)
    # flatten() keeps a 1-D tensor even when there is a single match,
    # so .tolist() always returns a list.
    zero_rows = torch.nonzero(row_sums == 0).flatten()
    return zero_rows.numel(), zero_rows.tolist()


def remove_column(matrix, i):
    # Drop row i and column i.
    new_matrix = matrix.clone()
    new_matrix = torch.cat((new_matrix[:i], new_matrix[i + 1:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_matrix[:, i + 1:]), dim=1)
    return new_matrix


def add_zero_column(matrix, i):
    # Re-insert a zero row and a zero column at index i.
    new_row = torch.zeros(1, matrix.shape[1], device=matrix.device, dtype=matrix.dtype)
    new_col = torch.zeros(matrix.shape[0] + 1, 1, device=matrix.device, dtype=matrix.dtype)
    new_matrix = torch.cat((matrix[:i], new_row, matrix[i:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_col, new_matrix[:, i:]), dim=1)
    return new_matrix


def compute_pseudoinverse_matrix(matrix):
    n, ids = identify_null_cols(matrix)
    print(f"There are {n} columns with zeros")
    if n == 0:
        return torch.linalg.inv(matrix)
    # Remove the zero rows/columns that make the matrix singular
    # (highest index first, so lower indices stay valid).
    new_matrix = matrix.clone()
    for id_ in ids[::-1]:
        new_matrix = remove_column(new_matrix, id_)
    # Invert the reduced (nonsingular) matrix
    new_matrix = torch.linalg.inv(new_matrix)
    # Restore the zero rows/columns at their original positions
    for id_ in ids:
        new_matrix = add_zero_column(new_matrix, id_)
    return new_matrix
```
and then change the lines 196-199 to:
```python
matrix = (hparams.mom2_update_weight * cov.double() + layer_ks @ layer_ks.T).detach().cpu()
n_null_cols, _ = identify_null_cols(matrix)
if n_null_cols != 0:
    adj_k = compute_pseudoinverse_matrix(matrix) @ layer_ks.detach().cpu()
else:
    adj_k = torch.linalg.solve(matrix, layer_ks.detach().cpu())
```
3) Extra possible solution: increase the number of edits.
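The remove-invert-restore idea behind the hard solution can be sanity-checked on a toy matrix. Below is a compact, self-contained sketch of the same technique (the name `masked_inverse` is mine, and it uses index selection instead of the loop of column removals):

```python
import torch


def masked_inverse(matrix: torch.Tensor) -> torch.Tensor:
    """Invert the nonzero principal submatrix; zero rows/columns stay zero."""
    idx = torch.nonzero(matrix.abs().sum(dim=1) != 0).flatten()
    out = torch.zeros_like(matrix)
    # Invert only the submatrix restricted to nonzero coordinates,
    # then scatter it back into the original positions.
    out[idx[:, None], idx] = torch.linalg.inv(matrix[idx][:, idx])
    return out


# The middle row/column is all zeros, so the full matrix is singular,
# but the 2x2 block on coordinates {0, 2} is invertible.
m = torch.tensor([[2.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 4.0]])
inv = masked_inverse(m)
# inv acts as the inverse on the nonzero coordinates: inv[0, 0] = 0.5,
# inv[2, 2] = 0.25, and row/column 1 remains zero.
```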
I hope it helps. Good luck!
I sincerely hope you can tell me how to handle this error.
```
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lyn/memit/experiments/evaluate.py", line 299, in <module>
    main(
  File "/home/lyn/memit/experiments/evaluate.py", line 146, in main
    edited_model, weights_copy = apply_algo(
  File "/home/lyn/memit/memit/memit_main.py", line 44, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/home/lyn/memit/memit/memit_main.py", line 196, in execute_memit
    adj_k = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: The diagonal element 2 is zero, the solve could not be completed because the input matrix is singular.
```
I just ran it with this command:

```shell
CUDA_VISIBLE_DEVICES=2 python3 -m experiments.evaluate --alg_name=MEMIT --model_name=/home/lyn/EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=1
```