One error #12
I found your response to be quite valuable. Thank you very much!
------ Original message ------
I ran into the same problem when using a different LLM. The problem you are seeing is related to equation (14) of the MEMIT paper. In my case, the "aggregate statistic" $C_0$ had rows and columns of zeros, and even after adding $K_1 K_1^T$ those rows were still zero. Any matrix with a row of zeros is singular, which means its inverse cannot be computed. If you look at how these matrices are constructed, the zeros mean that some coordinates of the hidden states are unused. So how can you work around it?
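A tiny, purely illustrative PyTorch example reproduces the failure: a covariance-like matrix with an all-zero row/column is singular, and `torch.linalg.solve` refuses it with exactly this kind of error.

```python
import torch

# A matrix with an all-zero row/column (an unused hidden-state
# coordinate) is singular, so torch.linalg.solve cannot factor it.
C = torch.tensor([[1.0, 0.0],
                  [0.0, 0.0]])
b = torch.ones(2, 1)
try:
    torch.linalg.solve(C, b)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print("singular:", e)
```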
I found two real solutions:
1) Easy solution: do not train the layers that are showing these problems. If you go to "hparams/MEMIT/EleutherAI_gpt-j-6B.json" you'll see that the layers being trained are:
"layers": [ 3, 4, 5, 6, 7, 8 ],
Change to layers whose matrices do not cause these problems; if you look at the causal trace, you will see that you have some freedom to choose among them.
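For example, shifting the edit window by a layer would look like this in the hparams file (the specific values below are only illustrative; pick layers whose statistics are nonsingular for your model):

```json
"layers": [ 4, 5, 6, 7, 8, 9 ],
```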
2) Hard solution: remove the rows/columns that are full of zeros, compute the inverse of the reduced matrix, and then add the zero rows/columns back. Note that this is not a true inverse, since some columns are zero, but it is an approximation that does not add noise. The problem I ran into with this solution is that even after removing the zero rows/columns, some "unimportant" coordinates still inflated the norm of my delta matrix, which makes me cautious.
To implement this, go to memit_main.py and add these functions near the beginning:
```python
import torch


def make_null_i(matrix, i):
    # Zero out row i and column i.
    new_matrix = matrix.clone()
    new_matrix[:, i] = 0
    new_matrix[i, :] = 0
    return new_matrix


def identify_null_cols(matrix):
    # Find rows whose elements are all zero (abs() guards against
    # positive and negative entries cancelling in the sum).
    row_sums = matrix.abs().sum(dim=1)
    # flatten() keeps a 1-D tensor even when there is a single match,
    # so .tolist() always returns a list.
    zero_rows = torch.nonzero(row_sums == 0).flatten()
    return zero_rows.numel(), zero_rows.tolist()


def remove_column(matrix, i):
    # Drop row i and column i.
    new_matrix = matrix.clone()
    new_matrix = torch.cat((new_matrix[:i], new_matrix[i + 1:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_matrix[:, i + 1:]), dim=1)
    return new_matrix


def add_zero_column(matrix, i):
    # Re-insert a zero row and a zero column at index i.
    new_row = torch.zeros(1, matrix.shape[1], device=matrix.device, dtype=matrix.dtype)
    new_col = torch.zeros(matrix.shape[0] + 1, 1, device=matrix.device, dtype=matrix.dtype)
    new_matrix = torch.cat((matrix[:i], new_row, matrix[i:]), dim=0)
    new_matrix = torch.cat((new_matrix[:, :i], new_col, new_matrix[:, i:]), dim=1)
    return new_matrix


def compute_pseudoinverse_matrix(matrix):
    n, ids = identify_null_cols(matrix)
    print(f"There are {n} columns with zeros")
    if n == 0:
        return torch.linalg.inv(matrix)
    # Remove the zero rows/columns that make the matrix singular
    # (highest index first, so lower indices stay valid).
    new_matrix = matrix.clone()
    for id_ in ids[::-1]:
        new_matrix = remove_column(new_matrix, id_)
    # Invert the reduced (nonsingular) matrix
    new_matrix = torch.linalg.inv(new_matrix)
    # Restore the zero rows/columns at their original positions
    for id_ in ids:
        new_matrix = add_zero_column(new_matrix, id_)
    return new_matrix
```
and then change the lines 196-199 to:
```python
matrix = (hparams.mom2_update_weight * cov.double() + layer_ks @ layer_ks.T).detach().cpu()
n_null_cols, _ = identify_null_cols(matrix)
if n_null_cols != 0:
    adj_k = compute_pseudoinverse_matrix(matrix) @ layer_ks.detach().cpu()
else:
    adj_k = torch.linalg.solve(matrix, layer_ks.detach().cpu())
```
3) Extra possible solution: increase the number of edits.
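The remove-invert-restore idea behind the hard solution can be sanity-checked on a toy matrix. Below is a compact, self-contained sketch of the same technique (the name `masked_inverse` is mine, and it uses index selection instead of the loop of column removals):

```python
import torch


def masked_inverse(matrix: torch.Tensor) -> torch.Tensor:
    """Invert the nonzero principal submatrix; zero rows/columns stay zero."""
    idx = torch.nonzero(matrix.abs().sum(dim=1) != 0).flatten()
    out = torch.zeros_like(matrix)
    # Invert only the submatrix restricted to nonzero coordinates,
    # then scatter it back into the original positions.
    out[idx[:, None], idx] = torch.linalg.inv(matrix[idx][:, idx])
    return out


# The middle row/column is all zeros, so the full matrix is singular,
# but the 2x2 block on coordinates {0, 2} is invertible.
m = torch.tensor([[2.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 4.0]])
inv = masked_inverse(m)
# inv acts as the inverse on the nonzero coordinates: inv[0, 0] = 0.5,
# inv[2, 2] = 0.25, and row/column 1 remains zero.
```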
I hope it helps. Good luck!
I sincerely hope you can tell me how to handle this error.
```
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lyn/memit/experiments/evaluate.py", line 299, in <module>
    main(
  File "/home/lyn/memit/experiments/evaluate.py", line 146, in main
    edited_model, weights_copy = apply_algo(
  File "/home/lyn/memit/memit/memit_main.py", line 44, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/home/lyn/memit/memit/memit_main.py", line 196, in execute_memit
    adj_k = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: The diagonal element 2 is zero, the solve could not be completed because the input matrix is singular.
```
I just ran it with this command:

```shell
CUDA_VISIBLE_DEVICES=2 python3 -m experiments.evaluate --alg_name=MEMIT --model_name=/home/lyn/EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=1
```