causal sites #53

GertjanBisschop · 2023-07-14T16:53:56Z

The current API assumes we provide num_causal_sites. It might make more sense to turn this into causal_sites and accept either an integer or a numpy array. An integer gives the currently implemented behaviour. An array would have to contain only sites that are present in the sites table, and we would then no longer pick sites randomly.
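A minimal sketch of how such a dual-purpose argument could be resolved, assuming site IDs are the integers 0 to num_sites - 1 as in a tskit sites table. The helper name resolve_causal_sites and its parameters are hypothetical and are not part of the existing API.

```python
import numpy as np


def resolve_causal_sites(causal_sites, num_sites, rng=None):
    """Hypothetical helper: turn `causal_sites` into an array of site IDs.

    An integer means "pick that many sites at random" (the current behaviour);
    an array-like is treated as explicit site IDs and only validated.
    """
    if isinstance(causal_sites, (int, np.integer)):
        if not 0 < causal_sites <= num_sites:
            raise ValueError("number of causal sites must be in [1, num_sites]")
        rng = np.random.default_rng() if rng is None else rng
        # Sample site IDs without replacement and return them sorted.
        return np.sort(rng.choice(num_sites, size=causal_sites, replace=False))

    site_ids = np.asarray(causal_sites)
    if site_ids.ndim != 1 or not np.issubdtype(site_ids.dtype, np.integer):
        raise ValueError("causal_sites must be an integer or a 1D integer array")
    if np.any(site_ids < 0) or np.any(site_ids >= num_sites):
        raise ValueError("all causal sites must be present in the sites table")
    if np.unique(site_ids).size != site_ids.size:
        raise ValueError("causal sites must be unique")
    return site_ids
```

With this shape, resolve_causal_sites(5, num_sites) keeps the random behaviour, while resolve_causal_sites(np.array([2, 7, 11]), num_sites) uses exactly the sites given.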

Alternatively, it might make more sense to provide an example in the documentation on how to mask certain regions of the genome with tree sequences, in case we have knowledge of coding vs non-coding regions for example.


daikitag · 2023-07-14T16:59:59Z

Thanks for your comment. I think it sounds like a great idea. I will try modifying the code and post a pull request for it soon.

GertjanBisschop · 2023-07-14T19:52:59Z

Just a suggestion to make sure the API is future-proof, even if we have not implemented all the possibilities yet.

daikitag · 2023-07-15T03:11:32Z

Instead of changing num_causal to causal_sites, I'm thinking about adding a new argument, causal_id = None, alongside num_causal. What do you think? I will try implementing the changes after #52 gets processed.

GertjanBisschop · 2023-07-15T09:21:28Z

That would make sense. I think the main goal here is to add a layer of realism to the simulations by specifying regions of the genome about which we might have prior knowledge, the obvious example being coding regions. The simplest way would probably be the one you propose, using a list of causal_id. This list might contain 100,000 elements, from which we then randomly select num_causal_sites. This would require preprocessing by the user. Another option would therefore be to specify a list of lists containing the ranges within which to include variants. We should probably have a look at how this is done in stdpopsim.
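As a rough illustration of the range-based option, the sketch below maps a list of genomic ranges to the IDs of the sites that fall inside them. It assumes half-open intervals [left, right) and a plain array of site positions (with tskit this could come from ts.tables.sites.position). The function name sites_in_intervals and the example coordinates are made up for illustration.

```python
import numpy as np


def sites_in_intervals(positions, intervals):
    """Hypothetical helper: IDs of sites whose position lies in any interval.

    `positions[i]` is the genomic position of site `i`; each interval is a
    (left, right) pair treated as half-open, i.e. left <= position < right.
    """
    positions = np.asarray(positions, dtype=float)
    keep = np.zeros(positions.shape, dtype=bool)
    for left, right in intervals:
        keep |= (positions >= left) & (positions < right)
    return np.flatnonzero(keep)


# Made-up example: five sites and two "coding" regions.
positions = np.array([5.0, 120.0, 250.0, 260.0, 900.0])
coding_regions = [(100, 200), (250, 300)]
candidate_ids = sites_in_intervals(positions, coding_regions)
```

Here candidate_ids contains the sites at positions 120, 250 and 260, which could then be passed on as causal_id.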

jeromekelleher · 2023-07-15T10:13:09Z

I think providing the option to specify the actual causal sites is the most flexible approach: that way users can do arbitrarily complex things to choose them. There's no point in getting into choosing randomly from some other set, since that's trivial to do using numpy anyway.
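For reference, drawing a random subset from a user-supplied candidate set takes only a couple of lines of numpy. The candidate set and the counts below are placeholders echoing the numbers mentioned earlier in the thread, not real data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder candidate set: pretend 100,000 site IDs passed some user-defined filter.
candidate_ids = np.arange(100_000)
num_causal = 1_000

# Draw the causal sites without replacement, then sort them for readability.
causal_ids = np.sort(rng.choice(candidate_ids, size=num_causal, replace=False))
```

The resulting causal_ids array is what a user would hand to the simulation as the explicit list of causal sites.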

daikitag · 2023-07-15T11:23:10Z

I just made a modification to the sim_trait() function to incorporate this change. Would it be possible for you to review it whenever you have some time?
#55

daikitag mentioned this issue Jul 15, 2023

Causal site #55

Closed

jeromekelleher mentioned this issue Aug 30, 2023

Causal sites #24

Closed

GertjanBisschop closed this as completed Sep 26, 2023

jeromekelleher mentioned this issue Nov 22, 2023

Specify particular causal sites #121

Closed
