Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

causal sites #53

Closed
GertjanBisschop opened this issue Jul 14, 2023 · 6 comments
Closed

causal sites #53

GertjanBisschop opened this issue Jul 14, 2023 · 6 comments

Comments

@GertjanBisschop
Copy link
Member

The current API assumes we provide num_causal_sites. It might make more sense to turn this into causal_sites and allow both integer values as well as numpy arrays. The case of an integer is the currently implemented behaviour. In case of an array, these should all be sites contained in the sites table. We would then no longer randomly pick sites.

Alternatively, it might make more sense to provide an example in the documentation on how to mask certain regions of the genome with tree sequences, in case we have knowledge of coding vs non-coding regions for example.

@daikitag
Copy link
Collaborator

Thanks for your comment, and I think it sounds like a great idea. I will try modifying the codes and post a pull request about it really soon.

@GertjanBisschop
Copy link
Member Author

Just a suggestion to make sure the API is future proof, even when we have not implemented all possibilities just yet.

@daikitag
Copy link
Collaborator

Instead of changing num_causal to causal_sites, I'm actually thinking about adding a new argument, causal_id = None as one of the arguments in addition to num_causal, but what do you think about it? I will try implementing the changes after #52 gets processed.

@GertjanBisschop
Copy link
Member Author

That would make sense. The main goal here is I think is to add a layer of realism to the simulations by specifying regions of the genome on which we might have prior knowledge. The obvious thing being coding regions. The simplest way would probably be the way you propose using a list of causal_id. This list might contain 100 000 elements from which we then randomly select num_causal_sites. This would require preprocessing by the user. Another option would therefore be to specify a list of lists containing ranges for which to include variants. We should probably have a look at how this is done in stdpopsim.

@jeromekelleher
Copy link
Member

I think providing the option to specify the actual causal sites is the most flexible thing, that way users can do arbitrarily complex things to choose them. There's no point in getting into choosing randomly from some other set, since that's trivial to do using numpy anyway.

@daikitag
Copy link
Collaborator

I just made a modification to the sim_trait() function to incorporate this change. Would it be possible for you to review it whenever you have some time?
#55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants