Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fill out most of ARG tute #199

Merged
merged 3 commits into from
Nov 16, 2022
Merged

Fill out most of ARG tute #199

merged 3 commits into from
Nov 16, 2022

Conversation

hyanwong
Copy link
Member

@hyanwong hyanwong commented Nov 10, 2022

Starts to address #43. Perhaps @a-ignatieva would like to take a look and modify etc?

@hyanwong hyanwong force-pushed the ARG-tute branch 2 times, most recently from cbc37c2 to 6244725 Compare November 11, 2022 10:05
@a-ignatieva
Copy link

This looks great Yan! Yes I'll have a go through (next week).

@hyanwong
Copy link
Member Author

hyanwong commented Nov 11, 2022

This looks great Yan! Yes I'll have a go through (next week).

Thanks. It would be nice to show how KwARG can also be used to make a reasonable-looking ARG. I'm not sure how easy it would be to actually run KwARG in the tutorial notebook, but we could save an output file in /data. I wonder if we should implement a from_kwarg importer in tsconvert? Maybe your argutils.convert_kwarg function from https://github.com/tskit-dev/what-is-an-arg-paper/blob/926edb17484e8687539fc5d28a944fa65b6285ed/argutils/convert.py#L98 could be simply ported to tsconvert? Are there unit tests etc for that (which is always the tedious bit, TBH!). And is the KwARG output format sufficiently stable to make that worth maintaining?

Edit - I opened an issue at tskit-dev/tsconvert#44

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, spotted a couple of small things

args.md Outdated Show resolved Hide resolved
args.md Outdated Show resolved Hide resolved
@hyanwong hyanwong marked this pull request as ready for review November 15, 2022 16:35
@hyanwong
Copy link
Member Author

Looks great, spotted a couple of small things

Great, thanks for that. A search and replace found another "the the" too! So I corrected that as well.

Co-authored-by: Jerome Kelleher <[email protected]>
@hyanwong
Copy link
Member Author

Comments in leospeidel/relate_lib#3 (comment) make me thing that I need to be a bit more careful with the use of the word ARG in this tutorial, to allow for the terminology that a simplified tree sequence can still be (loosely) described as an ARG.

Are we happy with the term "full ARG" to mean "with recombination and CA-non-coalescent" nodes. Or do we need another adjective instead of "full"?

@jeromekelleher
Copy link
Member

"full" or "complete" seems reasonable.

@hyanwong
Copy link
Member Author

"full" or "complete" seems reasonable.

Thanks. I think "full" is best then: no change of terminology, and doesn't quite imply completeness in the big ARG sense. Let's standardise on that (and also check we are consistent in the ARG paper prose)

@jeromekelleher
Copy link
Member

Ready to merge then?

@hyanwong
Copy link
Member Author

Ready to merge then?

I think so, although @a-ignatieva said she would give it a look over. Happy to merge now and update when she's looked, or wait until she gives feedback.

@jeromekelleher jeromekelleher merged commit 0861413 into tskit-dev:main Nov 16, 2022
@a-ignatieva
Copy link

This looks great and reads very well to me @hyanwong! Just a couple of small suggestions.

  • For the figure in the "unary tree nodes" section, might it be better to use time_scale="rank" or something so the node times don't look like they overlap?
  • You already make this point, but would it be worth stating more explicitly at the start of the section that the tskit format can record full ARGs, without losing any information? So maybe instead of

We call these “full ARGs”, and this tutorial aims to show you how tskit can be used to store and analyse them

something like

These "full ARGs" can be efficiently encoded using tskit without losing any information, and the network form of the full ARG can be visualised using its tree sequence encoding. In this tutorial, we show you how tskit can be used to store and analyse full ARGs, and precisely what information is lost when the corresponding tree sequence is fully simplified.

@hyanwong
Copy link
Member Author

Thanks @a-ignatieva. I still need to incorporate your comments in #43 (comment) into the tutorial.

I'm reworking it a bit so can incorporate your comments:

For the figure in the "unary tree nodes" section, might it be better to use time_scale="rank" or something so the node times don't look like they overlap?

The "rank" time scale in this example makes the mutations times overlap. I think maybe we just find another random see that gives nice example. E.g. 3128 which gives this:

Screenshot 2022-11-20 at 19 40 18

You already make this point, but would it be worth stating more explicitly at the start of the section that the tskit format can record full ARGs, without losing any information?

Good plan. I'll add something like your suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants