Merge pull request #135 from jeromekelleher/0.1.3-final-changes
Minor docs updates.
jeromekelleher authored Nov 2, 2018
2 parents 282d095 + 2f01108 commit 7bacd34
Showing 4 changed files with 14 additions and 17 deletions.
8 changes: 4 additions & 4 deletions docs/inference.rst
@@ -126,12 +126,12 @@ file format, we provide a simple :ref:`Python API <sec_api_file_formats>`
 to allow the user to efficiently construct it from their own data.
 An example of how to use this API is given in the :ref:`sec_tutorial`.
 
-We do not provide an automatic means of important data from a VCF
-intentionally, as we believe that this would be extremely difficult to do.
+We do not provide an automatic means of importing data from VCF (or any
+other format) intentionally, as we believe that this would be extremely difficult to do.
 As there is no universally accepted way of encoding ancestral state
 information in VCF, in practise the user would most often have to write
-a new VCF file with ancestral state and metadata information in the form
-that we require. Thus, it is more efficient to skip this intermediate
+a new VCF file with ancestral state and metadata information in a specific
+form that we would require. Thus, it is more efficient to skip this intermediate
 step and to directly produce a :ref:`format <sec_file_formats_samples>`
 that is both compact and very efficient to process.
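
Note on the paragraph changed above: it argues that the user has to supply ancestral state themselves. A minimal sketch of that step follows, assuming (as the tutorial snippet in this commit appears to do) that the first allele in the list is treated as ancestral. The helper `polarise_site` is hypothetical and not part of tsinfer; it only reorders VCF-style alleles and remaps genotype indexes, and it does not handle missing calls.

```python
def polarise_site(ref, alts, ancestral, genotypes):
    """Reorder alleles so the ancestral allele is first and remap genotypes.

    `genotypes` holds indexes into [ref] + alts (VCF convention: 0 = REF).
    Returns (alleles, genotypes) in which index 0 means "ancestral".
    Hypothetical helper for illustration; missing data is not handled.
    """
    vcf_alleles = [ref] + list(alts)
    if ancestral not in vcf_alleles:
        raise ValueError("ancestral allele is not among the observed alleles")
    # Ancestral allele first, remaining alleles in their original order.
    alleles = [ancestral] + [a for a in vcf_alleles if a != ancestral]
    # Map each old allele index to its position in the reordered list.
    remap = {old: alleles.index(a) for old, a in enumerate(vcf_alleles)}
    return alleles, [remap[g] for g in genotypes]


# REF is "T" but the ancestral allele is "A", so the genotypes flip.
alleles, genotypes = polarise_site("T", ["A"], "A", [0, 1, 0, 0, 0])
print(alleles, genotypes)
```

The returned pair could then be passed to an `add_site`-style call in place of the raw VCF alleles and genotypes.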

17 changes: 7 additions & 10 deletions docs/simulation-example.py
@@ -5,22 +5,19 @@
 sys.path.insert(0, os.path.abspath('..'))
 import tsinfer
 
-if False:
+if True:
     ts = msprime.simulate(
         sample_size=10000, Ne=10**4, recombination_rate=1e-8, mutation_rate=1e-8,
         length=10*10**6, random_seed=42)
     ts.dump("simulation-source.trees")
     print("Simulation done:", ts.num_trees, "trees and", ts.num_sites)
 
-    progress = tqdm.tqdm(total=ts.num_sites)
-    sample_data = tsinfer.SampleData.initialise(
-        num_samples=ts.num_samples, sequence_length=ts.sequence_length,
-        path="simulation.samples", num_flush_threads=2)
-    for var in ts.variants():
-        sample_data.add_site(var.site.position, var.alleles, var.genotypes)
-        progress.update()
-    progress.close()
-    sample_data.finalise()
+    with tsinfer.SampleData(
+            sequence_length=ts.sequence_length, path="simulation.samples",
+            num_flush_threads=2) as samples:
+        for var in tqdm.tqdm(ts.variants(), total=ts.num_sites):
+            samples.add_site(var.site.position, var.genotypes, var.alleles)
+
 else:
     source = msprime.load("simulation-source.trees")
     inferred = msprime.load("simulation.trees")
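
Note on the change above: the example moves from an explicit `initialise(...)` / `finalise()` pair to a `with` block, and the `add_site` call now passes genotypes before alleles. The pattern can be sketched with a toy class. `ToySampleData` is a hypothetical stand-in written for this note, not tsinfer's implementation, and finalising only on a clean exit is an assumption of the sketch rather than documented library behaviour.

```python
class ToySampleData:
    """Hypothetical write-once container used only to illustrate the pattern."""

    def __init__(self, sequence_length):
        self.sequence_length = sequence_length
        self.sites = []
        self.finalised = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Seal the container only if the block finished without an error.
        if exc_type is None:
            self.finalise()
        return False  # never swallow exceptions raised inside the block

    def add_site(self, position, genotypes, alleles):
        if self.finalised:
            raise RuntimeError("cannot add sites after finalise()")
        if not 0 <= position < self.sequence_length:
            raise ValueError("position lies outside the sequence")
        self.sites.append((position, list(genotypes), list(alleles)))

    def finalise(self):
        self.finalised = True


# No explicit finalise() call: leaving the block seals the container.
with ToySampleData(sequence_length=6) as samples:
    samples.add_site(0, [0, 1, 0, 0, 0], ["A", "T"])
    samples.add_site(1, [0, 0, 0, 1, 1], ["G", "C"])

print(samples.finalised, len(samples.sites))
```

The benefit of this shape is that the caller cannot forget the final step, which is presumably why the removed `progress.close()` and `sample_data.finalise()` lines have no counterpart in the new version.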
2 changes: 1 addition & 1 deletion docs/tutorial.rst
@@ -28,7 +28,7 @@ scope of this manual. Assuming that we know the ancestral state, we can then imp
 import tsinfer
-with tsinfer.SampleData() as sample_data:
+with tsinfer.SampleData(sequence_length=6) as sample_data:
     sample_data.add_site(0, [0, 1, 0, 0, 0], ["A", "T"])
     sample_data.add_site(1, [0, 0, 0, 1, 1], ["G", "C"])
     sample_data.add_site(2, [0, 1, 1, 0, 0], ["C", "A"])
4 changes: 2 additions & 2 deletions tsinfer/formats.py
@@ -676,8 +676,8 @@ class SampleData(DataContainer):
 with tsinfer.SampleData(path="mydata.samples") as sample_data:
     # Define populations
-    sample_data.population(metadata={"name": "CEU"})
-    sample_data.population(metadata={"name": "YRI"})
+    sample_data.add_population(metadata={"name": "CEU"})
+    sample_data.add_population(metadata={"name": "YRI"})
     # Define individuals
     sample_data.add_individual(
         ploidy=2, population=0, metadata={"name": "NA12878"})