John Malloy edited this page Dec 8, 2021 · 3 revisions

assemblyCalcs_percentiles.py

Calculates assembly values of SureChemBL compounds that fall in the top or bottom percentiles. Percentiles are based on:

  1. Attachment value
  2. Change in attachment value

The code used to calculate percentiles can be found in XX. The data used is in Data/Cpd_Data/.
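The percentile selection can be sketched as follows. This is a minimal sketch, assuming each compound's attachment value is already available as a number in a dict keyed by compound ID. The names `select_extremes` and `attachment_change` are hypothetical and are not taken from the repository; ties at the cutoff are broken by input order rather than handled explicitly.

```python
import math


def attachment_change(earlier, later):
    """Hypothetical helper: change in attachment value between two periods.

    Compounds missing from `earlier` are treated as having a value of 0.
    """
    return {cid: later[cid] - earlier.get(cid, 0) for cid in later}


def select_extremes(scores, pct=5.0):
    """Hypothetical helper: return (top, bottom) lists of compound IDs.

    scores: dict mapping compound ID -> attachment value (or its change).
    `top` holds the highest pct percent (highest first), `bottom` the
    lowest pct percent (lowest first). At least one compound is returned
    on each side when `scores` is non-empty.
    """
    if not scores:
        return [], []
    ranked = sorted(scores, key=scores.get)  # lowest value first; stable for ties
    n = max(1, math.ceil(len(ranked) * pct / 100))
    return ranked[-n:][::-1], ranked[:n]
```

For the second criterion, the two helpers compose: `select_extremes(attachment_change(values_month1, values_month2), pct=1)`, where the two input dicts are hypothetical per-month attachment values.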

assemblyCalcs_samples.py

Calculates assembly values from a pool of sampled SureChemBL compounds. These compounds are currently sampled from:

  1. Compounds added to the database at a given month, which were not previously found in the database ("new compounds")
  2. All compounds present in the database at a given month

The data used is in Data/Assembly_Values/; the samples were generated in cpd_analysis.py.
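The two sampling pools can be sketched as follows, assuming compound records are available as (compound ID, month) pairs with months written as "YYYY-MM" strings, so that string comparison matches chronological order. `first_seen` and `sample_month` are hypothetical helpers, not functions from cpd_analysis.py.

```python
import random


def first_seen(records):
    """Map each compound ID to the earliest month it appears in.

    records: iterable of (compound_id, month) pairs, month as "YYYY-MM".
    """
    first = {}
    for cid, month in records:
        if cid not in first or month < first[cid]:
            first[cid] = month
    return first


def sample_month(first, month, k, new_only, seed=0):
    """Sample up to k compound IDs for a given month.

    new_only=True:  compounds first seen in `month` ("new compounds").
    new_only=False: all compounds present by `month` (first seen on or before it).
    """
    if new_only:
        pool = [c for c, m in first.items() if m == month]
    else:
        pool = [c for c, m in first.items() if m <= month]
    rng = random.Random(seed)  # fixed seed keeps samples reproducible
    return rng.sample(sorted(pool), min(k, len(pool)))
```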

build_network.py

Builds a bipartite, undirected network of the SureChemBL database using iGraph.

The data used is in Data/SureChemblMAP/, Data/CpdPatentIdsDates/, and Data/Cpd_Data/.
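A plain-Python sketch of how compound-patent links map onto a bipartite, undirected graph, assuming the links arrive as (compound ID, patent ID) pairs. `bipartite_lists` is a hypothetical helper. The returned `types` and `edges` lists follow the layout accepted by python-igraph's `Graph.Bipartite(types, edges)`, but this sketch does not call igraph and is not the repository's implementation.

```python
def bipartite_lists(pairs):
    """Hypothetical helper: turn (compound_id, patent_id) links into graph lists.

    Returns (names, types, edges):
      names: vertex name per index
      types: False for a compound vertex, True for a patent vertex
      edges: undirected edges as (compound_index, patent_index), deduplicated
    """
    index, names, types, edges = {}, [], [], []
    seen = set()

    def vertex(key, is_patent):
        # Keys are namespaced so a compound and a patent with the same
        # ID string never collapse into one vertex.
        if key not in index:
            index[key] = len(names)
            names.append(key[1])
            types.append(is_patent)
        return index[key]

    for cpd, pat in pairs:
        edge = (vertex(("cpd", cpd), False), vertex(("pat", pat), True))
        if edge not in seen:  # drop repeated compound-patent links
            seen.add(edge)
            edges.append(edge)
    return names, types, edges
```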

cpd_analysis.py

Calculates statistics over compound data, tracks compounds, and samples compounds. It does not use the network; instead, it uses a dataframe of all SureChemBL compounds in Cpd_Data/SureChemBL_allCpds.p.

get_bipartite_network_data.py

Builds monthly subgraphs and calculates network statistics (average degree, degree distribution, clustering coefficient, etc.) across compounds and patents within these subgraphs.

Subgraphs are saved in Graphs/, degree distributions in Degrees/Months/, and network statistics in NetworkStats/.
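A sketch of two of these per-subgraph statistics, assuming each link carries the month in which it appeared and that a monthly subgraph contains only the links dated that month (the script may instead build cumulative subgraphs). `monthly_degree_stats` is hypothetical. Clustering is left out here because the standard triangle-based coefficient is always zero on a bipartite graph.

```python
from collections import Counter


def monthly_degree_stats(links, month):
    """Hypothetical helper: degree statistics for one monthly subgraph.

    links: iterable of (compound_id, patent_id, month) triples.
    Keeps only links dated `month`, then returns, separately for compounds
    and patents, the average degree and the degree distribution
    (degree -> number of nodes with that degree).
    """
    cpd_deg, pat_deg = Counter(), Counter()
    for cpd, pat, m in set(links):  # set() drops duplicate links
        if m == month:
            cpd_deg[cpd] += 1
            pat_deg[pat] += 1
    stats = {}
    for side, deg in (("compounds", cpd_deg), ("patents", pat_deg)):
        n = len(deg)
        stats[side] = {
            "avg_degree": sum(deg.values()) / n if n else 0.0,
            "degree_dist": dict(Counter(deg.values())),
        }
    return stats
```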

get_cpd_network_data.py

Calculates preferential attachment across time periods within the SureChemBL dataset.

The data used is in Degree/Months/, and the calculations are saved as pref_attach_dict* files in the base Data directory.
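One common way to measure preferential attachment is to compare each node's cumulative degree at the end of one period with its degree at the end of the next, then average the degree gain over all nodes that started with degree k. The sketch below assumes degrees are stored as cumulative dicts keyed by node ID. `attachment_rate` is hypothetical, and the repository's method and the structure of the pref_attach_dict* files may differ.

```python
from collections import defaultdict


def attachment_rate(deg_before, deg_after):
    """Hypothetical helper: mean number of new links gained, per starting degree k.

    deg_before: {node: cumulative degree at the end of period t}
    deg_after:  {node: cumulative degree at the end of period t + 1}
    Nodes that first appear in deg_after are ignored (they had no prior
    degree); nodes missing from deg_after are treated as gaining nothing.
    """
    gained = defaultdict(int)
    count = defaultdict(int)
    for node, k in deg_before.items():
        gained[k] += deg_after.get(node, k) - k
        count[k] += 1
    return {k: gained[k] / count[k] for k in sorted(count)}
```

A rate that grows roughly linearly with k is the usual signature of linear preferential attachment.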