Calculates assembly values of SureChemBL compounds that fall in the top/bottom percentiles. Percentiles are based on:
- Attachment value
- Change in attachment value
The code used to calculate the percentiles can be found in XX. The data used is in Data/Cpd_Data/.
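The percentile selection step might look roughly like the sketch below. `percentile_extremes` and `attachment_change` are hypothetical helpers, not the repository's code. The sketch assumes each compound already has a numeric attachment value stored in a dict. It also cuts each tail by rank (at least one compound per tail) instead of using interpolated percentile thresholds.

```python
import math


def percentile_extremes(values, pct):
    """Return (bottom, top) compound IDs in the bottom/top `pct` percent.

    values: dict mapping compound ID -> numeric score (e.g. attachment value).
    pct: percentage in (0, 50], e.g. 1 for the top/bottom 1%.
    Hypothetical helper: tails are cut by rank, not by interpolated thresholds.
    """
    if not 0 < pct <= 50:
        raise ValueError("pct must be in (0, 50]")
    ranked = sorted(values, key=values.get)  # ascending by score
    n = max(1, math.ceil(len(ranked) * pct / 100))  # at least one compound per tail
    return ranked[:n], ranked[-n:]


def attachment_change(before, after):
    """Change in attachment value for compounds present in both periods."""
    return {cpd: after[cpd] - before[cpd] for cpd in before.keys() & after.keys()}


# Toy usage with made-up values (not real SureChemBL data).
scores = {"SCHEMBL1": 0.1, "SCHEMBL2": 5.0, "SCHEMBL3": 2.5, "SCHEMBL4": 9.0}
bottom, top = percentile_extremes(scores, 25)  # bottom == ["SCHEMBL1"], top == ["SCHEMBL4"]
```

The same selection can be run on the output of `attachment_change` to pick compounds by change in attachment value rather than by the value itself.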
Calculates assembly values from a pool of sampled SureChemBL compounds. These compounds are currently sampled from:
- Compounds added to the database in a given month that were not previously found in the database ("new compounds")
- All compounds present in the database in a given month
The data used is in Data/Assembly_Values/; the samples were generated in cpd_analysis.py.
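The two sampling pools can be sketched as follows. Assumptions: each compound's first appearance is stored as a `"YYYY-MM"` string, and compounds are never removed from the database once added. All function names are hypothetical and do not come from `cpd_analysis.py`.

```python
import random


def new_compounds_by_month(first_seen):
    """Group compounds by the month they first appear ("new compounds").

    first_seen: dict mapping compound ID -> "YYYY-MM" month of first appearance.
    """
    by_month = {}
    for cpd, month in first_seen.items():
        by_month.setdefault(month, set()).add(cpd)
    return by_month


def present_by_month(first_seen, month):
    """All compounds in the database by `month`.

    Assumes compounds are never removed; "YYYY-MM" strings sort chronologically.
    """
    return {cpd for cpd, m in first_seen.items() if m <= month}


def sample_compounds(pool, k, seed=0):
    """Draw up to k compounds without replacement.

    The pool is sorted first so a fixed seed always gives the same sample.
    """
    rng = random.Random(seed)
    ordered = sorted(pool)
    return rng.sample(ordered, min(k, len(ordered)))
```

The "new compounds" pool for a month is `new_compounds_by_month(first_seen)[month]`, and the "all compounds" pool is `present_by_month(first_seen, month)`. Both can be passed to `sample_compounds`.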
Builds a bipartite, undirected network of the SureChemBL database using iGraph.
Data used is in Data/SureChemblMAP/, Data/CpdPatentIdsDates/, and Data/Cpd_Data/.
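The repository builds the network with iGraph. The dependency-free sketch below only illustrates the structure of such a network. It assumes the input is a list of (compound ID, patent ID) pairs. Nodes are tagged by class so that compound and patent IDs can never collide, and each edge is stored in both directions because the graph is undirected. In python-igraph, the equivalent would use a boolean vertex type per node (for example via `Graph.Bipartite`). The helper names here are hypothetical.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Build an undirected bipartite graph as an adjacency dict.

    pairs: iterable of (compound_id, patent_id) tuples (hypothetical input layout).
    Nodes are tagged ("cpd", id) or ("pat", id); duplicate pairs collapse to one edge.
    """
    adj = defaultdict(set)
    for cpd, pat in pairs:
        c, p = ("cpd", cpd), ("pat", pat)
        adj[c].add(p)  # undirected: store the edge in both directions
        adj[p].add(c)
    return dict(adj)


def is_bipartite_by_type(adj):
    """Check that every edge joins a compound node to a patent node."""
    return all(u[0] != v[0] for u, nbrs in adj.items() for v in nbrs)
```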
Calculates statistics over compound data, tracks compounds, and samples compounds. It does not use the network; instead, it uses a dataframe of all SureChemBL compounds stored in Cpd_Data/SureChemBL_allCpds.p.
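The README does not define what "tracks compounds" means. One plausible reading is counting, month by month, how many patents mention a given compound. The sketch below works on plain tuples instead of the repository's dataframe. Its record layout and function name are hypothetical.

```python
from collections import Counter


def track_compound(records, compound):
    """Count distinct patents mentioning `compound` in each month.

    records: iterable of (compound_id, patent_id, "YYYY-MM") tuples (hypothetical layout).
    Duplicate records are ignored. Returns {month: count} in chronological order.
    """
    counts = Counter(month for cpd, _pat, month in set(records) if cpd == compound)
    return dict(sorted(counts.items()))
```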
Builds monthly subgraphs and calculates network statistics (average degree, degree distribution, clustering coefficient, etc.) across compounds and patents within these subgraphs.
Subgraphs are saved in Graphs/, degree distributions in Degrees/Months/, and network statistics in NetworkStats/.
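A monthly subgraph and its per-class degree statistics could be computed as sketched below. Two assumptions apply. First, each (compound, patent) edge carries the patent's `"YYYY-MM"` month. Second, a "monthly subgraph" keeps only that month's edges; the repository may use cumulative subgraphs instead. Clustering is omitted. In a purely bipartite graph, the triangle-based clustering coefficient is always zero, so bipartite-specific definitions or one-mode projections are normally used. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_subgraph(dated_edges, month):
    """Keep only (compound, patent) edges dated to `month` ("YYYY-MM")."""
    adj = defaultdict(set)
    for cpd, pat, m in dated_edges:
        if m == month:
            adj[("cpd", cpd)].add(("pat", pat))  # undirected edge, both directions
            adj[("pat", pat)].add(("cpd", cpd))
    return adj


def degree_stats(adj, kind):
    """Average degree and degree distribution for one node class ("cpd" or "pat")."""
    degrees = [len(nbrs) for node, nbrs in adj.items() if node[0] == kind]
    if not degrees:
        return 0.0, {}
    dist = Counter(degrees)  # degree -> number of nodes with that degree
    return sum(degrees) / len(degrees), dict(sorted(dist.items()))
```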
Calculates preferential attachment across time periods within the SureChemBL dataset.
Data used is in Degree/Months/, and the calculations are saved in `pref_attach_dict*` files in the base Data directory.
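The README does not describe how preferential attachment is measured. A common empirical approach compares two degree snapshots: for each starting degree k, it averages how many new links nodes of that degree gained during the period. The sketch below implements that approach as an assumption, not as the repository's method. It assumes degrees are given as node-to-degree dicts for the start and end of a period, and that a node missing from the end snapshot gained nothing. `attachment_rate` is a hypothetical name.

```python
from collections import defaultdict


def attachment_rate(deg_start, deg_end):
    """Mean degree gain per starting degree between two snapshots.

    deg_start, deg_end: dicts mapping node ID -> degree at the start/end of a period.
    Only nodes with starting degree >= 1 are counted; a node missing from
    deg_end is treated as having gained no links.
    Returns {k: mean gain} sorted by k.
    """
    gains = defaultdict(list)
    for node, k in deg_start.items():
        if k >= 1:
            gains[k].append(deg_end.get(node, k) - k)
    return {k: sum(v) / len(v) for k, v in sorted(gains.items())}
```

If the mean gain rises roughly in proportion to k, the data are consistent with linear preferential attachment. A flat profile would instead suggest degree-independent attachment.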