-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdata flow.txt
86 lines (58 loc) · 3.23 KB
/
data flow.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
Local upload --> file export
Myfair --> --> data reciever --> data processing --> data sender --> myFAIR
JUNIPER --> JUNIPER
data reciever
from_local
from_myfair
from_juniper
data processing
Juniper PandasDataFrame Wrapper (can use this methods)
data_sender
to_local
to_myFAIR
to_juniper
packaging ?
visualisation first
make powerpoint of plots
upload data --> pandas reads in dataframe --> django send dataframe to doval api -->
doval api gets table element and web components --> django sends web componentns to template
class
Viral
Bacterial
1d data visualisation aproach:
check datatypes from pandas df ->
-> string data or bool data: make tuple of header and string -> count occurences of each string in dataset -> create bar chart of number of occurences.
-> integer or float date: count unique values -> calculate amount of bins to group data into (datapoints /10), minimal 10. -> group all datapoints into bins -> display bins as bar chart.
options: bar chart
2d data visualisation:
options: line plot, box and whiskers, scatter plot, bar, pie chart, venn?
3d data visualisation:
options: heatmap, scatter with histogram, 3d scatter, stacked bar, multi-line, combinations?
Hooking the notebook up to diferent sources:
can be used by a single user inside of jupyter
can be used by multi users inside of myfair
can be used by multiple users inside of galaxy
doval linked to seek:
- take id from seek with dataset
- export id to seek with dataset (seek api)
make powerpoint with test cases.
- show data formatting
- show data transformation
- show data filter
for user management:
create new session for id, no id is blank page
work on exporter functionality
gave feedback on canvasxpress github
visualisation tool: if someone shares his dataset by excel you don't know what happened to it. in this you can track what you did.
if someone wants to reproduce your r project he needs to install all your packages, in the right version number first then it can still be different. with this tool you can check your dataset without installing anything.
canvasx v 23.5 04/04. Author responded to questions fix to some problems coming in 23.6.
why is it better than existing tools?
canvas 23.6 07/04 Author fixed the reported problems.
Hierarchical pivot
Pivot tables are used to summarise datasets based on multiple parameters. While implementations of pivot tables do exist (eg. Pandas Pivot table functionality)
there is no existing implementation for a pivot table using a hierarchical structure for element summarisation.
We developed a pivot table using a hierarchical structure, for this we use the pandas dataframe as base and a list of features to pivot on.
The hierarchical pivot function will group the data using the pandas groupby functionality and sumerize the result.
The resulting object contains a object of class Multiindex which summerizes all the values in the dataset.
We loop through the values, count the total, and check if there is a next layer in the dataset. If the next layer exists the function calls itself and repeats until no next layer is found.
By doing this a hierarchical structure is created where each value will have multiple subvalues unless it is a value in the last level of the multiindex.