Some adaptations to better support new research studies #7

Open
wants to merge 51 commits into master
Commits (51)
268eef9  Some adaptations to better support new research studies (mellelieuwes, Apr 11, 2022)
b986614  rename whatsapp folder (ShNadi, May 24, 2022)
92d4655  Initial commit (ShNadi, May 24, 2022)
97930ea  initial commit (ShNadi, May 24, 2022)
476e515  Rename test_whatsapp to test_whatsapp_chat (ShNadi, May 24, 2022)
0852f27  Initial commit (ShNadi, May 24, 2022)
e62987b  Initial commit (ShNadi, May 24, 2022)
598406c  Add test data (ShNadi, May 24, 2022)
2dae4f4  Remove main (ShNadi, May 24, 2022)
99b3442  Remove main (ShNadi, May 24, 2022)
d829bda  factorizing the sensetive data instead of hashing (ShNadi, May 31, 2022)
6fdde5b  Factorize the sensetive data instead of hashing (ShNadi, May 31, 2022)
cfbc84b  Factorize username, reply_2, and rely_2_user columns instead of hashing (ShNadi, May 31, 2022)
5840ec5  Resolve conflict (ShNadi, May 31, 2022)
254fd8d  Factorize sensetive columns (ShNadi, May 31, 2022)
6a1aaa2  Add missed return to the anonymize_participants() (ShNadi, May 31, 2022)
43c0808  Rename imports (ShNadi, May 31, 2022)
ede47cc  Rename anonymized columns (ShNadi, May 31, 2022)
3d15d55  Merge pull request #11 from sodascience/v2 (mellelieuwes, Jun 14, 2022)
c3e2bcd  Add first/last message date (ShNadi, Jun 15, 2022)
4d65e0d  Add first/last message date (ShNadi, Jun 15, 2022)
9fe508a  Merge pull request #1 from sodascience/v3 (parisa-zahedi, Jun 21, 2022)
dbf50ae  change output format (parisa-zahedi, Jun 30, 2022)
9a17cd7  Remove group's name (ShNadi, Jul 1, 2022)
435e111  adapt to new output format (ShNadi, Jul 1, 2022)
eb6a596  Replace set with unique (ShNadi, Jul 1, 2022)
5014770  remove system messages (parisa-zahedi, Jul 3, 2022)
986fc08  add reply_2_user and user_reply_2 (parisa-zahedi, Jul 7, 2022)
faedf24  Add reply_2 (ShNadi, Jul 7, 2022)
ffe7e53  Add test for new output format, reply2, and remove group_name (ShNadi, Jul 7, 2022)
2ea702b  Add groupname (ShNadi, Jul 7, 2022)
3f934fa  Add groupname (ShNadi, Jul 7, 2022)
a9234d2  change format (ShNadi, Jul 7, 2022)
f169f16  change format (ShNadi, Jul 7, 2022)
da3c616  Remove main() (ShNadi, Jul 7, 2022)
1c75497  pylint check (ShNadi, Jul 7, 2022)
3e0b9ce  run poetry (ShNadi, Jul 7, 2022)
7779e87  Merge pull request #3 from sodascience/v4 (parisa-zahedi, Jul 7, 2022)
43b8c12  rename reply_2user and user_reply2 fields (parisa-zahedi, Jul 8, 2022)
0bdc58b  add group_name and system message logs (parisa-zahedi, Jul 8, 2022)
117673a  Merge pull request #4 from sodascience/v2 (parisa-zahedi, Jul 8, 2022)
4cdd582  Merge pull request #5 from sodascience/v2 (parisa-zahedi, Jul 8, 2022)
b8fa592  Merge pull request #14 from sodascience/v4 (mellelieuwes, Jul 11, 2022)
ccc6320  extend media, location and url regex (parisa-zahedi, Jul 11, 2022)
41c48c8  fix pylint error in regex (parisa-zahedi, Jul 11, 2022)
b9fec5c  Merge pull request #16 from sodascience/newformat (mellelieuwes, Jul 11, 2022)
21de258  Merge pull request #18 from sodascience/media_added (mellelieuwes, Jul 11, 2022)
0a1bac5  Fix pylint errors (mellelieuwes, Jul 11, 2022)
9587167  Update pilot version (#21) (ShNadi, Aug 28, 2022)
598cf66  Fixed support for filename matching (#20) (mellelieuwes, Aug 28, 2022)
3cb1ab2  Pilot- fix dash (#22) (parisa-zahedi, Sep 19, 2022)
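
Several of the commits above replace hashing of sensitive columns (username, reply_2, reply_2_user) with factorization. The following is a minimal sketch of that idea using pandas; the toy DataFrame and its values are illustrative, not the project's actual anonymization code:

import pandas as pd

# Toy chat export; only the column names echo those mentioned in the commits.
df = pd.DataFrame({
    "username": ["alice", "bob", "alice", "carol"],
    "message": ["hi", "hello", "bye", "hey"],
})

# pd.factorize assigns each distinct value a small integer code, giving a
# consistent pseudonym per participant without storing a hash of the raw name.
codes, _uniques = pd.factorize(df["username"])
df["username"] = codes
print(df)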
Binary file added .DS_Store
17 changes: 0 additions & 17 deletions data_extractor/data_extractor/__init__.py

This file was deleted.

20 changes: 20 additions & 0 deletions data_extractor/example/__init__.py
@@ -0,0 +1,20 @@
__version__ = '0.2.0'

import zipfile
import pandas as pd


def process(file_data):
    names = []
    zfile = zipfile.ZipFile(file_data)
    data = []
    for name in zfile.namelist():
        names.append(name)
        info = zfile.getinfo(name)
        data.append((name, info.compress_size, info.file_size))

    return [{
        "id": "overview",
        "title": "The following files were read:",
        "data_frame": pd.DataFrame(data, columns=["filename", "compressed size", "size"])
    }]
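
A quick local check of this entry point could look like the sketch below. It assumes the data_extractor directory is on the Python path; "donation.zip" is a placeholder for any local zip archive and is not a file shipped with the repository:

from example import process  # the package added in this PR

# Each result block carries an id, a title, and a pandas DataFrame.
results = process("donation.zip")  # placeholder path to a local zip file
for block in results:
    print(block["id"], "-", block["title"])
    print(block["data_frame"])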
11 changes: 11 additions & 0 deletions data_extractor/example/index.html
@@ -0,0 +1,11 @@
<script src="../pyprocess.js" data-script="./example/__init__.py"></script>
<div id="controls">
    <p id="loading-indicator">Loading...</p>
    <input type="file" id="fileItem" disabled onChange="toggleProcessButton()" />
    <button onClick="process()" id="process" disabled>Process</button>
</div>

<div id="results" style="display: none">
    <p id="summary" />
    <div id="html" />
</div>
11 changes: 11 additions & 0 deletions data_extractor/google_search_history/index.html
@@ -0,0 +1,11 @@
<script src="../pyprocess.js" data-script="./google_search_history/__init__.py"></script>
<div id="controls">
    <p id="loading-indicator">Loading...</p>
    <input type="file" id="fileItem" disabled onChange="toggleProcessButton()" />
    <button onClick="process()" id="process" disabled>Process</button>
</div>

<div id="results" style="display: none">
    <p id="summary" />
    <div id="html" />
</div>
10 changes: 5 additions & 5 deletions data_extractor/google_search_history/simulation_gsh.py
@@ -157,16 +157,16 @@ def browserhistory(num: int, site_diff: float, time_diff: bool,
     parts = _create_bins(num)
     # create browserhistory data
     results = []
-    for moment in PERIODS:
+    for moment, period in PERIODS.items():
         # simulate dates
         if moment == 'during':
             perc = 0.15+site_diff
-            dates = _create_date(num=parts[moment], start=PERIODS[moment][0],
-                                 end=PERIODS[moment][1], time_perc=time_diff)
+            dates = _create_date(num=parts[moment], start=period[0],
+                                 end=period[1], time_perc=time_diff)
         else:
             perc = 0.15
-            dates = _create_date(num=parts[moment], start=PERIODS[moment][0],
-                                 end=PERIODS[moment][1], time_perc=0)
+            dates = _create_date(num=parts[moment], start=period[0],
+                                 end=period[1], time_perc=0)
         # simulate website URLs
         url = _create_website(num=parts[moment], perc=perc, fake=fake)
         for i in range(parts[moment]):
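
The change above swaps repeated index lookups on PERIODS for iteration with .items(), so each period's (start, end) pair is unpacked once per loop. A standalone sketch of the pattern, with made-up period values rather than the module's real PERIODS:

from datetime import date

# Hypothetical stand-in for the module's PERIODS mapping: period name -> (start, end).
PERIODS = {
    "before": (date(2020, 1, 1), date(2020, 3, 1)),
    "during": (date(2020, 3, 1), date(2020, 6, 1)),
    "after": (date(2020, 6, 1), date(2020, 9, 1)),
}

# .items() yields the key and value together, so there is no need to
# re-index PERIODS[moment][0] and PERIODS[moment][1] inside the loop body.
for moment, (start, end) in PERIODS.items():
    print(f"{moment}: {start} .. {end}")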
11 changes: 11 additions & 0 deletions data_extractor/google_semantic_location_history/index.html
@@ -0,0 +1,11 @@
<script src="../pyprocess.js" data-script="./google_semantic_location_history/__init__.py"></script>
<div id="controls">
    <p id="loading-indicator">Loading...</p>
    <input type="file" id="fileItem" disabled onChange="toggleProcessButton()" />
    <button onClick="process()" id="process" disabled>Process</button>
</div>

<div id="results" style="display: none">
    <p id="summary" />
    <div id="html" />
</div>
@@ -180,7 +180,7 @@ def fake_data(json_file, seed=0):
     places = _create_places(total=max(NPLACES.values()))
 
     # Get json schema from json file
-    with open(json_file) as file_object:
+    with open(json_file, encoding="utf-8") as file_object:
         json_data = json.load(file_object)
     json_schema = get_json_schema(json_data)
 
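
The only change in this last hunk passes an explicit encoding to open(), so reading the JSON schema no longer depends on the platform's default encoding (and it satisfies pylint checks such as unspecified-encoding). A minimal sketch of the same pattern in isolation; load_json and the path are illustrative, not part of the repository:

import json

def load_json(path):
    # An explicit encoding makes the read reproducible across platforms.
    with open(path, encoding="utf-8") as file_object:
        return json.load(file_object)

data = load_json("example.json")  # placeholder path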