Mining in Social Networks
Nowadays, social networks play a very relevant role in the spread of information. Their users are constantly publishing posts on the most varied subjects, from trivialities and everyday events to matters of greater relevance such as politics and science. The circulation of this information has been growing exponentially, along with the complexity of the network involved in its propagation, and as such several fields of study are dedicating themselves to solving problems related to this topic. More recently, the theme of "fake news" has gained widespread media attention, making its resolution a problem of great interest.
https://detiuaveiro.github.io/social-network-mining/
Web app developed using the template: https://coreui.io/react/
To run, first install dependencies:
$ cd web-app
$ npm install
Then start the web app on port 3000 with:
$ npm start
- instaloader
pip3 install -r requirements.txt
python3 insta.py
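For reference, a minimal instaloader sketch showing how profile data can be fetched (the username `nasa` is only a placeholder; `insta.py` in this repository is the crawler actually used):

```python
import instaloader

# Create an anonymous Instaloader session (login is optional)
loader = instaloader.Instaloader()

# Fetch a public profile by username ("nasa" is only a placeholder)
profile = instaloader.Profile.from_username(loader.context, "nasa")
print(profile.username, profile.followers)

# Iterate over the profile's posts, newest first
for post in profile.get_posts():
    print(post.date, post.caption)
    break  # stop after the first post to keep the example short
```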
Arch Linux | Debian |
---|---|
sudo pacman -S postgresql | sudo apt update; sudo apt install postgresql postgresql-contrib |
$ sudo mkdir /var/lib/postgres/data
$ sudo chown postgres /var/lib/postgres/data
$ sudo -i -u postgres
$ initdb -D '/var/lib/postgres/data'
$ sudo systemctl start postgresql
$ sudo su postgres -c psql
# CREATE USER postgres WITH PASSWORD 'password';
# ALTER ROLE postgres WITH CREATEDB;
# CREATE DATABASE policies;
# CREATE DATABASE postgres;
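To confirm the role and databases were created, a quick connection check from Python; a minimal sketch assuming psycopg2 is installed and the credentials above:

```python
import psycopg2

# Connect with the role and database created above
conn = psycopg2.connect(
    host="localhost",
    dbname="policies",
    user="postgres",
    password="password",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")  # simple sanity query
    print(cur.fetchone()[0])
conn.close()
```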
Arch Linux | Debian |
---|---|
yay mongo | tutorial |
$ sudo systemctl enable mongodb
$ mongo
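To verify the installation from Python, a minimal sketch assuming pymongo is installed and MongoDB is listening on the default port:

```python
from pymongo import MongoClient

# Connect to the local MongoDB instance started above
client = MongoClient("mongodb://localhost:27017/")

# Listing the databases confirms the server is reachable
print(client.list_database_names())
```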
Arch Linux | Debian |
---|---|
tutorial | tutorial |
- On Debian:
$ sudo systemctl enable neo4j
- On Arch Linux:
$ docker run neo4j
$ neo4j console
$ cypher-shell # to set new password
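To check the connection from Python, a minimal sketch using the official neo4j driver (the password is a placeholder; use the one set via cypher-shell):

```python
from neo4j import GraphDatabase

# bolt://localhost:7687 is the default Bolt endpoint; the password
# below is a placeholder for the one set with cypher-shell
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run("RETURN 1 AS ok")  # trivial sanity query
    print(result.single()["ok"])

driver.close()
```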
- Installation and setup:
$ sudo apt-get install tor # installation on Debian systems
$ sudo pacman -S tor # installation on Arch systems
$ sudo systemctl enable tor # on the deployment server it is recommended to enable the service instead of starting it each time the machine boots
- On the server side, it's necessary to run a new `tor` service for each bot we have. For each new bot, create a file /etc/tor/torrc.{1..} with the following content (note that it's necessary to change the ports and the directory number for each new bot). Then, on the bots, we have to connect to the port defined in `SocksPort`:
SocksPort 9060
ControlPort 9061
DataDirectory /var/lib/tor1
- On the server, it is necessary to run the bots with the environment variable `PROXY` set to the proxy address (the default value is localhost), as sketched below.
- More info about how to configure Tor with Python can be found at the link.
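As an illustration of how a bot might use the `PROXY` variable, a minimal sketch that routes HTTP traffic through the Tor SOCKS port configured above; it assumes requests with SOCKS support (pip install requests[socks]) and a socks5 URL format for `PROXY`, which may differ from the actual wiring in this repository:

```python
import os

import requests

# Fall back to the local Tor SOCKS port if PROXY is not set
# (the socks5h scheme resolves DNS through the proxy as well)
proxy = os.environ.get("PROXY", "socks5h://127.0.0.1:9060")
proxies = {"http": proxy, "https": proxy}

# Any request made with these proxies goes out through Tor
response = requests.get("https://check.torproject.org/", proxies=proxies)
print(response.status_code)
```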
- First, it's necessary to open a pull request on GitHub with the tag `deploy` containing the code we want to deploy to the server. This will trigger the deploy workflow, which will build new images of the code to be deployed.
- The first time, it's necessary to have all containers pre-created on the server. So, on the server terminal, run:
$ docker container run --env-file ~/PI_2020/env_vars/rest.env --publish 7000:7000 --detach --name rest docker.pkg.github.com/detiuaveiro/social-network-mining/rest # run the rest container
$ docker container run --env-file ~/PI_2020/env_vars/bot.env --network host --detach --name bot docker.pkg.github.com/detiuaveiro/social-network-mining/bot # run the bot container
$ docker container run --env-file ~/PI_2020/env_vars/control_center.env --detach --name control_center docker.pkg.github.com/detiuaveiro/social-network-mining/control_center # run the control center container
- Also, it's necessary to have a `watchtower` container running on the server, which will automatically deploy all the images created by the `deploy` GitHub workflow:
$ docker run --env-file ~/PI_2020/env_vars/watchtower.env -d --name watchtower -v /var/run/docker.sock:/var/run/docker.sock -v ~/.docker/config.json:/config.json containrrr/watchtower
- For the parlai service:
- First, we must have a copy of the ParlAI repository on the server where we want to deploy the service. Then, we must run the command:
$ python examples/interactive.py -m transformer/polyencoder \
    -mf zoo:pretrained_transformers/model_poly/model \
    --encode-candidate-vecs true \
    --eval-candidates fixed \
    --fixed-candidates-path data/models/pretrained_transformers/convai_trainset_cands.txt
- ATTENTION: you must stop this process once it begins to retrain with the given candidates (we just did this step to download an already trained model).
- The next step is to copy the `tweets.txt` file with the tweet candidates to the directory `ParlAI/data/models/pretrained_transformers`. This file can be obtained in the directory `code/backend/twitter/tweets_text/` once you run:
$ python start_cc.py --export_tweets_text # script in the directory code/backend/twitter of this repository; it must be run in a virtual environment with the requirements in requirements_cc.txt installed
- Then, we have to copy to the server the Dockerfile used to build the corresponding image. It can be found in the directory `code/backend/twitter/docker/parlai` of this repository and must be placed in the `ParlAI/` directory on the server.
- It's also necessary to copy the `requirements.txt` from `code/backend/twitter/docker/parlai` to the `ParlAI/` directory on the server.
- At last, you have to build the Docker image and create the corresponding container:
$ docker build -t parlai .
$ docker container run --publish 5555:5555 --restart always --detach --name parlai parlai
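If the parlai container exposes its dialogue service as a ZeroMQ REQ/REP socket on port 5555 (an assumption; check the service code under code/backend/twitter/docker/parlai for the actual protocol), a client could look roughly like this:

```python
import zmq

# Assumes a REQ/REP ZeroMQ socket on port 5555; the actual
# protocol used by the parlai container may differ
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")

socket.send_string("Hello, how are you?")  # message to be answered
print(socket.recv_string())                # generated candidate reply
```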
cd scripts
chmod +x import_databases.sh
./import_databases.sh
- Access
> mongoimport --db twitter --collection tweets --file scripts/mongodb/tweets.json -u user -p password
> mongoimport --db twitter --collection users --file scripts/mongodb/users.json -u user -p password
- Indexation
> db.users.createIndex({id_str: 1}, { unique:true })
> db.users.createIndex({id: 1}, { unique:true })
> db.users.createIndex({screen_name: 1}, { unique:true })
> db.tweets.createIndex({id: 1}, { unique:true })
> db.tweets.createIndex({id_str: 1}, { unique:true })
> db.tweets.createIndex({protected: 1}, { unique:false })
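If the indexes need to be created programmatically instead, a minimal pymongo sketch equivalent to the shell commands above:

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client.twitter

# Same indexes as the mongo shell commands above
db.users.create_index([("id_str", ASCENDING)], unique=True)
db.users.create_index([("id", ASCENDING)], unique=True)
db.users.create_index([("screen_name", ASCENDING)], unique=True)
db.tweets.create_index([("id", ASCENDING)], unique=True)
db.tweets.create_index([("id_str", ASCENDING)], unique=True)
db.tweets.create_index([("protected", ASCENDING)])  # non-unique
```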
- Access
psql -U postgres_pi twitter -h localhost < scripts/postgresql/twitter.pgsql
- Modifications to the initial database:
- Add a new column for protected users on table `users`:
-- Add a column to the users table indicating whether the user is protected or not
ALTER TABLE users ADD COLUMN protected BOOLEAN DEFAULT False;
- Change the `id` columns on PostgreSQL from `int` to `numeric` (because of possible overflow):
alter table logs alter column id_bot type numeric;
alter table logs alter column target_id type numeric;
alter table tweets alter column tweet_id type numeric;
alter table tweets alter column user_id type numeric;
alter table users alter column user_id type numeric;
alter table policies alter column bots type numeric[];
- Import:
CALL apoc.load.json("user_nodes.json")
YIELD value
MERGE (p:User {name: value.a.properties.name, id: value.a.properties.id, username: value.a.properties.username})
CALL apoc.load.json("bots_nodes.json")
YIELD value
MERGE (p:Bot {name: value.a.properties.name, id: value.a.properties.id, username: value.a.properties.username})
CALL apoc.load.json("tweets.json")
YIELD value
MERGE (p:Tweet {id: value.a.properties.id})
CALL apoc.load.json("follow_rel.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:FOLLOWS]->(u)
CALL apoc.load.json("retweet.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:RETWEETED]->(u)
CALL apoc.load.json("reply.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:REPLIED]->(u)
CALL apoc.load.json("wrote.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:WROTE]->(u)
CALL apoc.load.json("quote.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:QUOTED]->(u)
- Export:
call apoc.export.json.query("match (start) - [r:QUOTED] ->(end) return start, r, end", "quote.json")
call apoc.export.json.query("match (start) - [r:WROTE] ->(end) return start, r, end", "write.json")
call apoc.export.json.query("match (start) - [r:RETWEETED] ->(end) return start, r, end", "retweet.json")
call apoc.export.json.query("match (start) - [r:FOLLOWS] ->(end) return start, r, end", "follow_rel.json")
call apoc.export.json.query("match (start) - [r:REPLIED] ->(end) return start, r, end", "reply.json")
call apoc.export.json.query("match (a:Tweet) return a", "tweets.json")
call apoc.export.json.query("match (a:User) return a", "user_nodes.json")
call apoc.export.json.query("match (a:Bot) return a", "bots_nodes.json")
- Indexation
// create index on user id
CREATE CONSTRAINT user_id
ON (u:User)
ASSERT u.id IS UNIQUE
// create index on tweet id
CREATE CONSTRAINT tweet_id
ON (t:Tweet)
ASSERT t.id IS UNIQUE
// create index on bot id
CREATE CONSTRAINT bot_id
ON (b:Bot)
ASSERT b.id IS UNIQUE
// create index on bot username
CREATE CONSTRAINT bot_username
ON (b:Bot)
ASSERT b.username IS UNIQUE
// create index on user username
CREATE CONSTRAINT user_username
ON (u:User)
ASSERT u.username IS UNIQUE