A scraper for the data made available by the Italian Senate, and a cluster analysis to detect similar amendments.

jacquerie/senato.py

senato.py

Automated Clustering of Similar Amendments in the Italian Senate

The problem

The Italian Senate is under a Denial-of-Service attack. Software is being used to generate millions of amendments to block the passing of certain laws. The amendments are generated using a black-hat technique that produces several variations of a given text. This puts a huge strain on the Senate, which has to discuss and vote on the individual amendments, effectively bringing proceedings to a standstill.

The solution

The Italian Senate makes its data available publicly. An automated clustering analysis can be performed on these data to eliminate what are essentially duplicate amendments and reduce the total number of amendments that have to be considered.

[Figure: clusters.png]

senato.py is a scraper for data from the Senate. The data can be analysed using the Jupyter notebook provided in this repository.
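The clustering idea can be illustrated with a minimal, standard-library-only sketch. This is not the method used in the notebook, which this README does not describe. It assumes that near-duplicate amendments share most of their word sequences, and it groups texts whose sets of three-word shingles have a Jaccard similarity above a threshold. The function names (`shingles`, `jaccard`, `cluster`), the threshold value and the sample texts are all hypothetical.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(texts, threshold=0.5):
    """Group indices of texts that are linked by pairwise similarity >= threshold."""
    parent = list(range(len(texts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(t) for t in texts]
    # Compare every pair; merge the two groups when the texts are similar enough.
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from the Senate data.
    amendments = [
        "In article 1 replace the word marriage with the word union",
        "In article 1 replace the word marriage with the word partnership",
        "In article 5 delete paragraph 3",
    ]
    for group in cluster(amendments):
        print([amendments[i] for i in group])
```

The pairwise comparison is quadratic in the number of texts, so this sketch is only suitable for small samples. A corpus with millions of amendments would need a scalable approach such as vectorising the texts and using a dedicated clustering library.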

Installation and Usage

  1. Clone this repository: git clone https://github.com/jacquerie/senato.py.git
  2. Install the dependencies: cd senato.py && pip install -r requirements.txt
  3. Fetch the amendments by running the scraper: scrapy crawl cirinna
  4. Examine the analysis by running the notebook: jupyter notebook cirinna.ipynb

About senato.py

senato.py is authored by Jacopo Notarstefano (@Jaconotar). You can learn more about it by watching this short "lightning talk" given by Jacopo at CERN on 17 June 2016.

License

MIT
