Project for the Systems and Methods for Big and Unstructured Data course held at Politecnico di Milano in the academic year 2022/2023. The aim of the project is to design and implement NoSQL databases for different scenarios.
Teacher: Marco Brambilla
This project aims to build databases, each with a different technology, that handle the scientific articles contained in the DBLP bibliography. The focus is on creating a database that allows efficient retrieval of information about the articles.
Design, store and query graph data structures in a NoSQL DB for the DBLP bibliography.
Tasks to perform:
- Design conceptual model
- Store a sample dataset in Neo4j
- Write basic data creation/update commands (minimum 5)
- Write basic queries (minimum 10)
- Check complexity and performance time
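The graph tasks above can be illustrated with a minimal sketch. It assumes a hypothetical schema, `(:Author {name})-[:AUTHORED]->(:Article {key, title, year})`, which is not necessarily the conceptual model used in the project; the `Article` class, the `creation_statements` helper and the sample record are likewise invented for illustration. The code only builds parameterized Cypher statements as text, so it runs without a Neo4j instance.

```python
from dataclasses import dataclass

# Hypothetical graph schema (an assumption, not the project's actual model):
#   (:Author {name})-[:AUTHORED]->(:Article {key, title, year})


@dataclass(frozen=True)
class Article:
    key: str                  # DBLP-style record key (made-up value below)
    title: str
    year: int
    authors: tuple[str, ...]


def creation_statements(article: Article) -> list[tuple[str, dict]]:
    """Return (Cypher text, parameters) pairs that create an article and its authors."""
    statements = [(
        # MERGE avoids duplicating the article if the statement is re-run.
        "MERGE (p:Article {key: $key}) SET p.title = $title, p.year = $year",
        {"key": article.key, "title": article.title, "year": article.year},
    )]
    for name in article.authors:
        statements.append((
            "MATCH (p:Article {key: $key}) "
            "MERGE (a:Author {name: $name}) "
            "MERGE (a)-[:AUTHORED]->(p)",
            {"key": article.key, "name": name},
        ))
    return statements


# Example read query: titles of the articles by a given author, newest first.
ARTICLES_BY_AUTHOR = (
    "MATCH (a:Author {name: $name})-[:AUTHORED]->(p:Article) "
    "RETURN p.title AS title, p.year AS year ORDER BY p.year DESC"
)

sample = Article("journals/example/Doe22", "An Example Title", 2022,
                 ("Jane Doe", "John Roe"))
for text, params in creation_statements(sample):
    print(text, params)
```

Keeping the values in a separate parameter dictionary, rather than formatting them into the query text, lets the database reuse the query plan and avoids quoting problems in titles. With the official Neo4j Python driver, each pair could be passed to `session.run(text, params)`.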
Design, store and query document data structures in a NoSQL DB for the DBLP bibliography.
Tasks to perform:
- Design conceptual model
- Store a sample dataset in MongoDB
- Write basic data creation/update commands (minimum 5)
- Write basic queries (minimum 10)
- Check complexity and performance time
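As a rough sketch of the document tasks above, the snippet below shows one possible document shape together with a filter, an update and an aggregation pipeline, all written as plain Python dictionaries. The field names and sample values are assumptions made for illustration, not the project's actual model. Nothing here connects to a server; the dictionaries are only printed.

```python
import json

# Hypothetical document shape (an assumption): authors are embedded in the
# article so that a single read returns everything needed to display it.
article_doc = {
    "_id": "journals/example/Doe22",
    "title": "An Example Title",
    "year": 2022,
    "authors": [{"name": "Jane Doe"}, {"name": "John Roe"}],
    "keywords": ["nosql", "graphs"],
}

# Filter document: articles from 2020 onwards with a given embedded author.
recent_by_author = {"year": {"$gte": 2020}, "authors.name": "Jane Doe"}

# Update document: add a keyword only if it is not already in the array.
add_keyword = {"$addToSet": {"keywords": "big data"}}

# Aggregation pipeline: number of articles per author, ten most prolific first.
articles_per_author = [
    {"$unwind": "$authors"},                                   # one row per (article, author)
    {"$group": {"_id": "$authors.name", "articles": {"$sum": 1}}},
    {"$sort": {"articles": -1}},
    {"$limit": 10},
]

print(json.dumps(article_doc, indent=2))
print(json.dumps(articles_per_author))
```

With PyMongo, these objects could be passed to `collection.insert_one(article_doc)`, `collection.find(recent_by_author)`, `collection.update_one({"_id": article_doc["_id"]}, add_keyword)` and `collection.aggregate(articles_per_author)`. Embedding authors favors read speed; a model with a separate authors collection would trade that for easier author updates.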
Design, store and query data structures in a NoSQL DB for the DBLP bibliography using Spark.
Tasks to perform:
- Design conceptual model
- Store a sample dataset in Spark
- Write basic data creation/update commands (minimum 5)
- Write 10 queries that meet the following requirements (expressed through their SQL equivalents for simplicity):
  - WHERE, JOIN
  - WHERE, LIMIT, LIKE
  - WHERE, IN, Nested Query
  - GROUP BY, 1 JOIN, AS
  - WHERE, GROUP BY
  - GROUP BY, HAVING, AS
  - WHERE, GROUP BY, HAVING, AS
  - WHERE, Nested Query (i.e., 2-step queries), GROUP BY
  - WHERE, GROUP BY, HAVING, 1 JOIN
  - WHERE, GROUP BY, HAVING, 2 JOINs
- Check complexity and performance time
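To make one of the query requirements listed above concrete, here is a small sketch of the "WHERE, GROUP BY, HAVING, 1 JOIN" case, with a simple wall-clock measurement for the performance check. It deliberately uses Python's built-in SQLite as a stand-in engine so that it runs without a Spark installation; it is not Spark code. The table names, column names and rows are made up for illustration.

```python
import sqlite3
import time

# SQLite is only a stand-in engine here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE authorship (article_id INTEGER, author TEXT);
    INSERT INTO article VALUES
        (1, 'Paper A', 2019), (2, 'Paper B', 2021), (3, 'Paper C', 2022);
    INSERT INTO authorship VALUES
        (1, 'Jane Doe'), (2, 'Jane Doe'), (3, 'Jane Doe'),
        (2, 'John Roe'), (3, 'Ann Poe');
""")

# Requirement "WHERE, GROUP BY, HAVING, 1 JOIN": authors with at least two
# articles published from 2020 onwards.
QUERY = """
    SELECT s.author AS author, COUNT(*) AS n_articles
    FROM authorship AS s
    JOIN article AS a ON a.id = s.article_id
    WHERE a.year >= 2020
    GROUP BY s.author
    HAVING COUNT(*) >= 2
"""

# Time a single execution; for real measurements, repeat the query several
# times and report an average, since the first run includes warm-up costs.
start = time.perf_counter()
rows = conn.execute(QUERY).fetchall()
elapsed = time.perf_counter() - start

print(rows)
print(f"query took {elapsed * 1000:.3f} ms")
```

In PySpark, a statement of this form could presumably be run with `spark.sql(QUERY)` after registering the two DataFrames as temporary views via `createOrReplaceTempView`, or rewritten with the DataFrame methods `join`, `filter`, `groupBy` and `agg`.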
The final version includes:
- Project presentation slides
- LaTeX - IntelliJ
- GraphDB - Neo4j
- Document DB - MongoDB
- Conceptual Models - draw.io