First project in the Udacity Full Stack Web Developer Nanodegree.
The main task of this project is a reporting tool that uses information from a database called news that contains three tables :
- The
authors table
includes information about the authors of articles. - The
articles table
includes the articles themselves. - The
log table includes
one entry for each time a user has accessed the site.
The reporting tool should answer the following questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
logs_analysis.py
- The Python program that connects to the PostgreSQL database, executes the SQL queries and displays the results.README.md
- This read me file.logs_output.txt
- The text output of thelogs_analysis.py
The project code requires the following software:
- Python
- psycopg2
- PostgreSQL
- Vagrant
- virtualbox
1.Install Vagrant and VirtualBox, Please check instructions to install the virtual machine 2.Download or Clone this repository in the /vagrant directory You must finish step 1 first
1.The virtual machine from step 1 If you need to bring the virtual machine back online with $ vagrant up. Then log into it with $ vagrant ssh
2.Download the data from here. Unzip this file after downloading it. The file inside is called newsdata.sql.
To run the reporting tool, you'll need to load the site's data into your local database. To load the data, use the command
psql -d news -f newsdata.sql
3.Creating views
This view is used to show the information about author_title,author_name,author_ID and he total view for each author
CREATE VIEW collect AS
SELECT articles.title AS article_title,
articles.author AS author_id,
authors.name AS author_name,
count(log.path) AS views
FROM log, articles, authors
WHERE log.path LIKE ('%' || articles.slug)
AND authors.id = articles.author
GROUP BY article_title, author_id, author_name
ORDER BY views DESC;
This view show the total number of requests on the website
CREATE VIEW All_requests AS
SELECT date(time), COUNT(*) AS reviewers
FROM log
GROUP BY date(time)
ORDER BY date(time);
This view show the total nymber of bad requests on the website
CREATE VIEW error_requests AS
SELECT date(time), COUNT(*) AS errors
FROM log WHERE status = '404 NOT FOUND'
GROUP BY date(time)
ORDER BY date(time);
This view show the percentage of the bad requests
CREATE VIEW error_rate AS
SELECT All_requests, (100.0*error_requests.errors/All_requests.reviewers) AS percentage
FROM All_requests, error_requests
WHERE All_requests.date = error_requests.date
ORDER BY All_requests.date;
4- Run the Tool
1.From the vagrant directory inside the virtual machine, run logs_analysis.py using:
$ python logs_analysis.py
2-Check the output file for the results output.txt