Brandon Tang edited this page Jan 27, 2020 · 6 revisions

Welcome to the AutoCite wiki!

What is AutoCite?

AutoCite is a Python application that automatically creates citations for websites in either APA or Chicago format. With AutoCite, one can focus less on citations and more on actually writing a paper.

For a quick start, just head over to Autocite.zapto.org to try it out :)

Distributions

There are 4 main ways to use AutoCite:

  1. AutoCite Command Line [AutoCite_CLI.py]
  2. AutoCite Graphical User Interface [AutoCite_GUI.pyw / AutoCite_Win_xxxxxx.exe / AutoCite_Linux_xxxxxx]
  3. AutoCite Private Server [AutoCite_Web]
  4. AutoCite on AWS [Autocite.zapto.org / AutoCite_lambda]

AutoCite Command Line

AutoCite Command Line is a Python application with just the bare essentials.

This is good if you want to use AutoCite as a component of another project or if you just really like command line applications.

Usage

USAGE: AutoCite_CLI.py URL FORMAT
Possible Formats:
    Chicago (default)
    apa

Notes:
    Ensure that URL begins with either "http://" or "https://"

Example

> python AutoCite_CLI.py https://google.com apa
Citing https://google.com...
Citation format set as apa
Google (n.d). Retrieved from https://google.com
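
To use the command line version as a component of another project, one option is to call it as a subprocess. The sketch below is only an illustration: `build_command` and `cite` are hypothetical helper names, and it assumes `AutoCite_CLI.py` sits in the working directory and accepts the format spellings shown in the usage text ("Chicago" and "apa"). It also enforces the note about the URL prefix before running anything.

```python
import subprocess
import sys

# Format names exactly as they appear in the usage text above.
FORMATS = {"chicago": "Chicago", "apa": "apa"}


def build_command(url, fmt="Chicago", script="AutoCite_CLI.py"):
    """Return the argument list for one AutoCite_CLI.py invocation."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL must begin with http:// or https://")
    try:
        cli_format = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unknown citation format: {fmt}") from None
    return [sys.executable, script, url, cli_format]


def cite(url, fmt="Chicago"):
    """Run the CLI once and return everything it printed (hypothetical helper)."""
    result = subprocess.run(
        build_command(url, fmt), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Since the CLI prints progress lines before the citation (as in the example above), a caller would probably want the last non-empty line of the returned text.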

Dependencies

  1. Python 3.7+
  2. Beautiful Soup 4 (Python Module)
  3. Python Date Utility (Python Module)

Installation (use pip instead of pip3 on Windows)

pip3 install bs4 python-dateutil

AutoCite Graphical User Interface

AutoCite_GUI is the fastest and most direct way to create citations as the code is run directly on your own machine (unlike AutoCite on AWS). Simply download the executable for your operating system and run it!

Use this if you want quick citations and dislike having to keep visiting the online page.

To run the Python version of the GUI, install the same dependencies as for AutoCite Command Line.

AutoCite Private Server

AutoCite Private Server is a Flask web server that serves a webpage from which one can use AutoCite. This is useful for creating a home-, company-, or school-wide solution for citation needs.

Use this if you are a system administrator who wants to host AutoCite for others.

Dependencies

  1. Python 3.7+
  2. Beautiful Soup 4 (Python Module)
  3. Python Date Utility (Python Module)
  4. Flask (Python Module)
  5. Gunicorn3 (Linux Package)

Installation (for Debian based Linux distributions)

pip3 install bs4 python-dateutil flask
sudo apt-get install gunicorn3

Quick Start

git clone https://github.com/BrandonTang89/AutoCite.git
cd AutoCite/AutoCite_Web
chmod +x run_deployment_server.sh
./run_deployment_server.sh

AutoCite on AWS

AutoCite on AWS is the primary way that most people will access AutoCite. The webpage is served through static site hosting on AWS Simple Storage Service (S3) with AWS CloudFront caching. AWS Lambda is used for the main citation work. AWS RDS is used as a cache for the citations to reduce latency and the number of AWS Lambda requests.

Flow Chart for AutoCite on AWS

AutoCite on AWS can be broken down into 2 different sections:

  1. Retrieving the static website HTML document of the site
  2. Creating the citations upon the "generate citations" button being clicked

Retrieving the static website HTML document of the site

When the user sends a GET request to autocite.zapto.org or autocite.info.tm, the request is redirected to "https://d2chtxlgatshjb.cloudfront.net/", the domain of the CloudFront distribution in front of the S3 bucket configured for static website hosting. From there, a cached version of the HTML document is served to the user.

CloudFront is used for 2 main reasons:

  1. Transport Layer Security (TLS), which secures the communication with an SSL certificate
  2. Reduced latency as the HTML document is cached at edge servers by AWS

Creating the citations upon the "generate citations" button being clicked

Cache Check

When the "generate citations" button is clicked, the entire batch of citation requests is sent via a POST request to AWS API Gateway, which triggers an AWS Lambda function that connects to the database to check for cached citations.

The database is organised as follows:

database
|
--- autocite_cache (schema)
     |
     --- Records (table)
          |
          | --- hash
          | --- citation

Here, "hash" is formed by computing the SHA-1 hash of the concatenation of the raw URL and the citation format (APA or Chicago). This ensures that the URL and the format together identify a unique citation. Furthermore, hashing the input before it reaches the SQL query helps to protect against SQL injection attacks, since the query only ever sees a fixed-length digest rather than raw user input.
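
As a rough illustration of the key described above, the sketch below builds a SHA-1 digest from the URL and the format. The function name `cache_key` is hypothetical, and the concatenation order (URL first), the lower-casing of the format, and the UTF-8 encoding are assumptions; the actual Lambda code may differ in any of these.

```python
import hashlib


def cache_key(url, fmt):
    """SHA-1 hex digest of the raw URL concatenated with the citation format."""
    # Assumptions: URL first, then the lower-cased format name, UTF-8 encoded.
    raw = url + fmt.lower()
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()
```

Whatever the URL contains, the resulting key is always 40 hexadecimal characters, and the same URL yields a different key for each format.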

"citation" is a column that stores the citation for the URL and citation format used to form the hash. For a Chicago formatted citation, it is stored without the date accessed, because the citation may be retrieved on a different date than the one on which it was cached.
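
The lookup side might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for the RDS database so that it can run anywhere, and the function name `lookup_citation` is hypothetical. The wording and placement of the access date ("Accessed Month DD, YYYY.") are assumptions made for illustration, not necessarily what AutoCite produces.

```python
import sqlite3
from datetime import date


def lookup_citation(conn, key, fmt, today=None):
    """Return the cached citation for key, or None if it is not cached."""
    row = conn.execute(
        "SELECT citation FROM Records WHERE hash = ?", (key,)
    ).fetchone()
    if row is None:
        return None  # cache miss: the URL still has to be cited
    citation = row[0]
    if fmt.lower() == "chicago":
        # The stored text has no access date, so add the retrieval date here.
        # Assumed wording; the real output format may differ.
        today = today or date.today()
        citation = f"{citation} Accessed {today:%B %d, %Y}."
    return citation
```

Because the date is added at retrieval time, one cached row can serve requests made on any day.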

Citing Un-Cached Citations

Citations that were not found in the cache are sent to another AWS Lambda function to be cited asynchronously. A separate Lambda request is sent for each citation. This greatly speeds up the process, as the citations are made in parallel.
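
The fan-out pattern can be sketched in Python with a thread pool, one task per URL. This only illustrates the idea and is not the site's actual client code: `cite_all` is a hypothetical name, and `cite_one` stands in for whatever sends a single request to the citing Lambda function.

```python
from concurrent.futures import ThreadPoolExecutor


def cite_all(urls, fmt, cite_one, max_workers=8):
    """Call cite_one(url, fmt) once per URL concurrently; results keep input order."""
    if not urls:
        return []
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One independent task per URL, mirroring one Lambda request each.
        return list(pool.map(lambda url: cite_one(url, fmt), urls))
```

With this shape, the total waiting time is closer to that of the slowest single citation than to the sum of all of them, provided there are enough workers.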

Updating Cache with New Citations

When all the citations have arrived, they are displayed to the user in the output box. In the background, a cache update request is sent. Sending a separate request for the cache update, as opposed to updating the cache in the Lambda function that creates each new citation, improves speed because only one connection to the database is needed for the whole update. Furthermore, this keeps the database from being overwhelmed with connections, which helps guard against denial of service.
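
A batched update over a single connection could look something like the sketch below. It again uses `sqlite3` as a stand-in for RDS, `update_cache` is a hypothetical name, and `INSERT OR REPLACE` is SQLite-specific syntax that would need an equivalent on another database engine.

```python
import sqlite3


def update_cache(conn, new_records):
    """Write all (hash, citation) pairs in one transaction on one connection."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO Records (hash, citation) VALUES (?, ?)",
            new_records,
        )
```

All new citations from one user request go through the same connection, so the number of database connections grows with the number of requests rather than the number of citations.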