This project is a solution to the Waldo engineering assignment described here.
- The application is entirely written in Rust, and can be compiled using the current nightly version of rustc (1.15.x at the time of this writing). Note that the nightly limitation is imposed by Serde, and not the core codebase
- The codebase is roughly ~470 lines
- The service uses a combination of asynchronous I/O using epoll + a pool of
1--
env::var("WALDO_THREADS")
threads to maximize performance (see main.rs) - Persistence is achieved through MongoDB (see storage.rs)
- Each thread maintains its own connection to MongoDB
GET /<photo_key>
returns a JSON-ish response with the photo information, if the requested key exists- Otherwise, it responds with an error message and 404 status
POST /
fetches the XML file from HTTP using libcurl and dumps it into MongoDB- POST spawns threads equal to the number of CPU cores which start downloading and processing the EXIF information on the photos. They read from a shared Michael-Scott lock-free queue that's fed by the XML listing the photo resources (see process.rs)
- Errors come back with HTTP 500
- Install Rust Nightly
- Install & launch MongoDB on
127.0.0.1:27017
, or adjust the config in main.rs:47 - The application will create everything in a collection called
waldo_assignment_areg
(See storage.rs:17) - Install gexiv2, the Gnome photo metadata library. On macOS, this is just
brew install gexiv2
. - In terminal, navigate to the root directory of the project
$ cargo run
- Witness that a GET request to http://127.0.0.1:3000/ says "Photo not found", which means the server is up and running
- Do a POST to http://127.0.0.1:3000/ to pull the XML and populate Mongo
- Send a GET to http://127.0.0.1:3000/<photo_key_id> and observe the response
Note: In order to have the app spawn more than 1 processing thread, set an
environment variable called WALDO_THREADS
to the desired number of threads.
Example: $ export WALDO_THREADS="4"
.
- Why not?
- It's super performant, has a state-of-the-art type system, is a lot of fun to write in
- Could have implemented this in Python/{gevent/Twisted} in a few hours, but wanted to demonstrate I could ship working code in a language I only know the theory of
- If not good enough, can still ship in Python!
- Initially the plan was to roll with Redis. Although it doesn't really have "traditional" persistence, its snapshotted persistence would be good enough for this project, and much faster than MongoDB
- Since we already have the source XML on S3, Redis with its pub/sub for monitoring changes would have been perfect
- Unfortunately ran into issues with the Redis Rust library, which is super raw at this point
- Quickly refactored
storage::Storage
to use MongoDB instead
Given enough time, everything :) In order of priority:
- Make
Photo::last_modified
more strongly typed, similar toPhoto::StorageClass
. - Move the feed part out of the current
POST
endpoint. Perhaps make it a separate service. If not, spawn it in its own separate thread/process - Switch from DOM-based XML parsing to event-based stream parsing. Uses less memory and in this particular instance would also take fewer CPU cycles, albeit is more work
Use Serde for serializing/deserializing both XML and BSON and JSONSpeaking of which, respond with some kind of a proper JSON protocol- Dockerize the service to streamline deployment
- Externalize configuration
- Although error handling is
nearlyperfect, add proper logging - Unit test where appropriate, although this particular codebase is already pretty fool proof
THE FINE PRINT: The sheer amount of fun and learning that happened during this implementation was over the roof. Two red bull cans were harmed while coding the project. It's entirely possible that the person writing the problem description had something completely different in mind, which wouldn't obsolete neither the fun nor the learning mentioned 2 sentences ago. Rust is distributed under the terms of both the MIT license and the Apache License (Version 2.0). Every package the project depends on is distributed under similarly permissive licenses. For questions or concerns, contact [email protected].