Implemented for Python 3
This is an implementation of the TRACLUS algorithm as described in the paper: "Trajectory Clustering: A Partition-and-Group Framework" by Lee, Han, & Whang (2007) [http://hanj.cs.illinois.edu/pdf/sigmod07_jglee.pdf]
Implementation Author: Adriel Isaiah V. Amoguis (De La Salle University) Implementation Date: 2023-03-19
This implementation was done as part of the algorithms required for the implementation author's undergraduate thesis. The implementation is not guaranteed to be bug-free and may not be optimized for certain use-cases. The implementation author is not responsible for any damages caused by the use of this implementation. Use at your own risk. End-users are encouraged to examine the code in the case of any issues. If you find any bugs, please report them to the implementation author via the repository's issues page on GitHub.
This implementation requires Python 3.6 or higher. It is recommended to use a virtual environment to avoid dependency conflicts.
Optionally, create a Conda Virtual Environment:
conda create -n traclus python=3.6
conda activate traclus
Install the package from pip:
pip install traclus-python==1.0.1
The TRACLUS algorithm requires a Python list of trajectories. Each trajectory is a 2-Dimensional Numpy Array with the following format:
[[x1, y1],
[x2, y2],
...
[xn, yn]]
where x
is the x-coordinate, and y
is the y-coordinate for each point n
.
from traclus import traclus as tr
# Your Trajectory Data
trajectories = ...
# Run the TRACLUS Algorithm
partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = tr.traclus(trajectories)
The partitions
variable will contain all the trajectories that are partitioned by the algorithm into their characteristic points (cp).
The segments
variable will contain all the generated partitions split into segments.
The dist_matrix
variable will contain the distance matrix generated by the distance function as defined in the paper.
The clusters
variable will contain the line segment clusters generated by the algorithm.
The cluster_assignments
variable will contain the cluster assignments for each line segment.
The representative_trajectories
variable will contain the representative trajectories generated by the algorithm.
This implementation uses three smaller distance function that computes the overall distance between two points in the trajectory. These are the Perpendicular Distance, Parallel Distance, and Angular Distance. To compute the overall distance, these three distances are weighted and added together as shown below:
The weights for each distance function can be adjusted by providing a weights
list to the traclus
function. The default weights are:
weights = [1, 1, 1]
Additionally, the Perpendicular Distance is computed differently depending on whether or not the trajectories are direction-sensitive.
Trajectories are treated as directional by default, but this can be changed by providing the directional
parameter to the traclus
function.
Such as:
partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = tr.traclus(trajectories, directional=False)
The traclus
function takes in a clustering_algorithm
parameter that can be used to specify the clustering algorithm to use. This is set to
DBSCAN from Scipy by default. The traclus
function also accepts two parameters for DBSCAN: eps
and min_samples
. These are set to 1 and 10 by default.
The sub_sample_trajectory
function can be used to sub-sample a trajectory into a trajectory of the same profile but with lesser points.
It takes sample_n
as a parameter, which is the number of points to sub-sample the trajectory into.
from traclus.traclus import sub_sample_trajectory
# Your Trajectory Data
trajectories = ...
# Sub-Sample the Trajectories
sub_sampled_trajectories = [sub_sample_trajectory(trajectory, sample_n=100) for trajectory in trajectories]
In the example above, the trajectories will be sub-sampled into 100 points each. This is useful for reducing the number of points processed by the algorithm, decreasing its runtime. However, this may also reduce the accuracy of the results. It is recommended to experiment with different values of sample_n
to find the best value for your use-case.
The smooth_trajectory
function can be used to smooth the representative trajectories generated by the algorithm.
from traclus.traclus import traclus, smooth_trajectory
# Your Trajectory Data
trajectories = ...
# Run the TRACLUS Algorithm
partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = traclus(trajectories)
# Smooth the Representative Trajectories
smoothed_representative_trajectories = [smooth_trajectory(trajectory, window_size=21) for trajectory in representative_trajectories]