Skip to content

An implementation of the TRACLUS algorithm, A Partition-and-Group Framework (http://hanj.cs.illinois.edu/pdf/sigmod07_jglee.pdf).

License

Notifications You must be signed in to change notification settings

sbremman/TRACLUS

 
 

Repository files navigation

TRACLUS

Implemented for Python 3

This is an implementation of the TRACLUS algorithm as described in the paper: "Trajectory Clustering: A Partition-and-Group Framework" by Lee, Han, & Whang (2007) [http://hanj.cs.illinois.edu/pdf/sigmod07_jglee.pdf]

Implementation Author: Adriel Isaiah V. Amoguis (De La Salle University) Implementation Date: 2023-03-19

This implementation was done as part of the algorithms required for the implementation author's undergraduate thesis. The implementation is not guaranteed to be bug-free and may not be optimized for certain use-cases. The implementation author is not responsible for any damages caused by the use of this implementation. Use at your own risk. End-users are encouraged to examine the code in the case of any issues. If you find any bugs, please report them to the implementation author via the repository's issues page on GitHub.


Installation

This implementation requires Python 3.6 or higher. It is recommended to use a virtual environment to avoid dependency conflicts.

Optionally, create a Conda Virtual Environment:

conda create -n traclus python=3.6
conda activate traclus

Install the package from pip:

pip install traclus-python==1.0.1

Usage

Preparing Your Trajectory Data

The TRACLUS algorithm requires a Python list of trajectories. Each trajectory is a 2-Dimensional Numpy Array with the following format:

[[x1, y1],
 [x2, y2],
 ...
 [xn, yn]]

where x is the x-coordinate, and y is the y-coordinate for each point n.

Running the Algorithm in Your Own Script File

from traclus import traclus as tr

# Your Trajectory Data
trajectories = ...

# Run the TRACLUS Algorithm
partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = tr.traclus(trajectories)

The partitions variable will contain all the trajectories that are partitioned by the algorithm into their characteristic points (cp). The segments variable will contain all the generated partitions split into segments. The dist_matrix variable will contain the distance matrix generated by the distance function as defined in the paper. The clusters variable will contain the line segment clusters generated by the algorithm. The cluster_assignments variable will contain the cluster assignments for each line segment. The representative_trajectories variable will contain the representative trajectories generated by the algorithm.

Distance Weights and Trajectory Direction

This implementation uses three smaller distance function that computes the overall distance between two points in the trajectory. These are the Perpendicular Distance, Parallel Distance, and Angular Distance. To compute the overall distance, these three distances are weighted and added together as shown below:

$distance = w_{perpendicular} * d_{perpendicular} + w_{parallel} * d_{parallel} + w_{angular} * d_{angular}$

The weights for each distance function can be adjusted by providing a weights list to the traclus function. The default weights are:

weights = [1, 1, 1]

Additionally, the Perpendicular Distance is computed differently depending on whether or not the trajectories are direction-sensitive. Trajectories are treated as directional by default, but this can be changed by providing the directional parameter to the traclus function. Such as:

partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = tr.traclus(trajectories, directional=False)

Clustering Parameters

The traclus function takes in a clustering_algorithm parameter that can be used to specify the clustering algorithm to use. This is set to DBSCAN from Scipy by default. The traclus function also accepts two parameters for DBSCAN: eps and min_samples. These are set to 1 and 10 by default.

Supporting Functions

The sub_sample_trajectory function can be used to sub-sample a trajectory into a trajectory of the same profile but with lesser points. It takes sample_n as a parameter, which is the number of points to sub-sample the trajectory into.

from traclus.traclus import sub_sample_trajectory

# Your Trajectory Data
trajectories = ...

# Sub-Sample the Trajectories
sub_sampled_trajectories = [sub_sample_trajectory(trajectory, sample_n=100) for trajectory in trajectories]

In the example above, the trajectories will be sub-sampled into 100 points each. This is useful for reducing the number of points processed by the algorithm, decreasing its runtime. However, this may also reduce the accuracy of the results. It is recommended to experiment with different values of sample_n to find the best value for your use-case.

The smooth_trajectory function can be used to smooth the representative trajectories generated by the algorithm.

from traclus.traclus import traclus, smooth_trajectory

# Your Trajectory Data
trajectories = ...

# Run the TRACLUS Algorithm
partitions, segments, dist_matrix, clusters, cluster_assignments, representative_trajectories = traclus(trajectories)

# Smooth the Representative Trajectories
smoothed_representative_trajectories = [smooth_trajectory(trajectory, window_size=21) for trajectory in representative_trajectories]

About

An implementation of the TRACLUS algorithm, A Partition-and-Group Framework (http://hanj.cs.illinois.edu/pdf/sigmod07_jglee.pdf).

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.1%
  • Python 0.9%