# Vision
- [Infrastructure](#wiki-infrastructure)
- [Fovea](#wiki-fovea)
- [Colour Histograms](#wiki-colour-histograms)
- [Modules](#wiki-modules)
  - [Field Edge Detection](#wiki-field-edge-detection)
  - [Goal Detection](#wiki-goal-detection)
  - [Robot Detection](#wiki-robot-detection)
  - [Surf Landmark Extraction](#wiki-surf-landmark-extraction)
  - [Visual Odometry](#wiki-visual-odometry)
  - [Surf Goal Classification](#wiki-surf-goal-classification)
  - [Ball Detection](#wiki-ball-detection)
  - [Field Feature Detection](#wiki-field-feature-detection)
## Infrastructure

The Vision module begins by retrieving both the top and bottom camera images. If one of the images is not ready, the thread will wait until both are ready. Two primary foveas are constructed from the raw images, one per camera, to form the basis of the vision pipeline. The top camera image is scaled down to 160x120, whilst the bottom camera image is scaled down to 80x60.
A detailed description of the vision infrastructure can be found here.
## Fovea

A fovea is used to represent a section of an image. It is a flexible class that provides a generic interface between images (or sub-images) and the vision pipeline. Many of the vision algorithms are designed to run on any fovea, allowing for maximal code reuse and cleanliness. A fovea can have colour and edge saliency data available. The colour saliency is generated from the colour calibration table and the edge saliency is generated from the greyscale version of the raw image.
## Colour Histograms

The colour histograms contain a count of the number of pixels of each colour for each row and column in a fovea. They are calculated once per fovea, reducing the amount of repetition across the vision pipeline.
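As a rough illustration, a per-fovea histogram pass might look like the sketch below. The `Fovea`, `Colour` and `ColourHistograms` types here are simplified stand-ins for the real classes, not the actual rUNSWift interfaces.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the real fovea classes (names are illustrative only).
enum Colour : uint8_t { cGREEN, cWHITE, cORANGE, cYELLOW, cBACKGROUND, cNUM_COLOURS };

struct Fovea {
    int width, height;
    std::vector<Colour> colour;                 // classified colour per pixel, row-major
    Colour at(int x, int y) const { return colour[y * width + x]; }
};

struct ColourHistograms {
    // cols[x][c] = number of pixels of colour c in column x (and likewise per row)
    std::vector<std::array<int, cNUM_COLOURS>> cols, rows;
};

// Count the pixels of each colour in every row and column, once per fovea,
// so later modules (goal and ball detection) can reuse the totals.
ColourHistograms makeHistograms(const Fovea &f) {
    ColourHistograms h;
    h.cols.assign(f.width,  {});
    h.rows.assign(f.height, {});
    for (int y = 0; y < f.height; ++y) {
        for (int x = 0; x < f.width; ++x) {
            Colour c = f.at(x, y);
            ++h.cols[x][c];
            ++h.rows[y][c];
        }
    }
    return h;
}
```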
## Modules

### Field Edge Detection

The field edge plays a vital role in the vision pipeline. It is used to determine where the field starts, so we can save time when scanning the image for other features such as the ball or field lines. The module sets an index for each column indicating the pixel at the edge of the field, which all other algorithms use as a starting point when scanning.
The algorithm for detecting the field edge starts at the top of the image and scans down until it finds a significant patch of green. This scan is run on every column and results in a set of points across the image. RANSAC is then applied to the set of points to attempt to extract straight lines from it. We attempt to detect up to two lines in any one image, since two field edges can be in view at any point in time. This algorithm is run on the top and bottom cameras independently, since the field edge may run across both images.
If no field edge is detected, then we guess whether the field edge is above or below the current camera view. This guess is based on the amount of green present in the image. If the image contains a large portion of green, but no distinct field edge, the field edge is assumed to be above the camera view and the entire image is treated as being "on the field". If not enough green is present, then the robot is deemed to be looking off the field or into the sky.
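The per-column scan that produces the candidate field-edge points could be sketched as follows, reusing the simplified `Fovea` and `cGREEN` stand-ins from the histogram sketch above; the `minGreenRun` threshold and function name are assumptions for illustration.

```cpp
#include <vector>

struct Point { int x, y; };

// Scan each column from the top of the fovea downwards and record the first
// pixel that starts a run of consecutive green pixels; the run length that
// counts as a "significant patch" is an assumed tunable parameter.
std::vector<Point> findFieldEdgePoints(const Fovea &f, int minGreenRun = 4) {
    std::vector<Point> points;
    for (int x = 0; x < f.width; ++x) {
        int run = 0;
        for (int y = 0; y < f.height; ++y) {
            run = (f.at(x, y) == cGREEN) ? run + 1 : 0;
            if (run == minGreenRun) {
                // Top of the green run is the candidate field-edge point.
                points.push_back({x, y - minGreenRun + 1});
                break;
            }
        }
    }
    // RANSAC line fitting (up to two lines) and the green-fraction fallback
    // operate on the points collected here.
    return points;
}
```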
More details can be found here.
### Goal Detection

Goal Detection is run on the primary bottom camera fovea first, then on the primary top camera fovea. It uses the colour histograms to approximate the area where each goal post might be. The first step is to examine each column and look for local maxima of yellow pixels. From this, candidate regions are formed for areas which could indicate the presence of a goal post. The horizontal histograms are then scanned for yellow to determine the approximate horizontal bounds of each candidate region. This results in a set of candidate regions purely based on the colour histograms.
The next step is to examine the quality of each candidate region and determine which of them actually contain goal posts. This involves a series of checks, including:
- A strong edge in the centre and base of the post
- The percentage of yellow in the segment (scanned vertically)
- The bottom of the post should be below the field edge
The next step is to tune the bounding box and calculate a distance measurement to the goal post. To calculate the distance to the goal we have two measurements: one using the kinematics-based distance to the pixel at the base of the post, and one using the width of the post. The kinematics distance is generally more reliable than the width, but it is heavily influenced by the rocking of the robot as it walks, so kinematics distances lose much of their accuracy while the robot is walking. During this process, higher resolution foveas are used to more accurately detect the base and width of the post, and the resulting measurements are used to fine tune the bounding box around the goal.
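For the width-based measurement, a standard pinhole-camera estimate is enough to illustrate the idea; the post diameter and focal length below are assumed values, not the calibrated constants used on the robot.

```cpp
// Pinhole-camera estimate of the distance to a goal post from its apparent
// width in the image. Both constants are illustrative assumptions: a nominal
// post diameter and a focal length in pixels that depends on the camera setup.
double distanceFromPostWidth(double widthPixels,
                             double postDiameterM = 0.10,     // assumed post diameter
                             double focalLengthPixels = 560)  // assumed focal length
{
    if (widthPixels <= 0) return -1;                          // no valid measurement
    return postDiameterM * focalLengthPixels / widthPixels;
}

// The kinematics-based distance (the camera-to-ground ray through the post's
// base pixel) is usually preferred when the robot is stationary; while walking,
// the width-based estimate above becomes relatively more trustworthy.
```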
The posts are then labelled as left / right and the data from the two cameras is merged and saved in the vision frame.
### Robot Detection

Robot detection uses breaks in the field edge as the basis for finding potential robots. A break in the field edge is a section of the edge with no green pixels along it. The assumption is that if there are no green pixels there, some form of obstruction must be present.
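A minimal sketch of how such breaks might be extracted from the per-column field-edge index, again using the simplified `Fovea` stand-in from above; the `Region` type, the single-pixel green test and the minimum-width threshold are illustrative assumptions.

```cpp
#include <vector>

struct Region { int xStart, xEnd; };   // column range of a candidate obstruction

// fieldEdgeY[x] is the field-edge row for column x (from Field Edge Detection).
// A sufficiently wide run of columns whose edge pixel is not green is treated
// as a break in the field edge, i.e. a candidate robot region.
std::vector<Region> findFieldEdgeBreaks(const Fovea &f,
                                        const std::vector<int> &fieldEdgeY,
                                        int minWidth = 5) {
    std::vector<Region> regions;
    int start = -1;
    for (int x = 0; x < f.width; ++x) {
        bool green = f.at(x, fieldEdgeY[x]) == cGREEN;
        if (!green && start < 0) start = x;                   // break begins
        if ((green || x == f.width - 1) && start >= 0) {
            if (x - start >= minWidth) regions.push_back({start, x});
            start = -1;                                       // break ends
        }
    }
    return regions;
}
```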
Each candidate region is examined more closely to determine where the top and the bottom of the obstruction are. Once the area is finalised, a Bayesian machine-learned classifier is used to determine whether the region contains part of a robot.
The next step is to attempt to find the jersey and identify which team the robot is from. If the team cannot be identified, the robot can still be detected, but is labelled as unknown team. Finally, the robot's arms and legs are merged into the primary robot region, since they often form independent regions either side of the main robot.
More details can be found here.
### Surf Landmark Extraction

Local image features are extracted along the horizon line in the top camera (determined using the robot's kinematic chain). The horizon is used so that the features remain in view as the robot moves on a flat plane (processing the whole image would be too slow). Pixel grey values on the horizon line are therefore subsampled and averaged over a 30 pixel vertical band, before being processed with a 1D version of the SURF algorithm. For each feature found, the algorithm outputs the location and a six-dimensional feature descriptor. Features are fairly robust to minor variations in viewing angle, lighting, etc. As well as the descriptor, features also have a scale and can be categorised as maxima or minima features. More details can be found here.
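A sketch of the horizon-band subsampling step, assuming a straight horizon line `y = y0 + slope * x` and a plain row-major grey image; the subsample step and band height are illustrative parameters, not the tuned values.

```cpp
#include <cstdint>
#include <vector>

// Build the 1D intensity signal that 1D SURF runs on: sample grey values along
// the horizon line and average each sample over a vertical band so the signal
// is robust to small vertical camera motion.
std::vector<float> horizonIntensityProfile(const std::vector<uint8_t> &grey,
                                           int width, int height,
                                           float y0, float slope,
                                           int subsample = 4, int band = 30) {
    std::vector<float> profile;
    for (int x = 0; x < width; x += subsample) {
        int yc = static_cast<int>(y0 + slope * x);   // horizon row at this column
        float sum = 0;
        int count = 0;
        for (int dy = -band / 2; dy < band / 2; ++dy) {
            int y = yc + dy;
            if (y < 0 || y >= height) continue;      // band partially off-image
            sum += grey[y * width + x];
            ++count;
        }
        profile.push_back(count ? sum / count : 0.f);
    }
    return profile;
}
```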
### Visual Odometry

The VO module estimates changes to the robot's heading, assuming it is moving on a planar surface. It does not estimate translation; however, in RoboCup uncommanded heading changes (e.g. the robot getting caught and turned) are a much larger source of navigation error than translation errors. The VO module works by matching SURF feature descriptors to their nearest neighbours in adjacent camera frames. The robot's approximate heading change is then estimated from the median feature displacement.
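A minimal sketch of the heading estimate from matched features, assuming a simple linear pixel-to-angle mapping; the `Match` type, image width and field-of-view values are illustrative, not the module's tuned constants.

```cpp
#include <algorithm>
#include <vector>

// Each match stores a feature's horizontal image position in the previous and
// current frame. The median horizontal displacement (in pixels) is converted
// to a yaw change using the camera's horizontal field of view.
struct Match { float xPrev, xCurr; };

double headingChange(std::vector<Match> matches,
                     double imageWidth = 1280, double hfovRad = 1.06) {
    if (matches.empty()) return 0.0;
    std::nth_element(matches.begin(), matches.begin() + matches.size() / 2, matches.end(),
                     [](const Match &a, const Match &b) {
                         return (a.xCurr - a.xPrev) < (b.xCurr - b.xPrev);
                     });
    double medianShift = matches[matches.size() / 2].xCurr
                       - matches[matches.size() / 2].xPrev;
    // Small-angle approximation: pixels of horizontal shift map linearly to yaw.
    return medianShift * hfovRad / imageWidth;
}
```

The median is used rather than the mean so that a few bad matches (e.g. features on moving robots) do not drag the estimate around.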
Over time the VO heading is compared to the expected heading change from the walk engine. If the walk engine is deemed to be 'wrong', the VO module corrects the position update that is given to the localisation module. 'Slipping' behaviours can also be triggered, e.g. the robot lets its arms go limp so it can brush past an obstacle more easily.
There are a couple of other refinements and checks, for example discarding features that lie on other robots. The module also keeps track of SURF landmarks from the last 4 camera frames and looks for the 'best' matching path through those frames - this reduces the chance of a single bad or blurred camera frame introducing a heading error. Note that the system was shown to work well in 2013; however, there might be a parameter or other bug in the current code.
More details can be found here.
### Surf Goal Classification

This module is designed to disambiguate one end of the field from the other. At the start of each half, each robot collects up to 80 images of the goal ends during the walk-on behaviour. During the game, if the robot is facing a goal and a goal post is detected, this database is queried to find the closest matching database images. If the matches are good and form a consensus view on which goal can be seen, the module can confirm that the robot's belief about which goal it is facing is correct, or otherwise reinitialise localisation (if the required confidence level is met).
The goal feature database works by using the 'bag of visual words' representation for each image. This requires a visual vocabulary in order to vector quantise the original SURF features; this vocabulary was learned offline by clustering typical features detected in the rUNSWift lab. The ranking of matching images is done using tf-idf, as in text retrieval. This approach is used because the naive nearest-neighbour matching used in Visual Odometry would be much too slow.
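A generic tf-idf ranking over bag-of-visual-words histograms looks roughly like the sketch below; this is a textbook formulation for illustration, not the exact scoring code used in the module.

```cpp
#include <cmath>
#include <map>
#include <vector>

// Each image is a histogram over visual-word ids (the result of vector
// quantising its SURF descriptors against the offline-learned vocabulary).
using BagOfWords = std::map<int, int>;   // visual word id -> count in the image

// Score every database image against the query: terms shared by the query and
// a database image contribute their tf-idf weighted product, so rare visual
// words (high idf) dominate the ranking.
std::vector<double> rankImages(const BagOfWords &query,
                               const std::vector<BagOfWords> &database) {
    const double N = database.size();
    std::map<int, int> df;                       // document frequency per word
    for (const auto &img : database)
        for (const auto &wc : img) ++df[wc.first];

    std::vector<double> scores;
    for (const auto &img : database) {
        double score = 0;
        for (const auto &wc : query) {
            auto it = img.find(wc.first);
            if (it == img.end() || df[wc.first] == 0) continue;
            double idf = std::log(N / df[wc.first]);
            score += (wc.second * idf) * (it->second * idf);   // tf-idf product
        }
        scores.push_back(score);                 // higher score = better match
    }
    return scores;
}
```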
This module works reasonably well and has triggered localisation recoveries in matches, but in many cases the landmarks (usually people) behind the goals change too much during the course of the game for the system to reach high confidence levels. A dynamic method of updating the database (e.g. a SLAM-style approach) is probably needed.
More details can be found here.
### Ball Detection

Ball Detection runs on the primary bottom camera fovea first and, if it does not detect a ball there, is run on the primary top camera fovea. If the ball cannot be detected without prior knowledge, we then search areas we expect the ball to be in using outside information, such as team-mate ball locations or previous ball locations.
Ball Detection utilises colour histograms to determine points of interest that may contain a ball. It matches columns and rows that both contain orange and then examines those areas in a higher resolution fovea. The actual ball detection algorithm has two steps, finding the ball edges and then fitting a circle to those edges.
The process for finding the ball edges involves scanning around the fovea and keeping track of the strongest edges found. Each scan starts at the centre of the fovea and moves outwards radially, and this is repeated in a number of directions. If a strong edge point is detected during a scan, it is added to the list of edge points.
Once all the edge points are found, a RANSAC algorithm is applied to fit a circle to them. The RANSAC algorithm takes three points, generates the circle through them and tests how many other edge points lie on that circle. This process is repeated a set number of times and, if a good enough match is found, the centre is calculated and stored as a detected ball.
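A minimal sketch of this circle RANSAC, using the standard circumcentre construction for the circle through three points; the iteration count, inlier tolerance and minimum inlier count are assumed values.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

struct PointF { float x, y; };
struct Circle { PointF centre; float radius; bool valid; };

// Circle through three points via the circumcentre (intersection of the
// perpendicular bisectors). Returns valid = false for (near-)collinear points.
Circle circleFrom3(PointF a, PointF b, PointF c) {
    float d = 2 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-6f) return {{0, 0}, 0, false};
    float a2 = a.x * a.x + a.y * a.y;
    float b2 = b.x * b.x + b.y * b.y;
    float c2 = c.x * c.x + c.y * c.y;
    PointF ctr = {(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
                  (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
    return {ctr, std::hypot(a.x - ctr.x, a.y - ctr.y), true};
}

// Minimal RANSAC loop: pick three edge points, build their circle and count
// how many other edge points lie close to it; keep the best circle found.
Circle ransacCircle(const std::vector<PointF> &edges, int iterations = 50,
                    float tolerance = 1.5f, size_t minInliers = 10) {
    Circle best = {{0, 0}, 0, false};
    size_t bestCount = 0;
    if (edges.size() < 3) return best;
    for (int i = 0; i < iterations; ++i) {
        Circle c = circleFrom3(edges[std::rand() % edges.size()],
                               edges[std::rand() % edges.size()],
                               edges[std::rand() % edges.size()]);
        if (!c.valid) continue;
        size_t count = 0;
        for (const PointF &p : edges)
            if (std::fabs(std::hypot(p.x - c.centre.x, p.y - c.centre.y) - c.radius) < tolerance)
                ++count;
        if (count > bestCount && count >= minInliers) { best = c; bestCount = count; }
    }
    return best;
}
```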
Ball Detection tends to over-detect balls rather than under-detect them. As a result, the algorithm works best by under-classifying orange pixels, so that the ball contains some orange but other non-ball items (such as jerseys) have little to no orange on them.
More details can be found here.
### Field Feature Detection

Field Feature Detection runs in both cameras, but it is expensive, so we attempt to minimise its use where possible. It always runs in the bottom camera, then inside small windows in the top camera. If we still don't find enough features, it runs on the entire top camera frame, which is the most expensive option, but also the most likely to detect good features.
The first attempt to run in the top camera uses a searchForFeatures function, which guesses where interesting field line data exists based on the current localisation estimate. If the robot is well localised, this works quite well and the robot is able to detect features from 4-5m away and remain well localised. If the robot isn't well localised, sometimes the windows are lucky and still detect a good feature, but when localisation is uncertain we often rely on searching the entire top camera frame.
Field feature detection has three stages: finding points that might lie on a field line, fitting lines and circles to those points, and finally generating more complex shapes such as corners and T-intersections.
Finding field line points involves scanning both vertically and horizontally whilst examining edge data. The algorithm searches for matching pairs of strong edges that have opposing directions and uses their midpoint as the output. The pair of strong edges represents the green-to-white and white-to-green transitions on either side of the line, so the midpoint should lie on the centre of the line. A variety of checks are applied to the points, including the distance between the pair, the colour of the midpoint, etc., to ensure only quality points are kept.
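A single vertical scan for line midpoints might look like the sketch below; the edge representation, strength threshold and maximum line width are illustrative assumptions, and the real code applies additional checks (midpoint colour, field edge, etc.).

```cpp
#include <cmath>
#include <vector>

struct EdgePixel { int y; float gradient; };   // signed vertical gradient at (x, y)

// Find a strong edge, then look for a strong edge of the opposite sign within a
// plausible line width, and emit the midpoint of the pair as a field-line point.
std::vector<int> lineMidpointsInColumn(const std::vector<EdgePixel> &column,
                                       float minStrength = 50.f, int maxWidth = 12) {
    std::vector<int> midpoints;
    for (size_t i = 0; i < column.size(); ++i) {
        if (std::fabs(column[i].gradient) < minStrength) continue;
        for (size_t j = i + 1; j < column.size() && column[j].y - column[i].y <= maxWidth; ++j) {
            // Opposing directions: one green-to-white edge, one white-to-green edge.
            if (std::fabs(column[j].gradient) >= minStrength &&
                std::signbit(column[j].gradient) != std::signbit(column[i].gradient)) {
                midpoints.push_back((column[i].y + column[j].y) / 2);
                i = j;                       // continue scanning past this pair
                break;
            }
        }
    }
    return midpoints;
}
```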
Fitting lines and circles uses a RANSAC based approach. Each RANSAC cycle picks two points at random and attempts to fit both a line and a circle through them. It then counts how many other points fit each shape to determine which is the better fit; it is also possible that neither is an acceptable result. This is repeated a set number of times, with the best overall match tracked along the way. Once a cycle is complete, all the points matching the best fitting line or circle are removed and the process starts again. All the points are first projected onto the ground plane, using the kinematics of the robot, to make matching shapes easier.
Matching higher order shapes involves combining primitive lines and circles into more complicated and identifiable features, including corners, T-intersections and parallel lines. The key metrics for matching shapes are the angles of lines relative to each other and the distances between each line's endpoints and those of neighbouring lines.
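As an illustration, a corner test between two fitted lines could be sketched as below, using the acute angle between the lines and the distance between their nearest endpoints; the types and tolerances are assumed values, not the module's own.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { double x, y; };
struct LineSegment { Vec2 p1, p2; };   // endpoints, projected onto the ground plane (metres)

const double kPi = 3.14159265358979;

double endpointDistance(const Vec2 &a, const Vec2 &b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Acute angle between the two lines' directions, folded into [0, pi/2].
double angleBetween(const LineSegment &a, const LineSegment &b) {
    double angA = std::atan2(a.p2.y - a.p1.y, a.p2.x - a.p1.x);
    double angB = std::atan2(b.p2.y - b.p1.y, b.p2.x - b.p1.x);
    double d = std::fmod(std::fabs(angA - angB), kPi);
    return std::min(d, kPi - d);
}

// Two roughly perpendicular lines whose nearest endpoints are close together
// form a corner candidate; if the meeting point is near the middle of one line
// rather than an endpoint, it would instead be a T-intersection candidate.
// Tolerances (radians / metres) are illustrative.
bool isCornerCandidate(const LineSegment &a, const LineSegment &b,
                       double angleTol = 0.3, double endpointTol = 0.25) {
    double closest = std::min({endpointDistance(a.p1, b.p1), endpointDistance(a.p1, b.p2),
                               endpointDistance(a.p2, b.p1), endpointDistance(a.p2, b.p2)});
    return std::fabs(angleBetween(a, b) - kPi / 2) < angleTol && closest < endpointTol;
}
```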
More details can be found here.