Add GFD mining #465

Open: wants to merge 5 commits into main

@AntonChern (Contributor) commented Sep 25, 2024

This PR implements an algorithm for mining graph functional dependencies (GFDs), based on the article "Discovering Graph Functional Dependencies" by Fan Wenfei, Hu Chunming, Liu Xueli, and Lu Ping. Given an input graph, the algorithm returns the set of dependencies that hold on that graph. It has two configurable parameters: k, the maximum number of vertices in the pattern of a mined dependency, and sigma, the minimum frequency (support) of a dependency.
In addition, the PR makes the algorithm available from Python and includes examples of its use.
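To make "a set of dependencies that hold on the graph" concrete: a GFD pairs a graph pattern with a rule X → Y over literals, and a graph satisfies it when every match of the pattern that meets X also meets Y. The sketch below checks that condition for a single match. It is not code from this PR: the Token/Literal encoding follows the one the author describes later in the review (index -1 for a constant), while `Match`, `Resolve`, `Holds`, and `Satisfies` are hypothetical names, and treating a missing attribute as a failed literal is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical encoding: a token is (pattern vertex index, attribute name);
// index -1 marks a constant whose value is stored in the string.
using Token = std::pair<int, std::string>;
using Literal = std::pair<Token, Token>;  // reads as "first == second"

// Hypothetical: for one match of the pattern, the attributes of the data-graph
// vertex assigned to each pattern vertex 0..k-1.
using Match = std::vector<std::map<std::string, std::string>>;

// Looks up the value a token denotes under a match; false if the attribute is absent.
bool Resolve(Token const& t, Match const& m, std::string& out) {
    if (t.first == -1) {
        out = t.second;
        return true;
    }
    auto const& attrs = m.at(static_cast<std::size_t>(t.first));
    auto const it = attrs.find(t.second);
    if (it == attrs.end()) return false;
    out = it->second;
    return true;
}

// A literal holds if both sides resolve to equal values (absent attribute: it does not hold).
bool Holds(Literal const& l, Match const& m) {
    std::string lhs, rhs;
    return Resolve(l.first, m, lhs) && Resolve(l.second, m, rhs) && lhs == rhs;
}

// The match satisfies X -> Y if some premise in X fails or every literal in Y holds.
bool Satisfies(std::vector<Literal> const& premises, std::vector<Literal> const& conclusion,
               Match const& m) {
    for (auto const& l : premises) {
        if (!Holds(l, m)) return true;  // X not met: the dependency holds trivially
    }
    for (auto const& l : conclusion) {
        if (!Holds(l, m)) return false;
    }
    return true;
}
```

Mining then amounts to searching over patterns with at most k vertices that have at least sigma matches, and over rules X → Y that every such match satisfies.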

@github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 29. Check the log or trigger a new build to see more.

src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)

@github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
@AntonChern force-pushed the gfd branch 2 times, most recently from c3eeecf to a59e3cc on October 3, 2024 08:42
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/python_bindings/gfd/bind_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.h (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
: query_(query_), iso_(iso_), res_(res_) {}

template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
bool operator()(CorrespondenceMap1To2 f, CorrespondenceMap2To1) const {

Contributor

Second parameter is not used.

Contributor Author

The Boost Graph Library requires this callback signature. Unfortunately, if I change the function signature, the code does not compile.
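A self-contained sketch of why the unnamed parameter is the usual answer here: the caller fixes a two-argument calling convention, and leaving the unused parameter unnamed keeps that signature while avoiding unused-parameter warnings. As far as I understand the BGL documentation, boost::vf2_subgraph_iso calls the user callback with two correspondence maps and stops the search when it returns false; `RunMatcher` below is a hypothetical stand-in for that driver, and the maps are plain vectors rather than BGL property maps.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using Mapping = std::vector<int>;  // hypothetical: pattern vertex -> data vertex
using Embedding = std::map<int, int>;

// Hypothetical stand-in for the VF2 calling convention: the callback receives
// both directions of the correspondence, and returning false stops the search.
template <typename Callback>
void RunMatcher(std::vector<Mapping> const& matches, Callback callback) {
    for (auto const& f : matches) {
        Mapping g;  // inverse map, built only to honor the two-argument convention
        for (std::size_t u = 0; u < f.size(); ++u) {
            auto const v = static_cast<std::size_t>(f[u]);
            if (v >= g.size()) g.resize(v + 1, -1);
            g[v] = static_cast<int>(u);
        }
        if (!callback(f, g)) return;
    }
}

class RecordFirst {
public:
    explicit RecordFirst(Embedding& out) : out_(out) {}

    // The second parameter is unnamed: the convention requires it, this callback
    // does not read it, and an unnamed parameter cannot trigger -Wunused-parameter.
    template <typename Map1To2, typename Map2To1>
    bool operator()(Map1To2 const& f, Map2To1 const&) const {
        for (std::size_t u = 0; u < f.size(); ++u) {
            out_.emplace(static_cast<int>(u), f[u]);
        }
        return false;  // one embedding is enough: stop the search
    }

private:
    Embedding& out_;
};
```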

src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)

@xJoskiy (Contributor) commented Oct 8, 2024

Add PR description

@github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)

@xJoskiy (Contributor) commented Oct 9, 2024

You are doing great! Please don't forget to notify us when you are done with the requested changes. Also, please mark conversations as resolved once they are.

@AntonChern force-pushed the gfd branch 2 times, most recently from 60aeadc to 9ce90a3 on October 16, 2024 15:46
@AntonChern requested a review from @xJoskiy on October 16, 2024 15:47
src/python_bindings/gfd/bind_gfd_mining.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
Token name_token = Token(i, name.first);
for (auto& value : name.second) {
    Token value_token = Token(-1, value);
    result.push_back(Literal(name_token, value_token));

Contributor

result.reserve(name.second.size() * attrs_info.at(label).size())

Contributor Author

Unfortunately, name.second can hold a different number of values for each name.first, so that reserve size may be wrong.
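If avoiding reallocations still matters, one option is to compute the exact total first, since each name maps to its own set of values. A minimal sketch, assuming the `Info` alias from the miner (attribute name to the set of its values) and the Token/Literal encoding with -1 for constants; `BuildConstantLiterals` is a hypothetical name, not a function from the PR.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Info = std::map<std::string, std::set<std::string>>;
using Token = std::pair<int, std::string>;
using Literal = std::pair<Token, Token>;

// Builds one literal "vertex.name == value" for every (name, value) pair.
std::vector<Literal> BuildConstantLiterals(int vertex, Info const& attrs) {
    // First pass: the sets differ in size per name, so sum them instead of multiplying.
    std::size_t total = 0;
    for (auto const& [name, values] : attrs) total += values.size();

    std::vector<Literal> result;
    result.reserve(total);
    for (auto const& [name, values] : attrs) {
        Token const name_token{vertex, name};
        for (auto const& value : values) {
            result.emplace_back(name_token, Token{-1, value});
        }
    }
    return result;
}
```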

src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/python_bindings/bindings.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
@AntonChern force-pushed the gfd branch 2 times, most recently from 11d6a2e to 13b76ae on November 5, 2024 16:38
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)

@github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd.cpp (resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (resolved)
src/tests/test_gfd_mining.cpp (resolved)
src/tests/test_gfd_mining.cpp (resolved)
src/tests/test_gfd_mining.cpp (resolved)
@AntonChern force-pushed the gfd branch 2 times, most recently from 44b4271 to b6e4f89 on November 27, 2024 17:23

@vs9h (Collaborator) left a comment

I looked through the pull request a bit and left comments on the things that caught my eye.

It would be nice if you tried to improve the code yourself some more. It might help to read Google's C++ style guide, which we try to follow.

You can also use tools: for example, cppcheck can point out errors and possible improvements.

src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.h (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
Comment on lines +368 to +320
class CompareCallback {
private:
    graph_t const& query_;
    Embedding& iso_;
    bool& res_;

public:
    CompareCallback(graph_t const& query, Embedding& iso, bool& res)
        : query_(query), iso_(iso), res_(res) {}

    template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
    bool operator()(CorrespondenceMap1To2 f, CorrespondenceMap2To1) const {
        BGL_FORALL_VERTICES_T(u, query_, graph_t) {
            iso_.emplace(u, get(f, u));
        }
        res_ = true;
        return false;
    }
};

struct VCompare {
    graph_t const& query;
    graph_t const& graph;

    bool operator()(vertex_t const& fr, vertex_t const& to) const {
        return query[fr].attributes.at("label") == graph[to].attributes.at("label");
    }
};

struct ECompare {
    graph_t const& query;
    graph_t const& graph;

    bool operator()(edge_t const& fr, edge_t const& to) const {
        return query[fr].label == graph[to].label;
    }
};

Collaborator

Can we use lambdas instead of these structs?
Define them directly where you need them.

Contributor Author

Unfortunately, the Boost Graph Library requires exactly this structure for the code to work correctly.

Collaborator

> Unfortunately, the boost graph library requires exactly this structure for the code to work correctly.

Are you sure about that? Because this works fine with lambdas: https://godbolt.org/z/63ooq15nG

src/python_bindings/gfd/bind_gfd_mining.h (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/tests/test_gfd_mining.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.h (outdated, resolved)
src/core/algorithms/gfd/gfd.h (resolved)
src/core/algorithms/gfd/gfd.h (resolved)
src/core/algorithms/gfd/gfd.h (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd_miner.cpp (outdated, resolved)
std::string ToString();
};
};

Contributor

Add a blank line at the end of the file. This also applies to other files.

Contributor Author

I think the CI check complains when I do this.

examples/basic/mining_gfd/mining_gfd1.py (outdated, resolved)
examples/basic/mining_gfd/mining_gfd2.py (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (resolved)
src/core/algorithms/gfd/gfd.cpp (resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)
src/core/algorithms/gfd/gfd.cpp (outdated, resolved)

@github-actions bot left a comment

clang-tidy made some suggestions

src/tests/all_paths.cpp (outdated, resolved)
@AntonChern force-pushed the gfd branch 3 times, most recently from 69c2d3f to 10ff8ca on December 11, 2024 17:14
Add class for GFD miner
Add minimal GFD and GFD with multiple conclusion tests
Added two examples with searching for dependencies in small graphs.

@xJoskiy (Contributor) commented Dec 12, 2024

The structures and aliases in graph_descriptor.h should not be in the global namespace; move them into the gfd namespace.

@xJoskiy (Contributor) commented Dec 12, 2024

Rename the commit "Correct changes to previous algorithms" so that it is clear what was changed.

// Defines a specific attribute of the pattern.
// The first element is the index of the vertex,
// the second is the name of the attribute.
// An alias for user convenience.
using Token = std::pair<int, std::string>;

Contributor

Are indexes supposed to have negative values?

Contributor Author

Yes, I encode a constant token with the index -1.
src/core/algorithms/gfd/gfd.h (resolved)
src/core/algorithms/gfd/gfd_miner.h (resolved)
src/core/algorithms/gfd/gfd_miner.h (resolved)
src/core/algorithms/gfd/gfd_miner.h (resolved)
src/core/algorithms/gfd/comparator.cpp (resolved)
src/core/algorithms/gfd/comparator.h (resolved)
Comment on lines +25 to +72
class CmpCallback {
private:
    bool& res_;

public:
    CmpCallback(bool& res) : res_(res) {}

    template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
    bool operator()(CorrespondenceMap1To2, CorrespondenceMap2To1) const {
        res_ = true;
        return false;
    }
};

struct VCmp {
    graph_t const& lhs;
    graph_t const& rhs;

    bool operator()(vertex_t const& fr, vertex_t const& to) const {
        return lhs[fr].attributes.at("label") == rhs[to].attributes.at("label");
    }
};

struct ECmp {
    graph_t const& lhs;
    graph_t const& rhs;

    bool operator()(edge_t const& fr, edge_t const& to) const {
        return lhs[fr].label == rhs[to].label;
    }
};

bool IsSub(graph_t const& query, graph_t const& graph) {
    bool result = false;
    VCmp vcmp = {query, graph};
    ECmp ecmp = {query, graph};
    CmpCallback callback(result);
    boost::property_map<graph_t, boost::vertex_index_t>::type query_index_map =
            get(boost::vertex_index, query);
    boost::property_map<graph_t, boost::vertex_index_t>::type graph_index_map =
            get(boost::vertex_index, graph);
    std::vector<vertex_t> query_vertex_order = vertex_order_by_mult(query);
    boost::vf2_subgraph_iso(query, graph, callback, query_index_map, graph_index_map,
                            query_vertex_order, ecmp, vcmp);
    return result;
}

}  // namespace

Contributor

This has not been done yet.

src/core/algorithms/gfd/gfd.cpp (resolved)
src/core/algorithms/gfd/gfd_miner.cpp (resolved)
std::vector<Literal> premises = {};
Gfd gfd = {pattern, premises, conclusion};
if (Validate(graph, gfd, embeddings, sigma)) {
    rules.push_back({{}, l});

Contributor (@xJoskiy, Dec 13, 2024)

To avoid creating an unnecessary temporary object:

Suggested change
-    rules.push_back({{}, l});
+    rules.emplace_back({}, l);

Consider this everywhere else as well.

Contributor Author

The suggested line gives a compiler error:
no matching function for call to ‘std::vector<std::pair<std::vector<std::pair<std::pair<int, std::__cxx11::basic_string<char> >, std::pair<int, std::__cxx11::basic_string<char> > > >, std::pair<std::pair<int, std::__cxx11::basic_string<char> >, std::pair<int, std::__cxx11::basic_string<char> > > > >::emplace_back(<brace-enclosed initializer list>, gfd::Literal&)’
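The error is expected: emplace_back is a variadic template taking forwarding references, and a bare braced-init-list such as {} has no type, so template argument deduction fails for that argument. Naming the type makes emplace_back compile. A minimal sketch; the `Rule` alias and `MakeRules` are hypothetical, with `Rule` mirroring the pair type in the error message above.

```cpp
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<int, std::string>;
using Literal = std::pair<Token, Token>;
using Rule = std::pair<std::vector<Literal>, Literal>;  // premises -> conclusion

std::vector<Rule> MakeRules(Literal const& l) {
    std::vector<Rule> rules;
    // rules.emplace_back({}, l);  // error: no type can be deduced from {}
    rules.emplace_back(std::vector<Literal>{}, l);  // OK: the argument type is explicit
    rules.push_back({{}, l});  // also OK: {} initializes a Rule, which is then moved in
    return rules;
}
```

Note that the explicit form still creates a temporary (an empty vector), but moving an empty vector is cheap, so in practice the two spellings cost about the same here.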

@ol-imorozko (Collaborator) left a comment

Will review gfd_miner.cpp tomorrow (:


bool ContainsLiteral(std::vector<Literal> const& literals, Literal const& l) {
    auto check = [&l](auto const& cur_lit) { return CompareLiterals(cur_lit, l); };
    return std::any_of(literals.begin(), literals.end(), check);

Collaborator

Suggested change
-    return std::any_of(literals.begin(), literals.end(), check);
+    return std::ranges::any_of(literals, check);

Cleaner this way: it gets rid of the .begin()/.end() boilerplate.

        return false;
    }
    auto check = [&rhs](auto const& cur_lit) { return ContainsLiteral(rhs, cur_lit); };
    return std::all_of(lhs.begin(), lhs.end(), check);

Collaborator

Suggested change
-    return std::all_of(lhs.begin(), lhs.end(), check);
+    return std::ranges::all_of(lhs, check);


#include <algorithm>

Collaborator

#include <algorithm> is not used in this file.

Comment on lines +25 to +37
class CmpCallback {
private:
    bool& res_;

public:
    CmpCallback(bool& res) : res_(res) {}

    template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
    bool operator()(CorrespondenceMap1To2, CorrespondenceMap2To1) const {
        res_ = true;
        return false;
    }
};

Collaborator

Using references as fields is not a good sign in any code.
Here we pass this function object to boost::vf2_subgraph_iso as a callback: its operator() is called when a match is found and sets the local variable result (declared as bool result = false;) to true.

Can't we just pass a simple lambda to it instead? Like:

bool result = false;
auto callback = [&result](auto, auto) { result = true; return false; };
// ...
boost::vf2_subgraph_iso(query, graph, callback, // ...
// ...
return result;

Comment on lines +39 to +55
struct VCmp {
    graph_t const& lhs;
    graph_t const& rhs;

    bool operator()(vertex_t const& fr, vertex_t const& to) const {
        return lhs[fr].attributes.at("label") == rhs[to].attributes.at("label");
    }
};

struct ECmp {
    graph_t const& lhs;
    graph_t const& rhs;

    bool operator()(edge_t const& fr, edge_t const& to) const {
        return lhs[fr].label == rhs[to].label;
    }
};

Collaborator

That is slightly better because we use const references as fields, but still, why not just use lambdas?

auto vertex_cmp = [&query, &graph](vertex_t const& fr, vertex_t const& to) {
    return query[fr].attributes.at("label") == graph[to].attributes.at("label");
};
auto edge_cmp = [&query, &graph](edge_t const& fr, edge_t const& to) {
    return query[fr].label == graph[to].label;
};

// pass vertex_cmp and edge_cmp to boost::vf2_subgraph_iso
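To illustrate the reviewers' point without depending on Boost, here is a self-contained sketch in which both equivalence predicates and the stop-at-first-match callback are lambdas handed to a generic matcher. The matcher is a deliberately naive backtracking search (injective vertex mapping, matching edge labels), not VF2, and `Graph`, `FindMatches`, and this `IsSub` are hypothetical simplifications of the PR's types; with Boost, the same kind of lambdas would be passed to boost::vf2_subgraph_iso, as in the godbolt example above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified graph: vertex labels plus directed, labelled edges.
struct Graph {
    std::vector<std::string> labels;
    std::map<std::pair<int, int>, std::string> edges;  // (from, to) -> edge label
};

// Naive backtracking search (NOT VF2): extends an injective assignment of query
// vertices to graph vertices and calls on_match for every complete assignment.
// Returns false once on_match has asked to stop.
template <typename VEq, typename EEq, typename OnMatch>
bool FindMatches(Graph const& q, Graph const& g, VEq const& vertex_eq, EEq const& edge_eq,
                 OnMatch const& on_match, std::vector<int>& assignment,
                 std::vector<bool>& used) {
    int const u = static_cast<int>(assignment.size());
    if (u == static_cast<int>(q.labels.size())) return on_match(assignment);
    for (int v = 0; v < static_cast<int>(g.labels.size()); ++v) {
        if (used[v] || !vertex_eq(u, v)) continue;
        assignment.push_back(v);
        bool edges_ok = true;
        for (auto const& [ends, label] : q.edges) {  // query edges with both ends assigned
            if (ends.first <= u && ends.second <= u) {
                auto const it = g.edges.find({assignment[ends.first], assignment[ends.second]});
                edges_ok = edges_ok && it != g.edges.end() && edge_eq(label, it->second);
            }
        }
        used[v] = true;
        bool const keep_going =
                !edges_ok || FindMatches(q, g, vertex_eq, edge_eq, on_match, assignment, used);
        used[v] = false;
        assignment.pop_back();
        if (!keep_going) return false;
    }
    return true;
}

bool IsSub(Graph const& query, Graph const& graph) {
    bool result = false;
    auto vertex_eq = [&query, &graph](int fr, int to) {
        return query.labels[fr] == graph.labels[to];
    };
    auto edge_eq = [](std::string const& fr, std::string const& to) { return fr == to; };
    auto callback = [&result](std::vector<int> const&) {
        result = true;
        return false;  // first match found: stop searching
    };
    std::vector<int> assignment;
    std::vector<bool> used(graph.labels.size(), false);
    FindMatches(query, graph, vertex_eq, edge_eq, callback, assignment, used);
    return result;
}
```

No reference members are needed: each lambda captures exactly what it uses, at the point where it is used.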

Comment on lines +81 to +83
bool Gfd::operator!=(Gfd const& gfd) const {
    return !(*this == gfd);
}

Collaborator

Since you've defined operator==, the compiler can generate this code for you. Put this in the .h file:

    bool operator==(const Gfd& gfd) const;
    bool operator!=(const Gfd& gfd) const = default;

and remove the manual implementation.

src/core/algorithms/gfd/gfd.h (resolved)
using Info = std::map<std::string, std::set<std::string>>;

void NextSubset(std::vector<std::size_t>& indices, std::size_t const border) {
    if (indices.at(0) == border - indices.size()) {

Collaborator

Suggested change
-    if (indices.at(0) == border - indices.size()) {
+    if (indices.empty() || indices[0] == border - indices.size()) {


void NextSubset(std::vector<std::size_t>& indices, std::size_t const border) {
    if (indices.at(0) == border - indices.size()) {
        indices = {};

Collaborator

Suggested change
-        indices = {};
+        indices.clear();

Comment on lines +43 to +53
    std::size_t index = 0;
    for (int i = static_cast<int>(indices.size()) - 1; i >= 0; --i) {
        if (indices.at(i) != border - indices.size() + static_cast<std::size_t>(i)) {
            index = static_cast<std::size_t>(i);
            break;
        }
    }
    indices.at(index)++;
    for (std::size_t i = index + 1; i < indices.size(); ++i) {
        indices.at(i) = indices.at(index) + i - index;
    }

Collaborator

We can refactor that as one loop without static_casts:

    std::size_t i = indices.size();
    while (i > 0) {
        --i;
        if (indices[i] < border - indices.size() + i) {
            ++indices[i];
            for (std::size_t j = i + 1; j < indices.size(); ++j) {
                indices[j] = indices[j - 1] + 1;
            }
            return;
        }
    }

Collaborator

Probably. I'm not sure whether we need to execute the second loop if we fail to find an index in the first one.
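For reference, a self-contained sketch of lexicographic n-subset enumeration in the shape of the one-loop refactor above: find the rightmost position that can still be incremented, bump it, and reset everything to its right. If no such position exists, the last subset has already been produced, so the reset loop is not needed in that case; this sketch signals the end by clearing the vector. It is an illustration rather than the PR's code, and returning a single empty subset for n == 0 is a choice made here.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Advances `indices` (a strictly increasing n-subset of {0, ..., border - 1}) to the
// next subset in lexicographic order; clears it after the last subset.
// Assumes indices.size() <= border, which GetSubsets guarantees.
void NextSubset(std::vector<std::size_t>& indices, std::size_t border) {
    std::size_t i = indices.size();
    while (i > 0) {
        --i;
        // Position i can hold at most border - size + i.
        if (indices[i] < border - indices.size() + i) {
            ++indices[i];
            for (std::size_t j = i + 1; j < indices.size(); ++j) {
                indices[j] = indices[j - 1] + 1;
            }
            return;
        }
    }
    indices.clear();  // no position could be advanced: enumeration is finished
}

template <typename T>
std::vector<std::vector<T>> GetSubsets(std::vector<T> const& elements, std::size_t n) {
    if (elements.size() < n) return {};
    if (n == 0) return std::vector<std::vector<T>>(1);  // exactly one empty subset
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0});  // first subset: 0..n-1
    std::vector<std::vector<T>> result;
    while (!indices.empty()) {
        std::vector<T> subset;
        subset.reserve(n);
        for (std::size_t idx : indices) subset.push_back(elements[idx]);
        result.push_back(std::move(subset));
        NextSubset(indices, elements.size());
    }
    return result;
}
```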

}

template <typename T>
std::vector<std::vector<T>> GetSubsets(std::vector<T> const& elements, std::size_t const n) {

Collaborator

Suggested change
-std::vector<std::vector<T>> GetSubsets(std::vector<T> const& elements, std::size_t const n) {
+std::vector<std::vector<T>> GetSubsets(std::vector<T> const& elements, std::size_t n) {

The const here does nothing for callers, since n is passed by value.

    if (elements.size() < n) {
        return {};
    }
    std::vector<std::vector<T>> result = {};

Collaborator

Just

Suggested change
-    std::vector<std::vector<T>> result = {};
+    std::vector<std::vector<T>> result;

is enough.

Comment on lines +607 to +609
std::vector<graph_t> new_patterns = {};
std::vector<Embeddings> new_embeddings_set = {};
std::vector<Rules> new_forbidden_rules_set = {};

Collaborator

Suggested change
-    std::vector<graph_t> new_patterns = {};
-    std::vector<Embeddings> new_embeddings_set = {};
-    std::vector<Rules> new_forbidden_rules_set = {};
+    std::vector<graph_t> new_patterns;
+    std::vector<Embeddings> new_embeddings_set;
+    std::vector<Rules> new_forbidden_rules_set;

Comment on lines +627 to +629
patterns = new_patterns;
embeddings_set = new_embeddings_set;
forbidden_rules_set = new_forbidden_rules_set;

Collaborator

Suggested change
-    patterns = new_patterns;
-    embeddings_set = new_embeddings_set;
-    forbidden_rules_set = new_forbidden_rules_set;
+    patterns = std::move(new_patterns);
+    embeddings_set = std::move(new_embeddings_set);
+    forbidden_rules_set = std::move(new_forbidden_rules_set);
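Moving is safe here as long as the new_* vectors are declared inside the loop body, so each iteration starts from fresh, empty vectors rather than from moved-from ones. A minimal sketch of that level-wise pattern, with plain strings standing in for patterns; `ExpandLevels` and the string representation are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Grows every pattern by one character per level, up to k levels (illustrative only).
std::vector<std::string> ExpandLevels(std::vector<std::string> patterns, int k) {
    for (int level = 1; level < k; ++level) {
        std::vector<std::string> new_patterns;  // fresh, empty vector each iteration
        for (auto const& p : patterns) {
            new_patterns.push_back(p + "a");
            new_patterns.push_back(p + "b");
        }
        // Transfer the buffer instead of copying every element;
        // new_patterns goes out of scope right after this.
        patterns = std::move(new_patterns);
    }
    return patterns;
}
```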

Comment on lines +24 to +32
    static std::unique_ptr<algos::GfdMiner> CreateGfdMiningInstance(
            std::filesystem::path const& graph_path, TestConfig const& gfdConfig) {
        StdParamsMap option_map = {{config::names::kGraphData, graph_path},
                                   {config::names::kGfdK, gfdConfig.k},
                                   {config::names::kGfdSigma, gfdConfig.sigma}};
        return algos::CreateAndLoadAlgorithm<GfdMiner>(option_map);
    }
};

Collaborator

Avoid code duplication: here you can add a helper that executes the algorithm and compares the result:

    static void ExecuteAndCompare(const std::unique_ptr<algos::GfdMiner>& algorithm,
                                  const std::vector<Gfd>& expected_gfds) {
        algorithm->Execute();
        const auto gfd_list = algorithm->GfdList();
        ASSERT_EQ(expected_gfds.size(), gfd_list.size());
        ASSERT_THAT(gfd_list, ::testing::ElementsAreArray(expected_gfds));
    }

We can also use the ::testing::ElementsAreArray matcher instead of manually comparing elements with ASSERT_TRUE(x == y) in a loop.

Comment on lines +37 to +48
    std::unique_ptr<algos::GfdMiner> algorithm =
            CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
    algorithm->Execute();
    std::vector<Gfd> gfd_list = algorithm->GfdList();
    algorithm = CreateGfdMiningInstance(graph_path, {.k = 3, .sigma = 3});
    algorithm->Execute();
    std::size_t expected_size = gfd_list.size();
    gfd_list = algorithm->GfdList();
    ASSERT_EQ(expected_size, gfd_list.size());
    for (std::size_t index = 0; index < gfd_list.size(); ++index) {
        ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
    }

Collaborator

Why do we call CreateGfdMiningInstance and Execute twice in a row?
Shouldn't this be just the following?

Suggested change
-    std::unique_ptr<algos::GfdMiner> algorithm =
-            CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
-    algorithm->Execute();
-    std::vector<Gfd> gfd_list = algorithm->GfdList();
-    algorithm = CreateGfdMiningInstance(graph_path, {.k = 3, .sigma = 3});
-    algorithm->Execute();
-    std::size_t expected_size = gfd_list.size();
-    gfd_list = algorithm->GfdList();
-    ASSERT_EQ(expected_size, gfd_list.size());
-    for (std::size_t index = 0; index < gfd_list.size(); ++index) {
-        ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
-    }
+    std::unique_ptr<algos::GfdMiner> algorithm = CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
+    ExecuteAndCompare(algorithm, gfds);

Comment on lines +55 to +63
    std::unique_ptr<algos::GfdMiner> algorithm =
            CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
    algorithm->Execute();
    std::vector<Gfd> gfd_list = algorithm->GfdList();
    std::size_t expected_size = 1;
    ASSERT_EQ(expected_size, gfd_list.size());
    for (std::size_t index = 0; index < gfd_list.size(); ++index) {
        ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
    }

Collaborator

Same thing with all other tests:

Suggested change
-    std::unique_ptr<algos::GfdMiner> algorithm =
-            CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
-    algorithm->Execute();
-    std::vector<Gfd> gfd_list = algorithm->GfdList();
-    std::size_t expected_size = 1;
-    ASSERT_EQ(expected_size, gfd_list.size());
-    for (std::size_t index = 0; index < gfd_list.size(); ++index) {
-        ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
-    }
+    std::unique_ptr<algos::GfdMiner> algorithm = CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
+    ExecuteAndCompare(algorithm, gfds);

4 participants