Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to merge each isolated bus to the backbone network #903

Merged
merged 76 commits into from
Apr 23, 2024

Conversation

ekatef
Copy link
Member

@ekatef ekatef commented Oct 18, 2023

Closes #852

Changes proposed in this Pull Request

Added merge_into_network( ) function to merge each isolated bus into the network following an approach of drop_isolated_nodes( ) and merge_isolated_nodes( ).

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@ekatef
Copy link
Member Author

ekatef commented Oct 18, 2023

Currently, the code works if a number of isolated nodes is more than one. It would be nice to generalise on a case with a single isolated nodes, preferably without need to using conditional statements.

Functionality has not been tested yet on geometries.

What would be great is to better understand requirements towards the implementation:

  • do we need a voltage threshold for merge_into_network( )?
  • how should be this option combines with drop of the isolated nodes and merging into a single node?

@ekatef
Copy link
Member Author

ekatef commented Oct 26, 2023

Currently, fetching isolated nodes into the closest network in implemented. Tested on a CL and seems to work fine:

base.nc

image

elec_s.nc, current implementation

image

elec_s.nc, PR implementation

image

Red highlights buses which don't belong to the largest sub-network.

Fetching the isolated nodes keeps invariant values of the overall load and generation.

@ekatef
Copy link
Member Author

ekatef commented Oct 27, 2023

What would be nice to understand, is which strategies are sensible for drop/merge/fetch isolated nodes and isolated networks and adjust configuration parameters accordingly. My feeling is that it feels like alternatives to merge isolated nodes in a single isolated node or to fetch them into network.

Another point is handling isolated fragments of the network (like those red buses on first picture above). I suspect, in this particular case they mostly get reduced to a single node and dropped or merged after. However, the may be cases when isolated networks remain as two-three "micro-networks" after simplification, and can't be merged into the "main" network with a current approach. Probably, we can find some solution for this case, as well.

@davide-f would be very grateful if you could find time for looking into it 🙂

@Edudiro
Copy link

Edudiro commented Oct 27, 2023

Hi @ekatef, thank you for your response on google groups and for developing this PR. I have tested your approach and it works perfectly in the Chilean case, as you have proven above.

I am assuming the large amount of unconnected nodes from the base network comes from OSM data gaps? In that case I think following this solution is a valid approach. Otherwise, unfeasibilities might arise for a dispatch only problem.

@ekatef
Copy link
Member Author

ekatef commented Oct 27, 2023

Hi @ekatef, thank you for your response on google groups and for developing this PR. I have tested your approach and it works perfectly in the Chilean case, as you have proven above.

I am assuming the large amount of unconnected nodes from the base network comes from OSM data gaps? In that case I think following this solution is a valid approach. Otherwise, unfeasibilities might arise for a dispatch only problem.

Hello @Edudiro! Great to hear that it worked for you. Thanks for giving an inspiration for pushing these developments :D

I also have a feeling that the unconnected nodes in base.nc result from some data gaps in OSM data base. At least, looking on GENI description of a national power system, it seems that there shouldn't be any reason for having these isolated power systems. However, some regional expertise may be very helpful to give a more definitive answer.

@ekatef ekatef marked this pull request as ready for review October 31, 2023 21:45
@ekatef
Copy link
Member Author

ekatef commented Oct 31, 2023

@davide-f I think, the PR is ready for the first review round :)

Currently, isolated nodes are merged to the closest non-isolated ones. Further improvements are obviously possible, for example to account for the subnetwork type or implement a power threshold.

Haven't added yet configuration parameters for fetching isolated nodes, as it feels a good idea to decide on a general interfacing concept. Would be very grateful for any input on that :)

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool PR :D I've added some possible improvements.
What do you think?

Comment on lines 785 to 800
points_buses = np.array(
list(zip(n.buses.loc[i_connected].x, n.buses.loc[i_connected].y))
)
islands_points = np.array(
list(zip(n.buses.loc[i_islands].x, n.buses.loc[i_islands].y))
)

btree = cKDTree(points_buses)
dist0, idx0 = btree.query(islands_points, k=1)

points_nearest = [points_buses[i] for i in idx0]

nearest_bus_list = [
n.buses.loc[(n.buses.x == x) & (n.buses.y == y)] for x, y in points_nearest
]
nearest_bus_df = pd.concat(nearest_bus_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a more efficient way to do this is with the geopandas spatial join function:
gdf_map = gpd.sjoin_nearest(gdf1, gdf2, houw="left")
where gdf1 is the selection of isolated to merge and the right gdf2 frame is the set of non-isolated buses like df_islands.query("carrier == 'DC'")[["geometry"]].
Internally, it exploits the KDTree and it saves lines of code.

The output is a geodataframe where for every row of gdf1 (isolated bus) we identify the index of gdf2 that is the closest, using the column "index_right" (default).
Therefore, the value gdf_map["index_right"] is almost the busmap we are looking for (once filled with the missing indices that shall not change and the values for the DC nodes)
Since we cannot exclude duplicated indices in gdf_map in case of equidistances, duplicated indices can be dropped.

Note that it is better to do the processing also for DC nodes: we merge isolated AC nodes with the closest non-isolated AC nodes and similarly the DC nodes.
An auxiliary function may be defined to repeat the process on AC and DC similarly, but it may be an overkill

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davide-f thanks a lot for the review!

Actually, I have completely forgotten about a possible polarity difference... I'll think about implementation.

A great idea with sjoin_nearest(). I had a (wrong) impression that it may be not perfect in terms of performance, but it looks like rather the opposite should be the case. Thanks, happy to play with that! :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davide-f thanks a lot for the suggestions and sorry for putting this work on hold for a while. Great to come back to it :)

Implementation with sjoin_nearest() require a couple of additional imports, but it looks cleaner to me: https://github.com/ekatef/pypsa-earth/compare/fetch_isolated_to_network..fetch_isolated_to_network2
So, I'd opt for it instead of current usage of cKDTree. What do you think?

Have added filtering by AC carrier for the network to be merged with, agree that it's generally a good idea to find a way to deal also with DC networks. However, how realistic is a case of isolated DC nodes/network? Are you aware of any examples where it could be useful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! and that's great :D I much like it :)

I'm wondering that an efficient way to account for the country-specific bus and potentially the AC/DC problem may be to exploit the groupby to do the merge, as follows:

  1. create a geodataframe for n.buses (already done basically)
  2. add a column in that dataframe to denote the "isolated networks", such as gdf["is_isolated"]
    If we want to do the AC/DC difference, the determine_network_topology is only able to well adopt the AC ones. To account for the DC ones, an option is to drop out of the DC isolated networks identified with the determine_topology, those where a link is connected. I drop a piece of code I did to investigate the issue in a network.
  3. exploit the groupby function, such as:
    gdf.groupby(["country"{, "carrier"}]).map(lambda x: gdp.sjoin_nearest(x.query("is_isolated"), x.query("not is_isolated"), {options as needed}))

As a note, the output of the s_join_nearest will create a column, like "index_right" or "index_left" that for every index of the geodataframe, specifies the closest bus.
That column is basically the map_isolated_node_by_country series that we need for the clustering process.

Would you be curious in investigating that?

I agree that isolated dc nodes would not make sense from a physical perspective, but I feel that it would make the workflow more robust as I believe we could have those type of nodes due to missing data and alike.

I've also done a quick check on a continent with the code below and it seems that only 4 DC nodes worldwide are actually isolated with the run I did.

---- Piece of code to identify the isolated buses to be adapted to the PR ---

is_isolated_network = n.buses["sub_network"].value_counts().map(lambda x: x <= 1)

DC_links = n.links.query("carrier == 'DC'")
links_buses = np.unique(list(DC_links["bus0"].values) + list(DC_links["bus1"].values))


is_isolated_bus = n.buses["sub_network"].map(is_isolated_network)
# all buses having
is_isolated_bus.loc[links_buses] = False
is_isolated_bus

Copy link
Member Author

@ekatef ekatef Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davide-f super! :D Many thanks :D Have updated the implementation.

Ahh, right! We have also to account allocation buses by countries. Of course, I'm curious to look into it. Thanks a lot for the hints!!

May you have any ideas on isolated networks which spread across two-three countries? It would be very helpful for testing.

Absolutely agree regarding the need to account for isolated DC network for the sake of the workflow stability. Not sure is it's a well-grounded suggestion, but wouldn't it be a more natural approach for isolated DC buses to build an additional link to the closest non-isolated DC bus? My impression is that DC lines are often built to connect demand with renewable sources, and it may be probably a good idea to try to maintain topology. Does it sounds any reasonable for you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @davide-f! Thanks a lot for the implementation idea :D It has been funny to play around, and it looks like the approach should be transferable to subnetwork-to-subnetwork merging.

Have drafted a possible implementation here:
https://github.com/ekatef/pypsa-earth/compare/fetch_isolated_to_network..fetch_isolated_to_network3

Have I get your idea properly? Currently, there is looping over groups, while I do agree that it may be a way to apply some functional-programming magic over the grouped dataframe. Not sure however if it makes sense to look for it. In my experience, when map-lambda is perfect when you manage to keep it simple. It things get complicated, that may look great but usually it difficult to maintain. What is your feeling about this? 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revision :)
I'll go into it soon, expected tomorrow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @davide-f, when you would have time and resources, would be perfect to have your opinion on this.

It may be a bit too early for a proper review, but would be amazing to align on the concept 🙂

@davide-f
Copy link
Member

As an additional comment, for a future PR, I think that we could revise in the future the behaviour of the isolated_nodes and extend to networks: drop/merge isolated networks with demand below a given threshold

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the little delay, here is my feedback :D
Great!

Comment on lines 787 to 798
points_buses = np.array(
list(zip(n.buses.loc[i_connected].x, n.buses.loc[i_connected].y))
)
islands_points = np.array(
list(zip(n.buses.loc[i_islands].x, n.buses.loc[i_islands].y))
)

gds_buses = gpd.GeoSeries(map(Point, points_buses))
gds_islands = gpd.GeoSeries(map(Point, islands_points))

gdf_buses = gpd.GeoDataFrame(geometry=gds_buses)
gdf_islands = gpd.GeoDataFrame(geometry=gds_islands)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to create a geodatafarme (let's say gdf_buses)of the entire buses df we may use this function, that should be quite compact

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it could be implemented in a more elegant way :) more Wrapped-up into a function transformation into a geopandas dataframe.

Have I understood your idea correctly? :)

Comment on lines 769 to 772
i_islands = n.buses[
(~n.buses.duplicated(subset=["sub_network"], keep=False))
& (n.buses.carrier == "AC")
].index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this account identifies isolated networks? This seems to me that nodes that are interconnected by a line but disconnected from the entire network are not well identified or am I wrong?

We could extend the working principle to "isolated networks" that may be defined as subnetworks that do not represent the larger national power system whose demand is low (below threshold).

To keep the approach simple we can also skip that

Copy link
Member Author

@ekatef ekatef Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a function to pick isolated networks using a number of the nodes as a criterium in the first approximation. Absolutely agree that it could make sense to use other heuristics, the current implementation just generalises the merging approach for n_isolated_buses > 1.

Comment on lines 802 to 806
nearest_bus_list = [
n.buses.loc[(n.buses.x == x) & (n.buses.y == y)]
for x, y in zip(gdf_map["geometry"].x, gdf_map["geometry"].y)
]
nearest_bus_df = pd.concat(nearest_bus_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that nearest_bus_df = n.buses.loc[gdf_map.index_right.values] or something like that.

gdf_map.index_right/index_left may contain the mapping between each island and its closest bus.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it feels much more natural to use indices for filtering. Added an bus_id column to the buses dataframe, and modified filtering.

Comment on lines 808 to 815
# each isolated node should be mapped into the closes non-isolated node
map_isolated_node_by_country = (
n.buses.assign(bus_id=n.buses.index)
.loc[nearest_bus_df.index]
.groupby("country")["bus_id"]
.first()
.to_dict()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

gdf_buses.groupby(["country"{, "carrier"}]).map(lambda x: gdp.sjoin_nearest(x.query("is_isolated"), x.query("not is_isolated"), {options as needed}))

or

gdf_buses.groupby(["country"{, "carrier"}]).map(lambda x: gdp.sjoin_nearest(x.loc[i_islands], x.loc[i_connected], {options as needed}))

may work.

In the case of neighboring buses across countries I'm unsure the isolated network/node gets connected to the closest bus of the same country, it may remain there. Have you tested that?

@ekatef
Copy link
Member Author

ekatef commented Dec 18, 2023

Sorry for the little delay, here is my feedback :D Great!

Hey @davide-f, no problem, I was aware that you were very busy. Thanks a lot for the review!

It looks like we are well aligned on the concept, and happy to introduce the changes.

Yeah, the idea of this PR as far was to focus on isolated nodes only, keeping in mind that same approach may be used to deal with isolated networks, as well.

@carlosfv92 May you be interested to work on implementation for isolated networks? Happy to discuss on align on the implementation.

@ekatef
Copy link
Member Author

ekatef commented Mar 30, 2024

Revised to simplify the implementation. @davide-f thank you so much for the discussion and the ideas! The current version looks definitely cleaner. Could you please review it, when you have time?

Summary of the major changes as compared with the previous iteration:

  1. moved all load processing into the function which creates a buses geo-dataframe;
  2. revised the threshold from to be a share of the national load instead an absolute value;
  3. replaced all the functions to filter the island and backbone networks with the flags in the buses geo-dataframe.

@ekatef ekatef changed the title Add functionality to merge each isolated bus to the closest non-disconnected bus Add functionality to merge each isolated bus to the backbone network Apr 4, 2024
@ekatef
Copy link
Member Author

ekatef commented Apr 7, 2024

@davide-f when you have time, it would be perfect to have your review 🙂

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @ekatef :D
I think the functionality is quite there and way cleaner!
In the future we may reformulate but I think it is in great shape! There may not be the need to duplicate the information by subnetwork to each row of the buses, but I think it is good to go with minor changes! :)
Great work and patience!

Comment on lines 811 to 815
n_buses_gdf["is_multicnt_subntw"] = n_buses_gdf.sub_network.map(
n_buses_gdf.groupby(["sub_network"])["country"].apply(
lambda x: len(pd.unique(x)) > 1
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you need the double map; probably the following should work or something like that

Suggested change
n_buses_gdf["is_multicnt_subntw"] = n_buses_gdf.sub_network.map(
n_buses_gdf.groupby(["sub_network"])["country"].apply(
lambda x: len(pd.unique(x)) > 1
)
)
n_buses_gdf["is_multicnt_subntw"] = n_buses_gdf.sub_network.map(
n_buses_gdf.groupby(["sub_network"]).country.unique().count() > 1
)

May that work?

However, I'm wondering if we need to fill that value at the gdf and we can work with a df for subnetworks, like:

df_subn = n_buses_gdf.groupby(["subnetwork"]).apply(lambda x:
   {
       "is_multicnt_subntw": x.country.unique() > 1,
       "total_demand": x.load.sum(),
   }
)

However, I see that in the buses gdf you filled a column with the demand of the subnetwork, so that may not be needed actually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the revision on map().apply(), thanks! :D

Not sure I get your idea with the sub-networks dataframe. I thought, we aim to keep all the data in a single dataframe in the spirit of the tidy data concept, which actually encourages a certain degree of duplication. It gives a lot of power and flexibility, while allowing to stay in the same "variables space". If we add an additional dataframe, it means a need to switch attention between two objects, and each of them has an own set of variables. To me, it sounds a bit more complicated.

I'm also not sure that having two objects would be more effective in terms of performance, as compared with having an additional column. Have I missed anything? :)

gdf_backbone_buses = (
n_buses_gdf.query("is_backbone_sbntw")
.query("carrier=='AC'")
.query("load_in_subnetw>0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm I'd drop this check but I'd pick the index with the highest load; for small countries with no demand this may trigger issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, revised :)

nearest_bus_df = n.buses.loc[n.buses.index.isin(gdf_map.bus_id_right)]

i_lines_islands = n.lines.loc[n.lines.bus1.isin(gdf_islands.index)].index
n.mremove("Line", i_lines_islands)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remove should not be needed and performed by the clustering no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally yes, but our use case is not well captured by get_clustering_from_busmap( ), as it seems to expect that the lines in a cluster belong to the same sub-network. When I try to remove the line n.mremove("Line", i_lines_islands) leads to AssertionError In Bus cluster sub_network the values of attribute sub_network do not agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok :)

Comment on lines 113 to 114
from scipy.spatial import cKDTree
from shapely.geometry import Point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they still needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@ekatef ekatef force-pushed the fetch_isolated_to_network branch from eb25080 to cc97b02 Compare April 7, 2024 22:14
@ekatef
Copy link
Member Author

ekatef commented Apr 7, 2024

Great @ekatef :D I think the functionality is quite there and way cleaner! In the future we may reformulate but I think it is in great shape! There may not be the need to duplicate the information by subnetwork to each row of the buses, but I think it is good to go with minor changes! :) Great work and patience!

Thanks a lot for the great support and the review @davide-f!

I'd say this duplication is rather a feature than a bug ;) There is the "tidy data" concept which assumes that each row should contain a full information for a single variable. That makes life way easier even if it implies some duplication. That is, if I have get your idea properly. Happy to discuss more and adjust the implementation, if needed :)

@ekatef
Copy link
Member Author

ekatef commented Apr 21, 2024

@davide-f could you please check if you have any questions on the PR, when you would have time?

It seems to be needed, and I think it's simpler to finalise it instead to explain how to test it from a fork.

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @ekatef <3 amazing contribution and sorry for the delay...
I approve this PR and feel free to merge it with squash merge or normal merge as you see it more fair/clean :)

@ekatef
Copy link
Member Author

ekatef commented Apr 22, 2024

Great @ekatef <3 amazing contribution and sorry for the delay... I approve this PR and feel free to merge it with squash merge or normal merge as you see it more fair/clean :)

No worries and thanks a lot for the outstanding support :D

@ekatef ekatef merged commit 653e017 into pypsa-meets-earth:main Apr 23, 2024
4 checks passed
@ekatef ekatef deleted the fetch_isolated_to_network branch April 23, 2024 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to merge isolated buses to the closest non-disconnected bus
3 participants