Merge pull request #48 from kshitijrajsharma/feature/boundary_docker
Feature: Docker Support and --Boundary Filter
kshitijrajsharma authored Jul 7, 2023
2 parents 8aff75a + 0ff4978 commit 9d0dd17
Showing 5 changed files with 206 additions and 87 deletions.
16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM python:3

RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
gdal-bin \
libgdal-dev \
libboost-dev \
python3-gdal \
osmium-tool

ENV PATH="/osmium/bin:${PATH}"

RUN pip install osmsg

CMD ["/bin/bash"]
148 changes: 72 additions & 76 deletions README.md
@@ -22,115 +22,111 @@ pip install osmium
pip install osmsg
```

### [DOCKER] Install with Docker locally

- Clone the repo & build the local container:

```
docker build -t osmsg:latest .
```

- Run the container terminal to run osmsg commands:

```
docker run -it osmsg
```

Attach a volume for stats generation if necessary:

```
docker run -it -v /home/user/data:/app/data osmsg
```
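
For example, a minimal run inside the container might look like this (the mount point and flags are illustrative, drawn from the options below):

```
cd /app/data
osmsg --last_day --name stats --format csv
```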

### Usage:

```
osmsg [-h] [--start_date START_DATE] [--end_date END_DATE] [--username USERNAME]
      [--password PASSWORD] [--timezone {Nepal,UTC}] [--name NAME]
      [--country COUNTRY [COUNTRY ...]] [--tags TAGS [TAGS ...]]
      [--hashtags HASHTAGS [HASHTAGS ...]] [--length LENGTH [LENGTH ...]] [--force]
      [--field_mappers] [--meta] [--tm_stats] [--rows ROWS] [--users USERS [USERS ...]]
      [--workers WORKERS] [--url URL [URL ...]] [--last_week] [--last_day] [--last_month]
      [--last_year] [--last_hour] [--days DAYS] [--charts] [--summary] [--exact_lookup]
      [--changeset] [--all_tags] [--temp]
      [--format {csv,json,excel,image,text} [{csv,json,excel,image,text} ...]]
      [--read_from_metadata READ_FROM_METADATA] [--boundary BOUNDARY] [--update]
```

### Options:

```
options:
  -h, --help            show this help message and exit
  --start_date START_DATE
                        Start date in the format YYYY-MM-DD HH:M:Sz eg: 2023-01-28 17:43:09+05:45
  --end_date END_DATE   End date in the format YYYY-MM-DD HH:M:Sz eg: 2023-01-28 17:43:09+05:45
  --username USERNAME   Your OSM Username : Only required for Geofabrik Internal Changefiles
  --password PASSWORD   Your OSM Password : Only required for Geofabrik Internal Changefiles
  --timezone {Nepal,UTC}
                        Your Timezone : Currently Supported Nepal, Default : UTC
  --name NAME           Output stat file name
  --country COUNTRY [COUNTRY ...]
                        List of country names to extract (get the id from data/countries). It uses
                        Geofabrik country updates, so it requires an OSM username. Only available
                        for daily updates
  --tags TAGS [TAGS ...]
                        Additional stats to collect : list of tag keys
  --hashtags HASHTAGS [HASHTAGS ...]
                        Hashtag statistics to collect : list of hashtags. Limited to daily stats
                        for now; only checks whether the hashtag is contained in the string, not an
                        exact string lookup on beta
  --length LENGTH [LENGTH ...]
                        Calculate the length of OSM features. Only supported for created way
                        features. Pass a list of tag keys to calculate, eg: --length highway
                        waterway. Unit is meters
  --force               Force the hashtag replication fetch if it is greater than a one-day interval
  --field_mappers       Filter stats by field mapping editors
  --meta                Generates stats_metadata.json including sequence info, start_date and
                        end_date. Useful when running daily/weekly/monthly by service/cron
  --tm_stats            Includes Tasking Manager stats for users. TM projects are filtered from the
                        hashtags used; appends all-time stats per user for the project ids produced
                        from the stats
  --rows ROWS           No. of top rows to extract; to extract the top 100, pass 100
  --users USERS [USERS ...]
                        List of user names to look for. Use it to only produce stats for the listed
                        users, or pass it with hashtags and it will act as an AND filter. Case
                        sensitive; use ' ' to enter names with spaces in between
  --workers WORKERS     No. of parallel workers to assign. Default is the number of CPUs available.
                        Be aware that using the maximum number of workers may cause overuse of
                        resources
  --url URL [URL ...]   Your public list of OSM change replication URLs. The 'minute,hour,day'
                        option by default translates to the planet replication URL. You can supply
                        multiple URLs for Geofabrik country updates. URLs should not have a
                        trailing /
  --last_week           Extract stats for last week
  --last_day            Extract stats for last day
  --last_month          Extract stats for last month
  --last_year           Extract stats for last year
  --last_hour           Extract stats for last hour
  --days DAYS           No. of last days to extract; eg if 3 is supplied the script will generate
                        stats for the last 3 days
  --charts              Exports summary charts along with stats
  --summary             Produces a Summary.md file with a summary of the run and a summary.csv with
                        a summary of stats per day
  --exact_lookup        Exact lookup for hashtags to match the exact hashtag supplied; without this
                        the hashtag search checks for the existence of the text in hashtags and
                        comments
  --changeset           Include hashtag and country information in the stats. It forces the script
                        to process changeset replication; be careful using this since changeset
                        replication is minutely, depending on your internet speed and CPU cores
  --all_tags            Extract statistics of all of the unique tags and their counts
  --temp                Deletes downloaded OSM files from the machine after processing is done; if
                        you want to run osmsg on the same files again, keep this option turned off
  --format {csv,json,excel,image,text} [{csv,json,excel,image,text} ...]
                        Stats output format
  --read_from_metadata READ_FROM_METADATA
                        Location of metadata to pick the start date from the previous run's
                        end_date. Generally used if you want to run the bot on a regular interval
                        using cron/service
  --boundary BOUNDARY   Boundary GeoJSON file path to filter stats; see data/example_boundary.geojson
                        for the format
  --update              Update the old dataset produced by osmsg. Very experimental: your name
                        stats.csv and summary.csv should be in the place where the command is run
```

It is a simple Python script that processes OSM files live and produces stats on the fly.
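
For illustration, a run restricted to the sample boundary added in this commit might look like the following (the output name is arbitrary; per the app.py change below, --boundary turns on changeset processing when neither --changeset nor --hashtags is supplied):

```
osmsg --last_day --boundary data/example_boundary.geojson --name boundary_stats --format csv
```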
31 changes: 31 additions & 0 deletions data/example_boundary.geojson
@@ -0,0 +1,31 @@
{
"type": "Feature",
"properties": {},
"geometry": {
"coordinates": [
[
[
83.87023688498698,
28.270258904428587
],
[
83.87023688498698,
28.15972410323974
],
[
84.0590428625166,
28.15972410323974
],
[
84.0590428625166,
28.270258904428587
],
[
83.87023688498698,
28.270258904428587
]
]
],
"type": "Polygon"
}
}
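
The process_boundary helper that consumes this file is imported in osmsg/app.py below but not defined in this diff. A minimal sketch of what such a loader could look like, assuming GeoPandas (the geom_filter_df.within(...) call in app.py implies a GeoDataFrame), is:

```python
# Hypothetical sketch only; the real process_boundary lives in osmsg's utility
# module and is not part of this diff.
import geopandas as gpd


def process_boundary(geojson_path):
    # GeoPandas (via GDAL) reads a single Feature or a FeatureCollection.
    boundary_df = gpd.read_file(geojson_path)
    return boundary_df  # later used as geom_filter_df.within(centroid).any()
```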
40 changes: 29 additions & 11 deletions osmsg/app.py
@@ -39,7 +39,6 @@
 import osmium
 import pandas as pd
 from matplotlib.font_manager import FontProperties
-from shapely.geometry import box
 from tqdm import tqdm

 from .changefiles import (
@@ -62,8 +61,10 @@
     download_osm_files,
     extract_projects,
     generate_tm_stats,
+    get_bbox_centroid,
     get_editors_name_strapped,
     get_file_path_from_url,
+    process_boundary,
     sum_tags,
     update_stats,
     update_summary,
@@ -260,7 +261,9 @@ def collect_changefile_stats(
 def calculate_stats(
     user, uname, changeset, version, tags, osm_type, timestamp, osm_obj_nodes=None
 ):
-    if hashtags or collect_field_mappers_stats:  # intersect with changesets
+    if (
+        hashtags or collect_field_mappers_stats or geom_boundary
+    ):  # intersect with changesets
         if (
             len(processed_changesets) > 0
         ):  # make sure there are changesets to intersect if not meaning hashtag changeset not found no need to go for changefiles
@@ -306,6 +309,14 @@ def __init__(self):
 
     def changeset(self, c):
         run_hashtag_check_logic = False
+        centroid = get_bbox_centroid(c.bounds)
+
+        if geom_boundary:
+            if not centroid:
+                return
+            if not geom_filter_df.within(centroid).any():
+                return
+
         if collect_field_mappers_stats:
             if "created_by" in c.tags:
                 editor = get_editors_name_strapped(c.tags["created_by"])
@@ -348,15 +359,8 @@ def changeset(self, c):
                 for hash_tag in hashtags_comment:
                     if hash_tag not in processed_changesets[c.id]["hashtags"]:
                         processed_changesets[c.id]["hashtags"].append(hash_tag)
-        # get bbox
-        bounds = str(c.bounds)
-        if "invalid" not in bounds:
-            bbox_list = bounds.strip("()").split(" ")
-            minx, miny = bbox_list[0].split("/")
-            maxx, maxy = bbox_list[1].split("/")
-            bbox = box(float(minx), float(miny), float(maxx), float(maxy))
-            # Create a point for the centroid of the bounding box
-            centroid = bbox.centroid
+
+        if centroid:
             intersected_rows = countries_df[countries_df.intersects(centroid)]
             for i, row in intersected_rows.iterrows():
                 if row["name"] not in processed_changesets[c.id]["countries"]:
@@ -655,6 +659,12 @@ def parse_args():
         "--read_from_metadata",
         help="Location of metadata to pick start date from previous run's end_date , Generally used if you want to run bot on regular interval using cron/service",
     )
+    parser.add_argument(
+        "--boundary",
+        type=str,
+        default=None,
+        help="Boundary geojson file path to filter stats, see data/example_boundary.geojson for format of geojson",
+    )
 
     parser.add_argument(
         "--update",
@@ -774,6 +784,8 @@ def main():
     global exact_lookup
     global summary
     global collect_field_mappers_stats
+    global geom_filter_df
+    global geom_boundary
 
     all_tags = args.all_tags
     additional_tags = args.tags
@@ -783,6 +795,12 @@
     length = args.length
     summary = args.summary
     collect_field_mappers_stats = args.field_mappers
+    geom_boundary = args.boundary
+    if args.boundary:
+        if not args.changeset and not args.hashtags:
+            args.changeset = True
+        geom_filter_df = process_boundary(args.boundary)
+
     if args.field_mappers:
         if not args.changeset and not args.hashtags:
             args.changeset = True
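
The get_bbox_centroid helper imported above is likewise not defined in this diff. Judging from the inline bounding-box code it replaces (removed in the @@ -348,15 +359,8 @@ hunk), a sketch could be:

```python
# Hypothetical reconstruction based on the inline code removed in this commit;
# the real helper is defined elsewhere in osmsg.
from shapely.geometry import box


def get_bbox_centroid(bounds):
    bounds = str(bounds)  # osmium renders an unset changeset bbox as an "invalid" string
    if "invalid" in bounds:
        return None
    bbox_list = bounds.strip("()").split(" ")
    minx, miny = bbox_list[0].split("/")
    maxx, maxy = bbox_list[1].split("/")
    bbox = box(float(minx), float(miny), float(maxx), float(maxy))
    return bbox.centroid  # shapely Point at the centre of the changeset bbox
```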