---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# neo4jshell <img src="neo4jshell.png" align="right" width="200"/>
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/neo4jshell)](https://CRAN.R-project.org/package=neo4jshell)
[![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/neo4jshell?color=green)](https://cran.r-project.org/package=neo4jshell)
[![R build status](https://github.com/keithmcnulty/neo4jshell/workflows/R-CMD-check/badge.svg)](https://github.com/keithmcnulty/neo4jshell/actions)
[![Travis build status](https://travis-ci.com/keithmcnulty/neo4jshell.svg?branch=master)](https://travis-ci.com/keithmcnulty/neo4jshell)
<!-- badges: end -->
The goal of neo4jshell is to provide rapid querying of 'Neo4J' graph databases by offering a programmatic interface with 'cypher-shell'. A wide variety of other functions are offered that allow importing and management of data files for local and remote servers, as well as simple administration of local servers for development purposes.
## Pre-installation notes
This package requires the `ssh` package for interacting with remote 'Neo4J' databases, which requires `libssh` to be installed. See the vignettes for the `ssh` package [here](https://CRAN.R-project.org/package=ssh) for more details.
This package also requires the 'cypher-shell' executable to be available **locally**. This is installed as standard in 'Neo4J' installations and can usually be found in the `bin` directory of that installation. It can also be installed standalone with Homebrew or downloaded from https://github.com/neo4j/cypher-shell.
It is recommended, for ease of use, that the path to the 'cypher-shell' executable is added to your `PATH` environment variable. If not, you should record its location for use in some of the functions within this package.
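As a quick way to confirm the recommendation above, the following sketch uses base R's `Sys.which()` to check whether 'cypher-shell' can be found on the current `PATH`. It only reports what it finds and does not change any settings.

``` r
# Look up 'cypher-shell' on the current PATH.
# Sys.which() returns an empty string when nothing is found.
cypher_path <- unname(Sys.which("cypher-shell"))

if (nzchar(cypher_path)) {
  message("cypher-shell found at: ", cypher_path)
} else {
  message("cypher-shell is not on PATH; note its full location instead.")
}
```

If nothing is found, keep a note of the full path to the executable so that it can be supplied to the functions that need it.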
## Installation
You can install the released version of neo4jshell from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("neo4jshell")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("keithmcnulty/neo4jshell")
```
## Functionality
### Query
`neo4j_query()` sends queries to the specified 'Neo4J' graph database and, where appropriate, retrieves the results in a dataframe.
In this example, the movies dataset has been started locally in the 'Neo4J' browser, with a user created that has the credentials indicated. `cypher-shell` is in the local system path.
``` {r, message = FALSE, warning = FALSE}
library(neo4jshell)
library(dplyr)
library(tibble)
```
```{r}
# set credentials (no port required in bolt address)
neo_movies <- list(address = "bolt://localhost", uid = "neo4j", pwd = "password")
# find directors of movies with Kevin Bacon as actor
CQL <- 'MATCH (p1:Person {name: "Kevin Bacon"})-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
RETURN p2.name, m.title;'
# run query
neo4j_query(con = neo_movies, qry = CQL)
```
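The credentials above are hard-coded for illustration. A minimal sketch of one alternative is shown below, assuming you prefer to keep the password out of your scripts. `make_con()` and the environment variable names `NEO4J_UID` and `NEO4J_PWD` are hypothetical and are not part of neo4jshell; the helper simply returns a list in the same shape as `neo_movies`.

``` r
# Build a connection list in the shape used above (address, uid, pwd).
# make_con(), NEO4J_UID and NEO4J_PWD are hypothetical names for illustration.
make_con <- function(address = "bolt://localhost",
                     uid = Sys.getenv("NEO4J_UID", unset = "neo4j"),
                     pwd = Sys.getenv("NEO4J_PWD")) {
  if (!nzchar(pwd)) {
    stop("No password supplied: set NEO4J_PWD or pass `pwd` directly.")
  }
  list(address = address, uid = uid, pwd = pwd)
}

# Example with an explicit password, so it runs without any environment setup
neo_movies <- make_con(pwd = "password")
str(neo_movies)
```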
Older versions of 'Neo4J' and 'cypher-shell' (<4.0) require the `encryption` argument to be set explicitly to `'true'` or `'false'`. Newer versions, which support multi-tenancy, accept a `database` argument that specifies the database to query.
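The version-dependent arguments described above can be sketched as a small helper that assembles the argument list for `neo4j_query()`. `query_args()` is a hypothetical helper, not part of neo4jshell, and it relies on the caller to supply the server version rather than detecting it.

``` r
# Assemble arguments for neo4j_query() based on the server version.
# query_args() is hypothetical; `server_version` is supplied by the caller.
query_args <- function(con, qry, server_version,
                       encryption = "false", database = NULL) {
  args <- list(con = con, qry = qry)
  if (numeric_version(server_version) < numeric_version("4.0")) {
    # pre-4.0: encryption must be stated explicitly as 'true' or 'false'
    args$encryption <- match.arg(encryption, c("false", "true"))
  } else if (!is.null(database)) {
    # 4.0 and later: optionally target a named database
    args$database <- database
  }
  args
}

con <- list(address = "bolt://localhost", uid = "neo4j", pwd = "password")
old_args <- query_args(con, "MATCH (n) RETURN count(n);", "3.5.14")
new_args <- query_args(con, "SHOW DATABASES;", "4.0.4", database = "system")
names(old_args)
names(new_args)
```

With a running server, either list could then be passed on with `do.call(neo4j_query, old_args)`.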
### Server management
- `neo4j_import()` imports a csv, zip or tar.gz file from a local source into the specified 'Neo4J' import directory, uncompresses compressed files and removes the original compressed file as clean-up.
- `neo4j_rmfiles()` removes specified files from the specified 'Neo4J' import directory.
- `neo4j_rmdir()` removes entire specified subdirectories from the specified 'Neo4J' import directory.
### Remote development
In this general example, we can see how these functions can be used for smooth ETL to a remote 'Neo4J' server. This example assumes that the URL of the server that hosts the 'Neo4J' database is the same as the bolt URL for the 'Neo4J' database. If not, a different set of credentials will be needed for using `neo4j_import()`.
``` r
# credentials (note no port required in server address)
neo_server <- list(address = "bolt://neo.server.address", uid = "neo4j", pwd = "password")
# csv data file to be loaded onto 'Neo4J' server (path relative to current working directory)
datafile <- "data.csv"
# CQL query to write data from datafile to 'Neo4J'
loadcsv_CQL <- "LOAD CSV FROM 'file:///data.csv' etc etc;"
# path to import directory on remote 'Neo4J' server (should be relative to user home directory on remote server)
impdir <- "./import"
# import data
neo4jshell::neo4j_import(con = neo_server, source = datafile, import_dir = impdir)
# write data to 'Neo4J' (assumes cypher-shell is in system PATH variable)
neo4jshell::neo4j_query(con = neo_server, qry = loadcsv_CQL)
# remove data file as clean-up
neo4jshell::neo4j_rmfiles(con = neo_server, files = datafile, import_dir = impdir)
```
In Windows, the 'cypher-shell' executable may need to be specified with the file extension, for example `shell_path = "cypher-shell.bat"`.
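To keep a script portable across operating systems, the executable name can be chosen at run time. This is a minimal sketch using base R's `.Platform$OS.type`; it assumes 'cypher-shell' is on the `PATH` and that the Windows executable is named `cypher-shell.bat` as in the example above.

``` r
# Choose the 'cypher-shell' executable name for the current operating system.
# .Platform$OS.type is "windows" on Windows and "unix" elsewhere.
shell <- if (.Platform$OS.type == "windows") "cypher-shell.bat" else "cypher-shell"
shell
```

The resulting value can then be supplied as `shell_path = shell`.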
### Local Development
If you are working with a local 'Neo4J' server, the steps below will help you get started.
The code below is written relative to my user account and uses 'Neo4J 4.0.4 Community' installed in my user's home directory. The directory containing the 'cypher-shell' and 'neo4j' executables is in my system's `PATH` environment variable.
``` {r}
## start the local server
neo4j_start()
## setup connection credentials and import directory location
neo_con <- list(address = "bolt://localhost:7687", uid = "neo4j", pwd = "password")
import_loc <- path.expand("~/neo4j-community-4.0.4/import/")
```
First we save `mtcars` to a `.csv` file, and we compress that file. This package supports a number of delivery formats, but we use a `.zip` file as an example.
```{r}
mtcars <- mtcars %>%
tibble::rownames_to_column(var = "model")
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
zip("mtcars.zip", "mtcars.csv")
```
Now we use `neo4j_import()` to place a **copy** of this file within the import directory you defined in `import_loc` above.
```{r}
neo4j_import(local = TRUE, con = neo_con, source = "mtcars.zip", import_dir = import_loc)
```
We now write a CQL query to write some information from `mtcars.csv` to the graph, and execute that query.
```{r}
CQL <- "LOAD CSV WITH HEADERS FROM 'file:///mtcars.csv' AS row
WITH row WHERE row.model IS NOT NULL
MERGE (c:Car {name: row.model});"
neo4j_query(neo_con, CQL)
```
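If you load several files in the same way, the query text can be generated rather than typed out each time. The sketch below builds the same query as above with `sprintf()`. `load_csv_query()` is a hypothetical helper, not part of neo4jshell, and it inserts its arguments into the query verbatim, so it should only be used with trusted values.

``` r
# Build a LOAD CSV ... MERGE query for a csv file in the import directory.
# load_csv_query() is a hypothetical helper, not part of neo4jshell.
load_csv_query <- function(file, label, key_column, property = "name") {
  sprintf(
    paste(
      "LOAD CSV WITH HEADERS FROM 'file:///%s' AS row",
      "WITH row WHERE row.%s IS NOT NULL",
      "MERGE (c:%s {%s: row.%s});",
      sep = "\n"
    ),
    basename(file), key_column, label, property, key_column
  )
}

CQL <- load_csv_query("mtcars.csv", label = "Car", key_column = "model")
cat(CQL)
```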
Now, let's remove the `mtcars.csv` file from the import directory of our local server as cleanup. If you want to use a sub-directory to help manage your files during an ETL into 'Neo4J', you can remove that local sub-directory when your process has completed using `neo4j_rmdir()`.
```{r}
## remove the file
neo4j_rmfiles(local = TRUE, con = neo_con, files = "mtcars.csv", import_dir = import_loc)
```
Now let's run a query to check the data was loaded to the graph.
```{r}
CQL <- "MATCH (c:Car) RETURN c.name as name LIMIT 5;"
neo4j_query(neo_con, CQL)
```
### Local server administration and control
- `neo4j_start()` starts a local 'Neo4J' instance
- `neo4j_stop()` stops a local 'Neo4J' instance
- `neo4j_restart()` restarts a local 'Neo4J' instance
- `neo4j_status()` returns the status of a local 'Neo4J' instance
- `neo4j_wipe()` wipes an entire graph from a local 'Neo4J' instance
For example:
```{r}
# my server was already running, confirm
neo4j_status()
# stop the server
neo4j_stop()
# restart
neo4j_start()
# give it a few seconds to fire up
Sys.sleep(10)
# query again
neo4j_query(neo_con, qry="MATCH (c:Car) RETURN c.name as name LIMIT 5;")
```
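The fixed `Sys.sleep(10)` above is a simple way to wait for the server. As an alternative, here is a minimal sketch of a polling helper that retries a check until it succeeds or a timeout passes. `wait_until()` is a hypothetical helper, not part of neo4jshell, and the check used here is a stand-in so that the example runs without a server.

``` r
# Poll `check` until it returns TRUE or `timeout` seconds have elapsed.
# wait_until() is a hypothetical helper, not part of neo4jshell.
wait_until <- function(check, timeout = 30, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    # treat errors raised by the check as "not ready yet"
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in check that succeeds on the third attempt
attempts <- 0
ready <- wait_until(function() {
  attempts <<- attempts + 1
  attempts >= 3
}, timeout = 5, interval = 0.1)
ready
```

In practice the check could wrap a cheap `neo4j_query()` call. How that call signals a server that is not yet ready is not covered here, so verify it against your own setup before relying on it.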
If you are using an admin account on 'Neo4J 4+', you can check which databases are available by querying the system database.
```{r}
neo4j_query(neo_con, qry="SHOW DATABASES;", database = "system")
```