Commit 74ed9a0

Support loading from S3 with %seed (#488)
* Add support for seeding from external files
* Modify datasets test
* Improve widget option switching, add unit tests
* Remove debug
* Support Cypher query datasets, use separate --language and --model options depending on source type
* Update seed command documentation
* Update languages for clarity
* Remove lowercasing of file path, status statements
* Modify Cypher error handling
* Revise notebooks for seed changes
* Fix for test failure
* Update Changelog
* Manually verify that queries are insert type
* Remove hard-coded prefix checks
* Add file browser UI
* More file selection changes
* Disable language widget on submit
* Support individual seed file paths
* Usability improvements and documentation
* Remove --source-type option, more refactoring
* Update sample notebooks and README
* Add single file load unit tests
* remove remaining references to source_type
* Add option for Gremlin full-file queries
* Modify full-file query option to also support Cypher
* More graceful handling for invalid args
* Preserve widget states when switching between options
* Fix switching source back to custom with no current language value
* Fix query progress displaying incorrectly
* Load samples from S3
* Support loading custom datafiles from any S3 URI
* Revert to using pre-packaged sample data
* Handle invalid local paths
* update changelog

---------

Co-authored-by: Michael Chin <chnmch@amazon.com>
1 parent d73ba00 commit 74ed9a0

4 files changed: +214 -49 lines changed

ChangeLog.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ Starting with v1.31.6, this file will contain a record of major features and upd
 - New Sample Applications - Healthcare and Life Sciences notebooks ([Link to PR](https://github.com/aws/graph-notebook/pull/484))
   - Path: 03-Sample-Applications > 05-Healthcare-and-Life-Sciences-Graphs
 - Added openCypher and local file path support to `%seed` ([Link to PR](https://github.com/aws/graph-notebook/pull/292))
+- Added S3 support to `%seed` ([Link to PR](https://github.com/aws/graph-notebook/pull/488))
 - Added `%toggle_traceback` line magic ([Link to PR](https://github.com/aws/graph-notebook/pull/486))
 - Added support for setting `%graph_notebook_vis_options` from a variable ([Link to PR](https://github.com/aws/graph-notebook/pull/487))
 - Pinned JupyterLab<4.x to fix Python 3.8/3.10 builds ([Link to PR](https://github.com/aws/graph-notebook/pull/490))
@@ -151,6 +152,7 @@ Starting with v1.31.6, this file will contain a record of major features and upd
 - Updated the airports property graph seed files to the latest level and suffixed all doubles with 'd'. ([Link to PR](https://github.com/aws/graph-notebook/pull/257))
 - Added grouping by depth for Gremlin and openCypher queries ([PR #1](https://github.com/aws/graph-notebook/pull/241))([PR #2](https://github.com/aws/graph-notebook/pull/251))
 - Added grouping by raw node results ([Link to PR](https://github.com/aws/graph-notebook/pull/253))
+- Added loading from file path with `%seed` ([Link to PR](https://github.com/aws/graph-notebook/pull/247))
 - Added `--no-scroll` option for disabling truncation of query result pages ([Link to PR](https://github.com/aws/graph-notebook/pull/243))
 - Added `--results-per-page` option ([Link to PR](https://github.com/aws/graph-notebook/pull/242))
 - Added relaxed seed command error handling ([Link to PR](https://github.com/aws/graph-notebook/pull/246))

src/graph_notebook/magics/graph_magic.py

Lines changed: 118 additions & 38 deletions
@@ -2070,52 +2070,74 @@ def seed(self, line, local_ns: dict = None):
             layout=widgets.Layout(display='none')
         )
 
+        location_option_dropdown = widgets.Dropdown(
+            description='Location:',
+            options=['Local', 'S3'],
+            value='Local',
+            disabled=False,
+            layout=widgets.Layout(display='none')
+        )
+
         seed_file_location_text = widgets.Text(
+            description='Source:',
             placeholder='path/to/seedfiles/directory',
-            description='Directory:',
             disabled=False
         )
 
         seed_file_location = FileChooser()
         seed_file_location.layout.display = 'none'
 
+        seed_file_location_text_hbox = widgets.HBox([seed_file_location_text])
+
         submit_button = widgets.Button(description="Submit")
         model_dropdown.layout.visibility = 'hidden'
         language_dropdown.layout.visibility = 'hidden'
         data_set_drop_down.layout.visibility = 'hidden'
         fullfile_option_dropdown.layout.visibility = 'hidden'
-        seed_file_location_text.layout.visibility = 'hidden'
+        location_option_dropdown.layout.visibility = 'hidden'
+        seed_file_location_text_hbox.layout.visibility = 'hidden'
         seed_file_location.layout.visibility = 'hidden'
         submit_button.layout.visibility = 'hidden'
 
-        def reset_seedfile_textbox():
-            seed_file_location_text.layout.visibility = 'hidden'
-            seed_file_location_text.layout.display = 'none'
+        def hide_all_widgets():
+            location_option_dropdown.layout.visibility = 'hidden'
+            location_option_dropdown.layout.display = 'none'
+            seed_file_location_text_hbox.layout.visibility = 'hidden'
+            seed_file_location_text_hbox.layout.display = 'none'
+            language_dropdown.layout.visibility = 'hidden'
+            language_dropdown.layout.display = 'none'
+            fullfile_option_dropdown.layout.visibility = 'hidden'
+            fullfile_option_dropdown.layout.display = 'none'
+            seed_file_location.layout.visibility = 'hidden'
+            seed_file_location.layout.display = 'none'
+            model_dropdown.layout.visibility = 'hidden'
+            model_dropdown.layout.display = 'none'
+            data_set_drop_down.layout.visibility = 'hidden'
+            data_set_drop_down.layout.display = 'none'
+            submit_button.layout.visibility = 'hidden'
 
         def on_source_value_change(change):
-            reset_seedfile_textbox()
-            submit_button.layout.visibility = 'hidden'
+            hide_all_widgets()
             selected_source = change['new']
             if selected_source == 'custom':
-                model_dropdown.layout.visibility = 'hidden'
-                model_dropdown.layout.display = 'none'
-                data_set_drop_down.layout.visibility = 'hidden'
-                data_set_drop_down.layout.display = 'none'
                 language_dropdown.layout.visibility = 'visible'
                 language_dropdown.layout.display = 'flex'
+                location_option_dropdown.layout.visibility = 'visible'
+                location_option_dropdown.layout.display = 'flex'
                 if language_dropdown.value:
                     if language_dropdown.value != 'sparql':
                         fullfile_option_dropdown.layout.visibility = 'visible'
                         fullfile_option_dropdown.layout.display = 'flex'
-                    # If textbox has a value, display it instead of the filepicker
-                    if seed_file_location_text.value:
-                        seed_file_location_text.layout.visibility = 'visible'
-                        seed_file_location_text.layout.display = 'flex'
-                        submit_button.layout.visibility = 'visible'
-                    else:
+                    # If textbox has a value, OR we are loading from S3, display textbox instead of the filepicker
+                    if seed_file_location_text.value or location_option_dropdown.value == 'S3':
+                        seed_file_location_text_hbox.layout.visibility = 'visible'
+                        seed_file_location_text_hbox.layout.display = 'flex'
+                    elif seed_file_location.value or location_option_dropdown.value == 'Local':
                         seed_file_location.layout.visibility = 'visible'
                         seed_file_location.layout.display = 'flex'
-                if language_dropdown.value and (seed_file_location.value or seed_file_location_text.value):
+                if language_dropdown.value \
+                        and (seed_file_location_text.value or
+                             (seed_file_location.value and location_option_dropdown.value == 'Local')):
                     submit_button.layout.visibility = 'visible'
             elif selected_source == 'samples':
                 language_dropdown.layout.visibility = 'hidden'
@@ -2138,6 +2160,8 @@ def on_source_value_change(change):
                 fullfile_option_dropdown.layout.display = 'none'
                 seed_file_location.layout.visibility = 'hidden'
                 seed_file_location.layout.display = 'none'
+                seed_file_location_text.layout.visibility = 'hidden'
+                seed_file_location_text.layout.display = 'none'
                 model_dropdown.layout.visibility = 'hidden'
                 model_dropdown.layout.display = 'none'
                 data_set_drop_down.layout.visibility = 'hidden'
@@ -2176,14 +2200,35 @@ def on_language_value_change(change):
             else:
                 fullfile_option_dropdown.layout.visibility = 'hidden'
                 fullfile_option_dropdown.layout.display = 'none'
-            if not seed_file_location_text.value and seed_file_location_text.layout.visibility == 'hidden':
+            if not seed_file_location_text.value and seed_file_location_text_hbox.layout.visibility == 'hidden':
                 seed_file_location.layout.visibility = 'visible'
                 seed_file_location.layout.display = 'flex'
                 submit_button.layout.visibility = 'visible'
             return
 
-        def on_seedfile_value_change(change):
-            if seed_file_location.value or seed_file_location_text.value:
+        def on_location_value_change(change):
+            selected_location = change['new']
+            if selected_location == 'Local' and not seed_file_location_text.value:
+                seed_file_location_text_hbox.layout.visibility = 'hidden'
+                seed_file_location_text_hbox.layout.display = 'none'
+                seed_file_location.layout.visibility = 'visible'
+                seed_file_location.layout.display = 'flex'
+            else:
+                seed_file_location.layout.visibility = 'hidden'
+                seed_file_location.layout.display = 'none'
+                seed_file_location_text_hbox.layout.visibility = 'visible'
+                seed_file_location_text_hbox.layout.display = 'flex'
+            return
+
+        def on_seedfile_text_value_change(change):
+            if seed_file_location_text.value:
+                submit_button.layout.visibility = 'visible'
+            else:
+                submit_button.layout.visibility = 'hidden'
+            return
+
+        def on_seedfile_select_value_change(change):
+            if seed_file_location.value:
                 submit_button.layout.visibility = 'visible'
             else:
                 submit_button.layout.visibility = 'hidden'
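
Reviewer note: the handlers in this hunk always set `layout.visibility` and `layout.display` together. As a rough sketch of that pattern, the snippet below folds the pair into one helper and replays the Local/S3 switch from `on_location_value_change`. The names `set_shown`, `fake_widget`, and `on_location_change` are hypothetical and not part of the commit, and `SimpleNamespace` objects stand in for ipywidgets so the sketch runs without a notebook.

```python
from types import SimpleNamespace


def set_shown(widget, shown: bool) -> None:
    """Set both layout attributes that the diff always toggles as a pair."""
    widget.layout.visibility = 'visible' if shown else 'hidden'
    widget.layout.display = 'flex' if shown else 'none'


def fake_widget():
    """Hypothetical stand-in for an ipywidgets widget: only `.layout` is modelled."""
    return SimpleNamespace(layout=SimpleNamespace(visibility='hidden', display='none'))


text_box = fake_widget()      # plays the role of seed_file_location_text_hbox
file_chooser = fake_widget()  # plays the role of seed_file_location


def on_location_change(selected_location: str, text_value: str) -> None:
    """Same branch condition as on_location_value_change in the diff."""
    use_chooser = selected_location == 'Local' and not text_value
    set_shown(file_chooser, use_chooser)
    set_shown(text_box, not use_chooser)


on_location_change('S3', '')
print(text_box.layout.visibility, file_chooser.layout.visibility)
```

With `'S3'` selected, the text box is shown and the file chooser hidden; with `'Local'` and an empty text box, it is the reverse.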
@@ -2195,10 +2240,17 @@ def disable_seed_widgets():
             language_dropdown.disabled = True
             data_set_drop_down.disabled = True
             fullfile_option_dropdown.disabled = True
+            location_option_dropdown.disabled = True
             seed_file_location_text.disabled = True
             seed_file_location.disabled = True
             submit_button.close()
 
+        def normalize_language_name(lang):
+            lang = lang.lower().replace('_', '')
+            if lang in ['opencypher', 'oc', 'cypher']:
+                lang = 'opencypher'
+            return lang
+
         def process_gremlin_query_line(query_line, line_index, q):
             # Return a state here, with indication of any other variable states that need changing.
             # return 0 = continue
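
Reviewer note: `normalize_language_name` can be tried outside the magic. The snippet below copies the function body from this hunk as a standalone function; the type hints, docstring, and sample inputs are added for illustration and are not taken from the repository's tests.

```python
def normalize_language_name(lang: str) -> str:
    """Lowercase, drop underscores, and fold openCypher aliases into one name."""
    lang = lang.lower().replace('_', '')
    if lang in ['opencypher', 'oc', 'cypher']:
        lang = 'opencypher'
    return lang


# Illustrative inputs only; anything that is not an openCypher alias is
# returned lowercased with underscores removed.
for raw in ['Open_Cypher', 'OC', 'cypher', 'Gremlin', '']:
    print(repr(raw), '->', repr(normalize_language_name(raw)))
```

An empty string passes through unchanged, so the later check `language != '' and language in SEED_LANGUAGE_OPTIONS` still rejects a missing `--language` value.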
@@ -2275,12 +2327,28 @@ def process_cypher_query_line(query_line, line_index, q):
                     return 2
 
         def on_button_clicked(b=None):
+            seed_file_location_text_hbox.children = (seed_file_location_text,)
             filename = None
             if source_dropdown.value == 'samples':
                 data_set = data_set_drop_down.value.lower()
                 fullfile_query = False
             else:
                 if seed_file_location_text.value:
+                    stall_with_warning = False
+                    if location_option_dropdown.value == 'S3' and not (seed_file_location_text.value.startswith('s3://')
+                                                                       and len(seed_file_location_text.value) > 7):
+                        seed_file_location_text_validation_label = widgets.HTML(
+                            '<p style="color:red;">S3 source URI must start with s3://</p>')
+                        stall_with_warning = True
+                    elif location_option_dropdown.value == 'Local' \
+                            and not seed_file_location_text.value.startswith('/'):
+                        seed_file_location_text_validation_label = widgets.HTML(
+                            '<p style="color:red;">Local source URI must be a valid file path</p>')
+                        stall_with_warning = True
+                    if stall_with_warning:
+                        seed_file_location_text_validation_label.style = DescriptionStyle(color='red')
+                        seed_file_location_text_hbox.children += (seed_file_location_text_validation_label,)
+                        return
                     filename = seed_file_location_text.value
                 elif seed_file_location.value:
                     filename = seed_file_location.value
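
Reviewer note: the source check added to `on_button_clicked` can be read as a pure function. The sketch below restates the two conditions from this hunk without any widgets. `validate_seed_source` is a hypothetical name and does not exist in the commit; the messages are the ones shown in the diff.

```python
from typing import Optional


def validate_seed_source(location: str, source: str) -> Optional[str]:
    """Return the warning text the diff would display, or None if the source passes.

    Hypothetical helper: mirrors the conditions in on_button_clicked.
    """
    if location == 'S3' and not (source.startswith('s3://') and len(source) > 7):
        return 'S3 source URI must start with s3://'
    if location == 'Local' and not source.startswith('/'):
        return 'Local source URI must be a valid file path'
    return None


# The bucket and paths below are made-up examples.
print(validate_seed_source('S3', 's3://my-bucket/seed/'))
print(validate_seed_source('S3', '/tmp/seed'))
print(validate_seed_source('Local', 'relative/dir'))
```

Two consequences of the conditions as written: an S3 URI needs more than seven characters in total, so `s3://ab` is rejected, and a local path has to be absolute, so `relative/dir` is rejected.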
@@ -2298,9 +2366,14 @@ def on_button_clicked(b=None):
             with output:
                 print(f'Loading data set {data_set} for {loading_msg_model}')
             queries = get_queries(model, data_set, source_dropdown.value)
-            if len(queries) < 1:
+            if queries:
+                if len(queries) < 1:
+                    with output:
+                        print('Did not find any queries for the given dataset')
+                    return
+            else:
                 with output:
-                    print('Did not find any queries for the given dataset')
+                    print('Query retrieval from files terminated with errors.')
                 return
 
             load_index = 1 # start at 1 to have a non-empty progress bar
@@ -2360,12 +2433,12 @@ def on_button_clicked(b=None):
                         progress.close()
                         return
                 else: # gremlin and cypher
-                    pg_language = language_dropdown.value if language_dropdown.value else 'gremlin'
+                    pg_language = normalize_language_name(language_dropdown.value)
                     if fullfile_query: # treat entire file content as one query
-                        if pg_language == 'gremlin':
-                            query_status = process_gremlin_query_line(q['content'], 0, q)
-                        else:
+                        if pg_language == 'opencypher':
                             query_status = process_cypher_query_line(q['content'], 0, q)
+                        else:
+                            query_status = process_gremlin_query_line(q['content'], 0, q)
                         if query_status == 2:
                             progress.close()
                             return
@@ -2377,10 +2450,10 @@ def on_button_clicked(b=None):
                             continue
                     else: # treat each line as its own query
                         for line_index, query_line in enumerate(q['content'].splitlines()):
-                            if pg_language == 'gremlin':
-                                query_status = process_gremlin_query_line(query_line, line_index, q)
-                            else:
+                            if pg_language == 'opencypher':
                                 query_status = process_cypher_query_line(query_line, line_index, q)
+                            else:
+                                query_status = process_gremlin_query_line(query_line, line_index, q)
                             if query_status == 2:
                                 progress.close()
                                 return
@@ -2405,22 +2478,29 @@ def on_button_clicked(b=None):
         model_dropdown.observe(on_model_value_change, names='value')
         data_set_drop_down.observe(on_dataset_value_change, names='value')
         language_dropdown.observe(on_language_value_change, names='value')
-        seed_file_location_text.observe(on_seedfile_value_change, names='value')
+        location_option_dropdown.observe(on_location_value_change, names='value')
+        seed_file_location_text.observe(on_seedfile_text_value_change, names='value')
+        seed_file_location.observe(on_seedfile_select_value_change, names='value')
 
         display(source_dropdown, model_dropdown, language_dropdown, data_set_drop_down, fullfile_option_dropdown,
-                seed_file_location, seed_file_location_text, submit_button, progress_output, output)
+                location_option_dropdown, seed_file_location, seed_file_location_text_hbox, # seed_file_location_text,
+                submit_button, progress_output, output)
 
         if args.source != '' or args.language != '':
             source_dropdown.value = 'custom'
             valid_language_value = False
-            if args.language != '':
-                if args.language.lower() in SEED_LANGUAGE_OPTIONS:
-                    language_dropdown.value = args.language.lower()
-                    valid_language_value = True
+            language = normalize_language_name(args.language)
+            if language != '' and language in SEED_LANGUAGE_OPTIONS:
+                language_dropdown.value = language
+                valid_language_value = True
             if args.source != '':
                 seed_file_location_text.value = args.source
-                seed_file_location_text.layout.visibility = 'visible'
-                seed_file_location_text.layout.display = 'flex'
+                seed_file_location_text_hbox.layout.visibility = 'visible'
+                seed_file_location_text_hbox.layout.display = 'flex'
+                if seed_file_location_text.value.startswith('s3://'):
+                    location_option_dropdown.value = 'S3'
+                location_option_dropdown.layout.visibility = 'visible'
+                location_option_dropdown.layout.display = 'flex'
                 seed_file_location.layout.visibility = 'hidden'
                 seed_file_location.layout.display = 'none'
             if seed_file_location_text.value and valid_language_value and args.run:

src/graph_notebook/notebooks/03-Sample-Applications/01-Fraud-Graphs/01-Building-a-Fraud-Graph-Application.ipynb

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@
    "source": [
     "### Load data\n",
     "\n",
-    "The cell below loads the example fraud graph into your Neptune cluster. When you run the cell you will be prompted to select a `Data Model` and a `Data set`. Select `Property_Graph` and `fraud_graph` respectively. The graph takes about 5 minutes to load."
+    "The cell below loads the example fraud graph into your Neptune cluster. When you run the cell you will be prompted to select a `Source type`, a `Data Model`, and a `Data set`. Select `samples`, `Property_Graph`, and `fraud_graph`, respectively. The graph takes about 5 minutes to load."
    ]
   },
   {
