Snowflake Sql Regex ~ Extracting Multiple Vals
Solution 1:
This select statement will give you what you want ... sorta. You should notice that it will look for the a specific occurence of "val" and then give you the next word character after that.
REGEX to my knowledge evaluates to the first occurence of the expression, so once the pattern is found it's done. You may want to look at the Snowflake JavaScript Stored Procedure to see if you can take the example below and iterate through, incrementing the appropriate value to produce the expected output.
SELECT REGEXP_SUBSTR('Type: a:- !<string>val: "A" - !<string> val: "B" - !<string> val: "C"','val\\W+(\\w+)', 1, 1, 'e', 1) as A,
REGEXP_SUBSTR('Type: a:- !<string>val: "A" - !<string> val: "B" - !<string> val: "C"','val\\W+(\\w+)', 1, 2, 'e', 1) as B,
REGEXP_SUBSTR('Type: a:- !<string>val: "A" - !<string> val: "B" - !<string> val: "C"','val\\W+(\\w+)', 1, 3, 'e', 1) as C;
Solution 2:
You have to extract the values in two stages;
- Extract the section of the document below Type: a: containing all the
val: "data". - Extract the
"data"as an array or useREGEXP_SUBSTR()+ index n to extract the nth element
SELECT'Type:\\s+\\w+:((\\s+- !<string>\\s+val:\\s+"[^"]")+)' type_section_rx
REGEXP_SUBSTR(col, type_section_rx, 1, 1, 'i', 1) vals,
PARSE_JSON('[0'|| REPLACE(vals, REGEXP_SUBSTR(vals, '[^"]+'), ', ') ||']') raw_array,
ARRAY_SLICE(raw_array, 1, ARRAY_SIZE(raw_array)) val_array,
val_array[1] B
FROM INPUT_STRING
The result is an array where you can access the first value with the index [0] etc.
The first regexp can be shortened down to a "least effort" 'Type:\\s+\\w+:(([^"]+"[^"]+")+)'.
Solution 3:
One more angle -- Use javascript regex capabilities in a UDF.
For example:
create or replace functionmy_regexp(S text)
returns array
language javascript
as
$$
const re = /(\w+)/greturn [...S.match(re)]
$$
;
Invoked this way:
set S = 'Type:
a:
- !<string>
val: "A"
- !<string>
val: "B"
- !<string>
val: "C"';select my_regexp($S);
Yields:
[ "Type", "a", "string", "val", "A", "string", "val", "B", "string", "val", "C" ]Implementing your full regex is a little more work but as you can see, this gets around the single value limitation.
That said, if performance is your priority, I would expect Snowflake native regex support to outperform, even though you specify the regex multiple times, though I haven't tested this.
Post a Comment for "Snowflake Sql Regex ~ Extracting Multiple Vals"