[Announcement] Improved JSON Support for Amazon Kinesis Analytics


Since the launch of Kinesis Analytics, we have received a lot of great feedback from customers about providing better support for JSON. This forum announcement provides an overview of our JSON support in schema discovery (the DiscoverInputSchema API) and in parsing data into your application. We plan to iterate quickly in this area and will update our Developer Guide with more detailed information on this topic. If there is a data format you would like us to support, please send me a private message or create a new topic on the forum.

Amazon Kinesis Analytics supports schema discovery and parsing for UTF-8 encoded JSON and CSV data. For JSON, elements are referenced using a subset of JSONPath, a path expression syntax for selecting parts of a JSON document. An example record is provided below with JSONPath references (Partial Source: http://goessner.net/articles/JsonPath/).

Amazon Kinesis Analytics supports heterogeneous data in a stream when the input schema specifies a superset of the elements across all event types. A common use case is combining different data producers in a single stream and application. These producers can write different event types (and therefore different schemas) to the stream. Your Amazon Kinesis Analytics application supports this scenario when you specify all columns from all event types in the input schema. Each column is read, and where an element is missing from a record, the column contains null.

This lets you bring multiple data producers under a single application with a single schema. The producers can contribute data to different column groups in the schema, while you analyze the combined data in one application.
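
As a rough illustration outside the service, the Python sketch below projects records from two hypothetical event types (clicks and purchases, with made-up keys and column names) onto one superset schema. It is not the service's parser, only a model of the behavior described above, assuming flat records: every column is looked up in every record, and a key the record lacks becomes None, standing in for a SQL null.

```python
import json

# Hypothetical superset schema: (column name, top-level JSON key) for every
# column used by any event type. These names are illustrative only.
SCHEMA = [
    ("EVENT_TYPE", "type"),
    ("USER_ID", "user"),
    ("PAGE", "page"),
    ("AMOUNT", "amount"),
]


def to_row(raw: str) -> dict:
    """Project one JSON record onto the superset schema; absent keys become None."""
    record = json.loads(raw)
    return {column: record.get(key) for column, key in SCHEMA}


# Two producers writing different event types to the same stream (made-up data).
click = '{"type": "click", "user": "u1", "page": "/home"}'
purchase = '{"type": "purchase", "user": "u2", "amount": 19.99}'

for raw in (click, purchase):
    print(to_row(raw))
```

The click row comes back with AMOUNT set to None and the purchase row with PAGE set to None, which is the "null where a column is missing" behavior in miniature.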

Amazon Kinesis Analytics schema discovery and parsing can read any element that can be expressed with a fully qualified JSONPath that contains no arrays. For arrays in JSON documents, the service supports reading elements with fully explicit JSONPaths that dereference a single, non-nested array. Additional non-nested arrays can be read in their entirety as strings through a direct reference.
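
These rules can be checked mechanically before you define an input schema. The Python sketch below is a hypothetical helper, not part of any AWS SDK or of the service; it encodes only this announcement's description of the subset, using paths from the sample record that follows. A path may use `$`, dot-notated children, and `[0:]`; a trailing `[0:]` reads a whole array as a string, while a `[0:]` followed by more keys dereferences the array's elements. Index and slice references, `..`, `*`, filters, nested array references, and schemas that dereference more than one array are rejected.

```python
import re

ARRAY = "[0:]"  # the only array reference form described in this announcement
# One token: a dot-notated child ".name" or the array reference "[0:]".
TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[0:\]")


def parse_path(path: str) -> list:
    """Split a path into object keys and ARRAY markers; raise ValueError if unsupported."""
    if not path.startswith("$"):
        raise ValueError(f"{path}: paths must start at the root $")
    tokens, pos = [], 1
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:  # e.g. "[1]", "[1:2]", "..", ".*", "[?(...)]"
            raise ValueError(f"{path}: unsupported syntax at {path[pos:]!r}")
        tokens.append(match.group(1) or ARRAY)
        pos = match.end()
    return tokens


def check_schema(paths: list) -> None:
    """Raise ValueError for nested array references or more than one dereferenced array."""
    dereferenced = set()
    for path in paths:
        tokens = parse_path(path)
        marks = [i for i, token in enumerate(tokens) if token == ARRAY]
        if len(marks) > 1:
            raise ValueError(f"{path}: nested array reference")
        if marks and marks[0] < len(tokens) - 1:
            # "[0:]" followed by more keys flattens the array into one row per element.
            dereferenced.add(tuple(tokens[: marks[0]]))
        # A trailing "[0:]" reads the whole array as one string, so it is not counted.
    if len(dereferenced) > 1:
        raise ValueError("only a single array may be dereferenced")


# Dereferencing book while reading newspaper whole passes ...
check_schema(["$.store.book[0:].category", "$.store.book[0:].title", "$.store.newspaper[0:]"])
# ... but dereferencing both arrays is rejected.
try:
    check_schema(["$.store.newspaper[0:].name", "$.store.book[0:].category"])
except ValueError as err:
    print(err)
```

The check is purely syntactic: it cannot tell, for example, whether a directly referenced array is itself nested inside another array in the actual data.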

JSON Example

{
  "store":{
    "name":"My Bookstore",
    "location":{
      "street":"123 Main Street",
      "city":"Seattle",
      "state":"WA"
    },
    "book":[
      {
        "category":"reference",
        "title":"Sayings of the Century",
        "price":8.95
      },
      {
        "category":"fiction",
        "title":"Sword of Honor",
        "price":12.99
      }
    ],
    "newspaper":[
      {
        "name":"Sun",
        "price":3.99
      },
      {
        "name":"Herald",
        "price":2.99
      }
    ]
  }
}

Operators
The following JSON path operators are supported:

_Root Reference_

$  

The root element to query. This starts all path expressions. Changing it is not recommended.

_Dot-notated child_

.<name>

_Array Reference_

[0:]

References every element of an array; this is the only supported form of array reference.
   
**Currently Supported Examples**   
_Example 1 - Simple key-value pair_
JSONPath 1 = `$.store.name`
JSONPath 2 = `$.store.location`
JSONPath 3 = `$.store.location.street`

Result
Column 1 (Name) = "My Bookstore"
Column 2 (Location) = "{"street": "123 Main Street","city": "Seattle","state": "WA"}"
Column 3 (Street) = "123 Main Street"

An element that has child elements, such as location, is read into a single column as a JSON string.
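
To make these column values concrete, here is a small Python sketch that follows a dot-notated path through a trimmed copy of the sample record (only name and location are kept). The helper `read_path` is hypothetical; it returns leaf values as-is and serializes objects and arrays to JSON text, which is one plausible model of how Column 2 receives the whole location object. The service's exact whitespace and quoting may differ.

```python
import json

# A trimmed copy of the sample record: only the name and location elements.
RECORD = json.loads(
    '{"store": {"name": "My Bookstore", "location": '
    '{"street": "123 Main Street", "city": "Seattle", "state": "WA"}}}'
)


def read_path(record: dict, path: str):
    """Follow a dot-notated path without arrays, e.g. "$.store.location.street".

    Leaf values are returned as-is; objects and arrays are returned as JSON text.
    A missing element yields None, standing in for a null column value.
    """
    node = record
    for key in path.split(".")[1:]:  # skip the leading "$"
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return json.dumps(node) if isinstance(node, (dict, list)) else node


for path in ("$.store.name", "$.store.location", "$.store.location.street"):
    print(path, "->", read_path(RECORD, path))
```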
   
_Example 2 - Specific element reference to a single non-nested array_
JSONPath = `$.store.book[0:].category`

Result = 2 rows; "reference", "fiction"
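
A minimal Python sketch of this flattening, assuming the path contains exactly one `[0:]` followed by a key: the part before `[0:]` locates the array, and the part after it is evaluated against each element, yielding one value per row. `flatten` is a hypothetical helper, and the record is trimmed to the book array.

```python
import json

# A trimmed copy of the sample record: only the book array.
RECORD = json.loads(
    '{"store": {"book": ['
    '{"category": "reference", "title": "Sayings of the Century", "price": 8.95},'
    '{"category": "fiction", "title": "Sword of Honor", "price": 12.99}]}}'
)


def flatten(record: dict, path: str) -> list:
    """Return one value per element of the array dereferenced by "[0:]".

    Assumes the path contains exactly one "[0:]" followed by at least one key.
    """
    prefix, suffix = path.split("[0:]")  # "$.store.book", ".category"
    array = record
    for key in prefix.split(".")[1:]:  # locate the array from the root
        array = array[key]
    values = []
    for element in array:  # each element becomes one row
        value = element
        for key in suffix.split(".")[1:]:
            value = value.get(key) if isinstance(value, dict) else None
        values.append(value)
    return values


print(flatten(RECORD, "$.store.book[0:].category"))  # ['reference', 'fiction']
```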
   
_Example 3 - Non-specific reference to an array_
JSONPath = `$.store.newspaper[0:]`

Result = 1 row; "{ "name": "Sun","price": 3.99},{ "name": "Herald","price": 2.99}"

_Example 4 - Referencing two arrays_
A single, non-nested array may be referenced one or more times. Additional arrays can be read in their entirety as strings.

JSONPath 1 = `$.store.book[0:].category`
JSONPath 2 = `$.store.book[0:].title`
JSONPath 3 = `$.store.book[0:].price`
JSONPath 4 = `$.store.newspaper[0:]`

Result (2 rows)
Column 1 (Category) = "reference", "fiction"
Column 2 (Title) = "Sayings of the Century", "Sword of Honor"
Column 3 (Price) = 8.95, 12.99
Column 4 (Newspaper) = "{ "name": "Sun","price": 3.99},{ "name": "Herald","price": 2.99}" in both rows
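
The two-array result above can be modeled with a slightly larger Python sketch. `to_rows` is a hypothetical helper, not the service's implementation: paths that dereference the single array contribute one value per element, and every other path, including one ending in `[0:]`, is read once, serialized to JSON text if it is an object or array, and repeated on each row. With only the newspaper path, the sketch yields a single row, matching the non-specific array reference. Note that `json.dumps` wraps the newspaper array in brackets and adds spaces, so its string differs in format from the value the announcement shows.

```python
import json

# The sample record, trimmed to the two arrays used in this example.
RECORD = json.loads("""{"store": {
  "book": [
    {"category": "reference", "title": "Sayings of the Century", "price": 8.95},
    {"category": "fiction", "title": "Sword of Honor", "price": 12.99}],
  "newspaper": [{"name": "Sun", "price": 3.99}, {"name": "Herald", "price": 2.99}]}}""")


def walk(node, keys):
    """Follow object keys from node; return None if any key is missing."""
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_rows(record: dict, columns: dict) -> list:
    """Build one row per element of the dereferenced array (hypothetical helper)."""
    per_row, constant = {}, {}
    for name, path in columns.items():
        head, sep, tail = path.partition("[0:]")
        keys = head.split(".")[1:]  # drop the leading "$"
        if sep and tail:
            # "[0:]" followed by a key: one value per array element.
            elements = walk(record, keys) or []
            per_row[name] = [walk(element, tail.split(".")[1:]) for element in elements]
        else:
            # Plain path or trailing "[0:]": read once, objects/arrays as JSON text.
            value = walk(record, keys)
            constant[name] = json.dumps(value) if isinstance(value, (dict, list)) else value
    count = max((len(values) for values in per_row.values()), default=1)
    return [
        {name: per_row[name][i] if name in per_row else constant[name] for name in columns}
        for i in range(count)
    ]


columns = {
    "Category": "$.store.book[0:].category",
    "Title": "$.store.book[0:].title",
    "Price": "$.store.book[0:].price",
    "Newspaper": "$.store.newspaper[0:]",
}
for row in to_rows(RECORD, columns):
    print(row)
```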
  
**Non-Supported Cases and Future Improvements**   
The following cases are not currently supported. We are working on improving support for many of them; please check this announcement for updates.
   
_Case 1 - Referencing array elements by index or slice_
JSONPath = `$.store.newspaper[1]`
JSONPath = `$.store.newspaper[1:2]`

Current Result = A single row sent to the error stream.
Future Result = Flattened arrays across multiple rows.
   
_Case 2 - Reference to the entire JSON payload_
JSONPath = `$.*`

Current Result = A single row sent to the error stream.
Future Result = Entire JSON payload placed in a single column.
   
_Case 3 - Dereferencing multiple arrays_
JSONPath 1 = `$.store.newspaper[0:].name`
JSONPath 2 = `$.store.book[0:].category`

Current Result = A single row sent to the error stream.
Future Result = A single array flattened, with other arrays brought in as entire columns.
   
_Case 4 - Dereferencing nested arrays_
For this case, we use the following example record.

{
  "store":{
    "name":"My Bookstore",
    "book":[
      {
        "category":"fiction",
        "title":"Sayings of the Century",
        "chapter":[
          { "name":"Chapter 1" },
          { "name":"Chapter 2" }
        ]
      },
      {
        "category":"fiction",
        "title":"Sword of Honor",
        "chapter":[
          { "name":"Chapter I" },
          { "name":"Chapter II" }
        ]
      }
    ]
  }
}

JSONPath 1 = `$.store.book[0:].category`
JSONPath 2 = `$.store.book[0:].chapter[0:]`

Current Result = A single row sent to the error stream.
Future Result = First array flattened across multiple rows. Second array brought in as an entire string.

_Case 5 - Support for more JSONPath operators_
The following JSONPath operators are not supported (Source: http://goessner.net/articles/JsonPath/).

_Deep Scan_
..
Available anywhere a name is required.

_Wildcard_
*
Available anywhere a name or numeric index is required.

_Filter Predicate_
@
The current node being processed by a filter predicate.

_Array Slice_
[start:end]

_Bracket-notated Child or Children_
['<name>' (, '<name>')]

_Array Index or Indexes_
[<number> (, <number>)]

_Filter Expression_
[?(<expression>)]
Expression must evaluate to a boolean value.
  • This is an announcement migrated from AWS Forums that does not require an answer

AWS
asked 7 years ago