Spatial Databox User's Guide

Introduction

This document covers client-side interfaces and native wire protocols to the Spatial Databox.

A client communicates with Spatial Databox (SDB) through an HTTP RESTful interface. The REST interface is a language and platform-neutral procotol; the URL specifies the resource and the HTTP request specifies the operation (GET, POST, PUT, DELETE)

SDB can can reply to client query requests in a number of native wire formats: binary, CSV, GPX, SEO-friendly HTML, JSON, Little JSON, KML and XML.

The goals of the client-side interface are as follows:

1. easy to use, from all popular programming language and scripting platforms: native wire protocols are abstracted by an application-level interface with native language bindings. SDB becomes a simple extension of the platform. This extension allows early detection of usage errors, and where supported, allows such errors to be detected at compile-time.

2. integrate with popular location based services or devices (eg. Google Earth, Garmin GPS, etc)

3. Support the "deep web": HTML interface provides structured "food" for Google bots to index the contents of a layer.

4. Hide myriad SQL complexities from simple, typical usage scenarios. SQL is still there for complex needs.

To meet goal #1, the following language / scripting platforms are currently supported:

1. Client-side application languages: a> ActionScript 3 / Flash: streaming binary via flash.net.URLStream b> JavaScript: JSON via an XMLHttpRequest connection

More platforms (Perl, Python, Ruby, etc) will be supported as demand appears from these platforms.

REST verbs

All SDB functionality is expressed through these four REST verbs:

HTTP VerbSDB2 Action
GETRetrieve POI from layer
POSTCreate a POI and add to layer
PUTUpdate POI in layer
DELETERemove a POI from layer

Reference: HTTP/1.1 Method definitions

GET (Query) Syntax

A query is submitted to Spatial Databox via a URL structured as follows:

http://spatialdatabox.com/r/
    {layer}/
    {primary_cond}/
    {secondary_cond}
    ?
    p={projection}&
    f={format}&
    r={rowLimit}&
    sc200 (always return a status code of 200)
    #bufferLimit


To minimize the number of parameters passed via the query string, the layer, primary and secondary query conditions are passed within the URL path.

{layer} is required and is the the name of the layer to which to apply the GET operation.

{primary_cond} is required and is a spatial or identity expression that selects candidate rows. It is case insensitive, and takes one of the following forms:

BBOX=latMin, lonMin, latMax, lonMax
select all rows that lie within a rectangle. All rows are written as a single group.

BBOX=lat1Min, lon1Min, lat1Max, lon1Max, lat2Min, lon2Min, lat2Max, lon2Max
select all rows that lie in either rectangle. All rows are written as a single group.

TILE=rowLimit, zoom [ + | - recurse ] , pos {, pos }
select all rows that lie in the tiles at position pos {, pos}. All tiles are at zoom level zoom. An optional recurse parameter can recursively subdivide each tile into 2^(2*recurse)) sub-tiles.

If the count of rows selected per tile or sub-tile exceed rowLimit, they are replaced by their count.

zoom is a base 24 representation of the zoom level of all pos fields. A tile at zoom level 0 represents the whole earth. The maximum zoom level is 23 (which is represented by N in base 24)

pos is a hexadecimal number that captures two bits per zoom level, from the coordinates of the tile's upper-left edge.

Up to 100 pos values can be specified, which is slightly more than the number of tiles contained in a map that occupies a 1920 x 1200 pixel screen.

Rows are grouped per tile, or if subdivided, per sub-tile.

A rowLimit of 0 returns a row count for each tile or sub-tile, and is useful for the construction of a heat map. For this application, larger recurse values increase the precision of the heat map pixels, without the need to add more pos values, whose length would normally grow exponentially with an increase in the precision.

Although rowLimit applies to the entire tile, its interaction with sub-tile counts is determined by the character that separates zoom from recurse:
  • zoom+recurse applies rowLimit to each sub-tile. For example, for TILE=30,5+2 the count replaces the rows for each sub-tile whose row count is greater than 30.
  • zoom-recurse conditionally divides a tile into sub-tiles. Rows are written as one group for the entire tile, if their count do not exceed rowLimit; otherwise, the tile is sub-divided according to the prescribed recurse value, with all sub-tiles receiving a count of rows per sub-tile.
    For example, for TILE=500,5-2, 16 groups of row counts are written per tile, if the tile contains more than 500 rows.
    The row counts per sub-tile allow for a weighted positioning of a single marker bin within a tile.
Here are some examples:

0,0+4,0 = divide the world into 2^4 x 2^4 tiles. Because rowLimit is 0, the reply will contain only row counts for each of the 256 sub-tiles.

0,0,0 = return a single count of all rows in the world

CENTER=lat, lon
select all rows that are closest to lat, lon. This condition must include a rowLimit in the URL (eg. r=rowLimit)

ID=id, locode
select the single row that matches this key

Notes:

1. Each tile can be uniquely identified by this formula: pos * 32 + zoom. This identifier requires 51 bits (5 bits for zoom + 23 * 2 bits for pos) and can fit within an ActionScript 3 Number type, which provides a 52 bit integer mantissa.

2. See here for information about map zoom levels and their equivalence across Google, Yahoo and Microsoft.

{secondary_cond} is an optional boolean expression, expressed in standard SDB infix notation, that is applied to each row that satisfies {primary_cond}. Only rows that satisfy both {primary_cond} and {secondary_cond} appear in the result. If not present, "true" is substituted.

{projection}is an optional expression that projects information from the row. It takes one of the following three forms:

ProjectionDescription
expression_listA comma-separated list of one or more expressions of various types
*a shorthand for an expression_list that captures all columns from the layer's data source (ie. all layer columns minus LOCODE and TAGS).


If not present, the default "*" (all) projection is used.

{format} is an optional code that selects a response format for the expression_list projection. Valid format codes are as follows:

Format CodeDescriptionWriting Format
Bbinarybinary
CCSVXML
GGPXXML
HSEO friendly HTMLXML
JJSONUTF-8 (JS / C-style escaped)
jLittle JSONUTF-8 (JS / C-style escaped)
KKML 2.2XML
XXMLXML


If not present, the default "H" format code is used.

As GPX and KML define a fixed format expression, {projection} must specify four or six expressions, respectively.

{rowLimit} is an integer that represents the maximum number of rows returned by the query. If not present, the limit is set to 2^31 - 1 (infinity). This parameter is required for a CENTER-based {primary_cond}.

The URI fragment is optional and defines the maximum size of the reply buffer, in bytes. The default value is 131,072 (128K).

If the reply buffer overflows, the row that caused the overflow is removed completely from the reply buffer, so as to ensure that the reply message captures only complete rows.

A note about the use of rowLimit, in combination with a TILE-based primary condition:

The row limit is applied as if all tile groups were merged into one.

For example, for three TILE groups, one with 90 rows, the second with 130 rows, and the third with 10 rows, all rows from the first group would be written (including the group count), the first 10 from the second group would be written (with the group count), and 0 from the third group (but with the group count) would be written. This means the the sum of TILE group counts (230 = 90 + 130 + 10) can exceed the number of rows written.

Response Format: GET / Binary

The response to a successful GET (ie. status code is 2xx) takes this form:

struct GET_Response {
    Header header;
    byte columnTypes[header.numColumns];
    RowGroup groups[numGroups];
    TagLabel tagLabels[];
}

The following are descriptions of the four sections of GET_Response:

Header Section

A fixed-length record that contains metadata about the response, and composed of the following fields at the following offsets from the start of the response:

OffsetNameTypeDescription
0x00endianubyte0xAA = little endian
0xD5 = big endian
The endian byte must be read before all other values, as it defines the structure of all multi-byte values that follow. The values for little and big-endian are both distinct from any ASCII character (MSB = 1) and distinct from each other (remaining seven bits are complements of each other: 0101010 for little endian vs 1010101 for big endian)
0x01versionubyte0x01 = the version documented here
0x02numColumnsushortNumber of columns in RowGroup.Row
0x04numGroupsintNumber of RowGroup records in RowGroup section. Always 1 or more
0x08numRowsintThe sum of all positive RowGroup.size values
0x0celapsedMsecintElapsed milliseconds for SDB2 to perform query and write response to its output buffer.
0x10responseSizeBytesintSize, in bytes, of GET_Response. This includes Header and all other sections.


RowGroup Section

A sequence of RowGroup records, each containing a list of rows that lie within the same tile or locode range.

struct RowGroup {
    int size;
    Row rows[size]; // optional
}


The magnitude of size (ie. abs (size)) represents the the number of rows within the group; if size is negative, the rows array is omitted; otherwise, the rows in the array are structured as follows:

struct Row {
    columnTypes[0] column0; // eg id
    columnTypes[1] column1; // eg locode
     : :
    columnTypes[numColumns-1] column-n;
}


It is valid for RowGroup.size to be 0. In this case, no rows follow. (This condition can be due to no rows existing for that group, or because of buffer overflow or a row limit reached upon attempting to write the first row for that group.)

A TAGLIST column value is stored as a variable length array of int4 tagIDs where the last tag has its MSB set to 1. For example, a row with three tags (0x150, 0x33, 0x9) would be serialized into this sequence of int4 values: 0x00000150, 0x00000033, 0x80000009.

An empty taglist consists of this single int4 value: 0xffffffff (or -1).

Note: For a TILE-based primary condition, each tile (recursively defined or not) is written as one group per tile.

TagLabel Section

Contains either a single NoTagLabel record, or a sequence of one or more TagLabel records.

If both of these conditions are true, this section will contain one or more TagLabel records:
  • there is at least one typeCodes[] value that equals 0x4c (TAGLIST)
  • at least one tag list is not empty.

struct TagLabel {
    int id;
    Varchar clean_tag;
}


The last record in the sequence has the MSB of TagLabel.id set to 1.

struct NoTagLabel {
    int id = 0xffffffff;
}


Varchar String Format

CHAR values are stored in fields of a Row structure and VARCHAR values are stored in the heap or TagLabel section. Regardless of their location, both CHAR and VARCHAR values are stored in the following format:

struct Varchar {
    ushort length;
    byte utf8_stream[length];
}


This format can be read directly by the ActionScript 3 method URLStream.readUTF (). An empty string is represented by this structure:

struct EmptyVarchar {
    ushort length = 0x0000;
}

SDB2 Field Types

The data types supported by a SELECT operation are as follows:
NameIDataInput functionmessage size (bytes)ActionScript 3
native type
typeCodeSDB impl
BOOLEANreadBoolean ()1Boolean0x41yes
BYTEreadByte ()1int0x42no
SHORTreadShort ()2int0x43yes (as an expression type)
INTreadInt ()4int0x44yes
LONGreadLong ()8Number0x45yes
FLOATreadFloat ()4Number0x46yes
DOUBLEreadDouble ()8Number0x47yes
CHARreadUTF ()2+nString0x48yes
VARCHARreadUTF ()2+nString0x49yes
TIMESTAMPreadInt () * 28Date0x4ayes
LOCODEreadInt () * 28Number * 20x4byes
TAGLISTreadInt () * n4*nArray(n)0x4cyes


Notes:
  • LOCODE is a 64 bit unsigned integer that is represented in AS3 as a pair of Numbers, representing latitude and longitude
  • TIMESTAMP is a 64 bit unsigned integer that represents the number of milliseconds from 1-JAN-1970. In AS3, it is first converted to a Number which is passed to new Date () to instantiate a Date object; in Java, int64 is obtained from Timestamp.getTIme ()
  • Boolean is represented by these byte values: 0=false, 1=true.
  • The type codes correspond to ASCII uppercase letters 65='A', 66='B', ...
  • For maximum server performance, the binary protocol uses the server's native endian format:
    Intel x86: flash.netURLStream = Endian.LITTLE_ENDIAN
    Other: flash.net.URLStream = Endian.BIG_ENDIAN
    (The client is flexible and can accommodate either endian format.)
  • For type 75 (tags) the last tag in the list has its MSB set to 1
  • Date is created by new Date ().setTime (Number)
  • SHORT is supported as an expression type that is written directly to a response. It is not supported as a storage type, however.

Response Format: GET / XML

The response to a successful GET (ie. status code is 2xx) captures the same row information as a GET/ Binary response, except suitable for parsing by either a SAX or DOM parser:

<xml version="1.0" encoding="UTF-8"?>
<GetResponse version="1" elapsedMsec="43">
<POI>
  <f_1>231<f_1>
  <f_2>239847eaf238743c<f_2>
  <f_3>lt;t>Hotspot<t>lt;t>Coffee<t>lt;t>DriveThru<t>lt;f_3>
  <f_4>Starbucks Nanimo<f_4>
<POI>
<POI>
  <f_1>1742<f_1>
  <f_2>92832afe8987987c<f_2>
  <f_3>lt;t>Coffee<t>lt;t>Dinner<t>lt;t>DriveThru<t>lt;f_3>
  <f_4>Tim Hortons Parksdale<f_4>
<POI>
<GetResponse>


Note: To preserve space to write elapsedMsec, the response is formatted as follows:

<GetResponse version="1" elapsedMsec="~~~~~~~~~~~~
    :


where ~ represents 0x20; (a single space character). when the query has finished, the int value that represents the elapsed time is written immediately after elapsedMsec=" (this overwrites the spaces) and is then followed by the closing sequence ">. Any spaces that follow the attribute value remain in the response message, and will appear undetected by casual observation of the final XML response message.

Twelve space characters are reserved, which equals 10 for the largest positive integer value plus 2 for the closing sequence (">)

Field type codes are not included in the response, for the result is, in xml-parlance, self-evident.


Response Format: GET / JSON

The response to a successful GET (ie. status code is 2xx) captures the same POI information as a GET/ Binary response, except suitable for parsing by a JSON parser:
{
"version" : 1,
"elapsedMsec" : 43,
"poi" : [
  [ 231,
    "239847eaf238743c",
    [ "Hotspot", "Coffee", "DriveThru" ],
    "Starbucks Nanimo"
  ],
  [ 1742,
    "92832afe8987987c",
    [ "Coffee", "Dinner", "DriveThru" ],
    "Tim Hortons Parksdale"
  ]
]
}


Note: To preserve space to write elapsedMsec, the response is formatted as follows:

"elapsedMsec" : ~~~~~~~~~~~

where ~ represents 0x20; (a single space character). when the query has finished, the int value that represents the elapsed time is written immediately after "elapsedMsec" : (this overwrites the spaces) and is then followed by the closing sequence ,. Any spaces that follow the value remain in the response message, and will appear undetected by casual observation of the final JSON response message.

Eleven space characters are reserved, which equals 10 for the largest positive integer value plus 1 for the closing sequence (,)

Type codes are not included in the response, for the result is, in JSON-parlance, self-evident.


Expression Syntax

The following is the valid syntax of an expression in standard SDB2 infix notation, and is used in both the secondary condition and the projections.

ElementSyntax
expression   char_literal
| number_literal
| NOT expression
| ( expression )
| function ( expression_list )
| column
| expression math_op2 expression
| expression bool_op2 expression
| expression rel_op2 expression
| expression string_op2 expression
| bool_function
char_literal' { character } '
number_literal[+|-] digit { digit } [ . {digit}]
columnidentifier
functionbit_functions| num_function| string_function| ELAPSED | NVL | TAGS
bit_functionsBITAND
num_functionABS | CEIL | COS | EXP | FLOOR | LN | POWER | ROUND | SIGN | SIN | SQRT | TAG_LIST | TAG_MASK | TAN | TILE_COORD
bool_functionTAG | TAG_AND | TAG_OR
string_functionCONCAT | LENGTH | SUBSTR
math_op2* | / | + | -
bool_op2AND | OR
rel_op2< | <= | > | >= | = | !=
string_op2||
identifierletter { letter | digit | '_' }

Error Response

Independent of reply wire format, all errors are returned as a three line text document, each line terminated by a '\n' character (0x0a).

Content-Type: text/plain

  ERROR 999
  Message
  Resource

where
999 = three digit HTTP status code.
Message = Detailed description of error
Resource = Entity for which access was requested


For example

  ERROR 503
  Too many requests pending completion. Please try again in a moment.
  Server

  ERROR 400
  Syntax error in Secondary condition: Unknown field 'foobar' @col 34
  Client

HTTP Response Codes

Spatial Databox can return a response code in one of two ways:
  • status code in HTTP response header and message in response body
  • both status code and message in response body (status code is always 200)

Case 1 is the default and serves clients that adhere to the full HTTP protocol.
Case 2 is for ActionScript 3 clients, where some browsers (Firefox and some other non Internet Explorer based browsers) do not pass the http status code along to the ActionScript 3 client. The work-around is to always return a status code of 200 and inspect the response message to determine if it represents a normal response or an error response.
Regardless which case is used, the following status codes are returned by the following four HTTP request actions.
HTTP Method
CodeGETPOSTPUTDELETE
200 OKAll POI (if any) were returnedPOI was updated
201 CreatedPOI was created (response contains URL to POI)
204 No ContentPOI was deleted
206 Partial ContentA subset of POI requested were returned
400 Bad Requestmalformed syntax
403 Forbiddeninsufficient privileges
404 Not Foundlayer doesn't existlayer doesn't existlayer or POI doesn't existlayer or POI doesn't exist
413 Request Entity Too Large# POI returned exceed buffer size
500 Internal Server Errorrequest failed
503 Service Unavailabletry again


Notes:
  • A 206 response to a GET request implies that the request buffer overflowed. As documented here, it is not possible to pass a Range header (eg. Range: bytes=0-99999) which would allow the client to set the buffer size. Because a request that generates a 206 MUST have included a Range header field (as per rfc2616) and because we must support all popular clients, it is not possible for SDB to generate a 206 response.
  • The response to a GET always includes an elapsed time; therefore, a response with no rows will return a 200 instead of 204.