Part of the "Understanding Parser Combinators" series (link)

Writing a JSON parser from scratch

In 250 lines of code

UPDATE: Slides and video from my talk on this topic

In this series, we are looking at how applicative parsers and parser combinators work.

  • In the first post, we created the foundations of a parsing library.
  • In the second post, we built out the library with many other useful combinators.
  • In the third post, we improved the error messages.
  • In this last post, we’ll use the library we’ve written to build a JSON parser.

First, before we do anything else, we need to load the parser library script that we developed over the last few posts, and then open the ParserLibrary namespace:

#load "ParserLibrary.fsx"

open System
open ParserLibrary

You can download ParserLibrary.fsx from the link at the bottom of this post.

1. Building a model to represent the JSON spec

The JSON spec is available at json.org. I’ll paraphase it here:

  • A value can be a string or a number or a bool or null or an object or an array.
    • These structures can be nested.
  • A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.
  • A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
  • A boolean is the literal true or false
  • A null is the literal null
  • An object is an unordered set of name/value pairs.
    • An object begins with { (left brace) and ends with } (right brace).
    • Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
  • An array is an ordered collection of values.
    • An array begins with [ (left bracket) and ends with ] (right bracket).
    • Values are separated by , (comma).
  • Whitespace can be inserted between any pair of tokens.

In F#, this definition can be modelled naturally as:

type JValue =
  | JString of string
  | JNumber of float
  | JBool   of bool
  | JNull
  | JObject of Map<string, JValue>
  | JArray  of JValue list

So the goal of our JSON parser is:

  • Given a string, we want to output a JValue value.

2. Getting started with “Null” and “Bool”

Let’s start with the simplest tasks – parsing the literal values for null and the booleans.

Parsing Null

Parsing the null literal is trivial. The logic will be:

  • Match the string “null”.
  • Map the result to the JNull case.

Here’s the code:

let jNull =
  pstring "null"
  |>> (fun _ -> JNull)  // map to JNull
  <?> "null"            // give it a label

Note that we don’t actually care about the value returned by the parser because we know in advance that it is going to be “null”!

This is a common situation, so let’s write a little utility function, >>% to make this look nicer:

// applies the parser p, ignores the result,
// and returns x.
let (>>%) p x =
  p |>> (fun _ -> x)

Now we can rewrite jNull as follows:

let jNull =
  pstring "null"
  >>% JNull   // using new utility combinator
  <?> "null"

Let’s test:

run jNull "null"
// Success: JNull

run jNull "nulp" |> printResult
// Line:0 Col:3 Error parsing null
// nulp
//    ^Unexpected 'p'

That looks good. Let’s try another one!

Parsing Bool

The bool parser will be similar to null:

  • Create a parser to match “true”.
  • Create a parser to match “false”.
  • And then choose between them using <|>.

Here’s the code:

let jBool =
  let jtrue =
    pstring "true"
    >>% JBool true   // map to JBool
  let jfalse =
    pstring "false"
    >>% JBool false  // map to JBool

  // choose between true and false
  jtrue <|> jfalse
  <?> "bool"           // give it a label

And here are some tests:

run jBool "true"
// Success: JBool true

run jBool "false"
// Success: JBool false

run jBool "truX" |> printResult
// Line:0 Col:0 Error parsing bool
// truX
// ^Unexpected 't'
Note that the “Unexpected ’t'” error is misleading due to the backtracking issue discussed in the previous post. Since “true” failed, it is trying to parse “false” now, and “t” is an unexpected character. We won’t attempt to fix that here.

3. Parsing “String”

Now for something more complicated – strings.

The spec for string parsing is available as a “railway diagram” like this:

All diagrams sourced from json.org.

To build a parser from a diagram like this, we work from the bottom up, building small “primitive” parsers which we then combine into larger ones.

Let’s start with “any unicode character other than quote and backslash”. We have a simple condition to test, so we can just use the satisfy function:

let jUnescapedChar =
  let label = "char"
  satisfy (fun ch -> ch <> '\\' && ch <> '\"') label

We can test it immediately:

run jUnescapedChar "a"
// Success 'a'

run jUnescapedChar "\\" |> printResult
// Line:0 Col:0 Error parsing char
// \
// ^Unexpected '\'

Ok, good.

Escaped characters

Now what about the next case, the escaped characters?

In this case we have a list of strings to match ("\"", "\n", etc) and for each of these, a character to use as the result.

The logic will be:

  • First define a list of pairs in the form (stringToMatch, resultChar).
  • For each of these, build a parser using pstring stringToMatch >>% resultChar).
  • Finally, combine all these parsers together using the choice function.

Here’s the code:

/// Parse an escaped char
let jEscapedChar =
  [
  // (stringToMatch, resultChar)
  ("\\\"",'\"')      // quote
  ("\\\\",'\\')      // reverse solidus
  ("\\/",'/')        // solidus
  ("\\b",'\b')       // backspace
  ("\\f",'\f')       // formfeed
  ("\\n",'\n')       // newline
  ("\\r",'\r')       // cr
  ("\\t",'\t')       // tab
  ]
  // convert each pair into a parser
  |> List.map (fun (toMatch,result) ->
    pstring toMatch >>% result)
  // and combine them into one
  |> choice
  <?> "escaped char" // set label

And again, let’s test it immediately:

run jEscapedChar "\\\\" // Success '\\'
run jEscapedChar "\\t"  // Success '\009'

// or using @-strings to escape the input
run jEscapedChar @"\\"  // Success '\\'
run jEscapedChar @"\n"  // Success '\010'

run jEscapedChar "a" |> printResult
// Line:0 Col:0 Error parsing escaped char
// a
// ^Unexpected 'a'

It works nicely!

Unicode characters

The final case is the parsing of unicode characters with hex digits.

The logic will be:

  • First define the primitives for backslash, u and hexdigit.
  • Combine them together, using four hexdigits.
  • The output of the parser will be a nested, ugly tuple, so we need a helper function to convert the digits to an int, and then a char.

Here’s the code:

/// Parse a unicode char
let jUnicodeChar =

  // set up the "primitive" parsers
  let backslash = pchar '\\'
  let uChar = pchar 'u'
  let hexdigit =
    anyOf (['0'..'9'] @ ['A'..'F'] @ ['a'..'f'])
  let fourHexDigits =
    hexdigit .>>. hexdigit .>>. hexdigit .>>. hexdigit

  // convert the parser output (nested tuples)
  // to a char
  let convertToChar (((h1,h2),h3),h4) =
    let str = sprintf "%c%c%c%c" h1 h2 h3 h4
    Int32.Parse(str,Globalization.NumberStyles.HexNumber) |> char

  // set up the main parser
  backslash >>. uChar >>. fourHexDigits
  |>> convertToChar

And let’s test with a smiley face – \u263A.

run jUnicodeChar "\\u263A"  //  Success ('☺')

The complete “String” parser

Putting it all together now:

  • Define a primitive for quote
  • Define a jchar as a choice between jUnescapedChar, jEscapedChar, and jUnicodeChar.
  • The whole parser is then zero or many jchar between two quotes.
let quotedString =
  let quote = pchar '\"' <?> "quote"
  let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar

  // set up the main parser
  quote >>. manyChars jchar .>> quote

One more thing, which is to wrap the quoted string in a JString case and give it a label:

/// Parse a JString
let jString =
  // wrap the string in a JString
  quotedString
  |>> JString           // convert to JString
  <?> "quoted string"   // add label

Let’s test the complete jString function:

run jString "\"\""
  // Success (JString "")
run jString "\"a\""
  // Success (JString "a")
run jString "\"ab\""
  // Success (JString "ab")
run jString "\"ab\\tde\""
  // Success (JString "ab\tde")
run jString "\"ab\\u263Ade\""
  // Success (JString "ab☺de")

4. Parsing Number

The “railway diagram” for Number parsing is:

Again, we’ll work bottom up. Let’s start with the most primitive components, the single chars and digits:

let optSign = opt (pchar '-')

let zero = pstring "0"

let digitOneNine =
  satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"

let digit =
  satisfy (fun ch -> Char.IsDigit ch ) "digit"

let point = pchar '.'

let e = pchar 'e' <|> pchar 'E'

let optPlusMinus = opt (pchar '-' <|> pchar '+')

Now let’s build the “integer” part of the number. This is either:

  • The digit zero, or,
  • A nonZeroInt, which is a digitOneNine followed by zero or more normal digits.
let nonZeroInt =
  digitOneNine .>>. manyChars digit
  |>> fun (first,rest) -> string first + rest

let intPart = zero <|> nonZeroInt

Note that, for the nonZeroInt parser, we have to combine the output of digitOneNine (a char) with manyChars digit (a string) so a simple map function is needed.

The optional fractional part is a decimal point followed by one or more digits:

let fractionPart = point >>. manyChars1 digit

And the exponent part is an e followed by an optional sign, followed by one or more digits:

let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

With these components, we can assemble the whole number:

optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
|>> convertToJNumber
<?> "number"   // add label

We haven’t defined convertToJNumber yet though. This function will take the four-tuple output by the parser and convert it into a float.

Now rather than writing custom float logic, we’re going to be lazy and let the .NET framework to the conversion for us! That is, each of the components will be turned into a string, concatenated, and the whole string parsed into a float.

The problem is that some of the components (like the sign and exponent) are optional. Let’s write a helper that converts an option to a string using a passed in function, but if the option is None return the empty string.

I’m going to call it |>? but it doesn’t really matter because it is only used locally within the jNumber parser.

// utility function to convert an optional value 
// to a string, or "" if missing
let ( |>? ) opt f =
  match opt with
  | None -> ""
  | Some x -> f x

Now we can create convertToJNumber:

  • The sign is converted to a string.
  • The fractional part is converted to a string, prefixed with a decimal point.
  • The exponent part is converted to a string, with the sign of the exponent also being converted to a string.
let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
  // convert to strings and let .NET parse them!
  // -- crude but ok for now.

  let signStr =
    optSign
    |>? string   // e.g. "-"

  let fractionPartStr =
    fractionPart
    |>? (fun digits -> "." + digits )  // e.g. ".456"

  let expPartStr =
    expPart
    |>? fun (optSign, digits) ->
      let sign = optSign |>? string
      "e" + sign + digits          // e.g. "e-12"

  // add the parts together and convert to a float,
  // then wrap in a JNumber
  (signStr + intPart + fractionPartStr + expPartStr)
  |> float
  |> JNumber

It’s pretty crude, and converting things to strings can be slow, so feel free to write a better version.

With that, we have everything we need for the complete jNumber function:

/// Parse a JNumber
let jNumber =

  // set up the "primitive" parsers
  let optSign = opt (pchar '-')

  let zero = pstring "0"

  let digitOneNine =
    satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"

  let digit =
    satisfy (fun ch -> Char.IsDigit ch ) "digit"

  let point = pchar '.'

  let e = pchar 'e' <|> pchar 'E'

  let optPlusMinus = opt (pchar '-' <|> pchar '+')

  let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest

  let intPart = zero <|> nonZeroInt

  let fractionPart = point >>. manyChars1 digit

  let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

  // utility function to convert an optional value 
  // to a string, or "" if missing
  let ( |>? ) opt f =
    match opt with
    | None -> ""
    | Some x -> f x

  let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
    // convert to strings and let .NET parse them! 
    // -- crude but ok for now.

    let signStr =
      optSign
      |>? string   // e.g. "-"

    let fractionPartStr =
      fractionPart
      |>? (fun digits -> "." + digits )  // e.g. ".456"

    let expPartStr =
      expPart
      |>? fun (optSign, digits) ->
        let sign = optSign |>? string
        "e" + sign + digits          // e.g. "e-12"

    // add the parts together and convert to a float, 
    // then wrap in a JNumber
    (signStr + intPart + fractionPartStr + expPartStr)
    |> float
    |> JNumber

  // set up the main parser
  optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
  |>> convertToJNumber
  <?> "number"   // add label

It’s a bit long-winded, but each component follows the spec, so I think it is still quite readable.

Let’s start testing it:

run jNumber "123"     // JNumber 123.0
run jNumber "-123"    // JNumber -123.0
run jNumber "123.4"   // JNumber 123.4

And what about some failing cases?

run jNumber "-123."   // JNumber -123.0 -- should fail!
run jNumber "00.1"    // JNumber 0      -- should fail!

Hmm. Something went wrong! These cases should fail, surely?

Well, no. What’s happening in the -123. case is that the parser is consuming everything up the to decimal point and then stopping, leaving the decimal point to be matched by the next parser! So, not an error.

Similarly, in the 00.1 case, the parser is consuming only the first 0 then stopping, leaving the rest of the input (0.4) to be matched by the next parser. Again, not an error.

To fix this properly, we want the parser to be greedy, but that is out of scope, and in practice, should not be a problem. For now, let’s just add some whitespace to the parser to force it to consume all the characters in the input before terminating.

let jNumber_ = jNumber .>> spaces1

Now let’s test again:

run jNumber_ "123"     // JNumber 123.0
run jNumber_ "-123"    // JNumber -123.0

run jNumber_ "-123." |> printResult
// Line:0 Col:4 Error parsing number andThen many1 whitespace
// -123.
//     ^Unexpected '.'

and we find the error is being detected properly now.

Let’s test the fractional part:

run jNumber_ "123.4"   // JNumber 123.4

run jNumber_ "00.4" |> printResult
// Line:0 Col:1 Error parsing number andThen many1 whitespace
// 00.4
//  ^Unexpected '0'

and the exponent part now:

// exponent only
run jNumber_ "123e4"     // JNumber 1230000.0

// fraction and exponent
run jNumber_ "123.4e5"   // JNumber 12340000.0
run jNumber_ "123.4e-5"  // JNumber 0.001234

It’s all looking good so far. Onwards and upwards!

5. Parsing “Array”

Next up is the Array case. Again, we can use the railway diagram to guide the implementation:

We will start with the primitives again. Note that we are adding optional whitespace after each token:

let jArray =

  let left = pchar '[' .>> spaces
  let right = pchar ']' .>> spaces
  let comma = pchar ',' .>> spaces
  let value = jValue .>> spaces
  //          ^ what is "jValue"?

What is this jValue though? We’ll come to that shortly.

Next, following the guidance of the railway diagram above, we create a list of values separated by a comma, with the whole list between the left and right brackets.

let jArray =
  ...

  // set up the list parser
  let values = sepBy value comma

  // set up the main parser
  between left values right
  |>> JArray
  <?> "array"

Let’s revisit that jValue now.

let jArray =
  ...
  let value = jValue .>> spaces
  //          ^ what is "jValue"?
  ...

Well, the spec says that a JSON array can contain a list of JSON values, so we’ll assume that we have a jValue parser that can be used to parse the input into a JValue object.

But to parse a JValue, we need to parse a JArray first, because JArray is one of the choices of JValue!

We have hit a common problem in parsing – mutually recursive definitions. We need a JValue parser to build an JArray, but we need an JArray parser to build a JValue.

How can we deal with this?

Forward references

The trick is to create a forward reference – a dummy jValue parser that we can use right now to define the jArray parser, and then later on, we will fix up the forward reference with the “real” jValue parser.

This is one time where mutable references come in handy!

We will need a helper function to assist us with this, and the logic will be as follows:

  • Define a dummy parser that will be replaced later.
  • Define a “proxy” or “wrapper” parser that forwards the input stream to the inner dummy parser.
  • Return both the real parser and a mutable reference to the dummy parser.

Later on, the client code will fix up the mutable reference to point to the correct parser, and from then on, the proxy parser will forward the input to the new parser that has replaced the dummy parser. Here’s the code:

let createParserForwardedToRef<'a>() =

  let dummyParser : Parser<'a>=
    let innerFn _ =
      failwith "unfixed forwarded parser"
    {parseFn=innerFn; label="unknown"}

  // mutable pointer to placeholder Parser
  let parserRef = ref dummyParser

  // wrapper Parser
  let innerFn input =
    // forward input to the placeholder
    // (Note: "!" is the deferencing operator)
    runOnInput !parserRef input
  let wrapperParser = {parseFn=innerFn; label="unknown"}

  wrapperParser, parserRef

With this in place, we can create a placeholder for a parser of type JValue:

let jValue,jValueRef = createParserForwardedToRef<JValue>()

Finishing up the “Array” parser

Going back to the jArray parser, we can now compile it successfully, using the jValue “proxy” parser:

let jArray =

  // set up the "primitive" parsers
  let left = pchar '[' .>> spaces
  let right = pchar ']' .>> spaces
  let comma = pchar ',' .>> spaces
  let value = jValue .>> spaces

  // set up the list parser
  let values = sepBy value comma

  // set up the main parser
  between left values right
  |>> JArray
  <?> "array"

If we try to test it now, we get an exception because we haven’t fixed up the reference to the default dummy parser:

run jArray "[ 1, 2 ]"

// System.Exception: unfixed forwarded parser

So for now, let’s fix up the reference to use one of the parsers that we have already created, such as jNumber. Later on, we’ll fix it up to be the entire parser.

jValueRef := jNumber

Now we can successfully test the jArray function, as long as we are careful to only use numbers in our array!

run jArray "[ 1, 2 ]"
// Success (JArray [JNumber 1.0; JNumber 2.0],

run jArray "[ 1, 2, ]" |> printResult
// Line:0 Col:6 Error parsing array
// [ 1, 2, ]
//       ^Unexpected ','

6. Parsing “Object”

The parser for Object is very similar to the one for Array.

First, the railway diagram:

Using this diagram as a guide, we can create the parser directly, so I’ll present it here without comment:

let jObject =

  // set up the "primitive" parsers
  let left = spaces >>. pchar '{' .>> spaces
  let right = pchar '}' .>> spaces
  let colon = pchar ':' .>> spaces
  let comma = pchar ',' .>> spaces
  let key = quotedString .>> spaces
  let value = jValue .>> spaces

  // set up the list parser
  let keyValue = (key .>> colon) .>>. value
  let keyValues = sepBy keyValue comma

  // set up the main parser
  between left keyValues right
  |>> Map.ofList  // convert the list of keyValues into a Map
  |>> JObject     // wrap in JObject
  <?> "object"    // add label

A bit of testing to make sure it works (but remember, only numbers are supported as JValues for now).

run jObject """{ "a":1, "b"  :  2 }"""
// JObject (map [("a", JNumber 1.0); ("b", JNumber 2.0)]),

run jObject """{ "a":1, "b"  :  2, }""" |> printResult
// Line:0 Col:18 Error parsing object
// { "a":1, "b"  :  2, }
//                   ^Unexpected ','

7. Putting it all together

Finally, we can combine all six of the parsers using the choice combinator, and we can assign this to the jValueRef parser reference that we created earlier:

jValueRef := choice
  [
  jNull
  jBool
  jNumber
  jString
  jArray
  jObject
  ]

And now we are ready to rock and roll!

Testing the complete parser: example 1

Here’s an example of a JSON string that we can attempt to parse:

let example1 = """{
  "name" : "Scott",
  "isMale" : true,
  "bday" : {"year":2001, "month":12, "day":25 },
  "favouriteColors" : ["blue", "green"],
  "emptyArray" : [],
  "emptyObject" : {}
}"""
run jValue example1

And here is the result:

JObject
  (map
    [("bday", JObject(map
      [("day", JNumber 25.0);
      ("month", JNumber 12.0);
      ("year", JNumber 2001.0)]));
    ("emptyArray", JArray []);
    ("emptyObject", JObject (map []));
    ("favouriteColors", JArray [JString "blue"; JString "green"]);
    ("isMale", JBool true);
    ("name", JString "Scott");
    ])

Testing the complete parser: example 2

Here’s one from the example page on json.org:

let example2= """{"widget": {
  "debug": "on",
  "window": {
    "title": "Sample Konfabulator Widget",
    "name": "main_window",
    "width": 500,
    "height": 500
  },
  "image": {
    "src": "Images/Sun.png",
    "name": "sun1",
    "hOffset": 250,
    "vOffset": 250,
    "alignment": "center"
  },
  "text": {
    "data": "Click Here",
    "size": 36,
    "style": "bold",
    "name": "text1",
    "hOffset": 250,
    "vOffset": 100,
    "alignment": "center",
    "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
  }
}}  """

run jValue example2

And here is the result:

JObject(map
  [("widget",JObject(map
    [("debug", JString "on");
    ("image",JObject(map
      [("alignment", JString "center");
        ("hOffset", JNumber 250.0);
        ("name", JString "sun1");
        ("src", JString "Images/Sun.png");
        ("vOffset", JNumber 250.0)]));
    ("text",JObject(map
      [("alignment", JString "center");
        ("data", JString "Click Here");
        ("hOffset", JNumber 250.0);
        ("name", JString "text1");
        ("onMouseUp", 
          JString "sun1.opacity = (sun1.opacity/100) * 90;");
        ("size", JNumber 36.0);
        ("style", JString "bold");
        ("vOffset", JNumber 100.0)]));
    ("window",JObject(map
      [("height", JNumber 500.0);
        ("name", JString "main_window");
        ("title", JString "Sample Konfabulator Widget");
        ("width", JNumber 500.0)]))]))]),

Complete listing of the JSON parser

Here’s the complete listing for the JSON parser – it’s about 250 lines of useful code.

Source code used in this post is available here.

open System
open ParserLibrary

type JValue =
  | JString of string
  | JNumber of float
  | JBool   of bool
  | JNull
  | JObject of Map<string, JValue>
  | JArray  of JValue list


// ======================================
// Forward reference
// ======================================

/// Create a forward reference
let createParserForwardedToRef<'a>() =

  let dummyParser : Parser<'a> =
    let innerFn _ = failwith "unfixed forwarded parser"
    {parseFn=innerFn; label="unknown"}

  // ref to placeholder Parser
  let parserRef = ref dummyParser

  // wrapper Parser
  let innerFn input =
    // forward input to the placeholder
    // (Note: "!" is the deferencing operator)
    runOnInput !parserRef input
  let wrapperParser = {parseFn=innerFn; label="unknown"}

  wrapperParser, parserRef

let jValue,jValueRef = createParserForwardedToRef<JValue>()

// ======================================
// Utility function
// ======================================

// applies the parser p, ignores the result, and returns x.
let (>>%) p x =
  p |>> (fun _ -> x)

// ======================================
// Parsing a JNull
// ======================================

let jNull =
  pstring "null"
  >>% JNull   // map to JNull
  <?> "null"  // give it a label

// ======================================
// Parsing a JBool
// ======================================

let jBool =
  let jtrue =
    pstring "true"
    >>% JBool true   // map to JBool
  let jfalse =
    pstring "false"
    >>% JBool false  // map to JBool

  // choose between true and false
  jtrue <|> jfalse
  <?> "bool"           // give it a label


// ======================================
// Parsing a JString
// ======================================

/// Parse an unescaped char
let jUnescapedChar =
  satisfy (fun ch -> ch <> '\\' && ch <> '\"') "char"

/// Parse an escaped char
let jEscapedChar =
  [
  // (stringToMatch, resultChar)
  ("\\\"",'\"')      // quote
  ("\\\\",'\\')      // reverse solidus
  ("\\/",'/')        // solidus
  ("\\b",'\b')       // backspace
  ("\\f",'\f')       // formfeed
  ("\\n",'\n')       // newline
  ("\\r",'\r')       // cr
  ("\\t",'\t')       // tab
  ]
  // convert each pair into a parser
  |> List.map (fun (toMatch,result) ->
    pstring toMatch >>% result)
  // and combine them into one
  |> choice
  <?> "escaped char" // set label

/// Parse a unicode char
let jUnicodeChar =

  // set up the "primitive" parsers
  let backslash = pchar '\\'
  let uChar = pchar 'u'
  let hexdigit = 
    anyOf (['0'..'9'] @ ['A'..'F'] @ ['a'..'f'])
  let fourHexDigits =
    hexdigit .>>. hexdigit .>>. hexdigit .>>. hexdigit

  // convert the parser output (nested tuples)
  // to a char
  let convertToChar (((h1,h2),h3),h4) =
    let str = sprintf "%c%c%c%c" h1 h2 h3 h4
    Int32.Parse(str,Globalization.NumberStyles.HexNumber) |> char

  // set up the main parser
  backslash  >>. uChar >>. fourHexDigits 
  |>> convertToChar


/// Parse a quoted string
let quotedString =
  let quote = pchar '\"' <?> "quote"
  let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar

  // set up the main parser
  quote >>. manyChars jchar .>> quote

/// Parse a JString
let jString =
  // wrap the string in a JString
  quotedString
  |>> JString           // convert to JString
  <?> "quoted string"   // add label

// ======================================
// Parsing a JNumber
// ======================================

/// Parse a JNumber
let jNumber =

  // set up the "primitive" parsers
  let optSign = opt (pchar '-')

  let zero = pstring "0"

  let digitOneNine =
    satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"

  let digit =
    satisfy (fun ch -> Char.IsDigit ch ) "digit"

  let point = pchar '.'

  let e = pchar 'e' <|> pchar 'E'

  let optPlusMinus = opt (pchar '-' <|> pchar '+')

  let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest

  let intPart = zero <|> nonZeroInt

  let fractionPart = point >>. manyChars1 digit

  let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

  // utility function to convert an optional value 
  // to a string, or "" if missing
  let ( |>? ) opt f =
    match opt with
    | None -> ""
    | Some x -> f x

  let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
    // convert to strings and let .NET parse them! 
    // -- crude but ok for now.

    let signStr =
      optSign
      |>? string   // e.g. "-"

    let fractionPartStr =
      fractionPart
      |>? (fun digits -> "." + digits )  // e.g. ".456"

    let expPartStr =
      expPart
      |>? fun (optSign, digits) ->
        let sign = optSign |>? string
        "e" + sign + digits          // e.g. "e-12"

    // add the parts together and convert to a float, 
    // then wrap in a JNumber
    (signStr + intPart + fractionPartStr + expPartStr)
    |> float
    |> JNumber

  // set up the main parser
  optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
  |>> convertToJNumber
  <?> "number"   // add label

// ======================================
// Parsing a JArray
// ======================================

let jArray =

  // set up the "primitive" parsers
  let left = pchar '[' .>> spaces
  let right = pchar ']' .>> spaces
  let comma = pchar ',' .>> spaces
  let value = jValue .>> spaces

  // set up the list parser
  let values = sepBy value comma

  // set up the main parser
  between left values right
  |>> JArray
  <?> "array"

// ======================================
// Parsing a JObject
// ======================================


let jObject =

  // set up the "primitive" parsers
  let left = spaces >>. pchar '{' .>> spaces
  let right = pchar '}' .>> spaces
  let colon = pchar ':' .>> spaces
  let comma = pchar ',' .>> spaces
  let key = quotedString .>> spaces
  let value = jValue .>> spaces

  // set up the list parser
  let keyValue = (key .>> colon) .>>. value
  let keyValues = sepBy keyValue comma

  // set up the main parser
  between left keyValues right
  |>> Map.ofList  // convert the list of keyValues into a Map
  |>> JObject     // wrap in JObject
  <?> "object"    // add label

// ======================================
// Fixing up the jValue ref
// ======================================

// fixup the forward ref
jValueRef := choice
  [
  jNull
  jBool
  jNumber
  jString
  jArray
  jObject
  ]

Summary

In this post, we built a JSON parser using the parser library that we have developed over the previous posts.

I hope that, by building both the parser library and a real-world parser from scratch, you have gained a good appreciation for how parser combinators work, and how useful they are.

I’ll repeat what I said in the first post: if you are interesting in using this technique in production, be sure to investigate the FParsec library for F#, which is optimized for real-world usage.

And if you are using languages other than F#, there is almost certainly a parser combinator library available to use.

Thanks!

Source code used in this post is available here.

Comments

blog comments powered by Disqus