Parsing through FIGIs

The Financial Instrument Global Identifier (FIGI) is an open alternative to privately owned and licensed identifiers commonly used in financial markets, such as ISIN and CUSIP. With a comprehensive specification and a freely accessible API, FIGIs are a solid choice for exploring parsing in a financial context. In this post, we explore modeling identifiers with domain types, compare imperative and combinator approaches to parsing, and finally discuss composability and performance of these in a broader system.

Structure and Characteristics

As defined in § 6.1 of the spec, a FIGI is a "twelve character string semantically meaningless", contrasted with contextually rich identifiers tailored for human use, like stock ticker symbols. They serve to "identify financial instruments at the most granular possible level", while also being durable throughout corporate actions (e.g., delistings, ticker changes) and available for all asset classes and regions.

The charset for FIGIs, as defined in § 6.2, is a subset of ISO 8859-1, consisting of upper-case consonants (including "Y") and numerals 0–9. The twelve characters have specific positional rules:

  • Pos 1–2: two upper-case consonants excluding combinations 'BS', 'BM', 'GG', 'GB', 'GH', 'KY', 'VG'
  • Pos 3: 'G' (for "global")
  • Pos 4–11: any combination of upper case consonants and the numerals 0–9
  • Pos 12: check digit 0–9 calculated using a modified Luhn algorithm
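The "modified Luhn" step is easier to follow in code. The sketch below follows the scheme used by python-stdnum's figi module as I understand it: each character is mapped to a number (digits as-is, "A" to "Z" as 10 to 35), every second character is doubled, the decimal digits of the results are summed, and the check digit brings that sum up to a multiple of ten. The function names are my own, and the sketch assumes upper-case ASCII input.

```rust
/// Computes the check digit for the first 11 characters of `body`.
/// Returns `None` if fewer than 11 characters are present or if any of
/// them falls outside '0'..='9' / 'A'..='Z'.
fn figi_check_digit(body: &str) -> Option<u32> {
    let chars: Vec<char> = body.chars().take(11).collect();
    if chars.len() != 11 {
        return None;
    }
    let mut sum = 0u32;
    for (i, c) in chars.iter().enumerate() {
        if !(c.is_ascii_digit() || c.is_ascii_uppercase()) {
            return None;
        }
        // '0'..='9' map to 0..=9 and 'A'..='Z' map to 10..=35.
        let value = c.to_digit(36)?;
        // Every second character (odd 0-based index) is doubled.
        let weighted = if i % 2 == 1 { value * 2 } else { value };
        // Add the decimal digits of the weighted value (never above 70).
        sum += weighted / 10 + weighted % 10;
    }
    Some((10 - sum % 10) % 10)
}

/// Checks that position 12 of `figi` matches the computed check digit.
fn has_valid_check_digit(figi: &str) -> bool {
    figi.len() == 12
        && figi.is_ascii()
        && figi_check_digit(&figi[..11]) == figi[11..].parse().ok()
}
```

For "BBG000B9XVV8" the weighted digit sum is 42, so the check digit is 8, which matches the final character.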

Bloomberg and Kaiko are the only two certified providers of FIGIs, issuing them for traditional financial instruments and cryptocurrencies, respectively. All FIGIs will have the prefix "BBG" if issued by Bloomberg and "KKG" if issued by Kaiko. For example, the FIGI for Apple Inc. (AAPL) on NYSE is "BBG000B9XVV8", whereas the FIGI for BTC/USDT on Binance is "KKG000003B64".

Modeling FIGIs in Rust

Trading and settlement protocols must decide how to convey symbology between market participants. The lingua franca in financial markets is the FIX protocol, which uses the Instrument component block. Other examples include Goldman Sachs' AssetIdentifier for their Quant platform and Interactive Brokers' SecIdType for their Trader Workstation. These three protocols, developed independently and designed for different purposes, model symbology the same way with two fields: a string value along with a discriminator denoting the type (e.g., FIX's SecurityID(48) and SecurityIDSource(22)).

enum SecurityIdType {
    FIGI,
    ISIN,
    // ...
}

struct CreateOrder {
    security_id: Option<String>,
    security_id_source: Option<SecurityIdType>,
    // ...
}

While this modeling makes sense for wire formats bound to backwards-compatibility, it has no place in business logic maximising correctness and simplicity. Imagine creating an API for fetching the VWAP for an equity trading on a specific exchange. A naive API signature would be fetch_vwap(id: &str, id_type: IdType) -> u64, but calling this correctly requires meeting certain (undocumented) pre-conditions of the identifier:

  1. adherence to the syntactic requirements set forth by a specification
  2. granular enough to uniquely identify an instrument at a trading venue level (versus e.g. a global share-class identifier)

Repeating these checks inside fetch_vwap and every similar API is not acceptable. Instead, we can use the "parse, don't validate" idiom and parse identifier strings into domain types once, at the boundary. In Rust, this is called the NewType pattern, which usually wraps a primitive type in a single-field tuple struct. We then implement conversion traits like FromStr to construct these domain types, providing a natural place for validation checks.

use std::str::FromStr;

struct Figi(String);

impl FromStr for Figi {
    type Err = FigiParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Enforce conformance to the FIGI specification,
        // returning `Ok` on success or `Err` if the input is malformed.
        todo!()
    }
}

// Expected behaviour once `from_str` is implemented:
assert!(Figi::from_str("BBG000B9XVV8").is_ok()); // a FIGI
assert!(Figi::from_str("US0378331005").is_err()); // an ISIN, not a FIGI

Our VWAP function can then be implemented like so:

async fn fetch_exchange_vwap(figi: &Figi) -> u64 { ... }

By embedding correctness into the type system, we have eliminated both aforementioned concerns with the naive API, providing a robust approach to modeling FIGIs in our application.
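To make the boundary between wire format and business logic concrete, here is a minimal sketch of converting the loosely typed CreateOrder message into a Figi exactly once. The OrderError type and the figi() method are hypothetical names for illustration, and the from_str body is a placeholder check rather than full FIGI validation.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SecurityIdType {
    FIGI,
    ISIN,
}

/// Wire-level message: both fields are optional, as in the struct above.
struct CreateOrder {
    security_id: Option<String>,
    security_id_source: Option<SecurityIdType>,
}

#[derive(Debug, PartialEq)]
struct Figi(String);

/// Hypothetical error type for the conversion at the boundary.
#[derive(Debug, PartialEq)]
enum OrderError {
    MissingSecurityId,
    MissingSecurityIdSource,
    UnsupportedSource(SecurityIdType),
    MalformedFigi,
}

impl FromStr for Figi {
    type Err = OrderError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Placeholder check only (length and the 'G' in position 3);
        // a real implementation would enforce the full specification.
        if s.len() == 12 && s.is_ascii() && s.as_bytes()[2] == b'G' {
            Ok(Figi(s.to_owned()))
        } else {
            Err(OrderError::MalformedFigi)
        }
    }
}

impl CreateOrder {
    /// Parses the wire fields into a `Figi`, or explains why that failed.
    fn figi(&self) -> Result<Figi, OrderError> {
        let source = self
            .security_id_source
            .ok_or(OrderError::MissingSecurityIdSource)?;
        let raw = self
            .security_id
            .as_deref()
            .ok_or(OrderError::MissingSecurityId)?;
        match source {
            SecurityIdType::FIGI => raw.parse(),
            other => Err(OrderError::UnsupportedSource(other)),
        }
    }
}
```

Everything downstream of figi() can then accept &Figi and never look at the raw string or the discriminator again.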

Parsing Imperatively

Now that we have a NewType Figi and a fallible constructor from_str, we must decide how to implement the validation logic. As outlined before, the structure is relatively simple, and hand-rolling a validation function is trivial. Let's translate Python's stdnum implementation of figi.validate as our Rust baseline (full code):

use std::str::FromStr;

pub struct Figi(pub String);

pub enum FigiParseError {
    InvalidLength,
    InvalidFormat,
    InvalidComponent,
    InvalidChecksum,
}

impl FromStr for Figi {
    type Err = FigiParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let valid_chars = "0123456789BCDFGHJKLMNPQRSTVWXYZ";
        if s.len() != 12 {
            return Err(FigiParseError::InvalidLength);
        }
        if !s.chars().all(|c| valid_chars.contains(c)) {
            return Err(FigiParseError::InvalidFormat);
        }
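        // Positions 1-2 must be consonants (no digits) and not a reserved prefix.
        match &s[0..2] {
            p if p.chars().any(|c| c.is_ascii_digit()) => {
                return Err(FigiParseError::InvalidFormat)
            }
            "BS" | "BM" | "GG" | "GB" | "GH" | "KY" | "VG" => {
                return Err(FigiParseError::InvalidComponent)
            }
            _ => {}
        }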
        if &s[2..3] != "G" {
            return Err(FigiParseError::InvalidComponent);
        }
        if !s.chars().last().unwrap().is_digit(10) {
            return Err(FigiParseError::InvalidChecksum);
        }
        Ok(Self(s.to_string()))
    }
}

Note that the logic is the same as stdnum's, but the approaches differ greatly in how and when it is called (i.e., ad-hoc in Python but systematic in Rust). Some argue for "writing Python like it's Rust", but deviating from conventions rarely takes hold. Nevertheless, we've solved two "issues" from the Python implementation:

  1. Systematic enforcement of FIGI well-formedness at construction time
  2. Errors (in the signature) that must be handled at the call site

The tests pass and the benchmark clocks in at 80.945 ns, so this is pretty solid! We have created a more intuitive API that is performant, has decent error ergonomics, and leverages the type system. But what about composability?
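On the error ergonomics point, one common step is to implement Display and std::error::Error for the error enum so that callers can use `?` in functions returning a boxed error. The sketch below assumes the FigiParseError variants shown earlier. The message wording and the helpers check_length and first_two are hypothetical, with check_length standing in for Figi::from_str.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FigiParseError {
    InvalidLength,
    InvalidFormat,
    InvalidComponent,
    InvalidChecksum,
}

impl fmt::Display for FigiParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Illustrative wording, not taken from any specification.
        f.write_str(match self {
            Self::InvalidLength => "a FIGI must be exactly 12 characters long",
            Self::InvalidFormat => "a FIGI may only contain consonants and digits",
            Self::InvalidComponent => "the FIGI prefix or the 'G' in position 3 is invalid",
            Self::InvalidChecksum => "the FIGI check digit is invalid",
        })
    }
}

impl Error for FigiParseError {}

/// Hypothetical stand-in for `Figi::from_str`: checks the length only.
fn check_length(s: &str) -> Result<&str, FigiParseError> {
    if s.len() == 12 {
        Ok(s)
    } else {
        Err(FigiParseError::InvalidLength)
    }
}

/// Hypothetical caller: `?` converts the error into `Box<dyn Error>`.
fn first_two(s: &str) -> Result<String, Box<dyn Error>> {
    let figi = check_length(s)?;
    Ok(figi.chars().take(2).collect())
}
```

Callers that care about the specific failure can still match on the enum, while callers that only report errors get a readable message for free.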

Composability of parsers

Composability in the abstract never helped me grok what it meant in practice. Instead, we will look at a real-world example where we must parse a FIGI as part of a larger format. Bloomberg's BLPAPI is a perfect fit for that. The syntax for Symbology (a methodology for identifying securities), as defined in § 3.2, consists of the following components separated by a "/":

  • Identifier Type: "ticker", "cusip", "isin", "sedol1", "sedol2", "buid", "bbgid", "bsym", etc.
  • Identifier Value: Unique identifier value according to Identifier Type
  • Provider: Optional mnemonic, preceded by a "@", for the party that has contributed pricing for the given security. If it is not specified, a default value may apply depending on the product.
  • Pricing Source: Optional, generally two-character mnemonic for the data source where the security is traded. For example, in the Equities business, the data source is the exchange.
  • Yellow Key: the market sector, one of "Govt", "Corp", "Mtge", "M-Mkt", "Muni", "Pfd", "Equity", "Comdty", "Index", "Curncy"

Using the Govt (i.e. treasury bond) example "/bbgid/BBG007Z1JW11@BVAL" we have two components (the type and pricing source) that we must parse in addition to the FIGI.

Parser combinators initially seemed like tools solely for functional programming savants. Thankfully, introductory posts like Practical Parsing in Rust with nom and Error recovery with parser combinators revealed their approachability and usefulness. For our use case, they are superior in their composability with other parsers.

Recall that BLPAPI defines identifiers as /<Identifier Type>/<Identifier Value>[@Provider| Pricing Source] <Yellow Key>. Here we only consider <Identifier Type> as "bbgid" (FIGI). Some examples:

Asset  | Identifier                              | Note
-------|-----------------------------------------|-----
Govt   | "//blp/mktdata/bbgid/BBG007Z1JW11@BVAL" | @BVAL is a pricing source, akin to an exchange from a pricing perspective
Equity | "//blp/mktdata/bbgid/BBG000B9XVV8"      | FIGI here is most granular, pointing to a specific exchange

The first components ("//blp/mktdata") are the service schema defined as //blp/<servicename>, supporting:

  1. "//blp/refdata" for reference data
  2. "//blp/mktdata" for market data
  3. "//blp/mktbar" for market bar data

The second major component is the security identifier which includes FIGI along with other commonly used types. Just as the specifications decompose the top-level identifier into smaller parts, we can decompose our parsers into smaller (and independent) parts, composing them to handle more complex structures.
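Before reaching for a library, the idea of small parsers that compose can be sketched in plain Rust. Each function below takes the remaining input as `&mut &str`, consumes what it recognises, and leaves the input untouched on failure. All names here (tag, service_name, service, bbgid, topic) are hypothetical, and bbgid only slices twelve bytes rather than validating a FIGI.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ServiceName {
    RefData,
    MktData,
    MktBar,
}

/// Consumes `expected` from the front of `input`, or leaves `input` untouched.
fn tag<'a>(expected: &str, input: &mut &'a str) -> Option<&'a str> {
    let current: &'a str = *input;
    let rest = current.strip_prefix(expected)?;
    *input = rest;
    Some(&current[..expected.len()])
}

/// Tries each service name in order; `tag` only consumes on success.
fn service_name(input: &mut &str) -> Option<ServiceName> {
    if tag("refdata", input).is_some() {
        return Some(ServiceName::RefData);
    }
    if tag("mktdata", input).is_some() {
        return Some(ServiceName::MktData);
    }
    if tag("mktbar", input).is_some() {
        return Some(ServiceName::MktBar);
    }
    None
}

/// Parses "//blp/<servicename>" by sequencing two smaller parsers.
fn service(input: &mut &str) -> Option<ServiceName> {
    let mut cursor = *input; // work on a copy so failure consumes nothing
    tag("//blp/", &mut cursor)?;
    let name = service_name(&mut cursor)?;
    *input = cursor; // commit only on success
    Some(name)
}

/// Parses "/bbgid/<12 bytes>"; real FIGI validation would slot in here.
fn bbgid<'a>(input: &mut &'a str) -> Option<&'a str> {
    let mut cursor = *input;
    tag("/bbgid/", &mut cursor)?;
    let figi = cursor.get(..12)?;
    *input = &cursor[12..];
    Some(figi)
}

/// Composes the two parsers above into one for the whole identifier.
fn topic<'a>(input: &mut &'a str) -> Option<(ServiceName, &'a str)> {
    let mut cursor = *input;
    let svc = service(&mut cursor)?;
    let figi = bbgid(&mut cursor)?;
    *input = cursor;
    Some((svc, figi))
}
```

Parsing "//blp/mktdata/bbgid/BBG007Z1JW11@BVAL" with topic yields the MktData service and the twelve-character identifier, leaving "@BVAL" for whichever parser comes next. A combinator library supplies the sequencing, alternation, and error reporting that this sketch does by hand.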

Worked Combinators

Many combinator libraries exist in Rust: nom, chumsky, winnow, and monch. All are superb, but I chose winnow for its extensive blog posts, tutorials, and special topics.

BLPAPI Service

We'll model our types first:

#[derive(Debug, Clone, PartialEq)]
pub enum Scheme {
    BLP,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    RefData,
    MktData,
    MktBar,
}

#[derive(Debug, PartialEq)]
pub struct Service {
    scheme: Scheme,
    provider: Provider,
}

Then we parse them using winnow:

use winnow::combinator::alt;
use winnow::{PResult, Parser};

fn scheme<'s>(i: &mut &'s str) -> PResult<Scheme> {
    "blp".value(Scheme::BLP).parse_next(i)
}

fn provider<'s>(i: &mut &'s str) -> PResult<Provider> {
    alt((
        "refdata".value(Provider::RefData),
        "mktdata".value(Provider::MktData),
        "mktbar".value(Provider::MktBar),
    ))
    .parse_next(i)
}

This uses built-in Rust types for constructing parsers, so rather than literal("blp") we use the impl Parser for &'s str provided by Winnow. The Parser trait implements a few combinators directly, e.g. Parser::value to produce a value as output. Note that Parser::value requires the output type to implement Clone.

To compose these into Service, expecting //<scheme>/<provider> and ignoring trivia:

fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    ("//", scheme, "/", provider)
        .map(|(_, scheme, _, provider)| Service { scheme, provider })
        .parse_next(i)
}

Here we are using Winnow's impl Parser for (P1, P2, P3, P4) to create a parser from a tuple of size 4. This pattern of mapping a sequence of parsers into a struct is common enough that a macro-based combinator, seq!, exists to make this easier.

fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    seq! {
        Service{
            _: "//",
            scheme: scheme,
            _: "/",
            provider: provider
        }
    }
    .parse_next(i)
}

Using cargo expand we can view the expansion:

fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    ::winnow::combinator::trace(
            "Service",
            move |input: &mut _| {
                use ::winnow::Parser;
                let _ = "//".parse_next(input)?;
                let scheme = scheme.parse_next(input)?;
                let _ = "/".parse_next(input)?;
                let provider = provider.parse_next(input)?;
                #[allow(clippy::redundant_field_names)]
                Ok(Service {
                    scheme: scheme,
                    provider: provider,
                })
            },
        )
        .parse_next(i)
}

We see a more imperative implementation here, which is encouraged, as one of Winnow's stated goals is to be an "introspectable toolbox that can easily be customized at any level".

BLPAPI Identifier

On to the "/bbgid/…" identifier portion, we face two aspects different from Service:

  1. Path-dependence: the <id> parser depends on the <id_type> parser
  2. Backtracking: alt backtracks on failure, which doesn't make sense once we have parsed id_type

We solve this with cut_err to disable other branches upon error (see Chapter 6 for more detail):

alt((
  ("figi", cut_err(figi)).context(StrContext::Label("FIGI")),
  ("isin", cut_err(isin)).context(StrContext::Label("ISIN")),
  ...
))
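The difference between a recoverable failure and a committed one can be sketched without any library. This is not how winnow implements cut_err; it is a minimal illustration of the idea using a hypothetical ParseError enum with two variants, and placeholder length checks in place of real FIGI and ISIN validation.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Recoverable: the caller may try another alternative.
    Backtrack(&'static str),
    /// Committed: the prefix matched, so trying other branches is pointless.
    Cut(&'static str),
}

/// On success: a label for the identifier type and the identifier value.
type Parsed<'a> = Result<(&'static str, &'a str), ParseError>;

fn figi_branch(input: &str) -> Parsed<'_> {
    // A missing prefix is recoverable: another branch may still match.
    let value = input
        .strip_prefix("figi/")
        .ok_or(ParseError::Backtrack("expected figi/"))?;
    // After the prefix matched, a bad value is a hard error.
    if value.len() == 12 {
        Ok(("FIGI", value))
    } else {
        Err(ParseError::Cut("FIGI must be 12 characters"))
    }
}

fn isin_branch(input: &str) -> Parsed<'_> {
    let value = input
        .strip_prefix("isin/")
        .ok_or(ParseError::Backtrack("expected isin/"))?;
    if value.len() == 12 {
        Ok(("ISIN", value))
    } else {
        Err(ParseError::Cut("ISIN must be 12 characters"))
    }
}

/// Hand-written alternation: move on only when a branch backtracks.
fn identifier(input: &str) -> Parsed<'_> {
    match figi_branch(input) {
        Err(ParseError::Backtrack(_)) => {} // try the next alternative
        other => return other,              // success or Cut: stop here
    }
    isin_branch(input)
}
```

With this split, "figi/BBG" reports that the FIGI is malformed instead of the less helpful "expected isin/" from the last branch tried.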

Another pattern is the trivia ("/") between components. Combinators like preceded and delimited can ignore complex sets of trivia.

WIP

Conclusion

In summary, parser combinators in Rust enable powerful, composable solutions for financial identifiers. With the NewType pattern, clear error handling, and expressive APIs, Rust helps us model this domain robustly and ergonomically. I hope this foray into Rust and parser combinators inspires you to explore their potential for enhancing financial software systems.

Footnotes

1: Section 2.2 in FIGI allocation rules shows this hierarchy. This is specific to Equity securities which have a primary listing on an exchange but trade on all national exchanges for liquidity purposes. The Unlisted Trading Privileges act of 1994 was key in this development. For example, AAPL stock has a share-class level that is useful for cross-border transactions (i.e., trade on Eurex but settle in US DTC), a composite level for a consolidated pricing feed across all US exchanges, and an exchange-level for pricing/execution on specific exchanges (ATS). All three cases necessitate a different identifier.

2: By all asset classes we primarily mean exchange-traded products and debt instruments that trade in secondary markets. There are other heavily traded asset classes where symbology is not as easy or relevant, such as OTC private placements like forwards, money-markets, and swaps, or exchange-traded "contracts" like options, futures, commodities, and forex spots/forwards.

Oxidising the markets

For years, I've been fascinated by software designed for financial markets. In my pursuit of knowledge, I've spent countless hours searching for insightful posts on lobste.rs and HN.1 I often found myself writing "posts" hidden away in my Obsidian vault to better understand what I read.2 These posts were useful not only for their content but also for the places they led me to, such as Matklad referencing tedinski. Over time, it felt increasingly important to try and give back to the small web of blogs that helped me so much.

To reciprocate, I have a two-part plan: first, to promote (i.e., backlink) the authors and posts that have helped me; second, to create my own content that can help others. The latter is more focused, as my value is largely determined by my knowledge and experience. Two realities have emerged from my journey:

  1. Rust's dominance in both quality and quantity of content
  2. The lack of programming-related content in finance

I like Rust. I like finance. This post's title, Oxidising the Markets, reflects both: the oxidation (i.e., rewrite it in Rust) of software powering financial markets. While an exaggeration, I believe there are interesting problems to be solved, and this blog will document my learning process. I'm inspired by Jane Street's work in this space, creating podcasts, blog posts, tech talks, and libraries that take "OCaml all the way down".

I plan to start by implementing a FIX (Financial Information eXchange) server, omnipresent in the industry, from basic foundations. I'll explain various business concepts like symbology, market structure, asset classes, and ECNs. These will provide real-world context for exploring technical approaches like parser combinators, incremental computation, domain types, and back-pressure.

References

1 : People with ADHD have been linked to evolutionary success in foraging, so perhaps I just forage for posts in lieu of berries.

2: "What I cannot create, I do not understand." - Richard Feynman