The Financial Instrument Global Identifier (FIGI) is an open alternative to privately owned and licensed identifiers commonly used in financial markets, such as ISIN and CUSIP. With a comprehensive specification and a freely accessible API, FIGIs are a solid choice for exploring parsing in a financial context. In this post, we explore modeling identifiers with domain types, compare imperative and combinator approaches to parsing, and finally discuss composability and performance of these in a broader system.
Structure and Characteristics
As defined in § 6.1 of the spec, a FIGI is a "twelve character string semantically meaningless", in contrast to contextually rich identifiers tailored for human use, like stock ticker symbols. They serve to "identify financial instruments at the most granular possible level", while also being durable throughout corporate actions (e.g., delistings, ticker changes) and available for all asset classes and regions.
The charset for FIGIs, as defined in § 6.2, is a subset of ISO 8859-1, consisting of upper-case consonants (including "Y") and numerals 0–9. The twelve characters have specific positional rules:
- Pos 1–2: two upper-case consonants excluding combinations 'BS', 'BM', 'GG', 'GB', 'GH', 'KY', 'VG'
- Pos 3: 'G' (for "global")
- Pos 4–11: any combination of upper case consonants and the numerals 0–9
- Pos 12: check digit 0–9 calculated using a modified Luhn algorithm
Bloomberg and Kaiko are the only two certified providers of FIGIs, issuing them for traditional financial instruments and cryptocurrencies, respectively. All FIGIs will have the prefix "BBG" if issued by Bloomberg and "KKG" if issued by Kaiko. For example, the FIGI for Apple Inc. (AAPL) on NYSE is "BBG000B9XVV8", whereas the FIGI for BTC/USDT on Binance is "KKG000003B64".
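The "modified Luhn" rule for position 12 is easier to follow in code. Below is a minimal sketch, assuming the scheme used by python-stdnum's figi module: each of the first eleven characters maps to a number (digits as themselves, A = 10 through Z = 35), every second character is doubled, the decimal digits of the results are summed, and the check digit brings that sum up to a multiple of ten. The function names are hypothetical and not part of any library.

```rust
/// Compute the check digit for the first eleven characters of a FIGI.
/// Assumption: the doubling-and-digit-sum scheme described in the lead-in.
pub fn figi_check_digit(body: &str) -> Option<char> {
    if body.len() != 11 {
        return None;
    }
    let mut sum = 0u32;
    for (i, c) in body.chars().enumerate() {
        // Reject anything other than ASCII digits and upper-case letters.
        if !(c.is_ascii_digit() || c.is_ascii_uppercase()) {
            return None;
        }
        // `to_digit(36)` maps '0'..='9' to 0..=9 and 'A'..='Z' to 10..=35.
        let mut value = c.to_digit(36)?;
        // Double every second character (0-based odd positions).
        if i % 2 == 1 {
            value *= 2;
        }
        // `value` is at most 70, so it has at most two decimal digits.
        sum += value / 10 + value % 10;
    }
    char::from_digit((10 - sum % 10) % 10, 10)
}

/// Check that the twelfth character matches the computed check digit.
pub fn has_valid_check_digit(figi: &str) -> bool {
    figi.len() == 12
        && figi.is_ascii()
        && figi_check_digit(&figi[..11]) == figi[11..].chars().next()
}
```

Under these assumptions, both example FIGIs above ("BBG000B9XVV8" and "KKG000003B64") pass the check.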
Modeling FIGIs in Rust
Trading and settlement protocols must decide how to convey symbology between market participants. The lingua franca in financial markets is the FIX protocol, which uses the Instrument component block. Other examples include Goldman Sachs' AssetIdentifier for their Quant platform and Interactive Brokers' SecIdType for their Trader Workstation. These three protocols, developed independently and designed for different purposes, model symbology the same way with two fields: a string value along with a discriminator denoting the type (e.g., FIX's SecurityId(48) and SecurityIdSource(22)).
enum SecurityIdType {
    FIGI,
    ISIN,
    // ...
}

struct CreateOrder {
    security_id: Option<String>,
    security_id_source: Option<SecurityIdType>,
    // ...
}
While this modeling makes sense for wire formats bound to backwards-compatibility, it has no place in business logic that aims to maximise correctness and simplicity. Imagine creating an API for fetching the VWAP for an equity trading on a specific exchange. A naive API signature would be fetch_vwap(id: &str, id_type: IdType) -> u64, but calling this correctly requires meeting certain (undocumented) pre-conditions of the identifier:
- adherence to the syntactic requirements set forth by a specification
- granular enough to uniquely identify an instrument at a trading venue level (versus e.g. a global share-class identifier)
Implementing these checks inside the fetch_vwap function, and again for every similar API, is not acceptable. Instead, we can use the "parse, don't validate" idiom for parsing identifier strings into universal domain types. In Rust, this is typically done with the NewType pattern, which usually wraps primitive types in a single-field tuple struct. We then implement conversion traits like FromStr to construct these domain types, providing a natural place for validation checks.
use std::str::FromStr;

struct Figi(String);

impl FromStr for Figi {
    type Err = FigiParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Enforce conformance to FIGI's specification,
        // returning `Ok` on success or `Err` if the input is malformed.
        todo!()
    }
}

assert!(Figi::from_str("BBG000B9XVV8").is_ok());
assert!(Figi::from_str("US0378331005").is_err());
Our VWAP function can then be implemented like so:
async fn fetch_exchange_vwap(figi: &Figi) -> u64 { ... }
By embedding correctness into the type system, we have eliminated both aforementioned concerns with the naive API, providing a robust approach to modeling FIGIs in our application.
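To make the boundary between wire format and business logic concrete, here is a minimal sketch of converting the two-field message into a domain type exactly once. ValidatedOrder and OrderError are hypothetical names introduced for illustration, and the FromStr body is a placeholder that only checks length and character class rather than the full specification.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SecurityIdType {
    FIGI,
    ISIN,
}

/// Wire-level message, mirroring the two-field model above.
pub struct CreateOrder {
    pub security_id: Option<String>,
    pub security_id_source: Option<SecurityIdType>,
}

#[derive(Debug, PartialEq)]
pub struct Figi(String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum OrderError {
    MissingSecurityId,
    UnsupportedSource,
    MalformedFigi,
}

impl FromStr for Figi {
    type Err = OrderError;

    // Placeholder check only: twelve ASCII digits or upper-case letters.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let plausible = s.len() == 12
            && s.chars().all(|c| c.is_ascii_digit() || c.is_ascii_uppercase());
        if plausible {
            Ok(Figi(s.to_owned()))
        } else {
            Err(OrderError::MalformedFigi)
        }
    }
}

/// Domain-level order: holding one implies the identifier was parsed.
#[derive(Debug, PartialEq)]
pub struct ValidatedOrder {
    pub figi: Figi,
}

impl TryFrom<CreateOrder> for ValidatedOrder {
    type Error = OrderError;

    fn try_from(msg: CreateOrder) -> Result<Self, Self::Error> {
        match (msg.security_id_source, msg.security_id) {
            (Some(SecurityIdType::FIGI), Some(id)) => Ok(ValidatedOrder { figi: id.parse()? }),
            (Some(_), Some(_)) => Err(OrderError::UnsupportedSource),
            _ => Err(OrderError::MissingSecurityId),
        }
    }
}
```

Everything downstream of try_from can accept a &Figi and skip re-checking the string, which is the point of parsing at the edge.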
Parsing Imperatively
Now that we have a NewType Figi and a fallible constructor from_str, we must decide how to implement the validation logic. As outlined before, the structure is relatively simple, and hand-rolling a validation function is trivial. Let's translate Python's stdnum implementation of figi.validate as our Rust baseline (full code):
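use std::str::FromStr;

pub struct Figi(pub String);

#[derive(Debug)]
pub enum FigiParseError {
    InvalidLength,
    InvalidFormat,
    InvalidComponent,
    InvalidChecksum,
}

impl FromStr for Figi {
    type Err = FigiParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Digits plus upper-case consonants (vowels are not allowed).
        let valid_chars = "0123456789BCDFGHJKLMNPQRSTVWXYZ";
        if s.len() != 12 {
            return Err(FigiParseError::InvalidLength);
        }
        // Once this passes, every character is ASCII, so byte slicing below is safe.
        if !s.chars().all(|c| valid_chars.contains(c)) {
            return Err(FigiParseError::InvalidFormat);
        }
        // Positions 1-2: prefixes excluded by § 6.2.
        match &s[0..2] {
            "BS" | "BM" | "GG" | "GB" | "GH" | "KY" | "VG" => {
                return Err(FigiParseError::InvalidComponent)
            }
            _ => {}
        }
        // Position 3 must be 'G'.
        if &s[2..3] != "G" {
            return Err(FigiParseError::InvalidComponent);
        }
        // Only checks that the last character is a digit;
        // the check digit itself is not recomputed here.
        if !s.chars().last().unwrap().is_digit(10) {
            return Err(FigiParseError::InvalidChecksum);
        }
        Ok(Self(s.to_string()))
    }
}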
Note that the logic is the same as stdnum's, but the approaches differ greatly in how and when it is called (i.e., ad-hoc in Python but systematic in Rust). Some argue for "writing Python like it's Rust", but deviating from conventions rarely takes hold. Nevertheless, we've solved two "issues" from the Python implementation:
- Systematic enforcement of FIGI well-formedness at construction time
- Errors (in the signature) that must be handled at the call site
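As a rough illustration of the second point, the sketch below implements Display and std::error::Error for the error enum so that callers can propagate it with `?` or report it to a user. The check_length and load functions are hypothetical stand-ins (check_length replaces the full from_str and only looks at the length), and the message strings are my own wording.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum FigiParseError {
    InvalidLength,
    InvalidFormat,
    InvalidComponent,
    InvalidChecksum,
}

impl fmt::Display for FigiParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            Self::InvalidLength => "expected exactly 12 characters",
            Self::InvalidFormat => "unexpected character",
            Self::InvalidComponent => "invalid prefix or missing 'G' in position 3",
            Self::InvalidChecksum => "check digit mismatch",
        };
        f.write_str(msg)
    }
}

impl Error for FigiParseError {}

/// Stand-in for `Figi::from_str` that only checks the length.
pub fn check_length(s: &str) -> Result<&str, FigiParseError> {
    if s.len() == 12 {
        Ok(s)
    } else {
        Err(FigiParseError::InvalidLength)
    }
}

/// Hypothetical caller: `?` converts the error into `Box<dyn Error>`
/// because `FigiParseError` implements `Error`.
pub fn load(s: &str) -> Result<String, Box<dyn Error>> {
    let id = check_length(s)?;
    Ok(id.to_owned())
}
```

The caller cannot reach the identifier without going through the Result, so ignoring a malformed input has to be an explicit decision.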
The tests pass and the benchmark clocks in at 80.945 ns, so this is pretty solid! We have created a more intuitive API that is performant, has decent error ergonomics, and leverages the type system. But what about composability?
Composability of parsers
Composability in the abstract never helped me grok what it meant in practice. Instead, we will look at a real-world example where we must parse a FIGI as part of a larger format. Bloomberg's BLPAPI is a perfect fit for that. The syntax for Symbology (a methodology for identifying securities), as defined in § 3.2, consists of the following components separated by a "/":
- Identifier Type: "ticker", "cusip", "isin", "sedol1", "sedol2", "buid", "bbgid", "bsym", etc.
- Identifier Value: Unique identifier value according to Identifier Type
- Provider: Optional mnemonic of the provider that has contributed pricing for the given security, preceded by a "@". If a provider is not specified, a default value may apply depending on product.
- Pricing Source: Optional, generally two-character mnemonic for the data source where the security is traded. For example, in the Equities Business, the Data Source is the Exchange
- Yellow Key: market sector, one of "Govt", "Corp", "Mtge", "M-Mkt", "Muni", "Pfd", "Equity", "Comdty", "Index", "Curncy"
Using the Govt (i.e. treasury bond) example "/bbgid/BBG007Z1JW11@BVAL" we have two components (the type and pricing source) that we must parse in addition to the FIGI.
Parser combinators initially seemed like tools solely for functional programming savants. Thankfully, introductory posts like Practical Parsing in Rust with nom and Error recovery with parser combinators revealed their approachability and usefulness. For our use case, they are superior in their composability with other parsers.
Recall that BLPAPI defines identifiers as /<Identifier Type>/<Identifier Value>[@Provider| Pricing Source] <Yellow Key>. Here we only consider an <Identifier Type> of "bbgid" (FIGI). Some examples:
| Asset | Identifier | Note |
|---|---|---|
| Govt | "//blp/mktdata/bbgid/BBG007Z1JW11@BVAL" | @BVAL is a pricing source, akin to an exchange from a pricing perspective |
| Equity | "//blp/mktdata/bbgid/BBG000B9XVV8" | The FIGI here is the most granular, pointing to a specific exchange |
The first components ("//blp/mktdata") are the service schema defined as //blp/<servicename>, supporting:
- "//blp/refdata" for reference data
- "//blp/mktdata" for market data
- "//blp/mktbar" for market bar data
The second major component is the security identifier which includes FIGI along with other commonly used types. Just as the specifications decompose the top-level identifier into smaller parts, we can decompose our parsers into smaller (and independent) parts, composing them to handle more complex structures.
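Before reaching for a library, the idea of composing small parsers can be sketched with plain functions that share one shape: take a mutable cursor into the input, consume a prefix on success, and leave the cursor untouched on failure. The names below (literal, service_name, figi_text, subscription) are hypothetical, and figi_text is only a placeholder for real FIGI validation.

```rust
/// Hypothetical service names, mirroring the three listed above.
#[derive(Debug, PartialEq)]
pub enum ServiceName {
    RefData,
    MktData,
    MktBar,
}

/// Consume `expected` from the front of the input.
/// On failure the input is left untouched and `None` is returned.
pub fn literal(input: &mut &str, expected: &str) -> Option<()> {
    let s = *input;
    let rest = s.strip_prefix(expected)?;
    *input = rest;
    Some(())
}

/// Parse one of the known service names by trying each alternative in turn.
pub fn service_name(input: &mut &str) -> Option<ServiceName> {
    if literal(input, "refdata").is_some() {
        Some(ServiceName::RefData)
    } else if literal(input, "mktdata").is_some() {
        Some(ServiceName::MktData)
    } else if literal(input, "mktbar").is_some() {
        Some(ServiceName::MktBar)
    } else {
        None
    }
}

/// Consume twelve ASCII digits or upper-case letters
/// (a placeholder for full FIGI validation).
pub fn figi_text<'a>(input: &mut &'a str) -> Option<&'a str> {
    let s = *input;
    if s.len() < 12 || !s.is_char_boundary(12) {
        return None;
    }
    let (id, rest) = s.split_at(12);
    if !id.bytes().all(|b| b.is_ascii_digit() || b.is_ascii_uppercase()) {
        return None;
    }
    *input = rest;
    Some(id)
}

/// Compose the smaller parsers: "//blp/<service>/bbgid/<figi>".
pub fn subscription<'a>(input: &mut &'a str) -> Option<(ServiceName, &'a str)> {
    // Work on a copy of the cursor so a failure part-way leaves `input` untouched.
    let mut cursor = *input;
    literal(&mut cursor, "//blp/")?;
    let name = service_name(&mut cursor)?;
    literal(&mut cursor, "/bbgid/")?;
    let id = figi_text(&mut cursor)?;
    *input = cursor;
    Some((name, id))
}
```

Each function can be tested alone, and subscription stops at the "@" so that a separate pricing-source parser could pick up from there. A combinator library gives the same structure with better error reporting and less boilerplate.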
Worked Combinators
Many combinator libraries exist in Rust: nom, chumsky, winnow, monch. All are superb, but I chose winnow for its extensive blog posts, tutorials, and special topics.
BLPAPI Service
We'll model our types first:
// `Clone` is needed because `Parser::value` clones the value it emits.
#[derive(Debug, Clone, PartialEq)]
pub enum Scheme {
    BLP,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    RefData,
    MktData,
    MktBar,
}

#[derive(Debug, PartialEq)]
pub struct Service {
    scheme: Scheme,
    provider: Provider,
}
Then we parse them using winnow:
use winnow::combinator::alt;
use winnow::{PResult, Parser};

fn scheme<'s>(i: &mut &'s str) -> PResult<Scheme> {
    "blp".value(Scheme::BLP).parse_next(i)
}

fn provider<'s>(i: &mut &'s str) -> PResult<Provider> {
    // Try each service name in turn.
    alt((
        "refdata".value(Provider::RefData),
        "mktdata".value(Provider::MktData),
        "mktbar".value(Provider::MktBar),
    ))
    .parse_next(i)
}
This uses built-in Rust types for constructing parsers: rather than literal("blp"), we use the impl Parser for &'s str provided by Winnow. The Parser trait implements a few combinators directly, e.g. Parser::value to produce a value as output.
To compose these into Service, expecting //<scheme>/<provider> and ignoring trivia:
fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    ("//", scheme, "/", provider)
        .map(|(_, scheme, _, provider)| Service { scheme, provider })
        .parse_next(i)
}
Here we are using Winnow's impl Parser for (P1, P2, P3, P4) to create a parser from a tuple of size 4. This pattern of mapping a sequence of parsers into a struct is common enough that a macro-based combinator, seq!, exists to make this easier.
fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    seq! {
        Service {
            _: "//",
            scheme: scheme,
            _: "/",
            provider: provider
        }
    }
    .parse_next(i)
}
Using cargo expand we can view the expansion:
fn service<'s>(i: &mut &'s str) -> PResult<Service> {
    ::winnow::combinator::trace(
        "Service",
        move |input: &mut _| {
            use ::winnow::Parser;
            let _ = "//".parse_next(input)?;
            let scheme = scheme.parse_next(input)?;
            let _ = "/".parse_next(input)?;
            let provider = provider.parse_next(input)?;
            #[allow(clippy::redundant_field_names)]
            Ok(Service {
                scheme: scheme,
                provider: provider,
            })
        },
    )
    .parse_next(i)
}
We see a more imperative implementation here, which is encouraged, as one of Winnow's stated goals is to be an "introspectable toolbox that can easily be customized at any level".
BLPAPI Identifier
On to the "/bbgid/…" identifier portion, we face two aspects different from Service:
- Path-dependence: the <id> parser depends on the <id_type> parser
- Backtracking: alt backtracks on failure, which doesn't make sense once we parse id_type
We solve this with cut_err to disable other branches upon error (see Chapter 6 for more detail):
alt((
    ("figi", cut_err(figi)).context(StrContext::Label("FIGI")),
    ("isin", cut_err(isin)).context(StrContext::Label("ISIN")),
    ...
))
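The difference between a recoverable failure and a committed one can be sketched without any library. This is not Winnow's implementation, only a hand-rolled model of the same idea: a failed identifier-type match lets the next alternative run, while a failed value after a matched type is final. All names here are hypothetical, the value checks are placeholders, and the whole remainder of the input is treated as the value (no "@" suffix handling).

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Recoverable: the caller may try another alternative.
    Backtrack,
    /// Committed: the identifier type matched, so the failure is final.
    Cut(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum Identifier<'a> {
    Figi(&'a str),
    Isin(&'a str),
}

/// Match "/<name>/" and return the remainder, or signal that
/// another branch may be tried.
fn id_type<'a>(input: &'a str, name: &str) -> Result<&'a str, ParseError> {
    input
        .strip_prefix('/')
        .and_then(|rest| rest.strip_prefix(name))
        .and_then(|rest| rest.strip_prefix('/'))
        .ok_or(ParseError::Backtrack)
}

/// Placeholder value check: twelve ASCII digits or upper-case letters.
fn looks_like_twelve_chars(s: &str) -> bool {
    s.len() == 12 && s.bytes().all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
}

pub fn identifier(input: &str) -> Result<Identifier<'_>, ParseError> {
    // First alternative: once "bbgid" matches, a bad value is a hard error.
    match id_type(input, "bbgid") {
        Ok(value) if looks_like_twelve_chars(value) => return Ok(Identifier::Figi(value)),
        Ok(_) => return Err(ParseError::Cut("FIGI")),
        Err(_) => {} // type did not match: fall through to the next branch
    }
    // Second alternative, with the same commit rule.
    match id_type(input, "isin") {
        Ok(value) if looks_like_twelve_chars(value) => Ok(Identifier::Isin(value)),
        Ok(_) => Err(ParseError::Cut("ISIN")),
        Err(e) => Err(e),
    }
}
```

Without the commit step, a malformed FIGI would fall through to the ISIN branch and surface a misleading "no alternative matched" style of error instead of pointing at the FIGI.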
Another pattern is the trivia ("/") between components. Combinators like preceded and delimited can ignore complex sets of trivia.
WIP
Conclusion
In summary, parser combinators in Rust enable powerful, composable solutions for financial identifiers. With the NewType pattern, clear error handling, and expressive APIs, Rust helps us model this domain robustly and ergonomically. I hope this foray into Rust and parser combinators inspires you to explore their potential for enhancing financial software systems.
Footnotes
1: Section 2.2 in the FIGI allocation rules shows this hierarchy. This is specific to Equity securities which have a primary listing on an exchange but trade on all national exchanges for liquidity purposes. The Unlisted Trading Privileges act of 1994 was key in this development. For example, AAPL stock has a share-class level that is useful for cross-border transactions (i.e., trade on Eurex but settle in US DTC), a composite level for a consolidated pricing feed across all US exchanges, and an exchange level for pricing/execution on specific exchanges (ATS). All three cases necessitate a different identifier.
2: By all asset classes we primarily mean exchange-traded products and debt instruments that trade in secondary markets. There are other heavily traded asset classes where symbology is not as easy or relevant, such as OTC private placements like forwards, money-markets, and swaps, or exchange-traded "contracts" like options, futures, commodities, and forex spots/forwards.