Data only becomes useful when it’s been filtered, refined, and used to power valuable analytics for your business. Therefore, organizations small and large alike always strive to get their raw data from its source in order to derive the most meaning from it. That being said, it is difficult to know the right way of processing data to fully leverage its potential insights.
Like most things in life, the short answer to this predicament is dependent on a variety of factors, such as the number of variables in the data, its searchability, and the cost involved with building such a system. When time and/or cost factors are at play, most projects really can’t afford to explore all available options in great depth.
For starters, there are essentially two types of databases, schema on write and schema on read. The schema on write is the core of your traditional relational database (RDBMS) where you define its structure when data is created. The schema on read is your NoSQL database where data is mostly stored in its raw form and structure is provided when data is queried.
Simply put, data is becoming more complex, and with the evolving regulatory landscape, the data that powers the financial industry is no exception. The intent of this blog is not to walk you through the pros and cons of the two database technologies; instead, I would like to share a perspective on why I tend to favor the schema on read database technology as a means to alleviate the increasing data complexity I find to be a recurring challenge for client projects.
For example, when designing a financial fraud detection system, it may make sense to first choose schema on write if the goal is to count the number of frauds by type, location, and victims – where data is repetitive and of fixed dimension. However, when you consider the complexity of the data – which can range from money laundering to insider trading – as well as regulatory bodies enforcing greater oversight on and reporting requirements from financial institutions, the original simple financial fraud detection system will not suffice. The multitude of conditional data points associated with it make it too complex for a schema on write database.
Unlike the schema on write databases, schema on read will allow you to transform and bind data to relationships as you need it. Using the same example, when new dimensions of fraud are discovered, you can easily bind the new dimension to the existing financial fraud detection system.
Don’t let data complexity overwhelm you in your quest for data-derived insight. In my experience, schema on read is the database technology that best addresses the challenges of increasingly complex datasets in the financial industry and beyond.
– Dejian (DJ) Fang, Splunk Developer