Aman Garg

Apart from the focus on data quality, freshness SLAs and column based restrictions, more so for PII data, another focus should be on establishing the data lineage. Every fact / information ever created in an org should offer insights into it's lineage.

One of the problems organizations face is establishing all the clear write paths into the store and documenting the consumer behaviour from the store.

This not only helps with the right asset ownership but is a boon when it comes to figuring out the right migration and upgrade strategies.

--

--

Thanks for sharing. Would recommend switching to a consistent throughput unit to save on the reader's mental bandwidth.

The article switches anywhere from Billion / day as the title to per minute (when talking about feature store ingestion) to per second (write capacity units).

This makes it hard to grasp the impact. 1 Billion per day sounds like a lot (probably to garner readers' attention), but if you put it simply, it's just 270 K messages/sec. Otherwise, might as well say "How we process 1/3 trillion messages a year"

Having one standardised unit would help in improving the effectiveness of the changes you highlight.

--

--