Zero-ETL, ChatGPT, And The Future of Data Engineering

Summary

This article examines potential trends in the post-modern data stack that may affect data engineering: Zero-ETL, the use of large language models, and data product containers. It weighs the pros and cons of each idea, along with its practicality, value, and likely impact on data engineering. It concludes that data engineers will still be needed in the future, as the demand for data will continue to increase in sophistication and scale.

Q&As

What is Zero-ETL and how is it changing the data engineering field?
Zero-ETL is an ingestion approach in which the transactional database cleans and normalizes the data before automatically loading it into the data warehouse. It reduces latency, eliminates duplicate data storage, and reduces the number of potential points of failure.
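
As a rough illustration of that idea, the sketch below uses two in-memory SQLite databases to stand in for the transactional database and the warehouse. The table, view, and function names are hypothetical and do not come from the article or from any vendor's Zero-ETL product; the point is only that the cleaning lives in the source, so the load step is a plain copy.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create a hypothetical transactional database holding messy raw rows."""
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE orders_raw (id INTEGER, email TEXT, amount_cents INTEGER)"
    )
    src.executemany(
        "INSERT INTO orders_raw VALUES (?, ?, ?)",
        [
            (1, "  Ada@Example.com ", 1250),
            (2, "bob@example.com", 300),
            (2, "bob@example.com", 300),  # duplicate write
        ],
    )
    # The source itself exposes cleaned, normalized, de-duplicated rows.
    src.execute(
        """
        CREATE VIEW orders_clean AS
        SELECT DISTINCT id,
               lower(trim(email)) AS email,
               amount_cents / 100.0 AS amount
        FROM orders_raw
        """
    )
    return src


def replicate(src: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Copy already-clean rows into the warehouse with no transform step."""
    rows = src.execute(
        "SELECT id, email, amount FROM orders_clean ORDER BY id"
    ).fetchall()
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER, email TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


source = build_source()
warehouse = sqlite3.connect(":memory:")
loaded = replicate(source, warehouse)
result = warehouse.execute(
    "SELECT id, email, amount FROM orders ORDER BY id"
).fetchall()
```

The trade-off the article raises is visible here: the warehouse team receives whatever `orders_clean` produces and has little say over how the data is treated during ingestion.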

What are the pros and cons of the One Big Table and Large Language Models?
The pros of One Big Table combined with large language models are that the approach could finally deliver on the promise of self-service data analytics, speed up insights, and free the data team to spend more time unlocking data value and building. The cons are that it could give users too much freedom, and that it may be hard to apply beyond basic “metric fetching” because more advanced analytics depend on complex data pipelines.
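
To make the "too much freedom" concern concrete, here is a minimal sketch of a review gate placed in front of model-generated SQL. The `generated_sql` string is a hard-coded stand-in for what a language model might return, and `run_reviewed_query` and `one_big_table` are hypothetical names. The string check is deliberately crude (it would also reject valid queries that start with `WITH`), so treat it as an illustration rather than a real safeguard.

```python
import sqlite3


def run_reviewed_query(conn: sqlite3.Connection, generated_sql: str) -> list:
    """Run model-generated SQL only if it is a single SELECT statement."""
    statement = generated_sql.strip().rstrip(";").strip()
    # Crude guard: reject multi-statement input and anything that is not a SELECT.
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_big_table (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO one_big_table VALUES (?, ?)",
    [("emea", 10.0), ("emea", 5.0), ("amer", 7.5)],
)

# Stand-in for the SQL a model might produce for "revenue by region".
generated_sql = (
    "SELECT region, SUM(revenue) FROM one_big_table "
    "GROUP BY region ORDER BY region;"
)
result = run_reviewed_query(conn, generated_sql)
```

A simple metric fetch like this is the easy case; the article's caution is that more advanced analytics usually depend on pipeline logic that a single generated query cannot capture.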

What is a data product container and how could it unlock new ways to leverage data?
A data product container is a concept that imagines a containerization of the data table. It could make data more reliable and governable, and it could better surface information such as the semantic definition, data lineage, and quality metrics associated with the underlying unit of data.
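
One way to picture this is a small object that travels with the table and carries its metadata. The sketch below is only an assumption about what such a bundle might hold: the class name, field names, and the choice of metrics (row count and per-column null rate) are all hypothetical and not taken from the article.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataProductContainer:
    """Hypothetical bundle of a table plus the metadata the article mentions."""

    table_name: str
    rows: List[Dict[str, Any]]
    semantic_definition: str
    lineage: List[str] = field(default_factory=list)  # upstream table names

    def quality_metrics(self) -> Dict[str, Any]:
        """Compute illustrative metrics: row count and null rate per column."""
        columns = sorted({key for row in self.rows for key in row})
        total = len(self.rows)
        null_rate = {
            col: (
                sum(1 for row in self.rows if row.get(col) is None) / total
                if total
                else 0.0
            )
            for col in columns
        }
        return {"row_count": total, "null_rate": null_rate}

    def describe(self) -> Dict[str, Any]:
        """Surface definition, lineage, and quality alongside the table name."""
        return {
            "table": self.table_name,
            "definition": self.semantic_definition,
            "lineage": list(self.lineage),
            "quality": self.quality_metrics(),
        }


orders = DataProductContainer(
    table_name="orders",
    rows=[{"id": 1, "amount": 12.5}, {"id": 2, "amount": None}],
    semantic_definition="One row per customer order; amount is in US dollars.",
    lineage=["orders_raw", "orders_clean"],
)
summary = orders.describe()
```

Here `summary` gives a consumer the definition, lineage, and quality figures in one place instead of scattering them across separate tools.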

How does the modern data stack differ from its predecessors?
The modern data stack differs from its predecessors in that it supports use cases and unlocks value from data in ways that were previously, if not impossible, then certainly very difficult. It also has less governance and the jury is still out on cost efficiency.

What role will data engineers play in the future of data engineering?
Data engineers will continue to play a crucial role in extracting value from data for the foreseeable future. They will be responsible for reviewing and debugging the process of large language models, and they will be needed to make sure data is properly shaped and used.

AI Comments

👍 This article provides an excellent overview of the modern data stack and how it is continuing to evolve. It does a great job of highlighting the pros and cons of each innovation and provides an interesting perspective on the future of data engineering.

👎 This article does not provide any concrete solutions or actionable advice on how to handle the changes in the data stack. It also does not discuss the cost associated with these changes and how it may impact data engineering teams.

AI Discussion

Me: The article is about the potential future of data engineering, and how new ideas like Zero-ETL, ChatGPT, and data product containers could make a big impact. It looks at the pros and cons of each idea and how they could shape the future of data engineering.

Friend: Interesting. What implications do you think this has on data engineering?

Me: Well, the article suggests that data engineers will still play a crucial role in extracting value from data. Even though new technologies and innovations can simplify the complex data infrastructures of today, the demand and uses for data will continue to increase, so data engineers will be in demand. But there will be trade-offs. For example, data teams that choose Zero-ETL may have less ability to customize how data is treated during the ingestion phase. It will be interesting to see how the future of data engineering plays out!

Action items

Research the current trends in data engineering and the modern data stack.
Explore the pros and cons of Zero-ETL, One Big Table and large language models, and data product containers.
Experiment with data mesh implementations to understand the potential value and practicality of data product containers.

Technical terms

Zero-ETL: Something of a misnomer, since the data pipeline still exists. Today, data is often generated by a service and written into a transactional database, and an automatic pipeline then moves the raw data to the analytical data warehouse, modifying it slightly along the way. With Zero-ETL, the transactional database instead cleans and normalizes the data before automatically loading it into the data warehouse.
ChatGPT: A chat interface to large language models such as GPT-4. The article points to a bevy of startups aiming to use the power of such models to automate the process of translating business requirements into SQL queries and dashboards.
ELT: Extract, Load, Transform. A data pipeline that moves raw data to the analytical data warehouse, modifying it only slightly along the way, and transforms it after loading.
SaaS: Software as a Service, a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.
Data Warehouse: A repository of an organization's electronically stored data.
Data Mesh: A decentralized data architecture in which domain teams own their data and share it as data products across multiple systems and applications.
Data Product Containers: A concept that imagines a containerization of the data table, by analogy with the software containers that enhanced portability and infrastructure abstraction and ultimately enabled organizations to scale microservices.