
Many new data technologies emerged in 2021: more popular data engines such as ClickHouse, Iceberg, and Delta Lake; more data pipeline tools such as Apache DolphinScheduler and Apache SeaTunnel; and more data mining libraries such as Ray, Orange, and Hugging Face.
Today I would like to share some architectures built on these new open-source projects at Chinese internet companies such as Tencent, Alibaba, and VIP, along with the challenges they ran into when using these projects to serve millions of data requirements every day. I will also cover how to solve those problems with the new tools so that data developers and data scientists can easily finish the job themselves.