Hello, Kafka enthusiasts!
Kafka Summit, the premier event for developers, architects, data engineers, DevOps professionals, and streaming data lovers, finally returned in person on April 25th-26th at the magnificent venue The O2 in London. Hosted by Confluent, the event was a blast, with over 1,500 people joining two days filled with workshops, talks and networking.
Wanna delve deeper into Kafka Summit’s technical talks? Don’t miss our sister company Bitrock’s latest blog post!
Some numbers about Kafka
Nowadays, more than 100,000 companies use Kafka, the open-source platform by the Apache Software Foundation. Its community is growing day after day, thanks to members’ suggestions and contributions.
This amazingly rapid growth has made Kafka one of the most popular open-source projects in the world.
Definitely, Kafka and data streaming are already part of our daily lives; numerous industries are powered by the Apache Software Foundation’s open-source platform and streaming data technologies, such as:
- Healthcare and medication suggestions;
- Grocery delivery;
- Smart home devices;
- Renewable energy, power grid and smart device management.
Day 1: The Keynote highlights the current revolution of data
Day 1 started with an impressive keynote, introduced by Confluent Lead Technologist Ben Stopford. The keynote continued with Confluent CEO Jay Kreps’ cracking talk “Modern Data Flow: Data pipelines done right”.
According to Jay, Kafka acts as a kind of central nervous system that connects every application and operational layer, allowing companies to capture a real-time feed and act on it intelligently. This makes it possible to build smarter and more connected software.
As a consequence, this approach lends itself to every kind of application, across any industry one can imagine. Quoting Jay: “Broadly, we could kind of cut this list into two buckets: the streaming apps, which are acting on real-time data streams to take action, to react or respond; and the streaming pipelines, which are getting the data from place to place. And there’s not a hard distinction between these two.”
Let’s continue by answering the question: “What is happening in the world of data?”
“How do we get the right data to the right place and the right format at the right time, that whole grody area of piping data around at an organization. This is something that’s been around for a while. There’s a whole ecosystem of tools, both quite old and incredibly new that has been around to try and serve this.” states Kreps. “I think Kafka is kind of disrupting this space from the bottom. It’s kinda coming up into it, giving a different way of thinking about data”.
This is why Kreps talked about five principles for modern data flow: “and by modern data flow, I mean where I think this area of pipelines and streaming ETL and data movement is going, it’s all about the flow of data across an organization,” says Jay. The five principles span different layers of data management and development:
- (Data) Streaming
- Decentralized (Data)
- Declarative
- Developer-oriented
- Governed & Observable
In particular, we’ll focus on the Streaming and Decentralization concepts, as we at Radicalbit are building on the same ideas with Helicon, our platform for Data & ML Engineering.
#1: (Data) Streaming
According to Jay, the majority of technologies are still built around batch data extraction, batch data processing and batch data delivery.
“I think that to be building new batch tools in 2022 is a little bit surprising,” states Kreps. “I’m reminded of the quote from Henry Ford, who said, ‘If I asked people what they want, they would have told me to build a faster horse’.” In Kreps’ view, streaming technology is the element that makes data tooling simpler and better than batch. “It’s not like all the ‘horses’ are gone, but it’s clear that cars at this point are the future. And I think that’s analogous to data. It’s not like all our data is streaming yet, but it’s pretty clear the direction that things are heading in.”
Streaming technology answers the growing need for businesses to act intelligently in real time. “When we think about data pipeline technologies, you’re either gonna have a good streaming pipeline which can serve a broad variety of use cases, or you’re gonna end up doing everything twice. You can have different ad hoc pipelines for each thing, which is quite a burden to get right, as we think about the large ecosystem of data that we’re trying to serve.”
This is precisely what Radicalbit’s Data for AI platform, Helicon, is working on. Thanks to intelligent improvements, customers don’t need to store information in data lakes or run batch jobs. Instead, they can move to a dynamic and flexible use of AI through autonomous processes designed to deliver tangible results.
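To make the batch-versus-streaming contrast from Jay's talk more concrete, here is a minimal sketch in plain Python. It is not Kafka or Helicon code: the event shape and the function names `batch_totals` and `streaming_totals` are assumptions made up for illustration. The batch version only answers once the whole data set is available, while the streaming version emits an updated result for every incoming event.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (customer_id, order_amount)
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: consume the full data set, then return totals once."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Streaming style: yield an updated total as soon as each event arrives."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield customer, totals[customer]


events = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]

print(batch_totals(events))
for update in streaming_totals(events):
    print(update)
```

Both functions share the same logic; the difference is when results become available, which is the point of the "faster horse" analogy.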
#2: Decentralized (Data)
In the words of Kreps, “decentralization” means breaking sequential processes up. “For those who have heard about data meshes as kind of an organizing principle for data, that’s this, it’s about decentralization,” states the Confluent Co-Founder & CEO.
Data Mesh’s most relevant feature is a new organisational and architectural model that recognises the importance of a distributed, domain-driven approach to data organisation alongside a centralised approach to data governance. This turns data into actual “outcomes” that can be offered and managed by specific domains, meeting a company’s business requirements rather than individual application needs.
Jay mentioned that data is flowing across departments and taking different shapes. For him, it’s about time to give developers the freedom to choose the best data platform to enable a fast and reliable connection between services. “Almost every system in a company needs to have access to data that is maintained elsewhere in the organization. And that data needs to be fresh, it needs to be up to date,” says Jay.
Taking Helicon as an example, one can imagine our platform as a central hub that takes in every real-time impulse and flow of data and lets everything plug into what it needs. Moreover, team members don’t have to know all the details of everybody else’s work, which allows companies to scale their use of data. Indeed, decentralized work is a reality with Helicon: Data Scientists can work on one part of a project while communicating with another team (let’s say, Data Engineers) that is working on something else entirely.
“So that the teams who are publishing out the data, all they know is that they send data in that schema, the teams that subscribe don’t need to know all the implementation details, they tap into it. To support real-time streaming, you’re gonna have a bunch of things that run all the time. So there’s no after it runs, it’s running continuously, you can’t sequence the running of things.” states Jay.
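The schema-as-contract idea in Jay's quote can be sketched with a toy, in-memory publish/subscribe example. This is an illustration only, not a Kafka client and not Helicon's API: the `Topic` class and the `OrderPlaced` schema are hypothetical names. Publishers only know the schema they write, and subscribers tap in without knowing who produces the data or how.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical shared schema: the only contract between teams."""
    order_id: str
    amount: float


class Topic:
    """Toy in-memory stand-in for a topic (an assumption, not a real broker)."""

    def __init__(self, schema: type) -> None:
        self.schema = schema
        self.subscribers: List[Callable] = []

    def subscribe(self, handler: Callable) -> None:
        """Register a consumer; it needs no knowledge of the producers."""
        self.subscribers.append(handler)

    def publish(self, record) -> None:
        """Reject records that do not follow the agreed schema."""
        if not isinstance(record, self.schema):
            raise TypeError(f"expected {self.schema.__name__}")
        for handler in self.subscribers:
            handler(record)


orders = Topic(OrderPlaced)
revenue = {"total": 0.0}
seen_ids: List[str] = []


def track_revenue(event: OrderPlaced) -> None:
    revenue["total"] += event.amount


# Two independent "teams" subscribe to the same stream.
orders.subscribe(track_revenue)
orders.subscribe(lambda event: seen_ids.append(event.order_id))

orders.publish(OrderPlaced("o-1", 20.0))
orders.publish(OrderPlaced("o-2", 5.0))
print(revenue["total"], seen_ids)
```

In a real deployment the schema check would typically live in dedicated tooling rather than in the topic object itself; the sketch only shows why a shared schema lets teams work independently.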
#3: Declarative
The Declarative concept reflects the view that developers should aim to write code that states what they want to achieve, not how to achieve it. According to Jay, we are moving from heavy systems and centralized infrastructures to something lighter and more declarative. “We’re moving from tools that are kind of closed, GUI-oriented ecosystems to something that is kind of an open, developer-friendly ecosystem, and something a little bit opaque to something that’s governed and observable as it occurs,” concludes Kreps.
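As a rough illustration of "what, not how", the sketch below expresses the same filter twice in plain Python. Everything here is hypothetical: `PIPELINE_SPEC`, `run_pipeline` and the record fields are invented for the example and do not correspond to any Kafka, Confluent or Helicon API. The imperative version spells out each step, while the declarative version is a small piece of data interpreted by a generic engine.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def imperative_large_orders(records: List[Record]) -> List[Record]:
    """Imperative: spell out how to loop, test and build the result."""
    result = []
    for record in records:
        if record["amount"] >= 100:
            result.append({"id": record["id"], "amount": record["amount"]})
    return result


# Declarative: describe what is wanted; the engine below decides how.
# Being plain data, the spec can be version controlled and reviewed.
PIPELINE_SPEC = {
    "where": {"field": "amount", "min": 100},
    "select": ["id", "amount"],
}


def run_pipeline(spec: Dict[str, Any], records: List[Record]) -> List[Record]:
    """Tiny generic engine (hypothetical) that interprets a spec."""
    condition = spec["where"]
    return [
        {name: record[name] for name in spec["select"]}
        for record in records
        if record[condition["field"]] >= condition["min"]
    ]


records = [
    {"id": "a", "amount": 150, "country": "IT"},
    {"id": "b", "amount": 40, "country": "UK"},
    {"id": "c", "amount": 100, "country": "UK"},
]

print(run_pipeline(PIPELINE_SPEC, records))
```

Changing the threshold or the selected fields means editing the spec, not the engine, which is one reason declarative definitions tend to be easier to review.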
#4: Developer-oriented
Jay’s fourth concept is developer-oriented tools: this might be a slogan for open-source software that developers can embrace without licensing limits and integrate with their favourite tools. Moreover, through pre-packaged integrations, developers can spend less time on plumbing and focus on building. “Something that would’ve been more of a whiteboard concept with a lot of low-level implementation details can now be expressed simply, you can version control it, you can review it, you can use your full toolchain to work with that thing. And I think that’s a really important characteristic of developer-oriented tools. And this is something I think the modern ecosystem does pretty well.”
#5: Governed & Observable
In this era of ever-growing data, it’s easy to lose control over assets. Therefore, developers should have tools to keep an eye on their data landscape. This is where prebuilt integrations to internal and external services and an accessible console can make the difference.
To wrap up
We are now heading in the direction already predicted by Jay: society is inevitably becoming more and more real-time-oriented. The Confluent CEO expresses this concept pretty well: “society and infrastructures are becoming more and more real-time,” and he’s right. We’re living in a world where everything is faster and more immediate.
Radicalbit has believed in this from the very beginning, with a pioneering and innovation-oriented vision and approach. Our team has worked tirelessly to create something that will improve your business; this is why we’ve developed Helicon to be intuitive and easy to use, so that anyone can use it right away!