Julian LaNeve is the Chief Technology Officer (CTO) at Astronomer, the driving force behind Apache Airflow and modern data orchestration, powering everything from AI to general analytics.
Julian leads product and engineering at Astronomer, where he focuses on developer experience, data observability, and AI. He is also the author of Cosmos, an Airflow provider for running dbt Core projects as Airflow DAGs.
He’s passionate about all things data and open source, and spends his spare time entering hackathons, prototyping new projects, and exploring the latest in data.
Could you share your personal story of how you became involved in software engineering and worked your way up to becoming CTO of Astronomer?
I’ve been coding since I was in middle school. For me, engineering has always been a great creative outlet: I can come up with an idea and use whatever technology is necessary to build toward a vision. After spending some time in engineering, though, I wanted to do more. I wanted to understand how businesses are run, how products are sold, and how teams are built, and I wanted to learn quickly.
I spent a few years working in management consulting at BCG, where I worked on a wide variety of projects in different industries. I learned a ton, but ultimately missed building products and working toward a longer-term vision. I decided to join Astronomer’s product management team, where I could still work with customers and build strategies (the things I enjoyed from consulting) but could also get very hands-on building out the actual product and working with technology.
For a while, I acted as a hybrid PM/engineer: I’d work with customers to understand the challenges they were facing and design products and features as a PM. Then I’d take the product requirements and work with the engineering team to actually build out the product or feature. Over time, I did this with a larger set of products at Astronomer, which ultimately led to the CTO role I’m in now.
For users who are unfamiliar with Airflow, can you explain what makes it the ideal platform to programmatically author, schedule, and monitor workflows?
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow provides the workflow management capabilities that are integral to modern cloud-native data platforms. It automates the execution of jobs, coordinates dependencies between tasks, and gives organizations a central point of control for monitoring and managing workflows.
Data platform architects leverage Airflow to automate the movement and processing of data through and across diverse systems, managing complex data flows and providing flexible scheduling, monitoring, and alerting. All of these features are extremely helpful for modern data teams, but what makes Airflow the ideal platform is that it is an open-source project, meaning there is a community of Airflow users and contributors who are constantly working to further develop the platform, solve problems, and share best practices.
Airflow also has many data integrations with popular databases, applications, and tools, as well as dozens of cloud services, and more are added every month.
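To make "programmatically author, schedule, and monitor" concrete, here is a minimal sketch of what a workflow looks like in Airflow, using the TaskFlow API; the DAG name, schedule, and task logic are illustrative stand-ins, not anything from Astronomer:

```python
# A minimal Airflow DAG: two Python tasks with a dependency, scheduled daily.
# All names and logic here are illustrative.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_pipeline():
    @task
    def extract() -> list[int]:
        # Pull raw records from a source system (stubbed out here).
        return [1, 2, 3]

    @task
    def load(records: list[int]) -> None:
        # Write the records to a destination (stubbed out here).
        print(f"loaded {len(records)} records")

    # Passing extract()'s output into load() creates the dependency edge.
    load(extract())


example_pipeline()
```

Because the DAG is ordinary Python, it can be versioned, tested, and reviewed like any other code, which is a large part of what "workflows as code" buys you over GUI-based schedulers.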
How does Astronomer use Airflow for internal processes?
We use Airflow a ton! Naturally, we have our own data team that uses Airflow to deliver data to the business and our customers. They have some pretty sophisticated tooling they’ve built around Airflow that we’ve used as inspiration for feature development on the broader platform.
We also use Airflow for some pretty nontraditional use cases, and it performs very well. For example, our CRE team uses Airflow to monitor the hundreds of Kubernetes clusters and thousands of Airflow deployments we run on behalf of our customers. Their pipelines run constantly to check for issues, and if we notice any, we’ll proactively open support tickets for our customers.
I’ve even used Airflow for personal use cases. My favorite (to date) was when I was moving to New York City. If you’ve ever lived here, you’ll know the rental market is crazy; apartments get rented out within hours of being listed. My roommates and I had a list of criteria we all agreed upon (location, number of bedrooms, bathrooms, etc.), and I built an Airflow DAG that ran every few minutes, pulled new listings from various apartment listing sites, and texted me (thanks, Twilio!) every time there was something new that matched our criteria. The apartment I’m now living in was found thanks to Airflow!
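That workflow maps naturally onto a small three-task DAG. Here is a hedged sketch of how it could look; the scraping logic, criteria, phone numbers, and credentials are hypothetical placeholders, while the messaging call follows Twilio's standard Python SDK:

```python
# A simplified sketch of the apartment-hunting DAG described above.
# The scraping logic, criteria, and credentials are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow.decorators import dag, task

CRITERIA = {"min_bedrooms": 2, "max_rent": 4000}  # illustrative criteria


@dag(schedule=timedelta(minutes=5), start_date=datetime(2024, 1, 1), catchup=False)
def apartment_watcher():
    @task
    def fetch_new_listings() -> list[dict]:
        # Hypothetical: call or scrape each listing site and return
        # anything posted since the previous run.
        return []

    @task
    def filter_listings(listings: list[dict]) -> list[dict]:
        # Keep only listings that match the agreed-upon criteria.
        return [
            listing for listing in listings
            if listing["bedrooms"] >= CRITERIA["min_bedrooms"]
            and listing["rent"] <= CRITERIA["max_rent"]
        ]

    @task
    def send_texts(matches: list[dict]) -> None:
        # Import inside the task so DAG file parsing stays fast.
        from twilio.rest import Client

        client = Client("ACCOUNT_SID", "AUTH_TOKEN")  # placeholder credentials
        for match in matches:
            client.messages.create(
                body=f"New listing: {match['url']}",
                from_="+15550000000",  # your Twilio number (placeholder)
                to="+15551111111",     # your phone (placeholder)
            )

    send_texts(filter_listings(fetch_new_listings()))


apartment_watcher()
```

Because each five-minute run is independent, a failed scrape simply retries on the next tick rather than breaking the watcher.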
Astronomer designed Astro, a modern data orchestration platform powered by Airflow. Can you share with us how this tool enables companies to easily place Airflow at the core of their data operations?
Astro enables organizations, and more specifically data engineers, data scientists, and data analysts, to build, run, and grow their mission-critical data pipelines on a single platform for all of their data flows. It is the only managed Airflow service that provides high levels of data security and protection, and it helps companies scale their deployments and free up resources to focus on their overarching business goals.
One of our customers, Anastasia, a cutting-edge technology company, chose Astro to manage Airflow because they didn’t have enough time or resources to maintain Airflow on their own. Astro works on the back end so teams can focus on core business activities rather than spending time on undifferentiated work like managing Airflow.
One of the core components of Astro is elastic scalability. Could you define what this is and why it’s important for cloud computing environments?
For us, this just means our ability to meet the compute demands of our customers without running a ton of infrastructure all the time. Our customers use our platform for a wide variety of use cases, the majority of which have high compute requirements (training machine learning models, processing big data, etc.). One of the core value propositions of Astronomer is that, as a customer, you don’t have to think about the machines running your pipelines. You deploy your pipelines to Astro and can expect them to work. We’ve built a set of features and systems that help scale our infrastructure to meet the changing demands of our customers, and it’s something we’re excited to keep building upon in the future.
You led the Astronomer team that built Ask Astro, the LLM-powered chatbot for Apache Airflow. Can you share with us what Ask Astro is and details on the LLMs that power it?
Our team at Astronomer has some of the most knowledgeable Airflow community members, and we wanted to make it easier to share their knowledge. To do that, we created a reference implementation of Andreessen Horowitz’s Emerging Architectures for LLM Applications, which shows the most common systems, tools, and design patterns they’ve seen used by AI startups and sophisticated tech companies. We started with some informed opinions about this reference implementation, and Apache Airflow plays a central role in the architecture. Ask Astro is a real-life reference that shows how to glue all the various pieces together.
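For context, the core of that architecture is retrieval-augmented generation (RAG): ingest and embed documents, retrieve the most relevant ones for a question, and hand them to an LLM as context. The sketch below illustrates the pattern only; the sample documents, model names, and in-memory vector search are stand-ins, not Ask Astro's actual stack:

```python
# A generic retrieval-augmented generation (RAG) sketch: embed documents,
# retrieve the closest one for a question, and pass it to an LLM as context.
# Sample docs, model names, and in-memory vector search are illustrative;
# a production system would use a real vector database and ingestion pipeline.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

docs = [
    "Airflow DAGs are Python files placed in the dags/ folder.",
    "Task retries and alerts make pipelines resilient to transient failures.",
]


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])


# Ingestion: in a real deployment this would run offline, e.g. as an Airflow DAG.
doc_vectors = embed(docs)


def answer(question: str) -> str:
    # Retrieval: cosine similarity between the question and stored embeddings.
    q = embed([question])[0]
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = docs[int(np.argmax(sims))]
    # Generation: ground the model's answer in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content


print(answer("Where do Airflow DAG files live?"))
```

In a production setup the ingestion and embedding steps typically run as scheduled Airflow DAGs that keep the document store in sync, which is where Airflow fits into the architecture described above.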
Ask Astro is more than just another chatbot. The Astronomer team chose to develop the application in the open and regularly post about challenges, ideas, and solutions in order to build institutional knowledge on behalf of the community. What were some of the biggest challenges that the team faced?
The biggest challenge was the lack of clear best practices in the community. Because “state of the art” was redefined every week, it was tough to understand how to approach certain problems (document ingestion, model selection, output accuracy measurement, etc.). This was a key driver for us to build Ask Astro in the open. We wanted to establish a set of practices for LLM orchestration that work well for various use cases so our customers and community could feel well-prepared to adopt LLMs and generative AI technologies.
It’s proven to be a great choice: the tool itself gets a ton of usage, we’ve given several public talks on how to build LLM applications, and we’ve even started working with a select group of customers to roll out internal versions of Ask Astro!
What’s your personal vision for the future of Airflow and Astronomer?
I’m really excited about the future of both Airflow and Astronomer. The Airflow community continues to grow, and at Astronomer, we’re committed to fostering its development, support, and connection across teams and individuals.
With increasing demand for data-driven insights and an influx of data sources, data engineers have a challenging job. We want to lighten the load for these individuals and teams by empowering them to integrate and manage complex data at scale. Today, this also means supporting AI adoption and implementation. In 2023, like many other companies, we focused on how we could accelerate AI use for our customers. Our platform, Astro, accelerates AI deployment, streamlines ML development, and provides the robust compute power needed for next-gen applications. AI will continue to be a focus for us this year, and we’ll support our customers as new technologies and frameworks emerge.
In addition, Astronomer’s a great place to work and grow a career. As the data landscape continues evolving, working here gets more and more exciting. We’re building a great team and have lots of technical challenges to solve. We also recently moved our headquarters to New York City, where we can become an even greater part of the tech community there and be better equipped to attract the best, most skilled talent in the industry. If you’re interested in joining the team to help us deliver the world’s data on time, reach out!
Thank you for the great interview. Readers who wish to learn more should visit Astronomer.