Dr. Mike Flaxman is currently the VP of Product at HEAVY.AI, having previously served as Product Manager and led the Spatial Data Science practice in Professional Services. He has spent the last 20 years working in spatial environmental planning. Prior to HEAVY.AI, he founded Geodesign Technologies, Inc. and co-founded GeoAdaptive LLC, two startups applying spatial analysis technologies to planning. Before startup life, he was a professor of planning at MIT and Industry Manager at ESRI.
HEAVY.AI is a hardware-accelerated platform for real-time, high-impact data analytics. It leverages both GPU and CPU processing to query massive datasets quickly, with support for SQL and geospatial data. The platform includes visual analytics tools for interactive dashboards, cross-filtering, and scalable data visualizations, enabling efficient big data analysis across various industries.
Can you tell us about your professional background and what led you to join HEAVY.AI?
Before joining HEAVY.AI, I spent years in academia, ultimately teaching spatial analytics at MIT. I also ran a small consulting firm with a variety of public sector clients, and I’ve been involved in GIS projects across 17 countries. My work has taken me from advising organizations like the Inter-American Development Bank to managing GIS technology for architecture, engineering and construction at ESRI, the world’s largest GIS developer.
I vividly remember my first encounter with what is now HEAVY.AI. As a consultant, I was responsible for scenario planning for the Florida Beaches Habitat Conservation Program. My colleagues and I were struggling to model sea turtle habitat using 30m Landsat data, and a friend pointed me to some brand new and very relevant data – 5cm LiDAR. It was exactly what we needed scientifically, but something like 3,600 times larger than what we’d planned to use. Needless to say, no one was going to increase my budget by even a fraction of that amount. So that day I put down the tools I’d been using and teaching for several decades and went looking for something new. HEAVY.AI sliced through and rendered that data so smoothly and effortlessly that I was instantly hooked.
Fast forward a few years, and I still think what HEAVY.AI does is pretty unique, and its early bet on GPU analytics pointed exactly where the industry still needs to go. HEAVY.AI is firmly focused on democratizing access to big data. That has a data volume and processing speed component, of course, essentially giving everyone their own supercomputer. But an increasingly important aspect, with the advent of large language models, is making spatial modeling accessible to many more people. These days, rather than spending years learning a complex interface with thousands of tools, you can just start a conversation with HEAVY.AI in the human language of your choice. The program not only generates the commands required, but also presents relevant visualizations.
Behind the scenes, delivering ease of use is of course very difficult. Currently, as the VP of Product Management at HEAVY.AI, I’m heavily involved in determining which features and capabilities we prioritize for our products. My extensive background in GIS allows me to really understand the needs of our customers and guide our development roadmap accordingly.
How has your previous experience in spatial environmental planning and startups influenced your work at HEAVY.AI?
Environmental planning is a particularly challenging domain in that you need to account for both different sets of human needs and the natural world. The general solution I learned early was to pair a method known as participatory planning with the technologies of remote sensing and GIS. Before settling on a plan of action, we’d make multiple scenarios and simulate their positive and negative impacts in the computer using visualizations. Using participatory processes let us combine various forms of expertise and solve very complex problems.
While we don’t typically do environmental planning at HEAVY.AI, this pattern still works very well in business settings. So we help customers construct digital twins of key parts of their business, and we let them create and evaluate business scenarios quickly.
I suppose my teaching experience has given me deep empathy for software users, particularly users of complex software systems. Where one student stumbles is random, but where dozens or hundreds of people make similar errors, you know you’ve got a design issue. Perhaps my favorite part of software design is taking these learnings and applying them in designing new generations of systems.
Can you explain how HeavyIQ leverages natural language processing to facilitate data exploration and visualization?
These days it seems everyone and their brother is touting a new genAI model, most of them forgettable clones of each other. We’ve taken a very different path. We believe that accuracy, reproducibility and privacy are essential characteristics for any business analytics tool, including those built on large language models (LLMs). So we have built those into our offering at a fundamental level. For example, we constrain model inputs strictly to enterprise databases and to documents provided inside an enterprise security perimeter. We also constrain outputs to the latest HeavySQL and Charts. That means that whatever question you ask, we will try to answer it with your data, and we will show you exactly how we derived that answer.
With those guarantees in place, it matters less to our customers exactly how we process the queries. But behind the scenes, another important difference relative to consumer genAI is that we fine-tune models extensively against the specific types of questions business users ask of business data, including spatial data. So, for example, our model is excellent at performing spatial and time series joins, which aren’t covered in classical SQL benchmarks but which our users need daily.
We package these core capabilities into a Notebook interface we call HeavyIQ. HeavyIQ is about making data exploration and visualization as intuitive as possible by using natural language processing (NLP). You ask a question in English—like, “What were the weather patterns in California last week?”—and HeavyIQ translates that into SQL queries that our GPU-accelerated database processes quickly. The results are presented not just as data but as visualizations—maps, charts, whatever’s most relevant. It’s about enabling fast, interactive querying, especially when dealing with large or fast-moving datasets. What’s key here is that it’s often not the first question you ask, but perhaps the third, that really gets to the core insight, and HeavyIQ is designed to facilitate that deeper exploration.
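To make that flow concrete, here is a rough sketch of the kind of SQL such a question might be translated into. It is purely illustrative: the table and column names (weather_obs, obs_time, state, temp_c, wind_mph) and the exact dialect details are assumptions, not HEAVY.AI’s actual schema or generated output.

```python
# Illustrative only: what a text-to-SQL step might produce for the question above.
# Table/column names and the date functions are assumptions, not a real schema.
question = "What were the weather patterns in California last week?"

generated_sql = """
SELECT
    DATE_TRUNC(DAY, obs_time) AS obs_day,
    AVG(temp_c)   AS avg_temp_c,
    AVG(wind_mph) AS avg_wind_mph
FROM weather_obs
WHERE state = 'CA'
  AND obs_time >= NOW() - INTERVAL '7' DAY
GROUP BY obs_day
ORDER BY obs_day
"""

print(f"Q: {question}\nGenerated SQL:{generated_sql}")
```

The point is less the specific query than the loop around it: the follow-up question (“now only coastal counties”, “now compare to the prior week”) arrives as another sentence, not another hand-written query.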
What are the primary benefits of using HeavyIQ over traditional BI tools for telcos, utilities, and government agencies?
HeavyIQ excels in environments where you’re dealing with large-scale, high-velocity data—exactly the kind of data telcos, utilities, and government agencies handle. Traditional business intelligence tools often struggle with the volume and speed of this data. For instance, in telecommunications, you might have billions of call records, but it’s the tiny fraction of dropped calls that you need to focus on. HeavyIQ allows you to sift through that data 10 to 100 times faster thanks to our GPU infrastructure. This speed, combined with the ability to interactively query and visualize data, makes it invaluable for risk analytics in utilities or real-time scenario planning for government agencies.
The other advantage, already alluded to above, is that spatial and temporal SQL queries are extremely powerful analytically, but they can be slow or difficult to write by hand. When a system operates at what we call “the speed of curiosity,” users can ask both more questions and more nuanced questions. So, for example, a telco engineer might notice a temporal spike in equipment failures from a monitoring system, have the intuition that something is going wrong at a particular facility, and check this with a spatial query returning a map.
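As a hedged illustration, the engineer’s follow-up might boil down to a spatial-temporal query along the lines of the sketch below. The tables (equipment_failures, facilities), the 500 m radius, and the 24-hour window are invented for the example, and the geospatial and interval syntax will vary by dialect.

```python
# Illustrative only: count recent failures near each facility to confirm a suspected
# hotspot. Table/column names and thresholds are assumptions, not a real schema.
facility_hotspot_sql = """
SELECT
    f.facility_id,
    COUNT(*) AS failures_last_24h
FROM equipment_failures e
JOIN facilities f
  ON ST_DWithin(f.site_geom, e.failure_geom, 500.0)  -- failures within ~500 m of the site
WHERE e.failed_at >= NOW() - INTERVAL '24' HOUR
GROUP BY f.facility_id
ORDER BY failures_last_24h DESC
"""
print(facility_hotspot_sql)
```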
What measures are in place to prevent metadata leakage when using HeavyIQ?
As described above, we’ve built HeavyIQ with privacy and security at its core. This includes not only data but also several kinds of metadata. We use column- and table-level metadata extensively in determining which tables and columns contain the information needed to answer a query. We also use internal company documents, where provided, to assist in what is known as retrieval-augmented generation (RAG). Lastly, the language models themselves generate further metadata. All of these, but especially the latter two, can be of high business sensitivity.
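As a generic illustration of how schema metadata can drive that retrieval step, a minimal sketch might rank table descriptions against the user’s question, as below. This is not HEAVY.AI’s implementation, and the catalog entries are made up; a production system would use richer metadata and embedding-based retrieval rather than simple token overlap.

```python
# Generic sketch of metadata-driven retrieval for a text-to-SQL pipeline:
# pick the tables whose descriptions best match the user's question.
# The catalog below is invented for illustration.
from collections import Counter

CATALOG = {
    "weather_obs": "hourly weather observations: temperature, wind, precipitation, state, geometry",
    "call_records": "telecom call detail records: tower id, dropped flag, call time, location point",
    "facilities": "utility facility sites: facility id, equipment type, service area polygon",
}

def overlap(question: str, description: str) -> int:
    """Count tokens shared between the question and a table description."""
    q = Counter(question.lower().split())
    d = Counter(description.lower().split())
    return sum((q & d).values())

def pick_tables(question: str, top_n: int = 2) -> list[str]:
    """Rank catalog entries by token overlap and return the best candidates."""
    ranked = sorted(CATALOG, key=lambda t: overlap(question, CATALOG[t]), reverse=True)
    return ranked[:top_n]

print(pick_tables("Which towers had the most dropped calls yesterday?"))
# In this toy catalog, 'call_records' ranks first.
```

Because those descriptions and any retrieved documents can themselves be sensitive, they need the same protections as the data they describe.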
Unlike third-party models where your data is typically sent off to external servers, HeavyIQ runs locally on the same GPU infrastructure as the rest of our platform. This ensures that your data and metadata remain under your control, with no risk of leakage. For organizations that require the highest levels of security, HeavyIQ can even be deployed in a completely air-gapped environment, ensuring that sensitive information never leaves specific equipment.
How does HEAVY.AI achieve high performance and scalability with massive datasets using GPU infrastructure?
The secret sauce is essentially in avoiding the data movement prevalent in other systems. At its core, this starts with a purpose-built database that’s designed from the ground up to run on NVIDIA GPUs. We’ve been working on this for over 10 years now, and we truly believe we have the best-in-class solution when it comes to GPU-accelerated analytics.
Even the best CPU-based systems run out of steam well before a middling GPU. Once that happens on CPU, the usual strategy is to distribute data across multiple cores and then multiple systems (so-called ‘horizontal scaling’). This works well in some contexts where things are less time-critical, but it generally starts getting bottlenecked on network performance.
In addition to avoiding all of this data movement on queries, we also avoid it on many other common tasks. The first is that we can render graphics without moving the data. Then if you want ML inference modeling, we again do that without data movement. And if you interrogate the data with a large language model, we yet again do this without data movement. Even if you are a data scientist and want to interrogate the data from Python, we again provide methods to do this on GPU without data movement.
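For the Python path specifically, a minimal sketch using the open-source heavyai connector (formerly pymapd) might look like the following. The connection parameters, table, and columns are illustrative, and the exact method names should be verified against the installed connector version.

```python
# Minimal sketch: query HeavyDB from Python and keep the result on the GPU.
# Assumes the heavyai connector (successor to pymapd) on a CUDA-capable host;
# credentials, database name, and the call_records table are illustrative.
from heavyai import connect

con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="heavyai")

# select_ipc_gpu materializes the result as a cuDF GPU DataFrame via Arrow,
# so rows move from the database into the data-science stack without a CPU round trip.
gdf = con.select_ipc_gpu(
    "SELECT tower_id, COUNT(*) AS dropped_calls "
    "FROM call_records WHERE dropped = true GROUP BY tower_id"
)
print(gdf.head())
```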
What that means in practice is that we can perform not only queries but also rendering 10 to 100 times faster than traditional CPU-based databases and map servers. When you’re dealing with the massive, high-velocity datasets that our customers work with – things like weather models, telecom call records, or satellite imagery – that kind of performance boost is absolutely essential.
How does HEAVY.AI maintain its competitive edge in the fast-evolving landscape of big data analytics and AI?
That’s a great question, and it’s something we think about constantly. The landscape of big data analytics and AI is evolving at an incredibly rapid pace, with new breakthroughs and innovations happening all the time. It certainly doesn’t hurt that we have a 10-year head start on GPU database technology.
I think the key for us is to stay laser-focused on our core mission – democratizing access to big geospatial data. That means continually pushing the boundaries of what’s possible with GPU-accelerated analytics, and ensuring our products deliver unparalleled performance and capabilities in this domain. A big part of that is our ongoing investment in developing custom, fine-tuned language models that truly understand the nuances of spatial SQL and geospatial analysis.
We’ve built up an extensive library of training data, going well beyond generic benchmarks, to ensure our conversational analytics tools can engage with users in a natural, intuitive way. But we also know that technology alone isn’t enough. We have to stay deeply connected to our customers and their evolving needs. At the end of the day, our competitive edge comes down to our relentless focus on delivering transformative value to our users. We’re not just keeping pace with the market – we’re pushing the boundaries of what’s possible with big data and AI. And we’ll continue to do so, no matter how quickly the landscape evolves.
How does HEAVY.AI support emergency response efforts through HeavyEco?
We built HeavyEco when we saw some of our largest utility customers having significant challenges simply ingesting today’s weather model outputs, as well as visualizing them for joint comparisons. It was taking one customer up to four hours just to load data, and when you are up against fast-moving extreme weather conditions like fires…that’s just not good enough.
HeavyEco is designed to provide real-time insights in high-consequence situations, like during a wildfire or flood. In such scenarios, you need to make decisions quickly and based on the best possible data. So HeavyEco serves firstly as a professionally-managed data pipeline for authoritative models such as those from NOAA and USGS. On top of those, HeavyEco allows you to run scenarios, model building-level impacts, and visualize data in real time. This gives first responders the critical information they need when it matters most. It’s about turning complex, large-scale datasets into actionable intelligence that can guide immediate decision-making.
Ultimately, our goal is to give our users the ability to explore their data at the speed of thought. Whether they’re running complex spatial models, comparing weather forecasts, or trying to identify patterns in geospatial time series, we want them to be able to do it seamlessly, without any technical barriers getting in their way.
What distinguishes HEAVY.AI’s proprietary LLM from other third-party LLMs in terms of accuracy and performance?
Our proprietary LLM is specifically tuned for the types of analytics we focus on—like text-to-SQL and text-to-visualization. We initially tried traditional third-party models, but found they didn’t meet the high accuracy requirements of our users, who are often making critical decisions. So, we fine-tuned a range of open-source models and tested them against industry benchmarks.
Our LLM is much more accurate for the advanced SQL concepts our users need, particularly in geospatial and temporal data. Additionally, because it runs on our GPU infrastructure, it’s also more secure.
In addition to the built-in model capabilities, we also provide a full interactive user interface for administrators and users to add domain or business-relevant metadata. For example, if the base model doesn’t perform as expected, you can import or tweak column-level metadata, or add guidance information and immediately get feedback.
How does HEAVY.AI envision the role of geospatial and temporal data analytics in shaping the future of various industries?
We believe geospatial and temporal data analytics are going to be critical for the future of many industries. What we’re really focused on is helping our customers make better decisions, faster. Whether you’re in telecom, utilities, government, or another sector, having the ability to analyze and visualize data in real time can be a game-changer.
Our mission is to make this kind of powerful analytics accessible to everyone, not just the big players with massive resources. We want to ensure that our customers can take advantage of the data they have, to stay ahead and solve problems as they arise. As data continues to grow and become more complex, we see our role as making sure our tools evolve right alongside it, so our customers are always prepared for what’s next.
Thank you for the great interview. Readers who wish to learn more should visit HEAVY.AI.