Carolyn Harvey has extensive experience leading and growing global operations in the field of search relevance ranking and annotation for ML data. Carolyn is currently Chief Operations Officer (COO) of LXT where she leads the company’s global operations division, ensuring consistent delivery of all AI data programs and projects. She focuses on high-quality data at scale, building efficiencies in long-term programs and scaling across large numbers of global locales.
As COO of LXT, Carolyn lends her wealth of experience to develop a best-in-class organization.
Can you briefly describe what LXT does and your role as the COO?
Artificial intelligence relies on data to exist, and LXT is an emerging leader in delivering accurate, ethically sourced data that powers AI innovations. As Chief Operations Officer, my role is to oversee, lead and expand our global operations through strategies, structure, and processes that allow us to deliver the highest quality AI data to our customers. I ensure we deliver on time across a wide range of use cases, from generative AI to search relevance and self-driving cars, among many others.
How has LXT’s mission evolved since its inception in 2010?
Our mission is to power the technologies of the future through data generation and enhancement across every language, culture, and modality. Our goal is to help companies of all sizes capitalize on the incredible benefits that AI delivers by powering their models with high-quality data. As the company’s mission has evolved, our scope of services has expanded from language transcription and speech collection to include a wide range of solutions, including data collection and annotation for text, image and video, generative AI services, and more. We’ve also expanded our global footprint of ISO 27001-certified facilities to meet our customers’ growing needs for secure data services.
What have been the key drivers of its growth in the AI training data sector?
Continued investment in AI from organizations of all sizes has fueled our growth. Companies now know that AI is table stakes for them to remain competitive, and data powers AI. But not all data is equal, and companies that are succeeding in AI know that high-quality data is critical to creating more accurate AI.
With generative AI now on everyone’s mind, even more growth opportunities have opened for LXT. Humans are critical to ensuring that these solutions are accurate, ethical, and responsible. We provide a range of generative AI services in areas such as fine-tuning large language models, prompt creation, and more. Our customers know that to build trust with end users, the output of their generative AI products needs to be factual, represent a diverse audience, and be free of toxic language. We can help them achieve these goals with our human-in-the-loop services.
How has the explosion of generative AI impacted LXT and its customers?
LXT has seen increasing demand for its AI training data due to generative AI, both for core language-oriented data and for newer aspects related to analysis, creativity, and critical thinking. We are also seeing an increase in demand for domain knowledge and specialized profiles for project workers.
Customer requests are increasingly going beyond the microtasking machine learning inputs of the past toward LLMs and the more complex data sets required by apps like ChatGPT, Gemini, and their many offshoots. We are currently involved in several innovative projects where we write prompts aimed at confusing the generative AI to see how it responds, and then create the correct answer.
In the future, this may evolve further into artificial general intelligence (AGI) where the data sets will map to even more complicated and sophisticated actions.
You have years of experience working in search and personalization to help improve these algorithms. What are some of the ways that leading companies are improving their search relevance to provide a better user experience?
In a world where time is precious and information is everywhere, improving search relevance can bolster loyalty, increase conversion rates, and make users more productive.
Search relevance begins with cleaning and organizing our customers’ data, rooting out anything that might generate false positives, and creating additional data fields through which search and recommendation engines can scour to generate more precise results. With the help of machine learning and natural language processing, customers can empower their search engine to more intuitively ascertain user intent and learn about their preferences over time. The result is a faster search experience that leads to more personalized results.
Reaching this goal requires large volumes of training data, with a particular focus on teaching algorithms how to recognize, rank, and return relevant entities, and how to handle typos, grammatical errors, and other data anomalies. We also recommend a human-in-the-loop (HITL) reinforcement approach to ensure accurate data, reduce bias, and provide a better search experience for the end user. With ML advancements over the past 10 years, HITL has an intensified focus on quality review processes, which drives a need for deeper experience from data providers.
Can you elaborate on LXT’s approach to data annotation and how it ensures the quality and accuracy of AI training data?
As an operations team, we must first understand how customers use the data we provide in the development of their products and services to ensure that it will fit their needs. To make this happen, we need to find experts in both project management and annotation who have experience with the type of data required.
From there, it is largely about preparation and finding the right resources at the start of each project. This includes aligning with customers on success factors during the scoping phase as well as deep qualification and vetting processes for project annotators that consider important details such as educational background, special interests, demographics, and experience. We also develop detailed learning and reference materials as a guide, customized for each project. We apply mature quality and process management oversight throughout all project lifecycles. The approach we use aligns with and informs industry best practices, ensuring results are meeting customer expectations.
And all these methodologies are in service of our guaranteed data quality promise.
How does LXT handle the challenge of annotating unstructured data, which comprises over 80% of all data?
LXT has built an internal annotation platform that automates many parts of the annotation process and provides structure and a consistent user interface for workers. In the pre-processing stage, we focus on preparing the data, formatting the input files, and removing duplicates, among other things. In post-processing, we package, collate, and format the data for delivery to the client.
Before the project kicks off, we create guidelines that are reviewed with the customer and iterated on throughout the project lifecycle as things change. We can break down the data labeling process into multiple tasks to focus on each element of the project properly. In addition, quality control methodologies are implemented to drive elimination of errors at scale.
Finally, our Operational Excellence Team is responsible for advanced process management to ensure high efficiency and scalability for our projects worldwide.
What are some of the biggest challenges LXT faces in collecting data at scale globally, and how do you overcome them?
Diversity and bias in participants and in the resulting data collections are often some of the biggest challenges that LXT, and any AI training data provider, will face. Other challenges include a recent demand for domain expertise and a rapidly changing landscape with the shift to LLMs and generative AI data.
We overcome these challenges through a highly proactive approach to sourcing our candidate pool. We review expertise, experience, previous roles, interests, and demographics to build the right diversity within teams, whether by gender or by other aspects such as analytical thinking, creative writing, and educational background.
Once we have sourced the right candidates, we take great care to engage workers on a regular basis to build a more experienced, loyal, and satisfied workforce over the long term.
In terms of AI evaluation, how does LXT work to mitigate bias and ensure ethical outputs in the AI systems it helps train?
As mentioned earlier, ensuring diversity is a challenge that many AI training data providers must solve, and that will go a long way toward mitigating bias and ensuring ethical outputs.
I’ll refer again to our engagement best practices, which include finding diverse and representative annotators and being thorough with guidelines and quality control measures. We have an impact sourcing strategy that allows us to bring work to diverse and new groups of annotators, such as in long-tail language regions.
We target ethical outputs through our use of industry best practices, aligning on expectations with our customers and driving higher standards for our project managers and annotators. Communication is essential, as are compliance audits, bias analysis, and a commitment to data regulation and privacy requirements.
What is the long-term vision for LXT and how do you see the company evolving in the next five years?
Our vision is to provide accurate, ethically sourced data to help drive the rollout of AI and the technologies of the future that will enhance and improve the experience of people around the world.
While automation and technology are important in AI, there is also an important human component that complements the technology. As we move from simple automated tasks to large language models (LLMs), and from generative AI to artificial general intelligence (AGI), it will be critical that AI products faithfully represent people, both those who generate the data and our global communities at large.
At LXT, we strive to ensure that AI is used in a positive and transformative way that reflects these values.
Thank you for the great interview. Readers who wish to learn more should visit LXT.