Innovating AI: Artificial Intelligence and the Data Quality Challenge
AFCEA SBC Innovation Roundtable Webinar | April 15, 2025 | 5-6 PM ET
Introduction: Dean Niedosik
Good evening and welcome to the AFCEA Small Business Committee’s Innovation Roundtable. I am your host, Dean Niedosik, Co-Chair of the AFCEA Innovation Roundtable, and I’d like to introduce our Co-Chair and co-host for the event, Alison Gonzalez. We are both happy to be here and most grateful to serve as your hosts for today’s session, titled “Innovating AI: Artificial Intelligence and the Data Quality Challenge.”
This virtual roundtable brings together thought leaders from industry, academia, and government to explore one of the most critical — yet often overlooked — pillars of successful AI and machine learning implementation: data quality.
As AI/ML adoption accelerates across the public and private sectors, the integrity of the data we use becomes not just a technical concern, but a strategic one. Today, we’ll hear from panelists who bring distinct perspectives on how their organizations are:
- Tackling real-world data quality challenges,
- Balancing innovation with risk and governance,
- And using AI/ML not just to transform processes — but to do so responsibly and efficiently.
Whether your focus is internal operations or client delivery, our goal is to unpack the benefits, risks, and practical considerations when implementing AI/ML in environments where the quality of your data can determine the outcome of your mission.
Tonight, we’re honored to be joined by:
- Shivaji Sengupta, Founder & CEO of NXTKey Corporation,
- Dr. Biplab Pal, Adjunct Research Professor at the University of Maryland Baltimore County (UMBC),
- Wole Moses, Chief AI Officer for the Federal Civilian Sector at Microsoft.
Our discussion tonight will be expertly moderated by Mr. Moses, who has deep experience in this field, so I welcome him to provide additional insight to the conversation from both a moderator’s and a panelist’s perspective.
Panel Questions and Answers:
Opening & Context
How do you define “data quality” in the context of AI/ML development or delivery?
Data quality means accurate, complete, consistent, and timely data that aligns with the AI/ML model’s requirements. For small businesses, it’s about having reliable data—free from errors, duplicates, or gaps—that supports trustworthy predictions or decisions.
What are the biggest misconceptions small businesses have about their own data readiness for AI?
- They assume existing data is “good enough” without validation.
- They overestimate the value of sheer data volume and underestimate the need for clean, structured data.
- They believe AI can magically fix poor data without preprocessing.
What kinds of early warning signs do you watch for that indicate a project might be derailed by poor data quality?
- Inconsistent data formats or missing values during initial audits.
- Lack of clear data ownership or documentation.
- Unrealistic client expectations about model performance with unverified data.
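The first two warning signs above can be checked programmatically during an initial audit. The sketch below is a minimal, illustrative pass in pandas; the column names and sample records are hypothetical, not from any client dataset.

```python
import pandas as pd

# Hypothetical sample containing the kinds of problems an initial audit surfaces
df = pd.DataFrame({
    "date": ["2025-01-05", "01/06/2025", None, "2025-01-08"],
    "amount": [100.0, None, 250.0, 250.0],
})

# Missing values per column
missing = df.isna().sum()

# Rows whose date fails to parse under the single expected format
parsed = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
inconsistent = df["date"].notna() & parsed.isna()

print(missing.to_dict())        # nulls per column
print(int(inconsistent.sum()))  # rows with an unexpected date format
```

Even a short script like this turns vague concerns into concrete counts that can anchor the client conversation.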
Tooling, Automation & Innovation
What role does automation or AI itself play in improving data quality in your organization?
Automation streamlines data cleaning—removing duplicates, standardizing formats, and flagging outliers. AI helps by identifying patterns in errors or predicting missing values, reducing manual effort and boosting efficiency.
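As a minimal sketch of what such an automated cleaning pass might look like in pandas: the vendor/cost columns, the sample values, and the IQR-based outlier rule are illustrative assumptions, not any organization's actual pipeline.

```python
import pandas as pd

# Hypothetical records with inconsistent formatting and duplicate entries
df = pd.DataFrame({
    "vendor": [" acme ", "Acme", "Beta", "Beta", "Gamma", "Delta"],
    "cost": [100.0, 100.0, 110.0, 110.0, 95.0, 5000.0],
})

# Standardize formats: trim whitespace and normalize case
df["vendor"] = df["vendor"].str.strip().str.title()

# Remove exact duplicates created by inconsistent entry
df = df.drop_duplicates()

# Flag outliers with a simple interquartile-range rule (1.5x IQR fences)
q1, q3 = df["cost"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["cost"] < q1 - 1.5 * iqr) | (df["cost"] > q3 + 1.5 * iqr)
```

Flagged rows would then be routed to a person rather than silently dropped, which keeps the automation assistive instead of destructive.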
What low-code or no-code solutions, if any, have you leveraged to address data quality challenges?
- Tools like Airtable for data organization.
- Zapier to automate data workflows.
- Microsoft Power Apps for quick data validation dashboards.
- Lumenn.ai for Business Intelligence.
Are there specific open-source or commercial tools that have proven particularly valuable for smaller organizations?
- Open-source: OpenRefine for data cleaning, Pandas for analysis.
- Commercial: Talend for ETL processes, Google Data Studio for visualization, Lumenn.ai for Business Intelligence.
- These are affordable and scale well for small teams.
AI Readiness for Client Delivery
When supporting a government customer, how do you address data quality concerns without overstepping your role or criticizing internal systems?
Focus on collaboration—suggest data quality improvements as opportunities to enhance AI outcomes. Frame concerns as neutral observations, using metrics like error rates or gaps, and propose actionable steps without blaming systems.
Have you ever had to “scope in” a data remediation phase before AI work could begin? How did you approach that conversation with the client?
Yes. I presented it as a foundational step, using a simple analogy: “Building AI is like constructing a house—you need a solid foundation.” I shared a timeline showing how remediation ensures faster, better results, gaining buy-in.
What are some tactful ways to educate clients who may not realize the limitations or risks of using poor quality data for AI initiatives?
- Use relatable examples, like how bad ingredients ruin a recipe.
- Share case studies showing ROI from clean data.
- Offer workshops to demonstrate data’s impact on AI performance.
How do you build credibility as a small business when advising clients on their data ecosystems?
- Showcase past successes with clear metrics (e.g., “Improved accuracy by 30%”).
- Stay transparent about processes and limitations.
- Earn certifications like ISO 27001 or partner with trusted platforms.
Managing Risk & Building Trust
How do you mitigate the risk of introducing AI that may rely on flawed or incomplete data, especially when credibility is essential for a small contractor?
- Conduct rigorous data audits before model training.
- Use ensemble methods to cross-check outputs.
- Set clear performance benchmarks and validate results with clients.
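The ensemble cross-check above can be as simple as a majority vote that also flags disagreement between models. The sketch below is an assumed, minimal implementation for illustration, not a description of any panelist's system.

```python
from collections import Counter


def ensemble_vote(predictions: list[str]) -> tuple[str, bool]:
    """Majority vote across model outputs; the flag reports unanimity."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    unanimous = votes == len(predictions)
    return label, unanimous


# Disagreement among models is itself a signal worth surfacing to a reviewer
label, unanimous = ensemble_vote(["low risk", "low risk", "high risk"])
print(label, unanimous)  # low risk False
```

When the models are not unanimous, the case can be escalated for human review, which protects a small contractor's credibility far better than a single silently wrong answer.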
What role does transparency play in building trust with clients when AI/ML recommendations are involved?
Transparency builds confidence by explaining how data drives recommendations, admitting uncertainties, and sharing validation steps. Regular updates and clear visualizations help clients feel involved and informed.
What steps do you take to reduce the risk of AI hallucinations caused by poor data quality or gaps in training data?
- Preprocess data to fill gaps or remove noise.
- Use diverse datasets and cross-validation to ensure robustness.
- Implement guardrails like confidence thresholds to flag uncertain outputs.
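A confidence-threshold guardrail like the last item can be sketched in a few lines. The threshold value, function name, and review label below are illustrative assumptions; in practice the threshold would be tuned per application.

```python
# Assumed threshold; tune per application and validate against labeled data
CONFIDENCE_THRESHOLD = 0.8


def guarded_answer(prediction: str, confidence: float) -> str:
    """Return the model's prediction only when confidence clears the bar;
    otherwise route the case to human review instead of guessing."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "FLAGGED_FOR_REVIEW"
    return prediction


print(guarded_answer("approve invoice", 0.95))  # approve invoice
print(guarded_answer("approve invoice", 0.42))  # FLAGGED_FOR_REVIEW
```

The point is that an uncertain output never reaches the client unreviewed, which directly limits the damage a hallucination can do.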
Forward Look
What do you see as the next major shift in the evolution of AI over the next 3–5 years?
AI will move toward more autonomous, context-aware systems, with greater emphasis on ethical frameworks and real-time adaptability, driven by advances in multimodal models and edge computing.
What role will public-private partnerships and academic collaboration play in shaping the future of trustworthy, data-driven AI?
They’ll accelerate innovation by sharing resources, standardizing data ethics, and creating open datasets. Collaborations will bridge gaps between theory and practical deployment, ensuring AI is reliable and inclusive.
In your opinion, what will separate the leaders from the followers in the race to scalable, impactful AI?
Leaders will prioritize:
- Ethical AI with transparent governance.
- Scalable data pipelines with built-in quality checks.
- User-centric design for broader adoption.
If you could invest in one area of AI or data infrastructure right now to prepare for the future, what would it be and why?
Data governance tools—because high-quality, secure, and compliant data is the backbone of trustworthy AI. Investing here ensures scalability and reduces risks as regulations tighten.
What do you hope we’re no longer talking about when it comes to AI and data quality five years from now?
Basic data cleaning issues like duplicates or missing values. I hope automated, standardized pipelines make these non-issues, letting us focus on advanced challenges like bias mitigation and real-time data integrity.