Data is at the heart of nearly every federal agency initiative.
Without high quality data, the 1,700-plus agency artificial intelligence use cases will struggle to find success.
Without analysis of structured and unstructured data, agency cybersecurity efforts, especially around zero trust, will fall short of preventing and mitigating the ever-growing number of cyber threats.
For years, the common refrain across the government has been “data is the new oil.” That analogy rings more true every day.
Agencies still face a host of challenges around managing data and applying technology to get the best analysis out of it.
And as agencies start to do more than test out AI, including generative AI, managing their data and making it more valuable becomes imperative.
Chris Townsend, the vice president of public sector at Elastic, said the discussion around data has moved from just figuring out how best to collect and manage it to operationalizing it at scale to solve mission problems.
“Our public sector customers operate in a very highly siloed environment, and I think one of the primary challenges is, how do you access all that data reliably and consistently at scale and do it in a cost-efficient manner?” Townsend said during the Innovation in Government discussion, sponsored by Carahsoft. “It’s just not realistic to think that you can take all of that data in that siloed environment, whether it’s multicloud or on premises, which you can’t really move out of the on-premises environment, and co-locate it so easily. You have to be able to access the data where it lives. That is one of the big advantages that Elastic brings in doing that at a low cost.”
Data standards increase sharing
While breaking down those siloed environments remains a big challenge for many agencies, accessing the data in those systems shouldn’t be.
Townsend said there are ways for agencies to ensure they aren’t incurring the costs of data ingress and egress as well as moving that data around.
One example of where this is already happening is in the Cybersecurity and Infrastructure Security Agency’s continuous diagnostics and mitigation (CDM) program.
Elastic provides the governmentwide and agency-specific dashboards under CDM.
“If you look at what CDM does, they have small Elastic clusters embedded in each one of the 100 agencies that they monitor security for, and they index the data locally,” Townsend said. “We use what we call our data mesh architecture. We index the data where the data lives and allow you to search it centrally, so that you don’t have to pay the ingress and egress fees. You don’t have to move the data around. You don’t have to replicate the data.”
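The “index locally, search centrally” pattern Townsend describes maps to Elasticsearch’s cross-cluster search, where a central cluster queries indices that remain on remote clusters. The sketch below, in Python with the official elasticsearch client, assumes hypothetical cluster aliases, index names and field names; it illustrates the mechanics, not CISA’s actual CDM deployment.

```python
# Minimal sketch of cross-cluster search: the central cluster queries indices
# that stay on the remote agency clusters, so only the request and the matching
# results move over the network. Cluster aliases ("agency_a", "agency_b"), the
# index pattern and the field names are illustrative assumptions.
from elasticsearch import Elasticsearch

# Placeholder endpoint and credentials for the central dashboard cluster.
es = Elasticsearch("https://central-dashboard.example.gov:9200", api_key="<api-key>")

resp = es.search(
    index="agency_a:cdm-endpoints-*,agency_b:cdm-endpoints-*",
    query={"term": {"vulnerability.severity": "critical"}},
    size=100,
)

for hit in resp["hits"]["hits"]:
    print(hit["_index"], hit["_source"].get("host", {}).get("name"))
```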
One key factor in the CDM program is that CISA has created a data standard for the cybersecurity dashboards.
Townsend said having a common data standard, a shared schema for that data, is what makes sharing between environments cost effective, scalable and possible to operationalize.
“I think it’s just a different way to think about it architecturally. If you hear some of the senior Defense Department leaders, they talk a lot about a data mesh architecture. That’s what they’re talking about,” he said. “As people are finding, it’s very expensive to bring data into one location, and it’s just not practical at the scope and scale our public sector entities operate at.”
Reducing data storage costs
Another benefit of using a data mesh architecture is that it reduces the cost of storing data.
Townsend said this approach is especially helpful for unstructured data.
“We’ve come out with this new capability that allows you to compress that data, and we’re seeing compression rates of up to 50%, based on a few studies we’ve done on log data, which requires much less storage. But again, because we pre-index the data, we can still search it and it’s still usable,” he said. “There’s a lot of innovation now happening around reducing the storage and infrastructure costs associated with long-term storage of data, especially in government.”
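For readers who want to see what dialing up compression on log indices can look like in practice, the snippet below enables Elasticsearch’s best_compression codec when creating an index. This is a generic illustration of trading some CPU for less storage; it is not necessarily the specific new capability, or the source of the 50% figure, that Townsend cites.

```python
# Generic illustration: create a log index with Elasticsearch's higher-compression
# codec ("best_compression" uses DEFLATE instead of the default LZ4), which shrinks
# storage at the cost of some CPU at index time. Endpoint and index name are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

es.indices.create(
    index="audit-logs-000001",
    settings={
        "index": {
            "codec": "best_compression",
            "number_of_shards": 1,
            "number_of_replicas": 1,
        }
    },
)
```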
Townsend added this is especially important given the requirements by the Office of Management and Budget for cyber event logging.
The ability to store data at lower costs and access it when needed becomes even more important as agencies increase their use of cybersecurity capabilities that use artificial intelligence and large language models (LLMs).
“If you want to stand up a large language model for whatever reason, whether you want to use it for threat hunting and cybersecurity, or you want to use it to improve customer experience and execute mission needs within the government, those LLMs need to be able to access data that’s secure, relevant and accurate, and to do it at scale. We’re able to feed that data to the large language model,” he said. “It’s all about data collection and analysis, and applying advanced machine learning and LLMs to a lot of that data to be able to inspect it.

“We’ve started to build a lot of cool automation into our cybersecurity platform to enable the security operations center (SOC) operators. We came out with this really cool capability where, essentially, you take all of your playbooks, if you will, your MITRE ATT&CK framework, any of your internal playbooks that you run in terms of cyber threats, and you build those into an LLM and then run that LLM against your security alerts. Your operators can run it every morning, and obviously, when you take a look at your security alerts, you’re trying to correlate which of these are noise and which of these are correlated events that are focused on breaking into your organization. Well, you can apply an LLM where you build all this knowledge into the LLM, and it helps prioritize and identify those using AI. We’re seeing a lot of cool advancements with AI and cyber.”
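As a rough illustration of the triage pattern Townsend outlines, the sketch below feeds playbook guidance and a batch of alerts to a general-purpose LLM and asks it to separate noise from correlated intrusion activity. It is a vendor-agnostic sketch assuming an OpenAI-compatible API, placeholder file paths and a placeholder model name, not Elastic’s actual feature.

```python
# Vendor-agnostic sketch of LLM-assisted alert triage: combine internal playbook
# guidance with the morning's alerts and ask a model to group, de-noise and rank
# them. File paths, model name and alert format are assumptions; this is not
# Elastic's actual capability.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical local exports of playbook guidance and overnight alerts.
playbooks = open("playbooks/internal_threat_playbooks.md").read()
alerts = json.load(open("alerts/overnight_alerts.json"))

prompt = (
    "Using the playbook guidance below, group these security alerts, flag which "
    "look like correlated intrusion activity versus noise, and rank them by priority.\n\n"
    f"PLAYBOOKS:\n{playbooks}\n\nALERTS:\n{json.dumps(alerts, indent=2)}"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```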