Curated by THEOUTPOST
On Thu, 1 Aug, 12:01 AM UTC
3 Sources
[1]
Data management overhaul driven by open formats and AI - SiliconANGLE
The future of data platforms: Databricks' Ali Ghodsi highlights open data formats and AI as future of data management
Data management is undergoing a major overhaul as companies face cost pressures and the need for consolidation. The shift toward platformization is simplifying data operations, cutting expenses and boosting efficiency. This change is driven by the adoption of open data formats, artificial intelligence integration and enhanced governance solutions. As businesses focus on more sustainable and flexible data strategies, the future of data platforms promises greater innovation and operational effectiveness. "[One hundred percent] of the data in Databricks is in open-source formats. And 92% is in Delta Lake," said Ali Ghodsi (pictured), co-founder and chief executive officer of Databricks Inc. "I think this is just going to accelerate. I don't see any way for it to go backwards." Ghodsi spoke with theCUBE Research's John Furrier and Dave Vellante at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media's livestreaming studio. They discussed how companies are overhauling data management strategies by consolidating tools, adopting open data formats, integrating AI and governance solutions, and focusing on sustainable, flexible data solutions to reduce costs and improve efficiency. Open formats and interoperability between different data engines are crucial for future-proofing data platforms, according to Ghodsi. The move toward open-data formats, such as Delta Lake and Apache Iceberg, ensures that companies are not locked into proprietary systems, providing the freedom to switch between tools as needed. "I think what has happened is that CIOs and IT departments, but also business leaders in line of business, they just feel like there's cost pressures. So, they have to lower their cost basis," he said. "That's like trend number one that's happening everywhere."
One of the key developments at Databricks is the Unity Catalog, a governance solution that unifies data, AI models, dashboards and notebooks. This comprehensive approach to governance addresses not only security and access control but also cost management, sharing and collaboration. The integration of AI into data governance is particularly noteworthy, as it ensures that AI models are developed and deployed with strong oversight and accountability. "Unity Catalog is a catalog for all of those data assets, not just for the tables. That's what makes it unique in the market," Ghodsi said. "We've seen further momentum now that people are adopting it because it's open source. They can deploy it on-premise. They can deploy it in other environments. That's what I would say is the second big important decision and factor that I see with enterprises. I mentioned AI, that's the other big one." Looking ahead, Ghodsi expressed optimism about the future of open source in the data platform industry. Databricks' recent acquisition of Tabular, a company specializing in open-source data formats, underscores its commitment to open standards and interoperability. Databricks is working on Delta UniForm, which aims to unify Delta Lake and Apache Iceberg, providing seamless compatibility between the two formats, Ghodsi revealed. "[Customers are] super excited to see interoperability between these formats," he said. "They don't want to see incompatibility between Delta Lake and Iceberg. The fact we're unifying them in project UniForm, and now you can get both of those and you get full compatibility of your data, they loved it. I was a little bit surprised actually that there was so uniformly positive from the customer base and how positive it was." AI and gen AI also hold a strategic importance for Databricks, according to Ghodsi. 
The company's focus on data intelligence -- enabling organizations to extract actionable insights from proprietary data -- positions it at the forefront of the AI revolution. The value of AI lies in its ability to provide accurate and secure answers based on a company's unique data sets, he added. "When gen AI happened, I would say two years ago, there's huge demand for, 'I want all my data, unstructured data, I care a lot about my AI models, I care a lot about governance of AI, I'm super worried about privacy, security of my AI and my data.'" Ghodsi said. "AI has now been added to this equation, and Unity Catalog is a catalog for all of those data assets, not just for the tables, and that's what makes it unique in the market." Stay tuned for the complete video interview, part of SiliconANGLE's and theCUBE Research's coverage of the Supercloud 7: Get Ready for the Next Data Platform event.
[2]
Open data formats and AI integration amid a data evolution - SiliconANGLE
Microsoft advances data management with open formats and AI integration
Five years ago, if one were to talk about open data formats or governance, they might end up putting others to sleep. But today, it's become the most important conversation going. It's clear that data has evolved. That evolution poses certain advantages for customers, according to Dipti Borkar (pictured), vice president and general manager at Microsoft Corp. "These data formats and table formats, on top of the file formats, essentially give our customers a choice," Borkar said. "It's opened up, which means that they can have computes that they can choose on top as well. Multiple different computes can run on these formats. That's the beauty of it. That's a great value to customers, which means they can do more with their data." Borkar spoke with theCUBE Research's John Furrier and Sanjeev Mohan at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media's livestreaming studio. They discussed the importance of open data formats and the evolving role of data management in the cloud. Microsoft has made the decision to move from its closed-source format to pure open formats with Microsoft Fabric in particular. That was a pretty dramatic change, according to Borkar. "[We moved] all our engines to reengineer these computes, to now read these native formats," she said. "We support Delta Lake, and Iceberg is landing very soon. The reason that these are important, again, customers get a choice." Companies could run a variety of engines on top and interoperate between platforms. That includes running AI with Databricks or Snowflake, according to Borkar. "You can interoperate. We have a layer with OneLake that supports these open formats, which allow customers to interoperate, so that you're not locked in, you can do more with your data. You don't have to move it around," she said.
"You can actually leave it in place, reduce your cost and get value." There are three main open data formats -- Delta, Iceberg and Apache Hudi. All three have their own specific way of writing data, and all were built for different use cases, according to Mohan. "Hudi was built for streaming ingest, and Iceberg ... does not support streaming ingest. So when you write in a particular table format, that becomes your primary format," Mohan said. "The compatibility is only at read-only level." That's because it's not possible for one to write some piece of data into Delta and then instruct it to make copies into other formats, according to Mohan. That's because the latency would be too high. "The fine print ... is so important," he said. "Anytime anyone says this is open source, this is compatible, you really have to take it to the next level of detail to understand what is open-source, what is compatible." Today, Microsoft is seeing a combination of structured, semi-structured and unstructured data going into the lake, according to Borkar. The structured data is essentially open table formats. "Typically, you would build semantic models on top. For example, with Power BI you have a semantic model, and our Copilot then operates on that semantic model and is available for natural language questions," Borkar said. "Just using that approach, you can essentially use English to come up with a dashboard, right? Instantaneously." For semi-structured and unstructured data, that's where models directly operating on top of data comes in, according to Borkar. For Microsoft, that includes Azure AI Search. "[That provides] both the vector indexing capabilities directly on this data, but also keyword-based indexing. So, it's actually a combination, which is very powerful, because in some cases you might need one," Borkar said. "In some cases, vector indexing is more powerful, and it applies an internal ranking and gives the best results back out. 
So, AI Search, on top of OneLake, for example, is one of the patterns that we are also starting to see." This is done, essentially, using the ChatGPT versions of Copilot, according to Borkar. All told, it's a development that has evolved very quickly. "Now you have a stream of structured data, you've thrown in your semi-structured and unstructured data," Borkar said. "Your vector index is on top of that, and now you're building generative AI applications." Stay tuned for the complete video interview, part of SiliconANGLE's and theCUBE Research's coverage of the Supercloud 7: Get Ready for the Next Data Platform event.
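The hybrid retrieval pattern Borkar describes, a keyword index and a vector index whose results are combined by a ranking step, can be illustrated with a small sketch. Reciprocal rank fusion is one widely used way to merge ranked lists; treating it as the "internal ranking" mentioned in the quote is an assumption, and the function name, the constant `k=60` and the document ids below are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to the id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a combined index can outperform either keyword or vector search alone.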
[3]
Open table formats: Snowflake, Databricks and data - SiliconANGLE
Snowflake vs. Databricks: Analysts dissect survey results showing joint customer interest in open table formats
Survey results recently published by SiliconANGLE showed that enterprise customers remained conflicted on how to rationalize a need to balance data trust with a strong motivation to move fast and innovate. As key data management firms, such as Databricks Inc. and Snowflake Inc., compete for customers in this area, the end user could end up reaping the benefits through open table formats. In the modern data landscape, companies have shown an increased interest in adopting open table formats to allow any compute to operate on any data. This is forcing organizations to confront new issues around governance. "When you talk about open table formats, then you are essentially having to start abstracting governance from the purview of any one vendor," said George Gilbert (pictured, left), analyst for theCUBE Research. "You have to start thinking about governing all the data in your data and application estate." Gilbert spoke during an analyst panel segment as part of the Supercloud 7: Get Ready for the Next Data Platform event. He was joined on the panel by Sanjeev Mohan (second from right), principal at SanjMo and analyst for theCUBE Collective, along with theCUBE Research's Dave Vellante (right) and Rob Strechay (second from left), and they discussed the Enterprise Technology Research survey data and key factors influencing enterprise interest in open table formats. The survey data from ETR showed results from 105 joint Databricks and Snowflake customers who were asked about preferences for open table formats. While a majority ranked security and governance and lock-in avoidance first and second, ETR uncovered a percentage of customers that could provide a "swing vote" in vendor selection, according to Vellante. "There's 14% that said data security is less important than creating a stack that allows for rapid innovation," Vellante noted.
"Why is that the swing vote? It's because ETR correlated that and found it. Those were the ones that were most likely to [leave] Snowflake and move to Databricks as an example, because they're saying 'damn the torpedoes' in governance." Another key element in the open table debate involves multiple formats that have emerged over the years. While there are many choices, Mohan points out that compatibility is a growing issue. "Right now, if you look at which table format has the most momentum, it is Iceberg," Mohan said. "But is it the right thing? I think that is a question which people are not asking. Maybe it should be Hudi or Delta. The problem is that these formats are not all compatible to each other, so that is why you have 'read only' views. If you write in Delta, you are not going to write in Iceberg -- that will kill your entire analytics." Another metric generated from ETR's survey data provided insight into the attractiveness of the Databricks and Snowflake platforms for building generative artificial intelligence applications. While 64% leaned toward Databricks, and 49% leaned toward Snowflake, a healthy 34% expressed an affinity for hyperscaler offerings in this area. "The data platform vendors think they're adding new workloads and expanding their total addressable market and reaching new personas, but what's actually happening is they're bumping into new competitors," Gilbert explained. "That number of 34% for the hyperscalers is significant. They are going after that open data for building applications and the data platform vendors aren't always the best equipped." However, Strechay noted that dependence of the Databricks and Snowflake platforms on the cloud could make this number less significant than it might appear. Customers will still be heavily focused on whichever platform allows them to bring data together and achieve meaningful business results. 
"I don't know that it's one or the other because of the fact that you have Snowflake and Databricks," Strechay said. "Where do they live? They're in these hyperscale clouds, they're not on-prem. It's going to be a mix, and I think this is why people are looking at it and going, 'OK, how do I bring this data all together?'" ETR asked enterprise customers whether they were using or intending to use any open data formats such as Hudi, Iceberg or Delta. The results showed that 70% were evaluating formats, while only 30% had no interest. Results such as these highlight the competitive opportunity facing Databricks and Snowflake as they battle for market share. "When [Databricks CEO] Ali Ghodsi says, 'Don't give your data to any vendor, including Databricks,' his answer to that is ... use open table formats, may the best engine win," Vellante said. Stay tuned for the complete video discussion, part of SiliconANGLE's and theCUBE Research's coverage of the Supercloud 7: Get Ready for the Next Data Platform event.
Open data formats are gaining traction in the tech industry, with major players like Microsoft embracing the trend. This shift is reshaping data management practices and paving the way for more efficient AI implementations.
In a significant shift within the tech industry, open data formats are rapidly gaining prominence, reshaping the landscape of data management and artificial intelligence (AI). The trend is driven by cost pressure, tool consolidation and the need for data that multiple engines can read without vendor lock-in [1].
Tech giant Microsoft has moved from its closed-source format to open formats with Microsoft Fabric, reengineering its compute engines to read those formats natively. Fabric supports Delta Lake, and Apache Iceberg support is described as "landing very soon." The move is a major endorsement of the approach and could accelerate wider adoption of open data standards [2].
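Open data formats offer several advantages over proprietary systems: customers avoid lock-in, can choose among multiple compute engines that run on the same data, and can leave data in place rather than moving it, which reduces cost [1][2].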
These benefits are particularly crucial in the era of big data and AI, where the ability to seamlessly integrate and analyze diverse data sources is paramount [1].
An Enterprise Technology Research (ETR) survey of 105 joint Databricks and Snowflake customers has shed light on the growing importance of open table formats in data management. Analysts dissecting the results noted strong customer interest in open table formats [3].
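Key findings from the survey include: a majority ranked security and governance first and lock-in avoidance second; 14% said data security is less important than a stack that allows rapid innovation; 64% leaned toward Databricks, 49% toward Snowflake and 34% toward hyperscaler offerings for building generative AI applications; and 70% were evaluating open formats such as Hudi, Iceberg or Delta, while 30% had no interest [3].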
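The adoption of open data formats is expected to have a profound impact on AI and machine learning initiatives. By providing a more standardized and accessible data foundation, these formats can let AI services, such as semantic models, vector indexes and keyword search, operate directly on data in the lake without moving it [2].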
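Despite the numerous benefits, the transition to open data formats is not without challenges. Organizations must consider that the table formats are not fully compatible with each other, so cross-format access is typically read-only, and that governance has to be abstracted from any single vendor to cover the whole data estate [2][3].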
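One compatibility challenge raised in the source coverage is that the format a table is written in becomes its primary format, while other formats are compatible only at a read-only level. The toy model below is a minimal sketch of that constraint. The `OpenTable` class and its methods are hypothetical and do not correspond to the API of Delta Lake, Iceberg, Hudi or any other real library.

```python
from dataclasses import dataclass, field


@dataclass
class OpenTable:
    """Toy model: one primary write format, other formats exposed read-only."""

    primary_format: str
    read_only_views: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def write(self, fmt, row):
        # Only the primary format accepts writes in this model.
        if fmt != self.primary_format:
            raise PermissionError(
                f"{fmt} is read-only here; write via {self.primary_format}"
            )
        self.rows.append(row)

    def read(self, fmt):
        # Readers of any exposed format see the same underlying rows.
        if fmt != self.primary_format and fmt not in self.read_only_views:
            raise LookupError(f"no {fmt} view of this table")
        return list(self.rows)


# Hypothetical table written in Delta and exposed read-only as Iceberg.
table = OpenTable(primary_format="delta", read_only_views={"iceberg"})
table.write("delta", {"id": 1})
iceberg_rows = table.read("iceberg")

try:
    table.write("iceberg", {"id": 2})
    write_rejected = False
except PermissionError:
    write_rejected = True
```

In this sketch a reader using the secondary format sees every committed row, but any attempt to write through it is refused, which mirrors the "read only" views the analysts describe.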
As open data formats continue to gain traction, they are poised to play a central role in shaping the future of data management and AI. This shift towards openness and interoperability is likely to foster innovation, improve efficiency, and create new opportunities for businesses and researchers alike [1].