2 Sources
[1]
Botanical time machines: AI is unlocking a treasure trove of data held in herbarium collections
University of Melbourne provides funding as a founding partner of The Conversation AU.
In 1770, after Captain Cook's Endeavour struck the Great Barrier Reef and was held up for repairs, botanists Joseph Banks and Daniel Solander collected hundreds of plants. One of those pressed plants is among 170,000 specimens in the herbarium at the University of Melbourne. Worldwide, more than 395 million specimens are housed in herbaria. Together they comprise an unparalleled record of Earth's plant and fungal life over time.
We wanted to find a better, faster way to tap into this wealth of information. Our new research describes the development and testing of a new AI-driven tool, Hespi (short for "herbarium specimen sheet pipeline"). It has the potential to revolutionise access to biodiversity data and open up new avenues for research.
The digitisation challenge
To unlock the full potential of herbaria, institutions worldwide are striving to digitise them. This means photographing each specimen at high resolution and converting the information on its label into searchable digital data. Once digitised, specimen records can be made available to the public through online databases such as the University of Melbourne Herbarium Collection Online. They are also fed into large biodiversity portals such as the Australasian Virtual Herbarium, the Atlas of Living Australia, or the Global Biodiversity Information Facility. These platforms make centuries of botanical knowledge accessible to researchers everywhere.
But digitisation is a monumental task. Large herbaria, such as the National Herbarium of New South Wales and the Australian National Herbarium, have used high-capacity conveyor belt systems to rapidly image millions of specimens. Even with this level of automation, digitising the 1.15 million specimens at the National Herbarium of NSW took more than three years. For smaller institutions without industrial-scale setups, the process is far slower.
Staff, volunteers and citizen scientists photograph specimens and painstakingly transcribe their labels by hand. At the current pace, many collections won't be fully digitised for decades. This delay keeps vast amounts of biodiversity data locked away. Researchers in ecology, evolution, climate science and conservation urgently need access to large-scale, accurate biodiversity datasets. A faster approach is essential.
How AI is speeding things up
To address this challenge, we created Hespi, open-source software for automatically extracting information from herbarium specimens. Hespi combines advanced computer vision techniques with AI tools such as object detection, image classification and large language models.
First, it takes an image of the specimen sheet, which comprises the pressed plant and identifying text. Then it recognises and extracts text, using a combination of optical character recognition and handwritten text recognition. Deciphering handwriting is challenging for people and computers alike, so Hespi passes the extracted text through OpenAI's GPT-4o large language model to correct any errors. This substantially improves the results.
So in seconds, Hespi locates the main specimen label on a herbarium sheet and reads the information it contains. This includes taxonomic names, collector details, location, latitude and longitude, and collection dates. It captures the data and converts it into a digital format, ready for use in research. For example, Hespi correctly detected and extracted all relevant components from the herbarium sheet of a large brown algae specimen collected in 1883 at St Kilda.
We tested Hespi on thousands of specimen images from the University of Melbourne Herbarium and other collections worldwide. We created test datasets for different stages in the pipeline and assessed the various components. It achieved a high degree of accuracy, so it has the potential to save a lot of time compared to manual data extraction.
We are developing a graphical user interface for the software so herbarium curators will be able to manually check and correct the results.
Just the beginning
Herbaria already contribute to society in many ways: from species identification and taxonomy to ecological monitoring, conservation, education, and even forensic investigations. By mobilising large volumes of specimen-associated data, AI systems such as Hespi are enabling new and innovative applications at a scale never before possible. AI has been used to automatically extract detailed leaf measurements and other traits from digitised specimens, unlocking centuries of historical collections for rapid research into plant evolution and ecology. And this is just the beginning -- computer vision and AI could soon be applied in many other ways, further accelerating and expanding botanical research in the years ahead.
Beyond herbaria
AI pipelines such as Hespi have the potential to extract text from labels in any museum or archival collection where high-quality digital images exist. Our next step is a collaboration with Museums Victoria to adapt Hespi to create an AI digitisation pipeline suitable for museum collections. The AI pipeline will mobilise biodiversity data for about 12,500 specimens in the museum's globally significant fossil graptolite collection. We are also starting a new project with the Australian Research Data Commons (ARDC) to make the software more flexible. This will allow curators in museums and other institutions to customise Hespi to extract data from all kinds of collections -- not just plant specimens.
Transformational technology
Just as AI is reshaping many aspects of daily life, these technologies can transform access to biodiversity data. Human-AI collaborations could help overcome one of the biggest bottlenecks in collection digitisation -- the slow, manual transcription of label data.
Mobilising the information already locked in herbaria, museums, and archives worldwide is essential to make it available for the cross-disciplinary research needed to understand and address the biodiversity crisis.
We wish to acknowledge our colleagues at the Melbourne Data Analytics Platform, including Karen Thompson and Emily Fitzgerald, who contributed to this research.
[2]
Botanical time machines: AI is unlocking a treasure trove of data held in herbarium collections
This article is republished from The Conversation under a Creative Commons license.
Researchers have developed an AI-driven tool called Hespi that can rapidly extract and digitize information from herbarium specimens, potentially transforming biodiversity research and museum collection management.
Herbaria worldwide house over 395 million plant and fungal specimens, forming an unparalleled record of Earth's biodiversity over time [1]. However, accessing this wealth of information has been a monumental challenge. Institutions have been striving to digitize their collections by photographing specimens and converting label information into searchable digital data. This process has been slow and labor-intensive, with many collections not expected to be fully digitized for decades.
Source: The Conversation
Researchers have developed a new AI-driven tool called Hespi (herbarium specimen sheet pipeline) to address this challenge [1]. Hespi is open-source software that combines advanced computer vision techniques with AI tools such as object detection, image classification, and large language models.
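The AI pipeline processes herbarium specimen images through several steps. It takes an image of the specimen sheet, locates the main specimen label, extracts the text using a combination of optical character recognition and handwritten text recognition, and then passes that text through OpenAI's GPT-4o large language model to correct errors [1].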
Hespi can process a specimen sheet in seconds, converting the information into a digital format ready for research use [2].
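The stages described in the source articles (locate the label, read its text, correct recognition errors, output a digital record) can be pictured as a simple chain of functions. The sketch below is purely illustrative and is not Hespi's code or API: every name in it (LabelRegion, SpecimenRecord, detect_label_fields, recognise_text, correct_text, run_pipeline, to_json) is hypothetical, and each stage is a stub standing in for the real object detection, text recognition and language-model steps.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LabelRegion:
    """Hypothetical stand-in for one detected label field on a sheet image."""
    field_name: str
    raw_text: str  # placeholder for the pixels a real detector would crop


@dataclass
class SpecimenRecord:
    """Hypothetical digital record: field name -> cleaned text."""
    fields: dict = field(default_factory=dict)


def detect_label_fields(sheet):
    """Stage 1 (stub): a real system would run object detection on an image.
    Here the 'sheet' is assumed to be a dict of field name -> raw text."""
    return [LabelRegion(name, text) for name, text in sheet.items()]


def recognise_text(region):
    """Stage 2 (stub): a real system would run OCR / handwriting recognition.
    Here the stored raw text is returned unchanged."""
    return region.raw_text


def correct_text(text):
    """Stage 3 (stub): the articles describe a large language model correcting
    errors at this point. This placeholder only normalises whitespace."""
    return " ".join(text.split())


def run_pipeline(sheet):
    """Chain the three stub stages and collect the results in one record."""
    record = SpecimenRecord()
    for region in detect_label_fields(sheet):
        record.fields[region.field_name] = correct_text(recognise_text(region))
    return record


def to_json(record):
    """Serialise the record so it could be loaded into a database or portal."""
    return json.dumps(record.fields, sort_keys=True)


# Illustrative input only: the locality and year echo the example in source [1].
example_sheet = {"locality": "St   Kilda", "year": " 1883 "}
example_record = run_pipeline(example_sheet)
```

Keeping each stage behind its own function is one way a curator-facing tool could swap in a different detector or text recogniser for a different kind of collection, which is the sort of flexibility the ARDC project is said to be aiming for.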
The researchers tested Hespi on thousands of specimen images from various collections worldwide, achieving a high degree of accuracy. This tool has the potential to save significant time compared to manual data extraction methods. A graphical user interface is being developed to allow herbarium curators to manually check and correct results [1].
The potential applications of Hespi extend beyond herbarium collections. The researchers are collaborating with Museums Victoria to adapt the tool for digitizing the museum's fossil graptolite collection, comprising about 12,500 specimens [2].
Furthermore, a new project with the Australian Research Data Commons (ARDC) aims to make the software more flexible, allowing curators in various institutions to customize Hespi for different types of collections [1].
The development of AI tools like Hespi represents a significant leap forward in biodiversity research. By mobilizing large volumes of specimen-associated data, these systems enable new and innovative applications at an unprecedented scale. AI has already been used to automatically extract detailed leaf measurements and other traits from digitized specimens, unlocking centuries of historical collections for rapid research into plant evolution and ecology [2].
As AI continues to reshape various aspects of scientific research, tools like Hespi have the potential to transform access to biodiversity data, accelerating progress in fields such as ecology, evolution, climate science, and conservation.