Hyperparam: OSS Tools for Exploring Datasets Locally in the Browser
Ready to supercharge your #OpenScience profile?
With #OpenAIREEXPLORE + @ORCID_Org you can seamlessly complete your #ORCID record with all your research outputs, from papers & #datasets to #software tools.
Backed by the @OpenAIREGraph, EXPLORE identifies and matches your work, including:
Journal articles
Research data
Software & more
Read the article to learn more https://www.openaire.eu/openaire-explore-and-orcid-integration-complete-your-open-science-orcid-profile
Visit https://explore.openaire.eu to make your contributions count publicly and properly.
Scalable, Efficient Processing and Analysis of Large Audio Datasets – Pawel Cyrta – ADCx Gather 2024
https://www.youtube.com/watch?v=lHME1l9cEPk
#coding #Datasets #programming #softwareengineering
Ready to supercharge your #OpenScience profile?
With #OpenAIREEXPLORE + @ORCID_Org, you can seamlessly complete your #ORCID record with all your research outputs, from papers & #datasets to #software tools.
Backed by the @OpenAIREGraph, EXPLORE identifies and matches your work, including:
- Journal articles
- Research data
- Software & more
Log in with your ORCID → check what’s missing → sync it to your profile in just a few clicks.
Read the article: https://explore.openaire.eu
BBC: Inside the desperate rush to save decades of US scientific data from deletion. “No one knows when the next alert or request to save a chunk of US government-held climate data will come in. Such data, long available online, keeps getting taken down by US President Donald Trump’s administration. For the last six months or so, Cathy Richards has been entrenched in the response. She works […]
Organizing datasets with ClearML
How do you version datasets and track the history of their transformations? How do you store metadata? How do you build plots and statistics over the data? And how do you do all of this cleanly with the ClearML platform?
Wikipedia and Kaggle Release Structured Dataset to Aid AI Development, Counter Scraping
#AI #AITraining #Wikipedia #Kaggle #AIData #MachineLearning #OpenData #Wikimedia #Datasets #BigData #DataScience #LLMs #NLP #Google #Alphabet
Now available on Kaggle: Wikipedia Structured Contents. It’s in early beta. “This dataset contains all articles of the English and French language editions of Wikipedia, pre-parsed and outputted as structured JSON files with a consistent schema. Each JSON line holds the content of one full Wikipedia article stripped of extra markdown and non-prose sections (references, etc.).”
https://rbfirehose.com/2025/04/17/now-on-kaggle-wikipedia-structured-contents/
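Since the announcement describes one article per JSON line, a file in that layout can be read lazily, line by line. The following is only a minimal sketch: the field names `name` and `abstract` and the in-memory sample records are assumptions made for illustration, not the dataset's confirmed schema.

```python
import io
import json
from typing import Iterable, Iterator


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed article per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two made-up records standing in for lines of a downloaded file.
# The keys "name" and "abstract" are hypothetical, chosen for illustration.
sample = io.StringIO(
    '{"name": "Example A", "abstract": "First article."}\n'
    '{"name": "Example B", "abstract": "Second article."}\n'
)

articles = list(iter_articles(sample))
titles = [article["name"] for article in articles]
print(titles)
```

With a real download, the `io.StringIO` object would be swapped for an open file handle, so only one article is held in memory at a time while iterating.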
National Press Foundation: Data Helps Tell Education Stories, and Journalists Need to Find the Best Sources. “Rachel Rush-Marlowe, the founder and executive director of the education policy think tank ResearchEd and a former Education Department employee, spoke with NPF’s Widening the Pipeline fellows about how to access education data and accurately represent what the statistics […]
Unlock #research insights with the new #OpenAIREGraph #API!
Easily discover #publications, #datasets, & #software across #OpenScience infrastructures.
- Search with precision using linked #metadata
- Find #OpenAccess versions & related datasets
- Trace research back to funders & institutions
Start exploring today: https://graph.openaire.eu/docs/apis/graph-api/
From the Data Rescue Project: the Data Rescue Tracker. “The Data Rescue Tracker is a collaborative tool built to catalog existing public data rescue efforts so that we can coordinate better across initiatives. At this stage, you can use the tool to help reduce duplication of rescue efforts. The Data Rescue Tracker aims to provide a consolidated overview of who is backing up which dataset from […]
#Reddit #AI #ContentModeration #datasets
'Researchers at Cornell Tech have released a dataset extracted from more than 300,000 public Reddit communities, and a report detailing how Reddit communities are changing their policies to address a surge in AI-generated content.'
https://news.cornell.edu/stories/2025/04/dataset-reveals-how-reddit-communities-are-adapting-ai
Cornell University: Dataset reveals how Reddit communities are adapting to AI. “The study found that AI rules are most common in subreddits focused on art and celebrity topics. These communities often share visual content, and their rules frequently address concerns about the quality and authenticity of AI-generated images, audio and video. Larger subreddits were also significantly more […]
"Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data."
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
Axios: NOAA research websites slated to go dark get a reprieve. “NOAA has averted the early cancellation of an Amazon Web Services contract that would have caused a slew of agency websites to go dark beginning at midnight, the agency said Friday. Why it matters: The outages mainly would have affected NOAA’s research division, and would have made numerous websites and data sets inaccessible to […]
https://rbfirehose.com/2025/04/06/axios-noaa-research-websites-slated-to-go-dark-get-a-reprieve/
Cornell University: Quantum statistical approach quiets big, noisy data. “A research team with statisticians from Cornell has developed a data representation method inspired by quantum mechanics that handles large data sets more efficiently than traditional methods by simplifying them and filtering out noise. This method could spur innovation in data-rich but statistically intimidating […]
Massive, Unarchivable #Datasets of #Cancer, #Covid, #HIV and #Alzheimer's Research Could Be Lost Forever
Days before RFK announced 10,000 #HHS staffers would lose their jobs, a message appeared on #NIH research repository sites saying they were "under review." Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency.
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
https://archive.ph/Y8asq
Digital Archivists: Protecting Public Data from Erasure
https://spectrum.ieee.org/digital-archive
https://news.ycombinator.com/item?id=43558182
#ListenBrainz / #MetaBrainz I'm confused. Aren't sponsors the true customer? Why use this?
On one hand #Music: "Listen together", "Ethical forever"
On the other: #DATASETS
"Some of the world’s biggest platforms such as Google and Amazon, use our data"
"We ask commercial supporters to support us in order to help fund the creation and maintenance of these datasets."
"The following organizations make use of the data-sets published by MetaBrainz"