🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
Updated May 26, 2026
An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources
MCP server for Open Targets data
BERT finetuned on NER downstream tasks
Measuring and visualizing biomedical data variability/heterogeneity across data sources
Synthetic biomedical data generator for reproducible benchmarking of feature selection methods in high-dimensional machine learning.
A toolkit for processing radiobiological Excel data: visualization of tumor growth and skin reactions, statistics, and an interactive GUI (PyQt6). Supports estimation of LQ-model parameters (α/β) and comparison of experiments.
Bioinformatics Classifier Project — TCGA BRCA Dataset. Exploratory analysis and machine learning classification on TCGA BRCA gene expression data, focusing on PAM50 breast cancer subtypes.
Three basic data analysis workflows for biomedical data in Python. Level: beginner (~200 lines of pure code).
Step1-Step6 preprocessing workflow and final FAERS compound-PT-SOC core graph releases with standardized compounds, MedDRA PT/SOC mapping, and three pruned graph versions.
Machine learning system for early Parkinson’s disease prediction using multimodal biomedical data (voice and handwriting) with Random Forest and EfficientNet models.
Healthcare AI project analyzing migraine treatment outcomes using longitudinal statistical models in R.
This repository contains the data and analysis code for the study "Machine Learning-driven biomarker discovery for stratifying treatment response in tick-borne illness". It investigates the identification of robust and reproducible baseline predictors of treatment response using a stability-aware, multi-method machine learning framework.
Project focused on exploring and modeling the T1DiabetesGranada dataset, which contains clinical, biochemical, and continuous glucose monitoring (CGM) data from patients with Type 1 Diabetes.
Multiclass classification of breast cancer subtypes using gene expression profiles. Evaluated and compared multiple models (Logistic Regression, Random Forest, HistGradientBoosting) using classification metrics, confusion matrices, and ROC-AUC analysis with Youden’s J statistic on synthetically generated data.
Machine learning for Raman spectra analysis of brain tissue with robust preprocessing, classification, and interpretable biomarker discovery
Multiclass classification of breast cancer subtypes using synthetic gene expression data. Refactored code to use a single function for model evaluation across Logistic Regression, Random Forest, and HistGradientBoosting, including metrics and ROC-AUC with Youden’s J statistic.
A lightweight R script for text mining and harmonizing medical phenotype data. Cleans, standardizes, and maps diagnoses to ICD-10 codes, with clinical annotations for enhanced data usability.
A predictive modeling pipeline that leverages machine learning to classify and forecast health conditions based on clinical indicators, emphasizing feature importance and model reliability.
A MATLAB pipeline for classifying FourClass Motor Imagery EEG signals. Implements CSP/FBCSP feature extraction and SVM/CNN/LSTM models, achieving 98.75% accuracy with an optimized Linear SVM. Modular code for preprocessing, feature selection, and classification.