VAE Sample Generator

Instructions for operating the model. It is still a work in progress, and its current feature is reconstructing .wav files.

Setup

First, clone the repository:

git clone https://github.com/nicks1m/mutant.git

Create a virtual environment and install the requirements:

python3 -m venv /path/to/venv/

source /path/to/venv/bin/activate

*The venv should be in the project root folder

*Install all libraries that the scripts import

For more information on venv installation, see https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

Download

Download the Dataset

https://drive.google.com/file/d/1nAXnKDASp1wotNj6RQVwHZhmOrIMxdq8/view?usp=sharing

Pre-processing

Navigate to /project_folder/

Create a folder named datasets
Create two more folders inside "datasets": "datasets_all" and "mel_spectrograms"
Extract the dataset downloaded from the link above into "datasets_all" folder
Run preprocess.py in PyCharm or by any preferred method
The "mel_spectrograms" folder should now be populated with .npy files.

The audio has now been pre-processed.
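
As a quick sanity check after the steps above, the folder contents can be counted from Python. This is only a sketch: the function names `count_files` and `check_preprocessing` are hypothetical and not part of the repository, and it assumes the folder layout described above with the project root passed in as a path.

```python
from pathlib import Path


def count_files(folder: Path, suffix: str) -> int:
    """Count files below `folder` (recursively) that end in `suffix`."""
    if not folder.is_dir():
        return 0
    return sum(
        1 for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


def check_preprocessing(project_root: Path) -> dict:
    """Report how many .wav inputs and .npy spectrograms exist.

    Assumes the layout from the steps above:
    datasets/datasets_all and datasets/mel_spectrograms.
    """
    datasets = project_root / "datasets"
    wav_count = count_files(datasets / "datasets_all", ".wav")
    npy_count = count_files(datasets / "mel_spectrograms", ".npy")
    return {"wav": wav_count, "npy": npy_count, "populated": npy_count > 0}
```

Calling `check_preprocessing(Path("."))` from the project root returns the two counts; `populated` is False if preprocess.py has not written any .npy files yet.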

Training -- Unnecessary Step, as Weights have been provided

Run the train.py script
A model folder will be created, and the weights and parameters will be saved in it

Audio Generation/Reconstruction

Create a folder named "samples"
Create four folders inside "samples": "original", "generated", "latent", "spec_comparison"
Run generate.py script and audio files will be saved in the respective folders above.
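
The folders can also be created from Python rather than by hand. This is a minimal sketch using only the standard library; `create_sample_folders` is a hypothetical helper name, not something the repository provides, and it assumes the folders belong in the project root.

```python
from pathlib import Path
from typing import List

# Folder names taken from the steps above.
SAMPLE_SUBFOLDERS = ("original", "generated", "latent", "spec_comparison")


def create_sample_folders(project_root: Path) -> List[Path]:
    """Create samples/<name> for each expected output folder.

    Existing folders are left untouched, so it is safe to call repeatedly.
    """
    samples = project_root / "samples"
    created = []
    for name in SAMPLE_SUBFOLDERS:
        folder = samples / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Running `create_sample_folders(Path("."))` from the project root before generate.py should leave the four folders in place.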

Outputs

in /samples/generated:

audio reconstructions of mel-spectrograms that have been predicted by the model.

in /samples/original:

audio reconstructions of mel-spectrograms that have been converted from raw audio. These are not fed into the model.

in /samples/latent:

audio reconstructions of mel-spectrograms formed from latent vector sampling. (w.i.p)

in /samples/spec_comparison:

side-by-side spectrogram comparison between original and generated

Additional Helper Functions

Data Augmentation

Because only a small set of samples was available, I expanded the dataset through data augmentation, using the Facebook Research AugLy library. I paired augmentation features together so that one feature of each pair is chosen at random. For example, one of the two features in the array [(pitch_shift,harmonics)] will be chosen and applied to the audio signal. These features can be changed and their intensity adjusted. I ran the entire folder of audio files through two transformation functions, which gives a varied outcome. Starting from an initial dataset of 400 samples, I obtained triple the size, leading to a current dataset of 1200 samples. This should help the model train and generalize better.
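
The either/or selection and the 400 to 1200 arithmetic can be illustrated with a small sketch. It does not use AugLy: the four transforms below are hypothetical stand-ins on plain lists of floats, chosen only so the example runs with the standard library, and the names `apply_one_of`, `augment_dataset`, and `PASSES` are assumptions rather than code from data_aug.py.

```python
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Transform = Callable[[Signal], Signal]


# Hypothetical stand-in transforms (NOT AugLy functions).
def halve_volume(signal: Signal) -> Signal:
    return [0.5 * s for s in signal]


def double_volume(signal: Signal) -> Signal:
    return [2.0 * s for s in signal]


def invert_polarity(signal: Signal) -> Signal:
    return [-s for s in signal]


def reverse(signal: Signal) -> Signal:
    return signal[::-1]


def apply_one_of(signal: Signal, pair: Tuple[Transform, Transform],
                 rng: random.Random) -> Signal:
    """Pick one transform of the pair at random and apply it."""
    return rng.choice(pair)(signal)


def augment_dataset(signals: Sequence[Signal],
                    passes: Sequence[Tuple[Transform, Transform]],
                    rng: random.Random) -> List[Signal]:
    """Keep the originals and add one augmented copy per pass."""
    augmented = [list(s) for s in signals]
    for pair in passes:
        augmented.extend(apply_one_of(s, pair, rng) for s in signals)
    return augmented


# Two passes, each an either/or pair of transforms.
PASSES = [(halve_volume, invert_polarity), (reverse, double_volume)]
```

With 400 input signals and two passes, `augment_dataset` returns 400 originals plus 2 × 400 augmented copies, which gives the 1200 samples mentioned above.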

Mel-Spec Plotter

I added a visualize_spec script that takes one or two audio signals and plots them using librosa.display.specshow. With two signals, the plots are stacked one above the other, so we can look at the mel-scale frequency domain of each signal and get a rough understanding of its composition. The plot is generated automatically every time we generate a new set of samples from the model.

Additional Notes

The model was trained in Google Colab, using its GPU backend, because my laptop lacks the processing power. I can provide a link to the Colab notebook if you would like to train on a larger dataset and have a go. All that is required is the processed spectrograms from preprocess.py. The weights and parameters are saved to the Colab project folder and can be downloaded into the local project folder for generation.
