This project is the backend of the M-SENA Platform.
We provide a Docker image of our platform. See the main repo for instructions.
$ git clone https://github.com/iyuge2/M-SENA-Backend.git
$ cd M-SENA-Backend
- Install system requirements
$ apt install mysql-server default-libmysqlclient-dev libsndfile1 ffmpeg
- Install python requirements
$ conda create --name sena python=3.8
$ conda activate sena
$ pip install -r requirements.txt
- Download Bert-Base, Chinese from Google-Bert. Then convert the TensorFlow checkpoint to PyTorch using transformers-cli. Place the converted model under the MM-Codes/pretrained_model directory.
- Install the OpenFace toolkit
- Log in to MySQL as root
$ mysql -u root -p
- Create a database for M-SENA
mysql> CREATE DATABASE sena;
- Create a user for M-SENA and grant privileges
mysql> CREATE USER sena IDENTIFIED BY 'MyPassword';
mysql> GRANT ALL PRIVILEGES ON sena.* TO sena@`%`;
mysql> FLUSH PRIVILEGES;
- Edit constants.py. Alter DATASET_ROOT_DIR, DATASET_SERVER_IP, OPENFACE_FEATURE_PATH, MM_CODES_PATH, MODEL_TMP_SAVE, AL_CODES_PATH and LIVE_TMP_PATH to fit your settings.
- Edit config.sh. Look for DATABASE_URL and change it to fit your database settings.
- Download datasets and place them under the DATASET_ROOT_DIR specified in constants.py
- Add information to the DATASET_ROOT_DIR/config.json file to register the new dataset.
- Format datasets with MM-Codes/data/DataPre.py
- For datasets that need labeling, the config file is located in the AL-Codes directory.
$ python MM-Codes/data/DataPre.py --working_dir $PATH_TO_DATASET --openface2Path $PATH_TO_OPENFACE2_FeatureExtraction_TOOL --language cn/en
- The structure of the DATASET_ROOT_DIR directory is described in the next section.
$ source config.sh
$ flask run --host=0.0.0.0
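When setting DATABASE_URL in config.sh, a password containing characters such as @ or / has to be percent-encoded before it goes into the URL. The sketch below assumes the value follows the common SQLAlchemy-style form mysql://user:password@host:port/database; the exact scheme config.sh expects is not stated here, and the helper name build_database_url is hypothetical.

```python
from urllib.parse import quote


def build_database_url(user, password, host, database, port=3306, scheme="mysql"):
    """Assemble a SQLAlchemy-style URL (assumed format, not confirmed by config.sh)."""
    # Percent-encode the credentials so characters like '@', ':' or '/'
    # cannot be mistaken for URL delimiters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}/{database}"


if __name__ == "__main__":
    # User, password and database come from the MySQL steps above;
    # the host and port are assumptions for a local server.
    print(build_database_url("sena", "MyPassword", "127.0.0.1", "sena"))
```

The printed string can then be pasted into config.sh as the DATABASE_URL value, adjusting the scheme if the installed MySQL driver requires a different prefix.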
The structure of the root dataset directory should look like this:
.
├── config.json
├── MOSEI
│   ├── label.csv
│   ├── Processed
│   └── Raw
├── MOSI
│   ├── label.csv
│   ├── Processed
│   └── Raw
└── SIMS
    ├── label.csv
    ├── Processed
    └── Raw
config.json: states the necessary information for all datasets, for example language, label_path, features, etc. It only takes effect when scanning and updating datasets.
**/label.csv: stores detailed information for each video clip in the ** dataset, including video_id, clip_id, normal text, label value (Float), annotation (String), and mode (training attributes). In addition, the field label_by indicates the label type, which is necessary for labeling based on active learning.
**/Processed: stores feature files. We use pickle to store processed features, which are organized in the following structure. These files are used in MM-Codes.
{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [], # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
        "regression_labels": []
    },
    "valid": {***}, # same as the "train"
    "test": {***}, # same as the "train"
}
**/Raw: stores raw videos. The path of each clip should be consistent with label.csv.
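A quick way to confirm that a Processed feature file matches the layout above is to load it and check its splits and fields. The following is a minimal sketch, not part of M-SENA: the function names are hypothetical, and it assumes every field in a split is a sequence with one entry per clip.

```python
import pickle

SPLITS = ("train", "valid", "test")
FIELDS = (
    "raw_text", "audio", "vision", "id", "text", "text_bert",
    "audio_lengths", "vision_lengths", "annotations",
    "classification_labels", "regression_labels",
)


def check_features(data):
    """Return a list of problems found; an empty list means the layout matches."""
    problems = []
    for split in SPLITS:
        if split not in data:
            problems.append(f"missing split: {split}")
            continue
        missing = [f for f in FIELDS if f not in data[split]]
        problems.extend(f"{split}: missing field {f}" for f in missing)
        # Assumption: each field that is present holds one entry per clip.
        sizes = {f: len(data[split][f]) for f in FIELDS if f in data[split]}
        if len(set(sizes.values())) > 1:
            problems.append(f"{split}: inconsistent lengths {sizes}")
    return problems


def split_id(sample_id):
    """Split an id of the form video_id$_$clip_id into its two parts."""
    video_id, clip_id = sample_id.rsplit("$_$", 1)
    return video_id, clip_id


def load_features(path):
    """Load a pickled feature file (only unpickle files you trust)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Toy in-memory example; real files live under DATASET_ROOT_DIR/**/Processed.
    toy = {split: {field: [] for field in FIELDS} for split in SPLITS}
    toy = pickle.loads(pickle.dumps(toy))  # round trip, as a file would do
    print(check_features(toy))
    print(split_id("video_0001$_$0003"))
```

For a real dataset, pass the result of load_features on a file under the Processed directory to check_features and inspect the returned list.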
We provide a download link for the preprocessed SIMS dataset (code: 4aa6, md5: 3befed5d2f6ea63a8402f5875ecb220d), which follows the above requirements. You can get more datasets from CMU-MultimodalSDK.
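The md5 value above can be used to confirm that the SIMS download is intact. A minimal sketch using Python's hashlib follows; the function names are hypothetical, and the path of the downloaded archive must be supplied by the caller, since its actual filename is not given here.

```python
import hashlib

EXPECTED_MD5 = "3befed5d2f6ea63a8402f5875ecb220d"  # value quoted above


def file_md5(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's md5 matches the expected digest."""
    return file_md5(path) == expected.lower()
```

Calling verify_download with the path of the downloaded archive returns False if the file was truncated or corrupted in transit, in which case it should be downloaded again.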
The source code is organized as follows:
.
├── AL-Codes # Active learning codes
├── MM-Codes # MSA algorithm codes
├── app.py # Flask main codes
├── config.py # Basic config
├── config.sh # Basic config
├── constants.py # Global variable definition
├── database.py # Database definition & initialization
├── httpServer.py # Dataset server (for video previews)
└── requirements.txt # Python requirements
- MM-Codes
MSA code framework
Based on MMSA. All model and dataset parameters are saved in MM-Codes/config.json.
- AL-Codes
Code framework for labeling based on active learning
Based on MMSA. All model and dataset parameters are saved in AL-Codes/config.json.
