Mel-spectrograms feed a CNN-RNN; last layers retrain on tiny patient sets, then log-quant weights trim memory 4× for wearables.

Dataset, Features, Model, and Quantization Strategy for Respiratory Sound Classification

Table of Links

- Abstract and I. Introduction
- II. Materials and Methods
- III. Results and Discussions
- IV. Conclusion and References

II. MATERIALS AND METHODS

A. Dataset

For this work, we used the International Conference on Biomedical and Health Informatics (ICBHI'17) scientific challenge respiratory sound database [39], the largest publicly available respiratory sound database. The database contains 920 recordings from 126 patients. Each breathing cycle in a recording is annotated by respiratory experts as one of four classes: normal, wheeze, crackle and both (wheeze and crackle). The database contains a total of 6898 respiratory cycles, of which 1864 contain crackles, 886 contain wheezes, 506 contain both and the rest are normal. The dataset contains samples recorded with different equipment (AKG C417L Microphone, 3M Littmann Classic II SE Stethoscope, 3M Littmann 3200 Electronic Stethoscope and WelchAllyn Meditron Master Elite Electronic Stethoscope) from hospitals in Portugal and Greece. The data is recorded from seven chest locations: 1) trachea, 2) anterior left, 3) anterior right, 4) posterior left, 5) posterior right, 6) lateral left and 7) lateral right. Furthermore, a significant number of samples are noisy. These characteristics make the classification problem more challenging and much closer to real-world scenarios than manually curated datasets recorded under ideal conditions. Further details about the database and data collection methods can be found in [39].

Fig. 1. Hybrid CNN-RNN: a three-stage deep learning model. Stage 1 is a CNN that extracts abstract feature maps from input Mel-spectrograms, stage 2 consists of a Bi-LSTM layer that learns temporal features and stage 3 consists of fully connected (FC) and softmax layers that convert outputs to class predictions.


B. Evaluation Metrics

In the original challenge, 539 of the 920 recordings were marked as training samples and the remaining 381 as testing samples, with no patients in common between the two sets. For this work, we used the officially described evaluation metrics for the four-class (normal (N), crackle (C), wheeze (W) and both (B)) classification problem, defined as follows:

$$Se = \frac{C_c + C_w + C_b}{N_c + N_w + N_b}, \qquad Sp = \frac{C_n}{N_n}, \qquad Score = \frac{Se + Sp}{2}$$

where $N_i$ is the total number of cycles of class $i$ and $C_i$ is the number of correctly classified cycles of class $i$, for $i \in \{n, c, w, b\}$.

For a more complete evaluation of the proposed model, we also report other commonly used metrics: precision, recall and F1-score. The dataset has a disproportionate number of normal versus anomalous samples, and the official metrics are micro-averaged (calculated over all classes), so the performance of a model on one class can overshadow the other classes in the overall results. We therefore calculated precision, recall and F1-score using macro-averaging, where the metrics are computed for each class individually and then averaged.
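As a concrete illustration of macro-averaging, the sketch below computes per-class precision, recall and F1 and averages them with equal class weight. The integer encoding of the four classes is an assumption made only for this example.

```python
import numpy as np

def macro_prf(y_true, y_pred, n_classes=4):
    """Macro-averaged precision, recall and F1: per-class metrics are
    computed independently, then averaged with equal class weight."""
    p, r, f = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        p.append(prec)
        r.append(rec)
        f.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return np.mean(p), np.mean(r), np.mean(f)
```

Unlike micro-averaging, a rare class such as "both" contributes as much to these averages as the dominant normal class.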


C. Related Work

A number of papers analyzing this dataset have been published so far. Jakovljevic et al. [10] used a hidden Markov model with a Gaussian mixture model to classify the breathing cycles. They used spectral-subtraction-based noise suppression to pre-process the data and MFCC features for classification. Their models obtained a score of 39.56% on the original train-test split and 49.5% on 10-fold cross-validation of the training set.

Kochetov et al. [41] proposed a noise-marking RNN for the four-class classification. Their model contains two sections: an attention network for binary classification of respiratory cycles into noisy and non-noisy classes, and an RNN for the four-class classification. The attention network learns to identify noisy parts of the audio, suppresses those sections and passes the filtered audio to the RNN for classification. With an 80-20 split they obtained a score of 65.7%; they did not report a score for the original train-test split. Though this method reports relatively high scores, one primary issue is that the ICBHI metadata contains no noise labels and the paper does not describe how these labels were obtained. Since there is no known objective way to measure noise labels in this type of audio signal, such manual labeling of the respiratory cycles makes their results unreliable and irreproducible.

Perna et al. [42] used a deep CNN architecture to classify the breathing cycles into healthy and unhealthy, obtaining an accuracy of 83% with an 80-20 train-test split and MFCC features. They also performed a ternary classification of the recordings into healthy, chronic and non-chronic diseases and obtained an accuracy of 82%.

Chen et al. [12] used optimized S-transform-based feature maps along with deep residual networks (ResNets) on a smaller subset of the dataset (489 recordings) to classify the samples (not individual breathing cycles) into three classes (N, C and W), obtaining an accuracy of 98.79% on a 70-30 train-test split.

Finally, Chambres et al. [43] proposed a patient-level model in which individual breathing cycles are first classified into one of the four classes using low-level features (Mel bands, MFCCs, etc.), rhythm features (loudness, BPM, etc.), SFX features (harmonicity and inharmonicity information) and tonal features (chord strength, tuning frequency, etc.), with a boosted-tree method for classification. Patients are then classified as healthy or unhealthy based on the percentage of their breathing cycles classified as abnormal. They obtained an accuracy of 49.63% for breathing-cycle-level classification and 85% for patient-level classification. The justification for this patient-level model is that medical professionals do not make decisions about patients based on individual breathing cycles but rather on longer breathing sound segments, and the trends represented by several breathing cycles of a patient can provide a more consistent diagnosis. A summary of the literature is presented in Table I.


D. Proposed Method

1) Feature Extraction and Data Augmentation: Since the audio samples in the dataset have different sampling frequencies, all signals were first down-sampled to 4 kHz. Because both wheezes and crackles are typically confined to the 0-2 kHz frequency range, down-sampling to 4 kHz should not cause any loss of relevant information.

As the dataset is relatively small for training a deep learning model, we used several data augmentation techniques to increase its size: noise addition, speed variation, random shifting and pitch shifting. Aside from increasing the dataset size, these augmentation methods also help the network learn useful data representations in spite of varying recording conditions and equipment, patient age and gender, and inter-patient variability of breathing rate.
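A minimal NumPy sketch of the augmentations that can be expressed with simple array operations is shown below; pitch shifting is usually done with a resampling or phase-vocoder routine from an audio library and is omitted here. The noise level, shift range and speed range are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(y, sr=4000):
    """Return noise-added, randomly shifted and speed-varied copies of a
    breathing-cycle signal y sampled at sr Hz (parameters are assumptions)."""
    noisy = y + 0.005 * rng.standard_normal(len(y))        # noise addition
    shifted = np.roll(y, rng.integers(-sr // 2, sr // 2))  # random shift
    rate = rng.uniform(0.9, 1.1)                           # speed variation
    t = np.arange(0, len(y), rate)                         # resample grid
    sped = np.interp(t, np.arange(len(y)), y)              # linear resample
    return noisy, shifted, sped
```

Each call yields slightly different copies, so augmentation can be re-applied every epoch rather than materialized once.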

For feature extraction we used Mel-frequency spectrograms with a window size of 60 ms and 50% overlap. Each breathing cycle is converted to a 2D image in which rows correspond to frequencies on the Mel scale, columns correspond to time windows, and each value represents the log amplitude of the signal at that frequency and time window.
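The feature extraction above can be sketched end to end in NumPy. The 60 ms window at 4 kHz and 50% overlap come from the text; the 64 Mel bands, Hann window and dB floor are illustrative assumptions (in practice a library such as librosa would typically be used).

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels, fmax):
    """Triangular Mel filters mapping FFT bins to Mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = mel_to_hz(np.linspace(0.0, hz_to_mel(fmax), n_mels + 2))
    bins = np.floor((n_fft + 1) * freqs / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                 # rising edge of triangle
            fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):                 # falling edge of triangle
            fb[i - 1, k] = (r - k) / (r - c)
    return fb

def log_mel_spectrogram(y, sr=4000, win_ms=60, overlap=0.5, n_mels=64):
    """Rows: Mel bands, columns: time windows, values: log amplitude (dB)."""
    win = int(sr * win_ms / 1000)             # 60 ms -> 240 samples at 4 kHz
    hop = int(win * (1 - overlap))            # 50% overlap -> 120 samples
    window = np.hanning(win)
    frames = [np.abs(np.fft.rfft(y[s:s + win] * window)) ** 2
              for s in range(0, len(y) - win + 1, hop)]
    S = np.array(frames).T                    # (freq bins, time)
    mel = mel_filterbank(sr, win, n_mels, sr / 2) @ S
    return 10.0 * np.log10(mel + 1e-10)       # log amplitude in dB
```

A one-second cycle at 4 kHz yields a 64 × 32 image, which becomes the CNN input.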

2) Hybrid CNN-RNN: We propose a hybrid CNN-RNN model (figure 1) consisting of three stages: the first stage is a deep CNN that extracts abstract feature representations from the input data, the second stage is a bidirectional long short-term memory (Bi-LSTM) layer that learns temporal relations, and the third stage comprises fully connected and softmax layers that convert the output of the previous layers into class predictions. While this type of hybrid CNN-RNN architecture has more commonly been used in sound event detection ([44], [45]), the sporadic nature of wheezes and crackles as well as their temporal and frequency variance suggest that similar hybrid architectures may prove useful for lung sound classification.

The first stage consists of batch-normalization, convolution and max-pool layers. The batch-normalization layer scales the input images over each batch to stabilize training. In the 2D convolution layers the input is convolved with 2D kernels to produce abstract feature maps; each convolution layer is followed by a rectified linear unit (ReLU) activation. The max-pool layer selects the maximum value from each pixel neighborhood, which reduces the overall number of network parameters and provides shift-invariance [13].

The LSTM, proposed by Hochreiter and Schmidhuber [46], consists of gated recurrent cells that block or pass data in a sequence or time series by learning the perceived importance of data points. The current output and hidden state of a cell are functions of current as well as past values of the data. A bidirectional LSTM consists of two interconnected LSTM layers, one operating in the same direction as the data sequence and the other in the reverse direction, so the current output of the Bi-LSTM layer is a function of current, past and future values of the data. We used tanh as the non-linear activation function for this layer.
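The three stages described above can be sketched in PyTorch as follows. The number of layers, filter counts and hidden sizes are illustrative assumptions, not the paper's exact configuration; only the stage structure (CNN with batch-norm/ReLU/max-pool, Bi-LSTM over time, FC + softmax) follows the text.

```python
import torch
import torch.nn as nn

class HybridCNNRNN(nn.Module):
    """Sketch of the three-stage hybrid model; layer sizes are assumed."""
    def __init__(self, n_mels=64, n_classes=4):
        super().__init__()
        # Stage 1: CNN feature extractor (batch-norm, conv + ReLU, max-pool)
        self.cnn = nn.Sequential(
            nn.BatchNorm2d(1),
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 32 * (n_mels // 4)             # channels x pooled Mel bins
        # Stage 2: bidirectional LSTM over the time axis (tanh activation)
        self.bilstm = nn.LSTM(feat, 64, batch_first=True, bidirectional=True)
        # Stage 3: fully connected + softmax classifier
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        z = self.cnn(x)                       # (batch, 32, n_mels/4, time/4)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, time/4, feat)
        out, _ = self.bilstm(z)
        return torch.softmax(self.fc(out[:, -1]), dim=-1)
```

For training with `nn.CrossEntropyLoss` the final softmax would be dropped in favor of raw logits; it is kept here to mirror the figure.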


To benchmark the performance of our proposed model, we compare it to two standard CNN models, VGG-16 [40] and MobileNet [37]. Since our dataset is limited in size even after data augmentation, training these models from scratch on it could cause overfitting. Hence, we used ImageNet-trained weights instead and replaced the dense layers of these models with an architecture similar to the fully connected and softmax layers of our proposed CNN-RNN. The models are then trained with a small learning rate.

TABLE I. SUMMARY OF EXISTING LITERATURE ON ICBHI DATASET

Fig. 2. Boxplot of intra-patient and inter-patient variability of audio features: intra-patient variability is computed by normalizing each audio feature by the average of that feature for the corresponding patient, while for inter-patient variability the normalization is by the average of the audio feature over the entire dataset. A diverse set of features is used for comparison, including breathing cycle duration, an energy-related feature (RMS energy), a noise-related feature (ZCR) and spectral features (bandwidth, roll-off). Inter-patient variability is significantly larger than intra-patient variability in all cases.

3) Patient-Specific Model Tuning: Though most existing research concentrates on developing generalized models for classifying respiratory anomalies, the problem with such models is that their performance can deteriorate for a completely new patient due to inter-patient variability. This inconsistent performance makes classification models unreliable and thus hinders their wide-scale adoption. To qualitatively evaluate the inter-patient variability, we show boxplots of inter-patient and intra-patient variability for a diverse set of audio features (duration, RMS energy, bandwidth, roll-off, ZCR) in fig. 2. For intra-patient variability, we normalized each audio feature of a sample by the average of that feature over the samples from that specific patient, while for inter-patient variability we normalized by the average of that feature over the entire dataset. As evident from the figure, inter-patient variability is significantly larger than intra-patient variability.
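The two normalizations behind fig. 2 can be sketched as follows, using a synthetic feature table in place of the real per-cycle features (the feature values and patient assignments here are random stand-ins).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature table: one value (e.g. RMS energy) per breathing
# cycle, with `patient` giving the patient id of each cycle.
feature = rng.normal(10.0, 2.0, size=100)
patient = rng.integers(0, 5, size=100)

# Intra-patient: normalize each cycle by its own patient's mean
intra = np.array([feature[i] / feature[patient == patient[i]].mean()
                  for i in range(len(feature))])
# Inter-patient: normalize each cycle by the dataset-wide mean
inter = feature / feature.mean()
```

Boxplots of `intra` and `inter` then visualize the spread of each normalized distribution, as in fig. 2.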

Also, from a medical professional's perspective, for most chronic respiratory patients some patient data is already available or can be collected, and automated long-term monitoring of the patient's condition after initial treatment is often very important. Though training a model on existing patient-specific data to extract patient-specific features results in a more consistent and reliable patient-specific model, it is often very difficult to collect enough data from a patient to sufficiently train a machine learning model. Since deep learning models require a much larger amount of training data, the issue is further exacerbated.

To address these shortcomings of existing methods, we propose a patient-specific model tuning strategy that can take advantage of deep learning techniques even with a small amount of patient data. In this approach, the deep network is first trained on a large database to learn domain-specific feature representations; then a smaller part of the network is re-trained on the small amount of patient-specific data available. This lets us transfer the learned domain-specific knowledge of the deep network to patient-specific models and thus produce consistent patient-specific class predictions with high accuracy. In our proposed model, we train the three-stage network on the training samples. Then, for a new patient, only the last stage is re-trained on patient-specific breathing cycles while the learned CNN-RNN stage weights are frozen at their pre-trained values. With this strategy, only ∼1.4% of the network parameters are re-trained for patient-specific models. For VGG-16 and MobileNet, the same strategy is applied.
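The freeze-and-retrain step can be sketched in PyTorch with a stand-in network: a large "backbone" playing the role of the frozen CNN-RNN stages and a small head playing the role of the re-trained FC stage. The layer sizes are illustrative, so the re-trained fraction here does not match the paper's ∼1.4%.

```python
import torch.nn as nn

# Hypothetical stand-in: large pretrained backbone + small classifier head
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))
head = nn.Linear(128, 4)
model = nn.Sequential(backbone, head)

# Freeze the learned stages; only the head's parameters stay trainable,
# so an optimizer built from trainable parameters updates only the head.
for p in backbone.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"re-trained fraction: {trainable / total:.3%}")
```

Patient-specific tuning then runs a few epochs of ordinary training on the patient's cycles, with gradients flowing only into the head.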

4) Weight Quantization: In the proposed weight quantization scheme, the magnitudes of the weights of each layer are quantized in the log domain. The quantized weight (w̃) can be represented as:

$$\tilde{w} = \mathrm{sign}(w) \times 2^{\,\mathrm{round}(\log_2 |w|)}$$

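A common form of log-domain magnitude quantization can be sketched as follows. The bit-width, the exponent clipping range and the handling of zero weights are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def log_quantize(w, n_bits=4):
    """Quantize weight magnitudes to powers of two, keeping the sign.
    n_bits and the exponent clipping range are assumptions."""
    sign = np.sign(w)
    mag = np.abs(w)
    # Round log2 of the magnitude (zeros mapped to a tiny value first)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1e-12)))
    # Clip exponents to the range representable with an n_bits code
    lo = -(2 ** (n_bits - 1))
    exp = np.clip(exp, lo, lo + 2 ** n_bits - 1)
    return sign * (2.0 ** exp)

print(log_quantize(np.array([0.3, -0.07, 0.5, 0.0])))
```

Storing only the sign and the small integer exponent, rather than a full-precision value, is what yields the memory reduction on wearable hardware.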

:::info Authors:

(1) Jyotibdha Acharya (Student Member, IEEE), HealthTech NTU, Interdisciplinary Graduate Program, Nanyang Technological University, Singapore;

(2) Arindam Basu (Senior Member, IEEE), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

:::


:::info This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.

:::

