https://ift.tt/bF6Vmh3
Identifying plastic materials in post-consumer food containers and packaging waste using terahertz spectroscopy and machine learning
1. Introduction
Plastic pollution has emerged as a critical global environmental issue, with growing attention on marine plastic pollution and its detrimental effects on marine ecosystems and human health. Much of the plastic waste found in oceans is thought to originate from mismanaged plastic waste on land, which is carried down through rivers and other waterways (Jambeck et al., 2015). Despite heightened awareness of the problem, the volume of plastic waste is expected to increase due to the rise in production of plastic products, driven by their hygienic and economic benefits. In 2023, the global amount of plastic waste has doubled compared to levels from two decades ago, and only 9% is recycled, while 22% is mismanaged and often goes into unregulated dumpsites (Organisation for Economic Co-operation and Development (OECD), 2022). The COVID-19 pandemic further exacerbated this issue by increasing the consumption of single-use plastics, resulting in a higher production of plastic waste (Peng et al., 2021a).
Addressing this growing issue requires the development of cost-effective and highly accurate sorting technologies to promote recycling that enhances the economic value of recycled products and reduces environmental impact. High-accuracy material sorting enables the effective use of plastic waste as high-purity recycled materials, thereby lowering the environmental impact by reducing the consumption of fossil resources required to produce virgin plastic products. Traditional sorting methods rely on manual sorting, which is labor-intensive and costly. Oyewale et al. (2023) reviewed studies on plastic waste sorting and pointed out that while manual sorting enables high-accuracy identification, it is not suitable for processing large volumes of plastic waste due to low throughput and the risk of human error. Similarly, Lahtela and Kärki (2018) and Yuan et al. (2015) emphasized the low productivity of manual sorting, highlighting the need for high-precision automated identification systems. The COVID-19 pandemic has further increased health risks for workers exposed to pathogenic waste and driven up operational costs. Although methods based on physical properties, such as density and electrical conductivity, are also used as alternatives, they still face limitations, including the need for dynamic feedback mechanisms to improve sorting accuracy (Neo et al., 2022).
Automatic sorting systems have emerged as a promising solution to these challenges, offering economic and health benefits across a wide range of applications (Gundupalli et al., 2017). Automatic sorting systems not only reduce costs through various innovations but also protect human workers from health risks, such as exposure to toxic substances and pathogens present in waste. Gundupalli et al. (2017) emphasized the need for combining multiple types of measurement devices to overcome the limitations of individual devices and highlighted the necessity of technologies capable of high-accuracy identification under diverse conditions to further advance automated sorting systems. Since automatic sorting systems can record feature data and identification results, developing identification algorithms that leverage accumulated data is essential for improving identification accuracy.
Automatic sorting systems frequently use image data to identify recyclable materials in mixed waste through image recognition (Stiebel et al., 2018, Wang et al., 2019, Nowakowski and Pamuła, 2020, Zhang et al., 2021, Nguyen et al., 2024). These studies have developed algorithms that identify items such as plastic bottles based on external features, such as shape and color. However, while image recognition effectively detects various recyclables based on external features, it struggles to distinguish between plastics with similar appearances but different material compositions. The quality of recycled plastic materials improves with higher material purity. Moreover, since recycling methods vary depending on the type of plastic, maximizing material identification accuracy is crucial.
Spectroscopic methods can measure the intrinsic characteristics of plastic materials, rather than their external features, by analyzing spectral data such as transmittance, which is expressed as the ratio between the original and transmitted voltage of electromagnetic waves. These methods, including NIR spectroscopy, Raman spectroscopy, and Laser-Induced Breakdown Spectroscopy (LIBS), have been applied to plastic identification (Peng et al., 2021b, Yan et al., 2021, Marica et al., 2022, Tan et al., 2022). For example, Tan et al. (2022) combined image recognition with NIR spectroscopy to develop an algorithm for sorting washing machine parts and identifying plastics such as PP, PS, and ABS. Although NIR spectroscopy is widely used in the recycling industry, its limitations have become increasingly apparent. Neo et al. (2022) reviewed recent studies on spectroscopic methods for sorting plastic waste and highlighted that NIR spectroscopy has been commonly employed due to its relatively low cost and effectiveness in identifying plastic materials. However, the review also highlighted the limitations of NIR, such as difficulty in detecting black plastics and weak spectral features, and suggested the need for hybrid methods that combine multiple spectroscopic techniques. Additionally, Araujo-Andrade et al. (2021) and Yu et al. (2022) reviewed spectroscopic methods, including the emerging THz spectroscopy, and noted that its application remains limited. Further research on the effectiveness of THz spectroscopy is needed to expand the available spectroscopic options for plastic identification and explore its potential for advancing sorting technology.
THz spectroscopy uses electromagnetic waves with frequencies ranging from 0.1 to 10 THz, occupying the spectral region between radio and light waves and combining the penetrative properties of radio waves with the straightness of light waves. THz spectroscopy, recognized for its non-destructive, contactless, and rapid data acquisition capabilities, has been applied in modern technologies such as Beyond 5G/6G communication systems and autonomous driving technologies, with equipment costs expected to decrease as its usage expands. Despite its anticipated practical value and broad industrial applicability, the application of THz transmittance spectroscopy to post-consumer plastic waste remains largely unexplored. Studies employing THz waves have primarily focused on identifying materials using laboratory samples (Küter et al., 2018, Tanabe et al., 2020, Cielecki et al., 2023). Küter et al. (2018) is one of the few studies that investigated black plastic identification using THz spectroscopy. They reported an accuracy exceeding 90% and proposed an automated sorting system based on THz spectroscopy. Tanabe et al. (2020) demonstrated the effectiveness of THz transmittance spectroscopy for classifying plastic materials by analyzing the spectral features of various polymers, such as PET, PP, and PE. Cielecki et al. (2023) used PET and PE samples obtained from plastic products and demonstrated a high identification accuracy, highlighting the effectiveness of THz spectroscopy.
Plastic products, such as those used in packaging, contain a wide variety of additives designed to enhance product quality, including those for coloring, enhancing appearance, improving processability, plasticity, and gas-barrier properties, among others. Tanabe et al. (2020) examined the relationship between the amount of additives and the transmittance of THz waves, demonstrating that even for the same plastic material, transmittance varies depending on the amount of additives. Measuring the amount of additives in post-consumer plastic waste involves significant time and financial costs and only a limited number of products can be assessed. Therefore, it is essential to develop identification methods that allow for accurate material identification, even when transmittance varies.
This study aims to investigate the effectiveness of THz spectroscopy and propose a combined measurement system with NIR spectroscopy for sorting post-consumer plastic waste by material type. Furthermore, this study aims to develop an identification process that employs machine learning (ML) techniques to accurately identify materials, even when spectral features vary in diverse post-consumer plastic waste. By utilizing eXplainable AI (XAI) techniques, this study evaluates the impact of NIR and multiple THz wave frequencies on identification accuracy. The application of XAI provides insights into the factors influencing model performance, offering quantitative insights into which frequencies and spectroscopic methods should be prioritized. These findings are expected to contribute to the development of hybrid spectroscopic approaches for enhanced plastic waste sorting.
This study uses state-of-the-art ML techniques to achieve high-precision material identification, even under sample-specific variations in spectral data. Neo et al. (2022) reviewed ML techniques used to process spectral data for plastic material identification. The review highlights that while recent ML techniques, such as deep learning, have rapidly advanced, there are still limited studies applying these methods to plastic identification. Neo et al. (2022) also suggested that the potential for future applications of ML techniques in this field should be explored further. Deep learning and eXtreme Gradient Boosting (XGBoost; Chen and Guestrin, 2018) have recently gained recognition as high-performance algorithms. While deep learning has been applied in a few studies, the use of XGBoost remains less prevalent. Shwartz-Ziv and Armon (2022) demonstrated that XGBoost could outperform deep learning methods even with limited samples. Considering its suitability as a preliminary analysis tool, this study utilizes XGBoost. In this algorithm, various hyperparameters significantly influence the identification performance. These hyperparameters must be carefully tuned, necessitating an extensive search for optimal hyperparameter values. We employ Bayesian optimization (Mockus, 1975, Snoek et al., 2012) to find optimal hyperparameter values. Mockus (1975) proposed a framework for optimizing an unknown function with limited evaluations by updating the solution while balancing exploration and exploitation. Snoek et al. (2012) posited that hyperparameter tuning could be considered the optimization problem of an unknown function that reflects model performance. Applications of Bayesian optimization for hyperparameter tuning can be found in several studies (e.g., Shahriari et al., 2016).
XGBoost constructs complex identification models, making it difficult to interpret the influence of input variables on identification performance. We employ XAI techniques to evaluate the impact of incorporating frequency-specific transmittance data. Among various XAI techniques, the SHAP (SHapley Additive exPlanations) value, proposed by Lundberg et al. (2017), is used in this study to evaluate the contribution of each transmittance to the model's predictions. SHAP values can quantify the contribution of each feature, such as the frequency-specific transmittance values, to the model’s predictions. In the field of waste management, XAI techniques have previously been used to extract important features for evaluating the composition of biomass and waste from spectral data obtained through spectroscopic methods (Liang et al., 2023). However, to the best of our knowledge, XAI has not yet been applied to plastic material identification from post-consumer waste. In this study, we combine multiple spectroscopic methods, such as NIR and THz waves at various frequencies, to enhance material identification and utilize XAI to assess the relative importance of these frequencies.
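To make the idea of a per-feature contribution concrete, the following is a minimal sketch of exact Shapley values for a four-feature input. It is not the SHAP library and not the model used in this study: `toy_pet_score`, the sample values, and the single-baseline treatment of "missing" features are all hypothetical, chosen only to show how a contribution is averaged over feature coalitions.

```python
from itertools import combinations
from math import factorial

FEATURES = ["NIR", "0.075THz", "0.090THz", "0.140THz"]


def toy_pet_score(x):
    """Hypothetical score for 'transparent PET' (not the study's model).

    It rewards high NIR transmittance together with low 0.075 THz
    transmittance, plus low 0.140 THz transmittance; 0.090 THz is ignored.
    """
    return x["NIR"] * (1.0 - x["0.075THz"]) + 0.5 * (1.0 - x["0.140THz"])


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value,
    which is a simplifying assumption made for this sketch.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_f = {g: (x[g] if g in subset else baseline[g])
                             for g in FEATURES}
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical transmittance values for one sample and a reference point.
SAMPLE = {"NIR": 0.98, "0.075THz": 0.78, "0.090THz": 0.85, "0.140THz": 0.60}
BASELINE = {"NIR": 0.70, "0.075THz": 0.90, "0.090THz": 0.85, "0.140THz": 0.75}

phi = shapley_values(toy_pet_score, SAMPLE, BASELINE)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the score at the sample and the score at the baseline, and a feature the model never uses receives a contribution of zero. These two properties are what make such values usable for ranking frequencies.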
This paper is organized as follows: In Section 2, we describe the plastic samples collected from post-consumer food containers and packaging waste, the experimental setup employed to measure their spectral transmittance data, and provide an overview of the ML techniques employed in this study. Section 3 presents and discusses the results of our measurements and the predictions made by the identification model, exploring the characteristics of the data and evaluating the model’s performance. Finally, Section 4 concludes this paper by summarizing the key findings and discussing potential directions for future research.
3. Results and Discussion
3.1. Measurement results
Fig. 3 represents the transmittance values of NIR, 0.075 THz, 0.090 THz, and 0.140 THz for each sample measured by MFPI. Each line represents the measurement results for an individual sample, along with the mean transmittance values for each frequency and material type. The transmittance of transparent PET samples is represented by red lines, that of transparent PS by green lines, and that of black PS by blue lines.

Fig. 3. Transmittances of Transparent and Black PSs and Transparent PET for NIR and THz Waves (0.075 THz, 0.090 THz, and 0.140 THz).
The difference in NIR transmittance between black PS and transparent plastics is apparent. The mean transmittance values for transparent PET and PS are close to 1.00, whereas that of black PS is close to 0.00. Black PS can be accurately distinguished from transparent PET and PS. However, relying solely on NIR to differentiate between transparent PET and PS is challenging; therefore, additional frequencies must be explored for more accurate identification.
When comparing the THz transmittance of transparent PS and PET, the transmittance of transparent PS is generally higher than that of transparent PET at 0.075 THz, 0.090 THz, and 0.140 THz. At 0.075 THz, the transmittance of transparent PS is primarily concentrated between 0.90 and 1.00, whereas that of transparent PET is primarily concentrated between 0.70 and 0.90. Therefore, the transmittance of transparent PS tends to be higher than that of transparent PET. Combining THz spectroscopy with NIR improves the accuracy of differentiating between transparent PS and PET. As the frequency increases from 0.075 THz, the mean transmittance of each material generally decreases, while the variance in transmittance increases. Given the overlap in transmittance between the green and red lines, employing ML techniques is essential for accurately identifying transparent PET and PS considering the variations observed in the samples.
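The reasoning above can be sketched as a two-step threshold rule. This is only an illustration of why fixed cut-offs are insufficient, not the identification method of this study: the function name and both cut-off values are assumptions, with the 0.90 boundary taken from the ranges quoted for 0.075 THz.

```python
def rule_based_label(nir, thz_075, nir_cut=0.5, thz_cut=0.90):
    """Label a sample from two transmittance values using fixed cut-offs.

    nir_cut and thz_cut are hypothetical values chosen for illustration.
    """
    # Black PS transmits almost no NIR, transparent samples almost all of it.
    if nir < nir_cut:
        return "black PS"
    # Transparent PS tends to transmit more at 0.075 THz than transparent PET.
    if thz_075 >= thz_cut:
        return "transparent PS"
    return "transparent PET"


print(rule_based_label(0.02, 0.80))  # black PS
print(rule_based_label(0.99, 0.95))  # transparent PS
print(rule_based_label(0.99, 0.80))  # transparent PET
```

A transparent PS sample whose 0.075 THz transmittance falls slightly below the cut-off would be labelled as PET by this rule, which is the overlap problem that motivates a learned model using all four features.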
3.2. Identification results of ML model
We employed XGBoost to identify plastic materials using the transmittance values of NIR and three THz frequencies as features. We examined a total of 133 samples, including transparent PET, transparent PS, and black PS, and divided them into a training set of 106 samples and a test set of 27 samples.
As the number of plastic material types to be identified increases, the overall required sample size increases proportionally. Benchmarks for sample size can be drawn from previous studies that applied ML and spectroscopic techniques to plastic waste identification. For example, Liang et al. (2023) used 120 samples, a sample size comparable to our study, and identified 30 types of materials with four samples collected for each material. Similarly, Yan et al. (2021) measured 24 types of materials, including plastics, using LIBS and prepared 264 samples, with 11 samples per material, of which 10 samples were used for training and one for testing. Compared to these studies, our study has a sufficient sample size, comprising 133 samples, including 56 transparent PET, 48 transparent PS, and 29 black PS. These samples were divided into training and testing datasets. In addition, THz spectroscopy enables high-quality feature extraction for material identification.
One of the challenges associated with small sample sizes is the potential decline in generalization performance. To address this issue, we employed K-fold cross-validation to enhance generalization performance. K-fold cross-validation splits the training data into K folds; for each k = 1, ..., K, we trained on all folds but the k-th fold and tested the trained model on the k-th fold (Murphy, 2013). We used the average accuracy from K-fold cross-validation to measure generalization performance. The value of K used for cross-validation is a commonly used setting for this method (Murphy, 2013).
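The splitting procedure described above can be sketched in plain Python as follows. This is a generic illustration, not the code used in this study: the function names are hypothetical, and `k=5` in the usage lines is only an example value, since the value of K is not stated here.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out
                 for idx in fold]
        yield train, validation


def cross_val_accuracy(fit, predict, X, y, k, seed=0):
    """Average validation accuracy over the k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, x)` returns
    a label; both are supplied by the caller.
    """
    scores = []
    for train, validation in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in validation)
        scores.append(hits / len(validation))
    return sum(scores) / len(scores)


# Example: split the 106 training samples into 5 folds (k=5 is an assumption).
splits = list(k_fold_indices(106, 5))
for train, validation in splits:
    print(len(train), len(validation))
```

Every training sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting.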
An identification model was developed using the training data and tested on the test data. As a result, the model achieved a high accuracy of 0.93, as shown in Table 1. The precision for transparent PET was 0.92. The model identified 12 samples as transparent PET; 11 of those samples were correctly identified, while one sample, actually transparent PS, was misidentified as transparent PET. Fig. 4a illustrates this misidentified transparent PS sample, while Fig. 4b depicts its spectral features, highlighted with a thick black line.
Table 1. Model Performance.
Material | Accuracy (overall) | Precision | Recall | F1 score |
---|---|---|---|---|
Transparent PET | 0.93 | 0.92 | 0.92 | 0.92 |
Black PS | | 1.00 | 1.00 | 1.00 |
Transparent PS | | 0.90 | 0.90 | 0.90 |
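The entries of Table 1 can be cross-checked from the test-set counts reported in this section: 11 of 12 samples identified as transparent PET were correct (one was transparent PS), nine of ten identified as transparent PS were correct (one was transparent PET), and all five black PS samples were correct. The sketch below rebuilds the confusion matrix from those counts and applies the standard definitions of precision, recall, and F1 score, which are assumed here to match the equations of the methods section.

```python
LABELS = ["Transparent PET", "Black PS", "Transparent PS"]

# Rows: actual material, columns: identified material (same order as LABELS).
# Counts are reconstructed from the text, not taken from a published matrix.
CONFUSION = [
    [11, 0, 1],  # actual transparent PET: one identified as transparent PS
    [0, 5, 0],   # actual black PS: all identified correctly
    [1, 0, 9],   # actual transparent PS: one identified as transparent PET
]


def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    results = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(len(labels))) - tp
        fn = sum(cm[i]) - tp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        results[label] = (precision, recall, f1)
    return results


def accuracy(cm):
    """Share of all samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


metrics = per_class_metrics(CONFUSION, LABELS)
print(f"accuracy: {accuracy(CONFUSION):.2f}")
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With these counts, 25 of the 27 test samples are identified correctly, which rounds to the reported accuracy of 0.93.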

Fig. 4. Misclassified example (transparent PS/PET misclassified as transparent PET/PS).
Examining the spectral features of this sample in Fig. 4b (represented by a thick black line), we observe that its transmittance values at 0.075 THz and 0.140 THz are lower than those typical of transparent PS and similar to those observed for transparent PET. This suggests why the sample was misclassified as transparent PET even though it should have been identified as transparent PS. In contrast, Fig. 4b shows that the transmittance of this sample at 0.090 THz is higher than those of most transparent PET samples. The sample was nevertheless identified as transparent PET, which suggests that the identification model placed greater emphasis on transmittance at 0.075 THz and 0.140 THz than on that at 0.090 THz. A more detailed understanding can be achieved through XAI techniques.
Fig. 5 shows the SHAP summary plots, which quantify the impact of each transmittance on plastic material identification using the test data. The features are ranked in descending order according to their impact on the model’s identification and are shown in the vertical (y) axis on each diagram. The horizontal (x) axis depicts the SHAP values for each test sample. The color of each dot indicates the magnitude of each transmittance, where red dots represent samples with high transmittance and blue dots represent samples with low transmittance. Fig. 5a presents the SHAP summary plot for transparent PET identification, highlighting the significant impact of transmittance at 0.075 THz and 0.140 THz on its identification. Samples with blue dots at 0.075 THz, which exhibit positive SHAP values, are more likely to be identified as transparent PET due to their low transmittance values at 0.075 THz. Similarly, samples with low transmittance values, indicated by blue dots, at 0.140 THz are more likely to be identified as transparent PET. Furthermore, samples with high transmittance in the NIR range are more likely to be identified as transparent PET. The influence of the transmittance at 0.090 THz on this identification is significantly smaller than that of other frequencies. These results show that low transmittance values measured at 0.075 THz and 0.140 THz are crucial features for transparent PET identification. Consequently, the sample depicted in Fig. 4a, a transparent PS, was misidentified as transparent PET due to its low transmittance values at 0.075 THz and 0.140 THz.

Fig. 5. SHAP values for identifying each plastic material.
As shown in Table 1, the precision, recall, and F1 score values for black PS each achieve 1.00. These values, calculated using equations (2), (3), (4), indicate that all five black PS samples in the test data were correctly identified, with no FP (samples incorrectly identified as black PS) and no FN (black PS samples that were overlooked). This high accuracy is attributable to the distinct NIR transmittance of black PS, which is close to 0.00, in contrast to that of transparent PS and PET, which is close to 1.00. These differences in NIR transmittance enabled the clear differentiation of black PS from other transparent plastic waste in this case.
To further analyze the contribution of each feature to identification, Fig. 5b presents the SHAP values for black PS identification. The results reveal that NIR transmittance exhibits high SHAP values, emphasizing its critical role in identification. Specifically, samples with lower NIR transmittance values, indicated by blue dots, are more likely to be correctly identified as black PS.
As shown in Table 1, the precision for transparent PS was 0.90. This was because the identification model identified ten samples as transparent PS, and nine of them were correctly identified. However, the remaining sample (shown in Fig. 4c) was actually transparent PET but was misidentified. Fig. 4d presents the spectral features of this sample, highlighting it with a thick black line. The transmittance values at 0.075 THz, 0.090 THz, and 0.140 THz of this misidentified sample were generally higher than those of other transparent PET samples. They fell within the range where transparent PS transmittance values are commonly found. The observed transmittance values are primarily attributed to the thickness of the material. Transmittance tends to be higher when a plastic material becomes thinner. In this case, the misidentified sample had a thickness of 0.17 mm, significantly thinner than the average thickness of other transparent PET samples (0.32 mm). This thinness likely accounts for the higher transmittance values observed, leading to its misidentification as transparent PS.
Additionally, the sample shown in Fig. 4c appears to be more reflective than the representative sample of transparent PET. This difference in appearance may result from differences in the additives used. While additives are known to affect transmittance (Tanabe et al., 2020), identifying their exact types and quantities in post-consumer plastic waste remains a significant challenge. As Nerin et al. (2013) highlighted, the composition of raw materials and ingredients used in commercially available plastic food containers and packaging is highly complex and often confidential, with many components not disclosed. Consequently, although additives might contribute to the observed transmittance, their exact impact is difficult to ascertain due to the complexity and confidentiality of material compositions. Therefore, their contribution is regarded as secondary to the material’s thinness.
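The effect of thickness can be illustrated with a simple attenuation model. The sketch below assumes Beer-Lambert behaviour, T = exp(-alpha * d), and ignores surface reflection and interference, so it is a rough estimate rather than a description of the measurements. The reference transmittance of 0.80 is a hypothetical value taken from the middle of the 0.70 to 0.90 range quoted for transparent PET at 0.075 THz.

```python
import math


def rescale_transmittance(t_ref, d_ref_mm, d_new_mm):
    """Transmittance at thickness d_new_mm given t_ref at d_ref_mm.

    Assumes T = exp(-alpha * d) with the same alpha for both thicknesses,
    which gives T_new = t_ref ** (d_new_mm / d_ref_mm).
    """
    return t_ref ** (d_new_mm / d_ref_mm)


def attenuation_coefficient(t_ref, d_ref_mm):
    """alpha in 1/mm implied by a transmittance t_ref at thickness d_ref_mm."""
    return -math.log(t_ref) / d_ref_mm


# Average PET thickness 0.32 mm versus the misidentified sample at 0.17 mm.
t_thin = rescale_transmittance(0.80, 0.32, 0.17)
print(f"estimated transmittance at 0.17 mm: {t_thin:.2f}")
```

Under these assumptions, a PET sample of average thickness with a transmittance of 0.80 would rise to roughly 0.89 at 0.17 mm, close to the range where transparent PS values are concentrated.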
Fig. 5c shows the SHAP summary plot for identifying transparent PS, indicating that samples with high transmittances at 0.140 THz, NIR, and 0.075 THz are more likely to be identified as transparent PS. Notably, the SHAP values for transmittance at 0.090 THz are lower than those for other frequencies. The limited influence of the 0.090 THz transmittance is common to the identification of all three plastic materials. This may be due to interference effects caused by vertical THz irradiation, which led to significant variations in transmittance irrespective of the material. As a result, the 0.090 THz transmittance may have played a minor role in plastic material identification. Fig. 3 demonstrates that the transmittance of THz waves is higher for transparent PS than for transparent PET, a trend that remains robust despite variations in sample size. Furthermore, Fig. 5c highlights the key features, namely the transmittances at 0.075 THz and 0.140 THz, where this trend is most pronounced. These features were quantitatively identified as critical for enhancing classification accuracy. Therefore, our algorithm can effectively use these robust properties, ensuring high generalizability, even as the sample size increases.
The methods developed in this study, including THz spectroscopy, ML algorithms, and key frequencies evaluation, demonstrate strong potential for practical implementation in plastic waste management. For example, NIR transmittance and manual sorting are widely used in plastic waste sorting, but these methods have well-documented limitations. The proposed method could serve as an alternative or complementary approach to these methods, offering enhanced efficiency and accuracy in material identification. Integrating our equipment into the sorting process could significantly improve the effectiveness of real-world sorting systems.
Moreover, the compact size of our equipment makes it suitable for installation in consumer-accessible devices at resource recovery stations where plastic containers are deposited. Implemented at such stations, the system is expected not only to eliminate the need for waste sorting at recycling facilities but also to enable the direct transportation of pre-sorted plastic waste, which would allow recycling facilities to specialize in post-sorting recycling processes and thereby reduce transportation costs. These potential applications illustrate how the proposed method could complement existing technologies and improve the overall efficiency of waste management systems while reducing associated costs. This integration offers both economic and environmental benefits, reinforcing the practicality of the method in addressing current challenges in plastic recycling.
4. Conclusion
To address the growing issue of plastic pollution, it is crucial to develop automated sorting systems capable of accurately sorting post-consumer plastic waste by material type. In this study, we focused on THz spectroscopy as a spectroscopic method for material identification and demonstrated that transparent and black post-consumer plastic food containers and packaging waste can be identified with high accuracy, particularly for materials such as PET and PS. Notably, while it is challenging to distinguish between transparent PET and PS using only NIR spectroscopy, our findings show that THz waves facilitate this distinction. This confirms the effectiveness of combining NIR and THz spectroscopy for material identification.
Our identification algorithm is based on the XGBoost algorithm and achieved a high precision score exceeding 0.90, indicating that over 90% of these materials were correctly identified. Although XGBoost requires extensive hyperparameter tuning before training, we employed Bayesian optimization to automate the process, minimizing human intervention and efficiently optimizing parameters. This approach reduces the cost and effort required to build identification models.
Furthermore, by using XAI, we evaluated the impact of various spectroscopic methods on identification accuracy, revealing that the key frequencies differ depending on the material. NIR proved effective for distinguishing black plastics from transparent ones, while THz transmittance at 0.140 THz for PS and 0.075 THz for PET was critical for accurately identifying transparent plastics. These findings emphasize the need to select appropriate spectroscopic methods and frequencies for sorting other materials.
Further study can build on our findings by incorporating additional factors, such as plastic thickness, to improve identification accuracy. Developing technologies that can efficiently measure the thickness of each plastic item, even when sorting large volumes of plastic waste, will enable the integration of thickness data with spectroscopic data, enhancing the accuracy of the method for large-scale plastic waste management applications. Additionally, the effectiveness of our method should be further validated with additional samples of the materials analyzed in this study, as well as tested with a broader range of plastic materials and container colors, including multi-material plastics composed of different materials for the body and lid, and emerging materials such as bioplastics.
Furthermore, it is important to explore the practical implementation of this technology, including its integration into existing sorting processes, its potential role in early-stage sorting at resource recovery stations, and its ability to reduce transportation and processing costs. Future studies should assess the economic feasibility and environmental impact of the method by evaluating installation costs, sorting accuracy, and its overall contribution to waste management systems. These assessments will provide valuable insights into the applicability of our method as part of a sustainable plastic recycling system.
CRediT authorship contribution statement
Kazuaki Okubo: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Gaku Manago: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Tadao Tanabe: Writing – review & editing, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization. Jeongsoo Yu: Writing – review & editing, Supervision, Resources, Methodology, Funding acquisition. Xiaoyue Liu: Writing – review & editing, Visualization, Funding acquisition. Tetsuo Sasaki: Writing – review & editing, Supervision, Methodology, Investigation.
February 19, 2025 at 09:18PM
https://ift.tt/1tYvlPi