This readme file was generated on 2025-03-24 by José Joaquín Peralta Abadía

GENERAL INFORMATION

Title of Dataset: MU-TCM face-milling dataset

Author/Principal Investigator Information
Name: Jose Joaquín Peralta Abadía
ORCID: https://orcid.org/0000-0003-0261-6792
Institution: Mondragon Goi Eskola Politeknikoa
Address: Loramendi, 4, 20500 Arrasate - Mondragon Gipuzkoa, Spain
Email: jjperalta@mondragon.edu

Author/Associate or Co-investigator Information
Name: Mikel Cuesta Zabaljauregui
ORCID: https://orcid.org/0000-0001-6419-5394
Institution: Mondragon Goi Eskola Politeknikoa
Address: Loramendi, 4, 20500 Arrasate - Mondragon Gipuzkoa, Spain
Email: mcuesta@mondragon.edu

Author/Associate or Co-investigator Information
Name: Felix Larrinaga Barrenechea
ORCID: https://orcid.org/0000-0003-1971-0048
Institution: Mondragon Goi Eskola Politeknikoa
Address: Loramendi, 4, 20500 Arrasate - Mondragon Gipuzkoa, Spain
Email: flarrinaga@mondragon.edu

Date of data collection: 2023-07

Geographic location of data collection: Arrasate - Mondragon Gipuzkoa, Spain

Information about funding sources that supported the collection of the data: This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 814078 and by the Department of Education, Universities and Research of the Basque Government under the projects Ikerketa Taldeak (Grupo de Ingeniería de Software y Sistemas IT1519-22 and Grupo de investigación de Mecanizado de Alto Rendimiento IT1443-22).

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: CC-BY

Links to publications that cite or use the data: https://rdcu.be/enmg7

Links to other publicly accessible locations of the data: N/A

Links/relationships to ancillary data sets: N/A

Was data derived from another source? No
If yes, list source(s):

Recommended citation for this dataset: Peralta Abadia, J. J., Cuesta Zabaljauregui, M. & Larrinaga Barrenechea, F.
MU-TCM face-milling dataset. 2025.

DATA & FILE OVERVIEW

File List: The MU-TCM face-milling dataset is composed of two compressed files: full_dataset.7z and small_subset.7z. The former is the full dataset. The latter is a smaller subset that lets users evaluate the data before downloading the much larger full dataset. The subset includes data for eight experiments, covering both materials under identical cutting conditions.

When extracted, both compressed files are organized into three main folders:
-signals_unsynced: Contains .mat files with raw internal and external signals and experiment metadata.
-signals_synced: Stores .mat files with synchronized signal data for analysis.
-VB_images: Includes subfolders for each cutting insert and edge, with annotated and unannotated images (.jpg) of tool wear measurements.

Additionally, two .csv files are included at the root level:
-signals_sync.csv provides synchronization details.
-signals_stats.csv contains cutting conditions and extracted features in time, frequency, and time-frequency domains.

Relationship between files, if important: Each .mat and image file is named based on cutting insert, edge, cutting conditions, and tool wear (VB) measurements for easy identification.

Additional related data collected that was not included in the current data package: N/A

Are there multiple versions of the dataset? No
If yes, name of file(s) that was updated:
Why was the file updated?
When was the file updated?

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: The data collection involved capturing both internal and external signals during face-milling experiments. External signals, including vibration, force, and acoustic emissions (AE), were collected via a MATLAB script on an external workstation connected to a data acquisition (DAQ) unit.
External signals were acquired at high sampling rates (50 kHz for vibration and force; 1 MHz for AE) to capture detailed information on tool-material interactions. Internal signals were collected directly from the CNC system at 250 Hz using OptiTwin's experiment designer, which also recorded process data, cutting conditions, and tool details. Tool wear was measured at the start of each experiment using a macroscope and documented with both annotated and unannotated images, providing flexibility for image analysis techniques. This wear data was stored both in image files and within the CNC software to link wear values to specific experiments.

Methods for processing the data: Due to communication constraints between the CNC machine and the external workstation, automatic synchronisation between internal and external signals was not possible. Additionally, slight delays were noted between the external signals. To address this, the signals were manually synchronised by analysing the initial and final peaks corresponding to tool entry and exit. The first peak was skipped so that synchronisation started once the tool had fully entered the workpiece. Following synchronisation, time, frequency, and time-frequency domain features were extracted from the signals to analyse machining trends and behaviours. This feature extraction supports ML model training for tool condition monitoring.

Instrument- or software-specific information needed to interpret the data: Python 3 is the recommended programming environment for working with the MU-TCM dataset, with the primary libraries being Pandas and NumPy for data handling, and Matplotlib for signal plotting and visualisation. All data pre-processing and feature extraction steps were performed using the following Python libraries:
-NumPy: For basic statistical calculations and signal processing.
-SciPy: For advanced signal processing, including frequency domain analysis.
-PyWavelets: For extracting time-frequency domain features using wavelet transforms.

Standards and calibration information, if appropriate: N/A

Environmental/experimental conditions: The design of experiments (DOE) for the MU-TCM dataset focused on industrial relevance, guided by the manufacturer’s recommended settings for the tools and materials. This DOE included eight combinations of cutting conditions. Two materials were used: cast iron (machined dry) and stainless steel (using minimum quantity lubrication, MQL). The experiments employed two cutting speeds per material (100 and 200 m/min for cast iron, and 50 and 100 m/min for stainless steel) and two feed rates per material (0.1 and 0.2 mm/rev for cast iron, and 0.05 and 0.1 mm/rev for stainless steel). In all experiments, the axial depth of cut was 1.5 mm, with a radial depth of cut of 58.4 mm. The procedure was tested with four wear levels (0.0, 0.1, 0.2, and 0.3 mm) for each combination of cutting conditions, resulting in a total of 32 experiments, which were repeated twice.

Describe any quality-assurance procedures performed on the data: The quality-assurance procedures of the MU-TCM dataset focused on signal accuracy and reliability. The first step involved manually synchronising internal and external signals due to communication limitations between the CNC machine and workstation. This was done through peak analysis to align key reference points. After synchronisation, time, frequency, and time-frequency features were extracted from the signals for use in machine learning model training. These features were then compared with VB measurements to identify correlated trends. Pearson and Spearman correlation coefficients were calculated, revealing correlations that could help monitor tool wear during machining.

People involved with sample collection, processing, analysis and/or submission: All authors conceived the experiments, J.J.P.A. and M.C.Z. conducted the experiments, and all authors analysed the results.
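
As a rough illustration of the design of experiments described under "Environmental/experimental conditions", the following Python sketch enumerates the nominal combinations of material, cutting speed, feed rate, and wear level. The dictionary layout and the names CONDITIONS, WEAR_LEVELS_MM, and nominal_design are hypothetical and are not part of the dataset. The values are copied from the description above.

```python
from itertools import product

# Cutting speeds (m/min) and feed rates (mm/rev) per material, as listed in
# this readme. The structure and names are illustrative assumptions.
CONDITIONS = {
    "cast iron": {"lubrication": "Dry", "vc": (100, 200), "feed": (0.1, 0.2)},
    "stainless steel": {"lubrication": "MQL", "vc": (50, 100), "feed": (0.05, 0.1)},
}
WEAR_LEVELS_MM = (0.0, 0.1, 0.2, 0.3)
AP_MM = 1.5  # axial depth of cut, constant in all experiments


def nominal_design():
    """Return one dict per nominal experiment (before repetitions)."""
    runs = []
    for material, cond in CONDITIONS.items():
        for vc, feed, vb in product(cond["vc"], cond["feed"], WEAR_LEVELS_MM):
            runs.append({
                "material": material,
                "lubrication": cond["lubrication"],
                "vc": vc,
                "feed": feed,
                "ap": AP_MM,
                "vb": vb,
            })
    return runs


runs = nominal_design()
# Distinct combinations of cutting conditions, ignoring the wear level.
combos = {(r["material"], r["vc"], r["feed"]) for r in runs}
print(len(combos), len(runs))  # 8 32
```

The sketch reproduces the eight combinations of cutting conditions and the 32 nominal experiments stated above. Repetitions are not modelled.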

DATA-SPECIFIC INFORMATION FOR: signals_unsynced and signals_synced folders

Number of variables: 42

Number of cases/rows: 67

Variable List:
-ae: Radial depth of cut, a float64 variable. Always has the value 58.5. Unit: mm.
-AE_F: Filtered acoustic emission signal, an array of float64 values. Unit: V.
-AE_RMS: RMS acoustic emission signal, an array of float64 values. Unit: V.
-ap: Axial depth of cut, a float64 variable. Always has the value 1.5. Unit: mm.
-A(x-y-z): Vibrations (gravitational acceleration) in three axes, an array of float64 values. Unit: g.
-CV3_(S-X-Y-Z): RMS current feedback of the spindle motor and table motors in three axes, an array of float64 values. Unit: Arms.
-Date: Date and time of the start of the experiment, a string variable in the format (yyyy-MM-dd hh:mm). Unit: N/A.
-fz: Feed rate per tooth, a float64 variable. Unit: mm/rev.
-FREAL: Actual cutting feed rate, an array of float64 values. Unit: mm/rev.
-F(x-y-z): Cutting forces in three axes, an array of float64 values. Unit: N.
-ID: Experiment identifier, a string variable. Unit: N/A.
-Insert: Cutting insert numbers, an integer variable. Unit: N/A.
-Edge: Edge number of the insert, an integer variable. Unit: N/A.
-Lubrication: Type of lubrication, a string variable with possible values "Dry" or "MQL". Unit: N/A.
-Machine: Machine make and model, a string variable. Always has the value "Lagun GVC 1000-HS". Unit: N/A.
-POS_S: Spindle angular position within a 360° rotation, an array of float64 values. Unit: °.
-POS_(X-Y-Z): Spindle position relative to the machine table in three axes, an array of float64 values. Unit: µm.
-Repetition: Number of repetitions, a string variable. Unit: N/A.
-SREAL: Actual spindle speed, an array of float64 values. Unit: RPM.
-ToolDiameter: Diameter of the tool holder, a float64 variable. Always has the value 80.0. Unit: mm.
-ToolID: Tool holder identifier, a string variable. Always has the value "Plato D80 Z1". Unit: N/A.
-ToolManufacturer: Manufacturer of the cutting insert, a string variable. Always has the value "AYMA". Unit: N/A.
-ToolMaterial: Material of the cutting insert, a string variable. Always has the value "HM". Unit: N/A.
-ToolReference: Reference of the cutting insert, a string variable. Always has the value "SPKR 1203EDSRM55 AF720". Unit: N/A.
-TV2_(S-X-Y-Z): Motor torque feedback of the spindle motor and table motors in three axes, an array of float64 values. Unit: Nm.
-TV50: Spindle motor power feedback, an array of float64 values. Unit: kW.
-TV51: Spindle motor active power, an array of float64 values. Unit: W.
-VB: Tool wear measured before the start of the experiment, a float64 variable. Unit: mm.
-Vc: Cutting speed, a float64 variable. Unit: m/min.
-WorkpieceMaterial: Material of the workpiece, a string variable. Unit: N/A.

Specialized formats or other abbreviations used: MATLAB-formatted data files (.mat extension)

DATA-SPECIFIC INFORMATION FOR: VB_images folder

Number of variables: 9 folders

Number of cases/rows: 74

Variable List:
-Insert0Edge2: Folder containing VB images for edge 2 of cutting insert 0. Contains 4 images.
-Insert0Edge3: Folder containing VB images for edge 3 of cutting insert 0. Contains 2 images.
-Insert1Edge1: Folder containing VB images for edge 1 of cutting insert 1. Contains 10 images.
-Insert2Edge1: Folder containing VB images for edge 1 of cutting insert 2. Contains 10 images.
-Insert3Edge1: Folder containing VB images for edge 1 of cutting insert 3. Contains 10 images.
-Insert4Edge1: Folder containing VB images for edge 1 of cutting insert 4. Contains 10 images.
-Insert6Edge1: Folder containing VB images for edge 1 of cutting insert 6. Contains 20 images.
-Insert9Edge1: Folder containing VB images for edge 1 of cutting insert 9. Contains 4 images.
-Insert9Edge2: Folder containing VB images for edge 2 of cutting insert 9. Contains 4 images.
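
The folder names in the list above follow the pattern Insert<number>Edge<number>. As a minimal sketch, the Python snippet below parses that pattern and cross-checks the per-folder image counts against the stated total of 74. The regular expression and the names FOLDER_COUNTS and parse_folder are illustrative assumptions, not part of the dataset, and the counts are copied from this readme rather than read from disk.

```python
import re

# Image counts per folder, copied from the variable list in this readme.
FOLDER_COUNTS = {
    "Insert0Edge2": 4,
    "Insert0Edge3": 2,
    "Insert1Edge1": 10,
    "Insert2Edge1": 10,
    "Insert3Edge1": 10,
    "Insert4Edge1": 10,
    "Insert6Edge1": 20,
    "Insert9Edge1": 4,
    "Insert9Edge2": 4,
}

# Assumed naming pattern: "Insert", digits, "Edge", digits.
FOLDER_PATTERN = re.compile(r"^Insert(\d+)Edge(\d+)$")


def parse_folder(name):
    """Split a folder name such as 'Insert9Edge2' into (insert, edge)."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# Aggregate the image counts per cutting insert.
images_per_insert = {}
for name, count in FOLDER_COUNTS.items():
    insert, _edge = parse_folder(name)
    images_per_insert[insert] = images_per_insert.get(insert, 0) + count

total = sum(FOLDER_COUNTS.values())
print(total)  # 74
```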

Specialized formats or other abbreviations used: JPEG image files (.jpg extension)

DATA-SPECIFIC INFORMATION FOR: signals_stats.csv

Number of variables: 228

Number of cases/rows: 67

Variable List:
-_file_name: Name of the signals file.
-RPM_avg: Average value calculated from the SREAL signal.
-material: Workpiece material.
-VB: Tool wear (VB) measured before the start of the experiment.
-Vc: Cutting speed (v_c) of the experiment.
-ae: Radial depth of cut (a_e) of the experiment.
-ap: Axial depth of cut (a_p) of the experiment.
-fz: Feed per tooth (f_z) of the experiment.

For each signal:
-(signal)_start: Indicates the start index of the signal used to extract the features.
-(signal)_end: Indicates the end index of the signal used to extract the features.

Time domain features:
-(signal)_max: Indicates the maximum value.
-(signal)_kurt: Indicates the kurtosis value.
-(signal)_rms: Indicates the RMS value.
-(signal)_skew: Indicates the skewness value.
-(signal)_var: Indicates the variance value.
-(signal)_ptp: Indicates the peak-to-peak value.

Frequency domain features:
-(signal)_speckurt: Indicates the spectral kurtosis value.
-(signal)_specskew: Indicates the spectral skewness value.

Time-frequency domain feature:
-(signal)_wavenergy: Indicates the wavelet energy value.

Specialized formats or other abbreviations used: CSV files (.csv extension) with semicolon (;) separators

DATA-SPECIFIC INFORMATION FOR: signals_sync.csv

Number of variables: 40

Number of cases/rows: 67

Variable List:
-_file_name: Name of the signals file.
-RPM_avg: Average value calculated from the SREAL signal.

For the internal (i) and external signals (e1 for signals at 50 kHz and e2 for signals at 1 MHz):
-(e1-e2-i)_signal: Signals selected as reference for synchronisation.
-(e1-e2-i)_start: The start index of the signals selected after synchronisation.
-(e1-e2-i)_end: The end index of the signals selected after synchronisation.
-(e1-e2-i)_peak_distance: The average distance between peaks.
-(e1-e2-i)_freq_peaks: The calculated frequency of the signals.
  This is calculated as RPM_avg / 60 * peak_dist, where peak_dist is the value of peak_distance.
-(e1-e2-i)_peak_first: The index of the first peak.
-(e1-e2-i)_peak_last: The index of the last peak.
-(e1-e2-i)_peak_qty: The quantity of peaks between start and end.
-(e1-e2-i)_peak_height: The minimum height (value) used to identify peaks, selected during the synchronisation process.
-(e1-e2-i)_peaks_value_avg: The average value of the peaks.
-(e1-e2-i)_peaks_value_max: The maximum value of the peaks.
-(e1-e2-i)_peaks_value_min: The minimum value of the peaks.

For the e1 and e2 signals:
-(e1-e2)_sec_search: Number of seconds at the start of the signal searched to find the first peak and to determine (e1-e2)_peak_height.
-(e1-e2)_sec_between_peaks: Number of seconds at the start of the signal searched to find the first peak and to determine (e1-e2)_peak_height.

Specialized formats or other abbreviations used: CSV files (.csv extension) with semicolon (;) separators
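
Because both .csv files use semicolon separators, a reader has to be told about the delimiter explicitly. The sketch below reads a small in-memory sample with Python's standard csv module and applies the freq_peaks formula exactly as written in this readme. The sample rows, the file names in them, and the expanded column name e1_peak_distance are made up for illustration. How the (e1-e2-i) placeholder expands in the real header is an assumption. If peak_distance is expressed in samples and there is one peak per spindle revolution (an interpretation, not something this readme states), the result would be in samples per second.

```python
import csv
import io

# Made-up two-row sample. The real signals_sync.csv has 40 columns and 67 rows.
# The column name "e1_peak_distance" is an assumed expansion of
# "(e1-e2-i)_peak_distance".
SAMPLE = """_file_name;RPM_avg;e1_peak_distance
example_a.mat;400.0;7500.0
example_b.mat;800.0;3750.0
"""


def read_semicolon_csv(handle):
    """Return the rows of a semicolon-separated CSV file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


def freq_peaks(rpm_avg, peak_dist):
    """Formula as written in this readme: RPM_avg / 60 * peak_dist."""
    return rpm_avg / 60 * peak_dist


rows = read_semicolon_csv(io.StringIO(SAMPLE))
for row in rows:
    value = freq_peaks(float(row["RPM_avg"]), float(row["e1_peak_distance"]))
    print(row["_file_name"], round(value, 3))
```

With Pandas, which this readme recommends for data handling, the equivalent read would be pandas.read_csv(path, sep=";"), where path points to the extracted file.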