Analytical Chemist: Just Published: Chemometrics and Intelligent Laboratory Systems

A new issue of this journal has just been published. To see abstracts of the papers it contains (with links through to the full papers) click here:

Chemometrics and Intelligent Laboratory Systems
http://rss.sciencedirect.com/publication/science/5232

Selected papers from the latest issue:

Gravitational search algorithm: A new feature selection method for QSAR study of anticancer potency of imidazo[4,5-b]pyridine derivatives

07 February 2013, 10:01:13

15 March 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 122

Choosing the most suitable subset of descriptors from a large number of structural parameters is one of the most important and challenging steps in quantitative structure–activity relationship (QSAR) studies. Many feature selection algorithms have been applied in these studies, but none of them is generally applicable. In this study, a binary version of the gravitational search algorithm (GSA) is developed and coded as a novel feature selection method for QSAR studies. The GSA is applied as a descriptor selection tool for anticancer potency modeling of a set of 65 imidazo[4,5-b]pyridine derivatives. The GSA-selected descriptors were subjected to Bayesian regularized artificial neural networks to model the anticancer potency. The generated model satisfactorily describes the experimental variation in the biological activity of the data set compounds. The results of external validation ($R_v^2 = 0.98$) and internal cross-validation tests ($Q_{LOO}^2 = 0.94$, $R_{L4O}^2 = 0.93$, $R_{L8O}^2 = 0.92$), in conjunction with Y-randomization, confirm the predictive ability, robustness and effectiveness of the generated model. In addition, a comparison between the GSA and a genetic algorithm (GA) indicates that the GSA has certain advantages over the GA.

Highlights

► Gravitational search algorithm (GSA) is developed and coded for QSAR studies. ► Anticancer potency of 65 imidazo[4,5-b]pyridine derivatives is investigated. ► The GSA is applied as a descriptor selection tool for anticancer potency modeling. ► BR-ANN is used to model the anticancer potency using GSA-selected descriptors. ► Comparison between GSA and GA indicates that GSA has certain merit over the GA.
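
For readers unfamiliar with the leave-one-out statistic quoted in this abstract, here is a minimal sketch of how a Q² (LOO) value is commonly computed, assuming the usual definition Q² = 1 - PRESS / total sum of squares. The single-descriptor straight-line model and the numbers are made up for illustration; the paper itself uses Bayesian regularized neural networks on GSA-selected descriptors, which this sketch does not attempt to reproduce.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x for a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_loo(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / tss


# Made-up illustration data, not taken from the paper.
descriptor = [0.1, 0.4, 0.5, 0.9, 1.3, 1.6]
activity = [1.1, 1.9, 2.0, 3.1, 3.8, 4.3]
print(round(q2_loo(descriptor, activity), 3))
```

Leave-4-out and leave-8-out variants follow the same pattern with larger held-out groups.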

Validation of soft classification models using partial class memberships: An extended concept of sensitivity & co. applied to grading of astrocytoma tissues

07 February 2013, 10:01:13

15 March 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 122

We use partial class memberships in soft classification to model uncertain labeling and mixtures of classes. Partial class memberships are not restricted to predictions, but may also occur in reference labels (ground truth, gold standard diagnosis) for training and validation data. Classifier performance is usually expressed as fractions of the confusion matrix, such as sensitivity, specificity, and negative and positive predictive values. We extend this concept to soft classification and discuss the bias and variance properties of the extended performance measures. Ambiguity in reference labels translates to differences between best-case, expected and worst-case performance. We show a second set of measures comparing expected and ideal performance, which is closely related to regression performance measures, namely the root mean squared error (RMSE) and the mean absolute error (MAE). All calculations apply to classical crisp as well as to soft classification (partial class memberships as well as one-class classifiers). The proposed performance measures allow classifiers to be tested with actual borderline cases. In addition, hardening of e.g. posterior probabilities into class labels is not necessary, avoiding the corresponding information loss and increase in variance. We implemented the proposed performance measures in the R package “softclassval”, which is available from CRAN and at softclassval.r-forge.r-project.org. Our reasoning, as well as the importance of partial memberships for chemometric classification, is illustrated by a real-world application: astrocytoma brain tumor tissue grading (80 patients, 37,000 spectra) for finding surgical excision borders. As borderline cases are the actual target of the analytical technique, samples which are diagnosed to be borderline cases must be included in the validation.

Highlights

► Partial class memberships model ambiguity of classification for borderline cases. ► Best-case, expected and worst-case performance, relation to weighted RMSE and MAE ► Better variance properties than usual crisp classifier performance ► Example application: brain tumor grading for intra-surgical guidance ► Implementation available as R package softclassval
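
To make the idea of a "soft" sensitivity concrete, the sketch below computes sensitivity for one class when both reference labels and predictions are memberships between 0 and 1. It assumes three fuzzy AND operators (minimum, product, and max(r + p - 1, 0)) as stand-ins for the best-case, expected and worst-case measures mentioned in the abstract. Whether these match the paper's exact definitions is an assumption, and this is not the softclassval API.

```python
def soft_sensitivity(reference, predicted, and_op):
    """Sensitivity for one class with memberships in [0, 1].

    Each sample contributes and_op(r, p) to the soft "true positive" count,
    which is normalised by the total reference membership of the class.
    """
    hits = sum(and_op(r, p) for r, p in zip(reference, predicted))
    return hits / sum(reference)


def strong_and(r, p):
    # Optimistic: the largest overlap compatible with memberships r and p.
    return min(r, p)


def product_and(r, p):
    # Expected overlap if reference and prediction are treated as independent.
    return r * p


def weak_and(r, p):
    # Pessimistic: the smallest overlap compatible with memberships r and p.
    return max(r + p - 1.0, 0.0)


# Made-up memberships: one clear case, one borderline case, one non-member.
reference = [1.0, 0.5, 0.0]
predicted = [0.8, 0.5, 0.1]
best = soft_sensitivity(reference, predicted, strong_and)
expected = soft_sensitivity(reference, predicted, product_and)
worst = soft_sensitivity(reference, predicted, weak_and)
print(best, expected, worst)
```

With crisp labels (only 0 and 1) all three operators agree, so the gap between them only opens up for borderline samples.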

Use of multivariate chemometric algorithms on 1H NMR data to assess a soluble fiber (Plantago ovata husk) nutritional intervention

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

Nutritional interventions in humans are difficult to assess because the induced metabolic changes are smaller than the natural biological variability between subjects. Due to its holistic approach, ¹H NMR is one of the preferred technologies for this type of study, even though it has a very low sensitivity. This work shows how the use of several chemometric algorithms on the measured data compensates for these drawbacks and allows the effects of the nutritional intervention to be studied in isolation from the natural variability inherent to human studies. Mild to moderate hypercholesterolemic patients received either placebo or soluble fiber in a low saturated fat diet. Plasma samples were collected at week 0 and week 8. Spectra obtained with NMR equipment were processed with ANOVA simultaneous component analysis (ASCA). The application of clustering techniques revealed different responses based on the patient's basal state, which allowed responders to be distinguished from non-responders. Results showed a triglyceride level reduction of up to 15% (p=0.0032), with a higher reduction for those patients with a higher initial lipid profile. Moreover, line-shape fitting techniques applied to the NMR spectra allowed the conclusion that LDL (and VLDL) lipoprotein particles, and more noticeably triglycerides, moved to a profile configuration associated with lower cardiovascular risk. Results shed light on some of the metabolic modifications that husk fiber induces in humans which could not be seen with more conventional data analysis approaches. Our conclusion is that by using the right chemometric techniques it is possible to assess nutritional intervention effects in human NMR studies despite the low sensitivity and selectivity that the technique offers today.

Highlights

► Mild to moderate hypercholesterolemic patients received either placebo or Po-husk. ► ASCA discerned induced metabolic changes from natural variability between subjects. ► Our study revealed different responses that depended on the patient's basal state. ► Spectral line shape fitting algorithms helped diagnose metabolic syndrome. ► Results showed a triglyceride level reduction of up to 15% (p=0.0032).
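
The first step of ASCA is an ANOVA-style split of the data matrix into an overall mean, an effect matrix per design factor, and residuals. A component analysis is then run on each effect matrix. The sketch below shows only that split, for a single factor and a balanced design. The tiny matrix and the group labels are invented for illustration, and the component analysis step is left out.

```python
def column_means(rows):
    """Mean of every column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def asca_partition(X, groups):
    """Split X into overall mean, one factor's effect matrix, and residuals.

    X      : samples x variables (e.g. spectra as rows)
    groups : factor level of each sample (e.g. treatment arm)
    """
    grand = column_means(X)
    level_means = {}
    for level in set(groups):
        members = [row for row, g in zip(X, groups) if g == level]
        level_means[level] = column_means(members)
    # Effect matrix: each row holds its level mean minus the overall mean.
    effect = [[m - c for m, c in zip(level_means[g], grand)] for g in groups]
    # Residuals: what is left after removing the mean and the factor effect.
    residual = [[x - c - e for x, c, e in zip(row, grand, eff)]
                for row, eff in zip(X, effect)]
    return grand, effect, residual


# Made-up two-variable "spectra" for four subjects.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 10.0]]
groups = ["placebo", "placebo", "fiber", "fiber"]
grand, effect, residual = asca_partition(X, groups)
print(grand, effect, residual)
```

The residual matrix carries the between-subject variability, which is what lets the treatment effect be examined separately.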

An investigation on hydrogen bonding between 3-methylindole and ethanol using trilinear decomposition of fluorescence excitation–emission matrices

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

The multi-state fluorescence characteristics of 3-methylindole (MI) make its spectra rich in chemical information and the spectral interpretation rather challenging. The trilinear decomposition method could be appropriate for this task and provide a deeper insight into the hydrogen bonding to MI. Taking the excitation fluorescence spectra together with the emission counterparts to formulate a three-way data array and solving the data array using the Alternating Trilinear Decomposition (ATLD) algorithm is beneficial for studying hydrogen bonding to MI in several respects. Firstly, making full use of the excitation spectra could guarantee that the experimentally collected data contain sufficient information for investigating signals originating from the weak interactions buried in the strong interaction background. Secondly, the resolution of a three-way data array could theoretically guarantee the uniqueness of the resolved component spectra with actual physical meaning. Thirdly, the ATLD algorithm resolves the spectra of complex mixtures and determines the spectra of the corresponding individual components in different states without disturbing the complex chemical equilibrium involved. The hydrogen bonding interaction of MI with other molecules has been studied using the ATLD algorithm. A detailed investigation has been undertaken for the ¹La and ¹Lb states as the lowest excited singlet states, which dominate the fluorescence emission of MI depending on the effect of other molecules and the surrounding microenvironment. The hydrogen bonding between indole derivatives and other molecules has been examined, and some association constants involving hydrogen bond formation have been estimated and compared with theoretical simulation results or experimental observations of previous researchers.

Highlights

► This paper implemented an in-situ interpretation of hydrogen bonding. ► Three types of hydrogen bond interaction were analyzed simultaneously. ► The hydrogen bond interactions were quantitatively detected for the first time. ► The trilinear decomposition method makes effective use of multi-state fluorescence.
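
The trilinear model behind this kind of decomposition says that every element of the three-way array is a sum over components of an excitation term, an emission term and a concentration term. The sketch below builds such an array from known profiles. It is the forward model only, not the ATLD algorithm, and the profile values are made up.

```python
def trilinear_array(A, B, C):
    """Build x[i][j][k] = sum over n of A[i][n] * B[j][n] * C[k][n].

    A : excitation profiles      (I wavelengths x N components)
    B : emission profiles        (J wavelengths x N components)
    C : relative concentrations  (K samples     x N components)
    """
    n_comp = len(A[0])
    return [[[sum(A[i][n] * B[j][n] * C[k][n] for n in range(n_comp))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Made-up two-component example with two excitation and two emission channels.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [2.0, 0.0]]
x = trilinear_array(A, B, C)
print(x)
```

A trilinear decomposition works in the opposite direction: it starts from the measured array and estimates A, B and C.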

Statistical process monitoring via generalized non-negative matrix projection

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

As a well-known dimension reduction technique, non-negative matrix factorization (NMF) has been used in diverse scientific fields since its introduction. In this work, we propose a new statistical monitoring method based on the NMF framework. Considering that projection is routinely used in conventional methods such as principal component analysis (PCA), a new variant of NMF based on positively constrained projections is presented here. This algorithm also relaxes the non-negativity restriction on the original data; hence it can be called generalized non-negative matrix projection (GNMP). We then use GNMP to extract the latent variables that drive a process and combine them with process monitoring techniques for fault detection. Kernel density estimation (KDE) is adopted to calculate the confidence limits of the defined monitoring statistics. In addition, corresponding contribution plots are defined for fault isolation. The proposed method is then applied to the Tennessee Eastman process to evaluate its monitoring performance. The experimental results clearly illustrate the feasibility of the proposed method.

Highlights

► Propose a new variant named generalized non-negative matrix projection (GNMP). ► Define the monitoring metrics and adopt KDE to calculate the confidence limits. ► Define the contribution plots for the monitoring indices, respectively. ► Apply TE process for evaluating the monitoring performance.
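
As a rough illustration of how KDE can turn a monitoring statistic into a confidence limit, the sketch below fits a Gaussian kernel density estimate to statistic values from normal operation and finds the value below which 99% of the estimated probability mass lies. The Gaussian kernel, the Silverman rule-of-thumb bandwidth and the sample values are assumptions for this sketch and are not taken from the paper.

```python
import math
import statistics


def kde_cdf(x, samples, h):
    """CDF of a Gaussian kernel density estimate evaluated at x."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)


def kde_control_limit(samples, confidence=0.99):
    """Value at which the estimated CDF reaches `confidence` (bisection)."""
    # Silverman's rule-of-thumb bandwidth: an assumption, not from the paper.
    h = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    lo, hi = min(samples) - 10 * h, max(samples) + 10 * h
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


# Made-up monitoring statistic values from fault-free operation.
samples = [0.5, 0.8, 1.1, 0.9, 1.4, 0.7, 1.0, 1.2, 0.6, 1.3]
limit = kde_control_limit(samples)
print(limit)
```

A new observation whose statistic exceeds the limit would then be flagged as a possible fault.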

Nonlinear regression method with variable region selection and application to soft sensors

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

Selecting regions of explanatory variables, X, is a common task in many fields such as spectral analysis and process control. Genetic algorithm-based wavelength selection (GAWLS) is one of the methods used to select combinations of important X-variables using regions as the unit of selection. However, GAWLS uses partial least squares as its regression method and hence cannot handle a nonlinear relationship between X and an objective variable, y. We therefore propose a region selection method based on GAWLS and support vector regression (SVR), a nonlinear regression method. The proposed method is named GAWLS–SVR. We applied GAWLS–SVR to simulation data and industrial polymer process data, and confirmed that predictive, easy-to-interpret, and appropriate models were constructed using the proposed method.

Highlights

► Selecting regions of explanatory variables (X) is attempted in many fields. ► A traditional method cannot handle a nonlinear relationship between variables. ► Our goal is to select appropriate X-variable regions and construct a nonlinear model. ► We proposed a new variable region selection method with support vector regression. ► The performance of the proposed method was confirmed with a variety of data sets.
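
Region-based selection differs from ordinary variable selection in that the genetic algorithm picks contiguous blocks of variables instead of individual ones. One possible way to encode that is sketched below: each gene is a hypothetical (start, width) pair that is decoded into column indices before a regression model is fitted. The encoding is an assumption for illustration and may differ from the one used in GAWLS or GAWLS–SVR.

```python
def decode_regions(chromosome, n_variables):
    """Turn (start, width) genes into a sorted list of selected indices."""
    selected = set()
    for start, width in chromosome:
        # Clip each region at the last available variable.
        for idx in range(start, min(start + width, n_variables)):
            selected.add(idx)
    return sorted(selected)


def select_columns(X, indices):
    """Keep only the selected X-variables before fitting a regression model."""
    return [[row[i] for i in indices] for row in X]


# Two overlapping regions on an eight-variable data set.
indices = decode_regions([(2, 3), (4, 2)], n_variables=8)
print(indices)
```

In a full method, the fitness of each chromosome would be the cross-validated performance of the regression model built on the decoded columns.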

Product quality modelling and prediction based on wavelet relevance vector machines

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

In order to predict product quality and optimize the production process, product quality models need to be built. However, there are complex nonlinear relationships among the product quality parameters and the production process variables. Common methods cannot model the production process with high accuracy, nor can they provide prediction intervals. In contrast, kernel methods transform the original input data into a feature space via a kernel function, so that linear methods can then be used to solve the nonlinear problem accurately. Moreover, the relevance vector machine, as a kernel method, can give prediction intervals, and a wavelet kernel inherits the ability of local analysis and feature extraction from the wavelet function. Product quality models based on the wavelet relevance vector machine are proposed in this paper. A simulation data set, two chemistry data sets and a real field data set of zinc coating weights from strip hot-dip galvanizing are used to validate the model. The results demonstrate that the model based on wavelet relevance vector machines has a higher prediction precision than common methods such as partial least squares (PLS), orthogonal signal correction-partial least squares (OSC-PLS), Quadratic-PLS, kernel partial least squares (KPLS), orthogonal signal correction-kernel partial least squares (OSC-KPLS), least squares-support vector machines (LS-SVM) and ordinary relevance vector machines (RVM). The prediction intervals are also given by the presented model. Mexican, Morlet and Difference of Gaussian (DOG) wavelet relevance vector machines (WRVMs) for multi-group data show superior prediction performance compared to the other methods mentioned above.

Highlights

► A product quality model based on wavelet relevance vector machine (WRVM) is proposed. ► Wavelet relevance vector machine model can give the exact prediction interval. ► Zinc coating weights from strip hot-dip galvanizing are predicted to validate the model. ► WRVM has higher prediction precision than PLS, Q-PLS, KPLS, SVM and RVM.
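
A wavelet kernel can be built by evaluating a mother wavelet on the scaled difference between two inputs and multiplying across input dimensions. The sketch below shows this translation-invariant form for Mexican hat and Morlet-type wavelets. The product form, the dilation parameter and the 1.75 frequency constant are common textbook choices assumed here, not details confirmed by the abstract.

```python
import math


def mexican_hat(u):
    """Mexican hat mother wavelet, up to a normalising constant."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def morlet(u):
    """Real-valued Morlet-type mother wavelet (1.75 is an assumed constant)."""
    return math.cos(1.75 * u) * math.exp(-0.5 * u * u)


def wavelet_kernel(x, z, mother=mexican_hat, dilation=1.0):
    """Translation-invariant wavelet kernel: product over input dimensions."""
    k = 1.0
    for xd, zd in zip(x, z):
        k *= mother((xd - zd) / dilation)
    return k


# Made-up process variable vectors.
x = [0.2, 1.5, 3.0]
z = [0.4, 1.0, 2.5]
print(wavelet_kernel(x, z), wavelet_kernel(x, z, mother=morlet))
```

Such a kernel can be dropped into any kernel method in place of the usual Gaussian kernel; the dilation plays a role similar to the Gaussian width.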

Automatic image-based estimation of texture analysis as a monitoring tool for crystal growth

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

Online monitoring and feedback control are crucial elements in a commercial crystallization operation because they ensure that key production variables are closely regulated so as to achieve specified textural and physical properties of the end-product. Digital image texture analysis is a promising method for monitoring and control systems, and is becoming increasingly attractive due to the availability of high-speed imaging devices and equally powerful computers. This paper investigates the use of texture analysis in the form of fractal dimension (FD) and energy signatures as characteristic parameters to track crystal growth. The methodology deals with issues such as touching and overlapping crystals in images, which limit available off-line and on-line imaging techniques. The algorithm uses a combination of thresholding and wavelet-texture analysis. The thresholding method is used to identify crystal clusters and remove empty backgrounds. Wavelet–fractal and energy signatures are then computed to estimate texture on the crystal clusters. A series of images obtained at different crystal growth stages in a NaCl–water–ethanol anti-solvent crystallization system is investigated, and their texture characteristics, as well as how these change during the crystallization process, are evaluated.

Highlights

► Fractal dimension and energy signatures as parameters to track crystal growth. ► Uses combination of thresholding and wavelet-texture algorithms. ► Thresholding method identifies crystal clusters and removes empty backgrounds. ► Wavelet-fractal and energy signatures estimate texture on crystal clusters. ► Validated for anti-solvent crystallization.
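
To give a feel for what a wavelet energy signature is, the sketch below applies a single-level 2D Haar transform to a small grayscale image and reports the mean squared coefficient of each subband. The Haar wavelet, the single decomposition level and the mean-squared definition of "energy" are assumptions made for brevity. The paper's exact wavelet and signature definition are not given in the abstract.

```python
def haar_energy_signature(image):
    """One-level 2D Haar transform of an even-sized grayscale image.

    Returns the mean squared coefficient of each subband.
    """
    bands = {"average": 0.0, "left_right": 0.0,
             "top_bottom": 0.0, "diagonal": 0.0}
    rows, cols = len(image), len(image[0])
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            bands["average"] += ((a + b + d + e) / 2.0) ** 2
            bands["left_right"] += ((a - b + d - e) / 2.0) ** 2
            bands["top_bottom"] += ((a + b - d - e) / 2.0) ** 2
            bands["diagonal"] += ((a - b - d + e) / 2.0) ** 2
    n_blocks = (rows // 2) * (cols // 2)
    return {name: total / n_blocks for name, total in bands.items()}


# A flat patch has no texture; a checkerboard patch has diagonal detail only.
flat = haar_energy_signature([[3.0, 3.0], [3.0, 3.0]])
checker = haar_energy_signature([[1.0, 0.0], [0.0, 1.0]])
print(flat, checker)
```

Tracking how the detail-band energies change from image to image is one way such signatures could follow crystal growth.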

Detection of Alzheimer's disease by Raman spectra of rat's platelet with a simple feature selection

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

A novel method using feature selection is proposed to classify Alzheimer's disease using Raman spectra. The method first finds all the significant peaks in the preprocessed spectrum as the feature candidates for classification. We select the most discriminative peak as a reference feature and compute the correlation coefficients between the reference and every other chosen peak. Then we discard highly correlated features to reduce the number of feature candidates. Using the peak values of the remaining features and their ratios, we carry out preliminary classification experiments and examine the top 10% of cases to find the most frequently appearing features. Among them, we choose the top two features: the intensity at 1658 cm⁻¹ and the ratio of the intensities at 757 and 743 cm⁻¹. These features correspond to protein bands of the Amide I mode and cytochrome c, which are also considered important for the detection of Alzheimer's disease by other researchers. Classification using 278 spectra achieved a classification rate of 95.8% for an MLP (multi-layer perceptron) with these two features. This confirms that the features chosen with the proposed method can be used effectively for the diagnosis of Alzheimer's disease.
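
The correlation-based pruning step described in this abstract can be sketched as follows: pick a reference peak, compute its Pearson correlation with every other candidate peak across the spectra, and drop candidates that are too strongly correlated with it. The 0.9 threshold, the peak labels and the intensity values are made up for illustration, and the abstract does not state the actual cut-off.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(peaks, reference, threshold=0.9):
    """Keep the reference peak plus peaks whose |r| with it is below threshold.

    `peaks` maps a peak label to its intensities across all spectra.
    """
    kept = [reference]
    for name, values in peaks.items():
        if name == reference:
            continue
        if abs(pearson(peaks[reference], values)) < threshold:
            kept.append(name)
    return kept


# Hypothetical peak intensities over four spectra.
peaks = {
    "ref": [1.0, 2.0, 3.0, 4.0],
    "near_ref": [2.0, 4.0, 6.0, 8.1],   # almost a scaled copy of "ref"
    "other": [3.0, 1.0, 4.0, 1.0],
}
kept = prune_correlated(peaks, "ref")
print(kept)
```

The surviving peaks, and ratios between them, would then go forward as candidate features for the classifier.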

Development of a new SMP model satisfying all known physical constraints in environmental application

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

A new Solver for Mixture Problem (SMP) model has been developed that satisfies all known fundamental natural physical constraints identified in environmental applications. Previously, nonnegative matrix factorization (NMF) models were developed and applied successfully in chemometrics applications and aerosol source apportionment studies. NMF models are based on nonnegativity constraints on loadings and scores, which are equivalent to source compositions and source apportionments, respectively, in aerosol source apportionment studies. In environmental applications of aerosol source apportionment, however, more physical constraints must be satisfied in addition to the nonnegativity constraints on loadings and scores. A new model has been developed based on alternating primal-dual interior point nonlinear programming, subject to inequality constraints representing all known fundamental physical constraints. Previous multivariate receptor models have been partial implementations; they have not been able to satisfy meaningful physical constraints when estimating both source compositions and source apportionments from ambient data. The SMP model, however, successfully estimates both source apportionments and source compositions while satisfying all physical constraints. Source compositions estimated by the SMP model can be used in other source apportionment studies using the Chemical Mass Balance (CMB) receptor model. The SMP model has been applied to an error-free data set to examine its capability of estimating source compositions and source apportionments. Two sets of simulations were conducted and the results are compared and discussed. Simulation results show that SMP-estimated pairs of source compositions and source apportionments satisfy all known physical constraints and are in good agreement with true values.

Highlights

► SMP model is based on alternating primal–dual interior point nonlinear programming. ► SMP model satisfies all physical constraints identified in environmental application. ► SMP-estimated source compositions can be used in other CMB analysis. ► SMP-estimated source profiles and contributions satisfy all identified constraints. ► The SMP model can be applied to any mixture problem.
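
The mixture model in source apportionment treats each ambient sample as a sum of source contributions multiplied by source compositions. The sketch below reconstructs ambient data from such a pair and checks two constraints: nonnegativity (named in the abstract) and, as one assumed example of an additional physical constraint, that the species mass fractions of each source do not add up to more than 1. The abstract does not list the paper's actual constraints, and the numbers are invented.

```python
def reconstruct(contributions, profiles):
    """Ambient data X = G F: (samples x sources) times (sources x species)."""
    n_species = len(profiles[0])
    return [[sum(g * profiles[s][j] for s, g in enumerate(row))
             for j in range(n_species)]
            for row in contributions]


def satisfies_constraints(contributions, profiles, tol=1e-12):
    """Check nonnegativity and that each profile's fractions sum to <= 1."""
    nonneg = (all(v >= -tol for row in contributions for v in row)
              and all(v >= -tol for row in profiles for v in row))
    # Assumed extra constraint: mass fractions of a source cannot exceed 1.
    fractions_ok = all(sum(row) <= 1.0 + tol for row in profiles)
    return nonneg and fractions_ok


# Two hypothetical sources, two measured species, one ambient sample.
profiles = [[0.5, 0.25], [0.125, 0.5]]
contributions = [[2.0, 4.0]]
ambient = reconstruct(contributions, profiles)
print(ambient, satisfies_constraints(contributions, profiles))
```

A solver for the mixture problem works backwards from the ambient data to a pair of matrices that passes checks of this kind.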

Physical and statistical model for predicting a transmembrane pressure jump for a membrane bioreactor

07 February 2013, 10:01:13

15 February 2013
Publication year: 2013
Source: Chemometrics and Intelligent Laboratory Systems, Volume 121

Membrane bioreactors (MBRs) have been widely used to purify wastewater for reuse. However, MBRs are subject to fouling, which is the phenomenon whereby foulants absorb or deposit on the membrane. After long-term operation of MBRs under a condition of constant-rate filtration, transmembrane pressure (TMP) increases rapidly. This TMP jump is one factor making the operation of MBRs difficult. We therefore propose the construction of a model that predicts the timing of a TMP jump. First, a nonlinear function for determining whether a TMP jump will happen was derived from physical knowledge. We then analyzed the nonlinear function statistically with measurement data and constructed a model that detects a TMP jump. The performance of the proposed method was confirmed through the analyses of two data sets obtained from the literature and a data set recorded for a real industrial MBR plant.

Highlights

► MBRs have been widely used to purify wastewater for reuse. ► After long-term operation of MBRs, TMP increases rapidly. ► Our goal is to detect this TMP jump with high predictive accuracy. ► We proposed a physical and statistical discriminant model of TMP jumps. ► The performance of the proposed method was confirmed with a variety of data sets.

Thursday, 7 February 2013
