Population selection
In this cross-sectional study, all data were drawn from the National Health and Nutrition Examination Survey (NHANES) database. NHANES is a cross-sectional survey conducted by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention using a multistage probability sampling design, which aims to assess the health and nutritional status of adults and children in the United States [12]. The survey combines interviews and physical examinations [13]. The requirement for ethical approval and informed consent of the subjects for this study was waived by the Institutional Review Board of The First Affiliated Hospital of Soochow University because the data were accessed from NHANES, a publicly available database. All methods were carried out in accordance with relevant guidelines and regulations.
In this study, we used data from four cycles of the NHANES database (1999–2000, 2001–2002, 2003–2004, and 2005–2006). Among NHANES participants, only women aged 20–54 years were asked diagnostic questions about UL (n = 6,508). Participants who met any of the following criteria were excluded: (1) women without measurement of urinary phytoestrogen concentrations; (2) women without assessment of UL; (3) women with missing information on covariates related to UL. Ultimately, 1,579 participants were included in this study (Fig. 1).
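The three exclusion criteria can be read as a simple filter over participant records. The following is a minimal sketch only: the field names (`phytoestrogens_measured`, `ul_answer`, `covariates_complete`) are hypothetical and are not NHANES variable names, and the code is not the procedure actually used to build the analytic sample.

```python
from typing import Optional, TypedDict


class Participant(TypedDict):
    # All field names are hypothetical, chosen for illustration only.
    phytoestrogens_measured: bool
    ul_answer: Optional[str]  # "Yes", "No", or None if UL was not assessed
    covariates_complete: bool


def is_eligible(p: Participant) -> bool:
    """Return True if the participant meets none of the three exclusion criteria."""
    return (
        p["phytoestrogens_measured"]             # criterion (1)
        and p["ul_answer"] in ("Yes", "No")      # criterion (2)
        and p["covariates_complete"]             # criterion (3)
    )


def select_population(participants: list[Participant]) -> list[Participant]:
    """Keep only the participants who pass every criterion."""
    return [p for p in participants if is_eligible(p)]
```

Applying the criteria jointly rather than sequentially gives the same final sample; a flow diagram such as Fig. 1 would additionally need the count removed at each step.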
Assessment of urinary phytoestrogen
Urinary phytoestrogens were assessed by measuring urinary excretion of isoflavones (daidzein, genistein, equol, and O-desmethylangolensin) and enterolignans (enterodiol and enterolactone) [14]. Urine specimens were collected in the Mobile Examination Centers and stored at -20 °C until analysis [14]. Urinary excretion was analyzed using high-performance liquid chromatography (HPLC)-tandem mass spectrometric (MS) detection in the 1999–2004 survey cycles and HPLC-atmospheric pressure photoionization-MS in the 2005–2006 cycle [15]. Among the 1,579 participants in this study, 1 was below the lower limit of detection (LOD) for daidzein (0.40 ng/mL), 9 for genistein (0.20 ng/mL), 2 for equol (0.06 ng/mL), 29 for O-desmethylangolensin (0.20 ng/mL), and none for enterodiol (0.04 ng/mL) or enterolactone (0.10 ng/mL) [16]. For results below the LOD, the value of the variable was set to the LOD divided by the square root of two (https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/PHPYPA.htm#URXDAZ). The urinary concentrations of daidzein, genistein, equol, O-desmethylangolensin, enterodiol, and enterolactone were corrected for urinary creatinine in this study. The geometric mean and tertiles of each phytoestrogen metabolite (µg/g creatinine) are presented in Supplemental Table 1.
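The below-LOD substitution and the creatinine correction can be sketched as follows. This is an illustration under stated assumptions rather than the exact processing used: the function names are hypothetical, and the unit conversion assumes urinary creatinine is reported in mg/dL, which the text does not state.

```python
import math


def substitute_below_lod(value_ng_ml: float, lod_ng_ml: float) -> float:
    """Replace a result below the LOD with LOD / sqrt(2); otherwise keep it."""
    if value_ng_ml < lod_ng_ml:
        return lod_ng_ml / math.sqrt(2)
    return value_ng_ml


def creatinine_corrected(analyte_ng_ml: float, creatinine_mg_dl: float) -> float:
    """Convert an analyte in ng/mL to ug/g creatinine.

    Assumption: creatinine is given in mg/dL. Since ng/mL equals ug/L and
    1 mg/dL equals 0.01 g/L, the ratio is 100 * analyte / creatinine.
    """
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine must be positive")
    return 100.0 * analyte_ng_ml / creatinine_mg_dl


# Example: a daidzein reading below its LOD of 0.40 ng/mL is replaced,
# then expressed per gram of creatinine (creatinine value is made up).
daidzein = substitute_below_lod(0.10, 0.40)
daidzein_ug_g = creatinine_corrected(daidzein, 120.0)
```

Dividing by creatinine adjusts for urine dilution, so that concentrations from spot urine samples are more comparable across participants.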
Assessment of uterine leiomyomata
The outcome was UL. Participants in the NHANES database were classified as having UL when they answered “Yes” to the question “Has a doctor or other health professional ever told you that you had uterine fibroids?”
Potential covariates
We extracted the following participant characteristics from the NHANES database: age (years), race/ethnicity (non-Hispanic White/non-Hispanic Black/others), marital status (married/never married/others), education level [high school and below/high school grad/general educational development (GED) or equivalent/some college or associate of arts (AA) degree/college graduate or above], poverty-to-income ratio (PIR, < 1.0/≥ 1.0), smoking status (yes/no), drinking status (yes/no), BMI (kg/m²), waist circumference (cm), cotinine (ng/mL), age at menarche (years), menopausal status (yes/no), ovary removed status (yes/no), hysterectomy (yes/no), use of female hormones (yes/no), hormones/hormone modifiers, pregnancy status (yes/no), number of gravidities, fiber (g) and total energy (kcal). PIR was classified, as in the NHANES database, as ≥ 1.0 (household income at or above the poverty line) or < 1.0 (household income below the poverty line). Smoking status and drinking status were based on participants’ self-report. BMI was calculated as weight (kg) divided by height squared (m²). Cotinine was measured in serum using isotope dilution-high performance liquid chromatography/atmospheric pressure chemical ionization tandem mass spectrometry. As with the phytoestrogens, when the result was below the LOD, the cotinine value was set to the LOD divided by the square root of two. Information on age at menarche, menopausal status, ovary removed status, use of female hormones, hormones/hormone modifiers, pregnancy status and number of gravidities was obtained from the reproductive health questionnaire. Use of female hormones was determined from self-report (“Have you/Has SP ever used female hormones such as estrogen and progesterone?”) and drug code 97–101 in the NHANES database. Use of hormones/hormone modifiers was defined according to drug codes [97–98, 97–103, 97–288, 97–295, 97–377, 97–411, 97–413, 97–414, 97–416, 97–417, 97–418, 97–420, 97–422, 97–423, 97–426, 97–495].
Statistical analysis
Given the complex sampling design of NHANES, we performed weighted analyses using the weight variables for the urinary metabolite measurements (WTSB2YR and WTSPH2YR) and the study design variables (SDMVPSU and SDMVSTRA). Continuous data were tested for normality using the Kolmogorov–Smirnov test. Normally distributed data were described as mean (standard error) [Mean (SE)] and compared between two groups using the independent-samples t-test; non-normally distributed data were described as median and quartiles [M (Q1, Q3)] and compared between groups using the Mann–Whitney U rank-sum test. Categorical data were described as number of cases and percentage [N (%)] and compared between groups using the chi-square test, and ranked data were compared using the rank-sum test. Missing values of some variables were handled by multiple imputation by chained equations based on random forests, using the miceforest package in Python (https://pypi.org/project/miceforest/). A sensitivity analysis was performed on the data before and after imputation (Supplemental Table 2). SAS (version 9.4), Python (version 3.9) and R (version 4.0) software were used for statistical analyses. P < 0.05 was considered statistically significant.
First, we performed weighted univariate logistic regression to screen covariates. Then, weighted logistic regression was used to analyze the association between each individual urinary phytoestrogen metabolite and UL. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. Finally, we adopted three statistical models, namely weighted quantile sum (WQS) regression, Bayesian kernel machine regression (BKMR), and quantile g-computation (qgcomp), to investigate the effects of the mixture of six metabolites on UL.
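For reference, an OR and its 95% CI are usually obtained by exponentiating a logistic regression coefficient and its interval limits. The sketch below assumes a Wald-type interval with a normal critical value of 1.96; the text does not state which interval type was used, and survey software often uses a design-based t quantile instead. The coefficient and standard error in the example are made up.

```python
import math


def odds_ratio_with_ci(
    beta: float, se: float, z: float = 1.96
) -> tuple[float, float, float]:
    """Return (OR, lower limit, upper limit) from a log-odds coefficient.

    Assumption: Wald interval on the log-odds scale, beta +/- z * se,
    exponentiated to the odds-ratio scale.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_with_ci(beta=0.25, se=0.10)
```

An interval that excludes 1 corresponds to P < 0.05 for the matching two-sided Wald test.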
Weighted quantile sum (WQS) regression
WQS regression was used to investigate the effects of the mixture of six metabolites on UL and to identify the predominant metabolite. The study sample was randomly divided into a training dataset (30%, n = 474) and a validation dataset (70%, n = 1,105). Exposure to each metabolite in the training dataset was first divided into tertiles. The tertile scores were then combined into a weighted sum, with an empirical weight for each metabolite in the mixture estimated using the bootstrapping method [17]. The WQS score is a combination of the six metabolites, representing the whole-body burden of six urinary phytoestrogens [10]. The weight of each metabolite in the WQS score indicates its contribution to the overall result [18]. Metabolites with an estimated weight greater than 0.333 (1/3) were considered significant contributors to the WQS score. Using 10,000 bootstrap samples from the training dataset (30%), we calculated the weights for the WQS scores. Using the validation dataset (70%), we assessed the statistical significance of the WQS scores [19]. In addition, WQS regression requires that all exposure-outcome associations be constrained to the same direction. Therefore, we estimated the positive and negative effects of the six metabolites on UL separately. The R package gWQS was used to perform the analysis.
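The construction of the WQS score from tertile scores can be sketched as below. This covers only the index itself: the weights here are supplied by hand, whereas in WQS regression they are estimated by constrained optimization over bootstrap samples (as done by gWQS). The function names and the simple empirical cut-point rule are assumptions made for illustration.

```python
from bisect import bisect_right


def tertile_scores(values: list[float]) -> list[int]:
    """Score each value 0, 1, or 2 using simple empirical tertile cut points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[n // 3], ordered[(2 * n) // 3]]
    # A value equal to a cut point is placed in the higher tertile.
    return [bisect_right(cuts, v) for v in values]


def wqs_index(
    scores: dict[str, list[int]], weights: dict[str, float]
) -> list[float]:
    """Weighted sum of tertile scores per participant.

    Weights must be non-negative and sum to 1, as in WQS regression.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(next(iter(scores.values())))
    return [sum(weights[m] * scores[m][i] for m in weights) for i in range(n)]


# Made-up concentrations for two metabolites in six participants.
scores = {
    "genistein": tertile_scores([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    "equol": tertile_scores([6.0, 5.0, 4.0, 3.0, 2.0, 1.0]),
}
index = wqs_index(scores, {"genistein": 0.75, "equol": 0.25})
```

The resulting index is then entered as a single exposure term in the outcome regression fitted on the validation dataset.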
Quantile g-computation (qgcomp) model
qgcomp is a parametric approach based on generalized linear models and g-computation that estimates the effect of increasing all exposures in the mixture by one quantile simultaneously [20]. In this study, the qgcomp.noboot function was applied to estimate exposure effects; it divides the six metabolites into tertiles and assigns a positive or negative weight to each metabolite. If the metabolites have effects in different directions, a positive (or negative) weight is interpreted as the proportion of the positive (or negative) partial effect on UL attributable to that metabolite, so the weights sum to 1 within each direction and to at most 2 in total. The relationships of each metabolite and of the mixed metabolites with the endpoint were assessed separately, and the resulting models were used to estimate the scaled effect sizes, variable-specific coefficients, and overall model fit p-values. Metabolites with an estimated weight greater than 0.05 were considered significant contributors to the qgcomp scores. The R package qgcomp was used to perform the analysis.
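How the overall mixture effect and the directional weights relate to the fitted coefficients can be sketched as follows, assuming a linear, additive model on the tertile-scored exposures (the setting handled by qgcomp.noboot). The coefficient values are hypothetical, and the function is an illustration rather than the package's implementation.

```python
def qgcomp_summary(
    betas: dict[str, float],
) -> tuple[float, dict[str, float], dict[str, float]]:
    """Return (psi, positive weights, negative weights).

    psi is the sum of the coefficients on the quantile-scored exposures,
    i.e. the change in log-odds when every exposure rises by one quantile.
    Each weight is a coefficient divided by the sum of all coefficients
    that share its sign, so weights sum to 1 within each direction.
    """
    psi = sum(betas.values())
    pos_sum = sum(b for b in betas.values() if b > 0)
    neg_sum = sum(b for b in betas.values() if b < 0)
    pos_w = {m: b / pos_sum for m, b in betas.items() if b > 0}
    neg_w = {m: b / neg_sum for m, b in betas.items() if b < 0}
    return psi, pos_w, neg_w


# Hypothetical log-odds coefficients, for illustration only.
psi, pos_w, neg_w = qgcomp_summary(
    {"genistein": 0.2, "equol": 0.3, "daidzein": -0.1}
)
```

Unlike WQS regression, this decomposition does not require all metabolites to act in the same direction, which is why positive and negative weights are reported side by side.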
Bayesian kernel machine regression (BKMR)
BKMR is a supervised approach that can identify nonlinear and nonadditive exposure-outcome associations [21]. In this study, the BKMR model was run with 10,000 iterations. Genistein, equol and enterodiol were placed in one group according to their positive correlation with UL, while daidzein, O-desmethylangolensin, and enterolactone were placed in another group according to their negative correlation with UL. The combined effect was calculated by comparing the outcome when all metabolites were fixed at a given percentile (the 60th or above) with that when all were fixed at the 50th percentile. The group posterior inclusion probability (GroupPIP) and the conditional posterior inclusion probability (CondPIP) represent the probability that each group, and each metabolite within a group, is included in the model, reflecting their contribution to the overall effect. The R package bkmr was used to perform the analysis.