Q4) Bayesian Structure Learning
For this question, you will use the “hailfinder” dataset, available from the ‘bnlearn’ R package. It contains meteorological data on 56 variables.
Use the following R code to load the hailfinder dataset:
library(bnlearn)
# load the data.
data(hailfinder)
summary(hailfinder)
The true network structure of this dataset can be plotted using the following R code.
library(bnlearn)
# create and plot the network structure.
modelstring = paste0("[N07muVerMo][SubjVertMo][QGVertMotion][SatContMoist][RaoContMoist]",
"[VISCloudCov][IRCloudCover][AMInstabMt][WndHodograph][MorningBound][LoLevMoistAd][Date]",
"[MorningCIN][LIfr12ZDENSd][AMDewptCalPl][LatestCIN][LLIW]",
"[CombVerMo|N07muVerMo:SubjVertMo:QGVertMotion][CombMoisture|SatContMoist:RaoContMoist]",
"[CombClouds|VISCloudCov:IRCloudCover][Scenario|Date][CurPropConv|LatestCIN:LLIW]",
"[AreaMesoALS|CombVerMo][ScenRelAMCIN|Scenario][ScenRelAMIns|Scenario][ScenRel34|Scenario]",
"[ScnRelPlFcst|Scenario][Dewpoints|Scenario][LowLLapse|Scenario][MeanRH|Scenario]",
"[MidLLapse|Scenario][MvmtFeatures|Scenario][RHRatio|Scenario][SfcWndShfDis|Scenario]",
"[SynForcng|Scenario][TempDis|Scenario][WindAloft|Scenario][WindFieldMt|Scenario]",
"[WindFieldPln|Scenario][AreaMoDryAir|AreaMesoALS:CombMoisture]",
"[AMCINInScen|ScenRelAMCIN:MorningCIN][AMInsWliScen|ScenRelAMIns:LIfr12ZDENSd:AMDewptCalPl]",
"[CldShadeOth|AreaMesoALS:AreaMoDryAir:CombClouds][InsInMt|CldShadeOth:AMInstabMt]",
"[OutflowFrMt|InsInMt:WndHodograph][CldShadeConv|InsInMt:WndHodograph][MountainFcst|InsInMt]",
"[Boundaries|WndHodograph:OutflowFrMt:MorningBound][N34StarFcst|ScenRel34:PlainsFcst]",
"[CompPlFcst|AreaMesoALS:CldShadeOth:Boundaries:CldShadeConv][CapChange|CompPlFcst]",
"[InsChange|CompPlFcst:LoLevMoistAd][CapInScen|CapChange:AMCINInScen]",
"[InsSclInScen|InsChange:AMInsWliScen][R5Fcst|MountainFcst:N34StarFcst]",
"[PlainsFcst|CapInScen:InsSclInScen:CurPropConv:ScnRelPlFcst]")
dag = model2network(modelstring)
par(mfrow = c(1,1))
#BiocManager::install(c("Rgraphviz"))
graphviz.plot(dag)
Use R programming, as appropriate, to answer the following questions.
4.1) Use the hailfinder dataset to learn Bayesian network structures with the hill-climbing (hc) algorithm, using two different scoring methods, namely the Bayesian Information Criterion score (BIC score) and the Bayesian Dirichlet equivalent score (BDe score), for each of the following sample sizes of the data:
a) 500 (first 500 data)
b) 2000 (first 2000 data)
c) 10000 (first 10000 data)
For each of the above cases,
• provide the scores obtained for BIC and BDe,
• plot the network structure obtained for the BIC and BDe scores.
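One way the learning step in Q4.1 might be set up is sketched below. It assumes that "first n data" means the first n rows of the data frame and that the BDe score uses an imaginary sample size of iss = 1; the names `sample_sizes` and `results` are illustrative only and are not part of the assignment.

```r
library(bnlearn)
data(hailfinder)

# Sample sizes from parts (a)-(c); each subset is assumed to be the first n rows.
sample_sizes <- c(500, 2000, 10000)

results <- lapply(sample_sizes, function(n) {
  subset_data <- hailfinder[seq_len(n), ]

  # Hill-climbing search, once per scoring method.
  net_bic <- hc(subset_data, score = "bic")
  net_bde <- hc(subset_data, score = "bde", iss = 1)  # iss = 1 is an assumed value

  list(
    n       = n,
    net_bic = net_bic,
    net_bde = net_bde,
    # Network scores evaluated on the same subset used for learning.
    bic     = score(net_bic, subset_data, type = "bic"),
    bde     = score(net_bde, subset_data, type = "bde", iss = 1)
  )
})

# Print the two scores for each sample size.
for (r in results) {
  cat("n =", r$n, " BIC =", r$bic, " BDe =", r$bde, "\n")
}

# Plotting needs the Rgraphviz package, so it is left commented out here:
# graphviz.plot(results[[1]]$net_bic)
# graphviz.plot(results[[1]]$net_bde)
```

The BIC and BDe values are on different scales, so it is probably more informative to compare the learned structures (for example, their arc counts) than the raw score values.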
4.2) Based on the results obtained for the above question (Q4.1), discuss how the BIC score compares with the BDe score for different sample sizes in terms of the structure and score of the learned network.
4.3)
a) Find the Bayesian network structures using the full dataset with both the BIC and BDe scores. Show the scores and the obtained networks.
b) Compare the networks obtained above (in Q4.3.a) for the BIC and BDe scoring methods with the true network structure. Use the “compare()” and “graphviz.compare()” functions available in the “bnlearn” R package to perform these comparisons, and comment on the results.
c) Fit the data to the network obtained using the BIC score in the above question (Q4.3.a) in order to compute the conditional probability distribution table entries (CPD table values). Show the obtained CPD table entries for the variable “CombMoisture”.
d) Use the learned network obtained above (in Q4.3.c) to find the probability:
P(CombMoisture = "Dry" | RaoContMoist = "Dry", SatContMoist = "VeryWet")
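The steps in Q4.3 could be approached roughly as in the sketch below. It is only an outline under stated assumptions: iss = 1 for the BDe score, maximum-likelihood fitting (the default of `bn.fit()`), and approximate inference by logic sampling via `cpquery()` with an arbitrarily chosen seed and sample count. The comparison with the true network runs only if `dag` has already been created with the `model2network()` code given earlier; names such as `fitted_bic` and `p_dry` are illustrative.

```r
library(bnlearn)
data(hailfinder)

# (a) Learn structures from the full dataset with each score.
net_bic <- hc(hailfinder, score = "bic")
net_bde <- hc(hailfinder, score = "bde", iss = 1)  # iss = 1 is an assumed value
cat("BIC score:", score(net_bic, hailfinder, type = "bic"), "\n")
cat("BDe score:", score(net_bde, hailfinder, type = "bde", iss = 1), "\n")

# (b) Compare with the true network, assuming `dag` exists from the earlier code.
if (exists("dag")) {
  # compare() reports true positive, false positive and false negative arcs.
  print(unlist(compare(target = dag, current = net_bic)))
  print(unlist(compare(target = dag, current = net_bde)))
  # graphviz.compare() needs the Rgraphviz package:
  # graphviz.compare(dag, net_bic, net_bde)
}

# (c) Fit the parameters of the BIC network and show the CPT of CombMoisture.
fitted_bic <- bn.fit(net_bic, hailfinder)
print(fitted_bic$CombMoisture)

# (d) Approximate the conditional probability by logic sampling.
set.seed(42)  # arbitrary seed; the estimate varies slightly between runs
p_dry <- cpquery(
  fitted_bic,
  event    = (CombMoisture == "Dry"),
  evidence = (RaoContMoist == "Dry") & (SatContMoist == "VeryWet"),
  n        = 10^5  # assumed number of samples
)
print(p_dry)
```

Because `cpquery()` is sampling-based, the returned value is an estimate. If both RaoContMoist and SatContMoist turn out to be parents of CombMoisture in the learned network, the exact value can instead be read directly from the fitted CPT printed in step (c).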