R-Programming with Machine Learning and Data Science Tools
  • Operating systems: Ubuntu 14.04 LTS / Windows

  • IDE: RStudio

  • Databases: PostgreSQL / MySQL / SQLite

Software Requirements
S. No. | Libraries in R | Type | Description
1 | ggplot2, googleVis, corrplot, lattice, ggfortify, ggrepel, ggalt, ggtree, ggtech, ggplot2 Extensions, rgl, Cairo, extrafont, showtext, animation, gganimate, misc3d, xkcd, imager, hrbrthemes, waffle, dendextend, r2d3, patchwork | Data Visualization | Visualizing graphs with scales and layers, combining multiple plots, and arranging complex plot layouts using mathematical operators
2 | plyr, dplyr, data.table, lubridate, reshape2, readr, haven, tidyr, broom, rlist, jsonlite, ff, stringi, stringr, bigmemory, fuzzyjoin, tidyverse | Data Manipulation | Supporting consistent, fast, and portable text processing and handling complex data formats such as date-times and time spans
3 | MissForest | Missing Value Imputation | Imputing mixed-type data (continuous and/or categorical) in parallel
4 | MissMDA | Missing Value Imputation | Handling missing values in large and complex datasets with multivariate analysis
5 | Outliers | Outlier Detection | Providing a set of tests and functions to detect outliers
6 | Extreme Values in R (EVIR) | Outlier Detection | Estimating extreme quantiles using functions for block maxima, exploratory data analysis, peaks over thresholds, GEV/GPD distributions, and point processes
7 | Features | Feature Selection | Extracting features such as mean value, local maxima and minima, first and second derivatives, and noise from discretely sampled functional data
8 | Regularized Random Forest (RRF) | Feature Selection | Selecting features based on random forests
9 | FactoMineR | Feature Selection | Providing exploratory data analysis methods, including Principal Component Analysis (PCA), Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), hierarchical cluster analysis, and multiple factor analysis
10 | Canonical Correlation Analysis (CCA) | Feature Selection | Performing significance tests such as Monte Carlo and asymptotic tests
11 | Companion to Applied Regression (CAR) | Continuous Regression | Producing type II and type III ANOVA tables using its Anova function
12 | RandomForest | Classification, Regression | Creating a large number of decision trees for regression and classification, and assessing proximities among data points in unsupervised mode
13 | RMiner | Ordinal Regression | Supporting data mining with classification and regression methods
14 | CoreLearn | Ordinal Regression | Providing a set of classification, regression, and feature evaluation methods for datasets with ordinal features
15 | Classification And REgression Training (Caret) | Classification, Regression | Creating predictive models and optimizing the process through a set of functions
16 | BigRF | Classification, Regression | Handling very large datasets using random forest algorithms; building multiple random forests in parallel to process very large datasets effectively
17 | Clustering for Business Analytics (CBA) | Clustering | Manipulating data and efficiently computing cross distances with the help of Proximus, ROCK, and utility functions
18 | RankCluster | Clustering | Ranking multivariate data through model-based clustering
19 | forecast | Time Series | Forecasting from time series or time series models, with the method chosen by the class of the first argument
20 | Linear Time Series Analysis (LTSA) | Time Series | Modeling linear time series for simulation, forecasting, and log-likelihood computation
21 | survival | Survival Analysis | Predicting the time until a particular event occurs by creating a survival object from the variables
22 | Basta | Survival Analysis | Estimating unknown birth and death times, survival trends, and age-specific mortality through multiple Markov Chain Monte Carlo (MCMC) simulations over large numbers of records
23 | Least-Squares Means (LSMeans) | General Model Validation | Computing least-squares means for many linear, generalized linear, and mixed models
24 | Comparison | General Model Validation | Comparing a model object with a comparison object for validation
25 | RegTest | Regression Validation | Conducting the regression test for funnel plot asymmetry on 'rma' objects
26 | ACD | Regression Validation | Analyzing categorical data with complete or missing responses
27 | BinomTools | Classification Validation | Performing diagnostics for binomial regression models using a set of diagnostic methods
28 | DAIM | Classification Validation | Evaluating classification accuracy through performance measures including sensitivity, specificity, and AUC, with bootstrap and repeated k-fold cross-validation estimates
29 | ClustEval | Clustering Validation | Evaluating clusterings, individual clusters, and clustering algorithms
30 | SigClust | Clustering Validation | Assessing the significance of clustering using a statistical method
31 | PROC | Clustering Validation | Computing confidence intervals for partial Receiver Operating Characteristic (ROC) curves and comparing curves with statistical tests
32 | TimeROC | Clustering Validation | Estimating dynamic or cumulative time-dependent ROC curves
33 | plotly, ggvis, DataTables, rCharts, heatmaply, d3heatmap, DiagrammeR, dygraphs, formattable, Leaflet, MetricsGraphics, networkD3, scatterD3, rbokeh, threejs, timevis, visNetwork, wordcloud2, highcharter | HTML Widgets | Providing an interface to visualize data as plots; offering numerous chart types with a simple syntax
34 | knitr, rmarkdown, slidify, tinytex, xtable, rapport, Sweave, texreg, checkpoint, brew, ReporteRs, bookdown, ezknitr, drake | Reproducible Research | Supporting conversion between various formats and reproducible report templates
35 | mlr | Machine Learning | Providing a set of classification and regression techniques; comprising generic resampling, filter and wrapper methods, hyperparameter tuning methods, and so on
36 | eXtreme Gradient Boosting package (Xgboost) | Learning and Prediction | Supporting regression, classification, and ranking objective functions
37 | gbm | Regression Methods | Supporting generalized boosted regression modeling and estimating the optimal number of iterations through the out-of-bag estimator
38 | Prophet | Time Series | Forecasting time series data based on non-linear trends and handling outliers, missing data, and shifts in trend
39 | Quality Control Chart (QCC) | Quality Control | Plotting Operating Characteristic (OC) curves, Pareto charts, multivariate charts, cause-and-effect charts, and Shewhart charts for attribute, count, and continuous data
40 | shiny, shinyjs, RCurl, curl, httr, httpuv, XML, rvest, OpenCPU, Rfacebook, RSiteCatalyst, plumber | Web Technologies and Services | Providing client interfaces for easily accessing web pages
41 | parallel, Rmpi, future, SparkR, DistributedR, ddR, sparklyr, batchtools | Parallel Computing | Providing a parallel and interactive computing environment
42 | Rcpp, Rcpp11, compiler | High Performance | Providing integration between different programming languages
43 | rJava, jvmr, rJython, rPython, runr, RJulia, JuliaCall, RinRuby, R.matlab, RcppOctave, RSPerl, V8, htmlwidgets, rpy2 | Language API | Providing interfaces to other programming languages
44 | RODBC, DBI, elastic, mongolite, odbc, RMariaDB, RMySQL, ROracle, RPostgreSQL, RSQLite, RJDBC, rmongodb, rredis, RCassandra, RHive, RNeo4j, rpostgis | Database Management | Providing interfaces for accessing databases
45 | AnomalyDetection, ahaz, arules, bigrf, bigRR, bmrm, Boruta, BreakoutDetection, bst, CausalImpact, C50, caret, CORElearn, CoxBoost, Cubist, e1071, earth, elasticnet, ElemStatLearn, evtree, forecast, forecastHybrid, prophet, FSelector, frbs, GAMBoost, gamboostLSS, gbm, glmnet, glmpath, GMMBoost, grplasso, grpreg, h2o, hda, ipred, kernlab, klaR, kohonen, lars, lasso2, LiblineaR, lme4, LogicReg, maptree, mboost, mlr, mvpart, MXNet, ncvreg, nnet, oblique.tree, pamr, party, partykit, penalized, penalizedLDA, penalizedSVM, quantregForest, randomForest, randomForestSRC, ranger, rattle, rda, rdetools, REEMtree, relaxo, rgenoud, rgp, Rmalschains, rminer, ROCR, RoughSets, rpart, RPMM, RSNNS, Rsomoclu, RWeka, RXshrink, sda, SDDA, SuperLearner, subsemble, svmpath, tgp, tree, varSelRF, xgboost | ML | Learning from high-dimensional and large-scale data; analyzing, manipulating, and representing patterns and transaction data
46 | text2vec, tm, openNLP, koRpus, zipfR, NLP, LDAvis, topicmodels, syuzhet, SnowballC, quanteda, MonkeyLearn, tidytext, utf8 | Natural Language Processing | Analyzing a set of documents using text mining tools; supporting natural language text processing in different languages
47 | coda, mcmc, MCMCpack, R2WinBUGS, BRugs, rjags, rstan | Bayesian | Providing interfaces for Bayesian analysis
48 | lpSolve, minqa, nloptr, ompr, Rglpk, ROI | Optimization | Solving optimization problems, including integer, linear, mixed-integer, transportation, and assignment problems
49 | quantmod, TTR, PerformanceAnalytics, zoo, xts, tseries, fAssets | Finance | Building technical trading rules; building, trading, and analyzing quantitative financial trading strategies
50 | Bioconductor, genetics, gap, ape, pheatmap | Bioinformatics and Biostatistics | Offering control over heatmap appearance and dimensions; analyzing genetic data and evolution
51 | igraph, network, sna, netdiffuseR, networkDynamic, ndtv, statnet, ergm, latentnet, tnet, rgexf, visNetwork | Network Analysis | Visualizing network data and handling large graphs efficiently through statistical analysis
52 | magick, imager | Image Processing | Supporting different image manipulations and a variety of image formats; processing images with up to four dimensions quickly
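
Since the list of libraries is long, it can help to check which of them are already installed before starting. The following is a minimal sketch using only base R. The `required_pkgs` vector is a hypothetical subset of the table chosen for illustration, and `check_packages` is a helper name introduced here, not a function from any package.

```r
# Hypothetical subset of the packages listed in the table; extend as needed.
required_pkgs <- c("ggplot2", "dplyr", "randomForest", "survival", "parallel")

# Illustrative helper (not part of any package): returns a data frame with
# one row per package and a logical `installed` flag.
check_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  data.frame(
    package = pkgs,
    installed = unname(installed),
    stringsAsFactors = FALSE
  )
}

status <- check_packages(required_pkgs)
print(status)

# Names of packages that still need to be installed (possibly empty).
missing_pkgs <- status$package[!status$installed]

# Uncomment to install the missing packages from CRAN:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The install step is left commented out so the check has no side effects. Some entries in the table, such as Bioconductor, are not single CRAN packages and would need their own installation route.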
