118 commits

Author SHA1 Message Date
f3a4ef01ed Add support for offline forests. 2019-11-13 17:08:31 -08:00
Joel Therrien
54af805d4d Fairly significant refactoring;
Made Forest into an abstract class with OnlineForest (in memory; same as previous)
and OfflineForest (reads individual trees only as needed). Many methods were changed.
2019-10-02 13:54:45 -07:00
79a9522ba7 Several changes -
Fixed some tests that weren't running.
Fixed a bug where training crashed if FactorCovariates had any NA
Fixed a bug where FactorCovariates were ignored in splitting if nsplit==0
Added a covariate specific option for whether splitting on an NA variable should have a penalty.

This penalty is applied by first calculating the split score and best split for a covariate
without NAs, as was done previously. Then NAs are randomly assigned, and the split score is
recalculated on that best split. The final score is the lower of the recalculated score and the original.
2019-08-28 18:07:35 -07:00
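The NA penalty described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class `NaPenaltySketch`, the `score` function (absolute difference of child means), and the choice to assign NAs to the left child with probability equal to the left child's share of non-NA rows are all assumptions made for the example. The commit only says the NAs are "randomly assigned".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NaPenaltySketch {

    // Hypothetical split score: absolute difference between child means (higher is better).
    static double score(List<Double> left, List<Double> right) {
        if (left.isEmpty() || right.isEmpty()) return 0.0;
        return Math.abs(mean(left) - mean(right));
    }

    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    // x: covariate values (NaN marks an NA), y: responses, threshold: the best split
    // already found on the non-NA rows.
    static double penalizedScore(double[] x, double[] y, double threshold, Random rng) {
        List<Double> left = new ArrayList<>();
        List<Double> right = new ArrayList<>();
        List<Double> naResponses = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i])) naResponses.add(y[i]);
            else if (x[i] <= threshold) left.add(y[i]);
            else right.add(y[i]);
        }

        // Step 1: score the split using only rows without NAs.
        double original = score(left, right);
        if (naResponses.isEmpty()) return original;

        // Step 2: randomly assign the NA rows to a child (assumed assignment rule).
        double probLeft = (double) left.size() / (left.size() + right.size());
        for (double response : naResponses) {
            if (rng.nextDouble() < probLeft) left.add(response);
            else right.add(response);
        }

        // Step 3: rescore the same split and keep the lower of the two scores.
        double withNas = score(left, right);
        return Math.min(original, withNas);
    }
}
```

Because the result is the minimum of the two scores, a covariate with NAs can never score better than it did without them under this scheme.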
Joel Therrien
c24626ff61 Upgrade to latest version of Jackson databind. 2019-08-27 14:53:49 -07:00
Joel Therrien
51696e2546 Fix how variable importance works to be tree based and not forest based 2019-08-12 14:28:31 -07:00
Joel Therrien
f1c5b292ed Add serialVersionUID to Serializable classes
This makes forests between versions more compatible if only method definitions changed.
2019-08-02 15:10:48 -07:00
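For readers unfamiliar with the mechanism, here is a small sketch of what declaring a `serialVersionUID` looks like. `LeafNode` and `roundTrip` are hypothetical names used only for illustration; they are not classes or methods from this project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialVersionSketch {

    // Hypothetical stand-in for a serializable tree node.
    static class LeafNode implements Serializable {
        // Declared explicitly so the JVM does not derive an ID from the class
        // structure, which would change whenever methods are added or removed.
        private static final long serialVersionUID = 1L;

        final double prediction;

        LeafNode(double prediction) {
            this.prediction = prediction;
        }
    }

    // Serializes a node to bytes and reads it back, as saving and loading would.
    static LeafNode roundTrip(LeafNode node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LeafNode) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the explicit field, Java computes a default ID from the class's members, so a version of the class that differs only in its methods could fail to deserialize previously saved objects with an `InvalidClassException`.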
Joel Therrien
a56ad4433d Add variable importance methods to library 2019-07-29 12:21:35 -07:00
Joel Therrien
f23ee21ef3 Add utility function for R package 2019-07-24 14:26:08 -07:00
Joel Therrien
186de413ed Merge branch 'master' into ibs 2019-07-22 11:38:15 -07:00
Joel Therrien
aa1f544ea2 Upgrade Jackson in executable project
Jackson has some security vulnerability that can only be exploited in a server environment;
not relevant here whatsoever, but might as well update.
2019-07-22 11:36:29 -07:00
Joel Therrien
86f6c195d7 Fix unused imports 2019-07-22 11:29:37 -07:00
Joel Therrien
9258f75e4e Add integrated Brier score error measure 2019-07-22 11:23:34 -07:00
Joel Therrien
7371dab4f1 Fix two bugs on how RightContinuousStepFunction is integrated.
Specifically, the integration returned NaN if the integration was
*up to* NaN (real integrals are robust); and the results were negative
if integrating from a to b where a > b.
2019-07-19 14:14:36 -07:00
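As background for the commit above, integrating a right-continuous step function amounts to summing rectangle areas between jump points. The sketch below is an assumption-laden illustration, not the project's `RightContinuousStepFunction`: the array-based representation, the `defaultY` value before the first jump, and the method names are all hypothetical. The commit does not state how the fixed code treats NaN or reversed bounds, so this sketch simply rejects both rather than guess.

```java
public class StepFunctionIntegralSketch {

    // Value of the step function at t: the y of the last jump point at or before t
    // (right-continuous), or defaultY if t lies before the first jump.
    static double valueAt(double[] xs, double[] ys, double defaultY, double t) {
        double value = defaultY;
        for (int i = 0; i < xs.length && xs[i] <= t; i++) {
            value = ys[i];
        }
        return value;
    }

    // Integrates the step function over [from, to]; xs must be sorted ascending.
    static double integrate(double[] xs, double[] ys, double defaultY, double from, double to) {
        if (Double.isNaN(from) || Double.isNaN(to) || from > to) {
            throw new IllegalArgumentException("bounds must be non-NaN with from <= to");
        }
        double total = 0.0;
        double cursor = from;
        double current = valueAt(xs, ys, defaultY, from);
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] <= from) continue;   // jump already reflected in 'current'
            if (xs[i] >= to) break;        // jump lies outside the integration range
            total += current * (xs[i] - cursor);
            cursor = xs[i];
            current = ys[i];
        }
        // Final rectangle from the last jump inside the range up to 'to'.
        total += current * (to - cursor);
        return total;
    }
}
```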
Joel Therrien
ae9a6b9a3f Add binary and unary operations to RightContinuousStepFunctions 2019-07-18 15:14:01 -07:00
d7cdc9f6e7 Split into two Maven modules
One for the library (same as before),
the other for the executable part.
2019-07-03 09:37:19 -07:00
ee4b513298 Remove dependencies from project
This project is now purely a library only; the code for running directly from the command line will be
put into a new project. This was important because we were including large dependencies into the R code
that weren't needed and created some minor licensing inconveniences.
2019-07-02 16:54:58 -07:00
bc2c240823 Update Jackson to 2.9.9
Apparently older versions of Jackson contain a security vulnerability
(not really important for this project, given that users are only ever
using Jackson on their own settings files)
2019-07-02 15:39:33 -07:00
21ce7ce135 Removed some unnecessary dependencies in a test file. 2019-06-28 15:25:46 -07:00
88dc1c2316 Add README file 2019-06-28 10:54:42 -07:00
78ca8bad73 Update pom to build package with minimal dependencies 2019-06-27 15:34:09 -07:00
22accdb263 Add support for providing an initial forest to add trees to 2019-06-07 19:55:44 -07:00
7da3bd14a5 Add test verifying that CIFs are averaged together in the same way as randomForestSRC 2019-06-05 15:13:08 -07:00
4aac73b868 Improve progress tracking to terminate when all trees are trained
Instead of only checking every second
2019-05-29 15:05:48 -07:00
1e40b7ff9b Add missing copyright notices 2019-05-10 16:02:59 -07:00
6f318db79e Add support for seeds to control randomness when training forests 2019-05-10 16:02:33 -07:00
17ae3a9f5a Refactor - rename GroupDifferentiators into SplitFinders
SplitRule would have made more sense but it was already taken.
2019-05-08 16:09:09 -07:00
c5c74ad7e9 Fix bug where parallel forests never finish
Add test to detect case
2019-04-29 11:02:14 -07:00
de3de300cf Change how parallel trees are trained to be more robust to Threads getting terminated.
This will hopefully make the package more stable on cluster systems, where sometimes the forests
immediately stall.
2019-04-26 11:04:03 -07:00
1e643385ee Adjust MultipleLogRankDifferentiators to use actual implementation found in randomForestSRC
Also merge SingleLogRankDifferentiators with the Group variants,
as they now reduce to the simple case when given only one event of focus.
2019-04-23 17:34:39 -07:00
c6a5787975 Use UUIDs to save trees instead of tree number.
Benefits are for when we restart a previously parallel task
in which, say, trees 1, 2, and 4 were completed but tree 3
never did complete. Under the previous implementation we'd start
at tree 4 (we'd just count how many trees were done). To fix this
would require some additional effort. Since the order of trees
is irrelevant, it made sense to just stop ordering them.
2019-04-16 12:58:23 -07:00
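The idea in the commit above can be sketched as follows: each finished tree is written under a random UUID, and a restarted job only needs to count the files already present. This is a rough illustration under assumptions; the `tree-<uuid>.tree` naming scheme and all method names are hypothetical, not the project's actual file layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

public class UuidTreeFiles {

    // Saves one serialized tree under a random UUID, so no two workers collide
    // and no tree number has to be reserved in advance.
    static Path saveTree(Path dir, byte[] serializedTree) {
        Path file = dir.resolve("tree-" + UUID.randomUUID() + ".tree");
        try {
            return Files.write(file, serializedTree);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the trees already saved in the directory.
    static long countSavedTrees(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".tree")).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On restart, the number of trees left to train is the target minus what exists;
    // which particular trees finished earlier no longer matters.
    static int treesStillNeeded(Path dir, int ntree) {
        return (int) Math.max(0, ntree - countSavedTrees(dir));
    }
}
```

With numbered files, a gap such as a missing tree 3 has to be detected and filled; with UUIDs a simple count is enough, since the order of trees is irrelevant.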
fb20b08a23 Add debugging information to RUtils 2019-04-16 12:55:57 -07:00
f1f507d2df Add out-of-bag predict methods for R to use 2019-04-07 15:11:06 -07:00
50b4a3cd89 Fix and optimize how progress is displayed while training trees 2019-04-05 11:22:59 -07:00
ea176cff9a Add an evaluateSerial function to Forest
parallelStream doesn't seem to work very well on the ComputeCanada.ca cluster
2019-04-05 11:13:23 -07:00
bf168bc2a5 Improve performance of CompetingRiskFunctionCombiner
The new version is estimated to be at least 10x faster
2019-03-27 21:06:15 -07:00
c8269ae285 Fix bug in test 2019-03-27 21:06:05 -07:00
526a127b9d Change Forest to keep trees in a List instead of a Collection 2019-03-27 10:57:01 -07:00
585d6d3c5b Make SplitRules their own class; independent of their Covariate parents.
This was done so that when we serialize trees (and thus SplitRules) we don't awkwardly also serialize ntree versions of the Covariates,
which is really awkward when deserializing them.
2019-03-25 14:44:31 -07:00
76b2cdd3c4 WIP - Some changes to how trees are saved. 2019-03-25 10:59:55 -07:00
76614ee68b Better memory management to help prevent OutOfMemoryExceptions 2019-03-25 10:59:26 -07:00
d65e010c48 Very minor improvement to how tree filenames are saved. 2019-03-13 11:21:51 -07:00
02b7a5cb9a Rebrand project as largeRCRF 2019-03-13 10:47:37 -07:00
cfa3a6f432 Attempting memory optimizations 2019-03-13 10:39:18 -07:00
8014bd4629 Fix bug where NAs cause crash 2019-03-04 11:36:21 -08:00
91cf299362 Add some R utility functions to help data get quickly loaded 2019-03-04 11:23:31 -08:00
29b154110a Fix bug where template.yaml gets replaced whenever user wants to look at help dialog 2019-02-28 11:12:21 -08:00
a7f591c2d3 Add integration capabilities to RightContinuousStepFunction; use it for calculating mortality 2019-02-18 14:57:26 -08:00
e74ba23177 Small changes; more tests. 2019-02-18 14:57:13 -08:00
9f513ab75b Add capabilities to get nodes of a certain type in a forest; used to produce summary statistics 2019-02-02 09:36:00 -08:00
77ec780304 Fix theoretical bug 2019-01-29 13:38:45 -08:00