de3de300cf
Change how parallel trees are trained to be more robust to Threads getting terminated.
...
This will hopefully make the package more stable on cluster systems, where sometimes the forests
immediately stall.
2019-04-26 11:04:03 -07:00
1e643385ee
Adjust MultipleLogRankDifferentiators to use actual implementation found in randomForestSRC
...
Also merge SingleLogRankDifferentiators with the Group variants,
as they now reduce to the simple case when given only one event of focus.
2019-04-23 17:34:39 -07:00
c6a5787975
Use UUIDs to save trees instead of tree number.
...
Benefits are for when we restart a previously parallel task
in which, say, trees 1, 2, and 4 were completed but tree 3
never did complete. Under the previous implementation we'd start
at tree 4 (we'd just count how many trees were done). To fix this
would require some additional effort. Since the order of trees
is irrelevant, it made sense to just stop ordering them.
2019-04-16 12:58:23 -07:00
fb20b08a23
Add debugging information to RUtils
2019-04-16 12:55:57 -07:00
f1f507d2df
Add out-of-bag predict methods for R to use
2019-04-07 15:11:06 -07:00
50b4a3cd89
Fix and optimize how progress is displayed while training trees
2019-04-05 11:22:59 -07:00
ea176cff9a
Add an evaluateSerial function to Forest
...
parallelStream doesn't seem to work very well on the ComputeCanada.ca cluster
2019-04-05 11:13:23 -07:00
bf168bc2a5
Improve performance of CompetingRiskFunctionCombiner
...
Estimated speedup is at least 10x
2019-03-27 21:06:15 -07:00
c8269ae285
Fix bug in test
2019-03-27 21:06:05 -07:00
526a127b9d
Change Forest to keep trees in a List instead of a Collection
2019-03-27 10:57:01 -07:00
585d6d3c5b
Make SplitRules their own class; independent of their Covariate parents.
...
This was done so that when we serialize trees (and thus SplitRules) we don't also serialize ntree versions of the Covariates,
which is awkward to deal with when deserializing them.
2019-03-25 14:44:31 -07:00
76b2cdd3c4
WIP - Some changes to how trees are saved.
2019-03-25 10:59:55 -07:00
76614ee68b
Better memory management to help prevent OutOfMemoryErrors
2019-03-25 10:59:26 -07:00
d65e010c48
Very minor improvement to how tree filenames are saved.
2019-03-13 11:21:51 -07:00
02b7a5cb9a
Rebrand project as largeRCRF
2019-03-13 10:47:37 -07:00
cfa3a6f432
Attempting memory optimizations
2019-03-13 10:39:18 -07:00
8014bd4629
Fix bug where NAs cause crash
2019-03-04 11:36:21 -08:00
91cf299362
Add some R utility functions to help data get quickly loaded
2019-03-04 11:23:31 -08:00
29b154110a
Fix bug where template.yaml gets replaced whenever user wants to look at help dialog
2019-02-28 11:12:21 -08:00
a7f591c2d3
Add integration capabilities to RightContinuousStepFunction; use it for calculating mortality
2019-02-18 14:57:26 -08:00
e74ba23177
Small changes; more tests.
2019-02-18 14:57:13 -08:00
9f513ab75b
Add capabilities to get nodes of a certain type in a forest; used to produce summary statistics
2019-02-02 09:36:00 -08:00
77ec780304
Fix theoretical bug
2019-01-29 13:38:45 -08:00
d8e52ecd82
Add tests around NumericSplitRuleUpdater; fix minor bug.
2019-01-23 19:37:10 -08:00
115c57f829
Improve documentation and add a final to MeanResponseCombiner.
2019-01-22 11:01:21 -08:00
d935fe0bc0
Improve WeightedVarianceGroupDifferentiator to be faster
2019-01-22 10:56:31 -08:00
ee137370a1
Add GPL-3 Copyright to code
2019-01-14 11:45:23 -08:00
Joel Therrien
7a5a8ab0fc
Merge branch 'optimizations' of joel/RandomSurvivalForests into master
2019-01-14 19:08:14 +00:00
e709c42da1
Update the competing risk GroupDifferentiators to make efficient use of the SplitRuleUpdater updates
...
Results in a speed improvement of over 1/3 according to a timing of the TestCompetingRisk#testLogRankSingleGroupDifferentiatorAllCovariates() test
2019-01-11 22:58:56 -08:00
86122fd90d
Remove comment that isn't true
2019-01-10 17:34:52 -08:00
31d6ce9b3e
Covariates track if they have any NA values, and skip NA handling code if possible
2019-01-10 14:09:43 -08:00
a57741b726
Add PMD rules to pom.xml to enforce higher code quality
2019-01-10 11:23:55 -08:00
a5fe856857
Massive refactor; Use Iterators/Updaters when calculating difference scores for faster calculations.
...
Changed the covariates to be more clever with how they produce the different splits. In the future (not yet implemented) a clever GroupDifferentiator
could update the current score calculation based just on how many rows moved from one hand to the other. There were a few other changes as well:
TreeTrainer#growTree now accepts a Random as a parameter which is used throughout the entire growing process. This means it's now theoretically
possible to grow trees using a seed, so that results can be fully reproducible.
2019-01-09 21:31:27 -08:00
e892076a05
Add a test for the composite log rank splitting rule.
...
Add some debug toString capabilities on Nodes and Trees
2019-01-04 11:22:23 -08:00
ae40a2e664
Removed naive mortality error measurement
...
Naive mortality error was an ad-hoc method I implemented earlier on. It
didn't provide any useful performance, nor was it theoretically
grounded. It's better to remove it before someone accidentally uses it.
2018-10-27 19:15:59 -07:00
a887a3cc15
Fix bug in Utils.binarySearchLessThan
2018-10-25 11:21:45 -07:00
ae91dbe9e7
Explicitly store RightContinuousStepFunction in CompetingRiskFunctions
...
Done so that RUtils is useful. Also optimized imports.
2018-10-25 10:49:43 -07:00
c68f67e47a
Massive optimizations;
...
Refactored how MathFunctions are structured to use more primitives and
fewer objects.
Optimized competing risk group differentiators to run faster.
Removed alternative competing risk response combiners (may be added back
later)
2018-10-25 10:34:27 -07:00
cce5ad1e0f
Add parameter to decide on whether to check for node purity or not
2018-10-15 11:03:35 -07:00
7fba964af9
Optimize CompetingRiskResponseCombiner
2018-10-12 12:11:48 -07:00
aa733d5eba
Switch code to storing Covariate.Value using arrays instead of Maps
2018-09-18 11:17:15 -07:00
de39f60314
Make CovariateRows serializable; add R utility functions.
2018-09-14 18:42:14 -07:00
7008959999
Add functionality to analyze using validation sets
2018-09-13 12:09:20 -07:00
98cb97a1f1
Improve performance by integrating binary search into MathFunction
2018-09-11 17:12:27 -07:00
6e58122380
Optimize MathFunction
2018-09-10 17:16:43 -07:00
e0681763ef
Add convenience methods to improve R interface performance
2018-09-10 12:31:35 -07:00
b8024275a9
Fix a bug where CompetingRiskFunctions returns NaNs when using set times
...
in response combiner
2018-09-01 09:43:42 -07:00
62198f998d
Small code cleanup
2018-09-01 09:42:48 -07:00
2fb80df5a5
Add test for CompetingRiskFunctions
2018-08-31 22:32:54 -07:00
8333579a1f
Code cleanup; fixed 3 minor bugs in the settings
2018-08-31 13:10:30 -07:00