largeRCRF-Java

Author	SHA1	Message	Date
Joel Therrien	7371dab4f1	Fix two bugs in how RightContinuousStepFunction is integrated. Specifically, the integration returned a NaN if integrating up to a NaN (real integrals are robust to this), and the results were negative when integrating from a to b where a > b.	2019-07-19 14:14:36 -07:00
Joel Therrien	ae9a6b9a3f	Add binary and unary operations to RightContinuousStepFunctions	2019-07-18 15:14:01 -07:00
Joel Therrien	d7cdc9f6e7	Split into two Maven modules: one for the library (same as before), the other for the executable part.	2019-07-03 09:37:19 -07:00
Joel Therrien	ee4b513298	Remove dependencies from project. This project is now purely a library; the code for running directly from the command line will be put into a new project. This was important because we were including large, unneeded dependencies in the R code, which created some minor licensing inconveniences.	2019-07-02 16:54:58 -07:00
Joel Therrien	bc2c240823	Update Jackson to 2.9.9. Apparently older versions of Jackson contain a security vulnerability (not really important for this project, given that users only ever use Jackson on their own settings files).	2019-07-02 15:39:33 -07:00
Joel Therrien	21ce7ce135	Removed some unnecessary dependencies in a test file.	2019-06-28 15:25:46 -07:00
Joel Therrien	88dc1c2316	Add README file	2019-06-28 10:54:42 -07:00
Joel Therrien	78ca8bad73	Update pom to build package with minimal dependencies	2019-06-27 15:34:09 -07:00
Joel Therrien	22accdb263	Add support for providing an initial forest to add trees to	2019-06-07 19:55:44 -07:00
Joel Therrien	7da3bd14a5	Add test verifying that CIFs are averaged together in the same way as randomForestSRC	2019-06-05 15:13:08 -07:00
Joel Therrien	4aac73b868	Improve progress tracking to terminate when all trees are trained, instead of only checking every second.	2019-05-29 15:05:48 -07:00
Joel Therrien	1e40b7ff9b	Add missing copyright notices	2019-05-10 16:02:59 -07:00
Joel Therrien	6f318db79e	Add support for seeds to control randomness when training forests	2019-05-10 16:02:33 -07:00
Joel Therrien	17ae3a9f5a	Refactor: rename GroupDifferentiators to SplitFinders. SplitRule would have made more sense, but it was already taken.	2019-05-08 16:09:09 -07:00
Joel Therrien	c5c74ad7e9	Fix bug where parallel forests never finish. Add test to detect case.	2019-04-29 11:02:14 -07:00
Joel Therrien	de3de300cf	Change how parallel trees are trained to be more robust to Threads getting terminated. This will hopefully make the package more stable on cluster systems, where sometimes the forests immediately stall.	2019-04-26 11:04:03 -07:00
Joel Therrien	1e643385ee	Adjust MultipleLogRankDifferentiators to use the actual implementation found in randomForestSRC. Also merge SingleLogRankDifferentiators with the Group variants, as they now reduce to the simple case when given only one event of focus.	2019-04-23 17:34:39 -07:00
Joel Therrien	c6a5787975	Use UUIDs to save trees instead of tree numbers. This helps when we restart a previously parallel task in which, say, trees 1, 2, and 4 were completed but tree 3 never completed. Under the previous implementation we'd start at tree 4 (we just counted how many trees were done), and fixing that would have required some additional effort. Since the order of trees is irrelevant, it made sense to just stop ordering them.	2019-04-16 12:58:23 -07:00
Joel Therrien	fb20b08a23	Add debugging information to RUtils	2019-04-16 12:55:57 -07:00
Joel Therrien	f1f507d2df	Add out-of-bag predict methods for R to use	2019-04-07 15:11:06 -07:00
Joel Therrien	50b4a3cd89	Fix and optimize how progress is displayed while training trees	2019-04-05 11:22:59 -07:00
Joel Therrien	ea176cff9a	Add an evaluateSerial function to Forest. parallelStream doesn't seem to work very well on the ComputeCanada.ca cluster.	2019-04-05 11:13:23 -07:00
Joel Therrien	bf168bc2a5	Improve performance of CompetingRiskFunctionCombiner. Estimated speedup is at least 10x.	2019-03-27 21:06:15 -07:00
Joel Therrien	c8269ae285	Fix bug in test	2019-03-27 21:06:05 -07:00
Joel Therrien	526a127b9d	Change Forest to keep trees in a List instead of a Collection	2019-03-27 10:57:01 -07:00
Joel Therrien	585d6d3c5b	Make SplitRules their own class, independent of their Covariate parents. This was done so that when we serialize trees (and thus SplitRules) we don't also serialize ntree copies of the Covariates, which is really awkward when deserializing them.	2019-03-25 14:44:31 -07:00
Joel Therrien	76b2cdd3c4	WIP - Some changes to how trees are saved.	2019-03-25 10:59:55 -07:00
Joel Therrien	76614ee68b	Better memory management to help prevent OutOfMemoryErrors	2019-03-25 10:59:26 -07:00
Joel Therrien	d65e010c48	Very minor improvement to how tree filenames are saved.	2019-03-13 11:21:51 -07:00
Joel Therrien	02b7a5cb9a	Rebrand project as largeRCRF	2019-03-13 10:47:37 -07:00
Joel Therrien	cfa3a6f432	Attempting memory optimizations	2019-03-13 10:39:18 -07:00
Joel Therrien	8014bd4629	Fix bug where NAs cause crash	2019-03-04 11:36:21 -08:00
Joel Therrien	91cf299362	Add some R utility functions to help data get quickly loaded	2019-03-04 11:23:31 -08:00
Joel Therrien	29b154110a	Fix bug where template.yaml gets replaced whenever user wants to look at help dialog	2019-02-28 11:12:21 -08:00
Joel Therrien	a7f591c2d3	Add integration capabilities to RightContinuousStepFunction; use it for calculating mortality	2019-02-18 14:57:26 -08:00
Joel Therrien	e74ba23177	Small changes; more tests.	2019-02-18 14:57:13 -08:00
Joel Therrien	9f513ab75b	Add capabilities to get nodes of a certain type in a forest; used to produce summary statistics	2019-02-02 09:36:00 -08:00
Joel Therrien	77ec780304	Fix theoretical bug	2019-01-29 13:38:45 -08:00
Joel Therrien	d8e52ecd82	Add tests around NumericSplitRuleUpdater; fix minor bug.	2019-01-23 19:37:10 -08:00
Joel Therrien	115c57f829	Improve documentation and add a `final` to MeanResponseCombiner.	2019-01-22 11:01:21 -08:00
Joel Therrien	d935fe0bc0	Improve WeightedVarianceGroupDifferentiator to be faster	2019-01-22 10:56:31 -08:00
Joel Therrien	ee137370a1	Add GPL-3 Copyright to code	2019-01-14 11:45:23 -08:00
Joel Therrien	7a5a8ab0fc	Merge branch 'optimizations' of joel/RandomSurvivalForests into master	2019-01-14 19:08:14 +00:00
Joel Therrien	e709c42da1	Update the competing risk GroupDifferentiators to make efficient use of the SplitRuleUpdater updates. Results in a speed improvement of over 1/3, according to a timing of the TestCompetingRisk#testLogRankSingleGroupDifferentiatorAllCovariates() test.	2019-01-11 22:58:56 -08:00
Joel Therrien	86122fd90d	Remove comment that isn't true	2019-01-10 17:34:52 -08:00
Joel Therrien	31d6ce9b3e	Covariates track if they have any NA values, and skip NA handling code if possible	2019-01-10 14:09:43 -08:00
Joel Therrien	a57741b726	Add PMD rules to pom.xml to enforce higher code quality	2019-01-10 11:23:55 -08:00
Joel Therrien	a5fe856857	Massive refactor: use Iterators/Updaters when calculating difference scores for faster calculations. Changed the covariates to be more clever with how they produce the different splits. In the future (not yet implemented), a clever GroupDifferentiator could update the current score calculation based just on how many rows moved from one side of the split to the other. There were a few other changes as well; TreeTrainer#growTree now accepts a Random as a parameter, which is used throughout the entire growing process. This means it's now theoretically possible to grow trees using a seed, so that results can be fully reproducible.	2019-01-09 21:31:27 -08:00
Joel Therrien	e892076a05	Add a test for the composite log rank splitting rule. Add some debug toString capabilities on Nodes and Trees	2019-01-04 11:22:23 -08:00
Joel Therrien	ae40a2e664	Removed naive mortality error measurement. Naive mortality error was an ad-hoc method I implemented earlier on. It didn't provide any useful performance, nor was it theoretically grounded. It's better to remove it before someone accidentally uses it.	2018-10-27 19:15:59 -07:00
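
The NaN bug described in commit 7371dab4f1 is easier to see with a concrete model. The class below is a hypothetical, minimal stand-in: StepFunctionSketch, its fields, and its method names are assumptions, not the library's RightContinuousStepFunction API. It reads "integrating up to a NaN" as the function taking a NaN value exactly at the upper limit. A naive piecewise sum that also adds a zero-width piece starting at the upper limit computes 0.0 * NaN there, which IEEE 754 arithmetic evaluates to NaN, whereas the true integral ignores a single point. The commit message doesn't say what convention the library adopted for a > b, so this sketch simply rejects from > to rather than guessing.

```java
import java.util.Arrays;

/**
 * Hypothetical right-continuous step function (not the library's class):
 * f(t) = defaultY for t < x[0], and f(t) = y[i] for x[i] <= t < x[i+1].
 */
final class StepFunctionSketch {
    private final double[] x;      // strictly increasing jump points
    private final double[] y;      // value taken at and after each jump point
    private final double defaultY; // value before the first jump point

    StepFunctionSketch(double[] x, double[] y, double defaultY) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        this.x = x.clone();
        this.y = y.clone();
        this.defaultY = defaultY;
    }

    /** Right-continuous evaluation: a jump located exactly at t is already applied. */
    double evaluate(double t) {
        int idx = Arrays.binarySearch(x, t);
        if (idx >= 0) {
            return y[idx];
        }
        int jumpsBefore = -idx - 1; // number of jump points strictly less than t
        return jumpsBefore == 0 ? defaultY : y[jumpsBefore - 1];
    }

    /** Integral of f over [from, to]; this sketch only handles from <= to. */
    double integrate(double from, double to) {
        if (from > to) {
            throw new IllegalArgumentException("sketch assumes from <= to");
        }
        double total = 0.0;
        double left = from;
        for (double jump : x) {
            if (jump <= left) {
                continue; // jump lies at or before the current left edge
            }
            if (jump >= to) {
                // Stop before a jump at the upper limit, so a value that only
                // starts at `to` (possibly NaN) never enters the sum.
                break;
            }
            total += (jump - left) * evaluate(left); // f is constant on [left, jump)
            left = jump;
        }
        // Skip a zero-width final piece (from == to) so 0 * f(from) can't yield NaN.
        if (to > left) {
            total += (to - left) * evaluate(left);
        }
        return total;
    }
}
```

With jump points {1, 2, 3}, values {1.0, 2.0, NaN} and a default of 0.5, integrate(2, 3) returns 2.0 even though the function is NaN from t = 3 onward, while integrate(3, 4) correctly returns NaN because the NaN now covers an interval of positive width.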


106 commits
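
The reasoning in commit c6a5787975 (save trees under UUIDs and resume by counting) can be sketched as a small helper. TreeFileNamer, the `.tree` suffix, and the method names are hypothetical, not largeRCRF's API. Under a numbered scheme with trees 1, 2, and 4 saved, a count of 3 points the restart at tree 4, which already exists; with random UUID names each new tree gets a fresh file, so a plain count of saved files is all a restart needs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

/** Hypothetical helper illustrating UUID-named tree files; not the library's API. */
final class TreeFileNamer {
    private static final String SUFFIX = ".tree"; // assumed file extension

    /** A fresh filename that encodes no tree index, so restarts can't reuse a name. */
    static Path newTreeFile(Path directory) {
        return directory.resolve(UUID.randomUUID() + SUFFIX);
    }

    /** Counts finished trees; since order is irrelevant, only the count matters. */
    static long countSavedTrees(Path directory) throws IOException {
        try (Stream<Path> files = Files.list(directory)) {
            return files.filter(p -> p.getFileName().toString().endsWith(SUFFIX)).count();
        }
    }

    /** Trees still to grow when resuming toward a target forest size. */
    static long remainingTrees(Path directory, long targetTrees) throws IOException {
        return Math.max(0, targetTrees - countSavedTrees(directory));
    }
}
```

Files.list is wrapped in try-with-resources because the returned stream holds an open directory handle.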