Background: Treatment planning systems for proton and carbon ion beam therapy are commonly based on fast analytic dose calculation engines using pencil beam (PB) algorithms. Monte Carlo (MC) calculations, on the other hand, are recognized for their superior accuracy because they model the underlying physical processes in more detail. Therefore, the value of MC-based dose calculation for proton treatment planning was investigated. Material and Methods: The purpose of this project was to benchmark the MC algorithm against the PB algorithm and to identify clinically useful MC calculation settings for dose calculation in proton therapy. The settings investigated were the mean relative statistical uncertainty per spot (unc = 1-5%), the mean relative statistical uncertainty threshold (err = 10-60% of the maximum dose per spot) for voxels included in the uncertainty calculation, and the maximum number of particles (maxNr = 5x10^3 to 5x10^5). Furthermore, treatment planning parameters such as the peak width multiplier and the spot spacing were investigated. Treatment plans based on the PB algorithm were optimized and then recalculated with the MC algorithm using the XiO treatment planning system research version v4.62 (Elekta AB, Stockholm, Sweden). A homogeneous water phantom with three cubes of different sizes and a complex multi-layer chess pattern phantom (HU of +1000 (bone) and -800 (lung)) embedded in a water tank, with target structures placed within or at different distances behind the chess pattern, were created. The clinical applicability was tested for a prostate patient and a paranasal sinus (PS) patient. The PB and MC treatment plans were compared on the basis of dose calculation times, dose profiles, dose difference maps, Gamma-index analysis, and the conformity index (CI) and homogeneity index (HI). Results: To ensure that the maximum number of particles did not terminate the dose calculation, 5x10^4 particles were necessary for unc values of 3-5% and 5x10^5 particles for unc values below 3%.
A peak width multiplier of 0.8 and a spot spacing of 0.5 cm achieved the best results among the treatment planning parameters investigated. In the presence of media with large density and composition variations, the MC and PB algorithms showed different dose deposition characteristics. The MC algorithm deposited more dose in areas located proximal to low-density tissue. Dose difference maps revealed hotspots with dose differences of up to 21% (PS patient) and 19% (prostate patient) of the prescribed dose. Gamma-index analysis (2%/2 mm) indicated good agreement between the MC and the PB algorithm. Conclusion: A relative statistical uncertainty per spot of 5% seemed acceptable for clinical MC dose calculation, especially with regard to the dose calculation time. The PB algorithm performed accurately and attained comparable results, even in difficult treatment situations involving large density and tissue heterogeneities.
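
For readers unfamiliar with the Gamma-index criterion quoted above (2%/2 mm), the following is a minimal one-dimensional sketch of how such a comparison between two dose profiles can be computed. It is an illustration only and not the implementation used in the study: the function names `gamma_index_1d` and `pass_rate`, the choice of global normalization to the maximum reference dose, and the brute-force search without interpolation between grid points are all assumptions made for this example.

```python
import math


def gamma_index_1d(positions_mm, ref_dose, eval_dose,
                   dose_tol=0.02, dist_tol_mm=2.0):
    """Return the gamma value for every point of the reference profile.

    Assumptions (not taken from the study): both profiles share the same
    grid, the dose tolerance is a fraction of the maximum reference dose
    (global normalization), and no interpolation is performed.
    """
    if not (len(positions_mm) == len(ref_dose) == len(eval_dose)):
        raise ValueError("positions and dose profiles must have equal length")

    # Absolute dose criterion, e.g. 2% of the maximum reference dose.
    dose_crit = dose_tol * max(ref_dose)
    if dose_crit <= 0:
        raise ValueError("reference profile must contain a positive dose")

    gammas = []
    for x_ref, d_ref in zip(positions_mm, ref_dose):
        best = math.inf
        # Brute-force search over all evaluated points for the minimum
        # combined distance/dose deviation.
        for x_ev, d_ev in zip(positions_mm, eval_dose):
            dist_term = ((x_ev - x_ref) / dist_tol_mm) ** 2
            dose_term = ((d_ev - d_ref) / dose_crit) ** 2
            best = min(best, dist_term + dose_term)
        gammas.append(math.sqrt(best))
    return gammas


def pass_rate(gammas):
    """Fraction of points with gamma <= 1, i.e. within the criterion."""
    return sum(1 for g in gammas if g <= 1.0) / len(gammas)


if __name__ == "__main__":
    # Hypothetical profiles on a 1 mm grid: the evaluated peak is shifted
    # by 1 mm relative to the reference peak.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    pb = [0.0, 0.0, 100.0, 0.0, 0.0]
    mc = [0.0, 0.0, 0.0, 100.0, 0.0]
    g = gamma_index_1d(x, pb, mc)
    print(g, pass_rate(g))
```

A point passes when its gamma value is at most 1, so a 1 mm shift of an otherwise identical profile still passes a 2 mm distance criterion, whereas a uniform 4% dose offset does not pass a 2% dose criterion. Clinical tools work on 3D dose grids and usually interpolate the evaluated distribution, which this sketch omits.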