Alternating optimization algorithms for canonical polyadic decomposition (with/without nonnegative constraints) often accompany update rules with low computational cost, but could face problems of swamps, bottlenecks, and slow convergence. All-at-once algorithms can deal with such problems, but always demand significant temporary extra-storage, and high computational cost. In this paper, we propose an all-at-once algorithm with low complexity for sparse and nonnegative tensor factorization based on the damped Gauss-Newton iteration. Especially, for low-rank approximations, the proposed algorithm avoids building up Hessians and gradients, reduces the computational cost dramatically. Moreover, we proposed selection strategies for regularization parameters. The proposed algorithm has been verified to overwhelmingly outperform "state-of-the-art" NTF algorithms for difficult benchmarks, and for real-world application such as clustering of the ORL face database.