Predicting non-deposition sediment transport in sewer pipes using Random Forest.


Water Research 189 (2021) 116639
Contents lists available at ScienceDirect
Water Research
journal homepage: www.elsevier.com/locate/watres


Carlos Montes ∗, Zoran Kapelan, Juan Saldarriaga

Department of Civil and Environmental Engineering, Universidad de los Andes, Bogotá, Colombia

Department of Water Management, Delft University of Technology, Delft, Netherlands

Article history: Received 15 July 2020; Revised 29 October 2020; Accepted 12 November 2020; Available online 13 November 2020

Keywords: Non-deposition; Random forest; Sediment transport; Self-cleansing; Sewer systems

Sediment transport in sewers has been extensively studied in the past. This paper aims to propose a new method for predicting the self-cleansing velocity required to avoid permanent deposition of material in sewer pipes. The new Random Forest (RF) based model was implemented using experimental data collected from the literature. The accuracy of the developed model was evaluated and compared with ten promising literature models using multiple observed datasets. The results obtained demonstrate that the RF model is able to make predictions with high accuracy for the whole dataset used. These predictions clearly outperform predictions made by other models, especially for the case of the non-deposition with deposited bed criterion that is used for designing large sewer pipes. The volumetric sediment concentration was identified as the most important parameter for predicting self-cleansing velocity.

Elsevier Ltd. All rights reserved.

1. Introduction

Designing sediment-carrying sewer systems is a well-known field of research in hydraulic engineering. This interest is explained by the problems caused by the presence of material in the systems. Due to the varying environmental conditions (i.e. sediment characteristics and intermittent flow), the risk of building a permanent sediment deposit increases during dry weather seasons. These deposits lead to problems such as reduced pipe capacity, increased roughness, and premature overflows. For example, Ackers et al. (2001) showed that the presence of a permanent deposit at the bottom of sewer pipes increases hydraulic roughness and reduces discharge capacity by about 20%.

The most common criterion to avoid the permanent deposit of material in sewer pipes is known as non-deposition. Several authors (Safari et al., 2018; Vongvisessomjai et al., 2010) have classified this criterion into two subgroups: non-deposition without deposited bed and non-deposition with deposited bed. Both groups are based on the presence of sediments at the bottom of the pipe. In the first case, high water velocities produce individual and separate movement of the particles, which slide or roll over the pipe invert, i.e. without deposited bed. In contrast, the second case is seen when lower water velocities are present and the particles are grouped and move as a transitional deposited bed.

∗ Corresponding author at: Cra 1 Este No. 19A – 40 Bogota, Colombia.
E-mail addresses: cd.montes1256@uniandes.edu.co (C. Montes), Z.Kapelan@tudelft.nl (Z. Kapelan), jsaldarr@uniandes.edu.co (J. Saldarriaga).

In the case 'without deposited bed', traditional criteria of minimum velocities and shear stress values are commonly found in water utilities standards and industry design codes. Generally, these standards and codes suggest values ranging from 0.30 m s−1 to 1.0 m s−1 for minimum velocity and from 1.0 to 4.0 Pa for shear stress (Montes et al., 2019; Nalluri and Ghani, 1996; Vongvisessomjai et al., 2010). Several authors (Merritt and Enfinger, 2019; Nalluri and Ghani, 1996) have shown how traditional threshold values lead to over-design of small diameter pipes and under-design of large diameter pipes (as a rule-of-thumb, pipes with a diameter greater than 500 mm). Consequently, large sewers commonly require frequent removal of sediment deposits (Ackers et al., 2001) because of the minimum self-cleansing value adopted during the design stage. A unique design value is inadequate; hence sediment characteristics and hydraulic conditions must be included in the definition of the self-cleansing design criterion. According to Safari and Aksoy (2020), existing traditional self-cleansing criteria can be 20% different from laboratory-scale measured values.

The channel cross-section is relevant to the choice of the self-cleansing criterion. For example, rectangular cross-sections require lower velocities compared to V-bottom or U-shape channels. Even criteria based on the Shields diagram, such as the Camp criterion, seem inadequate to define the self-cleansing value due to the non-inclusion of sediment concentration.

The above has motivated extensive experimental research (Ghani, 1993; El-Zaemey, 1991; May, 1993; May et al., 1989; Mayerle, 1988; Montes et al., 2020a, 2020b; Ota, 1999; Perrusquía, 1991; Vongvisessomjai et al., 2010) aiming to collect data and develop models for predicting the self-cleansing velocity as a function of sediment characteristics and sewer hydraulics, based on the concept of non-deposition. These studies have been carried out at laboratory scale under well-controlled and steady flow conditions, using non-cohesive sediments. Different authors collected data in pipes with different materials (e.g. concrete, acrylic or PVC, among other materials) and internal diameters, ranging from 100 to 595 mm. In the end, all these studies proposed a model for predicting the self-cleansing conditions in practice that was either developed with their own experimental data or using the benchmark data reported in the literature.

https://doi.org/10.1016/j.watres.2020.116639

Most models developed are regression-based and include the group of input parameters that most affect the prediction of the self-cleansing velocity (Ackers et al., 2001; Ebtehaj and Bonakdari, 2016a; May et al., 1996). Most of these models are in the form of:

$$\frac{V_l}{\sqrt{g d (SG - 1)}} = [\text{right-hand side not legible in the source}] \tag{1}$$

where $V_l$ is the self-cleansing velocity, $d$ the mean particle diameter, $g$ the gravity acceleration coefficient, $SG$ the specific gravity of sediments, $C_v$ the volumetric sediment concentration, $R$ the hydraulic radius, $D$ the pipe diameter, $\lambda$ the channel friction factor, $D_{gr}$ the dimensionless grain size ($= (g(SG-1)/\nu^2)^{1/3} d$) and $\nu$ the water kinematic viscosity. The remaining quantities are the sediment deposited width, the wetted perimeter, the sediment deposited thickness, the water surface width and the water level $Y$; $f$, $g$ and the other fitted constants are regression coefficients. Other parameters such as the threshold velocity required to initiate movement ($V_t = 0.125\sqrt{g d (SG-1)}\,(Y/d)^{0.47}$) and the pipe slope have also been included in regression models (May et al., 1996; Montes et al., 2020a).

Most of the above studies, for both non-deposition criteria, have developed predictive models which tend to be overfitted to their own experimental data. This problem can be seen especially in the earlier works, where no advanced techniques were used to develop regression models. For example, several authors (Montes et al., 2020b; Safari et al., 2018) have pointed out that the early work of Mayerle (1988) developed a model that shows high prediction accuracy with its own data and poor prediction when other datasets are used. In contrast, recent regression models, which used novel techniques such as Evolutionary Polynomial Regression – Multi-Objective Genetic Algorithm (EPR-MOGA) and Least Absolute Shrinkage and Selection Operator (LASSO), have demonstrated better prediction results (Montes et al., 2020a, 2020b).

In order to address the above overfitting issue of regression models, new Machine Learning (ML) and Artificial Intelligence (AI) techniques have been introduced for predicting the self-cleansing velocity based on the concept of non-deposition sediment transport. Examples of models developed for the 'without deposited bed' case include techniques such as Artificial Neural Network (ANN) (Ebtehaj and Bonakdari, 2013), Support Vector Regression (SVR) coupled with the Firefly Algorithm (Ebtehaj and Bonakdari, 2016b), the Group Method of Data Handling (GMDH) (Ebtehaj and Bonakdari, 2016a), the neuro-fuzzy inference system combined with the Particle Swarm Optimisation (ANFIS-PSO) (Ebtehaj et al., 2019), Decision Trees (DT), Generalised Regression Neural Network (GRNN), Multivariate Adaptive Regression Splines (MARS) (Safari, 2019) and Extreme Learning Machine (ELM) (Ebtehaj et al., 2020). For the other case, 'non-deposition with deposited bed', fewer ML/AI type models have been developed. Examples include models based on the Particle Swarm Optimisation (PSO) algorithm (Safari et al., 2017), Gene Expression Programming (GEP) (Roushangar and Ghasempour, 2017) and Multigene Genetic Programming (MGP) (Safari and Danandeh Mehr, 2018).

The above models, developed using different ML/AI techniques (for both non-deposition criteria), have improved the prediction accuracy of self-cleansing velocities and addressed the issues of model overfitting, but only partially. As noted by Zendehboudi et al. (2018), these models still tend to have rather limited extrapolation capabilities, meaning that once they are applied to datasets that were not used for their training they tend to underperform. Also, the ML/AI based models developed so far are largely black-box type models (e.g. ANN), meaning that, unlike white-box type regression models, they suffer from low interpretability of the physical significance of model inputs (i.e. explanatory factors) and their interactions with the model output.

The aim of this paper is to overcome the above deficiencies by using the Random Forest (RF) technique for predicting self-cleansing sewer velocities. RF (Breiman, 2001) is a flexible and interpretable supervised technique that combines the results (outputs) of multiple individual decision trees to make a prediction of interest. Due to its good characteristics and easy application, RF has been widely used for addressing many other problems in water engineering. Tyralis et al. (2019) presented a full review of studies in which RF was successfully applied to water resources problems. Using the RF technique, a new predictive self-cleansing model is developed and presented here for both non-deposition criteria (with and without deposited bed). This model aims to increase prediction accuracy whilst avoiding overfitting issues and enabling interpretability of the results obtained. The new modelling technique is compared to ten literature models using multiple datasets.

2. Data

2.1. Non-deposition without deposited bed data

Several experimental data were collected from the literature to implement the RF method. Mayerle (1988) studied the sediment transport in a 152 mm diameter pipe and two rectangular channels of 311.5 mm and 462.3 mm bottom width, using granular sands ranging from 0.50 to 8.74 mm. Ghani (1993) collected 221 data in 154 mm, 305 mm and 450 mm diameter pipes, testing sands between 0.46 and 8.40 mm. Ota (1999) used a 225 mm concrete pipe with a constant slope of 0.002, varying the volumetric sediment concentration between 4.2 ppm and 59.4 ppm. Vongvisessomjai et al. (2010) used two circular PVC pipes of 100 mm and 150 mm diameter to study the bedload and suspended load transport. Montes et al. (2020a) collected experimental data in a 242 mm acrylic pipe using granular material with a mean particle diameter between 0.35 and 1.51 mm. Montes et al. (2020b) carried out 107 experiments in a 595 mm PVC pipe, using sediments ranging from 0.35 to 2.6 mm.

2.2. Non-deposition with deposited bed data

For the non-deposition with deposited bed, El-Zaemey (1991) studied the sediment transport in a 305 mm diameter pipe, using granular particles ranging from 0.53 to 8.40 mm. Perrusquía (1991) carried out experiments in a 225 mm diameter pipe, varying the sediment concentration from 18.7 ppm to 408.0 ppm. Ghani (1993) collected the deposited bed data only in the 450 mm concrete pipe, using granular sand with a mean particle diameter of 0.72 mm. May (1993) extended their study (May et al., 1989) and collected experimental data with sediment thickness varying from 57.6 to 129.6 mm. Finally, Montes et al. (2020b) carried out experiments in a 595 mm PVC pipe, considering a relative sediment thickness between 0.13% and 1.11%. Table 1 outlines the characteristics of the data used for developing the RF algorithm.


Table 1. Data used for implementing data mining and regression models. Entries shown as […] are not legible in the source.

Reference | Non-deposition criterion | No. of runs | Pipe diameter or width (mm) | Flow velocity (m/s) | Pipe slope (%) | Sediment concentration (ppm) | Sediment thickness bed (mm)
Mayerle (1988), circular channel | Without deposited bed | 106 | 152 | 0.37 – 1.10 | 0.13 – 0.56 | 20.0 – 1275.0 | –
Mayerle (1988), rectangular channel | Without deposited bed | 105 | 311.5 and 462.3 | 0.41 – 1.04 | 0.09 – 0[…] | 14.0 – 1568.0 | –
Ghani (1993) | Without deposited bed | 221 | 154, 305 and 405 | 0.24 – 1.22 | 0.04 – 2.56 | 0.8 – 1450.0 | –
Ota (1999) | Without deposited bed | […] | 305 | 0.39 – 0.74 | 0.2 | 4.2 – 59.4 | –
Vongvisessomjai et al. (2010) | Without deposited bed | […] | 100 and 150 | 0.24 – 0.63 | 0.20 – 0.60 | 4.0 – 90.0 | –
Montes et al. (2020a) | Without deposited bed | […] | 242 | 0.24 – 1.05 | 0.20 – 0.80 | 0.3 – 875.7 | –
Montes et al. (2020b) | Without deposited bed | 107 | 595 | 0.41 – 1.41 | 0.04 – 3.43 | 1.3 – 19,957.0 | –
El-Zaemey (1991) | With deposited bed | 290 | 305 | 0.39 – 0.96 | 0.05 – 0.44 | 7.0 – 917.0 | 47.0 – 120.0
Perrusquía (1991) | With deposited bed | […] | 225 | 0.29 – 0.67 | 0.20 – 0.60 | 18.7 – 408.0 | 45.0 – 9[…]
Ghani (1993) | With deposited bed | […] | 450 | 0.49 – 1.33 | 0.07 – 0.47 | 21.0 – 1259.0 | 52.0 – 1[…]
May (1993) | With deposited bed | […] | 450 | 0.39 – 1.14 | 0.07 – 0.97 | 3.5 – 823.0 | 57.6 – 129.6
Montes et al. (2020b) | With deposited bed | […] | 595 | 0.73 – 1.53 | 0.46 – 5.42 | 389.0 – 10,275.0 | 0.8 – 6[…]

As shown in Table 1, a total of 664 and 454 data are available for the development of models without deposited bed and with deposited bed, respectively.

3. Methodology

3.1. Random forest model

The Random Forest model developed here predicts the particle Froude number ($Fr^{*}$) as a function of several well-known dimensionless explanatory factors (Kargar et al., 2019; Vongvisessomjai et al., 2010), where the particle Froude number is:

$$Fr^{*} = \frac{V_l}{\sqrt{g d (SG - 1)}} \tag{2}$$
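As a small worked illustration of the quantities involved, the sketch below evaluates the particle Froude number and the dimensionless grain size for one hypothetical flow condition. The function names, the default kinematic viscosity (1.0e-6 m²/s, roughly water at 20 °C) and the example values are assumptions made for illustration; they are not taken from the paper's datasets.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def particle_froude(velocity, d, sg, g=G):
    """Particle Froude number: V / sqrt(g * d * (SG - 1)).

    velocity in m/s, particle diameter d in m, SG dimensionless.
    """
    return velocity / math.sqrt(g * d * (sg - 1.0))


def dimensionless_grain_size(d, sg, nu=1.0e-6, g=G):
    """Dimensionless grain size: d * (g * (SG - 1) / nu^2)^(1/3)."""
    return d * (g * (sg - 1.0) / nu ** 2) ** (1.0 / 3.0)


# Hypothetical condition: 0.60 m/s flow carrying 1 mm sand (SG = 2.65).
fr_star = particle_froude(velocity=0.60, d=0.001, sg=2.65)
d_gr = dimensionless_grain_size(d=0.001, sg=2.65)
print(f"Fr* = {fr_star:.2f}, D_gr = {d_gr:.1f}")
```

Working with dimensionless groups such as these is what allows data from pipes of different sizes and materials to be pooled in one model.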

Random forest (RF) is a bagging algorithm for regression and classification problems proposed by Breiman (2001). This is a low-variance method, which randomly splits the training data and the input variables or predictors to build a set of $B$ decision trees. The results of all decision trees generated from bootstrapped training samples ($T(x; \Theta_b)$) are then averaged, i.e. the final result ($\hat{f}(x)$) is the average of the output of all decision trees (as shown in Eq. (3)). This procedure ensures the reduction of the model variance and consequently the reduction of the risk of overfitting. A simplified conceptual diagram of the RF method is shown in Fig. 1.

$$\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} T(x; \Theta_b) \tag{3}$$
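The averaging step can be sketched in a few lines of standard-library Python. This is a deliberately reduced illustration, not the paper's implementation: each "tree" is a one-split stump on a single predictor, so the random selection of predictors at each split is not shown, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree that minimises squared error."""
    best = None
    # Candidate thresholds exclude the largest x so the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # every x identical: fall back to the mean response
        overall = mean(ys)
        return lambda x: overall
    _, t, low, high = best
    return lambda x: low if x <= t else high


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Forest prediction: the plain average of the individual tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)


xs = list(range(10))
ys = [1.0] * 5 + [3.0] * 5
forest = fit_forest(xs, ys, n_trees=25)
print(predict(forest, 0), predict(forest, 9))
```

Because every stump sees a different bootstrap sample, individual trees disagree slightly; averaging them is what lowers the variance of the final prediction.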

In this paper, the R package 'randomForest' (Liaw and Wiener, 2002) was used for constructing both non-deposition self-cleansing models, without deposited bed and with deposited bed. The number of predictors considered at each split (mtry) and the number of trees in the forest are the parameters that define the structure of the RF regression model. The mtry parameter is estimated using the rfcv() function, which shows the cross-validation performance for each number of predictors. In addition, the optimal number of trees is defined as the value that minimises the Mean Square Error (MSE) in the training data. These parameters are estimated and the results are shown in Fig. 2. According to this figure, the optimal number of features (i.e. the random predictors used in each tree) is three and four non-dimensional parameters for the cases without deposited bed and with deposited bed, respectively. Similarly, the optimal number of trees is 471 for without deposited bed and 229 for with deposited bed.

Cross-validation is carried out during the training stage using out-of-bag (OOB) samples. As mentioned above, the RF method randomly bootstraps the training sample, that is, some of the training data are left out to build each decision tree. Only two out of three parts of the total training data are used to build the tree (Breiman, 2001). Based on this, data not included in the bootstrapped sample (OOB data) are predicted, and the prediction error is averaged over the trees that do not include these data (OOB Error).
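The out-of-bag bookkeeping can be illustrated with the following sketch. To keep it short, each "tree" is replaced by the mean of its in-bag responses (a tree with no splits), which is an assumption made purely for illustration; the function name and the synthetic data are likewise hypothetical. The point is the mechanics: every observation is predicted only by the trees whose bootstrap sample did not contain it.

```python
import random


def oob_mse(ys, n_trees=200, seed=1):
    """Return (OOB mean squared error, average OOB fraction per tree)."""
    rng = random.Random(seed)
    n = len(ys)
    sums = [0.0] * n    # running sum of OOB predictions per observation
    counts = [0] * n    # number of trees for which the observation was OOB
    oob_fraction = 0.0
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag_set = set(in_bag)
        prediction = sum(ys[i] for i in in_bag) / n    # split-free "tree"
        oob = [i for i in range(n) if i not in in_bag_set]
        oob_fraction += len(oob) / n
        for i in oob:
            sums[i] += prediction
            counts[i] += 1
    errors = [(ys[i] - sums[i] / counts[i]) ** 2
              for i in range(n) if counts[i] > 0]
    return sum(errors) / len(errors), oob_fraction / n_trees


ys = [float(i % 7) for i in range(100)]  # synthetic responses
error, fraction = oob_mse(ys)
print(f"OOB MSE = {error:.2f}, OOB fraction = {fraction:.2f}")
```

The printed OOB fraction should sit close to one third, consistent with the statement that about two out of three parts of the training data are used to build each tree.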

3.1.1. Splitting training and testing data

The whole benchmarking data collected from the literature are used for both training and testing stages of the RF model. Usually, 75% of the data is used during the training stage of the model and the other 25% to validate the results. According to Safari (2020), the range of variation of the training data has direct implications for model performance (i.e. accuracy). As a result, the model can show overfitting issues and poor extrapolation capabilities when narrow datasets are used in the training stage (i.e. data with a low range of variation).


Fig. 1. Simplified conceptual diagram of the RF method.

Fig. 2. Selection of the optimal Random forest parameters.

Fig. 3. Variation of the training and testing error using different combinations of percentages between the training and testing dataset. A) Training stage and B) Testing stage.

Checking the non-overfitting of the model is carried out using several sizes of the training and testing data (i.e. changing the percentage of data used in training and testing) and verifying the error, defined as the Coefficient of Determination ($R^2$) (as shown in Eq. (14)). For this, ten different combinations of percentages are defined (i.e. the training data to the testing data [5:95, 15:85, 25:75, 35:65, 45:55, 55:45, 65:35, 75:25, 85:15, 95:5]), randomly changing the ranges of the training and testing data, and developing 100 models for each combination. As a result, RF models are trained and the error is estimated for both the training and testing stage. Using this information, several boxplots are constructed showing the variation of $R^2$ for each stage.

Fig. 3 shows how the model error decreases as the training sample size increases. For example, when only 5% of the whole dataset is used for training the model and the remaining 95% for testing it, $R^2$ varies between 0.84 and 0.96 for the training stage, and between 0.39 and 0.73 for the testing stage. This clearly shows that the model is under-trained; however, when the ratio is greater than 50:50 the error tends to be constant and only slightly variable for both stages. Ratios greater than 90:10 tend to generate unsatisfactory results for the testing stage, i.e. the model is over-trained and shows high variation of the error, i.e. overfitting (as shown in Fig. 3b). Based on this, a combination of 75:25 is taken as optimal for implementing the RF model.
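The resampling design described above (ten training percentages, 100 random splits each) can be laid out as in the sketch below. Only the index bookkeeping is shown; fitting a model on each split is left out, and the function names are hypothetical. With integer floor division, the 75:25 split of the 664 and 454 available data gives 498/166 and 340/114 runs.

```python
import random

# Training percentages; the testing share is the complement.
TRAIN_PERCENTAGES = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]


def random_split(n, train_pct, rng):
    """Shuffle indices 0..n-1 and cut them into training and testing sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = n * train_pct // 100
    return idx[:n_train], idx[n_train:]


def split_plan(n, repeats=100, seed=0):
    """For every percentage, draw `repeats` independent random splits."""
    rng = random.Random(seed)
    return {pct: [random_split(n, pct, rng) for _ in range(repeats)]
            for pct in TRAIN_PERCENTAGES}


plan = split_plan(664)
train_idx, test_idx = plan[75][0]
print(len(train_idx), len(test_idx))  # sizes of one 75:25 split
```

Each entry of `plan` would then feed one model fit, and the spread of the resulting errors per percentage is what a boxplot per stage summarises.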

The variation of the data used for the training and testing dataset is presented in Table 2. Using the above considerations, the RF model is implemented with the optimal parameters defined in Fig. 2 and using the ranges of variation of the training data outlined in Table 2. The full data collected from the literature are shown in the Supplementary material. Two supplementary tables show the data for non-deposition without and with deposited bed, respectively, and the corresponding particle Froude number predictions. The implemented code for the RF method is shown in Fig. 4. An example of one of the 471 decision trees generated by the RF model, for the non-deposition without deposited bed, is shown in Figure S1, in the Supplementary material.

Table 2. Variation of the data for training and testing the RF model.

Non-deposition criterion | Stage | No. of runs | Channel geometry (mm) | Flow velocity (m/s) | Pipe slope (%) | Sediment concentration (ppm) | Sediment thickness bed (mm)
Without deposited bed | Training | 498 | diameter = 100.0 – 595.0; width = 311.5 – 462.3 | 0.237 – 1.41 | 0.04 – 3.43 | 0.53 – 19,957 | –
Without deposited bed | Testing | 166 | diameter = 100.0 – 595.0; width = 311.5 – 462.3 | 0.237 – 1.24 | 0.04 – 2.74 | 1.00 – 13,840 | –
With deposited bed | Training | 340 | diameter = 225 – 595 | 0.294 – 1.53 | 0.05 – 5.42 | 3.50 – 10,274 | 0.78 – 129.6
With deposited bed | Testing | 114 | diameter = 225 – 595 | 0.319 – 1.28 | 0.05 – 2.58 | 17.00 – 9101 | 1.78 – 120.0

Fig. 4. Random Forest code to calculate the particle Froude number in sewer pipes.

3.1.2. Measure of feature importance

Note that in this paper, the decrease of model accuracy when the variable is permuted (i.e. the percentage of the increase of the MSE, %IncMSE) is considered to measure the importance of each model input variable. This index shows the strength of each explanatory variable based on the reduction of the MSE. The step-by-step procedure to calculate the %IncMSE is as follows (Hastie et al., 2009):

(1) Calculate the MSE of the OOB-sample data in each tree of the forest ($MSE_{OOB}$).
(2) Randomly permute the value of the explanatory variable and calculate the MSE again ($MSE_{perm}$).
(3) Finally, calculate %IncMSE for each explanatory variable as:

$$\%IncMSE = 100 \times \frac{MSE_{perm} - MSE_{OOB}}{MSE_{OOB}} \tag{4}$$

As a result, the more the %IncMSE increases for a variable, the more important it is.
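The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: a single already-fitted model and a single evaluation set stand in for the per-tree OOB samples of a real forest, and the function names and toy data are hypothetical.

```python
import random


def mse(model, rows, ys):
    """Mean squared error of `model` over the given rows."""
    return sum((y - model(r)) ** 2 for r, y in zip(rows, ys)) / len(ys)


def inc_mse(model, rows, ys, column, seed=0):
    """Percentage increase in MSE after permuting one input column."""
    rng = random.Random(seed)
    base = mse(model, rows, ys)                    # step (1): baseline MSE
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)                          # step (2): permute one variable
    permuted = [r[:column] + [v] + r[column + 1:]
                for r, v in zip(rows, shuffled)]
    return 100.0 * (mse(model, permuted, ys) - base) / base  # step (3)


# Toy data: the response depends on column 0 only, plus a small residual.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
ys = [2.0 * r[0] + 0.1 * ((i % 3) - 1) for i, r in enumerate(rows)]
model = lambda r: 2.0 * r[0]

print(inc_mse(model, rows, ys, column=0))  # large: the model relies on column 0
print(inc_mse(model, rows, ys, column=1))  # zero: column 1 is ignored
```

Permuting a variable breaks its link with the response while keeping its distribution, so the loss of accuracy isolates how much the model depends on that variable.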

3.2. Performance assessment

3.2.1. Models used for comparing the results

In order to evaluate the RF model performance, it is compared to several literature models. The models selected for comparison are the replicable white-box models with high prediction accuracy reported in the literature and two black-box models where the implementing code is provided in the original papers. Other black-box models cannot be evaluated due to the limited replicability shown by these models (e.g. ANN). Based on this, in the case of non-deposition without deposited bed, the seven models selected are the EPR-MOGA model (Montes et al., 2020a), the GEP model (Kargar et al., 2019), the MARS model (Safari, 2019), the May et al. (1996) model, the Safari and Aksoy (2020) model, the ANFIS-PSO model (Ebtehaj et al., 2019) and the ELM model (Ebtehaj et al., 2020). In the case of non-deposition with deposited bed, the three models used for comparison are the PSO model (Safari and Shirzad, 2019), the LASSO model (Montes et al., 2020b) and the MGP model (Safari and Danandeh Mehr, 2018). The EPR-MOGA, LASSO, May et al. (1996) and Safari and Aksoy (2020) models are regression type models, whilst the GEP, MARS, ANFIS-PSO, ELM, PSO and MGP models make use of ML/AI techniques. The equations used in the above ten models are as follows:

[Eqs. (5) to (13): the expressions are not legible in the source; only the model labels, equation numbers and accompanying notes are kept.]

EPR-MOGA: Eq. (5)
GEP: Eq. (6)
MARS: Eq. (7)
May et al. (1996): Eq. (8)
Safari and Aksoy (2020): Eq. (9)
ANFIS-PSO: No equation. The Matlab code can be found in Ebtehaj et al. (2019).
ELM: Eq. (10), where InW and OutW are the input and output weights, BHI the bias of the hidden neurons and InV the input variables. Full details of the values chosen for each parameter are shown in Ebtehaj et al. (2020).
PSO: Eq. (11)
LASSO: Eq. (12)
MGP: Eq. (13)

3.2.2. Performance indices

The RF model performance is evaluated and compared to the above ten models using three performance indicators. These are the Coefficient of Determination ($R^2$), the Root Mean Square Error ($RMSE$) and the Mean Absolute Percentage Error ($MAPE$), defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Fr^{*}_{OBS,i} - Fr^{*}_{MOD,i} \right)^2}{\sum_{i=1}^{n} \left( Fr^{*}_{OBS,i} - \overline{Fr^{*}_{OBS}} \right)^2} \tag{14}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Fr^{*}_{OBS,i} - Fr^{*}_{MOD,i} \right)^2} \tag{15}$$

$$MAPE = \frac{100}{n} \sum_{i=1}^{n} \frac{\left| Fr^{*}_{OBS,i} - Fr^{*}_{MOD,i} \right|}{Fr^{*}_{OBS,i}} \tag{16}$$

where $Fr^{*}_{OBS}$ is the particle Froude number of the observed data, $Fr^{*}_{MOD}$ is the particle Froude number estimated by the RF algorithm (or other predictive model), $n$ is the number of data and $\overline{Fr^{*}_{OBS}}$ is the mean of the observed particle Froude number data. The Coefficient of Determination measures the percentage of the model variance that can be explained. This coefficient varies between 0 and 1, with a value of 1 denoting a perfect match between observed and modelled data. The Root Mean Square Error measures the standard deviation of the residuals. Note that a value of 0 indicates high model prediction accuracy. Finally, the Mean Absolute Percentage Error assesses the model prediction accuracy (i.e. bias) as a percentage of the observed value. A value of 0 indicates the perfect model, where there are no differences between predictions and observations.
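The three indices are straightforward to compute. The sketch below is a plain standard-library version written for illustration, with hypothetical function names and a three-point toy example; it follows the usual definitions of the Coefficient of Determination, RMSE and MAPE.

```python
import math


def r_squared(obs, mod):
    """Coefficient of Determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - m) ** 2 for o, m in zip(obs, mod))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, mod):
    """Root Mean Square Error, in the units of the observed variable."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def mape(obs, mod):
    """Mean Absolute Percentage Error, relative to each observed value."""
    return 100.0 / len(obs) * sum(abs(o - m) / abs(o) for o, m in zip(obs, mod))


observed = [2.0, 4.0, 6.0]   # toy particle Froude numbers
modelled = [2.0, 4.0, 5.0]
print(r_squared(observed, modelled), rmse(observed, modelled), mape(observed, modelled))
```

For this toy case the residual sum of squares is 1 and the total sum of squares is 8, so the Coefficient of Determination is 0.875, the RMSE is about 0.577 and the MAPE is about 5.56%.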

4. Results

The results obtained using the methodology shown in the Methodology section are presented in Tables 3 and 4 for the without deposited bed and deposited bed criteria, respectively. Graphically, these results are shown in two accompanying figures. As shown in these tables, for the MARS, ANFIS-PSO, ELM and MGP models, the outliers of the particle Froude number (i.e. $Fr^{*} < 0.00$ and $Fr^{*} > 20.00$) were removed. This is because these models can produce extreme values (e.g. $Fr^{*} = -58.67$ or $Fr^{*} = 163.59$, among others) that misrepresent the model comparison when evaluating the performance indices.
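A filter of this kind might look like the sketch below, which drops observation and prediction pairs whose predicted particle Froude number falls outside the range from 0 to 20 before any index is computed. The function name, the inclusive bounds and the example values are assumptions for illustration; the source does not state how the boundary values themselves were treated.

```python
def drop_outliers(obs, mod, low=0.0, high=20.0):
    """Keep only pairs whose modelled value lies within [low, high]."""
    kept = [(o, m) for o, m in zip(obs, mod) if low <= m <= high]
    return [o for o, _ in kept], [m for _, m in kept]


observed = [3.0, 3.1, 2.9]
modelled = [-58.67, 3.2, 163.59]  # two extreme predictions
clean_obs, clean_mod = drop_outliers(observed, modelled)
print(clean_obs, clean_mod)
```

Filtering on the modelled value keeps each observation paired with its own prediction, so the remaining data can be passed straight to the performance indices.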

As can be seen from Table 3, the Random Forest model shows better generalisation capacity than the other models shown, as demonstrated by the high prediction accuracy observed for all available datasets (0.88 ≤ R² ≤ 0.98, 0.24 ≤ RMSE ≤ 0.73 and 4.36% ≤ MAPE ≤ 11.09%). The following observations can be made from the performance of the other models evaluated:

• EPR-MOGA, similarly to RF, shows good results but has inferior accuracy in large sewer pipes (R² = 0.86, RMSE = 1.03 and MAPE = 11.31%). In addition, the EPR-MOGA model shows limitations for predicting the particle Froude number in non-circular sections (as shown in the Mayerle (1988) rectangular data). This equation shows good extrapolation capabilities because of the inclusion of the pipe slope as an input feature for the self-cleansing prediction.

• GEP shows acceptable results (0.79 ≤ R² ≤ 0.87, 0.66 ≤ RMSE ≤ 0.89 and 11.45% ≤ MAPE ≤ 22.33%) for the datasets used for its development in circular channels (Ghani, 1993; Mayerle, 1988; Vongvisessomjai et al., 2010) and poor performance for other datasets (0.00 ≤ R² ≤ 0.76, 1.00 ≤ RMSE ≤ 1.95 and 14.35% ≤ MAPE ≤ 37.92%). This model presents good performance for large sewer pipes. In contrast, for non-circular channels the model quickly loses accuracy.

• According to Safari (2019), the MARS model was developed using the experimental data collected by Mayerle (1988) (in both circular and rectangular channels), May (1993), Ghani (1993) and Vongvisessomjai et al. (2010). As a result, this model shows acceptable performance for these datasets (0.49 ≤ R² ≤ 0.87, 0.81 ≤ RMSE ≤ 1.15 and 13.63% ≤ MAPE ≤ 28.08%) but poor performance for the remaining datasets (R² = 0.00, 1.48 ≤ RMSE ≤ 2.88 and 29.14% ≤ MAPE ≤ 51.28%). Based on the above, and compared to the RF model, limited extrapolation capabilities are identified for the MARS model.

• May et al. (1996) is the best regression-based equation reported in the literature (Ackers et al., 2001; Ebtehaj et al., 2014) and was developed using several experimental datasets. This is the equation proposed by the Construction Industry Research and Information Association (CIRIA) for designing self-cleansing sewer pipes transporting coarser granular material as bedload (Ackers et al., 2001). This model shows good performance for pipe diameters less than 500 mm (0.83 ≤ R² ≤ 0.99, 0.13 ≤ RMSE ≤ 0.82 and 2.38% ≤ MAPE ≤ 11.61%). In contrast, limited extrapolation for large sewer pipes is identified from the low performance index values obtained (R² = 0.00, RMSE = 4.88 and MAPE = 48.97%). This equation shows better performance than the RF model when compared to data from Vongvisessomjai et al. (2010), but lower accuracy when applied to the rest of the datasets.


Table 3. Accuracy of self-cleansing models for the without deposited bed criterion using performance indices for the training and testing datasets.

Values in each row follow the model order: RF, EPR-MOGA, GEP, MARS, May et al. (1996) a, Safari and Aksoy (2020), ANFIS-PSO, ELM. Rows marked [incomplete] have fewer than eight legible values in the source, so their column alignment is not known.

Training
  R²: 0.98, 0.90, 0.75, 0.00, 0.27, 0.74, 0.51∗, 0.30∗
  RMSE: 0.33, 0.76, 1.22, 2.55, 2.17, 1.25, 1.69∗, 1.95∗
  MAPE (%): 4.88, 11.54, 23.52, 34.16, 17.49, 17.21, 19.32∗, 29.76∗
Testing
  R²: 0.91, 0.86, 0.69, 0.00, 0.09, 0.74, 0.40∗, 0.32∗
  RMSE: 0.73, 0.88, 1.33, 2.55, 2.27, 1.21, 1.84∗, 1.92∗
  MAPE (%): 11.09, 12.35, 26.43, 36.57, 19.15, 17.24, 20.95∗, 29.82∗
Mayerle (1988) circular
  R² [incomplete]: 0.96, 0.89, 0.87, 0.75, 0.80∗, 0.42
  RMSE [incomplete]: 0.45, 0.75, 0.81, 0.82, 1.12, 1.00∗, 1.71
  MAPE (%): 5.62, 8.90, 14.77, 14.03, 11.49, 14.91, 17.92∗, 26.75
Mayerle (1988) rectangular
  R²: 0.93, 0.38, 0.30, 0.81, –, 0.87, 0.00, 0.47
  RMSE: 0.49, 1.44, 1.54, 0.81, –, 0.66, 2.74, 1.33
  MAPE (%): 8.49, 28.97, 33.00, 15.51, –, 13.14, 45.28, 20.75
Ghani (1993)
  R²: 0.97, 0.96, 0.83, 0.72, 0.90, 0.81, 0.88, 0.38
  RMSE: 0.36, 0.43, 0.89, 1.15, 0.67, 0.94, 0.74, 1.69
  MAPE (%): 5.94, 9.35, 22.33, 28.08, 10.32, 15.60, 10.34, 23.96
Ota (1999)
  R² [incomplete]: 0.97, 0.98, 0.44, 0.00, 0.96, 0.97, 0.55
  RMSE: 0.24, 0.20, 1.00, 1.48, 0.27, 0.25, 0.22, 0.90
  MAPE (%): 5.55, 6.90, 37.92, 51.28, 7.78, 7.90, 6.46, 19.54
Vongvisessomjai et al. (2010)
  R²: 0.88, 0.95, 0.79, 0.49, 0.99, 0.71, 0.97, 0.00
  RMSE: 0.49, 0.33, 0.66, 1.03, 0.13, 0.78, 0.24, 1.59
  MAPE (%): 6.56, 5.78, 11.45, 13.63, 2.38, 13.34, 3.62, 28.50
Montes et al. (2020a)
  R² [incomplete]: 0.96, 0.98, 0.00, 0.83, 0.67, 0.77∗, 0.00
  RMSE: 0.31, 0.25, 1.64, 2.37, 0.67, 0.94, 0.75∗, 1.85
  MAPE (%): 4.36, 4.94, 28.15, 49.73, 11.61, 15.39, 12.39∗, 33.96
Montes et al. (2020b)
  R²: 0.94, 0.86, 0.76, 0.00∗, 0.00, 0.34, 0.00∗, 0.00∗
  RMSE: 0.70, 1.03, 1.37, 2.88∗, 4.88, 2.26, 3.01∗, 3.10∗
  MAPE (%): 7.33, 11.31, 14.35, 29.14∗, 48.97, 23.44, 30.56∗, 39.30∗

a Model not valid for non-circular channels.
∗ Outliers removed.

Table. Accuracy of self-cleansing models for the deposited bed criterion using performance indices for the training and testing datasets. Bolded values show the best performing model.

| Dataset | Performance index | RF | PSO | LASSO | MGP |
|---|---|---|---|---|---|
| Training | R² | **0.98** | 0.75 | 0.82 | 0.51∗ |
| | RMSE | **0.32** | 1.30 | 1.13 | 1.69∗ |
| | MAPE (%) | **4.70** | 14.36 | 13.07 | 28.78∗ |
| Testing | R² | **0.91** | 0.70 | 0.83 | 0.29∗ |
| | RMSE | **0.80** | 1.47 | 1.10 | 2.19∗ |
| | MAPE (%) | **12.10** | 15.94 | 12.59 | 31.36∗ |
| El-Zaemey (1991) | R² | **0.94** | 0.78 | 0.83 | 0.54 |
| | RMSE | **0.38** | 0.76 | 0.66 | 1.08 |
| | MAPE (%) | **6.49** | 14.28 | 11.97 | 30.19 |
| Perrusquía (1991) | R² | **0.84** | 0.65 | 0.62 | 0.00 |
| | RMSE | **0.33** | 0.49 | 0.50 | 1.29 |
| | MAPE (%) | **7.07** | 10.15 | 12.05 | 30.58 |
| Ab Ghani (1993) | R² | **0.91** | 0.56 | 0.74 | 0.51 |
| | RMSE | **0.60** | 1.32 | 1.01 | 1.40 |
| | MAPE (%) | **6.13** | 16.26 | 11.19 | 13.07 |
| May (1993) | R² | **0.90** | 0.63 | 0.64 | 0.54 |
| | RMSE | **0.62** | 1.18 | 1.16 | 1.31 |
| | MAPE (%) | **6.50** | 13.47 | 14.26 | 14.21 |
| Montes et al. (2020a) | R² | **0.93** | 0.00 | 0.73 | 0.00∗ |
| | RMSE | **0.81** | 3.06 | 1.56 | 5.54∗ |
| | MAPE (%) | **6.84** | 21.05 | 10.36 | 58.79∗ |

∗ Outliers removed.

• The Safari and Aksoy (2020) model is a competitive equation for predicting the self-cleansing velocity in both circular and non-circular channels. This model shows similar but inferior performance to the EPR-MOGA model in small sewer pipes (0.67 < R² < 0.97, 0.25 < RMSE < 1.12 and 7.90% < MAPE < 15.60%), but in large sewers the accuracy is quickly lost (R² = 0.34, RMSE = 2.26 and MAPE = 23.46%). In contrast, in non-circular channels this model outperforms the other regression models (EPR-MOGA, GEP and MARS) and the ML/AI models (ANFIS-PSO and ELM) (R² = 0.87, RMSE = 0.66 and MAPE = 13.41%), which is a competitive performance compared to the RF model (R² = 0.89, RMSE = 0.61 and MAPE = 10.05%). This is because of the inclusion of an additional relation as an explanatory variable for predicting the particle Froude number. This model is competitive and shows good generalisation of the problem for designing sewers under the non-deposition without deposited bed criterion.

• According to Ebtehaj et al. (2019), the ANFIS-PSO model was developed using the experimental data collected by Ab Ghani (1993), Ota (1999) and Vongvisessomjai et al. (2010). As a result, this model shows good performance for these datasets (0.88 < R² < 0.97, 0.22 < RMSE < 0.74 and 3.62% < MAPE < 10.34%). In large sewers and non-circular channels, the model loses accuracy (R² = 0.00, 2.74 < RMSE < 3.01 and 30.56% < MAPE < 45.28%). This model produces some extreme values when the particle Froude number is calculated, especially in the Montes et al. (2020b) dataset. The RF model generates better results compared to this model.

• ELM was trained with the same dataset used for the ANFIS-PSO model. Unsatisfactory results are obtained when this model is applied to the dataset considered in this study (0.00 < R² < 0.55, 0.90 < RMSE < 3.1 and 19.54% < MAPE < 39.30%). The same comments made above for the ANFIS-PSO model apply here.
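The three performance indices quoted in the bullets above can be computed as in the following minimal Python sketch. It assumes the standard definitions of the coefficient of determination, the root mean square error and the mean absolute percentage error; the paper's own definitions are given outside this section and may differ in detail. The function name `performance_indices` and the sample values are hypothetical and serve only as illustration.

```python
import math

def performance_indices(observed, predicted):
    """Return (R2, RMSE, MAPE in %) for paired observed/predicted values.

    Assumes the standard textbook definitions; observed values must be
    non-zero (for MAPE) and not all identical (for R2).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return r2, rmse, mape

# Made-up particle Froude numbers, for illustration only (not from the paper)
fr_observed = [2.0, 4.0, 6.0, 8.0]
fr_predicted = [2.2, 3.8, 6.3, 7.6]
r2, rmse, mape = performance_indices(fr_observed, fr_predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}, MAPE = {mape:.2f}%")
```

Higher R² and lower RMSE and MAPE indicate better agreement between predictions and observations, which is how the tables are read.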

According to the results shown in the table for the deposited bed criterion, the RF model outperforms the other models for the entire considered dataset. This model shows good accuracy levels (0.84 < R² < 0.98, 0.32 < RMSE < 0.81 and 4.70% < MAPE < 12.10%) over the whole range of variation of the hydraulic and sediment characteristics. Comments on the other models studied are as follows:


Fig. 5. Performance of the models applied in the non-deposition without deposited bed testing dataset.


Fig. 6. Performance of the models applied in the non-deposition with deposited bed testing dataset.

• The PSO model was developed using the experimental data collected by El-Zaemey (1991), Perrusquía (1991), May (1993) and Ab Ghani (1993). As a result, this model shows good performance for these datasets (0.56 < R² < 0.78, 0.49 < RMSE < 1.32 and 10.15% < MAPE < 16.26%). However, when the model is compared with the data collected in the large sewer pipe, the accuracy quickly decreases (R² = 0.00, RMSE = 3.06 and MAPE = 21.05%).

• The LASSO model reports good accuracy levels for all the datasets considered (0.62 < R² < 0.83, 0.50 < RMSE < 1.56 and 10.36% < MAPE < 14.26%). However, its accuracy is still inferior to that of the RF model. This model shows good extrapolation capabilities and generalisation of the problem.

• MGP was developed using the same experimental datasets as the PSO model. This model is less accurate than the PSO model (0.00 < R² < 0.54, 1.08 < RMSE < 5.54 and 13.07% < MAPE < 58.79%). In large sewer pipes, the model shows poor performance. In contrast to the other models, the MGP was developed using normalised values. Because of this, the range of variation used for training the model can potentially affect the final form and structure of the expression obtained with the MGP.

The RF accuracy on the Montes et al. (2020b) data is especially important because of the relative sediment thickness used at laboratory scale in that study. As the table of experimental data shows, the sediment thickness used at laboratory scale ranges from 0.8 mm (for the Montes et al. (2020b) data) to 129.6 mm (for the May (1993) data), i.e. from 1.1% to 20.0% of the pipe diameter. Values of 20% are unrealistic, since the optimal sediment thickness for design has been defined as 1% of the pipe diameter (May et al., 1989; Safari and Shirzad, 2019). The data collected by Montes et al. (2020b) seem to be the closest representation of the real conditions found in sewer systems. Based on this, RF is the model that best predicts the self-cleansing velocity for data under real conditions.

Fig. 7. Variable importance estimated by RF model: A) without deposited bed; B) with deposited bed.

4.1. Variable importance

The RF model input variable importance is presented in Fig. 7. As shown in this figure, for both non-deposition criteria the most important variable is the volumetric sediment concentration, followed by the dimensionless grain size and the relative grain size. This result is consistent with findings reported in the literature (Ackers et al., 2001; Ebtehaj et al., 2020). Less important parameters for predicting the particle Froude number, and thus the self-cleansing velocity, are the relative sediment thickness and the channel friction factor, for the deposited bed criterion.
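This section does not state which importance measure underlies Fig. 7. One common choice for Random Forest models is permutation importance: shuffle one input column at a time and record how much the prediction error grows. The sketch below illustrates that idea under this assumption; it is not the authors' procedure. The names `permutation_importance` and `toy_predict`, and the synthetic data, are hypothetical stand-ins for a fitted model and its inputs.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error of paired values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy stand-in for a fitted model: strong dependence on feature 0,
# weak dependence on feature 1, none on feature 2.
def toy_predict(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
```

In this toy setting the scores rank the features in the order of their true influence, and a feature the model ignores receives a score of zero.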

The parameter importance shown by EPR-MOGA, Safari and Aksoy (2020), PSO and LASSO is quite different. In these techniques, the most important parameter is the relative grain size, due to the highest values of the regression coefficients (−c […]), as shown in Eqs. (5), (9), (11) and (12). The parameter importance for the GEP, MARS and MGP models is less intuitive because of the form of the equations, shown in Eqs. (6), (7) and (13), which include logarithmic and inverse tangent functions for calculating the particle Froude number. Less comparable are the results of ANFIS-PSO and ELM, since no practical equation is provided.

Based on the above results, shown in Fig. 7, a good estimate of the volumetric sediment concentration seems essential for increasing the accuracy in the calculation of the particle Froude number and, consequently, of the minimum self-cleansing velocity for both non-deposition criteria. In addition, the hydraulic characteristics of the pipe (defined by the hydraulic radius) and the sediment characteristics (i.e. particle diameter and specific gravity) are proportionally important for RF model performance.
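To make the link between the particle Froude number and the self-cleansing velocity concrete, the sketch below converts one into the other. It assumes the definition commonly used in the self-cleansing literature, Fr* = V / sqrt(g d (SG − 1)), with V the velocity, d the particle diameter and SG the sediment specific gravity; the paper's own definition appears outside this section and should be checked. The function name and input values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def self_cleansing_velocity(fr_star, particle_diameter_m, specific_gravity):
    """Velocity (m/s) implied by Fr* = V / sqrt(g * d * (SG - 1)).

    The definition of Fr* used here is an assumption, not taken from this section.
    """
    if particle_diameter_m <= 0 or specific_gravity <= 1:
        raise ValueError("need d > 0 and SG > 1")
    return fr_star * math.sqrt(G * particle_diameter_m * (specific_gravity - 1.0))

# Hypothetical case: a model predicts Fr* = 5.0 for 1 mm sand with SG = 2.65
velocity = self_cleansing_velocity(5.0, 0.001, 2.65)
print(f"minimum self-cleansing velocity = {velocity:.2f} m/s")
```

Under this definition, any error in the predicted particle Froude number carries over proportionally to the design velocity.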

Discussion

The prediction of self-cleansing conditions in sewers remains a challenge despite the multiple models and equations developed and reported in the literature. Existing regression-based equations and AI/ML models show limited generalisation capabilities and overfitting problems. In this paper, a new approach for addressing these issues is proposed using the Random Forest method. Due to the nature of the RF method, where the model variance is reduced by averaging the results from an ensemble of decision trees, the risk of overfitting is low. By using a reduced number of input features for constructing each decision tree in the forest, the correlation between base trees is reduced. This is an improvement of the RF method over a single decision tree, which can be overtrained (i.e. the tree learns the noise from the training data) and thus shows poor performance on the testing dataset.
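The variance argument above can be made explicit with a standard result for averages of correlated estimators (see, e.g., Hastie et al., 2009). As a simplifying assumption not stated in this paper, suppose the forest averages $B$ trees whose predictions $T_b(x)$ at a given point each have variance $\sigma^2$ and pairwise correlation $\rho$. The symbols $B$, $T_b$, $\sigma^2$ and $\rho$ are introduced here for illustration only.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

The second term shrinks as more trees are averaged, while the first term is limited by the correlation between trees, which is why lowering that correlation matters in addition to growing a large ensemble.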

The RF model showed good generalisation capabilities when the whole dataset was divided into 75% for the training stage and 25% for the testing stage. For this percentage split of the data, the testing error presented low variance. In contrast, when the amount of data used in the training stage was increased (e.g. to 95% of the whole data), the testing error showed high variance, which is an indicator of an over-trained model with limited extrapolation capabilities (as shown in panel b of the corresponding figure). Therefore, choosing the right percentage split is critical to avoid model overfitting.

Variable importance analysis showed that the volumetric sediment concentration is the most relevant feature for predicting the self-cleansing velocity in practice for both non-deposition criteria, followed by the dimensionless grain size. The self-cleansing prediction is only weakly conditioned by the channel material, as indicated by the low variable importance of the channel friction factor.

The RF results were compared to those of existing models reported in the literature and showed better performance for the whole dataset for both the non-deposition without and with deposited bed criteria. This is explained by several factors, such as:

• RF is able to better capture the non-linearity in the data compared to linear regression models (i.e. the regression-based models proposed by May et al. (1996) and Safari and Aksoy (2020)). The RF model also better captures complex interactions between features. This is because of the RF model's ability to capture non-linear patterns in data effectively.

• RF showed a good bias-variance trade-off (i.e. low bias and low variance) for both non-deposition criteria. In contrast, the existing non-regression models reported in the literature (i.e. MARS, ANFIS-PSO and ELM) and compared to the RF model in this paper in some cases presented low bias and high variance (i.e. overfitting) for the non-deposition without deposited bed criterion, as shown in Fig. 5. For the non-deposition with deposited bed criterion, the existing models (i.e. PSO, LASSO and MGP) showed high bias, since these models systematically underestimate the particle Froude number in the testing dataset (as shown in Fig. 6).

• The range of variation used for training and testing the RF model is much larger than that of the datasets used in the literature for developing the existing predictive models. For example, the ANFIS-PSO and ELM models were trained and tested with the Ab Ghani (1993), Ota (1999) and Vongvisessomjai et al. (2010) data (i.e. approx. 290 data). Given this, the RF model developed here is able to predict the particle Froude number for a larger range of variation in the input conditions. An example of this is shown in Fig. 6, where the existing models reported for the non-deposition with deposited bed criterion underestimate the particle Froude number for values above 9.0 (Fr* > 9.0).
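The systematic underestimation mentioned in the factors above can be quantified with a simple bias check. The sketch below computes the mean bias error and the share of underestimated observations above a threshold; both function names and all sample values are hypothetical, and the threshold of 9.0 is taken from the text only as an example.

```python
def mean_bias_error(observed, predicted):
    """Mean of (predicted - observed); negative values indicate underestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def underestimated_fraction(observed, predicted, threshold):
    """Share of observations above `threshold` whose prediction is too low."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > threshold]
    if not pairs:
        return None  # no observation exceeds the threshold
    return sum(1 for o, p in pairs if p < o) / len(pairs)

# Made-up particle Froude numbers, for illustration only
fr_observed = [3.0, 6.0, 9.5, 11.0, 12.0]
fr_predicted = [3.1, 5.8, 8.0, 8.5, 9.0]
bias = mean_bias_error(fr_observed, fr_predicted)
share = underestimated_fraction(fr_observed, fr_predicted, threshold=9.0)
print(f"mean bias = {bias:.2f}, share underestimated above 9.0 = {share:.2f}")
```

A clearly negative mean bias together with a share close to one above the threshold would correspond to the high-bias behaviour described for the existing deposited bed models.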

Although the RF model presented in this study outperforms the existing models reported in the literature, further tests with data collected in real sewers should be conducted. The cohesive effects of the deposited material must be included in future developments. Finally, the performance of the RF model in trapezoidal, ovoid and U-shape channels should be evaluated further to check the applicability of the model under these channel characteristics.

Conclusions

A Random Forest based model was developed for predicting the self-cleansing velocity under the concept of non-deposition. This model was implemented using the experimental benchmark data reported in the literature. The RF model was compared to the following ten literature models: EPR-MOGA, MARS, MGP, ANFIS-PSO, ELM, LASSO, GEP and PSO, and two regression-based equations proposed by May et al. (1996) and Safari and Aksoy (2020). The following conclusions are made based on the results obtained:

(1) The Random Forest model is able to predict the particle Froude number (i.e. the minimum self-cleansing velocity) for the non-deposition self-cleansing design criteria with high accuracy on validation (i.e. unseen) data. This is due to the RF ability to better generalise the analysed data, i.e. the ability to avoid model overfitting.

(2) The RF model prediction accuracy is consistently superior to that of the ten other literature models considered here. This is likely due to the reason mentioned above, but also to the RF capability to better capture the complex interactions between input variables when compared to the other models considered in this paper. This is especially relevant for the non-deposition with deposited bed case, where the accuracy of the RF model predictions is substantially higher than that of the other models (i.e. the LASSO, MGP and PSO models).

(3) The volumetric sediment concentration is the most important input variable for predicting the self-cleansing velocity in sewer pipes. A good characterisation of this parameter seems essential for improving the design of new self-cleansing sewers.

Based on the above, RF can be used for predicting the self-cleansing velocity with high accuracy, especially for large sewer pipes with the presence of a deposited bed. This technique can be used for designing self-cleansing sewer systems. Further testing of the RF and other self-cleansing models in real sewer systems is required to further validate these models in those circumstances and ensure their applicability in engineering practice.


Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.watres.2020.116639

References

Ab Ghani, A., 1993. Sediment transport in sewers. PhD thesis. University of Newcastle Upon Tyne, Newcastle Upon Tyne, UK.

Ackers, J., Butler, D., Leggett, D., May, R., 2001. Designing sewers to control sediment problems. In: Urban Drainage Modeling. ASCE, Orlando, FL, pp. 818–823. doi:10.1061/40583(275)77

Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. doi:10.1023/A:1010933404324

Ebtehaj, I., Bonakdari, H., 2016a. Bed load sediment transport in sewers at limit of deposition. Sci. Iran. 23 (3), 907–917. doi:10.24200/sci.2016.2169

Ebtehaj, I., Bonakdari, H., 2016b. A support vector regression-firefly algorithm-based model for limiting velocity prediction in sewer pipes. Water Sci. Technol. 73 (9), 2244–2250. doi:10.2166/wst.2016.064

Ebtehaj, I., Bonakdari, H., 2013. Evaluation of sediment transport in sewer using artificial neural network. Eng. Appl. Comput. Fluid Mech. 7 (3), 382–392. doi:10.1080/19942060.2013.11015479

Ebtehaj, I., Bonakdari, H., Es-Haghi, M., 2019. Design of a hybrid ANFIS–PSO model to estimate sediment transport in open channels. Iran. J. Sci. Technol. Trans. 44 (4), 851–857. doi:10.1007/s40996-018-0218-9

Ebtehaj, I., Bonakdari, H., Safari, M., Gharabaghi, B., Zaji, A., Riahi Madavar, H., Sheikh Khozani, Z., Es-haghi, M., Shishegaran, A., Danandeh Mehr, A., 2020. Combination of sensitivity and uncertainty analyses for sediment transport modeling in sewer pipes. Int. J. Sediment Res. 35 (2), 157–170. doi:10.1016/j.ijsrc.2019.08.005

Ebtehaj, I., Bonakdari, H., Sharifi, A., 2014. Design criteria for sediment transport in sewers based on self-cleansing concept. J. Zhejiang Univ. Sci. A 15 (11), 914–924. doi:10.1631/jzus.a1300135

El-Zaemey, A., 1991. Sediment transport over deposited beds in sewers. PhD thesis. University of Newcastle Upon Tyne, Newcastle Upon Tyne, UK.

Hastie, T., Tibshirani, R., Friedman, J., 2009. The elements of statistical learning: data mining, inference, and prediction. Springer, New York, USA. doi:10.1007/978-0-387-84858-7

Kargar, K., Safari, M., Mohammadi, M., Samadianfard, S., 2019. Sediment transport modeling in open channels using neuro-fuzzy and gene expression programming techniques. Water Sci. Technol. 79 (12), 2318–2327. doi:10.2166/wst.2019.229

Liaw, A., Wiener, M., 2002. Classification and regression by Random forest. R News 2 (3), 18–22.

May, R., 1993. Sediment transport in pipes and sewers with deposited beds. HR Wallingford, Oxfordshire, UK. Report SR 320.

May, R., Ackers, J., Butler, D., John, S., 1996. Development of design methodology for self-cleansing sewers. Water Sci. Technol. 33 (9), 195–205. doi:10.1016/0273-1223(96)00387-3

May, R., Brown, P., Hare, G., Jones, K., 1989. Self-cleansing conditions for sewers carrying sediment. HR Wallingford, Oxfordshire, UK. Report SR 221.

Mayerle, R., 1988. Sediment transport in rigid boundary channels. PhD thesis. University of Newcastle upon Tyne, Newcastle Upon Tyne, UK.

Merritt, L., Enfinger, K., 2019. Tractive force: a key to solids transport in gravity flow drainage pipes. In: Pipelines 2019. ASCE, Nashville, TN, pp. 349–358.

Montes, C., Berardi, L., Kapelan, Z., Saldarriaga, J., 2020a. Predicting bedload sediment transport of non-cohesive material in sewer pipes using evolutionary polynomial regression – multi-objective genetic algorithm strategy. Urban Water J. 17 (2), 154–162. doi:10.1080/1573062X.2020.1748210

Montes, C., Kapelan, Z., Saldarriaga, J., 2019. Impact of self-cleansing criteria choice on the optimal design of sewer networks in South America. Water (Basel) 11, 1148. doi:10.3390/w11061148

Montes, C., Vanegas, S., Kapelan, Z., Berardi, L., Saldarriaga, J., 2020b. Non-deposition self-cleansing models for large sewer pipes. Water Sci. Technol. 81 (3), 606–621. doi:10.2166/wst.2020.154

Nalluri, C., Ab Ghani, A., 1996. Design options for self-cleansing storm sewers. Water Sci. Technol. 33 (9), 215–220. doi:10.1016/0273-1223(96)00389-7

Ota, J., 1999. Effect of particle size and gradation on sediment transport in storm sewers. PhD thesis. University of Newcastle upon Tyne, Newcastle Upon Tyne, UK.

Perrusquía, G., 1991. Bedload Transport in Storm Sewers: Stream Traction in Pipe Channels. PhD thesis. Chalmers University of Technology, Gothenburg, Sweden.

Roushangar, K., Ghasempour, R., 2017. Estimation of bedload discharge in sewer pipes with different boundary conditions using an evolutionary algorithm. Int. J. Sediment Res. 32 (4), 564–574. doi:10.1016/j.ijsrc.2017.05.007

Safari, M., 2019. Decision tree (DT), generalized regression neural network (GR) and multivariate adaptive regression splines (MARS) models for sediment transport in sewer pipes. Water Sci. Technol. 79 (6), 1113–1122. doi:10.2166/wst.2019.106

Safari, M., Danandeh Mehr, A., 2018. Multigene genetic programming for sediment transport modeling in sewers for conditions of non-deposition with a bed deposit. Int. J. Sediment Res. 33 (3), 262–270. doi:10.1016/j.ijsrc.2018.04.007

Safari, M., Mohammadi, M., Ab Ghani, A., 2018. Experimental studies of self-cleansing drainage system design: a review. J. Pipeline Syst. Eng. Pract. 9 (4), 04018017. doi:10.1061/(ASCE)PS.1949-1204.0000335

Safari, M., Shirzad, A., 2019. Self-cleansing design of sewers: definition of the optimum deposited bed thickness. Water Environ. Res. 91 (5), 407–416. doi:10.1002/wer.1037

Safari, M., Shirzad, A., Mohammadi, M., 2017. Sediment transport modeling in deposited bed sewers: unified form of May's equations using the particle swarm optimization algorithm. Water Sci. Technol. 76 (4), 992–1000. doi:10.2166/wst.2017.267

Safari, M., 2020. Hybridization of multivariate adaptive regression splines and random forest models with an empirical equation for sediment deposition prediction in open channel flow. J. Hydrol. 590 (November 2020), 125392. doi:10.1016/j.jhydrol.2020.125392

Safari, M., Aksoy, H., 2020. Experimental analysis for self-cleansing open channel design. J. Hydraul. Res. 1–12. doi:10.1080/00221686.2020.1780501

Tyralis, H., Papacharalampous, G., Langousis, A., 2019. A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water (Basel) 11 (5), 910. doi:10.3390/w11050910

Vongvisessomjai, N., Tingsanchali, T., Babel, M., 2010. Non-deposition design criteria for sewers with part-full flow. Urban Water J. 7 (1), 61–77. doi:10.1080/15730620903242824

Zendehboudi, S., Rezaei, N., Lohi, A., 2018. Applications of hybrid models in chemical, petroleum, and energy systems: a systematic review. Appl. Energy 228 (2018), 2539–2566. doi:10.1016/j.apenergy.2018.06.051
