FINAL PROJECT - Computational Mathematics
December 3, 2022
TEAM MEMBERS:
• Ing. Roberto Perez Ferrel
• Ing. Vidal Matias Marca
• Ing. Freddy Marcos Guzman Zurita
1. Exploratory Data Analysis
• (a) Consider the breast cancer diagnosis dataset from the University of Wisconsin: cancer.data
• (b) Read the file readme.names, interpret the data loaded from the previous file, and describe the dataset in qualitative terms.
• (c) Organize the dataset into three classes, 'mean', 'standard error' and 'worst', then rename all columns to Spanish. Hint: use pd.MultiIndex.from_product
• (d) Compute the mean, variance, skewness and kurtosis of each attribute using (i) all the data and (ii) benign and malignant tumors separately. Explain the meaning of the values obtained. Which attributes fit a normal distribution?
• (e) Visualize the empirical distribution of each attribute with matplotlib or seaborn using (i) all the data and (ii) benign and malignant tumors separately. What type of distribution would be appropriate to fit each attribute? Can the tumor type be distinguished using the attributes in isolation?
• (f) Build a scatter plot, using matplotlib, pandas or seaborn, for every pair of attributes, and also the correlation matrix as a color map. Use different colors for malignant and benign tumors. (i) Fit a straight-line model (two parameters) to each pair of distinct attributes and show the fitted line on each scatter plot. (ii) Which pairs of variables are most correlated? Justify. Which pairs of variables best separate the tumor types?
• (g) Carry out either part (e) or part (f) using Python's graphical interface, that is, using widgets for the plots.
Solution to part (a):
[40]: # import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
[10]: # load the cancer dataset into Python
# (cancer.data has no header line, so by default its first record is read as the column names)
cancer=pd.read_csv('cancer.data')
cancer
[10]: 842302 M 17.99 10.38 122.8 1001 0.1184 0.2776 0.3001
0 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.08690
1 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.19740
2 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.24140
3 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.19800
4 843786 M 12.45 15.70 82.57 477.1 0.12780 0.17000 0.15780
.. … .. … … … … … … …
563 926424 M 21.56 22.39 142.00 1479.0 0.11100 0.11590 0.24390
564 926682 M 20.13 28.25 131.20 1261.0 0.09780 0.10340 0.14400
565 926954 M 16.60 28.08 108.30 858.1 0.08455 0.10230 0.09251
566 927241 M 20.60 29.33 140.10 1265.0 0.11780 0.27700 0.35140
567 92751 B 7.76 24.54 47.92 181.0 0.05263 0.04362 0.00000
0.1471 … 25.38 17.33 184.6 2019 0.1622 0.6656 0.7119
0 0.07017 … 24.990 23.41 158.80 1956.0 0.12380 0.18660 0.2416
1 0.12790 … 23.570 25.53 152.50 1709.0 0.14440 0.42450 0.4504
2 0.10520 … 14.910 26.50 98.87 567.7 0.20980 0.86630 0.6869
3 0.10430 … 22.540 16.67 152.20 1575.0 0.13740 0.20500 0.4000
4 0.08089 … 15.470 23.75 103.40 741.6 0.17910 0.52490 0.5355
.. … … … … … … … … …
563 0.13890 … 25.450 26.40 166.10 2027.0 0.14100 0.21130 0.4107
564 0.09791 … 23.690 38.25 155.00 1731.0 0.11660 0.19220 0.3215
565 0.05302 … 18.980 34.12 126.70 1124.0 0.11390 0.30940 0.3403
566 0.15200 … 25.740 39.42 184.60 1821.0 0.16500 0.86810 0.9387
567 0.00000 … 9.456 30.37 59.16 268.6 0.08996 0.06444 0.0000
0.2654 0.4601 0.1189
0 0.1860 0.2750 0.08902
1 0.2430 0.3613 0.08758
2 0.2575 0.6638 0.17300
3 0.1625 0.2364 0.07678
4 0.1741 0.3985 0.12440
.. … … …
563 0.2216 0.2060 0.07115
564 0.1628 0.2572 0.06637
565 0.1418 0.2218 0.07820
566 0.2650 0.4087 0.12400
567 0.0000 0.2871 0.07039
[568 rows x 32 columns]
[11]: # number of rows and columns
cancer.shape
[11]: (568, 32)
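The shape above has 568 rows because the first record was used as the header. A minimal sketch of the alternative, assuming the file is comma-separated with no header line (as the output above suggests): `header=None` keeps every record as data. The two-record string `raw` is a hypothetical stand-in for the file, built from the first columns of the first two records shown above.

```python
import io
import pandas as pd

# Two records in the same layout as cancer.data (no header line).
# Only the first four fields of each record are kept to stay short.
raw = "842302,M,17.99,10.38\n842517,M,20.57,17.77\n"

# Default behaviour: the first record is consumed as the column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data and numbers the columns 0, 1, 2, ...
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)      # (1, 4)
print(with_header_none.shape)  # (2, 4)
```

Applied to the real file, the same option would be expected to yield all 569 records without editing the file by hand.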
[12]: # load the cancer dataset again, this time from a copy of the file
# in which placeholder column names (P1..P32) were added, so that the columns
# can be renamed once readme.names has been read
cancer1=pd.read_csv('cancer1.data')
cancer1
[12]: P1 P2 P3 P4 P5 P6 P7 P8 P9
0 842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.30010
1 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.08690
2 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.19740
3 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.24140
4 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.19800
.. … .. … … … … … … …
564 926424 M 21.56 22.39 142.00 1479.0 0.11100 0.11590 0.24390
565 926682 M 20.13 28.25 131.20 1261.0 0.09780 0.10340 0.14400
566 926954 M 16.60 28.08 108.30 858.1 0.08455 0.10230 0.09251
567 927241 M 20.60 29.33 140.10 1265.0 0.11780 0.27700 0.35140
568 92751 B 7.76 24.54 47.92 181.0 0.05263 0.04362 0.00000
P10 … P23 P24 P25 P26 P27 P28 P29
0 0.14710 … 25.380 17.33 184.60 2019.0 0.16220 0.66560 0.7119
1 0.07017 … 24.990 23.41 158.80 1956.0 0.12380 0.18660 0.2416
2 0.12790 … 23.570 25.53 152.50 1709.0 0.14440 0.42450 0.4504
3 0.10520 … 14.910 26.50 98.87 567.7 0.20980 0.86630 0.6869
4 0.10430 … 22.540 16.67 152.20 1575.0 0.13740 0.20500 0.4000
.. … … … … … … … … …
564 0.13890 … 25.450 26.40 166.10 2027.0 0.14100 0.21130 0.4107
565 0.09791 … 23.690 38.25 155.00 1731.0 0.11660 0.19220 0.3215
566 0.05302 … 18.980 34.12 126.70 1124.0 0.11390 0.30940 0.3403
567 0.15200 … 25.740 39.42 184.60 1821.0 0.16500 0.86810 0.9387
568 0.00000 … 9.456 30.37 59.16 268.6 0.08996 0.06444 0.0000
P30 P31 P32
0 0.2654 0.4601 0.11890
1 0.1860 0.2750 0.08902
2 0.2430 0.3613 0.08758
3 0.2575 0.6638 0.17300
4 0.1625 0.2364 0.07678
.. … … …
564 0.2216 0.2060 0.07115
565 0.1628 0.2572 0.06637
566 0.1418 0.2218 0.07820
567 0.2650 0.4087 0.12400
568 0.0000 0.2871 0.07039
[13]: # number of rows and columns
cancer1.shape
[13]: (569, 32)
Solution to part (b):
Interpretation of the data in the file from part (a):
• Breast cancer can produce a characteristic pattern in each individual; once captured in a dataset, that pattern allows the disease to be detected and diagnosed from the changes and similarities in the data.
• The file contains measurements and diagnoses for breast cancer patients, with a prediction field of Malignant (M) or Benign (B).
• The diagnoses are used to assess whether a patient may have breast cancer, based on the clinical records of patients with malignant or benign conditions.
• Ten real-valued features are computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. From these, one feature or a set of features can be derived to decide whether or not a patient has breast cancer.
• Each feature in turn has sub-features. For the radius (the mean of the distances from the center to the points on the perimeter), for example, the dataset contains:
  • Mean radius
  • Standard error of the radius
  • Worst radius
• The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
• Column names: according to readme.names, the columns are as follows.
• P1=Id_number
• P2=diagnosis
• P3=radius_mean
• P4=texture_mean
• P5=perimeter_mean
• P6=area_mean
• P7=smoothness_mean
• P8=compactness_mean
• P9=concavity _mean
• P10=concave_points_mean
• P11=symmetry_mean
• P12=fractal_dimension_mean
• P13=radius_se
• P14=texture_se
• P15=perimeter_se
• P16=area_se
• P17=smoothness_se
• P18=compactness_se
• P19=concavity_se
• P20=concave_points_se
• P21=symmetry_se
• P22=fractal_dimension_se
• P23=radius_worst
• P24=texture_worst
• P25=perimeter_worst
• P26=area_worst
• P27=smoothness_worst
• P28=compactness_worst
• P29=concavity_worst
• P30=concave_points_worst
• P31=symmetry_worst
• P32=fractal_dimension_worst
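The 30 feature names listed above follow a regular pattern: ten features, each repeated for `mean`, `se` and `worst`. As a minimal sketch, the names can be generated instead of typed by hand. The variable names `features`, `groups` and `columns` are illustrative only and do not appear in the original notebook.

```python
# The ten per-nucleus features, in the order given in readme.names.
features = ['radius', 'texture', 'perimeter', 'area', 'smoothness',
            'compactness', 'concavity', 'concave_points', 'symmetry',
            'fractal_dimension']
groups = ['mean', 'se', 'worst']

# All "mean" columns first, then "se", then "worst", matching P3..P32.
feature_columns = [f"{feature}_{group}" for group in groups for feature in features]
columns = ['Id_number', 'diagnosis'] + feature_columns

print(len(columns))                          # 32
print(columns[2], columns[12], columns[22])  # radius_mean radius_se radius_worst
```

Generating the list this way avoids stray characters in individual names, at the cost of relying on the column order being exactly the one listed above.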
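Solution to part (c):
[15]: # assign names to the columns (taken from readme.names)
cancer1.columns=['Id_number','diagnosis','radius_mean','texture_mean','perimeter_mean',
    'area_mean','smoothness_mean','compactness_mean','concavity _mean','concave_points_mean',
    'symmetry_mean','fractal_dimension_mean','radius_se','texture_se','perimeter_se',
    'area_se','smoothness_se','compactness_se','concavity_se','concave_points_se',
    'symmetry_se','fractal_dimension_se','radius_worst','texture_worst','perimeter_worst',
    'area_worst','smoothness_worst','compactness_worst','concavity_worst',
    'concave_points_worst','symmetry_worst','fractal_dimension_worst']
cancer1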
[15]: Id_number diagnosis radius_mean texture_mean perimeter_mean
0 842302 M 17.99 10.38 122.80
1 842517 M 20.57 17.77 132.90
2 84300903 M 19.69 21.25 130.00
3 84348301 M 11.42 20.38 77.58
4 84358402 M 20.29 14.34 135.10
.. … … … … …
564 926424 M 21.56 22.39 142.00
565 926682 M 20.13 28.25 131.20
566 926954 M 16.60 28.08 108.30
567 927241 M 20.60 29.33 140.10
568 92751 B 7.76 24.54 47.92
area_mean smoothness_mean compactness_mean concavity _mean
0 1001.0 0.11840 0.27760 0.30010
1 1326.0 0.08474 0.07864 0.08690
2 1203.0 0.10960 0.15990 0.19740
3 386.1 0.14250 0.28390 0.24140
4 1297.0 0.10030 0.13280 0.19800
.. … … … …
564 1479.0 0.11100 0.11590 0.24390
565 1261.0 0.09780 0.10340 0.14400
566 858.1 0.08455 0.10230 0.09251
567 1265.0 0.11780 0.27700 0.35140
568 181.0 0.05263 0.04362 0.00000
concave_points_mean … radius_worst texture_worst perimeter_worst
0 0.14710 … 25.380 17.33 184.60
1 0.07017 … 24.990 23.41 158.80
2 0.12790 … 23.570 25.53 152.50
3 0.10520 … 14.910 26.50 98.87
4 0.10430 … 22.540 16.67 152.20
.. … … … … …
564 0.13890 … 25.450 26.40 166.10
565 0.09791 … 23.690 38.25 155.00
566 0.05302 … 18.980 34.12 126.70
567 0.15200 … 25.740 39.42 184.60
568 0.00000 … 9.456 30.37 59.16
area_worst smoothness_worst compactness_worst concavity_worst
0 2019.0 0.16220 0.66560 0.7119
1 1956.0 0.12380 0.18660 0.2416
2 1709.0 0.14440 0.42450 0.4504
3 567.7 0.20980 0.86630 0.6869
4 1575.0 0.13740 0.20500 0.4000
.. … … … …
564 2027.0 0.14100 0.21130 0.4107
565 1731.0 0.11660 0.19220 0.3215
566 1124.0 0.11390 0.30940 0.3403
567 1821.0 0.16500 0.86810 0.9387
568 268.6 0.08996 0.06444 0.0000
concave_points_worst symmetry_worst fractal_dimension_worst
0 0.2654 0.4601 0.11890
1 0.1860 0.2750 0.08902
2 0.2430 0.3613 0.08758
3 0.2575 0.6638 0.17300
4 0.1625 0.2364 0.07678
.. … … …
564 0.2216 0.2060 0.07115
565 0.1628 0.2572 0.06637
566 0.1418 0.2218 0.07820
567 0.2650 0.4087 0.12400
568 0.0000 0.2871 0.07039
[16]: # dataset information
cancer1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id_number 569 non-null int64
1 diagnosis 569 non-null object
2 radius_mean 569 non-null float64
3 texture_mean 569 non-null float64
4 perimeter_mean 569 non-null float64
5 area_mean 569 non-null float64
6 smoothness_mean 569 non-null float64
7 compactness_mean 569 non-null float64
8 concavity _mean 569 non-null float64
9 concave_points_mean 569 non-null float64
10 symmetry_mean 569 non-null float64
11 fractal_dimension_mean 569 non-null float64
12 radius_se 569 non-null float64
13 texture_se 569 non-null float64
14 perimeter_se 569 non-null float64
15 area_se 569 non-null float64
16 smoothness_se 569 non-null float64
17 compactness_se 569 non-null float64
18 concavity_se 569 non-null float64
19 concave_points_se 569 non-null float64
20 symmetry_se 569 non-null float64
21 fractal_dimension_se 569 non-null float64
22 radius_worst 569 non-null float64
23 texture_worst 569 non-null float64
24 perimeter_worst 569 non-null float64
25 area_worst 569 non-null float64
26 smoothness_worst 569 non-null float64
27 compactness_worst 569 non-null float64
28 concavity_worst 569 non-null float64
29 concave_points_worst 569 non-null float64
30 symmetry_worst 569 non-null float64
31 fractal_dimension_worst 569 non-null float64
dtypes: float64(30), int64(1), object(1)
memory usage: 142.4+ KB
We select the 'mean' columns:
[17]: # keep the columns whose name contains 'mean'
cancer_mean = cancer1.filter(like='mean')
cancer_mean
[17]: radius_mean texture_mean perimeter_mean area_mean smoothness_mean
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263
compactness_mean concavity _mean concave_points_mean symmetry_mean
0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587
fractal_dimension_mean
0 0.07871
1 0.05667
2 0.05999
3 0.09744
4 0.05883
.. …
564 0.05623
565 0.05533
566 0.05648
567 0.07016
568 0.05884
We select the standard error (se) columns:
[18]: # keep the columns whose name contains '_se'
cancer_se = cancer1.filter(like='_se')
cancer_se
[18]: radius_se texture_se perimeter_se area_se smoothness_se
0 1.0950 0.9053 8.589 153.40 0.006399
1 0.5435 0.7339 3.398 74.08 0.005225
2 0.7456 0.7869 4.585 94.03 0.006150
3 0.4956 1.1560 3.445 27.23 0.009110
4 0.7572 0.7813 5.438 94.44 0.011490
.. … … … … …
564 1.1760 1.2560 7.673 158.70 0.010300
565 0.7655 2.4630 5.203 99.04 0.005769
566 0.4564 1.0750 3.425 48.55 0.005903
567 0.7260 1.5950 5.772 86.22 0.006522
568 0.3857 1.4280 2.548 19.15 0.007189
compactness_se concavity_se concave_points_se symmetry_se
0 0.04904 0.05373 0.01587 0.03003
1 0.01308 0.01860 0.01340 0.01389
2 0.04006 0.03832 0.02058 0.02250
3 0.07458 0.05661 0.01867 0.05963
4 0.02461 0.05688 0.01885 0.01756
.. … … … …
564 0.02891 0.05198 0.02454 0.01114
565 0.02423 0.03950 0.01678 0.01898
566 0.03731 0.04730 0.01557 0.01318
567 0.06158 0.07117 0.01664 0.02324
568 0.00466 0.00000 0.00000 0.02676
fractal_dimension_se
0 0.006193
1 0.003532
2 0.004571
3 0.009208
4 0.005115
.. …
564 0.004239
565 0.002498
566 0.003892
567 0.006185
568 0.002783
We select the 'worst' columns:
[19]: # keep the columns whose name contains 'worst'
cancer_worst = cancer1.filter(like='worst')
cancer_worst
[19]: radius_worst texture_worst perimeter_worst area_worst
0 25.380 17.33 184.60 2019.0
1 24.990 23.41 158.80 1956.0
2 23.570 25.53 152.50 1709.0
3 14.910 26.50 98.87 567.7
4 22.540 16.67 152.20 1575.0
.. … … … …
564 25.450 26.40 166.10 2027.0
565 23.690 38.25 155.00 1731.0
566 18.980 34.12 126.70 1124.0
567 25.740 39.42 184.60 1821.0
568 9.456 30.37 59.16 268.6
smoothness_worst compactness_worst concavity_worst
0 0.16220 0.66560 0.7119
1 0.12380 0.18660 0.2416
2 0.14440 0.42450 0.4504
3 0.20980 0.86630 0.6869
4 0.13740 0.20500 0.4000
.. … … …
564 0.14100 0.21130 0.4107
565 0.11660 0.19220 0.3215
566 0.11390 0.30940 0.3403
567 0.16500 0.86810 0.9387
568 0.08996 0.06444 0.0000
concave_points_worst symmetry_worst fractal_dimension_worst
0 0.2654 0.4601 0.11890
1 0.1860 0.2750 0.08902
2 0.2430 0.3613 0.08758
3 0.2575 0.6638 0.17300
4 0.1625 0.2364 0.07678
.. … … …
564 0.2216 0.2060 0.07115
565 0.1628 0.2572 0.06637
566 0.1418 0.2218 0.07820
567 0.2650 0.4087 0.12400
568 0.0000 0.2871 0.07039
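Part (c) suggests `pd.MultiIndex.from_product` for organizing the columns into the three classes. The following is a minimal sketch of that idea, not the notebook's own solution: `flat` is a hypothetical stand-in for the 30 feature columns of `cancer1` (everything after the ID and the diagnosis), filled with placeholder numbers, and the level names `group` and `feature` are assumptions.

```python
import numpy as np
import pandas as pd

groups = ['mean', 'standard error', 'worst']
features = ['radius', 'texture', 'perimeter', 'area', 'smoothness',
            'compactness', 'concavity', 'concave_points', 'symmetry',
            'fractal_dimension']

# Stand-in for the 30 feature columns (P3..P32), in file order.
flat = pd.DataFrame(np.arange(60, dtype=float).reshape(2, 30))

# from_product varies the last level fastest, which matches the file order:
# ten "mean" columns, then ten "standard error", then ten "worst".
flat.columns = pd.MultiIndex.from_product([groups, features],
                                          names=['group', 'feature'])

print(flat['worst'].shape)                # (2, 10)
print(flat[('mean', 'radius')].tolist())  # [0.0, 30.0]
```

With a two-level header, selecting a whole class becomes `flat['worst']`, so the three separate `filter(like=...)` frames are not needed. On the real data the same assignment could be applied to `cancer1.iloc[:, 2:]`, assuming the columns are in the order listed in part (b).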
[20]: # place three of the 'worst' columns side by side (axis=1 aligns them by row;
# axis=0 would stack them vertically and fill the gaps with NaN)
cancer_worst_concat_2c= pd.concat([cancer1[['radius_worst']], cancer1[['texture_worst']], cancer1[['perimeter_worst']]], axis=1)
cancer_worst_concat_2c
[20]: radius_worst texture_worst perimeter_worst
0 25.380 17.33 184.60
1 24.990 23.41 158.80
2 23.570 25.53 152.50
3 14.910 26.50 98.87
4 22.540 16.67 152.20
.. … … …
564 25.450 26.40 166.10
565 23.690 38.25 155.00
566 18.980 34.12 126.70
567 25.740 39.42 184.60
568 9.456 30.37 59.16
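We translate the column names into Spanish:
[42]: cancer1.columns=['Número_de_identificación','diagnóstico','radio_promedio','textura_media',
    'perímetro_medio','área_media','suavidad_media','compacidad_media','concavidad_media',
    'media_de_puntos_cóncavos','simetría_media','media_dimensión_fractal','radio_se',
    'textura_se','perímetro_se','area_se','suavidad_se','compacidad_se','concavidad_se',
    'puntos_cóncavos_se','simetría_se','dimension_fractal_se','radio_peor','textura_peor',
    'perímetro_peor','area_peor','suavidad_peor','compacidad_peor','concavidad_peor',
    'puntos_cóncavos_peor','simetría_peor','fractal_dimension_peor']
cancer1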
[42]: Número_de_identificación diagnóstico radio_promedio textura_media
0 842302 M 17.99 10.38
1 842517 M 20.57 17.77
2 84300903 M 19.69 21.25
3 84348301 M 11.42 20.38
4 84358402 M 20.29 14.34
.. … … … …
564 926424 M 21.56 22.39
565 926682 M 20.13 28.25
566 926954 M 16.60 28.08
567 927241 M 20.60 29.33
568 92751 B 7.76 24.54
perímetro_medio área_media suavidad_media compacidad_media
0 122.80 1001.0 0.11840 0.27760
1 132.90 1326.0 0.08474 0.07864
2 130.00 1203.0 0.10960 0.15990
3 77.58 386.1 0.14250 0.28390
4 135.10 1297.0 0.10030 0.13280
.. … … … …
564 142.00 1479.0 0.11100 0.11590
565 131.20 1261.0 0.09780 0.10340
566 108.30 858.1 0.08455 0.10230
567 140.10 1265.0 0.11780 0.27700
568 47.92 181.0 0.05263 0.04362
concavidad_media media_de_puntos_cóncavos … radio_peor
0 0.30010 0.14710 … 25.380
1 0.08690 0.07017 … 24.990
2 0.19740 0.12790 … 23.570
3 0.24140 0.10520 … 14.910
4 0.19800 0.10430 … 22.540
.. … … … …
564 0.24390 0.13890 … 25.450
565 0.14400 0.09791 … 23.690
566 0.09251 0.05302 … 18.980
567 0.35140 0.15200 … 25.740
568 0.00000 0.00000 … 9.456
textura_peor perímetro_peor area_peor suavidad_peor compacidad_peor
0 17.33 184.60 2019.0 0.16220 0.66560
1 23.41 158.80 1956.0 0.12380 0.18660
2 25.53 152.50 1709.0 0.14440 0.42450
3 26.50 98.87 567.7 0.20980 0.86630
4 16.67 152.20 1575.0 0.13740 0.20500
.. … … … … …
564 26.40 166.10 2027.0 0.14100 0.21130
565 38.25 155.00 1731.0 0.11660 0.19220
566 34.12 126.70 1124.0 0.11390 0.30940
567 39.42 184.60 1821.0 0.16500 0.86810
568 30.37 59.16 268.6 0.08996 0.06444
concavidad_peor puntos_cóncavos_peor simetría_peor
0 0.7119 0.2654 0.4601
1 0.2416 0.1860 0.2750
2 0.4504 0.2430 0.3613
3 0.6869 0.2575 0.6638
4 0.4000 0.1625 0.2364
.. … … …
564 0.4107 0.2216 0.2060
565 0.3215 0.1628 0.2572
566 0.3403 0.1418 0.2218
567 0.9387 0.2650 0.4087
568 0.0000 0.0000 0.2871
fractal_dimension_peor
0 0.11890
1 0.08902
2 0.08758
3 0.17300
4 0.07678
.. …
564 0.07115
565 0.06637
566 0.07820
567 0.12400
568 0.07039
[16]: # dataset information after renaming
cancer1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Número_de_identificación 569 non-null int64
1 diagnóstico 569 non-null object
2 radio_promedio 569 non-null float64
3 textura_media 569 non-null float64
4 perímetro_medio 569 non-null float64
5 área_media 569 non-null float64
6 suavidad_media 569 non-null float64
7 compacidad_media 569 non-null float64
8 concavidad_media 569 non-null float64
9 media_de_puntos_cóncavos 569 non-null float64
10 simetría_media 569 non-null float64
11 media_dimensión_fractal 569 non-null float64
12 radio_se 569 non-null float64
13 textura_se 569 non-null float64
14 perímetro_se 569 non-null float64
15 area_se 569 non-null float64
16 suavidad_se 569 non-null float64
17 compacidad_se 569 non-null float64
18 concavidad_se 569 non-null float64
19 puntos_cóncavos_se 569 non-null float64
20 simetría_se 569 non-null float64
21 dimension_fractal_se 569 non-null float64
22 radio_peor 569 non-null float64
23 textura_peor 569 non-null float64
24 perímetro_peor 569 non-null float64
25 area_peor 569 non-null float64
26 suavidad_peor 569 non-null float64
27 compacidad_peor 569 non-null float64
28 concavidad_peor 569 non-null float64
29 puntos_cóncavos_peor 569 non-null float64
30 simetría_peor 569 non-null float64
31 fractal_dimension_peor 569 non-null float64
dtypes: float64(30), int64(1), object(1)
memory usage: 142.4+ KB
Solution to part (d):
(i) We compute the mean, variance, skewness and kurtosis using all the data
[17]: # mean of every numeric column (numeric_only=True skips the text column 'diagnóstico')
cancer1.mean(numeric_only=True)
[17]: Número_de_identificación 3.037183e+07
radio_promedio 1.412729e+01
textura_media 1.928965e+01
perímetro_medio 9.196903e+01
área_media 6.548891e+02
suavidad_media 9.636028e-02
compacidad_media 1.043410e-01
concavidad_media 8.879932e-02
media_de_puntos_cóncavos 4.891915e-02
simetría_media 1.811619e-01
media_dimensión_fractal 6.279761e-02
radio_se 4.051721e-01
textura_se 1.216853e+00
perímetro_se 2.866059e+00
area_se 4.033708e+01
suavidad_se 7.040979e-03
compacidad_se 2.547814e-02
concavidad_se 3.189372e-02
puntos_cóncavos_se 1.179614e-02
simetría_se 2.054230e-02
dimension_fractal_se 3.794904e-03
radio_peor 1.626919e+01
textura_peor 2.567722e+01
perímetro_peor 1.072612e+02
area_peor 8.805831e+02
suavidad_peor 1.323686e-01
compacidad_peor 2.542650e-01
concavidad_peor 2.721885e-01
puntos_cóncavos_peor 1.146062e-01
simetría_peor 2.900756e-01
fractal_dimension_peor 8.394582e-02
dtype: float64
[19]: # variance of every numeric column
cancer1.var(numeric_only=True)
[19]: Número_de_identificación 1.563015e+16
radio_promedio 1.241892e+01
textura_media 1.849891e+01
perímetro_medio 5.904405e+02
área_media 1.238436e+05
suavidad_media 1.977997e-04
compacidad_media 2.789187e-03
concavidad_media 6.355248e-03
media_de_puntos_cóncavos 1.505661e-03
simetría_media 7.515428e-04
media_dimensión_fractal 4.984872e-05
radio_se 7.690235e-02
textura_se 3.043159e-01
perímetro_se 4.087896e+00
area_se 2.069432e+03
suavidad_se 9.015114e-06
compacidad_se 3.207029e-04
concavidad_se 9.111982e-04
puntos_cóncavos_se 3.807242e-05
simetría_se 6.833290e-05
dimension_fractal_se 7.001692e-06
radio_peor 2.336022e+01
textura_peor 3.777648e+01
perímetro_peor 1.129131e+03
area_peor 3.241674e+05
suavidad_peor 5.213198e-04
compacidad_peor 2.475477e-02
concavidad_peor 4.352409e-02
puntos_cóncavos_peor 4.320741e-03
simetría_peor 3.827584e-03
fractal_dimension_peor 3.262094e-04
dtype: float64
[22]: # skewness of every numeric column
cancer1.skew(numeric_only=True)
[22]: Número_de_identificación 6.473752
radio_promedio 0.942380
textura_media 0.650450
perímetro_medio 0.990650
área_media 1.645732
suavidad_media 0.456324
compacidad_media 1.190123
concavidad_media 1.401180
media_de_puntos_cóncavos 1.171180
simetría_media 0.725609
media_dimensión_fractal 1.304489
radio_se 3.088612
textura_se 1.646444
perímetro_se 3.443615
area_se 5.447186
suavidad_se 2.314450
compacidad_se 1.902221
concavidad_se 5.110463
puntos_cóncavos_se 1.444678
simetría_se 2.195133
dimension_fractal_se 3.923969
radio_peor 1.103115
textura_peor 0.498321
perímetro_peor 1.128164
area_peor 1.859373
suavidad_peor 0.415426
compacidad_peor 1.473555
concavidad_peor 1.150237
puntos_cóncavos_peor 0.492616
simetría_peor 1.433928
fractal_dimension_peor 1.662579
dtype: float64
[23]: # kurtosis of every numeric column
# (pandas reports excess kurtosis, which is 0 for a normal distribution)
cancer1.kurt(numeric_only=True)
[23]: Número_de_identificación 42.193194
radio_promedio 0.845522
textura_media 0.758319
perímetro_medio 0.972214
área_media 3.652303
suavidad_media 0.855975
compacidad_media 1.650130
concavidad_media 1.998638
media_de_puntos_cóncavos 1.066556
simetría_media 1.287933
media_dimensión_fractal 3.005892
radio_se 17.686726
textura_se 5.349169
perímetro_se 21.401905
area_se 49.209077
suavidad_se 10.469840
compacidad_se 5.106252
concavidad_se 48.861395
puntos_cóncavos_se 5.126302
simetría_se 7.896130
dimension_fractal_se 26.280847
radio_peor 0.944090
textura_peor 0.224302
perímetro_peor 1.070150
area_peor 4.396395
suavidad_peor 0.517825
compacidad_peor 3.039288
concavidad_peor 1.615253
puntos_cóncavos_peor -0.535535
simetría_peor 4.444560
fractal_dimension_peor 5.244611
dtype: float64
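The four statistics above can also be collected into one table, which makes the question "which attributes fit a normal distribution?" easier to approach. The sketch below uses synthetic data, not the cancer dataset, and a rough screening rule that is an assumption of this example: a column is flagged as near-normal when both its skewness and its excess kurtosis are within 0.5 of zero. The threshold 0.5 and the names `demo`, `summary` and `near_normal` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data: one roughly normal column, one clearly skewed column.
demo = pd.DataFrame({
    'approx_normal': rng.normal(size=2000),
    'right_skewed': rng.exponential(size=2000),
})

# One row per attribute, one column per statistic.
summary = demo.agg(['mean', 'var', 'skew', 'kurt']).T

# Rough screening only: a normal distribution has skewness 0 and excess kurtosis 0.
# The 0.5 cut-off is an arbitrary choice for illustration, not a formal test.
summary['near_normal'] = (summary['skew'].abs() < 0.5) & (summary['kurt'].abs() < 0.5)

print(summary)
```

Read against the values reported above, this rough rule would single out textura_peor (skewness 0.498, kurtosis 0.224) as the closest to normal, while attributes such as area_se (skewness 5.45, kurtosis 49.2) are far from it. A formal normality test would be needed to confirm any of these readings.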
(ii) We compute the mean, variance, skewness and kurtosis for benign and malignant tumors
[22]: # count the records in each diagnosis class
cancer1['diagnóstico'].value_counts()
[22]: B 357
M 212
Name: diagnóstico, dtype: int64
The counts above show 357 records of patients with tumors diagnosed as benign and 212 diagnosed as malignant. These class sizes are used in the rest of the analysis of the cancer dataset.
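The per-class statistics requested in (ii) can be obtained by splitting the frame on the diagnosis column. This is a minimal sketch under stated assumptions: `demo` is a small hypothetical frame that reuses two of the notebook's column names with made-up values, and `moments` is a helper introduced here for illustration.

```python
import pandas as pd

# Hypothetical miniature of cancer1: a diagnosis column and one attribute.
demo = pd.DataFrame({
    'diagnóstico': ['B', 'B', 'B', 'B', 'M', 'M', 'M', 'M'],
    'radio_promedio': [11.0, 12.0, 12.5, 13.5, 17.0, 19.0, 20.0, 22.0],
})

def moments(frame):
    """Mean, variance, skewness and excess kurtosis of each numeric column."""
    numeric = frame.select_dtypes('number')
    return pd.DataFrame({
        'mean': numeric.mean(),
        'var': numeric.var(),
        'skew': numeric.skew(),
        'kurt': numeric.kurt(),
    })

# One block of statistics per diagnosis, placed side by side.
by_class = pd.concat(
    {label: moments(group) for label, group in demo.groupby('diagnóstico')},
    axis=1,
)

print(by_class)
```

On the real data, `demo` would be replaced by `cancer1` (ideally without the identification column, which carries no measurement). Comparing the B and M blocks attribute by attribute then shows which statistics differ most between the two tumor types.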
Solution to part (e):
(i) Plot of benign (B) and malignant (M) tumors
[35]: # Visualización gráfico de los datos
sns_plot=sns.pairplot(cancer1, hue="diagnosis", height=2.5)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~anaconda3libsite-packagespandascoreindexesbase.py:3621, in Index.
↪get_loc(self, key, method, tolerance)
3620 try:
17

-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~anaconda3libsite-packagespandas_libsindex.pyx:136, in pandas._libs.
↪index.IndexEngine.get_loc()
File ~anaconda3libsite-packagespandas_libsindex.pyx:163, in pandas._libs.
↪index.IndexEngine.get_loc()
File pandas_libshashtable_class_helper.pxi:5198, in pandas._libs.hashtable.
↪PyObjectHashTable.get_item()
File pandas_libshashtable_class_helper.pxi:5206, in pandas._libs.hashtable.
↪PyObjectHashTable.get_item()
KeyError: 'diagnosis'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Input In [35], in <cell line: 2>()
1 # Visualización gráfico de los datos
----> 2 sns_plot=sns.pairplot(cancer1, hue="diagnosis", height=2.5)
File ~anaconda3libsite-packagesseaborn_decorators.py:46, in␣
↪_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
36 warnings.warn(
37 "Pass the following variable{} as {}keyword arg{}: {}. "
38 "From version 0.12, the only valid positional argument "
(…)
43 FutureWarning
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
File ~anaconda3libsite-packagesseabornaxisgrid.py:2096, in pairplot(data,␣
↪hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers,␣
↪height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
2094 # Set up the PairGrid
2095 grid_kws.setdefault("diag_sharey", diag_kind == "hist")
-> 2096 grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
2097 hue_order=hue_order, palette=palette, corner=corner,
2098 height=height, aspect=aspect, dropna=dropna, **grid_kws)
2100 # Add the markers here as PairGrid has figured out how many levels of the
2101 # hue variable are needed and we don't want to duplicate that process
2102 if markers is not None:
18

File ~anaconda3libsite-packagesseaborn_decorators.py:46, in␣
↪_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
36 warnings.warn(
37 "Pass the following variable{} as {}keyword arg{}: {}. "
38 "From version 0.12, the only valid positional argument "
(…)
43 FutureWarning
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
File ~anaconda3libsite-packagesseabornaxisgrid.py:1289, in PairGrid.
↪__init__(self, data, hue, hue_order, palette, hue_kws, vars, x_vars, y_vars,␣
↪corner, diag_sharey, height, aspect, layout_pad, despine, dropna, size)
1278 self.hue_vals = pd.Series(["_nolegend_"] * len(data),
1279 index=data.index)
1280 else:
1281 # We need hue_order and hue_names because the former is used to␣
↪control
1282 # the order of drawing and the latter is used to control the order of
(…)
1287 # to the axes-level functions, while always handling legend creation.
1288 # See GH2307
-> 1289 hue_names = hue_order = categorical_order(data[hue], hue_order)
1290 if dropna:
1291 # Filter NA from the list of unique hue names
1292 hue_names = list(filter(pd.notnull, hue_names))
File ~anaconda3libsite-packagespandascoreframe.py:3505, in DataFrame.
↪__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~anaconda3libsite-packagespandascoreindexesbase.py:3623, in Index.
↪get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
19

KeyError: 'diagnosis'
Error in callback <function flush_figures at 0x0000013089860700> (for
post_execute):
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
File ~anaconda3libsite-packagesmatplotlib_inlinebackend_inline.py:121, in␣
↪flush_figures()
118 if InlineBackend.instance().close_figures:
119 # ignore the tracking, just draw and close all figures
120 try:
--> 121 return show(True)
122 except Exception as e:
123 # safely show traceback if in IPython, else raise
124 ip = get_ipython()
File ~anaconda3libsite-packagesmatplotlib_inlinebackend_inline.py:41, in␣
↪show(close, block)
39 try:
40 for figure_manager in Gcf.get_all_fig_managers():
---> 41 display(
42 figure_manager.canvas.figure,
43 metadata=_fetch_figure_metadata(figure_manager.canvas.figure)
44 )
45 finally:
46 show._to_draw = []
File ~anaconda3libsite-packagesIPythoncoredisplay_functions.py:298, in␣
↪display(include, exclude, metadata, transient, display_id, raw, clear, *objs,␣
↪**kwargs)
296 publish_display_data(data=obj, metadata=metadata, **kwargs)
297 else:
--> 298 format_dict, md_dict = format(obj, include=include, exclude=exclude)
299 if not format_dict:
300 # nothing to display (e.g. _ipython_display_ took over)
301 continue
File ~anaconda3libsite-packagesIPythoncoreformatters.py:178, in␣
↪DisplayFormatter.format(self, obj, include, exclude)
176 md = None
177 try:
--> 178 data = formatter(obj)
179 except:
180 # FIXME: log the exception
181 raise
20

File ~anaconda3libsite-packagesdecorator.py:232, in decorate.<locals>.
↪fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
↪catch_format_error(method, self, *args, **kwargs)
220 """show traceback on failed format call"""
221 try:
--> 222 r = method(self, *args, **kwargs)
223 except NotImplementedError:
224 # don't warn on NotImplementedErrors
225 return self._check_return(None, args[0])
↪BaseFormatter.__call__(self, obj)
337 pass
338 else:
--> 339 return printer(obj)
340 # Finally look for special method names
341 method = get_real_method(obj, self.print_method)
File ~anaconda3libsite-packagesIPythoncorepylabtools.py:151, in␣
↪print_figure(fig, fmt, bbox_inches, base64, **kwargs)
148 from matplotlib.backend_bases import FigureCanvasBase
149 FigureCanvasBase(fig)
--> 151 fig.canvas.print_figure(bytes_io, **kw)
152 data = bytes_io.getvalue()
153 if fmt == 'svg':
File ~anaconda3libsite-packagesmatplotlibbackend_bases.py:2299, in␣
↪FigureCanvasBase.print_figure(self, filename, dpi, facecolor, edgecolor,␣
↪orientation, format, bbox_inches, pad_inches, bbox_extra_artists, backend,␣
↪**kwargs)
2297 if bbox_inches:
2298 if bbox_inches == "tight":
-> 2299 bbox_inches = self.figure.get_tightbbox(
2300 renderer, bbox_extra_artists=bbox_extra_artists)
2301 if pad_inches is None:
2302 pad_inches = rcParams['savefig.pad_inches']
File ~anaconda3libsite-packagesmatplotlibfigure.py:1641, in FigureBase.
↪get_tightbbox(self, renderer, bbox_extra_artists)
1637 if ax.get_visible():
1638 # some axes don't take the bbox_extra_artists kwarg so we
1639 # need this conditional…
1640 try:
-> 1641 bbox = ax.get_tightbbox(
1642 renderer, bbox_extra_artists=bbox_extra_artists)
1643 except TypeError:
1644 bbox = ax.get_tightbbox(renderer)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:4635, in _AxesBase.get_tightbbox(self, renderer, call_axes_locator, bbox_extra_artists, for_layout_only)
4633 if bb_yaxis:
4634 bb.append(bb_yaxis)
-> 4635 self._update_title_position(renderer)
4636 axbbox = self.get_window_extent(renderer)
4637 bb.append(axbbox)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:2986, in _AxesBase._update_title_position(self, renderer)
2983 for ax in axs:
2984 if (ax.xaxis.get_ticks_position() in ['top', 'unknown']
2985 or ax.xaxis.get_label_position() == 'top'):
-> 2986 bb = ax.xaxis.get_tightbbox(renderer)
2987 else:
2988 bb = ax.get_window_extent(renderer)
File ~\anaconda3\lib\site-packages\matplotlib\axis.py:1105, in Axis.get_tightbbox(self, renderer, for_layout_only)
1101 return
1103 ticks_to_draw = self._update_ticks()
-> 1105 self._update_label_position(renderer)
1107 # go back to just this axis's tick labels
1108 ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(
1109 ticks_to_draw, renderer)
File ~\anaconda3\lib\site-packages\matplotlib\axis.py:2083, in XAxis._update_label_position(self, renderer)
2079 return
2081 # get bounding boxes for this axis and any siblings
2082 # that have been set by `fig.align_xlabels()`
-> 2083 bboxes, bboxes2 = self._get_tick_boxes_siblings(renderer=renderer)
2085 x, y = self.label.get_position()
2086 if self.label_position == 'bottom':
↪_get_tick_boxes_siblings(self, renderer)
1878 for ax in grouper.get_siblings(self.axes):
1879 axis = getattr(ax, f"{axis_name}axis")
-> 1880 ticks_to_draw = axis._update_ticks()
1881 tlb, tlb2 = axis._get_tick_bboxes(ticks_to_draw, renderer)
1882 bboxes.extend(tlb)
↪_update_ticks(self)
1040 def _update_ticks(self):
1041 """
1042 Update ticks (position and labels) using the current data interval of
1043 the axes. Return the list of ticks that will be drawn.
1044 """
-> 1045 major_locs = self.get_majorticklocs()
1046 major_labels = self.major.formatter.format_ticks(major_locs)
1047 major_ticks = self.get_major_ticks(len(major_locs))
↪get_majorticklocs(self)
1275 def get_majorticklocs(self):
1276 """Return this Axis' major tick locations in data coordinates."""
-> 1277 return self.major.locator()
File ~\anaconda3\lib\site-packages\matplotlib\ticker.py:2114, in MaxNLocator.__call__(self)
2112 def __call__(self):
2113 vmin, vmax = self.axis.get_view_interval()
-> 2114 return self.tick_values(vmin, vmax)
↪tick_values(self, vmin, vmax)
2119 vmin = -vmax
2120 vmin, vmax = mtransforms.nonsingular(
2121 vmin, vmax, expander=1e-13, tiny=1e-14)
-> 2122 locs = self._raw_ticks(vmin, vmax)
2124 prune = self._prune
2125 if prune == 'lower':
↪_raw_ticks(self, vmin, vmax)
2059 if self._nbins == 'auto':
2060 if self.axis is not None:
-> 2061 nbins = np.clip(self.axis.get_tick_space(),
2062 max(1, self._min_n_ticks - 1), 9)
2063 else:
2064 nbins = 9
File ~\anaconda3\lib\site-packages\matplotlib\axis.py:2263, in XAxis.get_tick_space(self)
2261 def get_tick_space(self):
2262 ends = mtransforms.Bbox.from_bounds(0, 0, 1, 1)
-> 2263 ends = ends.transformed(self.axes.transAxes -
2264 self.figure.dpi_scale_trans)
2265 length = ends.width * 72
2266 # There is a heuristic here that the aspect ratio of tick text
2267 # is no more than 3:1
File ~\anaconda3\lib\site-packages\matplotlib\transforms.py:492, in BboxBase.transformed(self, transform)
488 """
489 Construct a `Bbox` by statically transforming this one by *transform*.
490 """
491 pts = self.get_points()
--> 492 ll, ul, lr = transform.transform(np.array(
493 [pts[0], [pts[0, 0], pts[1, 1]], [pts[1, 0], pts[0, 1]]]))
494 return Bbox([ll, [lr[0], ul[1]]])
File ~\anaconda3\lib\site-packages\matplotlib\transforms.py:1503, in Transform.transform(self, values)
1500 values = values.reshape((-1, self.input_dims))
1502 # Transform the values
-> 1503 res = self.transform_affine(self.transform_non_affine(values))
1505 # Convert the result back to the shape of the input values.
1506 if ndim == 0:
File ~\anaconda3\lib\site-packages\matplotlib\transforms.py:2419, in CompositeGenericTransform.transform_affine(self, points)
2417 def transform_affine(self, points):
2418 # docstring inherited
-> 2419 return self.get_affine().transform(points)
File ~\anaconda3\lib\site-packages\matplotlib\transforms.py:2446, in CompositeGenericTransform.get_affine(self)
2444 return self._b.get_affine()
2445 else:
-> 2446 return Affine2D(np.dot(self._b.get_affine().get_matrix(),
2447 self._a.get_affine().get_matrix()))
File <__array_function__ internals>:5, in dot(*args, **kwargs)
KeyboardInterrupt:
(ii) Plot of benign (B) and malignant (M) tumors
[33]: # Visualization of the "diagnóstico" column
# x is passed by keyword: from seaborn 0.12 the only valid positional
# argument is `data`, and other positional arguments raise an error
sns_plot = sns.countplot(x=cancer1['diagnóstico'], label="Count")
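The bar heights of a count plot can also be read off numerically. The following is a minimal sketch, assuming a column named 'diagnóstico' with labels "B" and "M"; the data frame `toy` and its values are made up for illustration and are not the actual `cancer1` data.

```python
import pandas as pd

# Toy stand-in for cancer1 (hypothetical values, only the label column).
toy = pd.DataFrame({"diagnóstico": ["B", "M", "B", "B", "M", "B"]})

# Absolute count per class: these are the bar heights of the count plot.
counts = toy["diagnóstico"].value_counts()

# Relative frequency per class, useful for judging class imbalance.
shares = toy["diagnóstico"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

Replacing `toy` with `cancer1` should give the class balance of the real dataset, provided the column keeps that name.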
Solution to part (f):
(i) Scatter plot
[52]: # Color the points by diagnosis; the column names are assumed to match cancer1
sns.scatterplot(x='área_media', y='radio_promedio', data=cancer1, hue='diagnóstico')
plt.title('Relationship between diagnosis and mean radius of the cancer cell')
(ii) Correlation matrix
[47]: # Heatmap visualization
plt.figure(figsize=(20,20))
sns_plot = sns.heatmap(cancer1.corr(), annot=True, fmt='.0%')
The heatmap shows the pairwise correlation between the variables, expressed as a percentage, and
gives a concise summary of the relationships among all the quantities of interest in this dataset.
For example, the attribute most strongly correlated with a malignant diagnosis is worst concave
points (concave points worst), with a correlation of 79% with the diagnosis attribute. Note that
this is a correlation coefficient, not a count of occurrences, and it does not by itself establish
causal influence. The plot also makes it possible to single out the other attributes most strongly
associated with the diagnosis: mean radius (radius mean) at 73%, mean perimeter (perimeter mean) at
74%, mean area (area mean) at 71%, mean concavity (concavity mean) at 70%, and worst radius
(radius worst) at 78%.
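The ranking of attributes by their correlation with the diagnosis can be extracted from the correlation matrix directly instead of being read off the heatmap. The following is a sketch under stated assumptions: the diagnosis is encoded numerically (0 = benign, 1 = malignant), and the data frame `toy` with its `attr_*` columns is synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Assumed numeric encoding of the label: 0 = benign, 1 = malignant.
diagnosis = rng.integers(0, 2, size=n)

# Synthetic stand-in for cancer1; the attr_* columns are hypothetical.
toy = pd.DataFrame({
    "diagnóstico": diagnosis,
    "attr_strong": diagnosis * 2.0 + rng.normal(0, 0.5, n),
    "attr_weak": diagnosis * 0.3 + rng.normal(0, 1.0, n),
    "attr_noise": rng.normal(0, 1.0, n),
})

# Pearson correlation matrix (the same quantity the heatmap displays).
corr = toy.corr()

# Correlation of every attribute with the label, sorted by absolute value.
ranking = (
    corr["diagnóstico"]
    .drop("diagnóstico")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Sorting by absolute value keeps strongly negative correlations near the top as well, since they separate the classes just as well as positive ones.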
(iii)
[55]: # One fitted line per diagnosis class; the column names are assumed to match cancer1
sns.lmplot(x='área_media', y='radio_promedio', data=cancer1, hue='diagnóstico')
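The two-parameter straight-line model requested in part (f) can also be fitted explicitly, which exposes the slope and intercept that a regression plot only draws. The following is a minimal sketch using `np.polyfit`; the arrays `x` and `y` are made-up values standing in for a pair of attributes, not data from `cancer1`.

```python
import numpy as np

# Made-up (x, y) pairs standing in for two attributes (hypothetical values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of the model y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Fitted values and residuals of the line.
y_hat = slope * x + intercept
residuals = y - y_hat

# Pearson correlation coefficient of the pair.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Applied to each pair of columns, the resulting `slope` and `intercept` give the line to overlay on the corresponding scatter plot, for example with `plt.plot(x, y_hat)`.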