Unanswered Questions
349 questions with no upvoted or accepted answers
6 votes, 3 answers, 205 views
What Clustering Method Should I Use?
My data is a set of 10,000 points, each with a node location (x, y), spread across a plane. They are also color-coded by weight.
I need to finalize a Bayesian ...
6 votes, 0 answers, 118 views
Fixed-radius range search in non-Euclidean space
I'm trying to find the indexing data structure best suited to my metric space:
a set of IP-network-related data (IP addresses, ports, TCP flags, ...),
the distance function is continuous, non-Euclidean ...
6 votes, 3 answers, 354 views
Anomaly detection using clustering of highly correlated Categorical data
My data has two columns that are highly correlated: e.g., if column1 has the value ABC, column2 should be XYZ (i.e., ABC --> XYZ). If column2 has anything else, it's an anomaly. Likewise, there are thousands ...
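The rule described in this question (each column1 value has one expected column2 value, and anything else is an anomaly) can be checked with a simple lookup before reaching for clustering. A minimal pure-Python baseline, assuming the expected pairing for each column1 value is its most frequent column2 value in historical data; `learn_mapping` and `flag_anomalies` are hypothetical names, and this is a frequency lookup, not a clustering method.

```python
from collections import Counter, defaultdict


def learn_mapping(pairs):
    """Map each column1 value to its most frequent column2 value (assumed to be the 'expected' one)."""
    counts = defaultdict(Counter)
    for col1, col2 in pairs:
        counts[col1][col2] += 1
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    return {col1: c.most_common(1)[0][0] for col1, c in counts.items()}


def flag_anomalies(pairs, mapping):
    """Return True for each row whose column2 differs from the expected value.

    Column1 values never seen during learning map to None, so they are flagged too.
    """
    return [mapping.get(col1) != col2 for col1, col2 in pairs]


history = [("ABC", "XYZ"), ("ABC", "XYZ"), ("ABC", "QQQ"), ("DEF", "UVW")]
mapping = learn_mapping(history)
print(flag_anomalies([("ABC", "XYZ"), ("ABC", "QQQ"), ("GHI", "XYZ")], mapping))
```

A natural refinement would be to require the dominant pairing to cover some minimum share of a column1 value's rows before trusting it.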
5 votes, 1 answer, 346 views
Clustering time series based on monotonic similarity
Context
I am working on clustering 1,500 time series, each with 500 observations, into a few clusters. The time series all share the same observed properties at different spatial locations, but ...
3 votes, 0 answers, 17 views
Clustering metric: why is clustering accuracy (ACC) not as popular as ARI?
I have samples with their ground-truth (GT) clusters.
I want to measure the success of different clustering algorithms.
It seems that, when the GT is available, it is popular to use ARI (Adjusted Rand Index).
I saw there ...
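For reference, the two metrics named in this question can be computed side by side. The sketch below is plain Python with hypothetical function names; it assumes ACC is defined the usual way, as the fraction of points labeled correctly under the best one-to-one mapping between predicted clusters and ground-truth classes. That mapping is normally found with the Hungarian algorithm; brute force over permutations is used here instead, so it only suits a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(y_true, y_pred):
    """ACC: best fraction of correctly labeled points over one-to-one cluster->class mappings."""
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))  # (predicted cluster, true class) -> count
    # Pad with None so extra predicted clusters can stay unmatched (contributing 0).
    k = max(len(true_labels), len(pred_labels))
    targets = true_labels + [None] * (k - len(true_labels))
    best = 0
    for perm in permutations(targets, len(pred_labels)):
        matched = sum(pairs[(p, t)] for p, t in zip(pred_labels, perm))
        best = max(best, matched)
    return best / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """ARI from the contingency table (assumes at least two points)."""
    n = len(y_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under random labeling
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions; treated as perfect agreement here
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]
print(clustering_accuracy(y_true, y_pred), adjusted_rand_index(y_true, y_pred))
```

One difference this makes visible: ARI is adjusted for chance, so a random labeling scores close to 0, whereas ACC has no chance correction and depends on the matching step (including what to do with surplus clusters).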
3 votes, 0 answers, 251 views
Cluster tabular data with text in some columns
Let's say I have the following features in my dataframe:
user_id: integer
user_age: integer
is_student: binary
is_graduate: binary
salary: integer
resume: text (up to 1,000 characters)
And also a few more ...
3 votes, 0 answers, 169 views
Clustering large set of images
I've got some large image datasets (a few million images each), and I would like to cluster them by visual similarity. I've extracted a feature vector for each image; the space of ...
3 votes, 1 answer, 926 views
How to compare topics generated from topic modeling from different datasets?
I have two datasets on a similar theme; call them Dataset A and Dataset B. Using the Top2Vec model (https://github.com/ddangelov/Top2Vec) (https://arxiv.org/abs/2008.09470) on each dataset, I came ...
3 votes, 2 answers, 1k views
Clustering mixed data types - numeric, categorical, arrays, and text
I have a dataset with 4 types of data columns:
...
3 votes, 3 answers, 305 views
How to decide who to market to? Clustering or decision tree?
I am working with a dataset that has enough observations and ~10 variables:
half of the variables are numeric
the other half are categorical with 2-3 levels (demographics)
one ID ...
3 votes, 1 answer, 50 views
Visualizing the differences within a set of strings
I have a distance metric on a collection of tens of thousands of strings. What would be an intuitive way to summarize how 'different' these strings are, or where they overlap?
My goal is, ...
3 votes, 3 answers, 672 views
What value can I gain by doing exploratory data analysis on features (and thus data) before doing clustering?
This might not be a very good question, but I'd still like to ask: is it beneficial to do EDA before running a clustering algorithm?
I understand that EDA helps us generate good and helpful insights ...
3 votes, 2 answers, 3k views
Is k-means with Mahalanobis distance a valid option for clustering?
I would like to know whether k-means with Mahalanobis distance is a mathematically and methodologically sound option for datasets whose clusters have different variances.
The steps are:
Create aggregate datasets (...
3 votes, 1 answer, 47 views
Identify members who are likely to switch where they receive drug administration
I have access to medical claims data from a large health insurance company. As some of you may know, the price of drug X can differ greatly depending on where it is administered.
My company ...
3 votes, 0 answers, 206 views
How to remove noise using morphological filtering
I have two groups of dots with noise between them:
The line separating the two groups in the picture is diagonal.
I tried to use morphological filtering on this image to ...